Cannot build ports on fresh 10.0
I installed a new VM with 10.0 today from this .iso: ftp://ftp.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/amd64/10.0/FreeBSD-10.0-CURRENT-amd64-20130601-r251213-release.iso And I'm behind a firewall and I'm not sure I can use pkg(1); my one attempt failed: # pkg install m4 Updating repository catalogue Repository catalogue is up-to-date, no need to fetch fresh copy pkg: Package 'm4' was not found in the repositories So I'm building from source. But the configure step of various ports fails like so: checking whether make sets $(MAKE)... yes checking build system type... Invalid configuration `amd64-portbld-freebsd10.0': machine `amd64-portbld' not recognized configure: error: /bin/sh libltdl/config/config.sub amd64-portbld-freebsd10.0 failed === Script configure failed unexpectedly. Please report the problem to autoto...@freebsd.org [maintainer] and attach the /usr/ports/devel/libtool/work/libtool-2.4.2/config.log including the output of the failure of your make command. Also, it might be a good idea to provide an overview of all packages installed on your system (e.g. a /usr/local/sbin/pkg-static info -g -Ea). *** Error code 1 Any ideas what could be wrong? It's a fresh install with default options, so it seems hard to believe I managed to screw something up already. Posting on -current since this happens to many of the ports when I try to build them. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Cannot build ports on fresh 10.0
On Tue, Jun 4, 2013 at 3:07 PM, hiren panchasara hi...@freebsd.org wrote: On Tue, Jun 4, 2013 at 2:42 PM, m...@freebsd.org wrote: I installed a new VM with 10.0 today from this .iso: ftp://ftp.freebsd.org/pub/FreeBSD/snapshots/ISO-IMAGES/amd64/10.0/FreeBSD-10.0-CURRENT-amd64-20130601-r251213-release.iso And I'm behind a firewall and I'm not sure I can use pkg(1); my one attempt failed: # pkg install m4 Updating repository catalogue Repository catalogue is up-to-date, no need to fetch fresh copy pkg: Package 'm4' was not found in the repositories So I'm building from source. But the configure step of various ports fails like so: checking whether make sets $(MAKE)... yes checking build system type... Invalid configuration `amd64-portbld-freebsd10.0': machine `amd64-portbld' not recognized configure: error: /bin/sh libltdl/config/config.sub amd64-portbld-freebsd10.0 failed === Script configure failed unexpectedly. Please report the problem to autoto...@freebsd.org [maintainer] and attach the /usr/ports/devel/libtool/work/libtool-2.4.2/config.log including the output of the failure of your make command. Also, it might be a good idea to provide an overview of all packages installed on your system (e.g. a /usr/local/sbin/pkg-static info -g -Ea). *** Error code 1 _Probably_ similar to this? http://lists.freebsd.org/pipermail/freebsd-ports/2013-June/084040.html Looks very likely. I'm testing the patch now. My rudimentary search of -ports wasn't good enough, I guess. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
[RFC] use a shared lock for VOP_GETEXTATTR
VOP_GETEXTATTR is currently called with an exclusive lock, which seems like overkill for what is essentially a read operation. I had a look over the various in-tree filesystems and it didn't look like any of them will have a problem if a shared-mode lock is used for vop_getextattr. Does anyone know otherwise? Is someone using extended attributes regularly who can test this? [sorry if this is a duplicate; I first sent from my non-FreeBSD mail] Thanks, matthew 0001-Use-a-shared-lock-for-VOP_GETEXTATTR-as-it-is-a-read.patch Description: Binary data ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: [RFC] use a shared lock for VOP_GETEXTATTR
On Wed, Mar 27, 2013 at 10:32 PM, Konstantin Belousov kostik...@gmail.com wrote: On Wed, Mar 27, 2013 at 06:37:51PM -0700, m...@freebsd.org wrote: VOP_GETEXTATTR is currently called with an exclusive lock, which seems like overkill for what is essentially a read operation. I had a look over the various in-tree filesystems and it didn't look like any of them will have a problem if a shared-mode lock is used for vop_getextattr. Does anyone know otherwise? Is someone using extended attributes regularly who can test this? I think this change should be fine. At least it seems to for UFS. What other filesystems did you audited ? I looked over zfs, pseudofs, unionfs, ffs/ufs. None seemed to have any asserts on the lock type nor anything that looked like it would try to modify anything. zfs, I think it was, even used VOP_ISLOCKED to get the lock type in one path (but I think that was after a lookup(), so it may have been on a different vnode). Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: kmem_map auto-sizing and size dependencies
On Fri, Jan 18, 2013 at 7:29 AM, Andre Oppermann an...@freebsd.org wrote: The (inital?) size of the kmem_map is determined by some voodoo magic, a sprinkle of nmbclusters * PAGE_SIZE incrementor and lots of tunables. However it seems to work out to an effective kmem_map_size of about 58MB on my 16GB AMD64 dev machine: vm.kvm_size: 549755809792 vm.kvm_free: 530233421824 vm.kmem_size: 16,594,300,928 vm.kmem_size_min: 0 vm.kmem_size_max: 329,853,485,875 vm.kmem_size_scale: 1 vm.kmem_map_size: 59,518,976 vm.kmem_map_free: 16,534,777,856 The kmem_map serves kernel malloc (via UMA), contigmalloc and everthing else that uses UMA for memory allocation. Mbuf memory too is managed by UMA which obtains the backing kernel memory from the kmem_map. The limits of the various mbuf memory types have been considerably raised recently and may make use of 50-75% of all physically present memory, or available KVM space, whichever is smaller. Now my questions/comments are: Does the kmem_map automatically extend itself if more memory is requested? Not that I recall. Should it be set to a larger initial value based on min(physical,KVM) space available? It needs to be smaller than the physical space, because the only limit on the kernel's use of (pinned) memory is the size of the map. So if it is too large there is nothing to stop the kernel from consuming all available memory. The lowmem handler is called when running out of virtual space only (i.e. a failure to allocate a range in the map). The naming and output of the various vm.kmem_* and vm.kvm_* sysctls is confusing and not easy to reconcile. Either we need some more detailing more aspects or less. Plus perhaps sysctl subtrees to better describe the hierarchy of the maps. Why are separate kmem submaps being used? Is it to limit memory usage of certain subsystems? Are those limits actually enforced? I mostly know about memguard, since I added memguard_fudge(). IIRC some of the submaps are used. The memguard_map specifically is used to know whether an allocation is guarded or not, so at free(9) it can be handled as normal malloc() or as memguard. Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Memory modified after free - by whom?
On Mon, Dec 10, 2012 at 3:10 PM, Adrian Chadd adr...@freebsd.org wrote: 9216 sounds like a jumbo frame mbuf. So the NIC is writing to an mbuf after it's finalised/freed. I have a similar bug showing up on ath(4) RX. :( Compile with DEBUG_MEMGUARD in the kernel configuration, and then set vm.memguard.desc to the name of the UMA zone used for the 9216 byte allocations, mbuf_jumbo_9k. This should cause a panic when the memory is touched after free. Cheers, matthew On 10 December 2012 14:22, Garrett Cooper yaneg...@gmail.com wrote: I noticed this while checking the logs on one of my test boxes after restarting the network. Any idea where I should start looking into this (has IPv6 enabled but wasn't using it, em/cxgbe/ixgbe interfaces with the ixgbe interfaces lagged previously, but now not)? It looks suspiciously like the same size as a jumbo frame, but I'm not 100% sure if that's the real problem. Thanks, -Garrett Dec 10 14:03:12 wf158 kernel: em0: link state changed to DOWN Dec 10 14:03:13 wf158 kernel: ix0: link state changed to DOWN Dec 10 14:03:13 wf158 kernel: ix0: link state changed to UP Dec 10 14:03:13 wf158 kernel: ix1: link state changed to DOWN Dec 10 14:03:13 wf158 kernel: ix1: link state changed to UP Dec 10 14:03:13 wf158 kernel: ix0: link state changed to DOWN Dec 10 14:03:13 wf158 kernel: ix0: link state changed to UP Dec 10 14:03:15 wf158 kernel: em0: link state changed to UP Dec 10 14:03:15 wf158 dhclient: New IP Address (em0): 10.7.169.89 Dec 10 14:03:15 wf158 dhclient: New Subnet Mask (em0): 255.255.240.0 Dec 10 14:03:15 wf158 dhclient: New Broadcast Address (em0): 10.7.175.255 Dec 10 14:03:15 wf158 dhclient: New Routers (em0): 10.7.160.1 Dec 10 14:03:16 wf158 kernel: ix0: link state changed to DOWN Dec 10 14:03:16 wf158 kernel: ix0: link state changed to UP Dec 10 14:03:31 wf158 kernel: in6_purgeaddr: err=65, destination address delete failed Dec 10 14:03:31 wf158 dhclient[4539]: My address (10.7.169.89) was deleted, dhclient exiting Dec 10 14:03:32 wf158 dhclient[4521]: short write: wanted 20 got 0 bytes Dec 10 14:03:32 wf158 dhclient[4521]: exiting. Dec 10 14:03:33 wf158 kernel: em0: link state changed to DOWN Dec 10 14:03:33 wf158 kernel: ix1: link state changed to DOWN Dec 10 14:03:34 wf158 kernel: ix1: link state changed to UP Dec 10 14:03:34 wf158 kernel: ix1: link state changed to DOWN Dec 10 14:03:34 wf158 kernel: ix1: link state changed to UP Dec 10 14:03:34 wf158 kernel: ix0: link state changed to DOWN Dec 10 14:03:34 wf158 kernel: ix0: link state changed to UP Dec 10 14:03:34 wf158 kernel: ix1: link state changed to DOWN Dec 10 14:03:34 wf158 kernel: ix1: link state changed to UP Dec 10 14:03:36 wf158 kernel: em0: link state changed to UP Dec 10 14:03:36 wf158 dhclient: New IP Address (em0): 10.7.169.89 Dec 10 14:03:36 wf158 dhclient: New Subnet Mask (em0): 255.255.240.0 Dec 10 14:03:36 wf158 dhclient: New Broadcast Address (em0): 10.7.175.255 Dec 10 14:03:36 wf158 dhclient: New Routers (em0): 10.7.160.1 Dec 10 14:05:34 wf158 kernel: Memory modified after free 0xff81c016d000(9216) val= @ 0xff81c016d000 Dec 10 14:05:35 wf158 kernel: Memory modified after free 0xff81b5cdc000(9216) val= @ 0xff81b5cdc000 # uname -a FreeBSD wf158.west.isilon.com 10.0-CURRENT FreeBSD 10.0-CURRENT #1 r+2760369-dirty: Mon Dec 10 08:04:46 PST 2012 r...@wf158.west.isilon.com:/usr/obj/usr/src/sys/ISI-GENERIC amd64 ___ freebsd-...@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: panic: sbuf_trim makes no sense on sbuf 0xffffff82434d8898 with drain
On Sun, Nov 25, 2012 at 2:26 PM, Niclas Zeising zeising+free...@daemonic.se wrote: Hi! I consistently get this panic while trying to boot a kernel build from r243530. It happens when the entropy harvesting rc.d script starts. r243380 worked fine, I haven't tested any revisions in between. Attached is the backtrace from the kernel, as gotten by kgdb. The machine uses zfs as a root pool, and there have been churn in this area. To my untrained eyes, however, the issue seem related to hdaa.c. Please let me know if I can provide any more information. r243530 added the new sysctl that is causing panic. I'm not sure why there's an sbuf_trim() call; there shouldn't be more than a few \n at the end. IMO the sbuf_trim() can be eliminated. Alternately, the panic check can be removed and we could allow sbuf_trim() to remove any un-emitted whitespace for an sbuf with drain. CC'ing mav@ who introduced the code. (I introduced sbuf drains). Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: FILE's _file can only hold a short
On Thu, Nov 1, 2012 at 7:41 PM, Eitan Adler li...@eitanadler.com wrote: On 1 November 2012 10:40, Ian Lepore free...@damnhippie.dyndns.org wrote: On Wed, 2012-10-31 at 11:12 -0700, m...@freebsd.org wrote: I seem to recall a thread earlier on this limitation, but looking at actual libc/stdio sources, the 4 year old check for open(2)'s fd being less than SHRT_MAX is still there. I thought I saw a patch to change this to an int, but it's not in the tree. Was this in a PR or a mailing list thread or am I just imagining things? We've run into this limitation at work, where some processes have around 32k open file descriptors and then try to use the libc FILE interface. Since we control ABI we can just change this to int, but I had been hoping there was a FreeBSD revision we could pull instead of having another diff. FWIW, I also remember some discussion recently (this year) on some mailing list about this, but I can't find it now. I thought it was somehow related to in-lib versus external uses of the funopen() function, but I may be conflating two unrelated discusssions in my head. Perhaps http://freebsd.1045724.n5.nabble.com/stdio-and-short-file-descriptors-revisited-td5747703.html ? Yes, that was it exactly. Thanks! My (quick) search had not been fruitful. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
FILE's _file can only hold a short
I seem to recall a thread earlier on this limitation, but looking at actual libc/stdio sources, the 4 year old check for open(2)'s fd being less than SHRT_MAX is still there. I thought I saw a patch to change this to an int, but it's not in the tree. Was this in a PR or a mailing list thread or am I just imagining things? We've run into this limitation at work, where some processes have around 32k open file descriptors and then try to use the libc FILE interface. Since we control ABI we can just change this to int, but I had been hoping there was a FreeBSD revision we could pull instead of having another diff. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
make tinderbox failures
Running make tinderbox locally I see failures that aren't being reported by the automated tinderbox builds. I'm curious what's different about the environment. Failures are in the following: arm ARMADAXP kernel failed, check _.arm.ARMADAXP for details mips SENTRY5 kernel failed, check _.mips.SENTRY5 for details mips XLPN32 kernel failed, check _.mips.XLPN32 for details mips XLR kernel failed, check _.mips.XLR for details mips XLRN32 kernel failed, check _.mips.XLRN32 for details _.arm.ARMADAXP: /data/sb/bsd.git/sys/arm/mv/armadaxp/armadaxp_mp.c: In function 'platform_mp_start_ap': /data/sb/bsd.git/sys/arm/mv/armadaxp/armadaxp_mp.c:181: warning: passing argument 1 of 'pmap_kextract' makes integer from pointer without a cast _.mips.SENTRY5: config: Error: device siba is unknown _.mips.XLPN32 and _.mips.XLRN32: linking kernel.debug stack_machdep.o: In function `stack_capture': /data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x34): undefined reference to `stack_zero' /data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x34): relocation truncated to fit: R_MIPS_26 against `stack_zero' /data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x13c): undefined reference to `stack_put' /data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x13c): relocation truncated to fit: R_MIPS_26 against `stack_put' _.mips.XLR: linking kernel.debug board.o: In function `xlr_board_info_setup': /data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3a4): undefined reference to `__ucmpdi2' /data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3a4): relocation truncated to fit: R_MIPS_26 against `__ucmpdi2' /data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3e4): undefined reference to `__ucmpdi2' /data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3e4): relocation truncated to fit: R_MIPS_26 against `__ucmpdi2' Since I don't work on arm or mips I generally ignore these. But it makes it harder to have confidence in a global change when make tinderbox throws errors locally. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: make tinderbox failures
On Fri, Oct 26, 2012 at 3:24 PM, Warner Losh i...@bsdimp.com wrote: On Oct 26, 2012, at 1:08 PM, Garrett Cooper wrote: On Fri, Oct 26, 2012 at 10:26 AM, m...@freebsd.org wrote: Running make tinderbox locally I see failures that aren't being reported by the automated tinderbox builds. I'm curious what's different about the environment. Failures are in the following: arm ARMADAXP kernel failed, check _.arm.ARMADAXP for details mips SENTRY5 kernel failed, check _.mips.SENTRY5 for details mips XLPN32 kernel failed, check _.mips.XLPN32 for details mips XLR kernel failed, check _.mips.XLR for details mips XLRN32 kernel failed, check _.mips.XLRN32 for details _.arm.ARMADAXP: /data/sb/bsd.git/sys/arm/mv/armadaxp/armadaxp_mp.c: In function 'platform_mp_start_ap': /data/sb/bsd.git/sys/arm/mv/armadaxp/armadaxp_mp.c:181: warning: passing argument 1 of 'pmap_kextract' makes integer from pointer without a cast This still needs some love... _.mips.SENTRY5: config: Error: device siba is unknown OK. Traced this down and fixed it. I'd planned on moving siba to files, and it looks like I never finished that. My bad. _.mips.XLPN32 and _.mips.XLRN32: linking kernel.debug stack_machdep.o: In function `stack_capture': /data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x34): undefined reference to `stack_zero' /data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x34): relocation truncated to fit: R_MIPS_26 against `stack_zero' /data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x13c): undefined reference to `stack_put' /data/sb/bsd.git/sys/mips/mips/stack_machdep.c:(.text+0x13c): relocation truncated to fit: R_MIPS_26 against `stack_put' stack_machdep needs to depend on options stack or ddb. Fixed. Unsure why other kernels didn't uncover this. _.mips.XLR: linking kernel.debug board.o: In function `xlr_board_info_setup': /data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3a4): undefined reference to `__ucmpdi2' /data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3a4): relocation truncated to fit: R_MIPS_26 against `__ucmpdi2' /data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3e4): undefined reference to `__ucmpdi2' /data/sb/bsd.git/sys/mips/rmi/board.c:(.text+0x3e4): relocation truncated to fit: R_MIPS_26 against `__ucmpdi2' Fixed earlier this week. Since I don't work on arm or mips I generally ignore these. But it makes it harder to have confidence in a global change when make tinderbox throws errors locally. These are probably due to the changes that Warner made recently to the mips conf files. Some of them... Thanks for fixes! Is the reason the public tinderbox didn't see these because it has explicit lists of arch/CONF kernels to build, while make tinderbox on my machine just builds all the capitalized files as configuration files? Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: panic with DEBUG_MEMGUARD on PowerPC
On Sat, Jul 14, 2012 at 8:39 AM, Justin Hibbits chmeeed...@gmail.com wrote: On Jul 13, 2012, at 12:20 AM, m...@freebsd.org wrote: On Thu, Jul 12, 2012 at 6:33 PM, Justin Hibbits chmeeed...@gmail.com wrote: On Jul 12, 2012, at 9:11 PM, m...@freebsd.org wrote: On Thu, Jul 12, 2012 at 4:43 PM, Justin Hibbits chmeeed...@gmail.com wrote: When tracking down a panic exposed by INVARIANTS, I tried setting DEBUG_MEMGUARD, so I could find the culprit that's trashing freed memory. However, this causes a panic at bootup. It shows up right after the first WARNING: WITNESS message, with the following: Tracing, and printf() debugging, I see arguments to vm_map_findspace(): start: 0xD000, length: 4246446080, and map-max_offset = 4026531839. Beyond that, I'm lost with tracking this down. Machine is a dual processor PowerPC G4, with 2GB RAM. The length is 0xFD1BA000 which is almost 4GB. Asking for 4GB of virtual space for 2GB of RAM sounds about right (it's been a while since I was in this code), unless this is a 32-bit kernel, in which case it'd be too much since there isn't that much virtual space available. So, is the kernel 32-bit? What are the values used and returned by memguard_fudge()? The intent of that routine is to get kmeminit() to allocate a larger map so memguard can use part of it for private virtual addresses. But it shouldn't be asking for too much; i.e. the intent was to check both physical and virtual space available and be greedy, but not too greedy. There were some issues with that code for some platforms that e.g. didn't define a VM_KMEM_SIZE_MAX, but alc@ fixed that in r216425. It is a 32-bit kernel, on 32-bit hardware. The values for memguard_fudge are (defaults): tmp: 4246446080, vm_kmem_size: 117440512, vm_kmem_size_max: 0 When setting vm.kmem_size/vm.kmem_size_max to 2GB they are: tmp: 2147483648, vm_kmem_size: 214793648, vm_kmem_sizee_max: 2147483648 (all 2GB). But the start and map-max_offset remain the same on all runs I make. memguard_fudge is still broken for 32-bit architectures with no vm_kmem_max. In the absence of a km_max to limit the value, we essentially use twice the physical memory for the virtual limit. But with 2GB on a 32-bit machine, this requires 4GB of virtual space. Setting vm_kmem_size_max to 2GB should work; I'd expect to see tmp=about 200MB, which is much larger than the input 112MB but the allocation should work. But I don't really know what else PowerPC has need of for virtual space, so that still could be too large. You can try smaller values of vm_kmem_size_max, like 1GB or 512MB. You shouldn't need to set vm_kmem_size at all. At some point the added space for the memguard_map will be small enough that the kmem_suballoc will work. Hmm, what is the min_offset and max_offset of kernel_map when the call to memguard_fudge is made? Thanks, matthew Without setting vm.kmem_size/vm.kmem_size_max, I see the following: map: 0x100, min_offset: 0xD000, max_offset: 0xEFFF It does boot when I set vm.kmem_size=256M/vm.kmem_size_max=512M. When I tried 512M/1024M, it panicked at the same place -- kmem_suballoc from kmeminit. So it looks like I have to set vm.kmem_size/vm.kmem_size_max way back in order for it to boot with memguard(9). Please try the attached patch (or at http://people.freebsd.org/~mdf/memguard.diff). Thanks, matthew memguard.diff Description: Binary data ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: panic with DEBUG_MEMGUARD on PowerPC
On Sat, Jul 14, 2012 at 8:39 AM, Justin Hibbits chmeeed...@gmail.com wrote: On Jul 13, 2012, at 12:20 AM, m...@freebsd.org wrote: On Thu, Jul 12, 2012 at 6:33 PM, Justin Hibbits chmeeed...@gmail.com wrote: On Jul 12, 2012, at 9:11 PM, m...@freebsd.org wrote: On Thu, Jul 12, 2012 at 4:43 PM, Justin Hibbits chmeeed...@gmail.com wrote: When tracking down a panic exposed by INVARIANTS, I tried setting DEBUG_MEMGUARD, so I could find the culprit that's trashing freed memory. However, this causes a panic at bootup. It shows up right after the first WARNING: WITNESS message, with the following: Tracing, and printf() debugging, I see arguments to vm_map_findspace(): start: 0xD000, length: 4246446080, and map-max_offset = 4026531839. Beyond that, I'm lost with tracking this down. Machine is a dual processor PowerPC G4, with 2GB RAM. The length is 0xFD1BA000 which is almost 4GB. Asking for 4GB of virtual space for 2GB of RAM sounds about right (it's been a while since I was in this code), unless this is a 32-bit kernel, in which case it'd be too much since there isn't that much virtual space available. So, is the kernel 32-bit? What are the values used and returned by memguard_fudge()? The intent of that routine is to get kmeminit() to allocate a larger map so memguard can use part of it for private virtual addresses. But it shouldn't be asking for too much; i.e. the intent was to check both physical and virtual space available and be greedy, but not too greedy. There were some issues with that code for some platforms that e.g. didn't define a VM_KMEM_SIZE_MAX, but alc@ fixed that in r216425. It is a 32-bit kernel, on 32-bit hardware. The values for memguard_fudge are (defaults): tmp: 4246446080, vm_kmem_size: 117440512, vm_kmem_size_max: 0 When setting vm.kmem_size/vm.kmem_size_max to 2GB they are: tmp: 2147483648, vm_kmem_size: 214793648, vm_kmem_sizee_max: 2147483648 (all 2GB). But the start and map-max_offset remain the same on all runs I make. memguard_fudge is still broken for 32-bit architectures with no vm_kmem_max. In the absence of a km_max to limit the value, we essentially use twice the physical memory for the virtual limit. But with 2GB on a 32-bit machine, this requires 4GB of virtual space. Setting vm_kmem_size_max to 2GB should work; I'd expect to see tmp=about 200MB, which is much larger than the input 112MB but the allocation should work. But I don't really know what else PowerPC has need of for virtual space, so that still could be too large. You can try smaller values of vm_kmem_size_max, like 1GB or 512MB. You shouldn't need to set vm_kmem_size at all. At some point the added space for the memguard_map will be small enough that the kmem_suballoc will work. Hmm, what is the min_offset and max_offset of kernel_map when the call to memguard_fudge is made? Thanks, matthew Without setting vm.kmem_size/vm.kmem_size_max, I see the following: map: 0x100, min_offset: 0xD000, max_offset: 0xEFFF It does boot when I set vm.kmem_size=256M/vm.kmem_size_max=512M. When I tried 512M/1024M, it panicked at the same place -- kmem_suballoc from kmeminit. So it looks like I have to set vm.kmem_size/vm.kmem_size_max way back in order for it to boot with memguard(9). I'll see about crafting a patch when I can get access to a machine that works (my current woes with my laptop, VM image, etc. are quite frustrating). The gist is that, in the absence of a kmem_max, the routine for determining the size of the kmem map should use the kernel_map's size as a limiting factor. In this case it looks like the map's size is 512MB. Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: panic with DEBUG_MEMGUARD on PowerPC
On Thu, Jul 12, 2012 at 4:43 PM, Justin Hibbits chmeeed...@gmail.com wrote: When tracking down a panic exposed by INVARIANTS, I tried setting DEBUG_MEMGUARD, so I could find the culprit that's trashing freed memory. However, this causes a panic at bootup. It shows up right after the first WARNING: WITNESS message, with the following: panic: kmem_suballoc: bad status return of 3 cpuid = 0 KDB: stack backtrace: 0xd0004ad0: at kdb_backtrace+0x4c 0xd0004b40: at panic+0x224 0xd0004ba0: at kmem_suballoc+0x8c 0xd0004bd0: at kmeminit+0x1ac 0xd0004c20: at mi_startup+0x13c 0xd0004c50: at btext+0xc0 Tracing, and printf() debugging, I see arguments to vm_map_findspace(): start: 0xD000, length: 4246446080, and map-max_offset = 4026531839. Beyond that, I'm lost with tracking this down. Machine is a dual processor PowerPC G4, with 2GB RAM. The length is 0xFD1BA000 which is almost 4GB. Asking for 4GB of virtual space for 2GB of RAM sounds about right (it's been a while since I was in this code), unless this is a 32-bit kernel, in which case it'd be too much since there isn't that much virtual space available. So, is the kernel 32-bit? What are the values used and returned by memguard_fudge()? The intent of that routine is to get kmeminit() to allocate a larger map so memguard can use part of it for private virtual addresses. But it shouldn't be asking for too much; i.e. the intent was to check both physical and virtual space available and be greedy, but not too greedy. There were some issues with that code for some platforms that e.g. didn't define a VM_KMEM_SIZE_MAX, but alc@ fixed that in r216425. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: panic with DEBUG_MEMGUARD on PowerPC
On Thu, Jul 12, 2012 at 6:33 PM, Justin Hibbits chmeeed...@gmail.com wrote: On Jul 12, 2012, at 9:11 PM, m...@freebsd.org wrote: On Thu, Jul 12, 2012 at 4:43 PM, Justin Hibbits chmeeed...@gmail.com wrote: When tracking down a panic exposed by INVARIANTS, I tried setting DEBUG_MEMGUARD, so I could find the culprit that's trashing freed memory. However, this causes a panic at bootup. It shows up right after the first WARNING: WITNESS message, with the following: Tracing, and printf() debugging, I see arguments to vm_map_findspace(): start: 0xD000, length: 4246446080, and map-max_offset = 4026531839. Beyond that, I'm lost with tracking this down. Machine is a dual processor PowerPC G4, with 2GB RAM. The length is 0xFD1BA000 which is almost 4GB. Asking for 4GB of virtual space for 2GB of RAM sounds about right (it's been a while since I was in this code), unless this is a 32-bit kernel, in which case it'd be too much since there isn't that much virtual space available. So, is the kernel 32-bit? What are the values used and returned by memguard_fudge()? The intent of that routine is to get kmeminit() to allocate a larger map so memguard can use part of it for private virtual addresses. But it shouldn't be asking for too much; i.e. the intent was to check both physical and virtual space available and be greedy, but not too greedy. There were some issues with that code for some platforms that e.g. didn't define a VM_KMEM_SIZE_MAX, but alc@ fixed that in r216425. It is a 32-bit kernel, on 32-bit hardware. The values for memguard_fudge are (defaults): tmp: 4246446080, vm_kmem_size: 117440512, vm_kmem_size_max: 0 When setting vm.kmem_size/vm.kmem_size_max to 2GB they are: tmp: 2147483648, vm_kmem_size: 214793648, vm_kmem_sizee_max: 2147483648 (all 2GB). But the start and map-max_offset remain the same on all runs I make. memguard_fudge is still broken for 32-bit architectures with no vm_kmem_max. In the absence of a km_max to limit the value, we essentially use twice the physical memory for the virtual limit. But with 2GB on a 32-bit machine, this requires 4GB of virtual space. Setting vm_kmem_size_max to 2GB should work; I'd expect to see tmp=about 200MB, which is much larger than the input 112MB but the allocation should work. But I don't really know what else PowerPC has need of for virtual space, so that still could be too large. You can try smaller values of vm_kmem_size_max, like 1GB or 512MB. You shouldn't need to set vm_kmem_size at all. At some point the added space for the memguard_map will be small enough that the kmem_suballoc will work. Hmm, what is the min_offset and max_offset of kernel_map when the call to memguard_fudge is made? Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: sysctl filesystem ?
On Tue, Jun 26, 2012 at 1:59 AM, Robert Watson rwat...@freebsd.org wrote: On Tue, 26 Jun 2012, Chris Rees wrote: as well as we don't depend of /proc for normal operation we shouldn't for say /proc/sysctl improvements are welcome, better documentation is welcome, changes to what is OK - isn't. /proc/sysctl might be useful. Just because Linux uses it doesn't make it a bad idea. One of the problems we've encounted with synthetic file systems is that off-the-shelf file system tools (e.g., cp, dd, cat) make simplistic (but not unreasonable) assumptions about the statistic content of files. This comes up frequently with procfs-like systems where the size of, say, memory map data can be considerably larger than the perhaps 128-byte, 256-byte, or even 8k buffers that might exist in a stock file access tool. Unless we change all of those tools to use buffers much bigger than they currently do, which even suggets changing the C library buffer to defaults for FILE *, that places an onus on the file system to provide persisting snapshots of data until it's sure that a user process is done -- e.g., over many system calls. sysctl is not immune to the requirement of atomicity, but it has explicit control over it: sysctl is a single system call, rather than an unbounded open-read-seek-repeat-etc cycle, and has been carefully crafted to provide this and other MIB-like properties, such as a basic data type model so that command line tools know how to render content rather than having to guess and/or get it wrong. sysctl has some file-system like properties, but on the whole, it's not a file system -- it's much more like an SNMP MIB. While you can map anything into anything (including Turing machines), I think the sysctl command line tool and API, despite its limitations, is a better match for accessing this sort of monitoring and control data than the POSIX file API, and would recommend against trying to move to a sysctl file system. While I understand the problems you allude to, the sysctl(8) binary can protect itself from them. IMO the biggest problem with sysctls not being files is that it makes no sense from the core UNIX philosophy that everything is a file. Sockets and pipes and character devices and even unseekable things like stdout are files; why aren't these other objects that allow read, write, and have their own namespace? Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
2012/1/17 Gleb Smirnoff gleb...@freebsd.org: New panic has been introduced somewhere between r229851 and r229932, that happens on shutdown if kernel has WITNESS and doesn't have WITNESS_SKIPSPIN. Uptime: 1h0m17s Rebooting... panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 cpuid = 0 KDB: enter: panic [ thread pid 1 tid 11 ] Stopped at kdb_enter+0x3b: movq $0,0x514d32(%rip) db db bt Tracing pid 1 tid 11 td 0xfe0001d5e000 kdb_enter() at kdb_enter+0x3b panic() at panic+0x1c7 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f cnputs() at cnputs+0x7a putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 witness_checkorder() at witness_checkorder+0x773 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99 uart_cnputc() at uart_cnputc+0x3e cnputc() at cnputc+0x4c cnputs() at cnputs+0x26 putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 cpu_reset() at cpu_reset+0x81 kern_reboot() at kern_reboot+0x3a5 --More--^M ^Msys_reboot() at sys_reboot+0x42 amd64_syscall() at amd64_syscall+0x39e Xfast_syscall() at Xfast_syscall+0xf7 --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 0x7fffd6d8, rbp = 0x49 --- db db show locks exclusive sleep mutex Giant (Giant) r = 0 (0x809bc560) locked @ /usr/src/head/sys/kern/kern_module.c:101 exclusive spin mutex smp rendezvous (smp rendezvous) r = 0 (0x80a08840) locked @ /usr/src/head/sys/kern/kern_shutdown.c:542 db So the problem is that we are holding smp rendezvous mutex during the cpu_reset(). No mutexes should be obtained after it. However, since cpu_reset() does priting we obtain cnputs_mtx, and later obtain uart_hwmtx. The latter is hardcoded in the subr_witness.c as mutex to obtain before smp rendezvous, this triggers yet another printf from witness, that finally panics due to recursing on cnputs_mtx. At $WORK we explicitly marked cnputs_mtx as NO_WITNESS since it didn't seem possible to fit it into the heirarchy in any sane way, since a print can come from basically anywhere. If anyone has a better fix, that'd be great, but I haven't been able to think of one. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: SU+J systems do not fsck themselves
On Wed, Dec 28, 2011 at 8:54 AM, Maxim Khitrov m...@mxcrypt.com wrote: On Wed, Dec 28, 2011 at 11:42 AM, Matthias Andree matthias.and...@gmx.de wrote: Am 27.12.2011 22:53, schrieb David Thiel: I've had multiple machines now (9.0-RC3, amd64, i386 and earlier 9-CURRENT on ppc) running SU+J that have had unexplained panics and crashes start happening relating to disk I/O. When I end up running a full fsck, it keeps turning out that the disk is dirty and corrupted, but no mechanism is in place with SU+J to detect and fix this. A bgfsck never happens, but a manual fsck in single-user does indeed fix the crashing and weird behavior. Others have tested their SU+J volumes and found them to have errors as well. This makes me super nervous. The one thing I figured is that in the light of power outages, or crashing virtualization hosts, you really really really need to disable disk write caches, and this affects softupdates, journalling, asynch file systems, just about everything. The fact that makes matters worse is that journalling or softupdates allow you to mount a silently-corrupted file system, whereas the traditional UFS/UFS2 sync/asynch mounts will fsck themselves in the foreground, so they get fixed before the FS panics. So can you be sure that: - your driver, chip set and hard disk execute ordered writes in order, If they don't, it's a bug. Not that there isn't buggy firmware out there, but each layer of software does need to rely on the one below actually doing what it's promised. - your driver, chip set and hard disk actually write data to permanent storage BEFORE acknowledging a successful write? Not required by SU as they use an explicit BIO_FLUSH which should be handled by the driver. Whenever I fixed these issues, I had no more corruptions. For ata and sata, there are loader tunables you will want to set, hw.ata.wc=0 and kern.cam.ada.write_cache=0. This should not be necessary if the driver and firmware are not buggy. If your drives are under ada, ad, or ahci related control, try these settings. For SCSI, use camcontrol to turn the write cache off. softupdates is supposed to rectify most of the performance penalties incurred. Note also that you needed to set ahci_load=YES and atapicam_load=YES in 8.X, I've never bothered to check 7.X or 9.X WRT these settings. This is a bit off-topic, but I'm curious what the effect of NCQ is on softupdates? Since that too has the ability to reorder writes to disk, should it be disabled in addition to cache? SU doesn't care about write ordering, as long as everything before a BIO_FLUSH is really flushed by the time the BIO_FLUSH is acknowledged. Cheers, matthew Also, I would say that if you are using a hardware raid controller with a BBU, then allowing the use of controller's cache and write-back policy should be safe for use with softupdates. Any caching mechanism, for that matter, that has a separate power supply source should be ok. For example, the Intel 320 SSDs have a few on-board capacitors that are used to flush the cache in the event of a power loss. - Max ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Sleeping thread (tid 100033, pid 16): panic in FreeBSD 10.0-CURRENT/amd64 r228662
On Tue, Dec 20, 2011 at 5:52 AM, John Baldwin j...@freebsd.org wrote: On Saturday, December 17, 2011 10:41:15 pm m...@freebsd.org wrote: On Sat, Dec 17, 2011 at 5:45 PM, Alexander Kabaev kab...@gmail.com wrote: On Sun, 18 Dec 2011 01:09:00 +0100 O. Hartmann ohart...@zedat.fu-berlin.de wrote: Sleeping thread (tid 100033, pid 16) owns a non sleepable lock panic: sleeping thread cpuid = 0 PID 16 is always USB on my box. You really need to give us a backtrace when you quote panics. It is impossible to make any sense of the above panic message without more context. In the case of this panic, the stack of the thread which panics is useless; it's someone trying to propagate priority that discovered it. A backtrace on tid 100033 would be useful. With WITNESS enabled, it's possible to have this panic display the stack of the incorrectly sleeping thread at the time it acquired the lock, as well, but this code isn't in CURRENT or any release. I have a patch at $WORK I can dig up on Monday. Huh? The stock kernel dumps a stack trace of the offending thread if you have DDB enabled: /* * If the thread is asleep, then we are probably about * to deadlock. To make debugging this easier, just * panic and tell the user which thread misbehaved so * they can hopefully get a stack trace from the truly * misbehaving thread. */ if (TD_IS_SLEEPING(td)) { printf( Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n, td-td_tid, td-td_proc-p_pid); #ifdef DDB db_trace_thread(td, -1); #endif panic(sleeping thread); } Hmm, maybe this wasn't in 7, or maybe I'm just remembering that we added code to print *which* lock it holds (using WITNESS data). I do recall that this panic alone was often not sufficient to debug the problem. Thanks, matthew It may be that we can make use of the STACK API here instead to output this trace even when DDB isn't enabled. The patch below tries to do that (untested). It does some odd thigns though since it is effectively running from a panic context already, so it uses a statically allocated 'struct stack' rather than using stack_create() and uses stack_print_ddb() since it is holding spin locks and can't possibly grab an sx lock: Index: subr_turnstile.c === --- subr_turnstile.c (revision 228534) +++ subr_turnstile.c (working copy) @@ -72,6 +72,7 @@ __FBSDID($FreeBSD$); #include sys/proc.h #include sys/queue.h #include sys/sched.h +#include sys/stack.h #include sys/sysctl.h #include sys/turnstile.h @@ -175,6 +176,7 @@ static void turnstile_fini(void *mem, int size); static void propagate_priority(struct thread *td) { + static struct stack st; struct turnstile *ts; int pri; @@ -217,8 +219,10 @@ propagate_priority(struct thread *td) printf( Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n, td-td_tid, td-td_proc-p_pid); -#ifdef DDB - db_trace_thread(td, -1); +#ifdef STACK + stack_zero(st); + stack_save_td(st, td); + stack_print_ddb(st); #endif panic(sleeping thread); } -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Sleeping thread (tid 100033, pid 16): panic in FreeBSD 10.0-CURRENT/amd64 r228662
On Tue, Dec 20, 2011 at 6:32 AM, John Baldwin j...@freebsd.org wrote: On Tuesday, December 20, 2011 9:22:48 am m...@freebsd.org wrote: On Tue, Dec 20, 2011 at 5:52 AM, John Baldwin j...@freebsd.org wrote: On Saturday, December 17, 2011 10:41:15 pm m...@freebsd.org wrote: On Sat, Dec 17, 2011 at 5:45 PM, Alexander Kabaev kab...@gmail.com wrote: On Sun, 18 Dec 2011 01:09:00 +0100 O. Hartmann ohart...@zedat.fu-berlin.de wrote: Sleeping thread (tid 100033, pid 16) owns a non sleepable lock panic: sleeping thread cpuid = 0 PID 16 is always USB on my box. You really need to give us a backtrace when you quote panics. It is impossible to make any sense of the above panic message without more context. In the case of this panic, the stack of the thread which panics is useless; it's someone trying to propagate priority that discovered it. A backtrace on tid 100033 would be useful. With WITNESS enabled, it's possible to have this panic display the stack of the incorrectly sleeping thread at the time it acquired the lock, as well, but this code isn't in CURRENT or any release. I have a patch at $WORK I can dig up on Monday. Huh? The stock kernel dumps a stack trace of the offending thread if you have DDB enabled: /* * If the thread is asleep, then we are probably about * to deadlock. To make debugging this easier, just * panic and tell the user which thread misbehaved so * they can hopefully get a stack trace from the truly * misbehaving thread. */ if (TD_IS_SLEEPING(td)) { printf( Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n, td-td_tid, td-td_proc-p_pid); #ifdef DDB db_trace_thread(td, -1); #endif panic(sleeping thread); } Hmm, maybe this wasn't in 7, or maybe I'm just remembering that we added code to print *which* lock it holds (using WITNESS data). I do recall that this panic alone was often not sufficient to debug the problem. I think the db_trace_thread() has been around for a while (since 5 or 6), but it is true that we don't tell you which lock is held even with this. That might be a useful thing to output before the panic. This patch isn't quite right since I had to hand-edit it. There's a small chance I can commit this in the near future, but of someone else wants to take it, feel free. Style isn't yet fixed up to be FreeBSD standard either. --- /data/sb/bsd.git/sys/kern/subr_turnstile.c 2011-12-12 10:23:12.542196632 -0800 +++ kern/subr_turnstile.c 2011-12-09 10:59:29.882643558 -0800 @@ -165,10 +165,43 @@ static voidturnstile_dtor(void *mem, int size, void *arg); #endif static int turnstile_init(void *mem, int size, int flags); static voidturnstile_fini(void *mem, int size); +#ifdef INVARIANTS +static void +sleeping_thread_owns_a_nonsleepable_lock(struct thread *td) +{ + printf(Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n, + td-td_tid, td-td_proc-p_pid); +#ifdef DDB + db_trace_thread(td, -1); +#endif +#ifdef WITNESS + struct lock_list_entry *lock_list, *lle; + int i; + + lock_list = td-td_sleeplocks; + if (lock_list == NULL || lock_list-ll_count == 0) { + printf(Thread does not appear to hold any mutexes!\n); + return; + } + + for (lle = lock_list; lle != NULL; lle = lle-ll_next) { + for (i = lle-ll_count - 1; i = 0; i--) { + struct lock_instance *li = lle-ll_children[i]; + + printf(Lock %s acquired at %s:%d\n, + li-li_lock-lo_name, li-li_file, li-li_line); + } + } +#endif /* WITNESS */ +} +#else +#define sleeping_thread_owns_a_nonsleepable_lock(td) do { } while (0) +#endif /* INVARIANTS */ + /* * Walks the chain of turnstiles and their owners to propagate the priority * of the thread being blocked to all the threads holding locks that have to * release their locks before this thread can run again. */ @@ -210,19 +243,31 @@ * If the thread is asleep, then we are probably about * to deadlock. To make debugging this easier, just * panic and tell the user which thread misbehaved so * they can hopefully get a stack trace from the truly * misbehaving thread. */ if (TD_IS_SLEEPING(td)) { - printf( - Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n, - td-td_tid, td-td_proc-p_pid); -#ifdef DDB - db_trace_thread(td, -1); -#endif - panic(sleeping thread); +
Re: extattr_set_*() return type
On Tue, Dec 20, 2011 at 1:49 PM, John Baldwin j...@freebsd.org wrote: Hmm, if these functions are expected to operate like 'write(2)' and are supposed to return the number of bytes written, shouldn't their return value be 'ssize_t' instead of 'int'? It looks like the system calls themselves already do the right thing in setting td_retval[] (they assign a ssize_t to it and td_retval[0] can hold a ssize_t on all of our current platforms). It would seem that the only change would be to the header and probably syscalls.master. I guess this would require a symver bump to fix though. An extended attribute larger than 2GB is a programming abuse, though. Technically int may not be 32 bits but it is on all supported platforms now. Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Sleeping thread (tid 100033, pid 16): panic in FreeBSD 10.0-CURRENT/amd64 r228662
On Sat, Dec 17, 2011 at 5:45 PM, Alexander Kabaev kab...@gmail.com wrote: On Sun, 18 Dec 2011 01:09:00 +0100 O. Hartmann ohart...@zedat.fu-berlin.de wrote: Sleeping thread (tid 100033, pid 16) owns a non sleepable lock panic: sleeping thread cpuid = 0 PID 16 is always USB on my box. You really need to give us a backtrace when you quote panics. It is impossible to make any sense of the above panic message without more context. In the case of this panic, the stack of the thread which panics is useless; it's someone trying to propagate priority that discovered it. A backtrace on tid 100033 would be useful. With WITNESS enabled, it's possible to have this panic display the stack of the incorrectly sleeping thread at the time it acquired the lock, as well, but this code isn't in CURRENT or any release. I have a patch at $WORK I can dig up on Monday. Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: SCHED_ULE should not be the default
On Tue, Dec 13, 2011 at 3:39 PM, Ivan Klymenko fi...@ukr.net wrote: В Wed, 14 Dec 2011 00:04:42 +0100 Jilles Tjoelker jil...@stack.nl пишет: On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote: If the algorithm ULE does not contain problems - it means the problem has Core2Duo, or in a piece of code that uses the ULE scheduler. I already wrote in a mailing list that specifically in my case (Core2Duo) partially helps the following patch: --- sched_ule.c.orig 2011-11-24 18:11:48.0 +0200 +++ sched_ule.c 2011-12-10 22:47:08.0 +0200 @@ -794,7 +794,8 @@ * 1.5 * balance_interval. */ balance_ticks = max(balance_interval / 2, 1); - balance_ticks += random() % balance_interval; +// balance_ticks += random() % balance_interval; + balance_ticks += ((int)random()) % balance_interval; if (smp_started == 0 || rebalance == 0) return; tdq = TDQ_SELF(); This avoids a 64-bit division on 64-bit platforms but seems to have no effect otherwise. Because this function is not called very often, the change seems unlikely to help. Yes, this section does not apply to this problem :) Just I posted the latest patch which i using now... @@ -2118,13 +2119,21 @@ struct td_sched *ts; THREAD_LOCK_ASSERT(td, MA_OWNED); + if (td-td_pri_class PRI_FIFO_BIT) + return; + ts = td-td_sched; + /* + * We used up one time slice. + */ + if (--ts-ts_slice 0) + return; This skips most of the periodic functionality (long term load balancer, saving switch count (?), insert index (?), interactivity score update for long running thread) if the thread is not going to be rescheduled right now. It looks wrong but it is a data point if it helps your workload. Yes, I did it for as long as possible to delay the execution of the code in section: ... #ifdef SMP /* * We run the long term load balancer infrequently on the first cpu. */ if (balance_tdq == tdq) { if (balance_ticks --balance_ticks == 0) sched_balance(); } #endif ... tdq = TDQ_SELF(); #ifdef SMP /* * We run the long term load balancer infrequently on the first cpu. */ - if (balance_tdq == tdq) { - if (balance_ticks --balance_ticks == 0) + if (balance_ticks --balance_ticks == 0) { + if (balance_tdq == tdq) sched_balance(); } #endif The main effect of this appears to be to disable the long term load balancer completely after some time. At some point, a CPU other than the first CPU (which uses balance_tdq) will set balance_ticks = 0, and sched_balance() will never be called again. That is, for the same reason as above in the text... It also introduces a hypothetical race condition because the access to balance_ticks is no longer restricted to one CPU under a spinlock. If the long term load balancer may be causing trouble, try setting kern.sched.balance_interval to a higher value with unpatched code. I checked it in the first place - but it did not help fix the situation... The impression of malfunction rebalancing... It seems that the thread is passed on to the same core that is loaded and so... Perhaps this is a consequence of an incorrect definition of the topology CPU? @@ -2144,9 +2153,6 @@ if (TAILQ_EMPTY(tdq-tdq_timeshare.rq_queues[tdq-tdq_ridx])) tdq-tdq_ridx = tdq-tdq_idx; } - ts = td-td_sched; - if (td-td_pri_class PRI_FIFO_BIT) - return; if (PRI_BASE(td-td_pri_class) == PRI_TIMESHARE) { /* * We used a tick; charge it to the thread so @@ -2157,11 +2163,6 @@ sched_priority(td); } /* - * We used up one time slice. - */ - if (--ts-ts_slice 0) - return; - /* * We're out of time, force a requeue at userret(). */ ts-ts_slice = sched_slice; and refusal to use options FULL_PREEMPTION But no one has unsubscribed to my letter, my patch helps or not in the case of Core2Duo... There is a suspicion that the problems stem from the sections of code associated with the SMP... Maybe I'm in something wrong, but I want to help in solving this problem ... Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ? I don't remember what our specific problem at $WORK was, perhaps it was just interrupt threads not getting serviced fast enough, but we've hard-coded this to 1 and removed the code that sets it in sched_initticks(). The same effect should be had by setting the sysctl after a box is up. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: SCHED_ULE should not be the default
On Mon, Dec 12, 2011 at 7:32 AM, Gary Jennejohn gljennj...@googlemail.com wrote: On Mon, 12 Dec 2011 15:13:00 + Vincent Hoffman vi...@unsane.co.uk wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 12/12/2011 13:47, O. Hartmann wrote: Not fully right, boinc defaults to run on idprio 31 so this isn't an issue. And yes, there are cases where SCHED_ULE shows much better performance then SCHED_4BSD. [...] Do we have any proof at hand for such cases where SCHED_ULE performs much better than SCHED_4BSD? Whenever the subject comes up, it is mentioned, that SCHED_ULE has better performance on boxes with a ncpu 2. But in the end I see here contradictionary statements. People complain about poor performance (especially in scientific environments), and other give contra not being the case. It all a little old now but some if the stuff in http://people.freebsd.org/~kris/scaling/ covers improvements that were seen. http://jeffr-tech.livejournal.com/5705.html shows a little too, reading though Jeffs blog is worth it as it has some interesting stuff on SHED_ULE. I thought there were some more benchmarks floating round but cant find any with a quick google. Vince Within our department, we developed a highly scalable code for planetary science purposes on imagery. It utilizes present GPUs via OpenCL if present. Otherwise it grabs as many cores as it can. By the end of this year I'll get a new desktop box based on Intels new Sandy Bridge-E architecture with plenty of memory. If the colleague who developed the code is willing performing some benchmarks on the same hardware platform, we'll benchmark bot FreeBSD 9.0/10.0 and the most recent Suse. For FreeBSD I intent also to look for performance with both different schedulers available. These observations are not scientific, but I have a CPU from AMD with 6 cores (AMD Phenom(tm) II X6 1090T Processor). My simple test was ``make buildkernel'' while watching the core usage with gkrellm. With SCHED_4BSD all 6 cores are loaded to 97% during the build phase. I've never seen any value above 97% with gkrellm. With SCHED_ULE I never saw all 6 cores loaded this heavily. Usually 2 or more cores were at or below 90%. Not really that significant, but still a noticeable difference in apparent scheduling behavior. Whether the observed difference is due to some change in data from the kernel to gkrellm is beyond me. SCHED_ULE is much sloppier about calculating which thread used a timeslice -- unless the timeslice went 100% to a thread, the fraction it used may get attributed elsewhere. So top's reporting of thread usage is not a useful metric. Total buildworld time is, potentially. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Stop scheduler on panic
On Fri, Dec 2, 2011 at 2:05 AM, Andriy Gapon a...@freebsd.org wrote: on 02/12/2011 06:36 John Baldwin said the following: Ah, ok (I had thought SCHEDULER_STOPPED was going to always be true when kdb was active). But I think these two changes should cover critical_exit() ok. I attempted to start a discussion about this a few times already :-) Should we treat kdb context the same as SCHEDULER_STOPPED context (in the current definition) ? That is, skip all locks in the same fashion? There are pros and contras. Does kdb pause all CPUs with an interrupt (NMI or regular interrupt, I can no longer remember...) when it enters? If so, then I'd say whether it enters via sysctl or panic doesn't matter. It's in a special environment where nothing else is running, which is what is needed for proper exploration of the machine (via breakpoint, for debugging a hang, etc). Maybe the question is, why wouldn't SCHEDULER_STOPPED be true regardless of how kdb is entered? Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Stop scheduler on panic
On Thu, Nov 17, 2011 at 12:54 PM, Attilio Rao atti...@freebsd.org wrote: 2011/11/17 Andriy Gapon a...@freebsd.org: BTW, it is my opinion that we really should not let the debugger code call mi_switch for any reason. Yes, I agree with this, this is why the sched_bind() in boot() is broken (immagine calling things like doadump from KDB. KDB right now can be thought as a first cut of this patch because it does disable the CPUs when entering the context, thus, the bug here is that if you stop all CPUs including CPU0 and later on you want bind on it you are death). Another patch related to this area we have at $WORK: #if defined(SMP) - /* -* Bind us to CPU 0 so that all shutdown code runs there. Some -* systems don't shutdown properly (i.e., ACPI power off) if we -* run on another processor. -*/ - thread_lock(curthread); - sched_bind(curthread, 0); - thread_unlock(curthread); - KASSERT(PCPU_GET(cpuid) == 0, (%s: not running on cpu 0, __func__)); + /* +* sched_bind can't be done reliably inside of panic. cpu_reset() will +* rebind us in any case, more reliably. +*/ + if (panicstr == NULL) { + /* +* Bind us to CPU 0 so that all shutdown code runs there. Some +* systems don't shutdown properly (i.e., ACPI power off) if we +* run on another processor. +*/ + thread_lock(curthread); + sched_bind(curthread, 0); + thread_unlock(curthread); + KASSERT(PCPU_GET(cpuid) == 0, (boot: not running on cpu 0)); + } #endif /* We're in the process of rebooting. */ rebooting = 1; Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Stop scheduler on panic
On Thu, Nov 17, 2011 at 1:08 PM, Andriy Gapon a...@freebsd.org wrote: on 17/11/2011 23:02 m...@freebsd.org said the following: Another patch related to this area we have at $WORK: #if defined(SMP) - /* - * Bind us to CPU 0 so that all shutdown code runs there. Some - * systems don't shutdown properly (i.e., ACPI power off) if we - * run on another processor. - */ - thread_lock(curthread); - sched_bind(curthread, 0); - thread_unlock(curthread); - KASSERT(PCPU_GET(cpuid) == 0, (%s: not running on cpu 0, __func__)); + /* + * sched_bind can't be done reliably inside of panic. cpu_reset() will + * rebind us in any case, more reliably. + */ + if (panicstr == NULL) { + /* + * Bind us to CPU 0 so that all shutdown code runs there. Some + * systems don't shutdown properly (i.e., ACPI power off) if we + * run on another processor. + */ + thread_lock(curthread); + sched_bind(curthread, 0); + thread_unlock(curthread); + KASSERT(PCPU_GET(cpuid) == 0, (boot: not running on cpu 0)); + } #endif /* We're in the process of rebooting. */ rebooting = 1; Yes, you have contributed this patch before and it is a part of the bigger stop-scheduler-on-panic patches. Have you had a chance to review those? :) It's been so long I don't remember what I've sent where. I did look over the patch but had no additional comments. Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: vm_page_t related KBI [Was: Re: panic at vm_page_wire with FreeBSD 9.0 Beta 3]
On Tue, Nov 15, 2011 at 10:15 AM, Attilio Rao atti...@freebsd.org wrote: 2011/11/7 Kostik Belousov kostik...@gmail.com: On Mon, Nov 07, 2011 at 11:45:38AM -0600, Alan Cox wrote: Ok. I'll offer one final suggestion. Please consider an alternative suffix to func. Perhaps, kbi or KBI. In other words, something that hints at the function's reason for existing. Sure. Below is the extraction of only vm_page_lock() bits, together with the suggested rename. When Attilio provides the promised simplification of the mutex KPI, this can be reduced. My tentative patch is here: http://www.freebsd.org/~attilio/mutexfileline.patch I need to make more compile testing later, but it already compiles GENERIC + modules fine on HEAD. The patch provides a common entrypoint, option independent, for both fast case and debug/compat case. Additively, it almost entirely fixes the standard violation of the reserved namespace, as you described (the notable exception being the macro used in the fast path, that I want to fix as well, but in a separate commit). Now the file/line couplet can be passed to the _ suffix variant of the flag functions. eadler@ reviewed the mutex.h comment. Please let me know what you think about it, as long as we agree on the patch I'll commit it. Out of curiosity, why are function names explicitly spelled out in panic and log messages, instead of using %s and __func__? I've seen this all around FreeBSD, and if there's no reason otherwise, I'd just as soon change to a version that doesn't need updating when the function names change. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: vm_page_t related KBI [Was: Re: panic at vm_page_wire with FreeBSD 9.0 Beta 3]
On Mon, Nov 7, 2011 at 11:35 AM, Kostik Belousov kostik...@gmail.com wrote: On Mon, Nov 07, 2011 at 11:45:38AM -0600, Alan Cox wrote: Ok. I'll offer one final suggestion. Please consider an alternative suffix to func. Perhaps, kbi or KBI. In other words, something that hints at the function's reason for existing. Sure. Below is the extraction of only vm_page_lock() bits, together with the suggested rename. When Attilio provides the promised simplification of the mutex KPI, this can be reduced. diff --git a/sys/vm/vm_page.c b/sys/vm/vm_page.c index 389aea5..ea4ea34 100644 --- a/sys/vm/vm_page.c +++ b/sys/vm/vm_page.c @@ -2677,6 +2677,44 @@ vm_page_test_dirty(vm_page_t m) vm_page_dirty(m); } +void +vm_page_lock_KBI(vm_page_t m, const char *file, int line) +{ + +#if LOCK_DEBUG 0 || defined(MUTEX_NOINLINE) + _mtx_lock_flags(vm_page_lockptr(m), 0, file, line); +#else + __mtx_lock(vm_page_lockptr(m), 0, file, line); +#endif +} + +void +vm_page_unlock_KBI(vm_page_t m, const char *file, int line) +{ + +#if LOCK_DEBUG 0 || defined(MUTEX_NOINLINE) + _mtx_unlock_flags(vm_page_lockptr(m), 0, file, line); +#else + __mtx_unlock(vm_page_lockptr(m), curthread, 0, file, line); +#endif +} + +int +vm_page_trylock_KBI(vm_page_t m, const char *file, int line) +{ + + return (_mtx_trylock(vm_page_lockptr(m), 0, file, line)); +} + +void +vm_page_lock_assert_KBI(vm_page_t m, int a, const char *file, int line) +{ + +#ifdef INVARIANTS + _mtx_assert(vm_page_lockptr(m), a, file, line); +#endif +} + int so_zerocp_fullpage = 0; /* diff --git a/sys/vm/vm_page.h b/sys/vm/vm_page.h index 7099b70..a5604b7 100644 --- a/sys/vm/vm_page.h +++ b/sys/vm/vm_page.h @@ -218,11 +218,23 @@ extern struct vpglocks pa_lock[]; #define PA_LOCK_ASSERT(pa, a) mtx_assert(PA_LOCKPTR(pa), (a)) +#ifdef KLD_MODULE +#define vm_page_lock(m) vm_page_lock_KBI((m), LOCK_FILE, LOCK_LINE) +#define vm_page_unlock(m) vm_page_unlock_KBI((m), LOCK_FILE, LOCK_LINE) +#define vm_page_trylock(m) vm_page_trylock_KBI((m), LOCK_FILE, LOCK_LINE) +#ifdef INVARIANTS +#define vm_page_lock_assert(m, a) \ + vm_page_lock_assert_KBI((m), (a), LOCK_FILE, LOCK_LINE) +#else +#define vm_page_lock_assert(m, a) +#endif +#else /* KLD_MODULE */ #define vm_page_lockptr(m) (PA_LOCKPTR(VM_PAGE_TO_PHYS((m Is it not possible to have vm_page_lockptr() be a function for the KLD_MODULE case? Because then the vm_page_lock() functions and friends would all just use mtx_lock, etc., in both the KLD and non-KLD case. Or am I missing something? Thanks, matthew #define vm_page_lock(m) mtx_lock(vm_page_lockptr((m))) #define vm_page_unlock(m) mtx_unlock(vm_page_lockptr((m))) #define vm_page_trylock(m) mtx_trylock(vm_page_lockptr((m))) #define vm_page_lock_assert(m, a) mtx_assert(vm_page_lockptr((m)), (a)) +#endif #define vm_page_queue_free_mtx vm_page_queue_free_lock.data /* @@ -403,6 +415,11 @@ void vm_page_cowfault (vm_page_t); int vm_page_cowsetup(vm_page_t); void vm_page_cowclear (vm_page_t); +void vm_page_lock_KBI(vm_page_t m, const char *file, int line); +void vm_page_unlock_KBI(vm_page_t m, const char *file, int line); +int vm_page_trylock_KBI(vm_page_t m, const char *file, int line); +void vm_page_lock_assert_KBI(vm_page_t m, int a, const char *file, int line); + #ifdef INVARIANTS void vm_page_object_lock_assert(vm_page_t m); #define VM_PAGE_OBJECT_LOCK_ASSERT(m) vm_page_object_lock_assert(m) ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: vm_page_t related KBI [Was: Re: panic at vm_page_wire with FreeBSD 9.0 Beta 3]
On Sun, Nov 6, 2011 at 4:43 AM, Kostik Belousov kostik...@gmail.com wrote: Regarding the _vm_page_lock() vs. vm_page_lock_func(), the mutex.h has a lot of violations in regard of the namespaces, IMO. The __* namespace is reserved for the language implementation, so our freestanding program (kernel) ignores the requirements of the C standard with the names like __mtx_lock_spin(). Using the name _vm_page_lock() is valid, but makes it not unreasonable for other developers to introduce reserved names. So I decided to use the suffixes. vm_map.h locking is free of these violations. I'm pretty sure that when the C standard says, the implementation, they're referring to the compiler and OS it runs on. Which makes the FreeBSD kernel part of the implementation, which is precisely why so many headers have defines that start with __ and then, if certain posix defines are set, also uses non-__ versions of the name. Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: vm_page_t related KBI [Was: Re: panic at vm_page_wire with FreeBSD 9.0 Beta 3]
On Sat, Nov 5, 2011 at 7:13 AM, Kostik Belousov kostik...@gmail.com wrote: On Fri, Nov 04, 2011 at 06:03:39PM +0200, Kostik Belousov wrote: Below is the KBI patch after vm_page_bits_t merge is done. Again, I did not spent time converting all in-tree consumers from the (potentially) loadable modules to the new KPI until it is agreed upon. I like my bikeshed orange... I would think a more canonical name would be get/set rather than read/write, especially since these operations aren't reading and writing the page (neither are they getting the page, but at least set is pretty unambiguous). Thanks, matthew diff --git a/sys/nfsclient/nfs_bio.c b/sys/nfsclient/nfs_bio.c index 305c189..7264cd1 100644 --- a/sys/nfsclient/nfs_bio.c +++ b/sys/nfsclient/nfs_bio.c @@ -128,7 +128,7 @@ nfs_getpages(struct vop_getpages_args *ap) * can only occur at the file EOF. */ VM_OBJECT_LOCK(object); - if (pages[ap-a_reqpage]-valid != 0) { + if (vm_page_read_valid(pages[ap-a_reqpage]) != 0) { for (i = 0; i npages; ++i) { if (i != ap-a_reqpage) { vm_page_lock(pages[i]); @@ -198,16 +198,16 @@ nfs_getpages(struct vop_getpages_args *ap) /* * Read operation filled an entire page */ - m-valid = VM_PAGE_BITS_ALL; - KASSERT(m-dirty == 0, + vm_page_write_valid(m, VM_PAGE_BITS_ALL); + KASSERT(vm_page_read_dirty(m) == 0, (nfs_getpages: page %p is dirty, m)); } else if (size toff) { /* * Read operation filled a partial page. */ - m-valid = 0; + vm_page_write_valid(m, 0); vm_page_set_valid(m, 0, size - toff); - KASSERT(m-dirty == 0, + KASSERT(vm_page_read_dirty(m) == 0, (nfs_getpages: page %p is dirty, m)); } else { /* diff --git a/sys/vm/vm_page.c b/sys/vm/vm_page.c index 389aea5..2f41e70 100644 --- a/sys/vm/vm_page.c +++ b/sys/vm/vm_page.c @@ -2677,6 +2677,66 @@ vm_page_test_dirty(vm_page_t m) vm_page_dirty(m); } +void +vm_page_lock_func(vm_page_t m, const char *file, int line) +{ + +#if LOCK_DEBUG 0 || defined(MUTEX_NOINLINE) + _mtx_lock_flags(vm_page_lockptr(m), 0, file, line); +#else + __mtx_lock(vm_page_lockptr(m), 0, file, line); +#endif +} + +void +vm_page_unlock_func(vm_page_t m, const char *file, int line) +{ + +#if LOCK_DEBUG 0 || defined(MUTEX_NOINLINE) + _mtx_unlock_flags(vm_page_lockptr(m), 0, file, line); +#else + __mtx_unlock(vm_page_lockptr(m), curthread, 0, file, line); +#endif +} + +int +vm_page_trylock_func(vm_page_t m, const char *file, int line) +{ + + return (_mtx_trylock(vm_page_lockptr(m), 0, file, line)); +} + +void +vm_page_lock_assert_func(vm_page_t m, int a, const char *file, int line) +{ + +#ifdef INVARIANTS + _mtx_assert(vm_page_lockptr(m), a, file, line); +#endif +} + +vm_page_bits_t +vm_page_read_dirty_func(vm_page_t m) +{ + + return (m-dirty); +} + +vm_page_bits_t +vm_page_read_valid_func(vm_page_t m) +{ + + return (m-valid); +} + +void +vm_page_write_valid_func(vm_page_t m, vm_page_bits_t v) +{ + + m-valid = v; +} + + int so_zerocp_fullpage = 0; /* diff --git a/sys/vm/vm_page.h b/sys/vm/vm_page.h index 7099b70..4f8f71e 100644 --- a/sys/vm/vm_page.h +++ b/sys/vm/vm_page.h @@ -218,12 +218,50 @@ extern struct vpglocks pa_lock[]; #define PA_LOCK_ASSERT(pa, a) mtx_assert(PA_LOCKPTR(pa), (a)) +#ifdef KLD_MODULE +#define vm_page_lock(m) vm_page_lock_func((m), LOCK_FILE, LOCK_LINE) +#define vm_page_unlock(m) vm_page_unlock_func((m), LOCK_FILE, LOCK_LINE) +#define vm_page_trylock(m) vm_page_trylock_func((m), LOCK_FILE, LOCK_LINE) +#ifdef INVARIANTS +#define vm_page_lock_assert(m, a) \ + vm_page_lock_assert_func((m), (a), LOCK_FILE, LOCK_LINE) +#else +#define vm_page_lock_assert(m, a) +#endif + +#define vm_page_read_dirty(m) vm_page_read_dirty_func((m)) +#define vm_page_read_valid(m) vm_page_read_valid_func((m)) +#define vm_page_write_valid(m, v) vm_page_write_valid_func((m), (v)) + +#else /* KLD_MODULE */ #define vm_page_lockptr(m) (PA_LOCKPTR(VM_PAGE_TO_PHYS((m #define vm_page_lock(m) mtx_lock(vm_page_lockptr((m))) #define vm_page_unlock(m) mtx_unlock(vm_page_lockptr((m))) #define vm_page_trylock(m) mtx_trylock(vm_page_lockptr((m))) #define vm_page_lock_assert(m, a)
Re: smp_rendezvous runs with interrupts and preemption enabled on unicore systems
On Fri, Oct 28, 2011 at 8:37 AM, Ryan Stone ryst...@gmail.com wrote: I'm seeing issues on a unicore systems running a derivative of FreeBSD 8.2-RELEASE if something calls mem_range_attr_set. It turns out that the root cause is a bug in smp_rendezvous_cpus. The first part of smp_rendezvous_cpus attempts to short-circuit the non-SMP case(note that smp_started is never set to 1 on a unicore system): if (!smp_started) { if (setup_func != NULL) setup_func(arg); if (action_func != NULL) action_func(arg); if (teardown_func != NULL) teardown_func(arg); return; } The problem is that this runs with interrupts enabled, outside of a critical section. My system runs with device_polling enabled with hz set to 2500, so its quite easy to wedge the system by having a thread run mem_range_attr_set. That has to do a smp_rendezvous, and if a timer interrupt happens to go off half-way through the action_func and preempt this thread, the system ends up deadlocked(although once it's wedged, typing at the serial console stands a good chance of unwedging the system. Go figure). I know that smp_rendezvous was reworked substantially on HEAD, but by inspection it looks like the bug is still present, as the short-circuit behaviour is still there. I am not entirely sure of the best way to fix this. Is it as simple as doing a spinlock_enter before setup_func and a spinlock_exit after teardown_func? It seems to boot fine, but I'm not at all confident that I understand the nuances of smp_rendezvous to be sure that there aren't side effects that I don't know about. Looks like Max didn't have time to upstream this fix: struct mtx smp_ipi_mtx; +MTX_SYSINIT(smp_ipi_mtx, smp_ipi_mtx, smp rendezvous, MTX_SPIN); ... static void mp_start(void *dummy) { - mtx_init(smp_ipi_mtx, smp rendezvous, NULL, MTX_SPIN); ... if (!smp_started) { + mtx_lock_spin(smp_ipi_mtx); if (setup_func != NULL) setup_func(arg); if (action_func != NULL) action_func(arg); if (teardown_func != NULL) teardown_func(arg); + mtx_unlock_spin(smp_ipi_mtx); return; } Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Deterministic panic due to non-sleepable lock with if_alc when reconfiguring interfaces
On Thu, Aug 18, 2011 at 5:50 PM, Garrett Cooper yaneg...@gmail.com wrote: When loading if_alc as a module on my netbook and running /etc/rc.d/netif restart, I can deterministically panic my netbook with the following message: ) at _bus_dmamap_sync+0x51 alc_stop(c3dbb000,0,c0c51844,93a,80206910,...) at alc_stop+0x24e alc_ioctl(c3d07400,80206910,c40423c0,c06a7935,c0914e3c,...) at alc_ioctl+0x22e ifioctl(c45029c0,80206910,c40423c0,c40505c0,c4528c00,...) at ifioctl+0xc98 soo_ioctl(c4574e00,80206910,c40423c0,c413e680,c40505c0,...) at soo_ioctl+0x401 kern_ioctl(c40505c0,3,80206910,c40423c0,c40423c0,...) at kern_ioctl+0x1d7 ioctl(c40505c0,e6ca3cec,e6ca3d28,c08e929d,0,...) at ioctl+0x118 syscallenter(c40505c0,e6ca3ce4,e6ca3ce4,0,0,...) at syscallenter+0x23f syscall(e6ca3d28) at syscall+0x2e Xint0x80_syscall() at Xint0x80_syscall+0x21 --- syscall (54kernel trap 12 with interrupts disabled Kernel page fault with the following non-sleepable locks held: exclusive sleep mutex alc0 (network driver) r = 0 (0xc3dbc608) locked @ /usr/src/sys/modules/alc/../../dev/alc/if_alc.c:2362 KDB: stack backtrace: db_trace_self_wrapper(c08e727a,80,6e726500,74206c65,20706172,...) at db_trace_self_wrapper+0x26 kdb_backtrace(93a,0,,c0ad6114,e6ca323c,...) at kdb_backtrace+0x2a _witness_debugger(c08e9f67,e6ca3250,4,1,0,...) at _witness_debugger+0x1e witness_warn(5,0,c0924fe1,c097df50,c3e42b00,...) at witness_warn+0x1f1 trap(e6ca32dc) at trap+0x15a calltrap() at calltrap+0x6 I tried to track down what the exact issue was, but I got lost (the locking sort of looks ok to me, but I'm still not an expert with mutex(9)). I still have the vmcore and can provide more helpful details when requested. The locking itself is almost certainly fine. The error message is not very helpful, but what went wrong was the page fault. You just happen to panic on a witness warning before vm_fault can panic due to a bad address. The alc(4) maintainer would probably like info on the trap (line of code and where the bad pointer came from). Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: variable init
On Mon, Aug 1, 2011 at 12:09 AM, Andrew Thompson a...@fud.org.nz wrote: Hi, Looking at, http://www.freebsd.org/cgi/query-pr.cgi?pr=159345 The lock is a global variable, declared as static struct mtx lagg_list_mtx; I would expect this to be zeroed memory, is this guaranteed? It depends. :-) The C standard mandates that this storage be zero. However, seeing as we're the operating system, it's our job to do it. For an application program the zero'd storage is zeroed by the loader. For the operating system, exactly when it's initialized is an architectural decision. I believe that the zeroed storage of FreeBSD is cleared by the boot loader before it transfers control to the operating system. (For reference, on AIX the storage is cleared by the virtual memory management code during bringup of the VMM leading to interesting bugs when some variables used in early boot weren't initialized). Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: sgl (or uio) in struct buf / bio
On Mon, Jul 18, 2011 at 3:15 PM, Joel Jacobson jjacob...@panasas.com wrote: a few years ago (2008), i asked about the prospect of getting an sgl passed through struct buf / bio, and was referred to the jhb_bio branch. i never did figure out how to view it, though, and it fell off my radar for a while. did anything ever come of it, and if so/not, are there any plans for switching to something like using a uio instead of a pointer/length pair? To the best of my knowledge, this is still something many of us would like to see done, and no one has had any time to work on. Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Heavy I/O blocks FreeBSD box for several seconds
On Mon, Jul 11, 2011 at 4:00 PM, Ali Mashtizadeh mashtiza...@gmail.com wrote: Maybe someone can setup something like reviewboard [1] for developers to use. This may also help folks who want to keep abreast of the current work in a particular subsystem or get involved into the development process more. At my company we use reviews and it seems to help the catch some bugs and help new engineers ramp up faster. [1] http://www.reviewboard.org/ FreeBSD development is completely open; anyone can sign up for the svn-src-* mailing list they are interested in, including svn-src-head@. Code reviews are plenty as well; just check the list archives for discussion of bugs, poor design choices and unintended effects. But most reviews are silent and after-the-fact by looking at the list mail. It's a system that seems to be working just fine for the FreeBSD project so far. This isn't a job for most anyone; it's a volunteer project and so anything that raises the barrier to getting work done for the project should be looked at with skepticism. Is there a specific deficit that you want to address? Thanks, matthew On Mon, Jul 11, 2011 at 2:48 PM, Arnaud Lacombe lacom...@gmail.com wrote: Hi, On Mon, Jul 11, 2011 at 5:14 PM, Andriy Gapon a...@freebsd.org wrote: on 11/07/2011 23:33 Arnaud Lacombe said the following: For the record, I would like to see enforced public review for _every_ patch *before* it is checked in, as a strong rule. gcc system is particularly interesting. But it is not likely to happen in FreeBSD where FreeBSD committers are clearly more free than other at checking-in un-publicly-reviewed stuff (especially _bad_ stuff). This would of course apply even to long-time committers, no matter how it hurt their ego (which I definitively do not care about). Have you just volunteered to review all of the patches that I would like to commit? And are you prepared to take responsibility for quality of your reviews? I am sure that other developers will gladly accept your offer too. _No-one_ can do all the reviews, especially not me (on a purely technical level). ACK must come from subsystem maintainers. Having public review would allow the community review, which is now just not possible today. As about patches from the maintainer, they might be committed without his approval, but still sent for review. If a maintainer goes outside his area, he has to get approval from the other subsystem maintainer. - Arnaud ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: kern.sync_on_panic
On Mon, Jun 27, 2011 at 3:08 AM, Andriy Gapon a...@freebsd.org wrote: on 26/06/2011 08:51 Warner Losh said the following: On Jun 25, 2011, at 8:49 AM, Andriy Gapon wrote: Does anybody actually use kern.sync_on_panic tunable/sysctl? If yes, then in what circumstances do you need it? That is, why any other alternative doesn't work for you? Like: 1. remounting filesystems R/O before panic if you knowingly provoke it for testing 2. using netboot for your test system 3. using su+j, gjournal or a different filesystem altogether 4. using fsck after reboot It seems to me that syncing filesystems in panic context is an adventure. And it may become even more of an adventure if we introduce code that completely stops scheduler in and after panic. I've used it in the past when I was developing a device driver that was in the late stages of maturing. Since all the panics in the system were when the driver dereferenced NULL in that driver, sync was safe because all the data structures were sane except the aforementioned driver. (1) It was a production system, and everything that could be was already mounted r/w. However, some small, but every critical, amount of data was still r/w and it was very important to not lose this data. Production here likely should be in quotes, because it was in the late stages of testing/validation. The problem was without this sometimes the saved state of the GPS receiver and other hardware would wind up being zero, which meant that we'd have to do a cold start which cost us a few hours of time. At the time I was doing this, we saw zero files a couple times a day without this turned on. (2) netbooting wasn't an option since we were qualifying a non-netbooting system. (3) these weren't available at the time, but the goal was to prevent data loss, not to necessarily have to avoid fsck on boot. (4) Data loss without it. Now, I'll be the first to admit this has been a few years, and I haven't done a fresh evaluation to see if things are still safe. I'll also be the first to admit that this was a useful debugging setting late in development, and not in production. I'm also the first to admit this isn't what I'd call a very wide-spread case. But it did come in very handy when chasing a few bugs to be able to do 10 panic/reboot cycles an hour rather than 2 a day. A fine enough use-case for me. I guess the problem ultimately boiled down to peculiarities of UFS behavior, but still... However, please be aware that sync_on_panic might get broken when/if we start stopping scheduler in panic. The entirety of the sync code should be a subroutine in vfs_bio.c so the 'buf' variable is static to the file. At that point it would be reasonable to explicitly call it at the beginning of panic(9) for the sync-on-panic case, either before IPIing the other CPUs, or at least before entering the critical section that prevents the scheduler from running. Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: [head tinderbox] failure on ia64/ia64
On Mon, Jun 20, 2011 at 11:40 AM, Garrett Cooper yaneg...@gmail.com wrote: On Mon, Jun 20, 2011 at 11:37 AM, FreeBSD Tinderbox tinder...@freebsd.org wrote: TB --- 2011-06-20 17:09:28 - tinderbox 2.7 running on freebsd-current.sentex.ca TB --- 2011-06-20 17:09:28 - starting HEAD tinderbox run for ia64/ia64 TB --- 2011-06-20 17:09:28 - cleaning the object tree TB --- 2011-06-20 17:09:40 - cvsupping the source tree TB --- 2011-06-20 17:09:40 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/HEAD/ia64/ia64/supfile TB --- 2011-06-20 17:10:27 - building world TB --- 2011-06-20 17:10:27 - MAKEOBJDIRPREFIX=/obj TB --- 2011-06-20 17:10:27 - PATH=/usr/bin:/usr/sbin:/bin:/sbin TB --- 2011-06-20 17:10:27 - TARGET=ia64 TB --- 2011-06-20 17:10:27 - TARGET_ARCH=ia64 TB --- 2011-06-20 17:10:27 - TZ=UTC TB --- 2011-06-20 17:10:27 - __MAKE_CONF=/dev/null TB --- 2011-06-20 17:10:27 - cd /src TB --- 2011-06-20 17:10:27 - /usr/bin/make -B buildworld World build started on Mon Jun 20 17:10:28 UTC 2011 Rebuilding the temporary build tree stage 1.1: legacy release compatibility shims stage 1.2: bootstrap tools stage 2.1: cleaning up the object tree stage 2.2: rebuilding the object tree stage 2.3: build tools stage 3: cross tools stage 4.1: building includes stage 4.2: building libraries stage 4.3: make dependencies stage 4.4: building everything World build completed on Mon Jun 20 18:35:55 UTC 2011 TB --- 2011-06-20 18:35:56 - generating LINT kernel config TB --- 2011-06-20 18:35:56 - cd /src/sys/ia64/conf TB --- 2011-06-20 18:35:56 - /usr/bin/make -B LINT TB --- 2011-06-20 18:35:56 - building LINT kernel TB --- 2011-06-20 18:35:56 - MAKEOBJDIRPREFIX=/obj TB --- 2011-06-20 18:35:56 - PATH=/usr/bin:/usr/sbin:/bin:/sbin TB --- 2011-06-20 18:35:56 - TARGET=ia64 TB --- 2011-06-20 18:35:56 - TARGET_ARCH=ia64 TB --- 2011-06-20 18:35:56 - TZ=UTC TB --- 2011-06-20 18:35:56 - __MAKE_CONF=/dev/null TB --- 2011-06-20 18:35:56 - cd /src TB --- 2011-06-20 18:35:56 - /usr/bin/make -B buildkernel KERNCONF=LINT Kernel build for LINT started on Mon Jun 20 18:35:56 UTC 2011 stage 1: configuring the kernel stage 2.1: cleaning up the object tree stage 2.2: rebuilding the object tree stage 2.3: build tools stage 3.1: making dependencies [...] awk -f /src/sys/tools/makeobjops.awk /src/sys/kgssapi/kgss_if.m -h awk -f /src/sys/tools/makeobjops.awk /src/sys/libkern/iconv_converter_if.m -h awk -f /src/sys/tools/makeobjops.awk /src/sys/opencrypto/cryptodev_if.m -h awk -f /src/sys/tools/makeobjops.awk /src/sys/dev/acpica/acpi_if.m -h rm -f .newdep /usr/bin/make -V CFILES -V SYSTEM_CFILES -V GEN_CFILES | MKDEP_CPP=cc -E CC=cc xargs mkdep -a -f .newdep -O2 -pipe -fno-strict-aliasing -std=c99 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs -fdiagnostics-show-option -nostdinc -I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf -I/src/sys/dev/ath -I/src/sys/dev/ath/ath_hal -I/src/sys/contrib/ngatm -I/src/sys/dev/twa -I/src/sys/gnu/fs/xfs/FreeBSD -I/src/sys/gnu/fs/xfs/FreeBSD/support -I/src/sys/gnu/fs/xfs -I/src/sys/dev/cxgb -I/src/sys/dev/cxgbe -I/src/sys/contrib/ia64/libuwx/src -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -finline-limit=15000 --param inline-unit-growth=100 --param large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 -mfixed-range=f32-f127 -fpic -ffreestanding /src/sys/vm/vm_page.c:2356:2: error: #error PAGE_SIZE is not supported. mkdep: compile failed *** Error code 1 Stop in /obj/ia64.ia64/src/sys/LINT. *** Error code 1 Stop in /src. *** Error code 1 r223307 broke tinderbox on ia64. What should the PAGE_SIZE be for that architecture? The LINT configuration file uses 115, or 32768. It should just be a matter of adding a case and using atomic_clear_long. Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: best way to do FreeBSD kernel/userland development on OSX ?
On Mon, Jun 20, 2011 at 1:53 PM, Luigi Rizzo ri...@iet.unipi.it wrote: i recently replaced my laptop with a mac, which opens the problem on how do i do FreeBSD development (including kernel work) on it. the option i am trying now is running a freebsd VM in virtualbox, but it is slightly slow and probably energy hungry. I'd like to keep the svn checkut outside the disk image but unfortunately the default filesystem on osx is case-insensitive so i can't do that -- unless someone shows me how to change the filesystem conventions. suggestions anyone ? With the help of google I did something for this a few months ago. You can create a new disk image, and use case-sensitive for this image. I created mine as something like 8GB but thinly provisioned so it didn't take up all the space until used. I use VMWare for testing code changes ($WORK paid for a license for me), but it's a CPU/battery hog, so I'm not wild about it, but it does work. Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: `hw.acpi.thermal.tz0.temperature' disappeared
On Tue, Apr 19, 2011 at 7:48 AM, Taku YAMAMOTO t...@tackymt.homeip.net wrote: This patch works. FreeBSD 9.0-CURRENT #1: Tue Apr 19 10:52:58 MSD 2011 # sysctl hw.acpi.thermal hw.acpi.thermal.min_runtime: 0 hw.acpi.thermal.polling_rate: 10 hw.acpi.thermal.user_override: 0 hw.acpi.thermal.tz0.temperature: 67.5C hw.acpi.thermal.tz0.active: -1 hw.acpi.thermal.tz0.passive_cooling: 0 hw.acpi.thermal.tz0.thermal_flags: 0 hw.acpi.thermal.tz0._PSV: -1 hw.acpi.thermal.tz0._HOT: -1 hw.acpi.thermal.tz0._CRT: 90.0C hw.acpi.thermal.tz0._ACx: -1 hw.acpi.thermal.tz0._TC1: -1 hw.acpi.thermal.tz0._TC2: -1 hw.acpi.thermal.tz0._TSP: -1 We should have 10 _ACx values like this: hw.acpi.thermal.tz0._ACx: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 D'oh, I didn't read the original source carefully enough. Can someone test this patch? As an aside, what kind of h/w do I need for hw.acpi.thermal to show up? I don't see it on my Dell desktop... Thanks, matthew diff --git a/sys/dev/acpica/acpi_thermal.c b/sys/dev/acpica/acpi_thermal.c index b64d8df..7226b6c 100644 --- a/sys/dev/acpica/acpi_thermal.c +++ b/sys/dev/acpica/acpi_thermal.c @@ -288,7 +288,8 @@ acpi_tz_attach(device_t dev) critical temp setpoint (shutdown now)); SYSCTL_ADD_PROC(sc-tz_sysctl_ctx, SYSCTL_CHILDREN(sc-tz_sysctl_tree), OID_AUTO, _ACx, CTLTYPE_INT | CTLFLAG_RD, - sc-tz_zone.ac, 0, sysctl_handle_int, IK, ); + sc-tz_zone.ac, sizeof(sc-tz_zone.ac), + sysctl_handle_opaque, IK, ); SYSCTL_ADD_PROC(sc-tz_sysctl_ctx, SYSCTL_CHILDREN(sc-tz_sysctl_tree), OID_AUTO, _TC1, CTLTYPE_INT | CTLFLAG_RW, sc, offsetof(struct acpi_tz_softc, tz_zone.tc1), ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: `hw.acpi.thermal.tz0.temperature' disappeared
On Tue, Apr 19, 2011 at 11:02 AM, Doug Barton do...@freebsd.org wrote: On 4/19/2011 9:44 AM, m...@freebsd.org wrote: As an aside, what kind of h/w do I need for hw.acpi.thermal to show up? I don't see it on my Dell desktop... The hardware is likely to be there for any reasonably modern Dell desktop. Do you have coretemp loaded? I didn't (I had assumed since the relevant sysctls are defined in acpi_thermal.c that having acpi was sufficient), so I just tried that, but still no hw.acpi.thermal node. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: `hw.acpi.thermal.tz0.temperature' disappeared
On Mon, Apr 18, 2011 at 5:05 AM, John Baldwin j...@freebsd.org wrote: On Saturday, April 16, 2011 11:51:22 am Nick Ulen wrote: FreeBSD was successfully upgraded. uname -v FreeBSD 9.0-CURRENT #0: Mon Apr 11 18:14:36 MSD 2011 root@test:/usr/obj/usr/src/sys/GENERIC Everything seems to be working well except `hw.acpi.thermal.tz0.temperature' disappeared from the list of available sysctl variables. sysctl hw.acpi.thermal. hw.acpi.thermal.min_runtime: 0 hw.acpi.thermal.polling_rate: 10 hw.acpi.thermal.user_override: 0 hw.acpi.thermal.tz0.active: -1 hw.acpi.thermal.tz0.passive_cooling: 0 hw.acpi.thermal.tz0.thermal_flags: 0 hw.acpi.thermal.tz0._PSV: -1 hw.acpi.thermal.tz0._HOT: -1 hw.acpi.thermal.tz0._CRT: 90.0C hw.acpi.thermal.tz0._TC1: -1 hw.acpi.thermal.tz0._TC2: -1 hw.acpi.thermal.tz0._TSP: -1 output from: sysctl -a |grep acpi is here: https://privatepaste.com/ca08d4658b I suspect it is still there, but sysctl doesn't know how to display it anymore. This is probably due to the changes with formatting of sysctl information. mdf@ is probably responsible in that case. SYSCTL_ADD_OPAQUE(sc-tz_sysctl_ctx, SYSCTL_CHILDREN(sc-tz_sysctl_tree), OID_AUTO, temperature, CTLFLAG_RD, sc-tz_temperature, sizeof(sc-tz_temperature), IK, current thermal zone temperature); I don't seem to have a hw.acpi.thermal sysctl node on my box. Can someone please try this patch? Thanks, matthew diff --git a/sys/dev/acpica/acpi_thermal.c b/sys/dev/acpica/acpi_thermal.c index 515a742..b64d8df 100644 --- a/sys/dev/acpica/acpi_thermal.c +++ b/sys/dev/acpica/acpi_thermal.c @@ -257,10 +257,10 @@ acpi_tz_attach(device_t dev) sc-tz_sysctl_tree = SYSCTL_ADD_NODE(sc-tz_sysctl_ctx, SYSCTL_CHILDREN(acpi_tz_sysctl_tree), OID_AUTO, oidname, CTLFLAG_RD, 0, ); -SYSCTL_ADD_OPAQUE(sc-tz_sysctl_ctx, SYSCTL_CHILDREN(sc-tz_sysctl_tree), - OID_AUTO, temperature, CTLFLAG_RD, sc-tz_temperature, - sizeof(sc-tz_temperature), IK, - current thermal zone temperature); +SYSCTL_ADD_PROC(sc-tz_sysctl_ctx, SYSCTL_CHILDREN(sc-tz_sysctl_tree), + OID_AUTO, temperature, CTLTYPE_INT | CTLFLAG_RD, + sc-tz_temperature, 0, sysctl_handle_int, + IK, current thermal zone temperature); SYSCTL_ADD_PROC(sc-tz_sysctl_ctx, SYSCTL_CHILDREN(sc-tz_sysctl_tree), OID_AUTO, active, CTLTYPE_INT | CTLFLAG_RW, sc, 0, acpi_tz_active_sysctl, I, cooling is active); @@ -286,9 +286,9 @@ acpi_tz_attach(device_t dev) sc, offsetof(struct acpi_tz_softc, tz_zone.crt), acpi_tz_temp_sysctl, IK, critical temp setpoint (shutdown now)); -SYSCTL_ADD_OPAQUE(sc-tz_sysctl_ctx, SYSCTL_CHILDREN(sc-tz_sysctl_tree), - OID_AUTO, _ACx, CTLFLAG_RD, sc-tz_zone.ac, - sizeof(sc-tz_zone.ac), IK, ); +SYSCTL_ADD_PROC(sc-tz_sysctl_ctx, SYSCTL_CHILDREN(sc-tz_sysctl_tree), + OID_AUTO, _ACx, CTLTYPE_INT | CTLFLAG_RD, + sc-tz_zone.ac, 0, sysctl_handle_int, IK, ); SYSCTL_ADD_PROC(sc-tz_sysctl_ctx, SYSCTL_CHILDREN(sc-tz_sysctl_tree), OID_AUTO, _TC1, CTLTYPE_INT | CTLFLAG_RW, sc, offsetof(struct acpi_tz_softc, tz_zone.tc1), ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: `hw.acpi.thermal.tz0.temperature' disappeared
On Mon, Apr 18, 2011 at 5:05 AM, John Baldwin j...@freebsd.org wrote: On Saturday, April 16, 2011 11:51:22 am Nick Ulen wrote: FreeBSD was successfully upgraded. uname -v FreeBSD 9.0-CURRENT #0: Mon Apr 11 18:14:36 MSD 2011 root@test:/usr/obj/usr/src/sys/GENERIC Everything seems to be working well except `hw.acpi.thermal.tz0.temperature' disappeared from the list of available sysctl variables. sysctl hw.acpi.thermal. hw.acpi.thermal.min_runtime: 0 hw.acpi.thermal.polling_rate: 10 hw.acpi.thermal.user_override: 0 hw.acpi.thermal.tz0.active: -1 hw.acpi.thermal.tz0.passive_cooling: 0 hw.acpi.thermal.tz0.thermal_flags: 0 hw.acpi.thermal.tz0._PSV: -1 hw.acpi.thermal.tz0._HOT: -1 hw.acpi.thermal.tz0._CRT: 90.0C hw.acpi.thermal.tz0._TC1: -1 hw.acpi.thermal.tz0._TC2: -1 hw.acpi.thermal.tz0._TSP: -1 output from: sysctl -a |grep acpi is here: https://privatepaste.com/ca08d4658b I suspect it is still there, but sysctl doesn't know how to display it anymore. This is probably due to the changes with formatting of sysctl information. mdf@ is probably responsible in that case. SYSCTL_ADD_OPAQUE(sc-tz_sysctl_ctx, SYSCTL_CHILDREN(sc-tz_sysctl_tree), OID_AUTO, temperature, CTLFLAG_RD, sc-tz_temperature, sizeof(sc-tz_temperature), IK, current thermal zone temperature); Oops, yes. The change in r217586 required the type to be set to CTLTYPE_INT to print as format IK. My grep of the source tree shows that acpi_thermal.c is the only affected source file that was using OPAQUE. I'm testing out the fix now. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Panic with mca trap
On Thu, Feb 3, 2011 at 7:09 AM, John Baldwin j...@freebsd.org wrote: On Thursday, February 03, 2011 8:05:31 am John Baldwin wrote: On Tuesday, February 01, 2011 11:58:12 am m...@freebsd.org wrote: On a piece of hardware trying to verify basic build tests, we got an MCA exception that then panic'd the kernel due to WITNESS/INVARIANTS interaction. panic @ time 1296563157.510, thread 0xff000554: blockable sleep lock (sleep mutex) 128 @ /build/mnt/src/sys/vm/uma_core.c:1872 Stack: -- kernel:witness_checkorder+0x7a2 kernel:_mtx_lock_flags+0x81 kernel:uma_zalloc_arg+0x256 kernel:malloc+0xc5 kernel:mca_record_entry+0x30 kernel:mca_scan+0xc9 kernel:mca_intr+0x79 kernel:trap+0x30b kernel:witness_checkorder+0x66 kernel:_mtx_lock_spin_flags+0xa4 kernel:witness_checkorder+0x2a8 kernel:_mtx_lock_spin_flags+0xa4 kernel:tdq_idled+0xe8 kernel:sched_idletd+0x5b kernel:fork_exit+0x9b That's this bit of code in uma_zalloc_arg(): #ifdef INVARIANTS ZONE_LOCK(zone); uma_dbg_alloc(zone, NULL, item); ZONE_UNLOCK(zone); #endif I don't know uma(9) well enough to know the best workaround. Clearly there are times we can be in uma_zalloc_arg() and taking a regular mutex is not acceptable. But what to do for the uma_dbg_free() call so it's happy, and whether to guard taking the ZONE lock with M_NOWAIT or td_critnest 0 or both is outside my current knowledge. I don't expect we'll see this panic again any time soon, but it would be nice to fix the story for WITNESS of when an M_NOWAIT allocation can be done. Actually, this is more my fault. The machine check happened while the interrupted thread was already in a critical section (hence the WITNESS complaint). However, it really isn't correct to be calling malloc() from an arbitrary exception handler, especially one like MC# which can fire pretty much anywhere. I think instead that we should use malloc() when polling the machine check banks, but keep a pre-allocated pool of structures for use with MC# exceptions and CMC interrupts and replenish the pool via an asynchronous task. This is what I'm thinking: Index: mca.c === --- mca.c (revision 218221) +++ mca.c (working copy) @@ -85,6 +85,7 @@ static MALLOC_DEFINE(M_MCA, MCA, Machine Check Architecture); static int mca_count; /* Number of records stored. */ +static int mca_banks; /* Number of per-CPU register banks. */ SYSCTL_NODE(_hw, OID_AUTO, mca, CTLFLAG_RD, NULL, Machine Check Architecture); @@ -102,6 +103,8 @@ SYSCTL_INT(_hw_mca, OID_AUTO, erratum383, CTLFLAG_RD, workaround_erratum383, 0, Is the workaround for Erratum 383 on AMD Family 10h processors enabled?); +static STAILQ_HEAD(, mca_internal) mca_freelist; +static int mca_freecount; static STAILQ_HEAD(, mca_internal) mca_records; static struct callout mca_timer; static int mca_ticks = 3600; /* Check hourly by default. */ @@ -111,7 +114,6 @@ #ifdef DEV_APIC static struct cmc_state **cmc_state; /* Indexed by cpuid, bank */ -static int cmc_banks; static int cmc_throttle = 60; /* Time in seconds to throttle CMCI. */ #endif @@ -415,21 +417,51 @@ return (1); } -static void __nonnull(1) -mca_record_entry(const struct mca_record *record) +static void +mca_fill_freelist(void) { struct mca_internal *rec; + int desired; - rec = malloc(sizeof(*rec), M_MCA, M_NOWAIT); - if (rec == NULL) { - printf(MCA: Unable to allocate space for an event.\n); - mca_log(record); - return; + /* + * Ensure we have at least one record for each bank and one + * record per CPU. + */ + desired = imax(mp_ncpus, mca_banks); + mtx_lock_spin(mca_lock); + while (desired mca_freecount) { + mtx_unlock_spin(mca_lock); + rec = malloc(sizeof(*rec), M_MCA, M_WAITOK); + mtx_lock_spin(mca_lock); + STAILQ_INSERT_TAIL(mca_freelist, rec, link); + mca_freecount++; } + mtx_unlock_spin(mca_lock); +} +static void __nonnull(2) +mca_record_entry(enum scan_mode mode, const struct mca_record *record) +{ + struct mca_internal *rec; + + if (mode == POLLED) { + rec = malloc(sizeof(*rec), M_MCA, M_WAITOK); + mtx_lock_spin(mca_lock); + } else { + mtx_lock_spin(mca_lock); + rec = STAILQ_FIRST(mca_freelist); + if (rec == NULL) { + mtx_unlock_spin(mca_lock); + printf(MCA: Unable to allocate space for an event.\n); + mca_log(record); + return; +
Panic with mca trap
On a piece of hardware trying to verify basic build tests, we got an MCA exception that then panic'd the kernel due to WITNESS/INVARIANTS interaction. panic @ time 1296563157.510, thread 0xff000554: blockable sleep lock (sleep mutex) 128 @ /build/mnt/src/sys/vm/uma_core.c:1872 Stack: -- kernel:witness_checkorder+0x7a2 kernel:_mtx_lock_flags+0x81 kernel:uma_zalloc_arg+0x256 kernel:malloc+0xc5 kernel:mca_record_entry+0x30 kernel:mca_scan+0xc9 kernel:mca_intr+0x79 kernel:trap+0x30b kernel:witness_checkorder+0x66 kernel:_mtx_lock_spin_flags+0xa4 kernel:witness_checkorder+0x2a8 kernel:_mtx_lock_spin_flags+0xa4 kernel:tdq_idled+0xe8 kernel:sched_idletd+0x5b kernel:fork_exit+0x9b That's this bit of code in uma_zalloc_arg(): #ifdef INVARIANTS ZONE_LOCK(zone); uma_dbg_alloc(zone, NULL, item); ZONE_UNLOCK(zone); #endif I don't know uma(9) well enough to know the best workaround. Clearly there are times we can be in uma_zalloc_arg() and taking a regular mutex is not acceptable. But what to do for the uma_dbg_free() call so it's happy, and whether to guard taking the ZONE lock with M_NOWAIT or td_critnest 0 or both is outside my current knowledge. I don't expect we'll see this panic again any time soon, but it would be nice to fix the story for WITNESS of when an M_NOWAIT allocation can be done. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: sleep bug in taskqueue(9)
On Fri, Nov 12, 2010 at 12:56 AM, Hans Petter Selasky hsela...@c2i.net wrote: On Thursday 29 April 2010 01:59:58 Matthew Fleming wrote: It looks to me like taskqueue_drain(taskqueue_thread, foo) will not correctly detect whether or not a task is currently running. The check is against a field in the taskqueue struct, but for the taskqueue_thread queue with more than one thread, multiple threads can simultaneously be running a task, thus stomping over the tq_running field. I have not seen any problem with the code as-is in actual use, so this is purely an inspection bug. The following patch should fix the problem. Because it changes the size of struct task I'm not sure if it would be suitable for MFC. 1) The u_char is going to leave a hole in that structure on ARM platforms for example. 2) The existing taskqueue implementation also has a missing check for the pending count wrapping to zero. I.E. it should stick at 0x and not wrap to 0. This commit mail is rather old, and this fix was incorrect, because the task cannot be referenced after it has been run. Some task handlers will free the task as part of the handler. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: sleep bug in taskqueue(9)
On Fri, Nov 12, 2010 at 6:23 AM, Hans Petter Selasky hsela...@c2i.net wrote: On Friday 12 November 2010 15:18:46 m...@freebsd.org wrote: On Fri, Nov 12, 2010 at 12:56 AM, Hans Petter Selasky hsela...@c2i.net wrote: On Thursday 29 April 2010 01:59:58 Matthew Fleming wrote: It looks to me like taskqueue_drain(taskqueue_thread, foo) will not correctly detect whether or not a task is currently running. The check is against a field in the taskqueue struct, but for the taskqueue_thread queue with more than one thread, multiple threads can simultaneously be running a task, thus stomping over the tq_running field. I have not seen any problem with the code as-is in actual use, so this is purely an inspection bug. The following patch should fix the problem. Because it changes the size of struct task I'm not sure if it would be suitable for MFC. 1) The u_char is going to leave a hole in that structure on ARM platforms for example. 2) The existing taskqueue implementation also has a missing check for the pending count wrapping to zero. I.E. it should stick at 0x and not wrap to 0. This commit mail is rather old, and this fix was incorrect, because the task cannot be referenced after it has been run. Some task handlers will free the task as part of the handler. Ok, maybe the e-mail got stuck somewhere. Have you fixed the above mentioned issues in a newer patch? If you look at the file history for subr_taskqueue.c: http://svn.freebsd.org/viewvc/base/head/sys/kern/subr_taskqueue.c You will see quite a few commits by me. The most recent relating to detecting if a task is running is being MFC'd today: Revision 213813 - (view) (annotate) - [select for diffs] Modified Wed Oct 13 22:59:04 2010 UTC (4 weeks, 1 day ago) by mdf File length: 10831 byte(s) Diff to previous 213739 Use a safer mechanism for determining if a task is currently running, that does not rely on the lifetime of pointers being the same. This also restores the task KBI. Suggested by: jhb MFC after: 1 month Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: sleep bug in taskqueue(9)
On Fri, Nov 12, 2010 at 12:25 PM, Hans Petter Selasky hsela...@c2i.net wrote: On Friday 12 November 2010 17:38:38 m...@freebsd.org wrote: On Fri, Nov 12, 2010 at 6:23 AM, Hans Petter Selasky hsela...@c2i.net wrote: On Friday 12 November 2010 15:18:46 m...@freebsd.org wrote: On Fri, Nov 12, 2010 at 12:56 AM, Hans Petter Selasky hsela...@c2i.net wrote: On Thursday 29 April 2010 01:59:58 Matthew Fleming wrote: It looks to me like taskqueue_drain(taskqueue_thread, foo) will not correctly detect whether or not a task is currently running. The check is against a field in the taskqueue struct, but for the taskqueue_thread queue with more than one thread, multiple threads can simultaneously be running a task, thus stomping over the tq_running field. I have not seen any problem with the code as-is in actual use, so this is purely an inspection bug. The following patch should fix the problem. Because it changes the size of struct task I'm not sure if it would be suitable for MFC. 1) The u_char is going to leave a hole in that structure on ARM platforms for example. 2) The existing taskqueue implementation also has a missing check for the pending count wrapping to zero. I.E. it should stick at 0x and not wrap to 0. This commit mail is rather old, and this fix was incorrect, because the task cannot be referenced after it has been run. Some task handlers will free the task as part of the handler. Ok, maybe the e-mail got stuck somewhere. Have you fixed the above mentioned issues in a newer patch? If you look at the file history for subr_taskqueue.c: http://svn.freebsd.org/viewvc/base/head/sys/kern/subr_taskqueue.c You will see quite a few commits by me. The most recent relating to detecting if a task is running is being MFC'd today: Yes, and I see that this code needs an overflow check, which is one of the issues still not fixed: You keep bringing this up. It is not a new issue. It is not a bug in any of the patches. It is extremely unlikely that a task will be queued 65536 times before execution. It is more worthy of an assert rather than a check, because if a task is enqueued that many times without being run then there's likely a stuck task in the queue. The patch you posted will lie as well, so I would not consider it sufficient if someone wanted to address the issue. Thanks, matthew Before: /* * Count multiple enqueues. */ if (task-ta_pending) { task-ta_pending++; TQ_UNLOCK(queue); return 0; } After: /* * Count multiple enqueues. */ if (task-ta_pending) { if (task-ta_pending != 0x) task-ta_pending++; TQ_UNLOCK(queue); return 0; } Else the ta_pending can wrap to zero and the code will not do what it announces it does. --HPS ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
MTX_DEF versus MTX_SPIN
It's not clear to me from the man pages (perhaps I didn't look at the right one?) in which environments I need a spinlock. For example, I wouldn't think it's safe to use a MTX_DEF in a hard interrupt handler (i.e one that was registered with BUS_SETUP_INTR), but I see some code lying around here that does it and nothing I'm aware of has broken. Perhaps this comes to me still not understanding exactly how interrupts work on FreeBSD. If I capture the stack in a hard interrupt, it looks something like: #0 0xff87b07f43b5 at rnv_hard_intr+0x35 #1 0x8026e7ce at ithread_execute_handlers+0x9e #2 0x8026ead0 at ithread_loop+0x70 #3 0x8026b84c at fork_exit+0x9c #4 0x804a7f7e at fork_trampoline+0xe And there the stack ends. From my perspective, this doesn't look like anything was actually interrupted. By way of explaining what I mean, on AIX we defined 10 levels of software interrupt, and we trained the kernel debugger to understand the save frames that were acquired on interrupt to print a full stack. So a full stack dump might show something like a few frames from one interrupt handler, then a save area, then a frames from a lower priority interrupt, then another save area, then the base-level stack. In this kind of programming environment, it was important to know at what interrupt level your handler would execute, so that locks acquired to synchronize between the top-half and bottom-half were acquired with interupts disabled to the same level. So, back to my question. Is it safe to take a MTX_DEF in a hard interrupt? What about a soft interrupt? I have to assume it's okay in a soft-interrupt context (swi_sched, callout, etc.), since softclock() will acquire a MTX_DEF on behalf of a callout. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: MTX_DEF versus MTX_SPIN
On Wed, Nov 3, 2010 at 9:42 AM, Andriy Gapon a...@icyb.net.ua wrote: on 03/11/2010 18:27 m...@freebsd.org said the following: It's not clear to me from the man pages (perhaps I didn't look at the right one?) in which environments I need a spinlock. For example, I wouldn't think it's safe to use a MTX_DEF in a hard interrupt handler (i.e one that was registered with BUS_SETUP_INTR), but I see some code lying around here that does it and nothing I'm aware of has broken. Such a handler runs in an interrupt thread. The harder interrupt handler is called interrupt filter in FreeBSD terminology. I think that it was formerly known as fast interrupt. So a MTX_DEF is okay in that environment? What would best practices be considered for what code should be run in the interrupt handler versus a soft interrupt? In this case the kinds of things we have to do at some level of interrupt are: - handle a heartbeat interrupt from firmware a few times a second - get a DMA completion interrupt (completely handling this requires calling biodone on all the associated bios) - receive an ECC interrupt (this requires reading registers off the card for details) At the moment we're on stable/7, but we will be migrating the code base to something more recent in another year or so, if that affects the answer. Is there any documentation on best practices for writing a FreeBSD driver? Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
kldload of compressed modules
I'm trying to add support for kldload of compressed modules; they work at boot time but not at runtime. I'm having trouble uncompressing the module in the kernel to try processing it. The difficulty is several-fold: 1) gzip(1) doesn't seem to produce a compressed image that net/zlib.c understands. It gets hung up on the first byte, since it's not 0x8 indicating Z_DEFLATED. I tried setting next_in to the input buffer + 2 since that is the Z_DEFLATED marker, but it dies one byte later with the incorrect header check failure. 2) I was using inflate in kern/inflate.c wrong earlier (not skipping the first 10 bytes + comment), but even so that code and the net/zlib code aren't compatible, since they both define an inflate() function that takes differing numbers of arguments, etc. And link_elf.c already can include net/zlib.h when DDB_CTF is set, so that would conflict with sys/inflate.h. It's likely that kern/inflate.c is what I want, since it's the code the boot loader uses, but I'm annoyed that it's only available with device gzip (I can change this, of course) and that it's incompatible with what seems to be the somewhat standard compress/decompress code in net/zlib.c. All of this is on stable/7 at the moment, though if I had something that worked I want to commit to CURRENT and MFC. So, what's the basic problem? Should net/zlib support this gzipped file? Should I have compressed with some other user-space utility? Should I add a new link_elf_compress.c file to support compressed images (and use sys/inflate.h instead of net/zlib.h) instead of trying to add it to the existing link_elf_load_file() functions? For reference, here's the beginning of the compressed module (on amd64): # od -h mnvf.ko.gz | less 000 8b1f0808fd2a4cbe03006e6d66766b2e 020 006f7ded740fd554f7b5484df8426613 040 03404401182e28200484120834254ca3 060 d198440901213a51924c190932496663 100 04261a2a8c9a9c65e6c6d6d9fb5a3cb4 120 5adb9ffa5fb60979362d4b586b42a529 140 54a852ad72ada0c7c5467f886f207def 160 99ce73b947336b7d7df5adeb7d6fb8b3 200 7b9c677e7d9fdef69ce7cf7d9b9fef73 220 3ab652ca525328c637fc594aa4a95840 240 8b304bc5de3979fff625552b0396a95d 260 ad92224c74dafe4546fb626528a8fa66 300 f31fc06597b57be3cd2b6109264197c3 320 26674be6f9e56ff6e8624dfe84e64d50 340 1251914205dd7d57f912c23606b525ae 360 d570b5c2092ab54d9e21d70b7e22628f 400 80cc8eabf3c7bae1df8206df6ad76cb8 420 2d706b865c1dc39745b5abc7774292bf 440 c28765d434babe61e15b90da46440f2b Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
uma_zfree(NULL) is broken
There's explicit protection for free(NULL, M_FOO), but uma_zfree(zone, NULL) will put NULL in the local bucket and then probably return it later from a uma_zalloc call. Obviously it's not a good idea to call uma_zfree(9) on NULL, but in this case it's an easy mistake to make when e.g. converting a set of malloc(9)/free(9) uses into uma(9). So is the right thing to allow a uma_zfree(NULL) and silently succeed, like for free(9)? That would be my guess, but I'm open to alternatives. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: MAXCPU preparations
On Mon, Sep 27, 2010 at 9:21 AM, Sean Bruno sean...@yahoo-inc.com wrote: On Mon, 2010-09-27 at 08:53 -0700, Julian Elischer wrote: On 9/27/10 8:26 AM, Sean Bruno wrote: Does this look like an appropriate modification to libmemstat? Sean //depot/yahoo/ybsd_7/src/lib/libmemstat/memstat.h#4 - /home/seanbru/ybsd_7/src/lib/libmemstat/memstat.h @@ -28,12 +28,13 @@ #ifndef _MEMSTAT_H_ #define _MEMSTAT_H_ +#includesys/param.h /* * Number of CPU slots in library-internal data structures. This should be * at least the value of MAXCPU from param.h. */ -#define MEMSTAT_MAXCPU 64 +#define MEMSTAT_MAXCPU MAXCPU /* defined in sys/${ARCH}/include/param.h */ wouldn't it be better to do a sysctlbyname() and use the real value for the system? That was my initial thought (as prodded by scottl and peter). If it is made dynamic, could this be opening a race condition where the call to sysctlbyname() returns a count of CPUS that is in turn changed by the offlining of a CPU? Or am I thinking to much about this? The maximum number of CPUs supported by a running instance will not change. Only, potentially, the current number of CPUs. So a sysctl to fetch the kernel's compiled-in MAXCPU is safe. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
sys/conf/files aicasm
I can't say I understand much about the syntax of the top part of sys/conf/files, but this line: aicasm optional ahc | ahd \ dependency $S/dev/aic7xxx/aicasm/*.[chyl] \ compile-withCC='${CC}' ${MAKE} -f $S/dev/aic7xxx/aicasm/Makefile MAKESRCPATH=$S/dev/aic7xxx/aicasm \ no-obj no-implicit-rule\ clean aicasm* y.tab.h looks to me like aicasm should only be built if I have device ahc or device ahd in my kernel configuration file. But even if I make a config file without those devices (e.g. by doing include GENERIC then nodevices ahc, ahd) the file is still built. Am I missing something about how these lines in sys/conf/files work? As a side question, why does the Makefile for dev/aic7xxx/aicasm have -I/usr/include in CFLAGS instead of using some marcos to get to the source tree I'm actually building from? Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: g_event spinning 100% when doing 'gpart add'
On Fri, Sep 10, 2010 at 6:57 PM, Yuri Pankov yuri.pan...@gmail.com wrote: Nevermind me. That's what I thought why I was getting the same gpart behavior switching between kernels, with and without DEBUG_LOCKS. Sorry about that. Same here, gpart hangs on: 3826 gpart CALL __sysctl(0x7fffa250,0x3,0,0x7fffa268,0,0) 3826 gpart SCTL kern.geom.confxml When I was doing my sbuf changes I temporarily had a patch to change these sysctls to use the new drains. I couldn't get any output from the three kern.geom.conf{txt,dot,xml} sysctls (they didn't hang but the output was empty) so I reverted those parts before committing. I doubt my changes affected this, but I will check on Monday when I'm back at work to see if the lack of output is due to my changes or predates me. Cheers, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
deprecating sprintf(9)
Looking at the uses of kvprintf(9), only [v]sprintf(9) doesn't have a callback function. It seems a little sketchy to me to be doing unsafe sprintf in the kernel anyways. Should we (and by we, I mean me) deprecate sprintf(9) and convert the existing 1200+ uses to strcpy(9) for fixed strings (also potentially bad, but a different beast), or snprintf(9) where the size of the buffer is known? It seems like a large project, but OTOH sprintf(9) is mighty unsafe in the kernel. It's disapproved of for user-space as being unsafe for security reasons as well, but the potential downsides aren't the same, and we'll never clean up ports anyways. :-) Thoughts? matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: deprecating sprintf(9)
On Wed, Sep 8, 2010 at 9:15 AM, Rink Springer r...@freebsd.org wrote: Hi, On Wed, Sep 08, 2010 at 08:51:57AM -0700, m...@freebsd.org wrote: It seems like a large project, but OTOH sprintf(9) is mighty unsafe in the kernel. It's disapproved of for user-space as being unsafe for security reasons as well, but the potential downsides aren't the same, and we'll never clean up ports anyways. :-) Deprecating it may be usable, yet I don't believe we can easily enforce such a policy [1]. If the kernel sources don't use it then the prototype can be removed. Have you looked at how many (potentially) unsecure uses there are in the kernel, to give an idea how useful such an effort would be? I presume all the kernel uses are safe at the moment, but it's an error prone construction. As of this morning grep found 1277 occurrences of sprintf(9) in sys/ and 23 occurrences of vsprintf(9) in sys/. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: sched_pin() bug in SCHED_ULE
On Wed, Sep 1, 2010 at 6:49 AM, John Baldwin j...@freebsd.org wrote: On Tuesday, August 31, 2010 2:53:12 pm m...@freebsd.org wrote: On Tue, Aug 31, 2010 at 10:16 AM, m...@freebsd.org wrote: I recorded the stack any time ts-ts_cpu was set and when a thread was migrated by sched_switch() I printed out the recorded info. Here's what I found: XXX bug 67957: moving 0xff003ff9b800 from 3 to 1 [1]: pin 0 state 4 move 3 - 1 done by 0xff000cc44000: #0 0x802b36b4 at bug67957+0x84 #1 0x802b5dd4 at sched_affinity+0xd4 #2 0x8024a707 at cpuset_setthread+0x137 #3 0x8024aeae at cpuset_setaffinity+0x21e #4 0x804a82df at freebsd32_cpuset_setaffinity+0x4f #5 0x80295f49 at isi_syscall+0x99 #6 0x804a630e at ia32_syscall+0x1ce #7 0x8046dc60 at Xint0x80_syscall+0x60 [0]: pin 0 state 2 move 0 - 3 done by 0xff000cc44000: #0 0x802b36b4 at bug67957+0x84 #1 0x802b4ad8 at sched_add+0xe8 #2 0x8029b96a at create_thread+0x34a #3 0x8029badc at kern_thr_new+0x8c #4 0x804a8912 at freebsd32_thr_new+0x122 #5 0x80295f49 at isi_syscall+0x99 #6 0x804a630e at ia32_syscall+0x1ce #7 0x8046dc60 at Xint0x80_syscall+0x60 So one thread in the process called cpuset_setaffinity(2), and another thread in the process was forcibly migrated by the IPI while returning from a syscall, while it had td_pinned set. Given this path, it seems reasonable to me to skip the migrate if we notice THREAD_CAN_MIGRATE is false. Opinions? My debug code is below. I'll try to write a short testcase that exhibits this bug. Just a few more thoughts on this. The check in sched_affinity for THREAD_CAN_MIGRATE is racy. Since witness uses sched_pin, it's not simple to take the THREAD lock around an increment of td_pinned. So I'm looking for suggestions on the best way to fix this issue. My thoughts: 1) add a check in sched_switch() for THREAD_CAN_MIGRATE 2) have WITNESS not use sched_pin, and take the THREAD lock when modifying td_pinned 3) have the IPI_PREEMPT handler notice the thread is pinned (and not trying to bind) and postpone the mi_switch, just like it postpones when a thread is in a critical section. Except for the potential complexity of implementation, I think (2) is the best solution. I don't like 2) only because you'd really need to fix all the other places that use sched_pin() to grab the thread lock. That would be a bit expensive. I think the problem is code that checks THREAD_CAN_MIGRATE() for running threads that aren't curthread. The sched_affinity() bug is probably my fault FWIW. I think something along the lines of 1) is the best approach. Maybe something like the patch below. It moves the CPU assignment out of sched_affinity() and moves it into sched_switch() instead so that it takes effect on the next context switch for this thread. That is the point at which the CPU binding actually takes effect anyway since sched_affinity() doesn't force an immediate context switch. This patch is relative to 7. The patch for 9 is nearly identical (just a fixup for ipi_cpu() vs ipi_selected()). Index: sched_ule.c === --- sched_ule.c (revision 212065) +++ sched_ule.c (working copy) @@ -1885,6 +1885,8 @@ srqflag = (flags SW_PREEMPT) ? SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED : SRQ_OURSELF|SRQ_YIELDING; + if (THREAD_CAN_MIGRATE(td) !THREAD_CAN_SCHED(td, ts-tscpu)) + ts-ts_cpu = sched_pickcpu(td, 0); if (ts-ts_cpu == cpuid) tdq_add(tdq, td, srqflag); else @@ -2536,7 +2538,6 @@ { #ifdef SMP struct td_sched *ts; - int cpu; THREAD_LOCK_ASSERT(td, MA_OWNED); ts = td-td_sched; @@ -2550,17 +2551,13 @@ if (!TD_IS_RUNNING(td)) return; td-td_flags |= TDF_NEEDRESCHED; - if (!THREAD_CAN_MIGRATE(td)) - return; /* - * Assign the new cpu and force a switch before returning to - * userspace. If the target thread is not running locally send - * an ipi to force the issue. + * Force a switch before returning to userspace. If the + * target thread is not running locally send an ipi to force + * the issue. */ - cpu = ts-ts_cpu; - ts-ts_cpu = sched_pickcpu(td, 0); - if (cpu != PCPU_GET(cpuid)) - ipi_selected(1 cpu, IPI_PREEMPT); + if (td != curthread) + ipi_selected(1 ts-ts_cpu, IPI_PREEMPT); #endif } I will test this patch out; thanks for the help! Two questions: 1) How does a thread get moved between CPUs when it's not running? I see that we change the runqueue for non-running threads that are on a
Re: sched_pin() bug in SCHED_ULE
I recorded the stack any time ts-ts_cpu was set and when a thread was migrated by sched_switch() I printed out the recorded info. Here's what I found: XXX bug 67957: moving 0xff003ff9b800 from 3 to 1 [1]: pin 0 state 4 move 3 - 1 done by 0xff000cc44000: #0 0x802b36b4 at bug67957+0x84 #1 0x802b5dd4 at sched_affinity+0xd4 #2 0x8024a707 at cpuset_setthread+0x137 #3 0x8024aeae at cpuset_setaffinity+0x21e #4 0x804a82df at freebsd32_cpuset_setaffinity+0x4f #5 0x80295f49 at isi_syscall+0x99 #6 0x804a630e at ia32_syscall+0x1ce #7 0x8046dc60 at Xint0x80_syscall+0x60 [0]: pin 0 state 2 move 0 - 3 done by 0xff000cc44000: #0 0x802b36b4 at bug67957+0x84 #1 0x802b4ad8 at sched_add+0xe8 #2 0x8029b96a at create_thread+0x34a #3 0x8029badc at kern_thr_new+0x8c #4 0x804a8912 at freebsd32_thr_new+0x122 #5 0x80295f49 at isi_syscall+0x99 #6 0x804a630e at ia32_syscall+0x1ce #7 0x8046dc60 at Xint0x80_syscall+0x60 So one thread in the process called cpuset_setaffinity(2), and another thread in the process was forcibly migrated by the IPI while returning from a syscall, while it had td_pinned set. Given this path, it seems reasonable to me to skip the migrate if we notice THREAD_CAN_MIGRATE is false. Opinions? My debug code is below. I'll try to write a short testcase that exhibits this bug. Thanks, matthew Index: kern/sched_ule.c === --- kern/sched_ule.c(revision 158580) +++ kern/sched_ule.c(working copy) @@ -697,6 +697,41 @@ return; } +static void +bug67957(struct thread *td) +{ + int idx; + + THREAD_LOCK_ASSERT(td, MA_OWNED); + idx = (td-xxx_idx++ % 5); + stack_save(td-xxx[idx].td_preempt); + td-xxx[idx].td_moveto = td-td_sched-ts_cpu; + td-xxx[idx].td_movefrom = (td-td_oncpu == NOCPU) ? td-td_lastcpu : td-td_oncpu; + td-xxx[idx].td_statewas = td-td_state; + td-xxx[idx].td_pinned = td-td_pinned; + td-xxx[idx].td_by = curthread; +} + +static void +pr_bug67957(struct thread *td, int idx) +{ + int idx, i; + + printf(XXX bug 67957: moving %p from %d to %d\n, + td, td-td_lastcpu, td-td_sched-ts_cpu); + for (i = 0, idx = td-xxx_idx - 1; + i 5 idx = 0; + i++, idx--) { + printf([%d]: pin %d state %d move %d - %d done by %p:\n, + idx, td-xxx[idx % 5].td_pinned, + td-xxx[idx % 5].td_statewas, + td-xxx[idx % 5].td_movefrom, + td-xxx[idx % 5].td_moveto, + td-xxx[idx % 5].td_by); + stack_print_ddb(td-xxx[idx % 5].td_preempt); + } +} + /* * Move a thread from one thread queue to another. */ @@ -739,6 +774,7 @@ TDQ_UNLOCK(from); sched_rem(td); ts-ts_cpu = cpu; + bug67957(td); td-td_lock = TDQ_LOCKPTR(to); tdq_add(to, td, SRQ_YIELDING); } @@ -971,6 +1007,7 @@ tdq = TDQ_CPU(cpu); td = ts-ts_thread; ts-ts_cpu = cpu; + bug67957(td); /* If the lock matches just return the queue. */ if (td-td_lock == TDQ_LOCKPTR(tdq)) @@ -1890,8 +1964,15 @@ SRQ_OURSELF|SRQ_YIELDING; if (ts-ts_cpu == cpuid) tdq_add(tdq, td, srqflag); - else + else { + if (!THREAD_CAN_MIGRATE(td) + (ts-ts_flags TSF_BOUND) == 0) { + pr_bug67957(td, idx); + panic(XXX); + } mtx = sched_switch_migrate(tdq, td, srqflag); + } + td-xxx_idx = 0; } else { /* This thread must be going to sleep. */ TDQ_LOCK(tdq); @@ -2479,8 +2560,10 @@ * target cpu. */ if (td-td_priority = PRI_MAX_ITHD THREAD_CAN_MIGRATE(td) - curthread-td_intr_nesting_level) + curthread-td_intr_nesting_level) { ts-ts_cpu = cpuid; + bug67957(td); + } if (!THREAD_CAN_MIGRATE(td)) cpu = ts-ts_cpu; else @@ -2590,6 +2673,7 @@ */ cpu = ts-ts_cpu; ts-ts_cpu = sched_pickcpu(td, 0); + bug67957(td); if (cpu != PCPU_GET(cpuid)) ipi_selected(1 cpu, IPI_PREEMPT); #endif @@ -2613,6 +2697,7 @@ if (PCPU_GET(cpuid) == cpu) return; ts-ts_cpu = cpu; + bug67957(td); /* When we return from mi_switch we'll be on the correct cpu. */ mi_switch(SW_VOL, NULL); #endif Index: sys/proc.h === --- sys/proc.h (revision 158580) +++ sys/proc.h (working copy) @@ -68,6 +68,8 @@ #include sys/isi_mountroot.h #include
Re: sched_pin() bug in SCHED_ULE
On Tue, Aug 31, 2010 at 10:16 AM, m...@freebsd.org wrote: I recorded the stack any time ts-ts_cpu was set and when a thread was migrated by sched_switch() I printed out the recorded info. Here's what I found: XXX bug 67957: moving 0xff003ff9b800 from 3 to 1 [1]: pin 0 state 4 move 3 - 1 done by 0xff000cc44000: #0 0x802b36b4 at bug67957+0x84 #1 0x802b5dd4 at sched_affinity+0xd4 #2 0x8024a707 at cpuset_setthread+0x137 #3 0x8024aeae at cpuset_setaffinity+0x21e #4 0x804a82df at freebsd32_cpuset_setaffinity+0x4f #5 0x80295f49 at isi_syscall+0x99 #6 0x804a630e at ia32_syscall+0x1ce #7 0x8046dc60 at Xint0x80_syscall+0x60 [0]: pin 0 state 2 move 0 - 3 done by 0xff000cc44000: #0 0x802b36b4 at bug67957+0x84 #1 0x802b4ad8 at sched_add+0xe8 #2 0x8029b96a at create_thread+0x34a #3 0x8029badc at kern_thr_new+0x8c #4 0x804a8912 at freebsd32_thr_new+0x122 #5 0x80295f49 at isi_syscall+0x99 #6 0x804a630e at ia32_syscall+0x1ce #7 0x8046dc60 at Xint0x80_syscall+0x60 So one thread in the process called cpuset_setaffinity(2), and another thread in the process was forcibly migrated by the IPI while returning from a syscall, while it had td_pinned set. Given this path, it seems reasonable to me to skip the migrate if we notice THREAD_CAN_MIGRATE is false. Opinions? My debug code is below. I'll try to write a short testcase that exhibits this bug. Just a few more thoughts on this. The check in sched_affinity for THREAD_CAN_MIGRATE is racy. Since witness uses sched_pin, it's not simple to take the THREAD lock around an increment of td_pinned. So I'm looking for suggestions on the best way to fix this issue. My thoughts: 1) add a check in sched_switch() for THREAD_CAN_MIGRATE 2) have WITNESS not use sched_pin, and take the THREAD lock when modifying td_pinned 3) have the IPI_PREEMPT handler notice the thread is pinned (and not trying to bind) and postpone the mi_switch, just like it postpones when a thread is in a critical section. Except for the potential complexity of implementation, I think (2) is the best solution. For those who want to play at home, I have a small test program that exhibits this behavior at http://people.freebsd.org/~mdf/cpu_affinity_test.c. It seems to require 4 or more CPUs to hit the assert. You'll also need to patch the kernel to assert when migrating a pinned thread: Index: kern/sched_ule.c === --- kern/sched_ule.c(revision 158580) +++ kern/sched_ule.c(working copy) @@ -1888,11 +1889,26 @@ sched_switch(struct thread *td, struct t srqflag = (flags SW_PREEMPT) ? SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED : SRQ_OURSELF|SRQ_YIELDING; if (ts-ts_cpu == cpuid) tdq_add(tdq, td, srqflag); - else + else { + KASSERT(THREAD_CAN_MIGRATE(td) || + (ts-ts_flags TSF_BOUND) != 0, + (Thread %p shouldn't migrate!, td)); mtx = sched_switch_migrate(tdq, td, srqflag); + } } else { /* This thread must be going to sleep. */ TDQ_LOCK(tdq); mtx = thread_lock_block(td); Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
sched_pin() bug in SCHED_ULE
Back at the beginning of August I posted about issues with sched_pin() and witness_warn(): http://lists.freebsd.org/pipermail/freebsd-hackers/2010-August/032553.html After a lot of debugging I think I've basically found the issue. I found this bug on stable/7, but looking at the commit log for CURRENT I don't see any reason the bug doesn't exist there. This analysis is specific to amd64/i386 but the problem is likely to exist on most arches. The basic problem is that in both tdq_move() and sched_setcpu() can manipulate the ts-ts_cpu variable of another process, which may be running. This means that a process running on CPU N may be set to run on CPU M the next context switch. Then, that process may call sched_pin(), then grab a PCPU variable. An IPI_PREEMPT can come in, causing the thread to call mi_switch() in ipi_bitmap_handler(). sched_switch() will then notice that ts-ts_cpu is not PCPU_GET(cpuid), and call sched_switch_migrate(), migrating the thread to the intended CPU. Thus after sched_pin() and potentially before using any PCPU variable the process can get migrated to another CPU, and now it is using a PCPU variable for the wrong processor. Given that ts_cpu can be set by other threads, it doesn't seem worth checking at sched_pin time whether ts_cpu is not the same as td_oncpu, because to make the check would require taking the thread_lock. Instead, I propose adding a check for !THREAD_CAN_MIGRATE() before calling sched_switch_migrate(). However, sched_pin() is also used by sched_bind(9) to force the thread to stay on the intended cpu. I would handle this by adding a TSF_BINDING state that is different from TSF_BOUND. The thing that has me wondering whether this is all correct is that I don't see any checks in tdq_move() or sched_setcpu() for either TSF_BOUND or THREAD_CAN_MIGRATE(). Perhaps that logic is managed in the various calling functions; in that case I would propose adding asserts that the thread is THREAD_CAN_MIGRATE() unless it's marked TSF_BINDING. Does this analysis seem correct? Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: sched_pin() bug in SCHED_ULE
On Thu, Aug 26, 2010 at 1:49 PM, John Baldwin j...@freebsd.org wrote: On Thursday, August 26, 2010 4:03:38 pm m...@freebsd.org wrote: Back at the beginning of August I posted about issues with sched_pin() and witness_warn(): http://lists.freebsd.org/pipermail/freebsd-hackers/2010-August/032553.html After a lot of debugging I think I've basically found the issue. I found this bug on stable/7, but looking at the commit log for CURRENT I don't see any reason the bug doesn't exist there. This analysis is specific to amd64/i386 but the problem is likely to exist on most arches. The basic problem is that in both tdq_move() and sched_setcpu() can manipulate the ts-ts_cpu variable of another process, which may be running. This means that a process running on CPU N may be set to run on CPU M the next context switch. Then, that process may call sched_pin(), then grab a PCPU variable. An IPI_PREEMPT can come in, causing the thread to call mi_switch() in ipi_bitmap_handler(). sched_switch() will then notice that ts-ts_cpu is not PCPU_GET(cpuid), and call sched_switch_migrate(), migrating the thread to the intended CPU. Thus after sched_pin() and potentially before using any PCPU variable the process can get migrated to another CPU, and now it is using a PCPU variable for the wrong processor. Given that ts_cpu can be set by other threads, it doesn't seem worth checking at sched_pin time whether ts_cpu is not the same as td_oncpu, because to make the check would require taking the thread_lock. Instead, I propose adding a check for !THREAD_CAN_MIGRATE() before calling sched_switch_migrate(). However, sched_pin() is also used by sched_bind(9) to force the thread to stay on the intended cpu. I would handle this by adding a TSF_BINDING state that is different from TSF_BOUND. The thing that has me wondering whether this is all correct is that I don't see any checks in tdq_move() or sched_setcpu() for either TSF_BOUND or THREAD_CAN_MIGRATE(). Perhaps that logic is managed in the various calling functions; in that case I would propose adding asserts that the thread is THREAD_CAN_MIGRATE() unless it's marked TSF_BINDING. Does this analysis seem correct? Calling code does check THREAD_CAN_MIGRATE(), but the problem is that it is not safe to call that on anything but curthread as td_pinned is not locked. It is assumed that only curthread would ever check td_pinned, not other threads. I suspect what is happening is that another CPU is seeing a stale value of td_pinned. You could fix this by grabbing the thread lock in sched_pin()/unpin() for ULE, but that is a bit expensive perhaps. I tried making sched_pin() a real function which used intr_disable/intr_restore around saving off td-td_oncpu, td-td_lastcpu and ts-ts_cpu, and the stack at the time of call. In sched_switch when I saw an unexpected migration I printed all that out. I found that on my boxes, at sched_pin() time ts_cpu was already different from td-td_oncpu, so the specific problem I was having was that while another thread can change ts_cpu it has no way to force that thread to immediately migrate. I believe the below patch fixes the issue, though I'm open to other methods: Index: kern/sched_ule.c === --- kern/sched_ule.c(.../head/src/sys) (revision 158279) +++ kern/sched_ule.c(.../branches/BR_BUG_67957/src/sys) (revision 158279) @@ -112,6 +112,7 @@ /* flags kept in ts_flags */ #defineTSF_BOUND 0x0001 /* Thread can not migrate. */ #defineTSF_XFERABLE0x0002 /* Thread was added as transferable. */ +#defineTSF_BINDING 0x0004 /* Thread is being bound. */ static struct td_sched td_sched0; @@ -1859,6 +1858,7 @@ struct mtx *mtx; int srqflag; int cpuid; + int do_switch; THREAD_LOCK_ASSERT(td, MA_OWNED); @@ -1888,10 +1888,21 @@ srqflag = (flags SW_PREEMPT) ? SRQ_OURSELF|SRQ_YIELDING|SRQ_PREEMPTED : SRQ_OURSELF|SRQ_YIELDING; - if (ts-ts_cpu == cpuid) + /* +* Allow the switch to another processor as requested +* only if the thread can migrate or we are in the +* middle of binding for sched_bind(9). This keeps +* sched_pin() quick, since other threads can +* manipulate ts_cpu. +*/ + do_switch = (ts-ts_cpu != cpuid); + if (!THREAD_CAN_MIGRATE(td) + (ts-ts_flags TSF_BINDING) == 0) + do_switch = 0; + if (do_switch) + mtx = sched_switch_migrate(tdq, td, srqflag); + else tdq_add(tdq, td, srqflag); - else - mtx = sched_switch_migrate(tdq, td, srqflag); } else {
Re: sched_pin() bug in SCHED_ULE
On Thu, Aug 26, 2010 at 3:10 PM, Andriy Gapon a...@icyb.net.ua wrote: on 27/08/2010 00:20 m...@freebsd.org said the following: I tried making sched_pin() a real function which used intr_disable/intr_restore around saving off td-td_oncpu, td-td_lastcpu and ts-ts_cpu, and the stack at the time of call. In sched_switch when I saw an unexpected migration I printed all that out. I found that on my boxes, at sched_pin() time ts_cpu was already different from td-td_oncpu, so the specific problem I was having was that while another thread can change ts_cpu it has no way to force that thread to immediately migrate. Like e.g. in sched_affinity where ts_cpu is first changed and then the old cpu is ipi-ed? That could do it. I didn't follow the stack of the places that were touching ts_cpu for non-curthread. I believe the below patch fixes the issue, though I'm open to other methods: Index: kern/sched_ule.c === --- kern/sched_ule.c (.../head/src/sys) (revision 158279) +++ kern/sched_ule.c (.../branches/BR_BUG_67957/src/sys) (revision 158279) @@ -112,6 +112,7 @@ /* flags kept in ts_flags */ #define TSF_BOUND 0x0001 /* Thread can not migrate. */ #define TSF_XFERABLE 0x0002 /* Thread was added as transferable. */ +#define TSF_BINDING 0x0004 /* Thread is being bound. */ I don't really follow why TSF_BINDING is needed, i.e. why TSF_BOUND is not sufficient in this case? I wanted to tighten up the asserts, so that the only time it was okay for a td_pinned thread to migrate was the one time it was moving to the target cpu. Having a separate flag allows that. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: [head tinderbox] failure on amd64/amd64
The tinderbox break is my bad. I will have it fixed by the end of the day. :-( matthew On Thu, Aug 12, 2010 at 3:25 AM, FreeBSD Tinderbox tinder...@freebsd.org wrote: TB --- 2010-08-12 01:30:00 - tinderbox 2.6 running on freebsd-current.sentex.ca TB --- 2010-08-12 01:30:00 - starting HEAD tinderbox run for amd64/amd64 TB --- 2010-08-12 01:30:00 - cleaning the object tree TB --- 2010-08-12 01:30:27 - cvsupping the source tree TB --- 2010-08-12 01:30:27 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/HEAD/amd64/amd64/supfile TB --- 2010-08-12 01:46:12 - building world TB --- 2010-08-12 01:46:12 - MAKEOBJDIRPREFIX=/obj TB --- 2010-08-12 01:46:12 - PATH=/usr/bin:/usr/sbin:/bin:/sbin TB --- 2010-08-12 01:46:12 - TARGET=amd64 TB --- 2010-08-12 01:46:12 - TARGET_ARCH=amd64 TB --- 2010-08-12 01:46:12 - TZ=UTC TB --- 2010-08-12 01:46:12 - __MAKE_CONF=/dev/null TB --- 2010-08-12 01:46:12 - cd /src TB --- 2010-08-12 01:46:12 - /usr/bin/make -B buildworld World build started on Thu Aug 12 01:46:13 UTC 2010 Rebuilding the temporary build tree stage 1.1: legacy release compatibility shims stage 1.2: bootstrap tools stage 2.1: cleaning up the object tree stage 2.2: rebuilding the object tree stage 2.3: build tools stage 3: cross tools stage 4.1: building includes stage 4.2: building libraries stage 4.3: make dependencies stage 4.4: building everything [...] cc -O2 -pipe -DACPI_EXEC_APP -fno-strict-aliasing -I/src/usr.sbin/acpi/acpidb/../../../sys -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -c /src/usr.sbin/acpi/acpidb/../../../sys/contrib/dev/acpica/utilities/utxface.c cc -O2 -pipe -DACPI_EXEC_APP -fno-strict-aliasing -I/src/usr.sbin/acpi/acpidb/../../../sys -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -o acpidb acpidb.o osunixxf.o dbcmds.o dbdisply.o dbexec.o dbfileio.o dbhistry.o dbinput.o dbstats.o dbutils.o dbxface.o dmbuffer.o dmnames.o dmobject.o dmopcode.o dmresrc.o dmresrcl.o dmresrcs.o dmutils.o dmwalk.o evevent.o evgpe.o evgpeblk.o evgpeinit.o evgpeutil.o evmisc.o evregion.o evrgnini.o evsci.o evxface.o evxfevnt.o evxfregn.o hwacpi.o hwgpe.o hwregs.o hwsleep.o hwvalid.o hwxface.o dsfield.o dsinit.o dsmethod.o dsmthdat.o dsobject.o dsopcode.o dsutils.o dswexec.o dswload.o dswscope.o dswstate.o exconfig.o exconvrt.o excreate.o exdebug.o exdump.o exfield.o exfldio.o exmisc.o exmutex.o exnames.o exoparg1.o exoparg2.o exoparg3.o exoparg6.o exprep.o exregion.o exresnte.o exresolv.o exresop.o exstore.o exstoren.o exstorob.o exsystem.o exutils.o psargs.o psloop.o psopc! ode.o psparse.o psscope.o pstree.o psutils.o pswalk.o psxface.o nsaccess.o nsalloc.o nsdump.o nseval.o nsinit.o nsload.o nsnames.o nsobject.o nsparse.o nspredef.o nsrepair.o nsrepair2.o nssearch.o nsutils.o nswalk.o nsxfeval.o nsxfname.o nsxfobj.o rsaddr.o rscalc.o rscreate.o rsdump.o rsinfo.o rsio.o rsirq.o rslist.o rsmemory.o rsmisc.o rsutils.o rsxface.o tbfadt.o tbfind.o tbinstal.o tbutils.o tbxface.o tbxfroot.o utalloc.o utcache.o utcopy.o utdebug.o utdelete.o uteval.o utglobal.o utids.o utinit.o utlock.o utmath.o utmisc.o utmutex.o utobject.o utosi.o utresrc.o utstate.o uttrack.o utxface.o -lpthread gzip -cn /src/usr.sbin/acpi/acpidb/acpidb.8 acpidb.8.gz === usr.sbin/acpi/acpidump (all) cc -O2 -pipe -I/src/usr.sbin/acpi/acpidump/../../../sys -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -c /src/usr.sbin/acpi/acpidump/acpi.c cc1: warnings being treated as errors /src/usr.sbin/acpi/acpidump/acpi.c: In function 'acpi_handle_tcpa': /src/usr.sbin/acpi/acpidump/acpi.c:650: warning: format '%lld' expects type 'long long int', but argument 4 has type 'u_int64_t' *** Error code 1 Stop in /src/usr.sbin/acpi/acpidump. *** Error code 1 Stop in /src/usr.sbin/acpi. *** Error code 1 Stop in /src/usr.sbin. *** Error code 1 Stop in /src. *** Error code 1 Stop in /src. *** Error code 1 Stop in /src. TB --- 2010-08-12 03:25:04 - WARNING: /usr/bin/make returned exit code 1 TB --- 2010-08-12 03:25:04 - ERROR: failed to build world TB --- 2010-08-12 03:25:04 - 4691.49 user 830.94 system 6903.97 real http://tinderbox.freebsd.org/tinderbox-head-HEAD-amd64-amd64.full ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org ___ freebsd-current@freebsd.org mailing list
Re: [head tinderbox] failure on amd64/amd64
On Thu, Aug 12, 2010 at 1:51 PM, m...@freebsd.org wrote: The tinderbox break is my bad. I will have it fixed by the end of the day. Actually, the amd64 break is not me, because it broke before it got to my mistake. i386 and presumably all the other 32-bit builds are due to my memguard(9) patch. :-( matthew On Thu, Aug 12, 2010 at 3:25 AM, FreeBSD Tinderbox tinder...@freebsd.org wrote: TB --- 2010-08-12 01:30:00 - tinderbox 2.6 running on freebsd-current.sentex.ca TB --- 2010-08-12 01:30:00 - starting HEAD tinderbox run for amd64/amd64 TB --- 2010-08-12 01:30:00 - cleaning the object tree TB --- 2010-08-12 01:30:27 - cvsupping the source tree TB --- 2010-08-12 01:30:27 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca /tinderbox/HEAD/amd64/amd64/supfile TB --- 2010-08-12 01:46:12 - building world TB --- 2010-08-12 01:46:12 - MAKEOBJDIRPREFIX=/obj TB --- 2010-08-12 01:46:12 - PATH=/usr/bin:/usr/sbin:/bin:/sbin TB --- 2010-08-12 01:46:12 - TARGET=amd64 TB --- 2010-08-12 01:46:12 - TARGET_ARCH=amd64 TB --- 2010-08-12 01:46:12 - TZ=UTC TB --- 2010-08-12 01:46:12 - __MAKE_CONF=/dev/null TB --- 2010-08-12 01:46:12 - cd /src TB --- 2010-08-12 01:46:12 - /usr/bin/make -B buildworld World build started on Thu Aug 12 01:46:13 UTC 2010 Rebuilding the temporary build tree stage 1.1: legacy release compatibility shims stage 1.2: bootstrap tools stage 2.1: cleaning up the object tree stage 2.2: rebuilding the object tree stage 2.3: build tools stage 3: cross tools stage 4.1: building includes stage 4.2: building libraries stage 4.3: make dependencies stage 4.4: building everything [...] cc -O2 -pipe -DACPI_EXEC_APP -fno-strict-aliasing -I/src/usr.sbin/acpi/acpidb/../../../sys -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -c /src/usr.sbin/acpi/acpidb/../../../sys/contrib/dev/acpica/utilities/utxface.c cc -O2 -pipe -DACPI_EXEC_APP -fno-strict-aliasing -I/src/usr.sbin/acpi/acpidb/../../../sys -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -Wno-uninitialized -Wno-pointer-sign -o acpidb acpidb.o osunixxf.o dbcmds.o dbdisply.o dbexec.o dbfileio.o dbhistry.o dbinput.o dbstats.o dbutils.o dbxface.o dmbuffer.o dmnames.o dmobject.o dmopcode.o dmresrc.o dmresrcl.o dmresrcs.o dmutils.o dmwalk.o evevent.o evgpe.o evgpeblk.o evgpeinit.o evgpeutil.o evmisc.o evregion.o evrgnini.o evsci.o evxface.o evxfevnt.o evxfregn.o hwacpi.o hwgpe.o hwregs.o hwsleep.o hwvalid.o hwxface.o dsfield.o dsinit.o dsmethod.o dsmthdat.o dsobject.o dsopcode.o dsutils.o dswexec.o dswload.o dswscope.o dswstate.o exconfig.o exconvrt.o excreate.o exdebug.o exdump.o exfield.o exfldio.o exmisc.o exmutex.o exnames.o exoparg1.o exoparg2.o exoparg3.o exoparg6.o exprep.o exregion.o exresnte.o exresolv.o exresop.o exstore.o exstoren.o exstorob.o exsystem.o exutils.o psargs.o psloop.o psopc! ode.o psparse.o psscope.o pstree.o psutils.o pswalk.o psxface.o nsaccess.o nsalloc.o nsdump.o nseval.o nsinit.o nsload.o nsnames.o nsobject.o nsparse.o nspredef.o nsrepair.o nsrepair2.o nssearch.o nsutils.o nswalk.o nsxfeval.o nsxfname.o nsxfobj.o rsaddr.o rscalc.o rscreate.o rsdump.o rsinfo.o rsio.o rsirq.o rslist.o rsmemory.o rsmisc.o rsutils.o rsxface.o tbfadt.o tbfind.o tbinstal.o tbutils.o tbxface.o tbxfroot.o utalloc.o utcache.o utcopy.o utdebug.o utdelete.o uteval.o utglobal.o utids.o utinit.o utlock.o utmath.o utmisc.o utmutex.o utobject.o utosi.o utresrc.o utstate.o uttrack.o utxface.o -lpthread gzip -cn /src/usr.sbin/acpi/acpidb/acpidb.8 acpidb.8.gz === usr.sbin/acpi/acpidump (all) cc -O2 -pipe -I/src/usr.sbin/acpi/acpidump/../../../sys -std=gnu99 -fstack-protector -Wsystem-headers -Werror -Wall -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls -Wold-style-definition -Wno-pointer-sign -c /src/usr.sbin/acpi/acpidump/acpi.c cc1: warnings being treated as errors /src/usr.sbin/acpi/acpidump/acpi.c: In function 'acpi_handle_tcpa': /src/usr.sbin/acpi/acpidump/acpi.c:650: warning: format '%lld' expects type 'long long int', but argument 4 has type 'u_int64_t' *** Error code 1 Stop in /src/usr.sbin/acpi/acpidump. *** Error code 1 Stop in /src/usr.sbin/acpi. *** Error code 1 Stop in /src/usr.sbin. *** Error code 1 Stop in /src. *** Error code 1 Stop in /src. *** Error code 1 Stop in /src. TB --- 2010-08-12 03:25:04 - WARNING: /usr/bin/make returned exit code 1 TB --- 2010-08-12 03:25:04 - ERROR: failed to build world TB --- 2010-08-12 03:25:04 - 4691.49 user 830.94 system 6903.97 real http://tinderbox.freebsd.org/tinderbox-head-HEAD-amd64-amd64.full ___ freebsd-current@freebsd.org mailing list
Re: Panic booting vmware i386 after SRAT update
On Thu, Jul 29, 2010 at 7:18 AM, John Baldwin j...@freebsd.org wrote: On Wednesday, July 28, 2010 1:37:42 pm m...@freebsd.org wrote: I have a 2 cpu virtual image of FreeBSD current. It panics during boot after building in the NUMA support. I'll transcribe the SRAT bootverbose messages and panic message as best I can. Table 'SRAT' at 0xfef07f6 SRAT: Found table at 0xfef07f6 SRAT: Found memory domain 0 addr 0 len a: enabled SRAT: Found memory domain 0 addr 10 len ff0: enabled then some MADT: messages about finding cpu 0 and 1 cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 panic: SRAT: CPU with APIC ID 0 is not known I'm playing around now with trying to figure out what went missing, but I thought I'd send this out now. Hmm, check_domains() in srat.c should reject the SRAT table in this case. Oh, I see the problem, try this: Index: srat.c === --- srat.c (revision 210552) +++ srat.c (working copy) @@ -150,7 +150,8 @@ for (i = 0; i num_mem; i++) { found = 0; for (j = 0; j = MAX_APIC_ID; j++) - if (cpus[j].domain == mem_info[i].domain) { + if (cpus[j].enabled + cpus[j].domain == mem_info[i].domain) { cpus[j].has_memory = 1; found++; } This appears to fix the issue for me. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Panic booting vmware i386 after SRAT update
I have a 2 cpu virtual image of FreeBSD current. It panics during boot after building in the NUMA support. I'll transcribe the SRAT bootverbose messages and panic message as best I can. Table 'SRAT' at 0xfef07f6 SRAT: Found table at 0xfef07f6 SRAT: Found memory domain 0 addr 0 len a: enabled SRAT: Found memory domain 0 addr 10 len ff0: enabled then some MADT: messages about finding cpu 0 and 1 cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 panic: SRAT: CPU with APIC ID 0 is not known I'm playing around now with trying to figure out what went missing, but I thought I'd send this out now. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Panic booting vmware i386 after SRAT update
On Wed, Jul 28, 2010 at 10:37 AM, m...@freebsd.org wrote: I have a 2 cpu virtual image of FreeBSD current. It panics during boot after building in the NUMA support. I'll transcribe the SRAT bootverbose messages and panic message as best I can. Table 'SRAT' at 0xfef07f6 SRAT: Found table at 0xfef07f6 SRAT: Found memory domain 0 addr 0 len a: enabled SRAT: Found memory domain 0 addr 10 len ff0: enabled then some MADT: messages about finding cpu 0 and 1 cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 panic: SRAT: CPU with APIC ID 0 is not known I'm playing around now with trying to figure out what went missing, but I thought I'd send this out now. Okay, apparently VMWare is providing two entries of type ACPI_SRAT_TYPE_MEMORY_AFFINITY but no entries of type ACPI_SRAT_TYPE_CPU_AFFINITY. This leads to the assert since no CPUs are enabled; that is there's no affinity information for them. This is probably a VMWare bug. Setting hint.srat.0.disabled=1 in /boot/device.hints works around the issue. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Panic booting vmware i386 after SRAT update
On Wed, Jul 28, 2010 at 11:19 AM, David Cornejo d...@dogwood.com wrote: On Wed, Jul 28, 2010 at 7:37 AM, m...@freebsd.org wrote: I have a 2 cpu virtual image of FreeBSD current. It panics during boot after building in the NUMA support. I'll transcribe the SRAT bootverbose messages and panic message as best I can. Table 'SRAT' at 0xfef07f6 SRAT: Found table at 0xfef07f6 SRAT: Found memory domain 0 addr 0 len a: enabled SRAT: Found memory domain 0 addr 10 len ff0: enabled then some MADT: messages about finding cpu 0 and 1 cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 panic: SRAT: CPU with APIC ID 0 is not known I'm playing around now with trying to figure out what went missing, but I thought I'd send this out now. I have been seeing this since yesterday with amd64 custom kernels - just compiled a GENERIC kernel a few minutes ago and it shows the same symptom. Is this on VMWare or physical hardware? I isolated my issue to VMWare reporting memory domains properties but not CPU affinity. Setting hint.srat.0.disabled=1 in /boot/device.hints fixes the issue for me. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: strange scsi/CAM related dmesg output
On Fri, Jun 18, 2010 at 8:11 AM, Alexander Best alexbes...@uni-muenster.de wrote: On Mon, Jun 7, 2010 at 3:57 PM, John Baldwin j...@freebsd.org wrote: It can happen because the print buffer size thing is not line-buffered, it is printf-invocation buffered. hmmm...can this somehow be fixed? i'm not sure this is specific to scsi/cam. the other day i bootd my system and almost all of the dmesg output was displayed incorrectly. would increasing PRINTF_BUFR_SIZE from 128 to lets say 512 or 1024 solve the issue? I think what jhb meant was that we could look for the '\n' and flush to console there, instead of waiting for the end of the buffer. This would perhaps have more interleaved full lines, but likely fewer interleaved partial lines. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org