Re: AoE driver for FBSD8 or later?
On 14 September 2010 17:35, Max Khon f...@samodelkin.net wrote:
> George,
>
> On Tue, Sep 14, 2010 at 5:01 PM, George Mamalakis mama...@eng.auth.gr wrote:
>> On Mon, Sep 13, 2010 at 7:53 PM, Max Khon f...@samodelkin.net wrote:
>>> Have you tried to contact coraid on this matter? Can you try this port version?
>>> http://people.freebsd.org/~fjoe/aoe-2.tar.gz
>>
>> Max, thank you very much for your help. The driver works fine; I am able to see all 13T. In case something goes wrong I will inform you. For the time being, everything is OK.
>
> I committed the port to the FreeBSD ports tree: ports/net/aoe. I also committed ports/net/vblade (user-space AoE target) -- I used it for testing. I am not sure that it can be used in production due to possible performance problems.

Thanks, Max. I wonder why the port has not been in the Ports Collection until now.

-- wbr, pluknet
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: AoE driver for FBSD8 or later?
On 10 September 2010 17:32, George Mamalakis mama...@eng.auth.gr wrote:
> Hi everybody,
>
> we have a coraid device with 15x1GB disks on it, and would like to use it with fbsd8 (zfs, etc). The http://support.coraid.com/support/freebsd/ is really outdated, and the port that creates the kernel module does not compile on FBSD8 (obviously!). Is there any effort on migrating the driver onto fbsd8 or should I plug the coraid on a linux system and use it from there?

The change below looks obvious to me. I am not sure it is enough to make it work, though. There might also be issues with interfaces that announce themselves as IFT_ETHER but have a NULL if_input.

# cat files/patch-dev-aoe-aoenet.c
--- aoenet.c.orig 2006-05-25 16:10:11.0 +
+++ aoenet.c 2010-09-10 15:03:01.0 +
@@ -77,8 +77,11 @@
 #define NECODES (sizeof(aoe_errlist) / sizeof(char *) - 1)
 #if (__FreeBSD_version < 600000)
 #define IFPADDR(ifp) (((struct arpcom *) (ifp))->ac_enaddr)
+#elif (__FreeBSD_version < 700000)
+#define IFPADDR(ifp) IFP2ENADDR(ifp)
 #else
-#define IFPADDR(ifp) IFP2ENADDR(ifp)
+#include <net/if_dl.h>
+#define IFPADDR(ifp) IF_LLADDR(ifp)
 #endif
 #define IFLISTSZ 1024

@@ -223,7 +226,11 @@
 m1->m_ext.ref_cnt = NULL;
 MEXTADD(m1, f->f_data, len, nilfn,
+#if (__FreeBSD_version < 800000)
 NULL, 0, EXT_NET_DRV);
+#else
+ f->f_data, NULL, 0, EXT_NET_DRV);
+#endif
 m1->m_len = len;
 m1->m_next = NULL;
 }

-- wbr, pluknet
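The patch above selects a different accessor for the interface's link-level address, and a different MEXTADD argument list, depending on `__FreeBSD_version`. A minimal sketch of that selection written as ordinary functions, assuming the boundaries are 600000, 700000 and 800000 (the conventional values for the 6.x, 7.x and 8.x branches, taken here as an assumption). The helper names are hypothetical and are not part of the aoe driver.

```c
/* Assumed __FreeBSD_version boundaries for the 6.x, 7.x and 8.x branches. */
#define VERSION_6 600000
#define VERSION_7 700000
#define VERSION_8 800000

/* Hypothetical helper: which accessor the patch would pick for IFPADDR(). */
const char *
ifpaddr_accessor(int osversion)
{
    if (osversion < VERSION_6)
        return "arpcom.ac_enaddr";   /* pre-6.x: struct arpcom member */
    if (osversion < VERSION_7)
        return "IFP2ENADDR";         /* 6.x */
    return "IF_LLADDR";              /* 7.x and later, needs net/if_dl.h */
}

/*
 * Hypothetical helper: how many opaque arguments follow the free routine
 * in the MEXTADD call of the patch (one before 8.x, two from 8.x on).
 */
int
mextadd_opaque_args(int osversion)
{
    return (osversion < VERSION_8) ? 1 : 2;
}
```

In the real driver this decision is made by the preprocessor at build time; the functions only make the three cases easy to read side by side.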
Re: Cannot install using serial console
On 7 September 2010 18:50, Jeremie Le Hen jere...@le-hen.org wrote:
> Hi list,
>
> = Please Cc: me when replying, as I am not subscribed. =
>
> I tried to install FreeBSD (201008 -CURRENT snapshot, but I don't think it's important here) in a KVM-backed virtual machine on a headless Linux host, following section 2.12.1 of the handbook.
>
> http://www.freebsd.org/doc/handbook/install-advanced.html
>
> I've rebuilt the ISO image with the following lines in boot/loader.conf:
>
>   console=comconsole
>   beastie_disable=YES
>
> The kernel boots correctly (see output below) but the output invariably stalls after the following lines:
>
>   Trying to mount root from ufs:/dev/md0
>   /stand/sysinstal
>
> (Yes, only one `l'.)
>
> Any idea?
>
> [strip]
>
> WARNING: WITNESS option enabled, expect reduced performance.
> Root mount waiting for: usbus0
> uhub0: 2 ports with 2 removable, self powered
> Trying to mount root from ufs:/dev/md0

As far as I know, you also need to enable a serial terminal in /etc/ttys.

-- wbr, pluknet
Re: uma_ref_cnt: vm_fault: fault on nofault entry
On 3 August 2010 15:29, pluknet pluk...@gmail.com wrote:
> The second panic seen today on the same box at uma_find_refcnt(). I'm unsure if it might be caused by Xen hvm setup.

Okay, updating the system to 8.1-R seems to fix this: 13 days of uptime since then, while with 8.0 it panicked every 3-4 days.

> db> bt
> Tracing pid 12 tid 100035 td 0xc776f480
> kdb_enter(c0c8dce2,c0c8dce2,c0cab939,c73678b8,0,...) at kdb_enter+0x3a
> panic(c0cab939,c14b3000,1,c73679e8,c73679d8,...) at panic+0x136
> vm_fault(c149,c14b3000,1,0,c7a81760,...) at vm_fault+0x197
> trap_pfault(c7bb1748,c7367aa0,c7f7e7bf,c7bb1700,c755c7f8,...) at trap_pfault+0x20e
> trap(c7367b04) at trap+0x455
> calltrap() at calltrap+0x6
> --- trap 0xc, eip = 0xc0af336b, esp = 0xc7367b44, ebp = 0xc7367b4c ---
> uma_find_refcnt(c147f700,cbe96800,1,1,cbe96800,...) at uma_find_refcnt+0x5b
> mb_ctor_clust(cbe96800,800,c7f2a700,1,c0dd7b40,...) at mb_ctor_clust+0x9c
> uma_zalloc_arg(c147f700,c7f2a700,1,c776f000,0,...) at uma_zalloc_arg+0x8a
> igb_get_buf(c0dd6e40,c0dd6e40,c0dd6e40,c7367c38,c08a9d00,...) at igb_get_buf+0x146
> igb_rxeof(c775f540,0,0,c775f5c0,c7762a00,...) at igb_rxeof+0x2b0
> igb_msix_rx(c7760a00,0,109,e3c02e88,2a1bf,...) at igb_msix_rx+0x29
> intr_event_execute_handlers(c755c7f8,c7762a00,c0c8ab0b,4f6,c7762a70,...) at intr_event_execute_handlers+0x14b
> ithread_loop(c7772930,c7367d38,0,0,0,...) at ithread_loop+0x6b
> fork_exit(c0861b20,c7772930,c7367d38) at fork_exit+0x91
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0, eip = 0, esp = 0xc7367d70, ebp = 0 ---

-- wbr, pluknet
page fault in e1000_clear_hw_cntrs_base_generic() during SIOCAIFADDR
Hi.

This is reproducible from time to time on boot when handling SIOCAIFADDR called from ifconfig on igb on fresh (and not so fresh) 8-STABLE. How can I help with debugging?

Kernel page fault with the following non-sleepable locks held:
exclusive sleep mutex igb0 (IGB Core Lock) r = 0 (0xc2655534) locked @ /usr/src/sys/modules/igb/../../dev/e1000/if_igb.c:965
KDB: stack backtrace:
db_trace_self_wrapper(c08b5055,cce577b8,c060db15,3c5,0,...) at db_trace_self_wrapper+0x26
kdb_backtrace(3c5,0,,c0a94864,cce577f0,...) at kdb_backtrace+0x29
_witness_debugger(c08b74fe,cce57804,4,1,0,...) at _witness_debugger+0x25
witness_warn(5,0,c08e3140,cce5782c,c2956000,...) at witness_warn+0x1fe
trap(cce57890) at trap+0x195
calltrap() at calltrap+0x6
--- trap 0xc, eip = 0xc3192477, esp = 0xcce578d0, ebp = 0xcce578e0 ---
e1000_clear_hw_cntrs_base_generic(c2651004,64,c3185850,c2651000,0,...) at e1000_clear_hw_cntrs_base_generic+0x3e7
igb_init_locked(c2655534,0,c31ac72f,3c5,c31c3d00,...) at igb_init_locked+0x16e2
igb_ioctl(c2642c00,8020690c,c31c3d00,cce57a8c,c457ea9b,...) at igb_ioctl+0x495
in_ifinit(0,c08c391b,1aa,1a6,c2642c00,...) at in_ifinit+0x29e
in_control(c2a58b44,8040691a,c31bd100,c2642c00,c2948000,...) at in_control+0xccb
ifioctl(c2a58b44,8040691a,c31bd100,c2948000,c31c3b00,...) at ifioctl+0x1820
soo_ioctl(c29b7bd0,8040691a,c31bd100,c254b100,c2948000,...) at soo_ioctl+0x415
kern_ioctl(c2948000,3,8040691a,c31bd100,6073c0,...) at kern_ioctl+0x1fd
ioctl(c2948000,cce57cf8,c08e3073,c08c398f,c2956000,...) at ioctl+0x134
syscall(cce57d38) at syscall+0x220
Xint0x80_syscall() at Xint0x80_syscall+0x21
--- syscall (54, FreeBSD ELF32, ioctl), eip = 0x281c1543, esp = 0xbfbfe60c, ebp = 0xbfbfe648 ---

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xcc5af000
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc3192477
stack pointer = 0x28:0xcce578d0
frame pointer = 0x28:0xcce578e0
code segment = base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 700 (ifconfig)

db> show ifnet 0xc2642c00
igb0:
if_dname = igb
if_dunit = 0
if_description = (null)
if_index = 2
if_refcount = 2
if_softc = 0xc2651000
if_l2com = 0xc2676b80
if_vnet = 0
if_home_vnet = 0
if_addr = 0xc31c4500
if_llsoftc = 0
if_label = 0
if_pcount = 0
if_flags = 0x8803
if_drv_flags = 0x0040
if_capabilities = 0x000101bb
if_capenable = 0x01bb
if_snd.ifq_head = 0
if_snd.ifq_tail = 0
if_snd.ifq_len = 0
if_snd.ifq_maxlen = 1023
if_snd.ifq_drops = 0
if_snd.ifq_drv_head = 0
if_snd.ifq_drv_tail = 0
if_snd.ifq_drv_len = 0
if_snd.ifq_drv_maxlen = 1023
if_snd.altq_type = 0
if_snd.altq_flags = 1

-- wbr, pluknet
Re: page fault in e1000_clear_hw_cntrs_base_generic() during SIOCAIFADDR
On 1 September 2010 20:06, John Baldwin j...@freebsd.org wrote:
> On Wednesday, September 01, 2010 11:53:09 am pluknet wrote:
>> Hi.
>>
>> This is reproducible from time to time on boot when handling SIOCAIFADDR called from ifconfig on igb on fresh (and not so fresh) 8-STABLE. How can I help with debugging?
>>
>> Kernel page fault with the following non-sleepable locks held:
>> exclusive sleep mutex igb0 (IGB Core Lock) r = 0 (0xc2655534) locked @ /usr/src/sys/modules/igb/../../dev/e1000/if_igb.c:965
>> KDB: stack backtrace:
>> db_trace_self_wrapper(c08b5055,cce577b8,c060db15,3c5,0,...) at db_trace_self_wrapper+0x26
>> kdb_backtrace(3c5,0,,c0a94864,cce577f0,...) at kdb_backtrace+0x29
>> _witness_debugger(c08b74fe,cce57804,4,1,0,...) at _witness_debugger+0x25
>> witness_warn(5,0,c08e3140,cce5782c,c2956000,...) at witness_warn+0x1fe
>> trap(cce57890) at trap+0x195
>> calltrap() at calltrap+0x6
>> --- trap 0xc, eip = 0xc3192477, esp = 0xcce578d0, ebp = 0xcce578e0 ---
>> e1000_clear_hw_cntrs_base_generic(c2651004,64,c3185850,c2651000,0,...) at e1000_clear_hw_cntrs_base_generic+0x3e7
>
> Can you use gdb on your kernel.debug to map this to a source file and line?

Here it is (btw, it took about 10-15 reboots to reproduce after adding swap and dumpon setup). Hmm, I don't see where it might access an invalid pointer.

#0 doadump () at pcpu.h:231
#1 0xc04a3679 in db_fncall (dummy1=1, dummy2=0, dummy3=-1062122144, dummy4=0xcce636a8 ) at /usr/src/sys/ddb/db_command.c:548
#2 0xc04a3a71 in db_command (last_cmdp=0xc093d19c, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:445
#3 0xc04a3bca in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4 0xc04a5aed in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:229
#5 0xc05fa64e in kdb_trap (type=12, code=0, tf=0xcce63890) at /usr/src/sys/kern/subr_kdb.c:535
#6 0xc084dcdf in trap_fatal (frame=0xcce63890, eva=3428511744) at /usr/src/sys/i386/i386/trap.c:929
#7 0xc084e553 in trap (frame=0xcce63890) at /usr/src/sys/i386/i386/trap.c:328
#8 0xc082f66c in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#9 0xc318c477 in e1000_clear_hw_cntrs_base_generic (hw=0xc2655004) at /usr/src/sys/modules/igb/../../dev/e1000/e1000_mac.c:643
#10 0xc317ec82 in igb_init_locked (adapter=0xc2655000) at /usr/src/sys/modules/igb/../../dev/e1000/if_igb.c:1202
#11 0xc31801e5 in igb_ioctl (ifp=0xc2943c00, command=2149607692, data=0xc29db600 ╢╤\235бд╤\235бт╤\235б) at /usr/src/sys/modules/igb/../../dev/e1000/if_igb.c:966
#12 0xc0696c4e in in_ifinit (ifp=0xc2943c00, ia=0xc29db600, sin=Variable sin is not available.
) at /usr/src/sys/netinet/in.c:848
#13 0xc06980cb in in_control (so=0xc2a5d9a8, cmd=2151704858, data=0xc2649400 igb0, ifp=0xc2943c00, td=0xc29b8280) at /usr/src/sys/netinet/in.c:563
#14 0xc067c860 in ifioctl (so=0xc2a5d9a8, cmd=2151704858, data=0xc2649400 igb0, td=0xc29b8280) at /usr/src/sys/net/if.c:2523
#15 0xc0617395 in soo_ioctl (fp=0xc29ce310, cmd=2151704858, data=0xc2649400, active_cred=0xc254b100, td=0xc29b8280) at /usr/src/sys/kern/sys_socket.c:212
#16 0xc06113dd in kern_ioctl (td=0xc29b8280, fd=3, com=2151704858, data=0xc2649400 igb0) at file.h:262
#17 0xc0611564 in ioctl (td=0xc29b8280, uap=0xcce63cf8) at /usr/src/sys/kern/sys_generic.c:678
#18 0xc084e160 in syscall (frame=0xcce63d38) at /usr/src/sys/i386/i386/trap.c:
#19 0xc082f6d1 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:264
#20 0x0033 in ?? ()
Previous frame inner to this frame (corrupt stack?)

(kgdb) f 9
#9 0xc318c477 in e1000_clear_hw_cntrs_base_generic (hw=0xc2655004) at /usr/src/sys/modules/igb/../../dev/e1000/e1000_mac.c:643
643 E1000_READ_REG(hw, E1000_SYMERRS);
(kgdb) list
638 void e1000_clear_hw_cntrs_base_generic(struct e1000_hw *hw)
639 {
640 DEBUGFUNC("e1000_clear_hw_cntrs_base_generic");
641
642 E1000_READ_REG(hw, E1000_CRCERRS);
643 E1000_READ_REG(hw, E1000_SYMERRS);
644 E1000_READ_REG(hw, E1000_MPC);
645 E1000_READ_REG(hw, E1000_SCC);
646 E1000_READ_REG(hw, E1000_ECOL);
647 E1000_READ_REG(hw, E1000_MCC);

(kgdb) p *(struct e1000_osdep *)hw->back
$6 = {mem_bus_space_tag = 1, mem_bus_space_handle = 3428495360, io_bus_space_tag = 0, io_bus_space_handle = 0, flash_bus_space_tag = 0, flash_bus_space_handle = 0, dev = 0xc261a600}

(kgdb) p *hw
[...]
power_down = 0xc3186340 <e1000_null_phy_generic>}, type = e1000_phy_vf,
[...]

(kgdb) p (struct e1000_mac_info *)hw->mac.type
$8 = (struct e1000_mac_info *) 0x1a

(kgdb) p *(struct e1000_mac_info *)hw->mac
$10 = {ops = {init_params = 0x8be58955, id_led_init = 0x80c70845, blink_led = 0x390, check_for_link = 0, check_mng_mode = 0x2d480c7, cleanup_led = 0, clear_hw_cntrs = 0x80c7, clear_vfta = 0x2d0, get_bus_info = 0, set_lan_id = 0x2c880c7, get_link_up_info = 0, led_on = 0xc766
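The dump above contains two numbers that can be compared directly: the faulting address reported in frame #6 (eva=3428511744) and the memory bus-space handle printed from hw->back (3428495360). A minimal sketch of that arithmetic follows. The function names are hypothetical, and the 16 KB window size used in the usage note is an assumption, not something the dump states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: distance of a faulting kernel virtual address
 * from the start of a mapped register window.
 */
uint32_t
fault_offset(uint32_t fault_va, uint32_t window_base)
{
    /* The comparison only makes sense for addresses at or above the base. */
    assert(fault_va >= window_base);
    return fault_va - window_base;
}

/* Hypothetical helper: nonzero if the offset lies inside a window of the given size. */
int
offset_in_window(uint32_t offset, uint32_t window_size)
{
    return offset < window_size;
}
```

With the values from the dump, fault_offset(3428511744, 3428495360) is 16384 (0x4000). If the register window mapped for the device were 16 KB long (an assumption), offset_in_window(0x4000, 0x4000) would be 0, meaning the access lands on the first byte past the mapping. Whether that is what actually happened here is not established by the thread.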
Re: page fault in e1000_clear_hw_cntrs_base_generic() during SIOCAIFADDR
On 1 September 2010 21:31, Jack Vogel jfvo...@gmail.com wrote:
> LOL, if it's the VF it's pretty new code. PLEASE, anyone, if this is the case make it clear in the title somewhere, ok? Thanks.

Sure, this is the VF. I'm sorry I didn't mention this directly.

-- wbr, pluknet
Re: 8.1-PRERELEASE: CPU packages not detected correctly
On 19 August 2010 20:56, pluknet pluk...@gmail.com wrote:
> On 19 August 2010 20:39, Andriy Gapon a...@icyb.net.ua wrote:
>> on 10/08/2010 19:55 pluknet said the following:
>>> On 16 July 2010 19:47, Jung-uk Kim j...@freebsd.org wrote:
>>>> The patch should apply fine on both sys/amd64/amd64/mp_machdep.c and sys/i386/i386/mp_machdep.c.
>>>>
>>>> http://people.freebsd.org/~jkim/mp_machdep2.diff
>>>
>>> Hi. Just checked on Xen HVM with 3 cores.
>>>
>>> 1) 8.1 unmodified:
>>> FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
>>> FreeBSD/SMP: 1 package(s) x 3 core(s)
>>>
>>> 2) 8.1 + patch
>>> FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
>>> FreeBSD/SMP: 0 package(s) x 1 core(s) x 32 HTT threads
>>> WARNING: Non-uniform processors.
>>> WARNING: Using suboptimal topology.
>>
>> Can you debug, e.g. with printfs, what exactly goes wrong? I wonder if in this case code follows some unusual/unexpected path.
>
> Sorry, I'm a bit busy right now. I hope to debug this somewhere in the next week.

First, sorry for the late reply, and thanks Andriy for kicking me ;)

Something really weird is going on there. topo_probe_0xb() falls back to topo_probe_0x4() early, on the first iteration. topo_probe_0x4() returns incorrect data as well.

topo_probe: cpu_high = b
topo_probe: cpu_vendor_id = 8086
topo_probe_0xb: i = 0, p[1] = 0
topo_probe_0x4: cpu_procinfo = 200800
topo_probe_0x4: cpu_logical = 32
topo_probe_0x4: i = 0, type = 1
topo_probe_0x4: i = 0, level = 1
topo_probe_0x4: i = 0, logical = 1
topo_probe_0x4: i = 0, cores = 16
topo_probe_0x4: i = 1, type = 2
topo_probe_0x4: i = 1, level = 1
topo_probe_0x4: i = 1, logical = 1
topo_probe_0x4: i = 1, cores = 16
topo_probe_0x4: i = 2, type = 3
topo_probe_0x4: i = 2, level = 2
topo_probe_0x4: i = 2, logical = 1
topo_probe_0x4: i = 2, cores = 16
topo_probe#1: mp_ncpus = 3
topo_probe#1: cpu_cores = 1
topo_probe#1: cpu_logical = 32
topo_probe#1: hyperthreading_cpus = 32
topo_probe#2: mp_ncpus = 3
topo_probe#2: cpu_cores = 1
topo_probe#2: cpu_logical = 32
topo_probe#2: hyperthreading_cpus = 32

%%%
static void
topo_probe_0x4(void)
{
    u_int p[4];
    int cores;
    int i;
    int level;
    int logical;
    int type;

    cpu_logical = (cpu_feature & CPUID_HTT) != 0 ?
        (cpu_procinfo & CPUID_HTT_CORES) >> 16 : 1;
    printf("topo_probe_0x4: cpu_procinfo = %x\n", cpu_procinfo);
    printf("topo_probe_0x4: cpu_logical = %d\n", cpu_logical);
    if (cpu_logical == 1) {
        cpu_cores = 1;
        return;
    }

    /* We only support three levels for now. */
    for (i = 0; i < 3; i++) {
        cpuid_count(0x04, i, p);
        type = p[0] & 0x1f;
        printf("topo_probe_0x4: i = %d, type = %d\n", i, type);
        level = (p[0] >> 5) & 0x7;
        printf("topo_probe_0x4: i = %d, level = %d\n", i, level);
        logical = ((p[0] >> 14) & 0xfff) + 1;
        printf("topo_probe_0x4: i = %d, logical = %d\n", i, logical);
        cores = ((p[0] >> 26) & 0x3f) + 1;
        printf("topo_probe_0x4: i = %d, cores = %d\n", i, cores);
        if (type == 0)
            break;
        if (level == 1 && cpu_logical == logical * cores) {
            cpu_cores = cores;
            cpu_logical = logical;
            break;
        }
    }
    if (cpu_cores == 0)
        cpu_cores = 1;
    if (cpu_logical > 1)
        hyperthreading_cpus = logical_cpus = cpu_logical;
}

static void
topo_probe_0xb(void)
{
    u_int p[4];
    int bits;
    int cnt;
    int i;
    int logical;
    int type;
    int x;

    /* We only support three levels for now. */
    for (i = 0; i < 3; i++) {
        cpuid_count(0x0b, i, p);
        /*
         * Fall back if it is not really supported.
         */
        if (i == 0 && p[1] == 0) {
            printf("topo_probe_0xb: i = %d, p[1] = %d\n", i, p[1]);
            topo_probe_0x4();
            return;
        }
        [...]
}

static void
topo_probe(void)
{
    static int cpu_topo_probed = 0;

    if (cpu_topo_probed)
        return;

    printf("topo_probe: cpu_high = %x\n", cpu_high);
    printf("topo_probe: cpu_vendor_id = %x\n", cpu_vendor_id);
    logical_cpus = logical_cpus_mask = 0;
    if (cpu_vendor_id == CPU_VENDOR_AMD)
        topo_probe_amd();
    else if (cpu_vendor_id == CPU_VENDOR_INTEL) {
        if (cpu_high >= 0xb)
            topo_probe_0xb();
        else if (cpu_high >= 0x4)
            topo_probe_0x4();
    }
    printf("topo_probe#1: mp_ncpus = %d\n", mp_ncpus);
    printf("topo_probe#1: cpu_cores = %d\n", cpu_cores);
    printf("topo_probe#1: cpu_logical = %d\n", cpu_logical);
    printf("topo_probe#1: hyperthreading_cpus = %d\n", hyperthreading_cpus);
    if (cpu_cores == 0
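The bit arithmetic in topo_probe_0x4() can be replayed in user space against the values printed in the debug output. A minimal sketch, assuming the HTT flag is bit 28 of the CPUID.1 EDX feature word and the logical-processor count sits in bits 23:16 of EBX; the constant and function names below are hypothetical stand-ins, not the kernel's own definitions.

```c
#include <stdint.h>

#define HTT_FEATURE_BIT (1u << 28)    /* assumed position of the HTT flag */
#define HTT_COUNT_MASK  0x00ff0000u   /* assumed logical-processor count field */

/* Mirrors the cpu_logical computation from the quoted code. */
unsigned
htt_logical_count(uint32_t cpu_feature, uint32_t cpu_procinfo)
{
    if ((cpu_feature & HTT_FEATURE_BIT) == 0)
        return 1;
    return (cpu_procinfo & HTT_COUNT_MASK) >> 16;
}

/* Fields that the quoted loop extracts from EAX of CPUID leaf 4. */
struct leaf4_fields {
    unsigned type;     /* bits 4:0 */
    unsigned level;    /* bits 7:5 */
    unsigned logical;  /* bits 25:14, stored minus one */
    unsigned cores;    /* bits 31:26, stored minus one */
};

struct leaf4_fields
decode_leaf4_eax(uint32_t eax)
{
    struct leaf4_fields f;

    f.type = eax & 0x1f;
    f.level = (eax >> 5) & 0x7;
    f.logical = ((eax >> 14) & 0xfff) + 1;
    f.cores = ((eax >> 26) & 0x3f) + 1;
    return f;
}

/* The match condition that lets the loop accept a cache level. */
int
leaf4_matches(unsigned cpu_logical, struct leaf4_fields f)
{
    return f.level == 1 && cpu_logical == f.logical * f.cores;
}
```

With cpu_procinfo = 0x200800 and the Features word 0x1781fbbf, htt_logical_count() yields 32, as in the debug output. An EAX value of 0x3c000021 (constructed for illustration, not read from the guest) decodes to type 1, level 1, logical 1, cores 16, which is what the first iteration printed. Since 1 * 16 is not 32, the match condition is never met, which is consistent with cpu_cores ending up as 1 and cpu_logical staying at 32.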
Re: svn commit: r209611 - head/sys/dev/e1000
On 18 August 2010 14:52, pluknet pluk...@gmail.com wrote:
> On 17 August 2010 20:27, Jack Vogel jfvo...@gmail.com wrote:
>> Cool, the first person to actually try and use it :)
>>
>> Yes, there's one key thing you have to do right now that's not documented: because of the simplistic PCI structure the guest has, the kernel blacklists it from using MSIX. So, what you need to do is set the honor_blacklist (that's not the complete string, use sysctl -a |grep blacklist to find it) and set that to 0. It needs to be set at boot. That should get you running.
>>
>> Jack
>
> Nice, thanks! It works!

By the way, sometimes after boot I have to reload if_igb.ko (kldunload, then kldload) several times until the watchdog goes quiet and traffic starts flowing.

igb0: Watchdog timeout -- resetting
igb0: Queue(0) tdh = 1, hw tdt = 1
igb0: TX(0) desc avail = 1023,Next TX to Clean = 0
igb0: Watchdog timeout -- resetting
igb0: Queue(0) tdh = 3, hw tdt = 3
igb0: TX(0) desc avail = 1021,Next TX to Clean = 0
igb0: Watchdog timeout -- resetting
igb0: Queue(0) tdh = 6, hw tdt = 6
igb0: TX(0) desc avail = 1018,Next TX to Clean = 0
igb0: detached
igb0: Intel(R) PRO/1000 Network Connection version - 2.0.1 mem 0xf202-0xf2023fff,0xf2024000-0xf2027fff at device 4.0 on pci0
igb0: Using MSIX interrupts with 3 vectors
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: Ethernet address: 76:99:ea:b0:e0:eb
igb0: link state changed to UP
stray irq0
stray irq0
igb0: Watchdog timeout -- resetting
igb0: Queue(0) tdh = 3, hw tdt = 3
igb0: TX(0) desc avail = 1021,Next TX to Clean = 0
stray irq0
stray irq0
too many stray irq 0's: not logging anymore
igb0: promiscuous mode enabled
igb0: Watchdog timeout -- resetting
igb0: Queue(0) tdh = 28, hw tdt = 28
igb0: TX(0) desc avail = 996,Next TX to Clean = 0
igb0: promiscuous mode disabled
igb0: detached
igb0: Intel(R) PRO/1000 Network Connection version - 2.0.1 mem 0xf202-0xf2023fff,0xf2024000-0xf2027fff at device 4.0 on pci0
igb0: Using MSIX interrupts with 3 vectors
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: [ITHREAD]
igb0: Ethernet address: 76:99:ea:b0:e0:eb
igb0: link state changed to UP

dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection version - 2.0.1
dev.igb.0.%driver: igb
dev.igb.0.%location: slot=4 function=0 handle=\_SB_.PCI0.S4__
dev.igb.0.%pnpinfo: vendor=0x8086 device=0x10ca subvendor=0x8086 subdevice=0xa03c class=0x02
dev.igb.0.%parent: pci0
dev.igb.0.nvm: -1
dev.igb.0.flow_control: 3
dev.igb.0.enable_aim: 1
dev.igb.0.rx_processing_limit: 100
dev.igb.0.link_irq: 0
dev.igb.0.dropped: 0
dev.igb.0.tx_dma_fail: 0
dev.igb.0.device_control: 0
dev.igb.0.rx_control: 0
dev.igb.0.interrupt_mask: 0
dev.igb.0.extended_int_mask: 0
dev.igb.0.tx_buf_alloc: 0
dev.igb.0.rx_buf_alloc: 0
dev.igb.0.fc_high_water: 58976
dev.igb.0.fc_low_water: 58960
dev.igb.0.queue0.txd_head: 424
dev.igb.0.queue0.txd_tail: 424
dev.igb.0.queue0.no_desc_avail: 0
dev.igb.0.queue0.tx_packets: 186
dev.igb.0.queue0.rxd_head: 758
dev.igb.0.queue0.rxd_tail: 758
dev.igb.0.queue0.rx_packets: 4855
dev.igb.0.queue0.rx_bytes: 316295
dev.igb.0.queue0.lro_queued: 0
dev.igb.0.queue0.lro_flushed: 0
dev.igb.0.queue1.txd_head: 0
dev.igb.0.queue1.txd_tail: 0
dev.igb.0.queue1.no_desc_avail: 0
dev.igb.0.queue1.tx_packets: 0
dev.igb.0.queue1.rxd_head: 0
dev.igb.0.queue1.rxd_tail: 1023
dev.igb.0.queue1.rx_packets: 0
dev.igb.0.queue1.rx_bytes: 0
dev.igb.0.queue1.lro_queued: 0
dev.igb.0.queue1.lro_flushed: 0
dev.igb.0.mac_stats.good_pkts_recvd: 0
dev.igb.0.mac_stats.good_pkts_txd: 0
dev.igb.0.mac_stats.good_octets_recvd: 0
dev.igb.0.mac_stats.good_octest_txd: 0
dev.igb.0.mac_stats.mcast_pkts_recvd: 0

-- wbr, pluknet
Re: 8.1-PRERELEASE: CPU packages not detected correctly
On 19 August 2010 20:39, Andriy Gapon a...@icyb.net.ua wrote:
> on 10/08/2010 19:55 pluknet said the following:
>> On 16 July 2010 19:47, Jung-uk Kim j...@freebsd.org wrote:
>>> The patch should apply fine on both sys/amd64/amd64/mp_machdep.c and sys/i386/i386/mp_machdep.c.
>>>
>>> http://people.freebsd.org/~jkim/mp_machdep2.diff
>>
>> Hi. Just checked on Xen HVM with 3 cores.
>>
>> 1) 8.1 unmodified:
>> FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
>> FreeBSD/SMP: 1 package(s) x 3 core(s)
>>
>> 2) 8.1 + patch
>> FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
>> FreeBSD/SMP: 0 package(s) x 1 core(s) x 32 HTT threads
>> WARNING: Non-uniform processors.
>> WARNING: Using suboptimal topology.
>
> Can you debug, e.g. with printfs, what exactly goes wrong? I wonder if in this case code follows some unusual/unexpected path.

Sorry, I'm a bit busy right now. I hope to debug this somewhere in the next week.

> BTW, could you please also provide CPU name/model/features as detected by the kernel?

Sure.

CPU: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (2763.12-MHz 686-class CPU)
Origin = GenuineIntel Id = 0x106a5 Family = 6 Model = 1a Stepping = 5
Features=0x1781fbbf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,MMX,FXSR,SSE,SSE2,HTT>
Features2=0x80982201<SSE3,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,b31>
TSC: P-state invariant
real memory = 4194304000 (4000 MB)
avail memory = 3932786688 (3750 MB)
ACPI APIC Table: <Xen HVM>
FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
FreeBSD/SMP: 0 package(s) x 1 core(s) x 32 HTT threads
cpu0 (BSP): APIC ID: 0
cpu1 (AP/HT): APIC ID: 2
cpu2 (AP/HT): APIC ID: 4

Just a thought: the number of HTT threads might somehow correlate with the current maxcpus limit (32).

> Thanks!

-- wbr, pluknet
Re: 8.1-PRERELEASE: CPU packages not detected correctly
On 19 August 2010 21:27, Andriy Gapon a...@icyb.net.ua wrote:
> on 19/08/2010 19:56 pluknet said the following:
>> CPU: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (2763.12-MHz 686-class CPU)
>> Origin = GenuineIntel Id = 0x106a5 Family = 6 Model = 1a Stepping = 5
>> Features=0x1781fbbf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,MMX,FXSR,SSE,SSE2,HTT>
>> Features2=0x80982201<SSE3,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,b31>
>> TSC: P-state invariant
>> real memory = 4194304000 (4000 MB)
>> avail memory = 3932786688 (3750 MB)
>> ACPI APIC Table: <Xen HVM>
>> FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
>> FreeBSD/SMP: 0 package(s) x 1 core(s) x 32 HTT threads
>> cpu0 (BSP): APIC ID: 0
>> cpu1 (AP/HT): APIC ID: 2
>> cpu2 (AP/HT): APIC ID: 4
>
> Thanks!
> BTW, what does Intel's code report?
> Jung-uk's convenience script:
> http://people.freebsd.org/~jkim/cpu_topology-12212009.sh

Software visible enumeration in the system:
Number of logical processors visible to the OS: 3
Number of logical processors visible to this process: 3
Number of processor cores visible to this process: 3
Number of physical packages visible to this process: 1

Hierarchical counts by levels of processor topology:
# of cores in package 0 visible to this process: 3 .

Affinity masks per SMT thread, per core, per package:
Individual: P:0, C:0, T:0 -- 1
Core-aggregated: P:0, C:0 -- 1
Individual: P:0, C:1, T:0 -- 2
Core-aggregated: P:0, C:1 -- 2
Individual: P:0, C:2, T:0 -- 4
Core-aggregated: P:0, C:2 -- 4
Pkg-aggregated: P:0 -- 7

APIC ID listings from affinity masks
OS cpu 0, Affinity mask 01 - apic id 0
OS cpu 1, Affinity mask 02 - apic id 2
OS cpu 2, Affinity mask 04 - apic id 4

Package 0 Cache and Thread details

L1D is Level 1 Data cache, size(KBytes)= 32, Cores/cache= 1, Caches/package= 3
L1I is Level 1 Instruction cache, size(KBytes)= 32, Cores/cache= 1, Caches/package= 3
L2 is Level 2 Unified cache, size(KBytes)= 256, Cores/cache= 1, Caches/package= 3
L3 is Level 3 Unified cache, size(KBytes)= 8192, Cores/cache= 1, Caches/package= 3

      +----+----+----+
Cache | L1D| L1D| L1D|
Size  | 32K| 32K| 32K|
OScpu#|   0|   1|   2|
Core  |  c0|  c1|  c2|
AffMsk|   1|   2|   4|
      +----+----+----+
Cache | L1I| L1I| L1I|
Size  | 32K| 32K| 32K|
      +----+----+----+
Cache |  L2|  L2|  L2|
Size  |256K|256K|256K|
      +----+----+----+
Cache |  L3|  L3|  L3|
Size  |  8M|  8M|  8M|
      +----+----+----+

Combined socket AffinityMask= 0x7

-- wbr, pluknet
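The affinity masks in the report above follow a simple pattern: OS cpu n has bit n set, and the package-level mask is the OR of its members. A minimal sketch of that relationship, with hypothetical function names and assuming at most 32 CPUs so that a 32-bit mask suffices:

```c
#include <assert.h>
#include <stdint.h>

/* Mask with only the bit for the given OS cpu number set. */
uint32_t
cpu_affinity_mask(unsigned os_cpu)
{
    assert(os_cpu < 32);    /* a 32-bit mask cannot describe more */
    return (uint32_t)1 << os_cpu;
}

/* OR of the per-cpu masks for OS cpus 0 .. ncpus-1. */
uint32_t
package_affinity_mask(unsigned ncpus)
{
    uint32_t mask = 0;
    unsigned i;

    assert(ncpus <= 32);
    for (i = 0; i < ncpus; i++)
        mask |= cpu_affinity_mask(i);
    return mask;
}
```

For the three visible CPUs this gives per-cpu masks 1, 2 and 4 and a package mask of 7, matching the "Pkg-aggregated" and "Combined socket AffinityMask" lines in the report.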
Re: 8.1-PRERELEASE: CPU packages not detected correctly
On 19 August 2010 21:26, Jung-uk Kim j...@freebsd.org wrote:
> On Thursday 19 August 2010 12:56 pm, pluknet wrote:
>> On 19 August 2010 20:39, Andriy Gapon a...@icyb.net.ua wrote:
>>> on 10/08/2010 19:55 pluknet said the following:
>>>> On 16 July 2010 19:47, Jung-uk Kim j...@freebsd.org wrote:
>>>>> The patch should apply fine on both sys/amd64/amd64/mp_machdep.c and sys/i386/i386/mp_machdep.c.
>>>>>
>>>>> http://people.freebsd.org/~jkim/mp_machdep2.diff
>>>>
>>>> Hi. Just checked on Xen HVM with 3 cores.
>>>>
>>>> 1) 8.1 unmodified:
>>>> FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
>>>> FreeBSD/SMP: 1 package(s) x 3 core(s)
>>>>
>>>> 2) 8.1 + patch
>>>> FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
>>>> FreeBSD/SMP: 0 package(s) x 1 core(s) x 32 HTT threads
>>>> WARNING: Non-uniform processors.
>>>> WARNING: Using suboptimal topology.
>>>
>>> Can you debug, e.g. with printfs, what exactly goes wrong? I wonder if in this case code follows some unusual/unexpected path.
>>
>> Sorry, I'm a bit busy right now. I hope to debug this somewhere in the next week.
>>
>>> BTW, could you please also provide CPU name/model/features as detected by the kernel?
>>
>> Sure.
>>
>> CPU: Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (2763.12-MHz 686-class CPU)
>> Origin = GenuineIntel Id = 0x106a5 Family = 6 Model = 1a Stepping = 5
>> Features=0x1781fbbf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,MMX,FXSR,SSE,SSE2,HTT>
>> Features2=0x80982201<SSE3,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,b31>
>> TSC: P-state invariant
>> real memory = 4194304000 (4000 MB)
>> avail memory = 3932786688 (3750 MB)
>> ACPI APIC Table: <Xen HVM>
>> FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
>> FreeBSD/SMP: 0 package(s) x 1 core(s) x 32 HTT threads
>> cpu0 (BSP): APIC ID: 0
>> cpu1 (AP/HT): APIC ID: 2
>> cpu2 (AP/HT): APIC ID: 4
>>
>> Just a thought. # HTT might somehow correlate with current maxcpus limit (32).
>
> One thing I am not sure is whether those CPUID instructions are executed on *real* CPUs or translated in HVM.

I can only add that b31 of Features2 is present only in the Xen HVM environment, and its role is, AFAIK, to indicate Xen guest mode. There is no mention of this bit in the latest Intel documentation (i.e. it is reserved/unused). Also, at least NetBSD handles this bit specially; see the commit log for CPUID2_RAZ in sys/arch/x86/include/specialreg.h, revision 1.37. FWIW, RAZ stands for "reserved and zero" or so.

-- wbr, pluknet
Re: svn commit: r209611 - head/sys/dev/e1000
On 17 August 2010 20:27, Jack Vogel jfvo...@gmail.com wrote:
> Cool, the first person to actually try and use it :)
>
> Yes, there's one key thing you have to do right now that's not documented: because of the simplistic PCI structure the guest has, the kernel blacklists it from using MSIX. So, what you need to do is set the honor_blacklist (that's not the complete string, use sysctl -a |grep blacklist to find it) and set that to 0. It needs to be set at boot. That should get you running.
>
> Jack

Nice, thanks! It works!

> On Tue, Aug 17, 2010 at 8:18 AM, pluknet pluk...@gmail.com wrote:
>> Hi, Jack.
>>
>> I set up qemu-kvm on openSUSE 11.3 with 82576 PCI device as you described. Guest fails to attach with:
>>
>> igb0: Intel(R) PRO/1000 Network Connection version - 2.0.1 mem 0xf206-0xf2063fff,0xf2064000-0xf2067fff at device 5.0 on pci0
>> igb0: Unable to allocate bus resource: interrupt
>> device_attach: igb0 attach returned 6
>>
>> i...@pci0:0:5:0: class=0x02 card=0xa03c8086 chip=0x10ca8086 rev=0x01 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     class = network
>>     subclass = ethernet
>>     cap 11[40] = MSI-X supports 3 messages in map 0x1c
>>
>> Did I miss something?

-- wbr, pluknet
Re: svn commit: r209611 - head/sys/dev/e1000
On 1 July 2010 02:13, Jack Vogel jfvo...@gmail.com wrote:
> On Wed, Jun 30, 2010 at 2:50 PM, Julian Elischer jul...@elischer.org wrote:
>> On 6/30/10 10:26 AM, Jack F Vogel wrote:
>>> Author: jfv
>>> Date: Wed Jun 30 17:26:47 2010
>>> New Revision: 209611
>>> URL: http://svn.freebsd.org/changeset/base/209611
>>>
>>> Log:
>>> SR-IOV support added to igb
>>>
>>> What this provides is support for the 'virtual function' interface that a FreeBSD VM may be assigned from a host like KVM on Linux, or newer versions of Xen with such support.
>>>
>>> When the guest is set up with the capability, a special limited function 82576 PCI device is present in its virtual PCI space, so with this driver installed in the guest that device will be detected and function nearly like the bare metal, as it were.
>>>
>>> The interface is only allowed a single queue in this configuration however initial performance tests have looked very good.
>>>
>>> Enjoy!!
>>
>> do these extra devices turn up in a standard ifconfig output? in other words, can we assign them to jails using vimage?
>
> They only show up if configured in the PF host, for instance if using Linux and KVM (I did develop and test with Fedora 13) you must load the igb driver there specifying that you want vf's created and how many. Next in the management of the guest you need to assign one of these vf devices to the guest. After you do all that and load this igb driver then yes, it will look just like a standard igbX device.

Hi, Jack.

I set up qemu-kvm on openSUSE 11.3 with 82576 PCI device as you described. Guest fails to attach with:

igb0: Intel(R) PRO/1000 Network Connection version - 2.0.1 mem 0xf206-0xf2063fff,0xf2064000-0xf2067fff at device 5.0 on pci0
igb0: Unable to allocate bus resource: interrupt
device_attach: igb0 attach returned 6

i...@pci0:0:5:0: class=0x02 card=0xa03c8086 chip=0x10ca8086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    class = network
    subclass = ethernet
    cap 11[40] = MSI-X supports 3 messages in map 0x1c

Did I miss something?

-- wbr, pluknet
Re: FreeBSD 8.1 make world FAILED!
On 9 August 2010 19:55, James Chang james.tech...@gmail.com wrote:

Dear Sir, I installed FreeBSD 8.1-RELEASE today and want to use make world to upgrade to 8.1-STABLE,

[...]

===> lib/libc (install)
install -C -o root -g wheel -m 444 libc.a /usr/lib
install -C -o root -g wheel -m 444 libc_p.a /usr/lib
install -s -o root -g wheel -m 444 -fschg -S libc.so.7 /lib
install: rename: /lib/i...@dhuu to /lib/libc.so.7: Operation not permitted
*** Error code 71
Stop in /usr/src/lib/libc.
*** Error code 1
Stop in /usr/src/lib.
*** Error code 1
Stop in /usr/src.
*** Error code 1
Stop in /usr/src.
*** Error code 1
Stop in /usr/src.
*** Error code 1
Stop in /usr/src.

Is there something wrong? Could someone give me a hand?

You could see that with DESTDIR mounted over NFS, btw. Just my 2c.

-- wbr, pluknet
Re: 8.1-PRERELEASE: CPU packages not detected correctly
On 16 July 2010 19:47, Jung-uk Kim j...@freebsd.org wrote:

The patch should apply fine on both sys/amd64/amd64/mp_machdep.c and sys/i386/i386/mp_machdep.c.
http://people.freebsd.org/~jkim/mp_machdep2.diff

Hi. Just checked on Xen HVM with 3 cores.

1) 8.1 unmodified:
FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
FreeBSD/SMP: 1 package(s) x 3 core(s)

2) 8.1 + patch:
FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
FreeBSD/SMP: 0 package(s) x 1 core(s) x 32 HTT threads
WARNING: Non-uniform processors.
WARNING: Using suboptimal topology.

-- wbr, pluknet
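For reference, the "0 package(s)" figure in the patched output is what integer division would give if the package count is derived as CPUs / (cores x threads): 3 / (1 x 32) truncates to 0. The sketch below only illustrates that arithmetic. That mp_machdep.c computes the number this way is an assumption, and smp_packages() is a hypothetical name, not a kernel function.

```c
/*
 * Hypothetical helper, not kernel code: derive the package count by
 * integer division of the CPU count by (cores * threads).
 */
int
smp_packages(int ncpus, int cores, int threads)
{
	if (cores <= 0 || threads <= 0)
		return (-1);	/* topology was not detected */
	return (ncpus / (cores * threads));	/* truncates toward zero */
}
```

With the unmodified kernel's values (3 CPUs, 3 cores, 1 thread) this gives 1 package; with the patched kernel's values (3 CPUs, 1 core, 32 threads) it gives 0, matching the two boot messages quoted in this message.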
Re: uma_ref_cnt: vm_fault: fault on nofault entry
On 8 July 2010 15:34, pluknet pluk...@gmail.com wrote: Hi. 8.0-RELEASE under xen h/w mode. This is the first time I'm seeing this panic (uptime ~2 weeks). http://img38.imageshack.us/img38/9681/screenshot1mm.png Below is transcribed via OCR from a vnc session. db bt^M Tracing pid 18908 tid 101404 td 0xc93f9Z40^M kdb_enter(c0c8dceZ,c0c8dceZ,c0cab939,eb0aZ7e4,Z,...) at kdb_entert0x3a panic(c0cab939,cl4b3000,l,eb0aZ914,eb0aZ904,...) at panict0xl36 vm_fault(cl49,cl4b3000,l,0,c76f3400,...) at vm_fault*0xl97 trap_pfault(8,eb0aZal0,8538,eb0aZal0,cfeeaaa0,...) at trap_pfaultt0xZ0e trap(eb0aZa30) at trapt0x455 calltrap() at calltrap╕0x6^M -- trap 0XC, eip = 0xc0af336b, esp = 0xeb0aZa70, ebp = 0xeb0aZa78 ---^M uma_find_refcnt(cl47a000,c7elc000,3,Z,c7elc000,...) at uma_find_refcntt0x5b mb_ctor_clust(c7elc000,1000,cf87ce00,Z,f8,...) at mb_ctor_clustt0x9c uma_zalloc_arg(cl47a000,cf87ce00,Z,dd0001,c798Z400,...) at uma_zalIoc_argt0x8a m_getmZ(0,790c,Z,l,0,...) at m_getmZt0xb3 m_uiotombuf(eb0aZc58,Z,790c,0,0,...) at m_uiotombuf^0x77 sosend_generic(cf4c44d4,0,eb0a2c58,0,0,...) at sosend_generic*0x525 sosend(cf4c44d4,0,eb0aZc58,0,0,...) at sosendt0x3f^M soo_write(d004d508,eb0aZc58,cdd4f600,0,c93f9Z40,...) at soo_uritet0x63 dofilewrite(eb0aZc58,,,0,d004d508,...) at dofilewrite*0x97 kern_writev(c93f9Z40,6,eb0aZc58,eb0aZc78,l,...) at kern_w|titevt0x58 write(c93f9Z40,eb0aZcf8,c,c0890ce8,46,...) at urite*0x4f syscall(eb0aZd38) at syscal+0x3Z5 Xint0x80_syscall() at Xint0x80_syscal+0x20^M --- syscall (4, FreeBSD ELF3Z, urite), eip = 0xZ85d4Z53, esp = 0xbfbfealc, ebp =^M 0xbfbfea38 --- The second panic seen today on the same box at uma_find_refcnt(). I'm unsure if it might be caused by Xen hvm setup. db bt Tracing pid 12 tid 100035 td 0xc776f480 kdb_enter(c0c8dce2,c0c8dce2,c0cab939,c73678b8,0,...) at kdb_enter+0x3a panic(c0cab939,c14b3000,1,c73679e8,c73679d8,...) at panic+0x136 vm_fault(c149,c14b3000,1,0,c7a81760,...) 
at vm_fault+0x197 trap_pfault(c7bb1748,c7367aa0,c7f7e7bf,c7bb1700,c755c7f8,...) at trap_pfault+0x20e trap(c7367b04) at trap+0x455 calltrap() at calltrap+0x6 --- trap 0xc, eip = 0xc0af336b, esp = 0xc7367b44, ebp = 0xc7367b4c --- uma_find_refcnt(c147f700,cbe96800,1,1,cbe96800,...) at uma_find_refcnt+0x5b mb_ctor_clust(cbe96800,800,c7f2a700,1,c0dd7b40,...) at mb_ctor_clust+0x9c uma_zalloc_arg(c147f700,c7f2a700,1,c776f000,0,...) at uma_zalloc_arg+0x8a igb_get_buf(c0dd6e40,c0dd6e40,c0dd6e40,c7367c38,c08a9d00,...) at igb_get_buf+0x146 igb_rxeof(c775f540,0,0,c775f5c0,c7762a00,...) at igb_rxeof+0x2b0 igb_msix_rx(c7760a00,0,109,e3c02e88,2a1bf,...) at igb_msix_rx+0x29 intr_event_execute_handlers(c755c7f8,c7762a00,c0c8ab0b,4f6,c7762a70,...) at intr_event_execute_handlers+0x14b ithread_loop(c7772930,c7367d38,0,0,0,...) at ithread_loop+0x6b fork_exit(c0861b20,c7772930,c7367d38) at fork_exit+0x91 fork_trampoline() at fork_trampoline+0x8 --- trap 0, eip = 0, esp = 0xc7367d70, ebp = 0 --- -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
umount -f nfs forces to panic
(3327354752,0,260,645050409,2568,...) at sched_switch+633 mi_switch(260,0,0,3327331656,3956578892,...) at mi_switch+298 sleepq_switch(3327354752,0,3234646915,416,3327355184,...) at sleepq_switch+204 sleepq_catch_signals(0,3327354752,0,3956578960,3230024964,...) at sleepq_catch_signals+91 sleepq_timedwait_sig(3327155684,0,3234648462,257,0,...) at sleepq_timedwait_sig+28 _cv_timedwait_sig(3327155684,3327155664,30001,3310708352,3327354752,...) at _cv_timedwait_sig+420 seltdwait(3956579368,3956579376,3310708352,3327354752,104,...) at seltdwait+193 kern_select(3327354752,8,3217026356,0,0,3956579440,32,30,0) at kern_select+1262 select(3327354752,3956579576,12,3327354752,3956579628,...) at select+102 syscall(3956579640) at syscall+723 Xint0x80_syscall() at Xint0x80_syscall+32 --- syscall (93, FreeBSD ELF32, select), eip = 672695779, esp = 3217026188, ebp = 3217026504 --- Tracing command rpcbind pid 874 tid 100118 td 0xc6535780 sched_switch(3327350656,0,260,649400515,2568,...) at sched_switch+633 mi_switch(260,0,0,3327135048,3956742824,...) at mi_switch+298 sleepq_switch(3327350656,0,3234646915,416,3327351088,...) at sleepq_switch+204 sleepq_catch_signals(0,3327350656,3325702144,3956742892,3230024964,...) at sleepq_catch_signals+91 sleepq_timedwait_sig(3327284708,0,3234648462,257,0,...) at sleepq_timedwait_sig+28 _cv_timedwait_sig(3327284708,3327284688,30001,3310708352,3327350656,...) at _cv_timedwait_sig+420 seltdwait(3956743260,3956743268,3310708352,3327350656,3230602000,...) at seltdwait+193 poll(3327350656,3956743416,12,3327350656,3956743468,...) at poll+831 syscall(3956743480) at syscall+723 Xint0x80_syscall() at Xint0x80_syscall+32 --- syscall (209, FreeBSD ELF32, poll), eip = 672406303, esp = 3217017820, ebp = 3217026472 --- -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: 8.1-PRERELEASE: CPU packages not detected correctly
On 15 July 2010 23:18, Jung-uk Kim j...@freebsd.org wrote:
On Thursday 15 July 2010 03:07 pm, Jung-uk Kim wrote:
On Thursday 15 July 2010 01:56 pm, Andriy Gapon wrote:
on 15/07/2010 19:57 Oliver Fromme said the following:

In topo_probe(), cpu_high is 0xd, so topo_probe_0xb() is called. But the cpuid 0xb instruction doesn't seem to return useful data: all values are zero already in the first level, so cpu_cores remains 0. Back in topo_probe(), there is a fallback if cpu_cores is still 0: it assigns mp_ncpu to cpu_cores, so it gets 8, which is wrong. I patched topo_probe() so it calls topo_probe_0x4() after topo_probe_0xb() if cpu_cores is still 0. I think this is a better fallback procedure. With this patch, cpu_cores finally gets the value 4, which is the correct one:

FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s)

Thank you for debugging this issue! Not sure if this is the best patch that there can be, but its direction is definitely correct. As the Intel document says (translated to our x86 mp_machdep.c terms): if cpu_high >= 0xb then we should execute cpuid_count(0xb, 0, p) and examine the EBX value (p[1]); only if it's non-zero should we proceed with topo_probe_0xb(), otherwise we should fall back to topo_probe_0x4, etc. I think that your addition achieves this effect, perhaps just not as explicitly as I would have preferred. Jung-uk, what do you think?

Yes, you're right. Please try the new patch:
http://people.freebsd.org/~jkim/mp_machdep2.diff

I uploaded the patch again; it is compile-tested now.

Just tried with the patch against 8.1-rc2.

2x E5520 - OK, no changes
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 SMT threads

2x E5440 - now OK
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s)

1x 5050 - OK, no changes
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 HTT threads

-- wbr, pluknet
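The probe order discussed in this message can be sketched roughly as follows. This is not the code from mp_machdep2.diff: struct cpuid_view and probe_cores() are hypothetical names, the cpuid results are passed in as plain data rather than read from hardware, and the further fallbacks behind the "etc." in the quoted text are reduced to a single last resort.

```c
#include <stdint.h>

/* Hypothetical stand-in for what the cpuid probes would report. */
struct cpuid_view {
	uint32_t cpu_high;	/* highest supported basic cpuid leaf */
	uint32_t leafb_ebx;	/* EBX of cpuid(0xb, 0); 0 means leaf 0xb is unusable */
	int	 leafb_cores;	/* cores per package derived from leaf 0xb */
	int	 leaf4_cores;	/* cores per package derived from leaf 0x4, 0 if none */
	int	 mp_ncpus;	/* number of CPUs found */
};

/*
 * Pick cores per package: trust leaf 0xb only when its EBX is non-zero,
 * then try leaf 0x4, and only then fall back to the CPU count.
 */
int
probe_cores(const struct cpuid_view *v)
{
	int cores = 0;

	if (v->cpu_high >= 0xb && v->leafb_ebx != 0)
		cores = v->leafb_cores;
	if (cores == 0 && v->cpu_high >= 0x4)
		cores = v->leaf4_cores;
	if (cores == 0)
		cores = v->mp_ncpus;	/* last resort, as in the unpatched code */
	return (cores);
}
```

For the L5408 case described above (cpu_high 0xd, leaf 0xb all zeros, leaf 0x4 reporting 4 cores, 8 CPUs) this returns 4 instead of 8.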
Re: 8.1-PRERELEASE: CPU packages not detected correctly
On 14 July 2010 18:14, Oliver Fromme o...@lurza.secnetix.de wrote:

In a machine installed yesterday, 8.1-PRERELEASE doesn't seem to detect the number of CPU packages vs. cores per package correctly:

| FreeBSD 8.1-PRERELEASE-20100713 #0: Tue Jul 13 19:51:18 UTC 2010
| [...]
| CPU: Intel(R) Xeon(R) CPU L5408 @ 2.13GHz (2133.42-MHz K8-class CPU)
| Origin = GenuineIntel Id = 0x1067a Family = 6 Model = 17 Stepping = 10
| Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
| Features2=0x40ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE>
| AMD Features=0x2800<SYSCALL,LM>
| AMD Features2=0x1<LAHF>
| TSC: P-state invariant
| real memory = 34359738368 (32768 MB)
| avail memory = 33151377408 (31615 MB)
| ACPI APIC Table: <IBM SERBLADE>
| FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
| FreeBSD/SMP: 1 package(s) x 8 core(s)

Just for reference, I collected the CPU detection output from various branches.

6.4:
Cores per package: 4
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs

7.3:
Cores per package: 4
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs

8.1-rc:
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 8 core(s)

Indeed, looks like a regression.

-- wbr, pluknet
Re: 8.1-PRERELEASE: CPU packages not detected correctly
On 14 July 2010 18:14, Oliver Fromme o...@lurza.secnetix.de wrote:

In a machine installed yesterday, 8.1-PRERELEASE doesn't seem to detect the number of CPU packages vs. cores per package correctly:

| FreeBSD 8.1-PRERELEASE-20100713 #0: Tue Jul 13 19:51:18 UTC 2010
| [...]
| CPU: Intel(R) Xeon(R) CPU L5408 @ 2.13GHz (2133.42-MHz K8-class CPU)
| Origin = GenuineIntel Id = 0x1067a Family = 6 Model = 17 Stepping = 10
| Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
| Features2=0x40ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,XSAVE>
| AMD Features=0x2800<SYSCALL,LM>
| AMD Features2=0x1<LAHF>
| TSC: P-state invariant
| real memory = 34359738368 (32768 MB)
| avail memory = 33151377408 (31615 MB)
| ACPI APIC Table: <IBM SERBLADE>
| FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
| FreeBSD/SMP: 1 package(s) x 8 core(s)
| cpu0 (BSP): APIC ID: 0
| cpu1 (AP): APIC ID: 1
| cpu2 (AP): APIC ID: 2
| cpu3 (AP): APIC ID: 3
| cpu4 (AP): APIC ID: 4
| cpu5 (AP): APIC ID: 5
| cpu6 (AP): APIC ID: 6
| cpu7 (AP): APIC ID: 7
| ioapic1 <Version 2.0> irqs 24-47 on motherboard
| ioapic0 <Version 2.0> irqs 0-23 on motherboard

I'm pretty sure that this is a 2 x 4 machine (2 CPU packages with 4 cores per package), not 1 x 8. That's what the BIOS displays during POST.

Hi, can you show kern.sched.topology_spec? It would clarify things a bit.

-- wbr, pluknet
Re: new umass panic on 7-stable built today
+0x25 usbd_get_string_desc() at usbd_get_string_desc+0x9b usbd_get_string() at usbd_get_string+0x83 uhub_child_pnpinfo_str() at uhub_child_pnpinfo_str+0xd9 devaddq() at devaddq+0xd5 device_attach() at device_attach+0x13a usbd_new_device() at usbd_new_device+0x816 uhub_explore() at uhub_explore+0x1bd usb_discover() at usb_discover+0x38 usb_event_thread() at usb_event_thread+0x8a fork_exit() at fork_exit+0x11f fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xff8161196d30, rbp = 0 --- db show msgbuf msgbufp = 0x80e20fe0 magic = 63062, size = 65504, r= 44209, w = 44746, ptr = 0x80e11000, cksum= 3347961 umass0: OCZ ATV, class 0/0, rev 2.00/1.10, addr 2 on uhub2 Fatal trap 12: page fault while in kernel mode cpuid = 4; apic id = 04 fault virtual address = 0x290 fault code = supervisor read data, page not present instruction pointer = 0x8:0x804a9d44 stack pointer = 0x10:0xff8161195db0 frame pointer = 0x10:0xff8161195df0 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 47 (usb2) -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
uma_ref_cnt: vm_fault: fault on nofault entry
Hi. 8.0-RELEASE under xen h/w mode. This is the first time I'm seeing this panic (uptime ~2 weeks). http://img38.imageshack.us/img38/9681/screenshot1mm.png Below is transcribed via OCR from a vnc session. db bt^M Tracing pid 18908 tid 101404 td 0xc93f9Z40^M kdb_enter(c0c8dceZ,c0c8dceZ,c0cab939,eb0aZ7e4,Z,...) at kdb_entert0x3a panic(c0cab939,cl4b3000,l,eb0aZ914,eb0aZ904,...) at panict0xl36 vm_fault(cl49,cl4b3000,l,0,c76f3400,...) at vm_fault*0xl97 trap_pfault(8,eb0aZal0,8538,eb0aZal0,cfeeaaa0,...) at trap_pfaultt0xZ0e trap(eb0aZa30) at trapt0x455 calltrap() at calltrap╕0x6^M -- trap 0XC, eip = 0xc0af336b, esp = 0xeb0aZa70, ebp = 0xeb0aZa78 ---^M uma_find_refcnt(cl47a000,c7elc000,3,Z,c7elc000,...) at uma_find_refcntt0x5b mb_ctor_clust(c7elc000,1000,cf87ce00,Z,f8,...) at mb_ctor_clustt0x9c uma_zalloc_arg(cl47a000,cf87ce00,Z,dd0001,c798Z400,...) at uma_zalIoc_argt0x8a m_getmZ(0,790c,Z,l,0,...) at m_getmZt0xb3 m_uiotombuf(eb0aZc58,Z,790c,0,0,...) at m_uiotombuf^0x77 sosend_generic(cf4c44d4,0,eb0a2c58,0,0,...) at sosend_generic*0x525 sosend(cf4c44d4,0,eb0aZc58,0,0,...) at sosendt0x3f^M soo_write(d004d508,eb0aZc58,cdd4f600,0,c93f9Z40,...) at soo_uritet0x63 dofilewrite(eb0aZc58,,,0,d004d508,...) at dofilewrite*0x97 kern_writev(c93f9Z40,6,eb0aZc58,eb0aZc78,l,...) at kern_w|titevt0x58 write(c93f9Z40,eb0aZcf8,c,c0890ce8,46,...) at urite*0x4f syscall(eb0aZd38) at syscal+0x3Z5 Xint0x80_syscall() at Xint0x80_syscal+0x20^M --- syscall (4, FreeBSD ELF3Z, urite), eip = 0xZ85d4Z53, esp = 0xbfbfealc, ebp =^M 0xbfbfea38 --- -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: Panic on 8-STABLE at boot
On 29 June 2010 19:58, Mickaël Maillot mickael.mail...@gmail.com wrote:

I've got a panic on a ZFS-only machine, so I decided to build a WITNESS kernel, and just after the first reboot:
http://img190.imageshack.us/img190/3314/panicflowtable2.jpg
The second boot: no problem.

uname -a:
FreeBSD cg196.security-mail.net 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #8: Tue Jun 29 14:30:12 CEST 2010 r...@cg196.security-mail.net:/usr/obj/usr/src/sys/SECUMAIL amd64

dmesg: http://freelooser.fr/freebsd/dmesg.cg196.2010-06-29.log
conf: http://freelooser.fr/freebsd/SECUMAIL.txt

AFAIK this was fixed a while ago in CURRENT:
http://svn.freebsd.org/viewvc/base?view=revision&revision=207303
I wonder why it hasn't been MFC'ed yet.

-- wbr, pluknet
pde.demotions counts even if pg_ps disabled
Hi. This is 7.3-RELEASE-p1 right after boot. Looks like pmap_demote_pde is not properly protected w/ pg_ps_enabled.

# sysctl vm.pmap
vm.pmap.pmap_collect_active: 0
vm.pmap.pmap_collect_inactive: 0
vm.pmap.pv_entry_spare: 1421
vm.pmap.pv_entry_allocs: 686316
vm.pmap.pv_entry_frees: 675473
vm.pmap.pc_chunk_tryfail: 0
vm.pmap.pc_chunk_frees: 5392
vm.pmap.pc_chunk_allocs: 5465
vm.pmap.pc_chunk_count: 73
vm.pmap.pv_entry_count: 10843
vm.pmap.pde.promotions: 0
vm.pmap.pde.p_failures: 0
vm.pmap.pde.mappings: 0
vm.pmap.pde.demotions: 5
vm.pmap.shpgperproc: 200
vm.pmap.pv_entry_max: 4264612
vm.pmap.pg_ps_enabled: 0

# uptime
3:25PM up 19 mins, 2 users, load averages: 0.00, 0.00, 0.01

-- wbr, pluknet
ps doesn't respect -o cpu keyword
Hi. I noticed that ps on all supported releases (w/ ULE) always shows the value of the cpu keyword as 0 (zero).

COMMAND                 CPU
kernel/swapper            0
kernel/firmware taskq     0
kernel/thread taskq       0
kernel/kqueue taskq       0
kernel/acpi_task_0        0
kernel/acpi_task_1        0
kernel/acpi_task_2        0
kernel/igb0 taskq         0
kernel/igb1 taskq         0
init                      0
g_event                   0
g_up                      0
g_down                    0
xpt_thrd                  0
sctp_iterator             0
pagedaemon                0
vmdaemon                  0
pagezero                  0
audit                     0
idle/idle: cpu2           0
idle/idle: cpu1           0

P.S. That's only for ps; top is not affected.

  PID USERNAME  PRI NICE    SIZE    RES STATE   C    TIME   WCPU COMMAND
   11 root      171 ki31      0K    24K RUN     0  117.2H 43.46% {idle: cpu0}
   11 root      171 ki31      0K    24K RUN     2  115.3H 40.28% {idle: cpu2}
   11 root      171 ki31      0K    24K CPU1    1  103.1H 36.67% {idle: cpu1}
   12 root      -68    -      0K   184K CPU0    0  981:19 22.75% {irq260: igb1}
    0 root      -68    0      0K    64K -       1  200:18  1.07% {igb1 taskq}
   12 root      -32    -      0K   184K WAIT    1   55:24  0.59% {swi4: clock}
   12 root      -68    -      0K   184K WAIT    2  310:31  0.29% {irq259: igb1}
   12 root      -40    -      0K   184K WAIT    0    5:48  0.20% {swi2: cambio}
    4 root       -8    -      0K     8K -       2   26:21  0.00% g_down
64298 root       44    0  10748K  6776K select  2   23:06  0.00% snmpd
   16 root       44    -      0K     8K syncer  2   22:22  0.00% syncer
   12 root      -64    -      0K   184K WAIT    1   16:22  0.00% {irq32: sym0}
   13 root       44    -      0K     8K -       0   13:55  0.00% yarrow
   12 root      -44    -      0K   184K WAIT    2   11:39  0.00% {swi1: netisr 0}

-- wbr, pluknet
Re: ps doesn't respect -o cpu keyword
On 29 June 2010 13:25, pluknet pluk...@gmail.com wrote:

Hi. I noticed that ps on all supported releases (w/ ULE) always shows the value of the cpu keyword as 0 (zero).

Please ignore my stupidity. I thought that was the CPU number.

-- wbr, pluknet
mfiutil create .. leads to deadlock in 6-STABLE
Hi, I ran into the subject issue on an IBM ServeRAID M5015 (LSISAS2108 SAS2.0 6Gbps). As far as I can see, the lockup is caused by sleeping on an sx lock after Giant was acquired. Can r160217 help me, or am I going the wrong way?

From r160217: Use a sleep mutex instead of an sx lock for the kernel environment.

After `# mfiutil create raid5 8,9,10,11,12,13`:

db> bt 924
Tracing pid 924 tid 100156 td 0xc6fcb340
sched_switch(c6fcb340,0,2) at sched_switch+0x15b
mi_switch(2,0,c0ac4020,0,c09b478a,...) at mi_switch+0x270
critical_exit(1,c6fcb340,c677d88a,b,a,...) at critical_exit+0x8b
intr_execute_handlers(c67ae678,ed2f16a8,10,ed2f1838,c09141e3,...) at intr_execute_handlers+0x129
lapic_handle_intr(36) at lapic_handle_intr+0x2e
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc06d2c2f, esp = 0xed2f16ec, ebp = 0xed2f1838 ---
vsscanf(c677d880,c09b523b,ed2f1864,ed2f1944,c06cd5c9,...) at vsscanf+0x123
sscanf(c677d880,c09b523b,ed2f1918,ed2f1874,ed2f18f8,...) at sscanf+0x12
res_find(ed2f19c8,0,c0993cb1,ed2f19dc,c09b5c62,0,0,0,0,0,0,ed2f19cc) at res_find+0x225
resource_find(ed2f19c8,0,c0993cb1,ed2f19dc,c09b5c62,0,0,0,0,0,0,ed2f19cc) at resource_find+0x3b
resource_int_value(c0993cb1,0,c09b5c62,c6f5843c) at resource_int_value+0x32
device_probe_child(c695d200,c6f58400) at device_probe_child+0xc5
device_probe_and_attach(c6f58400) at device_probe_and_attach+0x7d
bus_generic_attach(c695d200,c6f58400,c0993995,c6f58400,c6cb8800,...) at bus_generic_attach+0x16
mfi_add_ld_complete(c697b130,c697b130,c6cb8800,0,c6974400,...) at mfi_add_ld_complete+0xe5
mfi_add_ld(c6974400,0) at mfi_add_ld+0xcc
mfi_ldprobe(c6974400) at mfi_ldprobe+0xfe
mfi_check_command_post(c6974400,c697ad70,eae97080,c700d0c0,34) at mfi_check_command_post+0x1b6
mfi_user_command(c6974400,c700d0c0,2,c09b478a,27d,...) at mfi_user_command+0x1a4
mfi_ioctl(c6963900,c03c4366,c700d0c0,3,c6fcb340,...) at mfi_ioctl+0x790
devfs_ioctl_f(c70092d0,c03c4366,c700d0c0,c6d37000,c6fcb340) at devfs_ioctl_f+0xaf
ioctl(c6fcb340,ed2f1d04) at ioctl+0x396
syscall(3b,3b,3b,bfbfeaa0,240,...) at syscall+0x22f
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (54, FreeBSD ELF32, ioctl), eip = 0x28150877, esp = 0xbfbfea8c, ebp = 0xbfbfeaf8 ---

All CPUs are idle.

db> show alllocks
Process 924 (mfiutil) thread 0xc6fcb340 (100156)
shared sx kernel environment r = 0 (0xc0ac2140) locked @ /usr/src/sys/kern/subr_hints.c:117
exclusive sleep mutex Giant r = 0 (0xc0ac4060) locked @ /usr/src/sys/dev/mfi/mfi.c:1329
exclusive sx MFI config r = 0 (0xc6974578) locked @ /usr/src/sys/dev/mfi/mfi.c:1737

-- wbr, pluknet
Re: mfiutil create .. leads to deadlock in 6-STABLE
On 8 June 2010 13:54, Garrett Cooper yanef...@gmail.com wrote:
On Tue, Jun 8, 2010 at 2:30 AM, pluknet pluk...@gmail.com wrote:

Hi, I ran into the subject issue on an IBM ServeRAID M5015 (LSISAS2108 SAS2.0 6Gbps). As far as I can see, the lockup is caused by sleeping on an sx lock after Giant was acquired. Can r160217 help me, or am I going the wrong way?

From r160217: Use a sleep mutex instead of an sx lock for the kernel environment.

Where are you coming from :)?

From large legacy land. Hi there.

-- wbr, pluknet
Re: mfiutil create .. leads to deadlock in 6-STABLE
On 8 June 2010 13:30, pluknet pluk...@gmail.com wrote:

Hi, I ran into the subject issue on an IBM ServeRAID M5015 (LSISAS2108 SAS2.0 6Gbps).

Also, the FreeBSD version in the subject is generally unstable with this controller (a separate issue). It locks up every time immediately after:

<118>Starting sshd.
<118>Starting cron.
<118>cron: can't open or create /var/run/cron.pid: Operation not supported
<118>Local package initialization:
<118>.
<118>Starting background file system checks in 60 seconds.
^^ here

I've collected alltrace output, in case it helps (attached). show alllocks produces nothing here.

-- wbr, pluknet

Attachment: typescript (binary data)
Re: 6.4-STABLE periodically stucks on boot around cpufreq:est
On 27 May 2010 13:00, pluknet pluk...@gmail.com wrote:

Hi, when booting a box with recent 6.4-STABLE, it often gets stuck while trying to switch context in the sysctl dev.cpu.0.freq handler.

FWIW, that is not an issue for 6.4-RELEASE. I'm going to binary search between them.

AFAIK it gets stuck somewhere here, in /etc/rc.d/initrandom:

# XXX temporary until we can improve the entropy
# harvesting rate.
# Entropy below is not great, but better than nothing.
# This unblocks the generator at startup
( ps -fauxww; sysctl -a; date; df -ib; dmesg; ps -fauxww ) \
    | dd of=/dev/random bs=8k 2>/dev/null
cat /bin/ls | dd of=/dev/random bs=8k 2>/dev/null

Some useful info (on successful boot):

[r...@web72 ~]# sysctl dev.cpufreq
dev.cpufreq.0.%driver: cpufreq
dev.cpufreq.0.%parent: cpu0

[r...@web72 ~]# sysctl -a dev.cpu
dev.cpu.0.%desc: ACPI CPU dev.cpu.0.%driver: cpu dev.cpu.0.%location: handle=\_PR_.CPU0 dev.cpu.0.%pnpinfo: _HID=none _UID=0 dev.cpu.0.%parent: acpi0 dev.cpu.0.freq: 3144 dev.cpu.0.freq_levels: 3144/-1 2751/-1 2358/-1 1965/-1 1572/-1 1179/-1 786/-1 393/-1 dev.cpu.0.cx_supported: C1/0 dev.cpu.0.cx_lowest: C1 dev.cpu.0.cx_usage: 100.00% dev.cpu.1.%desc: ACPI CPU dev.cpu.1.%driver: cpu dev.cpu.1.%location: handle=\_PR_.CPU1 dev.cpu.1.%pnpinfo: _HID=none _UID=0 dev.cpu.1.%parent: acpi0 dev.cpu.1.cx_supported: C1/0 dev.cpu.1.cx_lowest: C1 dev.cpu.1.cx_usage: 100.00% dev.cpu.2.%desc: ACPI CPU dev.cpu.2.%driver: cpu dev.cpu.2.%location: handle=\_PR_.CPU2 dev.cpu.2.%pnpinfo: _HID=none _UID=0 dev.cpu.2.%parent: acpi0 dev.cpu.2.cx_supported: C1/0 dev.cpu.2.cx_lowest: C1 dev.cpu.2.cx_usage: 100.00% dev.cpu.3.%desc: ACPI CPU dev.cpu.3.%driver: cpu dev.cpu.3.%location: handle=\_PR_.CPU3 dev.cpu.3.%pnpinfo: _HID=none _UID=0 dev.cpu.3.%parent: acpi0 dev.cpu.3.cx_supported: C1/0 dev.cpu.3.cx_lowest: C1 dev.cpu.3.cx_usage: 100.00% dev.cpu.4.%desc: ACPI CPU dev.cpu.4.%driver: cpu dev.cpu.4.%location: handle=\_PR_.CPU4 dev.cpu.4.%pnpinfo: _HID=none _UID=0 dev.cpu.4.%parent: acpi0
dev.cpu.4.cx_supported: C1/0 dev.cpu.4.cx_lowest: C1 dev.cpu.4.cx_usage: 100.00% dev.cpu.5.%desc: ACPI CPU dev.cpu.5.%driver: cpu dev.cpu.5.%location: handle=\_PR_.CPU5 dev.cpu.5.%pnpinfo: _HID=none _UID=0 dev.cpu.5.%parent: acpi0 dev.cpu.5.cx_supported: C1/0 dev.cpu.5.cx_lowest: C1 dev.cpu.5.cx_usage: 100.00% dev.cpu.6.%desc: ACPI CPU dev.cpu.6.%driver: cpu dev.cpu.6.%location: handle=\_PR_.CPU6 dev.cpu.6.%pnpinfo: _HID=none _UID=0 dev.cpu.6.%parent: acpi0 dev.cpu.6.cx_supported: C1/0 dev.cpu.6.cx_lowest: C1 dev.cpu.6.cx_usage: 100.00% dev.cpu.7.%desc: ACPI CPU dev.cpu.7.%driver: cpu dev.cpu.7.%location: handle=\_PR_.CPU7 dev.cpu.7.%pnpinfo: _HID=none _UID=0 dev.cpu.7.%parent: acpi0 dev.cpu.7.cx_supported: C1/0 dev.cpu.7.cx_lowest: C1 dev.cpu.7.cx_usage: 100.00% Related part of dmesg: FreeBSD 6.4-STABLE #3: Fri May 21 14:25:41 MSD 2010 acpi_alloc_wakeup_handler: can't alloc wake memory ACPI APIC Table: IBM SERVALNT Timecounter i8254 frequency 1193182 Hz quality 0 CPU: Intel(R) Xeon(R) CPU X5460 @ 3.16GHz (3158.77-MHz 686-class CPU) Origin = GenuineIntel Id = 0x10676 Stepping = 6 Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,C MOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE Features2=0xce3bdSSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA, b19 AMD Features=0x2000LM AMD Features2=0x1LAHF Cores per package: 4 real memory = 3221008384 (3071 MB) avail memory = 3146702848 (3000 MB) FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 cpu4 (AP): APIC ID: 4 cpu5 (AP): APIC ID: 5 cpu6 (AP): APIC ID: 6 cpu7 (AP): APIC ID: 7 ioapic0 Version 2.0 irqs 0-23 on motherboard kbd1 at kbdmux0 ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413) hptrr: HPT RocketRAID controller driver v1.1 (May 11 2010 16:10:15) acpi0: IBM SERVALNT on motherboard acpi0: Power Button (fixed) Timecounter ACPI-fast frequency 3579545 Hz 
quality 1000 acpi_timer0: 24-bit timer at 3.579545MHz port 0x588-0x58b on acpi0 acpi_hpet0: High Precision Event Timer iomem 0xfed0-0xfed003ff on acpi0 Timecounter HPET frequency 14318180 Hz quality 900 cpu0: ACPI CPU on acpi0 acpi_throttle0: ACPI CPU Throttling on cpu0 cpu1: ACPI CPU on acpi0 acpi_throttle1: ACPI CPU Throttling on cpu1 acpi_throttle1: failed to attach P_CNT device_attach: acpi_throttle1 attach returned 6 cpu2: ACPI CPU on acpi0 acpi_throttle2: ACPI CPU Throttling on cpu2 acpi_throttle2: failed to attach P_CNT device_attach: acpi_throttle2 attach returned 6 cpu3: ACPI CPU on acpi0 acpi_throttle3: ACPI CPU Throttling on cpu3 acpi_throttle3: failed to attach P_CNT device_attach: acpi_throttle3 attach
6.4-STABLE periodically stucks on boot around cpufreq:est
P_CNT device_attach: acpi_throttle5 attach returned 6 cpu6: ACPI CPU on acpi0 acpi_throttle6: ACPI CPU Throttling on cpu6 acpi_throttle6: failed to attach P_CNT device_attach: acpi_throttle6 attach returned 6 cpu7: ACPI CPU on acpi0 acpi_throttle7: ACPI CPU Throttling on cpu7 acpi_throttle7: failed to attach P_CNT device_attach: acpi_throttle7 attach returned 6 [...] Any hints? db ps pid ppid pgrp uid state wmesg wchancmd 686651 0 R+ CPU 255 sysctl 666451 0 S+ wait 0xc82e6648 sh 655951 0 S+ piperd 0xc85404c8 dd 645951 0 S+ wait 0xc82e6218 sh 595151 0 S+ wait 0xc82e6a78 sh 51 151 0 Ss+ wait 0xc852ec90 sh 50 0 0 0 SL sdflush 0xc0af5654 [softdepflush] 49 0 0 0 SL syncer 0xc0adc1bc [syncer] 48 0 0 0 SL vlruwt 0xc8292a78 [vnlru] 47 0 0 0 SL psleep 0xc0ae7da0 [bufdaemon] 46 0 0 0 RL [pagezero] 45 0 0 0 SL psleep 0xc0af6194 [vmdaemon] db bt 68 Tracing pid 68 tid 100053 td 0xc8291b60 sched_switch(c8291b60,0,1) at sched_switch+0x143 mi_switch(1,0,c8291cc0,0,c0adf600,...) at mi_switch+0x1ba sched_bind(c8291b60,0) at sched_bind+0x52 cpu_est_clockrate(0,e897bad4,c84c1400,3,c84c1400,...) at cpu_est_clockrate+0xc1 cf_levels_method(c8214900,c858f000,e897bb48) at cf_levels_method+0x303 cf_get_method(c8214900,c857f000) at cf_get_method+0x12b cpufreq_curr_sysctl(c8218e40,c81ea000,0,e897bc04,c8218e40,...) at cpufreq_curr_sysctl+0x81 sysctl_root(0,e897bc74,4,e897bc04) at sysctl_root+0x107 userland_sysctl(c8291b60,e897bc74,4,0,bfbfdbdc,0,0,0,e897bc70,0) at userland_sysctl+0x112 __sysctl(c8291b60,e897bd04) at __sysctl+0x93 syscall(3b,bfbf003b,bfbf003b,4,bfbfdbdc,...) at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x2812650b, esp = 0xbfbfdb4c, ebp = 0xbfbfdb88 --- db bt 65 Tracing pid 65 tid 100051 td 0xc82e4000 sched_switch(c82e4000,0,1) at sched_switch+0x143 mi_switch(1,0,c85404c8,ec997bec,c06e32b9,...) 
at mi_switch+0x1ba sleepq_switch(c85404c8) at sleepq_switch+0x87 sleepq_wait_sig(c85404c8) at sleepq_wait_sig+0x1d msleep(c85404c8,c8540638,14c,c09d8540,0,...) at msleep+0x25a pipe_read(c8536ab0,ec997cbc,c80f5700,0,c82e4000) at pipe_read+0x42d dofileread(c82e4000,0,c8536ab0,ec997cbc,,...) at dofileread+0x85 kern_readv(c82e4000,0,ec997cbc,804f000,2000,...) at kern_readv+0x36 read(c82e4000,ec997d04) at read+0x45 syscall(3b,3b,3b,2004,bfbfeebc,...) at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (3, FreeBSD ELF32, read), eip = 0x2813d8d7, esp = 0xbfbfee0c, ebp = 0xbfbfee58 --- db show allpcpu Current CPU: 1 cpuid= 0 curthread= 0xc80fa9c0: pid 18 swi4: clock sio curpcb = 0xe6ce4d90 fpcurthread = none idlethread = 0xc80fa820: pid 17 idle: cpu0 APIC ID = 0 currentldt = 0x50 cpuid= 1 curthread= 0xc80fa000: pid 16 idle: cpu1 curpcb = 0xe6cd2d90 fpcurthread = none idlethread = 0xc80fa000: pid 16 idle: cpu1 APIC ID = 1 currentldt = 0x50 cpuid= 2 curthread= 0xc80f8d00: pid 15 idle: cpu2 curpcb = 0xe6ccfd90 fpcurthread = none idlethread = 0xc80f8d00: pid 15 idle: cpu2 APIC ID = 2 currentldt = 0x50 cpuid= 3 curthread= 0xc80f8b60: pid 14 idle: cpu3 curpcb = 0xe6cccd90 fpcurthread = none idlethread = 0xc80f8b60: pid 14 idle: cpu3 APIC ID = 3 currentldt = 0x50 cpuid= 4 curthread= 0xc80f89c0: pid 13 idle: cpu4 curpcb = 0xe6cc9d90 fpcurthread = none idlethread = 0xc80f89c0: pid 13 idle: cpu4 APIC ID = 4 currentldt = 0x50 cpuid= 5 curthread= 0xc80f8820: pid 12 idle: cpu5 curpcb = 0xe6cc6d90 fpcurthread = none idlethread = 0xc80f8820: pid 12 idle: cpu5 APIC ID = 5 currentldt = 0x50 cpuid= 6 curthread= 0xc80f8680: pid 11 idle: cpu6 curpcb = 0xe6cc3d90 fpcurthread = none idlethread = 0xc80f8680: pid 11 idle: cpu6 APIC ID = 6 currentldt = 0x50 cpuid= 7 curthread= 0xc80f84e0: pid 10 idle: cpu7 curpcb = 0xe6cc0d90 fpcurthread = none idlethread = 0xc80f84e0: pid 10 idle: cpu7 APIC ID = 7 currentldt = 0x50 db bt 18 Tracing pid 18 tid 100015 td 0xc80fa9c0 
sched_switch(c0b21c20,c093091c,0,0,0,...) at sched_switch+0x143

-- wbr, pluknet
Re: panic on zfs unmount
On 11 December 2009 23:28, Stefan Bethke s...@lassitu.de wrote: I still sometimes get the lost .zfs/snapshot directory, with resulting panic, and it just happened again. I have the full crash dump, if anyone wants to look at details. # cd /jail/foo/.zfs # ls ls: snapshot: Bad file descriptor # cd # zfs umount tank/jail/foo Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0xa8 fault code = supervisor write data, page not present instruction pointer = 0x20:0x8033fac5 stack pointer = 0x28:0xff80626cf9d0 frame pointer = 0x28:0xff80626cf9e0 code segment = base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 38362 (zfs) trap number = 12 panic: page fault cpuid = 0 Uptime: 7d3h33m46s Physical memory: 3313 MB #0 doadump () at pcpu.h:223 223 pcpu.h: No such file or directory. in pcpu.h (kgdb) #0 doadump () at pcpu.h:223 #1 0x80337bd9 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416 #2 0x8033802c in panic (fmt=Variable fmt is not available. ) at /usr/src/sys/kern/kern_shutdown.c:579 #3 0x805cc2ad in trap_fatal (frame=0xc, eva=Variable eva is not available. ) at /usr/src/sys/amd64/amd64/trap.c:857 #4 0x805cc694 in trap_pfault (frame=0xff80626cf920, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:773 #5 0x805cd06a in trap (frame=0xff80626cf920) at /usr/src/sys/amd64/amd64/trap.c:499 #6 0x805b2943 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:224 #7 0x8033fac5 in _sx_xlock (sx=0x90, opts=0, file=0x80ac1d30 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c, line=1349) at atomic.h:158 #8 0x80a53b85 in zfsctl_umount_snapshots (vfsp=Variable vfsp is not available. 
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ctldir.c:1349 #9 0x80a604f9 in zfs_umount (vfsp=0xff00017518d0, fflag=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1020 #10 0x803c080a in dounmount (mp=0xff00017518d0, flags=0, td=Variable td is not available. ) at /usr/src/sys/kern/vfs_mount.c:1294 #11 0x803c1038 in unmount (td=0xff002ed50720, uap=0xff80626cfbf0) at /usr/src/sys/kern/vfs_mount.c:1179 #12 0x805cc906 in syscall (frame=0xff80626cfc80) at /usr/src/sys/amd64/amd64/trap.c:989 #13 0x805b2c21 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:373 #14 0x000800f4ba4c in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) -- Stefan Bethke s...@lassitu.de Fon +49 151 14070811 Same trace, when trying to destroy pool with mounted snapshots. Seen on 7.3-amd64 Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0xc0 fault code = supervisor write data, page not present instruction pointer = 0x8:0x80543525 stack pointer = 0x10:0xff8107cd79c0 frame pointer = 0x10:0xff8107cd79d0 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= interrupt enabled, resume, IOPL = 0 current process = 50534 (zpool) db bt Tracing pid 50534 tid 100409 td 0xff005e3ab740 _sx_xlock() at _sx_xlock+0x15 zfsctl_umount_snapshots() at zfsctl_umount_snapshots+0xa5 zfs_umount() at zfs_umount+0xd0 dounmount() at dounmount+0x2c9 unmount() at unmount+0x30a syscall() at syscall+0x256 Xfast_syscall() at Xfast_syscall+0xab --- syscall (22, FreeBSD ELF64, unmount), rip = 0x801032cdc, rsp = 0x7fffaac8, rbp = 0x801302000 --- -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: FreeBSD Release 8.0 floppy images
On 10 May 2010 06:05, Joseph Olatt j...@eskimo.com wrote:

Hi, Are floppy images no longer supplied with FreeBSD 8? The following link [1] in the Handbook:

ftp://ftp.FreeBSD.org/pub/FreeBSD/releases/i386/8.0-RELEASE/floppies/

results in the following error:

550 /pub/FreeBSD/releases/i386/8.0-RELEASE/floppies/: No such file or directory

Building of the floppy images was turned off intentionally starting from 8.0. See http://svn.freebsd.org/viewvc/base?view=revision&revision=188437

-- wbr, pluknet
Re: em JumboFrame improovement and PCIe addon-card regression [Was: Re: em regression, UDP LOR followed by ssh stall]
On 17 April 2010 00:53, Jack Vogel jfvo...@gmail.com wrote:

Why are you using ZERO_COPY_SOCKETS? And is this LOR happening on STABLE or CURRENT?

I got exactly this LOR and another similar one with GENERIC on head. The second:

login: lock order reversal:
 1st 0xff0002539a18 em0:rx(0) (em0:rx(0)) @ /home/svn/freebsd/head/sys/dev/e1000/if_em.c:1452
 2nd 0x80e3a280 in_multi_mtx (in_multi_mtx) @ /home/svn/freebsd/head/sys/netinet/igmp.c:823
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x81e
_mtx_lock_flags() at _mtx_lock_flags+0x78
igmp_input() at igmp_input+0x4ce
ip_input() at ip_input+0xbc
netisr_dispatch_src() at netisr_dispatch_src+0xb8
ether_demux() at ether_demux+0x17d
ether_input() at ether_input+0x175
em_rxeof() at em_rxeof+0x165
em_handle_que() at em_handle_que+0x68
taskqueue_run() at taskqueue_run+0x91
taskqueue_thread_loop() at taskqueue_thread_loop+0x3f
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff8bed30, rbp = 0 ---

-- wbr, pluknet
Re: FreeBSD 7.3, reboot after panic: double fault
.
	 *
	 * If TSO was active we either got an interface
	 * without TSO capabilits or TSO was turned off.
	 * Disable it for this connection as too and
	 * immediatly retry with MSS sized segments generated
	 * by this function.
	 */
	if (tso)
		tp->t_flags &= ~TF_TSO;
	tcp_mtudisc(tp->t_inpcb, 0);
	return (0);

But tcp_mtudisc() calls tcp_output():

	tcpstat.tcps_mturesent++;
	tp->t_rtttime = 0;
	tp->snd_nxt = tp->snd_una;
	tcp_free_sackholes(tp);
	tp->snd_recover = tp->snd_max;
	if (tp->t_flags & TF_SACK_PERMIT)
		EXIT_FASTRECOVERY(tp);
	tcp_output_send(tp);
	return (inp);

I'm not sure why it's not able to figure out the MTU, perhaps folks on net@ can help. However, it would seem that for the tcp_output() case, tcp_mtudisc() should probably not call tcp_output_send(), but instead tcp_output() should just loop back up to the top after calling tcp_mtudisc() and retry.

I may be wrong, but this looks similar to another report for 8.0-STABLE (could it be a regression across major versions somewhere around tcp_mtudisc()?): http://lists.freebsd.org/pipermail/freebsd-stable/2010-April/056063.html

-- wbr, pluknet
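The suggestion above (retry from the top of the output routine instead of re-entering it from tcp_mtudisc()) can be illustrated with a toy model. This is only a sketch and not kernel code: toy_conn, toy_send, toy_mtudisc and toy_output are hypothetical names, and the "interface" is reduced to a single MTU number.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a TCP connection; this is not struct tcpcb. */
struct toy_conn {
	size_t maxseg;    /* current segment size */
	size_t path_mtu;  /* largest size the "interface" accepts */
	int    sends;     /* number of send attempts made so far */
};

/* Pretend to hand one segment to the interface. */
static int
toy_send(struct toy_conn *c)
{
	c->sends++;
	return (c->maxseg > c->path_mtu ? EMSGSIZE : 0);
}

/*
 * Lower the segment size only. Unlike the quoted tcp_mtudisc(), this
 * does NOT call back into the output routine.
 * Returns 1 if maxseg was lowered, 0 if nothing changed.
 */
static int
toy_mtudisc(struct toy_conn *c)
{
	if (c->maxseg <= c->path_mtu)
		return (0);
	c->maxseg = c->path_mtu;
	return (1);
}

/* Output routine that loops back to the top instead of recursing. */
static int
toy_output(struct toy_conn *c)
{
	int error;

	for (;;) {
		error = toy_send(c);
		if (error != EMSGSIZE)
			return (error);
		/* If the size did not shrink, give up rather than spin. */
		if (!toy_mtudisc(c))
			return (error);
	}
}
```

In this model a connection whose segment size never shrinks ends with an error after a bounded number of attempts, where mutual recursion between the two routines would keep growing the stack.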
Re: panic during work with jailed postgresql8.4
On 1 April 2010 22:18, Oleg Lomaka oleg.lom...@gmail.com wrote:

Hello, I get a kernel panic when I connect to a postgresql8.4 server installed in one jail from another jail. It's 100% reproducible. I have also tried connecting from the host machine to the jailed pg server; that way it works fine without a crash.

The server configuration uses geli and zfs. Four disks are encrypted using geli, and raidz2 uses the ad8.eli, ad10.eli, ad12.eli and ad14.eli providers. All jails are located on this raidz2 pool. I also use ezjail for jail management, and it uses NFS to mount directories with the base system.

Fatal double fault
rip = 0x8063510a
rsp = 0xff80eaec5f50
rbp = 0xff80eaec6040
cpuid = 1; apic id = 02
panic: double fault
cpuid = 1
Uptime: 7m11s
Physical memory: 8169 MB

uname -a
FreeBSD cerberus.regredi.com 8.0-STABLE FreeBSD 8.0-STABLE #7 r206031: Thu Apr 1 13:43:57 EEST 2010 r...@cerberus.regredi.com:/usr/obj/usr/src/sys/GENERIC amd64

Link to dmesg.boot: http://docs.google.com/leaf?id=0B-irbkAqk9i7OGY2ZWJiODgtOWJmMy00NDQ1LTliZDctZjU3N2YwNmMxNjZl&hl=en
Link to kernel core backtrace: http://docs.google.com/Doc?docid=0AeirbkAqk9i7ZGc5Yzc2ZndfM2M4NzYydmRw&hl=en

Looking at the backtrace, I wonder whether tp->t_maxseg changes in tcp_mtudisc() at all. You should be able to extract its value on each 2*n frame in that big recursive call.

-- wbr, pluknet
Re: nmi_calltrap in siopoll()
kern/143521 2010/2/2 pluknet pluk...@gmail.com: Hi. I've got NMI on an almost idle system - FreeBSD 7.2-R amd64. I guess the reason may be in (hardware?) binary garbage seen once in a while on serial port (loader, then ttyd0). Ask me for more details. Tracing command swi4: clock sio pid 20 tid 100011 td 0xff000144e370 cpustop_handler() at cpustop_handler+64 ipi_nmi_handler() at ipi_nmi_handler+48 trap() at trap+796 nmi_calltrap() at nmi_calltrap+8 --- trap 19, rip = 18446744071567390785, rsp = 18446744067267268592, rbp = 18446744067267558272 --- _mtx_lock_spin() at _mtx_lock_spin+113 siopoll() at siopoll+206 ithread_loop() at ithread_loop+384 fork_exit() at fork_exit+287 fork_trampoline() at fork_trampoline+14 --- trap 0, rip = 0, rsp = 18446744067267558704, rbp = 0 --- (kgdb) thr 13 [Switching to thread 13 (Thread 100011)]#0 cpustop_handler () at atomic.h:264 264 atomic.h: No such file or directory. in atomic.h (kgdb) bt #0 cpustop_handler () at atomic.h:264 #1 0x807ec400 in ipi_nmi_handler () at /usr/src/sys/amd64/amd64/mp_machdep.c:1144 #2 0x807fceec in trap (frame=0xfffe80028f40) at /usr/src/sys/amd64/amd64/trap.c:198 #3 0x807e0aeb in nmi_calltrap () at /usr/src/sys/amd64/amd64/exception.S:427 #4 0x80513841 in _mtx_lock_spin (m=0x80bb3d00, tid=18446742974219215728, opts=Variable opts is not available. ) at /usr/src/sys/kern/kern_mutex.c:474 #5 0x8082b96e in siopoll (dummy=Variable dummy is not available. ) at /usr/src/sys/dev/sio/sio.c:1703 #6 0x804ff940 in ithread_loop (arg=0xff000142bac0) at /usr/src/sys/kern/kern_intr.c:1088 #7 0x804fc1df in fork_exit ( callout=0x804ff7c0 ithread_loop, arg=0xff000142bac0, frame=0xfffe8006fc80) at /usr/src/sys/kern/kern_fork.c:834 #8 0x807e0b5e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:455 #9 0x in ?? () #10 0x in ?? () #11 0x0001 in ?? () #12 0x in ?? () #13 0x in ?? () #14 0x in ?? 
() ---Type return to continue, or q return to quit---q Quit (kgdb) f 5 #5 0x8082b96e in siopoll (dummy=Variable dummy is not available. ) at /usr/src/sys/dev/sio/sio.c:1703 1703 mtx_lock_spin(sio_lock); (kgdb) i loc _tid = Variable _tid is not available. (kgdb) list 1698 com_events -= incc; 1699 mtx_unlock_spin(sio_lock); 1700 continue; 1701 } 1702 if (com-iptr != com-ibuf) { 1703 mtx_lock_spin(sio_lock); 1704 sioinput(com); 1705 mtx_unlock_spin(sio_lock); 1706 } 1707 if (com-state CS_CHECKMSR) { (kgdb) p sio_lock $1 = {lock_object = {lo_name = 0x80b15380 sio, lo_type = 0x80b15380 sio, lo_flags = 458752, lo_witness_data = { lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 18446742974219094608, mtx_recurse = 0} (kgdb) p/x sio_lock-mtx_lock $10 = 0xff0001430a50 # == td for pid 17 tid 18 Binary garbage is like below (not touching anything on k/board atm). login: FreeBSD/amd64 (ho FreeBSD/amd64 (host) (ttyd0) login: |Ч FreeBSD FreeBSD FreeBS FreeBSD Free FreeBSD/amd64 (host) (ttyd0) login: FreeBSD/amd64 (host) (ttyd0) login: FreeBSD Free FreeBS FreeBSD FreeBS FreeBSD FreeBSD FreeBS FreeBSD FreeBS FreeBSD FreeBSD FreeBS FreeBSD FreeBS═╗Hхи M5 FreeBSD FreeBS FreeBSD FreeBSD FreeBS FreeBSD FreeBS═╗Hхи M5 FreeBSD FreeBS FreeBSD FreeB FreeBSD/amd6 FreeBS FreeBS FreeBSD FreeBSD FreeBS FreeBSD FreeBS FreeBSD FreeBSD FreeBS FreeBSD FreeBS═╗Hхи M5 FreeBSD FreeBS FreeBSD [..a little later..] [r...@host ~]# 88 8 Other useful stuff. (kgdb) f 4 #4 0x80513841 in _mtx_lock_spin (m=0x80bb3d00, tid=18446742974219215728, opts=Variable opts is not available. ) at /usr/src/sys/kern/kern_mutex.c:474 474 while (m-mtx_lock != MTX_UNOWNED) { (kgdb) list 469 lock_profile_obtain_lock_failed(m-lock_object, contested, waittime); 470 while (!_obtain_lock(m, tid)) { 471 472 /* Give interrupts a chance while we spin. 
*/ 473 spinlock_exit(); 474 while (m-mtx_lock != MTX_UNOWNED) { 475 if (i++ 1000) { 476 cpu_spinwait(); 477 continue; 478 } (kgdb) f 3 #3 0x807e0aeb in nmi_calltrap () at /usr/src/sys/amd64/amd64/exception.S:427 427 call trap Current language: auto; currently asm (kgdb) list 422
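Printing the mtx_lock field of sio_lock, as done in the kgdb session above, identifies the lock holder because a FreeBSD mutex keeps the owning thread pointer in its lock word, with a few low bits used as flags. A minimal userland sketch of decoding such a value follows. The flag values are assumed to mirror sys/mutex.h of that era, and decode_mtx_lock and struct mtx_state are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Assumed to mirror the flag bits in FreeBSD 7.x sys/mutex.h. */
#define MTX_RECURSED  0x00000001u
#define MTX_CONTESTED 0x00000002u
#define MTX_UNOWNED   0x00000004u
#define MTX_FLAGMASK  (MTX_RECURSED | MTX_CONTESTED | MTX_UNOWNED)

/* Hypothetical decoded view of a mutex lock word. */
struct mtx_state {
	uint64_t owner_td;  /* owning thread pointer, 0 if none */
	unsigned flags;     /* low flag bits */
};

/* Split a raw lock word into owner pointer and flag bits. */
static struct mtx_state
decode_mtx_lock(uint64_t word)
{
	struct mtx_state s;

	s.flags = (unsigned)(word & MTX_FLAGMASK);
	s.owner_td = word & ~(uint64_t)MTX_FLAGMASK;
	return (s);
}
```

Under these assumptions the decimal value kgdb printed for sio_lock, 18446742974219094608, decodes to an owner pointer of 0xffffff0001430a50 with no flag bits set, while a lock word of 4 is simply MTX_UNOWNED with no owner.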
nmi_calltrap in siopoll()
= 0xff000142e8f0, le_prev = 0xff00014459d8}, p_pptr = 0x80b64640, p_sibling = {le_next = 0xff000142e8f0, le_prev = 0xff00014459f0}, p_children = {lh_first = 0x0}, p_mtx = { lock_object = {lo_name = 0x808da296 process lock, lo_type = 0x808da296 process lock, lo_flags = 21168128, lo_witness_data = {lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}, mtx_lock = 4, mtx_recurse = 0}, p_ksi = 0x0, p_sigqueue = {sq_signals = { __bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = { tqh_first = 0x0, tqh_last = 0xff000142e5e8}, sq_proc = 0xff000142e478, sq_flags = 1}, p_oppid = 0, ---Type return to continue, or q return to quit--- p_vmspace = 0x80b64e00, p_swtick = 0, p_realtimer = {it_interval = { tv_sec = 0, tv_usec = 0}, it_value = {tv_sec = 0, tv_usec = 0}}, p_ru = { ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, ru_isrss = 0, ru_minflt = 0, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 0, ru_nivcsw = 0}, p_rux = {rux_runtime = 2063660005807206, rux_uticks = 0, rux_sticks = 137401258, rux_iticks = 0, rux_uu = 0, rux_su = 371936150861, rux_tu = 1034416043011}, p_crux = {rux_runtime = 0, rux_uticks = 0, rux_sticks = 0, rux_iticks = 0, rux_uu = 0, rux_su = 0, rux_tu = 0}, p_profthreads = 0, p_exitthreads = 0, p_traceflag = 0, p_tracevp = 0x0, p_tracecred = 0x0, p_textvp = 0x0, p_lock = 0 '\0', p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0x0, p_suspcount = 0, p_xthread = 0x0, p_boundary_count = 0, p_pendingcnt = 0, p_itimers = 0x0, p_numupcalls = 0, p_upsleeps = 0, p_completed = 0x0, p_nextupcall = 0, p_upquantum = 0, p_magic = 3203398350, p_osrel = 702000, p_comm = idle: cpu1\000\000\000\000\000\000\000\000\000, p_pgrp = 0x80b65060, p_sysent = 0x80ad4d80, p_args = 0x0, 
p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_fibnum = 0, p_xstat = 0, p_klist = {kl_list = {slh_first = 0x0}, kl_lock = 0x804f65d0 knlist_mtx_lock, ---Type return to continue, or q return to quit--- kl_unlock = 0x804f5ff0 knlist_mtx_unlock, kl_locked = 0x804f5fd0 knlist_mtx_locked, kl_lockarg = 0xff000142e590}, p_numthreads = 1, p_md = incomplete type, p_itcallout = {c_links = {sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_arg = 0x0, c_func = 0, c_mtx = 0x0, c_flags = 16}, p_acflag = 1, p_peers = 0x0, p_leader = 0xff000142e478, p_emuldata = 0x0, p_label = 0x0, p_sched = 0xff000142e8f0, p_ktr = {stqh_first = 0x0, stqh_last = 0xff000142e8d0}, p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0x0} db show allpcpu Current CPU: 1 cpuid= 0 curthread= 0xff00014306e0: pid 18 idle: cpu0 curpcb = 0xfffe80064d40 fpcurthread = none idlethread = 0xff00014306e0: pid 18 idle: cpu0 cpuid= 1 curthread= 0xff0001430a50: pid 17 idle: cpu1 curpcb = 0xfffe8005fd40 fpcurthread = none idlethread = 0xff0001430a50: pid 17 idle: cpu1 cpuid= 2 curthread= 0xff000143c000: pid 16 idle: cpu2 curpcb = 0xfffe8005ad40 fpcurthread = none idlethread = 0xff000143c000: pid 16 idle: cpu2 cpuid= 3 curthread= 0xff000143c370: pid 15 idle: cpu3 curpcb = 0xfffe80055d40 fpcurthread = none idlethread = 0xff000143c370: pid 15 idle: cpu3 cpuid= 4 curthread= 0xff000143c6e0: pid 14 idle: cpu4 curpcb = 0xfffe80050d40 fpcurthread = none idlethread = 0xff000143c6e0: pid 14 idle: cpu4 cpuid= 5 curthread= 0xff000144e370: pid 20 swi4: clock sio curpcb = 0xfffe8006fd40 fpcurthread = none idlethread = 0xff000142f000: pid 13 idle: cpu5 cpuid= 6 curthread= 0xff000142f370: pid 12 idle: cpu6 curpcb = 0xfffe80046d40 fpcurthread = none idlethread = 0xff000142f370: pid 12 idle: cpu6 cpuid= 7 curthread= 0xff000142f6e0: pid 11 idle: cpu7 curpcb = 0xfffe80041d40 fpcurthread = none idlethread = 0xff000142f6e0: pid 11 idle: cpu7 -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list 
http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: LOR - 8.0-STABLE r202128
2010/1/13 Gardner Bell gbel...@rogers.com: I got this lock order reversal while running a windows executable through wine. I'm guess that is a regression w.r.t S/G pager, which uses kmem_alloc/free with vm_object locked and doesn't respect vm_map locks can sleep. I'm curious it was back order before 5.1-R. vm_object.c 741:/* 742: * Let the pager know object is dead. 743: */ 744:vm_pager_deallocate(object); 745:VM_OBJECT_UNLOCK(object); lock order reversal: 1st 0xc5e757f8 vm object (standard object) @ /usr/src/sys/vm/vm_object.c:482 2nd 0xc1c900e8 system map (system map) @ /usr/src/sys/vm/vm_map.c:2772 KDB: stack backtrace: db_trace_self_wrapper(c07632b0,e955d8a4,c05c7496,c05bbe7f,c076508c,...) at db_trace_self_wrapper+0x26 kdb_backtrace(c05bbe7f,c076508c,c0923520,c48dd278,e955d8fc,...) at kdb_backtrace+0x29 _witness_debugger(c076508c,c1c900e8,c0765236,c48d9110,c077536f,...) at _witness_debugger+0x1e witness_checkorder(c1c900e8,9,c077536f,ad4,0,...) at witness_checkorder+0x697 _mtx_lock_flags(c1c900e8,0,c077536f,ad4,c6089000,...) at _mtx_lock_flags+0x36 _vm_map_lock(c1c9008c,c077536f,ad4,c66437c4,c5e757f8,...) at _vm_map_lock+0x31 vm_map_remove(c1c9008c,c6087000,c6089000,e955d988,c06c0417,...) at vm_map_remove+0x2a kmem_free(c1c9008c,c6087000,2000,e955d9a0,c06c1000,...) at kmem_free+0x30 page_free(c6087000,2000,22,2000,e955d9b8,...) at page_free+0x46 uma_large_free(c66437c4,c4ec09a4,c1c8c014,c5e757f8,e955d9c8,...) at uma_large_free+0x87 free(c6087000,c07aae1c,e955d9e0,c06bb552,c6087000,...) at free+0xb8 sglist_free(c6087000,c598d7e0,0) at sglist_free+0x28 sg_pager_dealloc(c5e757f8,e955da14,c06d1587,c5e757f8,0,...) at sg_pager_dealloc+0x69 vm_pager_deallocate(c5e757f8,0,c0775baa,2dc,0,...) at vm_pager_deallocate+0x1a vm_object_terminate(c5e757f8,0,c0775baa,1e2,e955da40,...) at vm_object_terminate+0x171 vm_object_deallocate(c5e757f8,c077536f,ad7,c5548b40,0,...) at vm_object_deallocate+0x4ae _vm_map_unlock(c513d740,c077536f,ad7,1,c513d740,...) 
at _vm_map_unlock+0x74
vm_map_remove(c513d740,0,bfc0,0,c54e1550,...) at vm_map_remove+0x69
vmspace_exit(c4ec0900,0,c075d9eb,12d,c07a6e7c,...) at vmspace_exit+0xbc
exit1(c4ec0900,f,c0761073,b15,1,...) at exit1+0x4f3
sigexit(c4ec0900,f,c0761073,aa5,e955dcdc,...) at sigexit+0x9f2
postsig(f,64,c07644d3,e8,c54e1550,...) at postsig+0x1b6
ast(e955dd38) at ast+0x308
doreti_ast() at doreti_ast+0x17

-- wbr, pluknet
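For readers unfamiliar with what a "lock order reversal" report means, the idea can be sketched in a few lines. This is a greatly simplified illustration and not how witness(4) is implemented: toy_check_order and the small integer lock ids are hypothetical, and only pairs of locks are tracked.

```c
#define TOY_MAX_LOCKS 8

/* seen_before[a][b] != 0 means lock a was once held while taking lock b. */
static int seen_before[TOY_MAX_LOCKS][TOY_MAX_LOCKS];

/*
 * Record that lock `next` is being acquired while `held` is already held.
 * Returns 1 if the opposite order was recorded earlier (a reversal),
 * 0 otherwise. Lock ids must be in [0, TOY_MAX_LOCKS).
 */
static int
toy_check_order(int held, int next)
{
	if (seen_before[next][held])
		return (1);
	seen_before[held][next] = 1;
	return (0);
}
```

In the report above the two locks would be the "system map" and the "vm object": once one order has been observed, taking them the other way round (here, freeing memory while the vm object is still locked) is flagged.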
Re: driver bug: Unable to set devclass (devname: (null))
2010/1/8 martinko gam...@users.sf.net:

Hi, This is 8.0-RELEASE-p2 and I see the following during boot:

Jan 8 09:38:46 mb-aw1n-bsd kernel: Timecounter TSC frequency 1993542975 Hz quality 800
Jan 8 09:38:46 mb-aw1n-bsd kernel: Timecounters tick every 1.000 msec
Jan 8 09:38:46 mb-aw1n-bsd kernel: firewire0: 1 nodes, maxhop = 0 cable IRM irm(0) (me)
Jan 8 09:38:46 mb-aw1n-bsd kernel: firewire0: bus manager 0
Jan 8 09:38:46 mb-aw1n-bsd kernel: usbus0: 12Mbps Full Speed USB v1.0
Jan 8 09:38:46 mb-aw1n-bsd kernel: usbus1: 12Mbps Full Speed USB v1.0
Jan 8 09:38:46 mb-aw1n-bsd kernel: usbus2: 12Mbps Full Speed USB v1.0
Jan 8 09:38:46 mb-aw1n-bsd kernel: usbus3: 480Mbps High Speed USB v2.0
Jan 8 09:38:46 mb-aw1n-bsd kernel: driver bug: Unable to set devclass (devname: (null))
^^-- ???
Jan 8 09:38:46 mb-aw1n-bsd kernel: ad0: 76319MB HTS548080M9AT00 MG4OA53A at ata0-master UDMA100

I'm not sure which driver spits it out -- perhaps new USB stack ?

There was a thread which might be helpful to identify a buggy driver: http://lists.freebsd.org/pipermail/freebsd-current/2009-March/004272.html

-- wbr, pluknet
Re: pkg_add thinks that -STABLE is -RELEASE (wrong __FreeBSD_version?)
2010/1/2 Paride Legovini p...@ninthfloor.org:

Hi, I'm running 8.0-STABLE and I noticed that by default pkg_add -r uses packages-8.0-release/Latest/ as PACKAGESITE. I read in the handbook[1] that pkg_add should use packages-5-stable/Latest/ as PACKAGESITE when one is running -STABLE.

I took a look at the source code, and noticed that pkg_add considers the system -STABLE if getosreldate() (i.e. __FreeBSD_version) is between 800500 and 899000 (see src/usr.sbin/pkg_install/add/main.c:92). However, in RELENG_8, __FreeBSD_version is set to 800108. You can check it via cvsweb[2]. Sounds like something is wrong. Am I missing something? Do I just have to wait for the __FreeBSD_version to be bumped?

Hi. I'm afraid that's because __FreeBSD_version wasn't bumped from 800107 to 800500 in RELENG_8 just after RELENG_8_0 was created (with respect to the change of scheme for the RELENG_8 timeframe, where the current/stable border moved to 800500: http://lists.freebsd.org/pipermail/svn-src-head/2009-June/007830.html). It then continued as is (still OK), and was eventually incremented to 800108 (which is wrong, though I hope it can still be safely corrected).

-- wbr, pluknet
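The check Paride describes can be sketched as a simple range test. This illustrates the behaviour described in the message and is not the actual code in pkg_install/add/main.c: is_stable_8 and package_dir are hypothetical names, the bounds 800500 and 899000 are the ones quoted above, treating them as inclusive is an assumption, and "packages-8-stable" is used only as an illustrative directory name.

```c
/* Bounds quoted in the message above; inclusiveness is an assumption. */
#define STABLE8_LOW  800500
#define STABLE8_HIGH 899000

/* Returns nonzero if osreldate falls into the 8-STABLE range. */
static int
is_stable_8(int osreldate)
{
	return (osreldate >= STABLE8_LOW && osreldate <= STABLE8_HIGH);
}

/* Pick an illustrative package directory for the given osreldate. */
static const char *
package_dir(int osreldate)
{
	return (is_stable_8(osreldate) ? "packages-8-stable"
	    : "packages-8.0-release");
}
```

With __FreeBSD_version at 800108 the range test fails, which matches the observation that pkg_add -r falls back to the release package directory on RELENG_8.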
ahcich0 issues in VB
Hi. There are two issues on RELENG_8 running in VirtualBox with ahci_load=YES. Transcribed manually (btw, does somebody know how I can get this via the clipboard?).

1. Slightly older RELENG_8 (from 27 Nov or so).

ahcich0: Timeout on slot 22
ahcich0: is cs 003c ss rs tfd 50 serr

This happens during a portsnap update; then most processes get stuck in [getblk]. Jumping to ddb manually, bt:

sched_switch
mi_switch
sleepq_switch
sleepq_wait
__lockmgr_args
getblk
breadn
bread
ffs_update
ufs_inactive
VOP_INACTIVE_APV
vinactive
vrele
fdfree
exit1
sys_exit
syscall

2. RELENG_8 csuped from yesterday: the problem appears during boot.

[...]
Waiting 5 seconds for SCSI devices to settle
em0: link state changed to UP
ahcich0: Poll error on slot 0, TFD: 0131
ROOT MOUNT ERROR:
If you have valid mount options, reboot and first try the following from the loader prompt:
set vfs.root.mountfrom.options=rw
[...the rest is as usual when the kernel can't find a boot disk..]

-- wbr, pluknet
Re: route(8) and show/sticky/... Re: 8.0-RELEASE completed...
2009/12/3 John Baldwin j...@freebsd.org:

On Wednesday 02 December 2009 7:02:27 pm Miroslav Lachman wrote:

Kurt Jaeger wrote:

Hi! Just a quick note in case there are people here who aren't subscribed to the freebsd-announce@ mailing list. We have completed the 8.0-RELEASE cycle. Details about the release are available from the main web site, in particular the announcement itself is available here: http://www.freebsd.org/releases/8.0R/announce.html

Thanks! One question: http://www.freebsd.org/releases/8.0R/relnotes-detailed.html says:

--
The route(8) utility now supports show, weights, and sticky commands. For more details, see the route(8) manual page.
--

I do not have those things in my man page or route(8) command ?

I have one more question about relnotes-detailed.html

---
Specific CPU binding by using cpuset(1) has been implemented. Note that the current implementation allows the superuser inside of the jail to change the CPU bindings specified.
---

Is it true? I don't have 8.0-RELEASE installed, but I think it was fixed in 7-STABLE right after 7.2-RELEASE (PR kern/134050 was reported by me).

I believe it is fixed in 8.0.

This is what is in the BUGS section of the cpuset(1) manpage in 7.2-RELEASE, and it is no longer there (fixed) in 8.0-RELEASE. It looks like it was left here by accident, since it was fixed in April in HEAD and MFC'ed in August to 7, after 7.2. The practice has been to mention such a misdescription on the Errata page (e.g. see the errata for 7.1).

-- wbr, pluknet
Re: pthread.h: typo in #define pthread_cleanup_push/pthread_cleanup_pop
2009/11/24 Mikolaj Golub to.my.troc...@gmail.com:

Hi, I have problems with compiling our application under 8.0. It fails due to these definitions in pthread.h that look like a typo or incorrectly applied patch:

170 #define pthread_cleanup_push(cleanup_routine, cleanup_arg) \
171 { \
172 struct _pthread_cleanup_info __cleanup_info__; \
173 __pthread_cleanup_push_imp(cleanup_routine, cleanup_arg,\
174 &__cleanup_info__); \
175 {
176
177 #define pthread_cleanup_pop(execute) \
178 } \
179 __pthread_cleanup_pop_imp(execute); \
180 }

Hi. No, this is done intentionally.

P.S. I don't understand the reason for the second pair of braces though (lines 175, 178); maybe they are there because of the comment to v1.43..

-- wbr, pluknet
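The unbalanced braces are what force the two macros to be used as a matched pair within the same block, which POSIX explicitly allows an implementation to require. A stripped-down sketch of the same pattern follows. It uses hypothetical TOY_* names rather than the real pthread.h internals, and it runs the handler directly instead of registering it with the threading library.

```c
/* Hypothetical bookkeeping record; not struct _pthread_cleanup_info. */
struct toy_cleanup {
	void (*routine)(void *);
	void *arg;
};

/* Opens a block and declares the bookkeeping variable inside it. */
#define TOY_CLEANUP_PUSH(fn, a)					\
	{							\
		struct toy_cleanup toy_info__ = { (fn), (a) };	\
		{

/* Closes the inner block, optionally runs the handler, closes the outer. */
#define TOY_CLEANUP_POP(execute)				\
		}						\
		if (execute)					\
			toy_info__.routine(toy_info__.arg);	\
	}

/* Example handler: increments the int that arg points to. */
static void
count_call(void *arg)
{
	(*(int *)arg)++;
}

/* Returns how many times the handler ran. */
static int
toy_demo(void)
{
	int calls = 0;

	TOY_CLEANUP_PUSH(count_call, &calls)
	/* user code between push and pop lives in the inner block */
	TOY_CLEANUP_POP(1)

	TOY_CLEANUP_PUSH(count_call, &calls)
	TOY_CLEANUP_POP(0)

	return (calls);
}
```

Using the push macro without a matching pop in the same scope leaves a brace open and fails to compile, which is the kind of error an application sees if it calls the two from different functions or blocks.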
Re: [geom] page fault in g_mbr_config()
2009/7/24 pluknet pluk...@gmail.com: 2009/7/24 pluknet pluk...@gmail.com: Hi. I got a panic while performing a repetitive 'fdisk -BI aacd0', where aacd0 is a disk on aac0: IBM ServeRAID-8k. This means that the command was issued after filesystems were already created on aacd (after the first fdisk -BI aacd0 iteration), and are in umount'ed state. This is on 7.2-R, amd64. Config is a GENERIC plus ddb, carp, ipfw, quota. Fatal trap 12: page fault while in kernel mode cpuid = 5; apic id = 05 fault virtual address = 0x20 fault code = supervisor read data, page not present instruction pointer = 0x8:0x804cc554 stack pointer = 0x10:0xfffe80079b80 frame pointer = 0x10:0xfffe80079bd0 code segment = base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 2 (g_event) [thread pid 2 tid 100013 ] Stopped at g_mbr_config+0x64: movq 0x20(%rax),%r15 db bt Tracing pid 2 tid 100013 td 0xff000144da50 g_mbr_config() at g_mbr_config+0x64 g_run_events() at g_run_events+0x1b8 g_event_procbody() at g_event_procbody+0x57 fork_exit() at fork_exit+0x11f fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xfffe80079d30, rbp = 0 --- And, of course... 
db> show proc 818
Process 818 (fdisk) at 0xff0004ed1000:
 state: NORMAL
 uid: 0 gids: 0, 0, 5
 parent: pid 814 at 0xff00045c0478
 ABI: FreeBSD ELF64
 arguments: fdisk
 threads: 1
100169 D g_waitfo 0xff0004ec9100 fdisk
db> bt 818
Tracing pid 818 tid 100169 td 0xff0004fbf6e0
sched_switch() at sched_switch+0x1fe
mi_switch() at mi_switch+0x18e
sleepq_timedwait() at sleepq_timedwait+0x31
_sleep() at _sleep+0x354
g_waitfor_event() at g_waitfor_event+0x9a
g_ctl_ioctl() at g_ctl_ioctl+0x2df
giant_ioctl() at giant_ioctl+0x7d
devfs_ioctl_f() at devfs_ioctl_f+0x77
kern_ioctl() at kern_ioctl+0xa2
ioctl() at ioctl+0xf9
syscall() at syscall+0x256
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x8008200ec, rsp = 0x7fffe1d8, rbp = 0x4 ---

This made me try the GEOM_* -> GEOM_PART_* switch (GEOM_PART_* is in DEFAULTS in 8+ now). After this replacement in DEFAULTS I see no more panics in 'fdisk -BI aacd0'.

-- wbr, pluknet
Re: mfi(4) endless loop kernel output on attach
2009/11/23 Scott Long sco...@samsco.org:

Did you ever get a resolution for this? The 6.x bugs definitely need to be fixed. The reported timeouts on 7.x might be due to the adapter taking a long time to do the command. Scott

The endless loop of kernel output on boot was always solved by clearing the logs with MegaCli -AdpEventLog -GetEventLogInfo -aAll.

As for the BBULearn issue, I just tested it again on one of my boxes:

# ./MegaCli -AdpBbuCmd -BbuLearn -aall
Adapter 0: BBU Learn Succeeded.
Exit Code: 0x00

So it seems to work. I see no problems.

mfi0: 3748 (312279437s/0x0008/info) - Battery relearn started
mfi0: 3749 (312279437s/0x0008/WARN) - BBU disabled; changing WB virtual disks to WT
mfi0: 3750 (312279437s/0x0001/info) - Policy change on VD 00/0 to [ID=00,dcp=6d,ccp=6c,ap=0,dc=0,dbgi=0] from [ID=00,dcp=6d,ccp=6d,ap=0,dc=0,dbgi=0]
mfi0: 3751 (312279442s/0x0008/info) - Battery is discharging
mfi0: 3752 (312279442s/0x0008/info) - Battery relearn in progress

On Oct 22, 2009, at 8:30 AM, pluknet wrote:

2009/10/15 John Baldwin j...@freebsd.org:

On Thursday 15 October 2009 5:51:19 am pluknet wrote:

Hi. This is 7.2-R. Seen on IBM x3650M2. During boot I get those endlessly looping kernel messages in the mfi(4) attach phase. It's all the more odd since 7.2 booted and worked fine on exactly this server model months ago (on a different box though).. Any hints?

We just had some boxes die like this (but spewing a different loop of messages on boot related to continuously scheduling patrol reads and consistency checks that finished immediately) at work. We fixed them by swapping out the controller. We might try sticking them in a different box and reflashing them using mfiutil(8) to see if it's some sort of corrupted state that flashing the adapter fixes. In your case it looks like the firmware keeps crashing and restarting.

Some more thoughts.. There was a problem I got with the 'MegaCli -AdpBbuCmd -BbuLearn -aall' command.
On 6.2-R process slept on mfiwait wchan: db bt 14734 Tracing pid 14734 tid 100135 td 0xc93f8190 sched_switch(c93f8190,0,1) at sched_switch+0x143 mi_switch(1,0,c93f8190,f9a32acc,c06a43a4,...) at mi_switch+0x1ba sleepq_switch(c8c6b0d0) at sleepq_switch+0x87 sleepq_wait(c8c6b0d0,0,c93f8190,c8c6b0d0,c8c25800,...) at sleepq_wait+0x5c msleep(c8c6b0d0,c8c25954,4c,c090acbc,0) at msleep+0x269 mfi_wait_command(c8c25800,c8c6b0d0,0,0,cc382460,...) at mfi_wait_command+0xa8 mfi_ioctl(c8c31300,c1144d01,cc870a00,1,c93f8190,...) at mfi_ioctl+0x485 devfs_ioctl_f(c90a2750,c1144d01,cc870a00,c9048000,c93f8190) at devfs_ioctl_f+0xaf ioctl(c93f8190,f9a32d04) at ioctl+0x445 syscall(3b,3b,3b,0,bfbfedc0,...) at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x8177207, esp = 0xbfbfe88c, ebp = 0xbfbfe8b8 --- Then: mfi0: COMMAND 0xc8c6b0d0 TIMEOUT AFTER 51 SECONDS mfi0: COMMAND 0xc8c61d50 TIMEOUT AFTER 49 SECONDS mfi0: COMMAND 0xc8c61850 TIMEOUT AFTER 49 SECONDS On 6.4-R MegaCli throws a page fault due to NULL deref in mfi_data_cb():cm-cm_sg (see below). There was past 6.4 backport mentioning fix some bugs in the API for the management ioctl. With this patch I have no longer panic and/or locks. Thanks to LSI now on 7.2-R (and on patched 6.4-R) it returns an error: # ./MegaCli -AdpBbuCmd -BbuLearn -aall Adapter 0: BBU Learn Failed Exit Code: 0x32 db bt Tracing pid 43059 tid 101363 td 0xcf46e680 mfi_data_cb(c9cfae00,c9cc3e00,1,0) at mfi_data_cb+0x5e bus_dmamap_load(c9cd7c80,0,caf86270,0,c0597240,c9cfae00,0) at bus_dmamap_load+0x4a1 mfi_mapcmd(c9cc3800,c9cfae00) at mfi_mapcmd+0x31 mfi_startio(c9cc3800) at mfi_startio+0x9b mfi_wait_command(c9cc3800,c9cfae00,0,0,caf86270,...) at mfi_wait_command+0x89 mfi_ioctl(c9cf7200,c1144d01,d3fb6200,1,cf46e680,...) at mfi_ioctl+0x52a devfs_ioctl_f(d1a551b0,c1144d01,d3fb6200,cbf52c80,cf46e680) at devfs_ioctl_f+0xaf ioctl(cf46e680,fbd91d04) at ioctl+0x445 syscall(3b,3b,3b,0,bfbfedc0,...) 
at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x8177207, esp = 0xbfbfe88c, ebp = 0xbfbfe8b8 #9 0xc08cbb1a in calltrap () at /usr/src/sys/i386/i386/exception.s:139 #10 0xc059729e in mfi_data_cb (arg=0xc8a744b0, segs=0xc8a49e00, nsegs=1, ---Type return to continue, or q return to quit--- error=0) at /usr/src/sys/dev/mfi/mfi.c:1488 #11 0xc08c7afd in bus_dmamap_load (dmat=0xc8a6f100, map=0xac89e000, buf=0xc8a5ac60, buflen=0, callback=0xc0597240 mfi_data_cb, callback_arg=0xc8a744b0, flags=0) at /usr/src/sys/i386/i386/busdma_machdep.c:733 #12 0xc059721d in mfi_mapcmd (sc=0xc8a49800, cm=0xc8a49e00) at /usr/src/sys/dev/mfi/mfi.c:1452 #13 0xc0597177 in mfi_startio (sc=0xc8a49800) at /usr/src/sys/dev/mfi/mfi.c:1436 #14 0xc0595f09 in mfi_wait_command (sc=0xc8a49800, cm=0xc8a744b0) at /usr/src/sys/dev/mfi/mfi.c:822 #15 0xc059840a in mfi_ioctl (dev=0xac89e000, cmd=0, arg=0xc8de8800 , flag
Re: 7.2 dies in zfs
2009/11/21 Peter Jeremy peterjer...@acm.org:

On 2009-Nov-21 09:47:56 +0900, Randy Bush ra...@psg.com wrote: imiho, zfs can not be called production ready if it crashes if you do not stand on your left leg, put your right hand in the air, and burn some eye of newt.

FWIW, it's still very brittle on Solaris 10 and the Sun Support response to most issues is "restore from backup". IMHO, the biggest issue with ZFS itself is the lack of recovery tools prior to PSARC 2009/479 (in ZFS v21).

On 2009-Nov-21 11:36:43 -0800, Jeremy Chadwick free...@jdc.parodius.com wrote: RELENG_7 uses ZFS v13, RELENG_8 uses ZFS v18.

Not in my repository. I still have v13 in sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h in last night's RELENG_7, RELENG_8 and -current. The good side of things is that there's ongoing work on v13 -> v22 in perforce.

RELENG_7 and RELENG_8 both, more or less, behave the same way with regards to ZFS. Both panic on kmem exhaustion. No one has answered my question as far as what's needed to stabilise ZFS on either 7.x or 8.x.

My understanding is that the problem is more that the FreeBSD VM system doesn't gracefully handle running low or out of memory.

AFAIU kmacy works on zfs integration into FreeBSD'ish buf/vm. It'd be nice to read something on that..

--
wbr, pluknet
Re: [releng_7 tinderbox] failure on i386/pc98
2009/10/30 FreeBSD Tinderbox tinder...@freebsd.org: TB --- 2009-10-30 07:41:34 - tinderbox 2.6 running on freebsd-stable.sentex.ca TB --- 2009-10-30 07:41:34 - starting RELENG_7 tinderbox run for i386/pc98 TB --- 2009-10-30 07:41:34 - cleaning the object tree TB --- 2009-10-30 07:42:00 - cvsupping the source tree TB --- 2009-10-30 07:42:00 - /usr/bin/csup -z -r 3 -g -L 1 -h localhost -s /tinderbox/RELENG_7/i386/pc98/supfile TB --- 2009-10-30 07:42:12 - building world TB --- 2009-10-30 07:42:12 - MAKEOBJDIRPREFIX=/obj TB --- 2009-10-30 07:42:12 - PATH=/usr/bin:/usr/sbin:/bin:/sbin TB --- 2009-10-30 07:42:12 - TARGET=pc98 TB --- 2009-10-30 07:42:12 - TARGET_ARCH=i386 TB --- 2009-10-30 07:42:12 - TZ=UTC TB --- 2009-10-30 07:42:12 - __MAKE_CONF=/dev/null TB --- 2009-10-30 07:42:12 - cd /src TB --- 2009-10-30 07:42:12 - /usr/bin/make -B buildworld World build started on Fri Oct 30 07:42:13 UTC 2009 Rebuilding the temporary build tree stage 1.1: legacy release compatibility shims stage 1.2: bootstrap tools stage 2.1: cleaning up the object tree stage 2.2: rebuilding the object tree stage 2.3: build tools stage 3: cross tools stage 4.1: building includes stage 4.2: building libraries stage 4.3: make dependencies stage 4.4: building everything World build completed on Fri Oct 30 08:46:35 UTC 2009 TB --- 2009-10-30 08:46:35 - generating LINT kernel config TB --- 2009-10-30 08:46:35 - cd /src/sys/pc98/conf TB --- 2009-10-30 08:46:35 - /usr/bin/make -B LINT TB --- 2009-10-30 08:46:36 - building LINT kernel TB --- 2009-10-30 08:46:36 - MAKEOBJDIRPREFIX=/obj TB --- 2009-10-30 08:46:36 - PATH=/usr/bin:/usr/sbin:/bin:/sbin TB --- 2009-10-30 08:46:36 - TARGET=pc98 TB --- 2009-10-30 08:46:36 - TARGET_ARCH=i386 TB --- 2009-10-30 08:46:36 - TZ=UTC TB --- 2009-10-30 08:46:36 - __MAKE_CONF=/dev/null TB --- 2009-10-30 08:46:36 - cd /src TB --- 2009-10-30 08:46:36 - /usr/bin/make -B buildkernel KERNCONF=LINT Kernel build for LINT started on Fri Oct 30 08:46:36 UTC 2009 stage 1: 
configuring the kernel stage 2.1: cleaning up the object tree stage 2.2: rebuilding the object tree stage 2.3: build tools stage 3.1: making dependencies [...] awk -f /src/sys/tools/makeobjops.awk /src/sys/opencrypto/cryptodev_if.m -h awk -f /src/sys/tools/makeobjops.awk /src/sys/pci/agp_if.m -h awk -f /src/sys/tools/makeobjops.awk /src/sys/pc98/pc98/canbus_if.m -h rm -f .newdep /usr/bin/make -V CFILES -V SYSTEM_CFILES -V GEN_CFILES | MKDEP_CPP=cc -E CC=cc xargs mkdep -a -f .newdep -O2 -pipe -fno-strict-aliasing -std=c99 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -nostdinc -I. -I/src/sys -I/src/sys/contrib/altq -I/src/sys/contrib/ipfilter -I/src/sys/contrib/pf -I/src/sys/dev/ath -I/src/sys/dev/ath/ath_hal -I/src/sys/contrib/ngatm -I/src/sys/dev/twa -I/src/sys/gnu/fs/xfs/FreeBSD -I/src/sys/gnu/fs/xfs/FreeBSD/support -I/src/sys/gnu/fs/xfs -I/src/sys/contrib/opensolaris/compat -I/src/sys/dev/cxgb -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common -finline-limit=8000 --param inline-unit-growth=100 --param large-function-growth=1000 -DGPROF -falign-functions=16 -DGPROF4 -DGUPROF -fno-builtin -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -ffreestand ing /src/sys/i386/i386/mp_machdep.c:76:25: error: machine/mca.h: No such file or directory /src/sys/i386/i386/trap.c:93:25: error: machine/mca.h: No such file or directory mkdep: compile failed MFC r192106 ? -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: mfi(4) endless loop kernel output on attach
2009/10/15 John Baldwin j...@freebsd.org: On Thursday 15 October 2009 5:51:19 am pluknet wrote: Hi. This is 7.2-R. Seen on IBM x3650M2. During the boot I get those endless looping kernel messages while on mfi(4) attach phase. It's getting more odd since 7.2 booted and worked fine on exactly this server model months ago (on different box though).. Any hints? We just had some boxes die like this (but spewing a different loop of messages on boot related to continuously scheduling patrol reads and consistency checks that finished immediately) at work. We fixed them by swapping out the controller. We might try stick them in a different box and reflashing them using mfiutil(8) to see if it's some sort of corrupted state that flashing the adapter fixes. In your case it looks lik the firmware keeps crashing and restarting. Some more thoughts.. There was a problem I got with 'MegaCli -AdpBbuCmd -BbuLearn -aall' command. On 6.2-R process slept on mfiwait wchan: db bt 14734 Tracing pid 14734 tid 100135 td 0xc93f8190 sched_switch(c93f8190,0,1) at sched_switch+0x143 mi_switch(1,0,c93f8190,f9a32acc,c06a43a4,...) at mi_switch+0x1ba sleepq_switch(c8c6b0d0) at sleepq_switch+0x87 sleepq_wait(c8c6b0d0,0,c93f8190,c8c6b0d0,c8c25800,...) at sleepq_wait+0x5c msleep(c8c6b0d0,c8c25954,4c,c090acbc,0) at msleep+0x269 mfi_wait_command(c8c25800,c8c6b0d0,0,0,cc382460,...) at mfi_wait_command+0xa8 mfi_ioctl(c8c31300,c1144d01,cc870a00,1,c93f8190,...) at mfi_ioctl+0x485 devfs_ioctl_f(c90a2750,c1144d01,cc870a00,c9048000,c93f8190) at devfs_ioctl_f+0xaf ioctl(c93f8190,f9a32d04) at ioctl+0x445 syscall(3b,3b,3b,0,bfbfedc0,...) 
at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x8177207, esp = 0xbfbfe88c, ebp = 0xbfbfe8b8 --- Then: mfi0: COMMAND 0xc8c6b0d0 TIMEOUT AFTER 51 SECONDS mfi0: COMMAND 0xc8c61d50 TIMEOUT AFTER 49 SECONDS mfi0: COMMAND 0xc8c61850 TIMEOUT AFTER 49 SECONDS On 6.4-R MegaCli throws a page fault due to NULL deref in mfi_data_cb():cm-cm_sg (see below). There was past 6.4 backport mentioning fix some bugs in the API for the management ioctl. With this patch I have no longer panic and/or locks. Thanks to LSI now on 7.2-R (and on patched 6.4-R) it returns an error: # ./MegaCli -AdpBbuCmd -BbuLearn -aall Adapter 0: BBU Learn Failed Exit Code: 0x32 db bt Tracing pid 43059 tid 101363 td 0xcf46e680 mfi_data_cb(c9cfae00,c9cc3e00,1,0) at mfi_data_cb+0x5e bus_dmamap_load(c9cd7c80,0,caf86270,0,c0597240,c9cfae00,0) at bus_dmamap_load+0x4a1 mfi_mapcmd(c9cc3800,c9cfae00) at mfi_mapcmd+0x31 mfi_startio(c9cc3800) at mfi_startio+0x9b mfi_wait_command(c9cc3800,c9cfae00,0,0,caf86270,...) at mfi_wait_command+0x89 mfi_ioctl(c9cf7200,c1144d01,d3fb6200,1,cf46e680,...) at mfi_ioctl+0x52a devfs_ioctl_f(d1a551b0,c1144d01,d3fb6200,cbf52c80,cf46e680) at devfs_ioctl_f+0xaf ioctl(cf46e680,fbd91d04) at ioctl+0x445 syscall(3b,3b,3b,0,bfbfedc0,...) 
at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (54, FreeBSD ELF32, ioctl), eip = 0x8177207, esp = 0xbfbfe88c, ebp = 0xbfbfe8b8 #9 0xc08cbb1a in calltrap () at /usr/src/sys/i386/i386/exception.s:139 #10 0xc059729e in mfi_data_cb (arg=0xc8a744b0, segs=0xc8a49e00, nsegs=1, ---Type return to continue, or q return to quit--- error=0) at /usr/src/sys/dev/mfi/mfi.c:1488 #11 0xc08c7afd in bus_dmamap_load (dmat=0xc8a6f100, map=0xac89e000, buf=0xc8a5ac60, buflen=0, callback=0xc0597240 mfi_data_cb, callback_arg=0xc8a744b0, flags=0) at /usr/src/sys/i386/i386/busdma_machdep.c:733 #12 0xc059721d in mfi_mapcmd (sc=0xc8a49800, cm=0xc8a49e00) at /usr/src/sys/dev/mfi/mfi.c:1452 #13 0xc0597177 in mfi_startio (sc=0xc8a49800) at /usr/src/sys/dev/mfi/mfi.c:1436 #14 0xc0595f09 in mfi_wait_command (sc=0xc8a49800, cm=0xc8a744b0) at /usr/src/sys/dev/mfi/mfi.c:822 #15 0xc059840a in mfi_ioctl (dev=0xac89e000, cmd=0, arg=0xc8de8800 , flag=1, td=0xc8a5ac60) at /usr/src/sys/dev/mfi/mfi.c:2061 #16 0xc06598b7 in devfs_ioctl_f (fp=0xc902dc18, com=3239333121, data=0xc8de8800, cred=0xc9052980, td=0xc8e2dd00) at /usr/src/sys/fs/devfs/devfs_vnops.c:480 #17 0xc06d3a11 in ioctl (td=0xc8e2dd00, uap=0xeb37bd04) at file.h:265 (kgdb) f 10 #10 0xc059729e in mfi_data_cb (arg=0xc8a744b0, segs=0xc8a49e00, nsegs=1, error=0) at /usr/src/sys/dev/mfi/mfi.c:1488 1488sgl-sg32[i].addr = segs[i].ds_addr; (kgdb) list 1483return; 1484} 1485 1486if ((sc-mfi_flags MFI_FLAGS_SG64) == 0) { 1487for (i = 0; i nsegs; i++) { 1488sgl-sg32[i].addr = segs[i].ds_addr; 1489sgl-sg32[i].len = segs[i].ds_len; 1490} 1491} else { 1492for (i = 0; i nsegs; i++) { (kgdb) p i $1 = 0 (kgdb) p *segs $3 = {ds_addr = 2457600, ds_len = 65536} (kgdb) p sgl $4 = (union mfi_sgl *) 0x0 (kgdb) p *cm $6 = {cm_link = {tqe_next = 0x0, tqe_prev
8.0RC1: panic: softdep_deallocate_dependencies: unrecovered I/O error
Hi. I saw this panic just onсe during system shutdown, which runs under VirtualBox. Sorry, I don't remember more details. Unread portion of the kernel message buffer: ad0: FAILURE - device detached g_vfs_done():ad0s1e[WRITE(offset=7380582400, length=131072)]error = 6 /usr: got error 6 while accessing filesystem panic: softdep_deallocate_dependencies: unrecovered I/O error cpuid = 0 Uptime: 3h14m31s Physical memory: 1011 MB Dumping 160 MB: 145 129 113 97 81 65 49 33 17 1 #0 doadump () at pcpu.h:246 #1 0xc08827e7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:416 #2 0xc0882ad9 in panic (fmt=Variable fmt is not available. ) at /usr/src/sys/kern/kern_shutdown.c:579 #3 0xc0aad2cd in softdep_deallocate_dependencies (bp=Variable bp is not avail able. ) at /usr/src/sys/ufs/ffs/ffs_softdep.c:6367 #4 0xc08f2cd0 in brelse (bp=0xd8198180) at buf.h:418 #5 0xc08f5646 in bufdone_finish (bp=0xd8198180) at /usr/src/sys/kern/vfs_bio.c:3401 #6 0xc08f56bd in bufdone (bp=0xd8198180) at /usr/src/sys/kern/vfs_bio.c:3261 #7 0xc08f91d9 in cluster_callback (bp=0xd80dda30) at /usr/src/sys/kern/vfs_cluster.c:551 #8 0xc08f56a7 in bufdone (bp=0xd80dda30) at /usr/src/sys/kern/vfs_bio.c:3255 #9 0xc0823765 in g_vfs_done (bip=0xc5647dac) at /usr/src/sys/geom/geom_vfs.c:97 #10 0xc08efc69 in biodone (bp=0xc5647dac) at /usr/src/sys/kern/vfs_bio.c:3096 #11 0xc0820c0f in g_io_schedule_up (tp=0xc414a000) at /usr/src/sys/geom/geom_io.c:669 #12 0xc0820fc8 in g_up_procbody () at /usr/src/sys/geom/geom_kern.c:95 #13 0xc0858571 in fork_exit (callout=0xc0820f60 g_up_procbody, arg=0x0, frame=0xc3ed2d38) at /usr/src/sys/kern/kern_fork.c:843 #14 0xc0b97350 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:270 (kgdb) f 3 #3 0xc0aad2cd in softdep_deallocate_dependencies (bp=Variable bp is not avail able. 
) at /usr/src/sys/ufs/ffs/ffs_softdep.c:6367
6367		panic("softdep_deallocate_dependencies: unrecovered I/O error");
(kgdb) list
6362	{
6363
6364		if ((bp->b_ioflags & BIO_ERROR) == 0)
6365			panic("softdep_deallocate_dependencies: dangling deps");
6366		softdep_error(bp->b_vp->v_mount->mnt_stat.f_mntonname, bp->b_error);
6367		panic("softdep_deallocate_dependencies: unrecovered I/O error");
6368	}
6369
6370	/*
6371	 * Function to handle asynchronous write errors in the filesystem.

--
wbr, pluknet
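The "error = 6" in the g_vfs_done() message of this report is an errno value. As an aside that is not part of the original report, here is a minimal Python sketch for turning such a number into its symbolic name; the helper name errno_name is made up for illustration.

```python
import errno
import os


def errno_name(code: int) -> str:
    """Return the symbolic errno name for a numeric code, or 'UNKNOWN'."""
    return errno.errorcode.get(code, "UNKNOWN")


if __name__ == "__main__":
    code = 6  # value taken from "error = 6" in the panic message
    # The descriptive text differs between operating systems,
    # so it is only printed here and not relied upon.
    print(code, errno_name(code), os.strerror(code))
```

Errno 6 is ENXIO, which FreeBSD describes as "Device not configured"; that fits the "ad0: FAILURE - device detached" line printed just before the panic.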
mfi(4) endless loop kernel output on attach
: 5650 (boot + 89s/0x0002/info) - PD 0d(e0xff/s13) FRU is 42D0422 mfi0: 5651 (boot + 89s/0x0002/info) - Inserted: PD 0e(e0xff/s14) mfi0: 5652 (boot + 89s/0x0002/info) - Inserted: PD 0e(e0xff/s14) Info: enclPd=, scsiType=0, portMap=06, sasAddr=5000c500138d6839, mfi0: 5653 (boot + 89s/0x0002/info) - PD 0e(e0xff/s14) FRU is 42D0422 mfi0: 5654 (boot + 89s/0x0002/info) - Inserted: PD 0f(e0xff/s15) mfi0: 5655 (boot + 89s/0x0002/info) - Inserted: PD 0f(e0xff/s15) Info: enclPd=, scsiType=0, portMap=07, sasAddr=5000c500138d5e8d, mfi0: 5656 (boot + 89s/0x0002/info) - PD 0f(e0xff/s15) FRU is 42D0422 mfi0: 5657 (boot + 144s/0x0008/info) - Battery temperature is normal mfi0: 5658 (308394231s/0x0020/info) - Time established as 10/09/09 9:03:51; (262 seconds since power on) mfi0: 5659 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 0060/1000/0364/1014) mfi0: 5660 (boot + 3s/0x0020/info) - Firmware version 1.40.62-0691 mfi0: 5661 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 0060/1000/0364/1014) mfi0: 5662 (boot + 3s/0x0020/info) - Firmware version 1.40.62-0691 mfi0: 5663 (boot + 64s/0x0008/info) - Battery Present mfi0: 5664 (boot + 64s/0x0020/info) - BBU FRU is mfi0: 5665 (boot + 64s/0x0020/info) - Board Revision mfi0: 5666 (boot + 89s/0x0002/info) - Inserted: PD 08(e0xff/s8) mfi0: 5667 (boot + 89s/0x0002/info) - Inserted: PD 08(e0xff/s8) Info: enclPd=, scsiType=0, portMap=00, sasAddr=5000c500138d46b5, So on. -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: mfi(4) endless loop kernel output on attach
2009/10/15 pluknet pluk...@gmail.com: Hi. This is 7.2-R. Seen on IBM x3650M2. During the boot I get those endless looping kernel messages while on mfi(4) attach phase. It's getting more odd since 7.2 booted and worked fine on exactly this server model months ago (on different box though).. Any hints? After the looked endless loop the kernel eventually finished the mfi(4) initialization and continued its booting. Is it expected to do so long initialization? Almost at the end of kernel boot / near to multiuser start it panicked with: panic: invalid ife-ifm_data (0xa) in mii_phy_setmedia Though that's offtopic here. From last part of dmesg: mfi0: 29365 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 0060/1000/0364/1014) mfi0: 29366 (boot + 3s/0x0020/info) - Firmware version 1.40.62-0691 mfi0: 29367 (boot + 64s/0x0008/info) - Battery Present mfi0: 29368 (boot + 64s/0x0020/info) - BBU FRU is mfi0: 29369 (boot + 64s/0x0020/info) - Board Revision mfi0: 29370 (boot + 89s/0x0002/info) - Inserted: PD 08(e0xff/s8) mfi0: 29371 (boot + 89s/0x0002/info) - Inserted: PD 08(e0xff/s8) Info: enclPd=, scsiType=0, portMap=00, sasAddr=5000c500138d46b5, mfi0: 29372 (boot + 89s/0x0002/info) - PD 08(e0xff/s8) FRU is 42D0422 mfi0: 29373 (boot + 89s/0x0002/info) - Inserted: PD 09(e0xff/s9) mfi0: 29374 (boot + 89s/0x0002/info) - Inserted: PD 09(e0xff/s9) Info: enclPd=, scsiType=0, portMap=01, sasAddr=5000c500138d842d, mfi0: 29375 (boot + 89s/0x0002/info) - PD 09(e0xff/s9) FRU is 42D0422 mfi0: 29376 (boot + 89s/0x0002/info) - Inserted: PD 0a(e0xff/s10) mfi0: 29377 (boot + 89s/0x0002/info) - Inserted: PD 0a(e0xff/s10) Info: enclPd=, scsiType=0, portMap=04, sasAddr=5000c500138d7d75, mfi0: 29378 (boot + 89s/0x0002/info) - PD 0a(e0xff/s10) FRU is 42D0422 mfi0: 29379 (boot + 89s/0x0002/info) - Inserted: PD 0b(e0xff/s11) mfi0: 29380 (boot + 89s/0x0002/info) - Inserted: PD 0b(e0xff/s11) Info: enclPd=, scsiType=0, portMap=05, sasAddr=5000c500138d7835, mfi0: 29381 (boot + 
89s/0x0002/info) - PD 0b(e0xff/s11) FRU is 42D0422 mfi0: 29382 (boot + 89s/0x0002/info) - Inserted: PD 0c(e0xff/s12) mfi0: 29383 (boot + 89s/0x0002/info) - Inserted: PD 0c(e0xff/s12) Info: enclPd=, scsiType=0, portMap=02, sasAddr=5000c500138d60b5, mfi0: 29384 (boot + 89s/0x0002/info) - PD 0c(e0xff/s12) FRU is 42D0422 mfi0: 29385 (boot + 89s/0x0002/info) - Inserted: PD 0d(e0xff/s13) mfi0: 29386 (boot + 89s/0x0002/info) - Inserted: PD 0d(e0xff/s13) Info: enclPd=, scsiType=0, portMap=03, sasAddr=5000c500138d7bdd, mfi0: 29387 (boot + 89s/0x0002/info) - PD 0d(e0xff/s13) FRU is 42D0422 mfi0: 29388 (boot + 89s/0x0002/info) - Inserted: PD 0e(e0xff/s14) mfi0: 29389 (boot + 89s/0x0002/info) - Inserted: PD 0e(e0xff/s14) Info: enclPd=, scsiType=0, portMap=06, sasAddr=5000c500138d6839, mfi0: 29390 (boot + 89s/0x0002/info) - PD 0e(e0xff/s14) FRU is 42D0422 mfi0: 29391 (boot + 89s/0x0002/info) - Inserted: PD 0f(e0xff/s15) mfi0: 29392 (boot + 89s/0x0002/info) - Inserted: PD 0f(e0xff/s15) Info: enclPd=, scsiType=0, portMap=07, sasAddr=5000c500138d5e8d, mfi0: 29393 (boot + 89s/0x0002/info) - PD 0f(e0xff/s15) FRU is 42D0422 mfi0: 29394 (boot + 144s/0x0008/info) - Battery temperature is normal ioapic0: routing intpin 16 (PCI IRQ 16) to vector 52 mfi0: [MPSAFE] mfi0: [ITHREAD] pcib8: PCI-PCI bridge irq 16 at device 28.4 on pci0 pcib8: domain0 pcib8: secondary bus 6 pcib8: subordinate bus 10 pcib8: I/O decode0xf000-0xfff pcib8: memory decode 0x9700-0x978f pcib8: prefetched decode 0x9600-0x96ff pci6: PCI bus on pcib8 pci6: domain=0, physical bus=6 found- vendor=0x101b, dev=0x0452, revid=0x01 domain=0, bus=6, slot=0, func=0 class=06-04-00, hdrtype=0x01, mfdev=0 cmdreg=0x0047, statreg=0x0010, cachelnsz=16 (dwords) lattimer=0x00 (0 ns), mingnt=0x0b (2750 ns), maxlat=0x00 (0 ns) intpin=a, irq=11 powerspec 3 supports D0 D3 current D0 MSI supports 2 messages, 64 bit pcib0: matched entry for 0.28.INTA pcib0: slot 28 INTA hardwired to IRQ 16 pcib8: slot 0 INTA is routed to irq 16 pcib9: PCI-PCI 
bridge irq 16 at device 0.0 on pci6 pcib9: domain0 pcib9: secondary bus 7 pcib9: subordinate bus 7 pcib9: I/O decode0xf000-0xfff pcib9: memory decode 0x9700-0x978f pcib9: prefetched decode 0x9600-0x96ff pci7: PCI bus on pcib9 pci7: domain=0, physical bus=7 found- vendor=0x102b, dev=0x0530, revid=0x00 domain=0, bus=7, slot=0, func=0 class=03-00-00, hdrtype=0x00, mfdev=0 cmdreg=0x0007, statreg=0x0290, cachelnsz=16 (dwords) lattimer=0x40 (1920 ns), mingnt=0x10 (4000 ns), maxlat=0x20 (8000 ns) intpin=a, irq=11 powerspec 1 supports D0 D3 current D0 map[10]: type Prefetchable Memory, range 32, base
Re: [bce] panic on boot in bce(4) on 7.2: invalid ife->ifm_data (0xa)
2009/8/6 Xin LI delp...@delphij.net:

Hi, pluknet wrote:

Hi. I got the following kernel panic on boot:

invalid ife->ifm_data (0xa) in mii_phy_setmedia

This is near /sys/dev/mii/mii_physubr.c:126

KASSERT(ife->ifm_data >= 0 && ife->ifm_data < MII_NMEDIA,
    ("invalid ife->ifm_data (0x%x) in mii_phy_setmedia",
    ife->ifm_data));

I believe that this was because of the (un)merged brgphy bits. Is it possible for you to use 7.2-STABLE instead? Otherwise you will need a patched kernel:

http://people.freebsd.org/~delphij/misc/bce-5709phy.diff

Please make sure that you have a full kernel build.

I see no panic with this patch on 7.2. Thanks.

--
wbr, pluknet
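To illustrate why the value 0xa trips the assertion quoted in this message, here is a rough Python model of the range check. It is only a sketch: the value of MII_NMEDIA is not given anywhere in the thread, so 10 is an assumption chosen to be consistent with the panic, and the function name is hypothetical.

```python
# ASSUMPTION: MII_NMEDIA is not stated in the thread. 10 is used here
# because it makes 0xa the first out-of-range index, matching the panic.
MII_NMEDIA = 10


def ifm_data_in_range(ifm_data: int, nmedia: int = MII_NMEDIA) -> bool:
    """Model of the KASSERT condition: 0 <= ifm_data < nmedia."""
    return 0 <= ifm_data < nmedia


if __name__ == "__main__":
    for value in (0x0, 0x9, 0xA):
        state = "ok" if ifm_data_in_range(value) else "would panic"
        print(f"ifm_data={value:#x}: {state}")
```

Under that assumption the media table has valid indexes 0 through 9, so a PHY driver handing back index 10 points one entry past the end, which is the kind of mismatch the brgphy patch would be expected to address.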
Re: mfi(4) endless loop kernel output on attach
This is 7.2-R. Seen on IBM x3650M2. During the boot I get those endless looping kernel messages while in the mfi(4) attach phase. It's getting more odd since 7.2 booted and worked fine on exactly this server model months ago (on a different box though).. Any hints?

After what looked like an endless loop, the kernel eventually finished the mfi(4) initialization and continued booting. Is such a long initialization expected?

Replying to myself. As someone pointed out in a private email, that's due to the large event log kept in the controller's non-volatile memory. MegaCli -AdpEventLog -GetEventLogInfo -aAll cleaned up 'em all. Sorry for the noise.

--
wbr, pluknet
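Since the cause turned out to be the size of the controller's stored event log, a quick way to gauge that from a saved dmesg is to look at the range of event sequence numbers. The sketch below is an illustration only: the line layout is inferred from the excerpts posted in this thread, and the field names (when, code, severity) are my own labels, not names taken from the driver.

```python
import re

# Layout assumed from the dmesg excerpts in this thread:
#   mfi0: <seq> (<timestamp>/0x<code>/<severity>) - <description>
EVENT_RE = re.compile(
    r"^(?P<dev>mfi\d+): (?P<seq>\d+) "
    r"\((?P<when>[^/]+)/0x(?P<code>[0-9a-fA-F]+)/(?P<severity>\w+)\) - "
    r"(?P<desc>.*)$"
)


def parse_event(line):
    """Parse one mfi event line into a dict, or return None if it does not match."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event["seq"] = int(event["seq"])
    event["code"] = int(event["code"], 16)
    return event


def seq_span(lines):
    """Return (lowest, highest) event sequence number seen, or None if no events."""
    seqs = [e["seq"] for e in map(parse_event, lines) if e is not None]
    return (min(seqs), max(seqs)) if seqs else None


if __name__ == "__main__":
    sample = [
        "mfi0: 29365 (boot + 3s/0x0020/info) - Firmware version 1.40.62-0691",
        "mfi0: 29394 (boot + 144s/0x0008/info) - Battery temperature is normal",
        "ioapic0: routing intpin 16 (PCI IRQ 16) to vector 52",
    ]
    print(seq_span(sample))
```

A span reaching into the tens of thousands, as in the dmesg posted earlier in the thread, suggests that the driver is replaying a very long stored log during attach.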
Re: mfi(4) endless loop kernel output on attach
2009/10/15 John Baldwin j...@freebsd.org:

On Thursday 15 October 2009 5:51:19 am pluknet wrote: Hi. This is 7.2-R. Seen on IBM x3650M2. During the boot I get those endless looping kernel messages while in the mfi(4) attach phase. It's getting more odd since 7.2 booted and worked fine on exactly this server model months ago (on a different box though).. Any hints?

We just had some boxes die like this (but spewing a different loop of messages on boot related to continuously scheduling patrol reads and consistency checks that finished immediately) at work. We fixed them by swapping out the controller. We might try sticking them in a different box and reflashing them using mfiutil(8) to see if it's some sort of corrupted state that flashing the adapter fixes. In your case it looks like the firmware keeps crashing and restarting.

Probably it is. Though clearing logs helped me.

--
wbr, pluknet
Re: 6.2 sporadically locks up
2009/6/17 Ed Maste ema...@freebsd.org:

On Tue, Jun 16, 2009 at 07:03:34PM +0400, pluknet wrote: As for allpcpu, I often see the picture where one CPU runs the irq17: bce1 aacu0 thread and another one runs arcconf. I wonder if that might be a source of bad locking or races, or.. The arcconf utility uses an ioctl that goes into aac/aacu(4) internals.

Do you see the same result w/ the in-tree aac(4) driver as opposed to Adaptec's version? -Ed

Sorry for the late reply. Several months of testing show that the in-tree aac(4) as of 6.4 (and later) does not have this problem. Thanks.

--
wbr, pluknet
Re: FreeBSD 8.0-BETA4 IBM ServerRaid 8k issues
2009/9/10 George Mamalakis mama...@eng.auth.gr:

[about aac0: COMMAND 0xff80003d9440 TIMEOUT AFTER 40 SECONDS]

It looks like the controller was too busy rebuilding to take any new requests. It is possible you have filled the controller's write cache and that is why the lag happened at this point. You can easily test this theory.

This is the exact reason why I am asking this question. If this behavior is normal, then there is no problem for me. But I couldn't be sure whether it was a driver problem or a controller problem.

I see it from time to time on a number of boxes. You can often ignore this. It might be due to (and a symptom of) high disk workload. Btw, there was a recent change to increase the command timeout to 120s.

Thank you for your answer again, and (now that you mentioned it :) ) in case anyone knows whether we'll be able to see partitions > 2T in the future (or now?!), please say how :).

Look at gpart(8).

--
wbr, pluknet
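Some background on the gpart(8) pointer, which the thread itself does not spell out: the classic MBR partition table stores sector numbers in 32 bits, while GPT uses 64 bits. A small worked calculation follows, as a sketch that assumes traditional 512-byte sectors; the function name is made up for this example.

```python
SECTOR_SIZE = 512   # bytes; ASSUMPTION: traditional 512-byte sectors
MBR_LBA_BITS = 32   # MBR entries hold 32-bit sector numbers
GPT_LBA_BITS = 64   # GPT entries hold 64-bit sector numbers


def max_addressable_bytes(lba_bits: int, sector_size: int = SECTOR_SIZE) -> int:
    """Largest byte range addressable with lba_bits-wide sector numbers."""
    return (1 << lba_bits) * sector_size


if __name__ == "__main__":
    tib = 1024 ** 4
    print("MBR limit:", max_addressable_bytes(MBR_LBA_BITS) // tib, "TiB")
    print("GPT limit:", max_addressable_bytes(GPT_LBA_BITS) // tib, "TiB")
```

With these numbers an MBR slice tops out at 2 TiB, which is why a GPT scheme created with gpart(8) is the usual answer for larger volumes.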
Re: svn commit: r196210 - in stable/8/sys: . amd64/include/xen cddl/contrib/opensolaris contrib/dev/acpica contrib/pf dev/ata dev/cxgb dev/sound/usb dev/usb dev/usb/controller dev/usb/input dev/usb/
2009/8/14 Konstantin Belousov k...@freebsd.org:
Author: kib
Date: Fri Aug 14 11:22:09 2009
New Revision: 196210
URL: http://svn.freebsd.org/changeset/base/196210

Log:
  MFC r196206:
  Take the number of allocated freeblks into consideration for
  softdep_slowdown(), to prevent kernel memory exhaustion on
  mass-truncation.

  Approved by: re (rwatson)
[...]

Hi. Is it scheduled to be merged to stable/7 (and even to stable/6, which also has this issue)? Thanks.

Modified:
  stable/8/sys/ufs/ffs/ffs_softdep.c

==
--- stable/8/sys/ufs/ffs/ffs_softdep.c	Fri Aug 14 11:17:34 2009	(r196209)
+++ stable/8/sys/ufs/ffs/ffs_softdep.c	Fri Aug 14 11:22:09 2009	(r196210)
@@ -663,6 +663,8 @@ static int req_clear_inodedeps;	/* synce
 static int req_clear_remove;	/* syncer process flush some freeblks */
 #define FLUSH_REMOVE		2
 #define FLUSH_REMOVE_WAIT	3
+static long num_freeblkdep;	/* number of freeblks workitems allocated */
+
 /*
  * runtime statistics
  */
@@ -2223,6 +2225,9 @@ softdep_setup_freeblocks(ip, length, fla
 	freeblks->fb_uid = ip->i_uid;
 	freeblks->fb_previousinum = ip->i_number;
 	freeblks->fb_devvp = ip->i_devvp;
+	ACQUIRE_LOCK(&lk);
+	num_freeblkdep++;
+	FREE_LOCK(&lk);
 	extblocks = 0;
 	if (fs->fs_magic == FS_UFS2_MAGIC)
 		extblocks = btodb(fragroundup(fs, ip->i_din2->di_extsize));
@@ -2815,6 +2820,7 @@ handle_workitem_freeblocks(freeblks, fla
 
 	ACQUIRE_LOCK(&lk);
 	WORKITEM_FREE(freeblks, D_FREEBLKS);
+	num_freeblkdep--;
 	FREE_LOCK(&lk);
 }
 
@@ -5768,7 +5774,8 @@ softdep_slowdown(vp)
 	max_softdeps_hard = max_softdeps * 11 / 10;
 	if (num_dirrem < max_softdeps_hard / 2 &&
 	    num_inodedep < max_softdeps_hard &&
-	    VFSTOUFS(vp->v_mount)->um_numindirdeps < maxindirdeps) {
+	    VFSTOUFS(vp->v_mount)->um_numindirdeps < maxindirdeps &&
+	    num_freeblkdep < max_softdeps_hard) {
 		FREE_LOCK(&lk);
 		return (0);
 	}

--
wbr, pluknet
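The last hunk of the commit is the part that changes behaviour: softdep_slowdown() now also refuses to report "no slowdown needed" once the freeblks counter reaches the hard limit. The following Python model of that predicate is only a sketch. It assumes the comparison operators in the archived diff are "<" joined by "&&", it covers just the condition visible in the hunk and not the rest of the function, and the Python function and parameter names are my own.

```python
def softdep_should_slow_down(num_dirrem, num_inodedep, num_indirdeps,
                             num_freeblkdep, max_softdeps, maxindirdeps):
    """Model of the patched check: True when any counter has reached its limit."""
    # Integer arithmetic, as in the C expression max_softdeps * 11 / 10.
    max_softdeps_hard = max_softdeps * 11 // 10
    under_limits = (
        num_dirrem < max_softdeps_hard // 2
        and num_inodedep < max_softdeps_hard
        and num_indirdeps < maxindirdeps
        and num_freeblkdep < max_softdeps_hard  # the term added by the commit
    )
    return not under_limits


if __name__ == "__main__":
    # Hypothetical numbers: with max_softdeps = 1000 the hard limit is 1100.
    print(softdep_should_slow_down(0, 0, 0, 1099, 1000, 50))  # below the limit
    print(softdep_should_slow_down(0, 0, 0, 1100, 1000, 50))  # at the limit
```

Before the change, a mass truncation could keep allocating freeblks work items without this check ever asking the caller to slow down, which is the memory exhaustion the commit log refers to.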
Re: Fwd: How do I mount an external ntfs formatted harddisk manually and through /etc/fstab?
2009/8/16 Jens Rasmus Liland jensras...@gmail.com:

Hi, Sorry for the late reply - I went on vacation for a while. I think 'mount_ntfs-3g' did the trick in terms of mounting /dev/da0s1 manually. But I tried to add

/dev/da0s1 /homewd ntfs-3g ro 0 0

Since 7.2 a new parameter, -o mountprog, has been available, so you should be able to have fstab mount through a third-party program like this:

/dev/acd0 /mnt ntfs ro,noauto,mountprog=/usr/local/bin/ntfs-3g 0 0

... but then the computer panicked, and went into single user mode. I think it happened because the ntfs-3g module is loaded later with the fusefs-stuff.

Or due to the wrong/unsupported syntax.

--
wbr, pluknet
panic in vgonel()
0 SL -0x80b63e38 [g_down] 3 0 0 0 SL -0x80b63e30 [g_up] 2 0 0 0 SL -0x80b63e20 [g_event] 21 0 0 0 WL [swi1: net] 20 0 0 0 WL [swi3: vm] 19 0 0 0 WL [swi4: clock sio] 18 0 0 0 RL CPU 0 [idle: cpu0] 17 0 0 0 RL CPU 1 [idle: cpu1] 16 0 0 0 RL CPU 2 [idle: cpu2] 15 0 0 0 RL [idle: cpu3] 14 0 0 0 RL CPU 4 [idle: cpu4] 13 0 0 0 RL CPU 5 [idle: cpu5] 12 0 0 0 RL CPU 6 [idle: cpu6] 11 0 0 0 RL CPU 7 [idle: cpu7] 1 0 1 0 SLs wait 0xff000142b8f0 [init] 10 0 0 0 SL audit_wo 0x80b910e0 [audit] 0 0 0 0 SLs sched0x80b63f40 [swapper] Also looks a bit weird: devbuf1699635483K17121 -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: panic in vgonel()
2009/8/7 Kostik Belousov kostik...@gmail.com: On Fri, Aug 07, 2009 at 03:37:11PM +0400, pluknet wrote: This is on 7.2-R amd64. I'm curious if it might be due to glusterfs on it. Fatal trap 12: page fault while in kernel mode cpuid = 3; apic id = 03 fault virtual address = 0x0 fault code = supervisor write data, page not present instruction pointer = 0x8:0x805a52ba stack pointer = 0x10:0xfffefc3474a0 frame pointer = 0x10:0xfffefc347510 code segment = base 0x0, limit 0xf, type = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 35425 (find) db bt Tracing pid 35425 tid 100194 td 0xff003c165370 vgonel() at vgonel+0x1aa vnlru_free() at vnlru_free+0x36c getnewvnode() at getnewvnode+0x281 ffs_vgetf() at ffs_vgetf+0xdf ufs_lookup() at ufs_lookup+0x2dd vfs_cache_lookup() at vfs_cache_lookup+0xf3 VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x40 lookup() at lookup+0x598 namei() at namei+0x33e kern_lstat() at kern_lstat+0x5e lstat() at lstat+0x2a syscall() at syscall+0x256 Xfast_syscall() at Xfast_syscall+0xab --- syscall (190, FreeBSD ELF64, lstat), rip = 0x80071063c, rsp = 0x7fffea48, rbp = 0x800a06910 --- Did you got the vmcore ? If yes, please find the value for vgonel() argument, vp, and print the vnode content. I didn't. Same problem as in my another mail. :( Regardless of this, look up the source line for vgonel+0x1aa. 
I could resolve only the address that belongs to instruction pointer = 0x8:0x805a52ba (eh, I don't know if I should sum these numbers, so I did this for both cases):

dev2# addr2line -e /boot/kernel/kernel.symbols 0x805a52ba
/usr/src/sys/kern/vfs_subr.c:979
delmntque(): TAILQ_REMOVE(&mp->mnt_nvnodelist, vp, v_nmntvnodes);

dev2# addr2line -e /boot/kernel/kernel.symbols 0x805a52c2
/usr/src/sys/kern/vfs_subr.c:981
delmntque(): MNT_REL(mp);

--
wbr, pluknet
Re: panic in vgonel()
2009/8/7 Kostik Belousov kostik...@gmail.com: On Fri, Aug 07, 2009 at 04:37:07PM +0400, pluknet wrote: 2009/8/7 Kostik Belousov kostik...@gmail.com: On Fri, Aug 07, 2009 at 03:37:11PM +0400, pluknet wrote: This is on 7.2-R amd64. I'm curious if it might be due to glusterfs on it. Fatal trap 12: page fault while in kernel mode cpuid = 3; apic id = 03 fault virtual address = 0x0 fault code = supervisor write data, page not present instruction pointer = 0x8:0x805a52ba stack pointer = 0x10:0xfffefc3474a0 frame pointer = 0x10:0xfffefc347510 code segment = base 0x0, limit 0xf, type = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 35425 (find) db bt Tracing pid 35425 tid 100194 td 0xff003c165370 vgonel() at vgonel+0x1aa vnlru_free() at vnlru_free+0x36c getnewvnode() at getnewvnode+0x281 ffs_vgetf() at ffs_vgetf+0xdf ufs_lookup() at ufs_lookup+0x2dd vfs_cache_lookup() at vfs_cache_lookup+0xf3 VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x40 lookup() at lookup+0x598 namei() at namei+0x33e kern_lstat() at kern_lstat+0x5e lstat() at lstat+0x2a syscall() at syscall+0x256 Xfast_syscall() at Xfast_syscall+0xab --- syscall (190, FreeBSD ELF64, lstat), rip = 0x80071063c, rsp = 0x7fffea48, rbp = 0x800a06910 --- Did you got the vmcore ? If yes, please find the value for vgonel() argument, vp, and print the vnode content. I didn't. Same problem as in my another mail. :( Regardless of this, look up the source line for vgonel+0x1aa. 
I could only resolve the address that belongs to instruction pointer = 0x8:0x805a52ba (eh, I don't know whether I should sum these numbers, so I did it for both cases):

dev2# addr2line -e /boot/kernel/kernel.symbols 0x805a52ba
/usr/src/sys/kern/vfs_subr.c:979
delmntque(): TAILQ_REMOVE(&mp->mnt_nvnodelist, vp, v_nmntvnodes);

dev2# addr2line -e /boot/kernel/kernel.symbols 0x805a52c2
/usr/src/sys/kern/vfs_subr.c:981
delmntque(): MNT_REL(mp);

load kernel.debug into gdb, and then do list *(vgonel+0x1aa)

Ah, of course. Sorry.

(gdb) list *(vgonel+0x1aa)
0x805a52ba is in vgonel (/usr/src/sys/kern/vfs_subr.c:979).
974                     return;
975             MNT_ILOCK(mp);
976             vp->v_mount = NULL;
977             VNASSERT(mp->mnt_nvnodelistsize > 0, vp,
978                 ("bad mount point vnode list size"));
979             TAILQ_REMOVE(&mp->mnt_nvnodelist, vp, v_nmntvnodes);
980             mp->mnt_nvnodelistsize--;
981             MNT_REL(mp);
982             MNT_IUNLOCK(mp);
983     }

-- 
wbr, pluknet
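A note on the listing above: line 979 is a TAILQ_REMOVE(), and the trap reported a supervisor write to virtual address 0x0. One way that combination can arise (an assumption, not a diagnosis confirmed in this thread) is that the vnode's list linkage had already been cleared or was never set up, because the last step of TAILQ_REMOVE() stores through the element's tqe_prev pointer. The sketch below uses the userland <sys/queue.h> with a hypothetical struct node instead of the kernel's struct vnode, and it only checks the linkage rather than triggering the fault.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a vnode; only the list linkage matters here. */
struct node {
	int id;
	TAILQ_ENTRY(node) link;	/* plays the role of v_nmntvnodes */
};

TAILQ_HEAD(nodelist, node);

/*
 * TAILQ_REMOVE() ends with "*elm->link.tqe_prev = elm->link.tqe_next".
 * If tqe_prev is NULL, that store is a write to address 0, so removal
 * is only safe while tqe_prev points at something.
 */
int
safe_to_remove(const struct node *n)
{
	return (n->link.tqe_prev != NULL);
}

/* Insert one node, remove it again, and count what is left on the list. */
int
insert_then_remove(struct nodelist *head, struct node *n)
{
	struct node *it;
	int count = 0;

	TAILQ_INSERT_TAIL(head, n, link);
	assert(safe_to_remove(n));	/* linked, so tqe_prev is valid */
	TAILQ_REMOVE(head, n, link);
	TAILQ_FOREACH(it, head, link)
		count++;
	return (count);
}
```

A zero-initialised node that was never inserted fails the safe_to_remove() check, which is the situation in which the final store of TAILQ_REMOVE() would hit address 0.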
Re: softdep_setup_freeblocks: kmem_malloc(4096): kmem_map too small
2009/8/5 Kostik Belousov kostik...@gmail.com: On Wed, Aug 05, 2009 at 09:38:13AM +0400, pluknet wrote: Hi. We have a problem with user running with exceed quota: Disk quotas for user eviluser (uid 9181): Filesystem usage quota limit grace files quota limit grace /home 6172656 6172672 6172672 14723 0 0 Some types of ufs operations running under him lead to kernel panic due to out of kernel memory (tested on 6.2-R, and 6.4-R): db x/s *panicstr buf.1: kmem_malloc(4096): kmem_map too small: 335544320 total allocated Upping the higher level of vm.kmem_size_max doesn't help much, postponing that panic little farther. db bt Tracing pid 7242 tid 100772 td 0xca7a57d0 kdb_enter(c0924e28) at kdb_enter+0x2b panic(c093a575,1000,1400,c17f7818,0,...) at panic+0x127 kmem_malloc(c10680c0,1000,402,ef34e7bc,c07fb86d,...) at kmem_malloc+0x7d page_alloc(c10613c0,1000,ef34e7af,402,0,...) at page_alloc+0x1a slab_zalloc(c10613c0,402,c1061480,c10613c0,da68220c,...) at slab_zalloc+0xdd uma_zone_slab(c10613c0,502) at uma_zone_slab+0xe8 uma_zalloc_bucket(c10613c0,502) at uma_zalloc_bucket+0x15c uma_zalloc_arg(c10613c0,0,502) at uma_zalloc_arg+0x292 malloc(b8,c09d4ba0,502,0,0,...) at malloc+0x46 softdep_setup_freeblocks(cc8fb18c,0,0,800,cc8fb18c,ffe0,,0,0) at sof tdep_setup_freeblocks+0x48 ffs_truncate(c89e3990,0,0,800,c94f4300,...) at ffs_truncate+0x5cb ffs_write(ef34ebec) at ffs_write+0x603 VOP_WRITE_APV(c09d5960,ef34ebec) at VOP_WRITE_APV+0xce vn_write(caba7000,ef34ecbc,c94f4300,0,ca7a57d0) at vn_write+0x1ee dofilewrite(ca7a57d0,7,caba7000,ef34ecbc,,...) at dofilewrite+0x77 kern_writev(ca7a57d0,7,ef34ecbc,7e99c3c,42f6e8,...) at kern_writev+0x3b write(ca7a57d0,ef34ed04) at write+0x45 syscall(3b,82b003b,bfbf003b,8851,82bb000,...) at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (4, FreeBSD ELF32, write), eip = 0x281ae32f, esp = 0xbfbfbbdc, ebp = 0xbfbfbbf8 --- I have a high confidence that this issue should be fixed by r170991 and by minor followup in r183067. 
r170991 was MFC'ed before 6.4, but r183067 wasn't MFC'ed to RELENG_6. I'll try it and report later. Thanks.

-- 
wbr, pluknet
Re: softdep_setup_freeblocks: kmem_malloc(4096): kmem_map too small
2009/8/5 pluknet pluk...@gmail.com: 2009/8/5 Kostik Belousov kostik...@gmail.com: On Wed, Aug 05, 2009 at 09:38:13AM +0400, pluknet wrote: Hi. We have a problem with user running with exceed quota: Disk quotas for user eviluser (uid 9181): Filesystem usage quota limit grace files quota limit grace /home 6172656 6172672 6172672 14723 0 0 Some types of ufs operations running under him lead to kernel panic due to out of kernel memory (tested on 6.2-R, and 6.4-R): db x/s *panicstr buf.1: kmem_malloc(4096): kmem_map too small: 335544320 total allocated Upping the higher level of vm.kmem_size_max doesn't help much, postponing that panic little farther. db bt Tracing pid 7242 tid 100772 td 0xca7a57d0 kdb_enter(c0924e28) at kdb_enter+0x2b panic(c093a575,1000,1400,c17f7818,0,...) at panic+0x127 kmem_malloc(c10680c0,1000,402,ef34e7bc,c07fb86d,...) at kmem_malloc+0x7d page_alloc(c10613c0,1000,ef34e7af,402,0,...) at page_alloc+0x1a slab_zalloc(c10613c0,402,c1061480,c10613c0,da68220c,...) at slab_zalloc+0xdd uma_zone_slab(c10613c0,502) at uma_zone_slab+0xe8 uma_zalloc_bucket(c10613c0,502) at uma_zalloc_bucket+0x15c uma_zalloc_arg(c10613c0,0,502) at uma_zalloc_arg+0x292 malloc(b8,c09d4ba0,502,0,0,...) at malloc+0x46 softdep_setup_freeblocks(cc8fb18c,0,0,800,cc8fb18c,ffe0,,0,0) at sof tdep_setup_freeblocks+0x48 ffs_truncate(c89e3990,0,0,800,c94f4300,...) at ffs_truncate+0x5cb ffs_write(ef34ebec) at ffs_write+0x603 VOP_WRITE_APV(c09d5960,ef34ebec) at VOP_WRITE_APV+0xce vn_write(caba7000,ef34ecbc,c94f4300,0,ca7a57d0) at vn_write+0x1ee dofilewrite(ca7a57d0,7,caba7000,ef34ecbc,,...) at dofilewrite+0x77 kern_writev(ca7a57d0,7,ef34ecbc,7e99c3c,42f6e8,...) at kern_writev+0x3b write(ca7a57d0,ef34ed04) at write+0x45 syscall(3b,82b003b,bfbf003b,8851,82bb000,...) 
at syscall+0x2bf
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (4, FreeBSD ELF32, write), eip = 0x281ae32f, esp = 0xbfbfbbdc, ebp = 0xbfbfbbf8 ---

I have a high confidence that this issue should be fixed by r170991 and by minor followup in r183067.

r170991 was MFC'ed before 6.4, but r183067 wasn't MFC'ed to RELENG_6. I'll try it and report later. Thanks.

It doesn't seem to work. It just panicked again with r183067 applied, with the same backtrace output.

-- 
wbr, pluknet
softdep_setup_freeblocks: kmem_malloc(4096): kmem_map too small
Hi. We have a problem with user running with exceed quota: Disk quotas for user eviluser (uid 9181): Filesystem usage quota limit grace files quota limit grace /home 6172656 6172672 6172672 14723 0 0 Some types of ufs operations running under him lead to kernel panic due to out of kernel memory (tested on 6.2-R, and 6.4-R): db x/s *panicstr buf.1: kmem_malloc(4096): kmem_map too small: 335544320 total allocated Upping the higher level of vm.kmem_size_max doesn't help much, postponing that panic little farther. db bt Tracing pid 7242 tid 100772 td 0xca7a57d0 kdb_enter(c0924e28) at kdb_enter+0x2b panic(c093a575,1000,1400,c17f7818,0,...) at panic+0x127 kmem_malloc(c10680c0,1000,402,ef34e7bc,c07fb86d,...) at kmem_malloc+0x7d page_alloc(c10613c0,1000,ef34e7af,402,0,...) at page_alloc+0x1a slab_zalloc(c10613c0,402,c1061480,c10613c0,da68220c,...) at slab_zalloc+0xdd uma_zone_slab(c10613c0,502) at uma_zone_slab+0xe8 uma_zalloc_bucket(c10613c0,502) at uma_zalloc_bucket+0x15c uma_zalloc_arg(c10613c0,0,502) at uma_zalloc_arg+0x292 malloc(b8,c09d4ba0,502,0,0,...) at malloc+0x46 softdep_setup_freeblocks(cc8fb18c,0,0,800,cc8fb18c,ffe0,,0,0) at sof tdep_setup_freeblocks+0x48 ffs_truncate(c89e3990,0,0,800,c94f4300,...) at ffs_truncate+0x5cb ffs_write(ef34ebec) at ffs_write+0x603 VOP_WRITE_APV(c09d5960,ef34ebec) at VOP_WRITE_APV+0xce vn_write(caba7000,ef34ecbc,c94f4300,0,ca7a57d0) at vn_write+0x1ee dofilewrite(ca7a57d0,7,caba7000,ef34ecbc,,...) at dofilewrite+0x77 kern_writev(ca7a57d0,7,ef34ecbc,7e99c3c,42f6e8,...) at kern_writev+0x3b write(ca7a57d0,ef34ed04) at write+0x45 syscall(3b,82b003b,bfbf003b,8851,82bb000,...) 
at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (4, FreeBSD ELF32, write), eip = 0x281ae32f, esp = 0xbfbfbbdc, ebp = 0xbfbfbbf8 --- Always, afaics, the source of panics is a process running apache: db show proc 7242 Process 7242 (httpd) at 0xca7a3648: state: NORMAL uid: 9181 gids: 999, 999, 9181 parent: pid 3799 at 0xca468000 ABI: FreeBSD ELF32 arguments: /home/eviluser/etc/apache/bin/httpd threads: 1 100772 Run CPU 5 httpd The type of a ufs operations is always the same: a process of that user tries to write data to fs and falls back (due to exceed quotas) to ffs_truncate() where it panics. I couldn't reproduce it, that happens once per several days. freeblks malloc type looks a bit leaky and a cause of panic. db show malloc Type AllocsFrees Used ... freeblks 3233513 2578006 655507 ... db show lockedbufs buf at 0xdbd84c88 b_flags = 0x2220vmio,done,cache b_error = 0, b_bufsize = 16384, b_bcount = 16384, b_resid = 0 b_bufobj = (0xcc1bfa50), b_data = 0xdc855000, b_blkno = 1624763616 b_npages = 4, pages(OBJ, IDX, PA): (0xcdb2cce4, 0xc, 0xb638b000),(0xcdb2cce4, 0x d, 0x6fe2c000),(0xcdb2cce4, 0xe, 0xca8d000),(0xcdb2cce4, 0xf, 0xb216e000) lock type bufwait: EXCL (count 1) by thread 0xcb41e190 (pid 11437) buf at 0xdbd86398 b_flags = 0x2000vmio b_error = 0, b_bufsize = 16384, b_bcount = 16384, b_resid = 0 b_bufobj = (0xc8165c70), b_data = 0xdc89d000, b_blkno = 545231232 b_npages = 4, pages(OBJ, IDX, PA): (0xc81695ac, 0x40ff230, 0x842dc000),(0xc81695 ac, 0x40ff231, 0xdfd000),(0xc81695ac, 0x40ff232, 0x3b3e000),(0xc81695ac, 0x40ff2 33, 0x201f000) db show lockedvnods Locked vnodes 0xcc144550: tag ufs, type VDIR usecount 1, writecount 0, refcount 3 mountedhere 0 flags () v_object 0xc9c6e000 ref 0 pages 1 lock type ufs: EXCL (count 1) by thread 0xd00e4af0 (pid 11041) ino 203063289, on dev aacd0s1g 0xcc1bf990: tag ufs, type VDIR usecount 1, writecount 0, refcount 6 mountedhere 0 flags () v_object 0xcdb2cce4 ref 0 pages 22 lock type ufs: EXCL (count 1) by 
thread 0xcb41e190 (pid 11437) with 1 pending ino 203063639, on dev aacd0s1g 0xcdd84dd0: tag ufs, type VDIR usecount 1, writecount 0, refcount 3 mountedhere 0 flags () lock type ufs: EXCL (count 1) by thread 0xc8de8000 (pid 3612) ino 263490091, on dev aacd0s1g 0xc89e3990: tag ufs, type VREG usecount 1, writecount 1, refcount 3 mountedhere 0 flags () v_object 0xc98a1294 ref 0 pages 4 lock type ufs: EXCL (count 1) by thread 0xca7a57d0 (pid 7242) ino 140245401, on dev aacd0s1g -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
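As a rough cross-check of the 'show malloc' line above: Allocs minus Frees gives the Used count, and multiplying that by the allocation size gives an idea of how much of the kmem_map the freeblks type could be holding. The 0xb8 size is read from the first argument of malloc() in the backtrace, which is an assumption about how DDB prints it; the helper names are made up for this sketch, and the real footprint is probably higher because the kernel allocator rounds requests up.

```c
#include <assert.h>
#include <stdint.h>

/* Outstanding allocations of a malloc type: allocated but never freed. */
uint64_t
outstanding(uint64_t allocs, uint64_t frees)
{
	assert(allocs >= frees);
	return (allocs - frees);
}

/* Lower bound on the memory held, ignoring allocator rounding. */
uint64_t
bytes_held(uint64_t count, uint64_t alloc_size)
{
	return (count * alloc_size);
}

/* Share of the map in whole percent (truncated). */
unsigned
percent_of(uint64_t part, uint64_t whole)
{
	assert(whole > 0);
	return ((unsigned)(part * 100 / whole));
}
```

With the numbers from this report, outstanding(3233513, 2578006) is 655507, which matches the Used column. At 0xb8 (184) bytes each that is 120613288 bytes, roughly a third of the 335544320 bytes quoted in the panic message.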
[geom] page fault in g_mbr_config()
Hi.

I got a panic while repeatedly running 'fdisk -BI aacd0', where aacd0 is a disk on aac0: IBM ServeRAID-8k. That is, the command was issued after filesystems had already been created on aacd (by the first 'fdisk -BI aacd0' iteration), and while they were unmounted.

This is on 7.2-R, amd64. The config is GENERIC plus ddb, carp, ipfw, quota.

Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 05
fault virtual address   = 0x20
fault code              = supervisor read data, page not present
instruction pointer     = 0x8:0x804cc554
stack pointer           = 0x10:0xfffe80079b80
frame pointer           = 0x10:0xfffe80079bd0
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2 (g_event)
[thread pid 2 tid 100013 ]
Stopped at      g_mbr_config+0x64:      movq    0x20(%rax),%r15
db> bt
Tracing pid 2 tid 100013 td 0xff000144da50
g_mbr_config() at g_mbr_config+0x64
g_run_events() at g_run_events+0x1b8
g_event_procbody() at g_event_procbody+0x57
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xfffe80079d30, rbp = 0 ---

-- 
wbr, pluknet
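Reading the trap above: the faulting instruction is movq 0x20(%rax),%r15 and the fault virtual address is 0x20, which is what a load through a NULL pointer to a field at offset 0x20 looks like. Which pointer in g_mbr_config() was NULL is not established in this thread. The sketch below uses a made-up struct layout (not the real GEOM structures) just to show the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: four 8-byte members, then the field being read. */
struct example {
	uint64_t pad[4];	/* offsets 0x00..0x1f */
	uint64_t wanted;	/* offset 0x20 */
};

/* Address that a load of "wanted" touches for a given base pointer value. */
uintptr_t
load_address(uintptr_t base)
{
	return (base + offsetof(struct example, wanted));
}

/* Given the fault address and the displacement, recover the base register. */
uintptr_t
base_from_fault(uintptr_t fault_addr, uintptr_t displacement)
{
	assert(fault_addr >= displacement);
	return (fault_addr - displacement);
}
```

Here base_from_fault(0x20, 0x20) is 0, i.e. %rax would have held NULL at the time of the fault.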
Re: [geom] page fault in g_mbr_config()
2009/7/24 pluknet pluk...@gmail.com: Hi. I got a panic while performing a repetitive 'fdisk -BI aacd0', where aacd0 is a disk on aac0: IBM ServeRAID-8k. This means that the command was issued after filesystems were already created on aacd (after the first fdisk -BI aacd0 iteration), and are in umount'ed state. This is on 7.2-R, amd64. Config is a GENERIC plus ddb, carp, ipfw, quota. Fatal trap 12: page fault while in kernel mode cpuid = 5; apic id = 05 fault virtual address = 0x20 fault code = supervisor read data, page not present instruction pointer = 0x8:0x804cc554 stack pointer = 0x10:0xfffe80079b80 frame pointer = 0x10:0xfffe80079bd0 code segment = base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 2 (g_event) [thread pid 2 tid 100013 ] Stopped at g_mbr_config+0x64: movq 0x20(%rax),%r15 db bt Tracing pid 2 tid 100013 td 0xff000144da50 g_mbr_config() at g_mbr_config+0x64 g_run_events() at g_run_events+0x1b8 g_event_procbody() at g_event_procbody+0x57 fork_exit() at fork_exit+0x11f fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xfffe80079d30, rbp = 0 --- And, of course... 
db> show proc 818
Process 818 (fdisk) at 0xff0004ed1000:
 state: NORMAL
 uid: 0 gids: 0, 0, 5
 parent: pid 814 at 0xff00045c0478
 ABI: FreeBSD ELF64
 arguments: fdisk
 threads: 1
100169 D g_waitfo 0xff0004ec9100 fdisk
db> bt 818
Tracing pid 818 tid 100169 td 0xff0004fbf6e0
sched_switch() at sched_switch+0x1fe
mi_switch() at mi_switch+0x18e
sleepq_timedwait() at sleepq_timedwait+0x31
_sleep() at _sleep+0x354
g_waitfor_event() at g_waitfor_event+0x9a
g_ctl_ioctl() at g_ctl_ioctl+0x2df
giant_ioctl() at giant_ioctl+0x7d
devfs_ioctl_f() at devfs_ioctl_f+0x77
kern_ioctl() at kern_ioctl+0xa2
ioctl() at ioctl+0xf9
syscall() at syscall+0x256
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x8008200ec, rsp = 0x7fffe1d8, rbp = 0x4 ---

-- 
wbr, pluknet
Re: ftpd - Logging and resolving IP
2009/7/20 Cristiano Deana cristiano.de...@gmail.com:

Hi, I use ftpd (base system), logging logins, transfers and auth failures. What I need is to log the IP address of the client, not the hostname. I looked in ftpd(8), but it seems it's not possible to disable the reverse resolution. Any idea? Thanks in advance

I hope it's still applicable.

--- libexec/tftpd/tftpd.c.orig 2007-11-09 06:13:22.0 +0300
+++ libexec/tftpd/tftpd.c 2007-11-09 06:13:49.0 +0300
@@ -487,7 +487,7 @@
 	char hbuf[NI_MAXHOST];
 
 	getnameinfo((struct sockaddr *)&from, from.ss_len,
-		    hbuf, sizeof(hbuf), NULL, 0, 0);
+		    hbuf, sizeof(hbuf), NULL, 0, NI_NUMERICHOST);
 	syslog(LOG_INFO, "%s: %s request for %s: %s", hbuf,
 		tp->th_opcode == WRQ ? "write" : "read",
 		filename, errtomsg(ecode));

-- 
wbr, pluknet
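The effect of the one-flag change above can be seen in isolation: with NI_NUMERICHOST, getnameinfo() formats the address numerically and skips the reverse DNS lookup. This is a standalone sketch, not code from ftpd or tftpd; the function name numeric_host() is made up here, and the sockaddr is built with getaddrinfo() from a literal address instead of coming from an accepted connection.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

/*
 * Turn a numeric address string into a sockaddr and format it back
 * with getnameinfo(), never consulting DNS in either direction.
 * Returns 0 on success, -1 on failure.
 */
int
numeric_host(const char *addr, char *hbuf, size_t hbuflen)
{
	struct addrinfo hints, *res;
	int error;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = AI_NUMERICHOST;	/* parse only, no lookup */
	if (getaddrinfo(addr, NULL, &hints, &res) != 0)
		return (-1);

	/* NI_NUMERICHOST: return the numeric form, never a hostname. */
	error = getnameinfo(res->ai_addr, res->ai_addrlen,
	    hbuf, (socklen_t)hbuflen, NULL, 0, NI_NUMERICHOST);
	freeaddrinfo(res);
	return (error == 0 ? 0 : -1);
}
```

A logging call would then pass hbuf to syslog() exactly as in the patch; only the flag decides whether a name or an address ends up in the log.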
[rfc] MFC 7.x bce(4) to 6.x
Hi. Is there a planned MFC of bce(4) changes between 6.4 and 7.2 to RELENG_6? We need this at work in order to support Broadcom BCM5709 in (post-)6.4. I could able to backport recent 7.x changes to 6.4. I'm not sure about MSI and/or TSO4 stability here since there are changes since 6.x in bce(4). What I did is checkout RELENG_7 bce sources plus small hackish patch to compile this on 6.x. # uname -a FreeBSD 6.4-RELEASE FreeBSD 6.4-RELEASE #0: Fri Jul 17 21:08:32 MSD 2009 root@:/usr/obj/usr/src/sys/SMP i386 It seems to work good. I have a network access to the box now. after kldload if_bce: bce0: Broadcom NetXtreme II BCM5709 1000Base-T (C0) mem 0x9200-0x93ff irq 28 at device 0.0 on pci11 miibus0: MII bus on bce0 ukphy0: Generic IEEE 802.3u media interface on miibus0 ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FD X, auto bce0: Ethernet address: 00:1a:64:e5:13:ec bce0: link state changed to DOWN bce0: ASIC (0x57092003); Rev (C0); Bus (PCIe x2, 5Gbps); B/C (0x04060705); Flags ( MFW MSI ) bce1: Broadcom NetXtreme II BCM5709 1000Base-T (C0) mem 0x9400-0x95ff irq 40 at device 0.1 on pci11 miibus1: MII bus on bce1 ukphy1: Generic IEEE 802.3u media interface on miibus1 ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FD X, auto bce1: Ethernet address: 00:1a:64:e5:13:ee bce1: ASIC (0x bce1: link state changed to DOWN 57092003); Rev (C0); Bus (PCIe x2, 5Gbps); B/C (0x04060705); Flags( MFW MSI ) bce0: link state changed to UP bce0: link state changed to DOWN bce0: link state changed to UP The patch (against if_bce.c,v 1.34.2.9): --- /home/pluknet/cvs-7/src/sys/dev/bce/if_bce.cWed Jun 3 13:42:55 2009 +++ bce/if_bce.cFri Jul 17 15:26:00 2009 @@ -54,6 +54,12 @@ __FBSDID($FreeBSD: src/sys/dev/bce/if_b #include dev/bce/if_bcereg.h #include dev/bce/if_bcefw.h +/* From sys/mbuf.h */ +#define CSUM_TSO 0x0020 /* will do TSO */ + +/* From net/if.h */ +#define IFCAP_TSO4 0x00100 /* can do TCP Segmentation Offload */ + // /* BCE 
Debug Options*/ // @@ -1059,7 +1065,7 @@ bce_attach(device_t dev) /* Hookup IRQ last. */ rc = bus_setup_intr(dev, sc-bce_res_irq, INTR_TYPE_NET | INTR_MPSAFE, - NULL, bce_intr, sc, sc-bce_intrhand); + bce_intr, sc, sc-bce_intrhand); if (rc) { BCE_PRINTF(%s(%d): Failed to setup IRQ!\n, @@ -6391,13 +6397,24 @@ bce_tx_encap(struct bce_softc *sc, struc bus_dma_segment_t segs[BCE_MAX_SEGMENTS]; bus_dmamap_t map; struct tx_bd *txbd = NULL; +#if __FreeBSD_version = 700022 + struct m_tag *mtag; +#endif struct mbuf *m0; +#if __FreeBSD_version 700022 struct ether_vlan_header *eh; struct ip *ip; struct tcphdr *th; - u16 prod, chain_prod, etype, mss = 0, vlan_tag = 0, flags = 0; +#endif + u16 prod, chain_prod, +#if __FreeBSD_version 700022 + etype, +#endif + mss = 0, vlan_tag = 0, flags = 0; u32 prod_bseq; +#if __FreeBSD_version 700022 int hdr_len = 0, e_hlen = 0, ip_hlen = 0, tcp_hlen = 0, ip_len = 0; +#endif #ifdef BCE_DEBUG u16 debug_prod; @@ -6418,6 +6435,7 @@ bce_tx_encap(struct bce_softc *sc, struc flags |= TX_BD_FLAGS_IP_CKSUM; if (m0-m_pkthdr.csum_flags (CSUM_TCP | CSUM_UDP)) flags |= TX_BD_FLAGS_TCP_UDP_CKSUM; +#if __FreeBSD_version 700022 if (m0-m_pkthdr.csum_flags CSUM_TSO) { /* For TSO the controller needs two pieces of info, */ /* the MSS and the IP+TCP options length. */ @@ -6481,14 +6499,23 @@ bce_tx_encap(struct bce_softc *sc, struc bce_tx_encap_skip_tso: DBRUN(sc-requested_tso_frames++); } +#endif } /* Transfer any VLAN tags to the bd. */ +#if __FreeBSD_version 700022 if (m0-m_flags M_VLANTAG) { flags |= TX_BD_FLAGS_VLAN_TAG; vlan_tag = m0-m_pkthdr.ether_vtag; } +#else +mtag = VLAN_OUTPUT_TAG(sc-bce_ifp, m0); +if (mtag != NULL) { +flags |= TX_BD_FLAGS_VLAN_TAG; +vlan_tag = VLAN_TAG_VALUE(mtag); +} +#endif /* Map the mbuf into DMAable memory. 
*/ prod = sc-tx_prod; chain_prod = TX_CHAIN_IDX(prod); -- wbr, pluknet bce.7-down-to-6.patch Description: Binary data ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: [rfc] MFC 7.x bce(4) to 6.x
2009/7/17 pluknet pluk...@gmail.com: Hi. Is there a planned MFC of bce(4) changes between 6.4 and 7.2 to RELENG_6? We need this at work in order to support Broadcom BCM5709 in (post-)6.4. I could able to backport recent 7.x changes to 6.4. I'm not sure about MSI and/or TSO4 stability here since there are changes since 6.x in bce(4). What I did is checkout RELENG_7 bce sources plus small hackish patch to compile this on 6.x. # uname -a FreeBSD 6.4-RELEASE FreeBSD 6.4-RELEASE #0: Fri Jul 17 21:08:32 MSD 2009 root@:/usr/obj/usr/src/sys/SMP i386 It seems to work good. I have a network access to the box now. after kldload if_bce: bce0: Broadcom NetXtreme II BCM5709 1000Base-T (C0) mem 0x9200-0x93ff irq 28 at device 0.0 on pci11 miibus0: MII bus on bce0 ukphy0: Generic IEEE 802.3u media interface on miibus0 ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FD X, auto bce0: Ethernet address: 00:1a:64:e5:13:ec bce0: link state changed to DOWN bce0: ASIC (0x57092003); Rev (C0); Bus (PCIe x2, 5Gbps); B/C (0x04060705); Flags ( MFW MSI ) bce1: Broadcom NetXtreme II BCM5709 1000Base-T (C0) mem 0x9400-0x95ff irq 40 at device 0.1 on pci11 miibus1: MII bus on bce1 ukphy1: Generic IEEE 802.3u media interface on miibus1 ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FD X, auto bce1: Ethernet address: 00:1a:64:e5:13:ee bce1: ASIC (0x bce1: link state changed to DOWN 57092003); Rev (C0); Bus (PCIe x2, 5Gbps); B/C (0x04060705); Flags( MFW MSI ) bce0: link state changed to UP bce0: link state changed to DOWN bce0: link state changed to UP Ah, yes. 
Forgot to show dmesg from 7.2 for comparison: bce0: Broadcom NetXtreme II BCM5709 1000Base-T (C0) mem 0x9200-0x93ff irq 28 at device 0.0 on pci11 bce0: Reserved 0x200 bytes for rid 0x10 type 3 at 0x9200 bce0: attempting to allocate 1 MSI vectors (16 supported) bce0: using IRQ 256 for MSI miibus0: MII bus on bce0 bce0: bpf attached bce0: Ethernet address: 00:1a:64:e5:13:ec bce0: [MPSAFE] bce0: [ITHREAD] bce0: ASIC (0x57092003); Rev (C0); Bus (PCIe x2, 5Gbps); B/C (0x04060705); Flags( MFW MSI ) bce1: Broadcom NetXtreme II BCM5709 1000Base-T (C0) mem 0x9400-0x95ff irq 40 at device 0.1 on pci11 bce1: Reserved 0x200 bytes for rid 0x10 type 3 at 0x9400 bce1: attempting to allocate 1 MSI vectors (16 supported) bce1: using IRQ 257 for MSI miibus1: MII bus on bce1 bce1: bpf attached bce1: Ethernet address: 00:1a:64:e5:13:ee bce1: [MPSAFE] bce1: [ITHREAD] bce1: ASIC (0x57092003); Rev (C0); Bus (PCIe x2, 5Gbps); B/C (0x04060705); Flags( MFW MSI ) bce0: link state changed to UP The patch (against if_bce.c,v 1.34.2.9): --- /home/pluknet/cvs-7/src/sys/dev/bce/if_bce.c Wed Jun 3 13:42:55 2009 +++ bce/if_bce.c Fri Jul 17 15:26:00 2009 @@ -54,6 +54,12 @@ __FBSDID($FreeBSD: src/sys/dev/bce/if_b #include dev/bce/if_bcereg.h #include dev/bce/if_bcefw.h +/* From sys/mbuf.h */ +#define CSUM_TSO 0x0020 /* will do TSO */ + +/* From net/if.h */ +#define IFCAP_TSO4 0x00100 /* can do TCP Segmentation Offload */ + // /* BCE Debug Options */ // @@ -1059,7 +1065,7 @@ bce_attach(device_t dev) /* Hookup IRQ last. 
*/ rc = bus_setup_intr(dev, sc-bce_res_irq, INTR_TYPE_NET | INTR_MPSAFE, - NULL, bce_intr, sc, sc-bce_intrhand); + bce_intr, sc, sc-bce_intrhand); if (rc) { BCE_PRINTF(%s(%d): Failed to setup IRQ!\n, @@ -6391,13 +6397,24 @@ bce_tx_encap(struct bce_softc *sc, struc bus_dma_segment_t segs[BCE_MAX_SEGMENTS]; bus_dmamap_t map; struct tx_bd *txbd = NULL; +#if __FreeBSD_version = 700022 + struct m_tag *mtag; +#endif struct mbuf *m0; +#if __FreeBSD_version 700022 struct ether_vlan_header *eh; struct ip *ip; struct tcphdr *th; - u16 prod, chain_prod, etype, mss = 0, vlan_tag = 0, flags = 0; +#endif + u16 prod, chain_prod, +#if __FreeBSD_version 700022 + etype, +#endif + mss = 0, vlan_tag = 0, flags = 0; u32 prod_bseq; +#if __FreeBSD_version 700022 int hdr_len = 0, e_hlen = 0, ip_hlen = 0, tcp_hlen = 0, ip_len = 0; +#endif #ifdef BCE_DEBUG u16 debug_prod; @@ -6418,6 +6435,7 @@ bce_tx_encap(struct bce_softc *sc, struc flags |= TX_BD_FLAGS_IP_CKSUM; if (m0-m_pkthdr.csum_flags (CSUM_TCP | CSUM_UDP)) flags |= TX_BD_FLAGS_TCP_UDP_CKSUM; +#if __FreeBSD_version 700022 if (m0-m_pkthdr.csum_flags CSUM_TSO) { /* For TSO the controller needs two pieces of info
[quota] quotacheck won't go with geom_label
Hi.

I found that quotacheck doesn't work with a file system mounted through /dev/label/name. It turns out that quotacheck, after reading fstab, tries to open /dev/label/name and gets EPERM (on 6.2 and 7.2), while it works fine with file systems mounted by their ordinary device names.

Is there a fix? Should I file a PR?

-- 
wbr, pluknet
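For anyone wanting to reproduce the symptom outside quotacheck, a minimal probe along these lines could be used. It is only a sketch: probe_open() is a made-up helper, and it assumes a plain read-only open(2) is enough to show the difference, which may not match what quotacheck actually does.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to open a path read-only.
 * Returns 0 on success or the errno value on failure.
 */
int
probe_open(const char *path)
{
	int fd;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (errno);
	close(fd);
	return (0);
}
```

Comparing the result for the label path with the result for the underlying device node would show whether the failure is specific to the label.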
Re: bug in ufs?
2009/7/7 Marat N.Afanasyev ama...@ksu.ru:
Kostik Belousov wrote:
On Tue, Jul 07, 2009 at 12:15:46AM +0400, Marat N.Afanasyev wrote:
Kostik Belousov wrote:
On Mon, Jul 06, 2009 at 09:45:45PM +0400, Marat N.Afanasyev wrote:

I have a huge number of small files on the source systems; as you can see, they hold about 20 million files, and almost every one of them is a JPEG or GIF. AFAIK, there are no sparse files at all. I still cannot figure out what this is: a free space leak in ufs2+su, a bug in statfs(3), which df uses, or something else.

My guess is that it is due to fragmentation. As an experiment, try to create a 1-byte file. Does it work on the filesystem in the described state?

I can create small files, as many as I have patience for; the maximum size of such a small file is 14336. So it seems that a file can be created if it is no greater than (block_size-2048); a larger file cannot be created. IMHO, fragmentation on the filesystem should be very low: there were no deletions on it, just creations.

Fragmentation on UFS usually means using fragments for the file tails, not having a file's sequential blocks allocated in non-sequential disk blocks. Your experiment confirms my hypothesis.

So I cannot create a slightly larger file by joining two unallocated parts of different blocks, even if no fully free block exists? ;)

You can't. As far as I remember, ffs_alloc() first tries to populate a number of whole blocks and only then packs the remaining data tail into a partially allocated block (one, two, ... seven fragments), thus creating block fragmentation. In other words, it's impossible to partially allocate several blocks for one file.

Can you show your dumpfs output (the superblock part)?

-- 
wbr, pluknet
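The 14336-byte limit reported above is consistent with that explanation: with 2048-byte fragments, a 16384-byte block (sizes inferred from "block_size-2048" and 14336, not taken from dumpfs output) holds eight fragments, and a file that fits in at most seven of them can live entirely in a partially used block. Anything larger needs at least one whole free block. The helpers below are a simplified model for illustration, not ffs_alloc() itself.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fragments needed to hold "bytes" (rounded up). */
uint64_t
frags_needed(uint64_t bytes, uint64_t fsize)
{
	assert(fsize > 0);
	return ((bytes + fsize - 1) / fsize);
}

/*
 * Simplified model: every full block of data needs one whole free
 * block; only the tail (less than one block) may be stored in the
 * fragments of a partially allocated block.
 */
uint64_t
whole_blocks_needed(uint64_t bytes, uint64_t bsize, uint64_t fsize)
{
	uint64_t frags_per_block = bsize / fsize;

	return (frags_needed(bytes, fsize) / frags_per_block);
}

/* Can the file be created when no fully free block is left? */
int
fits_without_free_block(uint64_t bytes, uint64_t bsize, uint64_t fsize)
{
	return (whole_blocks_needed(bytes, bsize, fsize) == 0);
}
```

Under these assumed sizes a 14336-byte file needs seven fragments and no whole block, while a 14337-byte file needs eight fragments, which is a whole block, so it cannot be placed once only partially used blocks remain.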
Re: [nfs] process locks in bo_wwait on 6.4
2009/6/26 pluknet pluk...@gmail.com: 2009/6/26 pluknet pluk...@gmail.com: Hello. While building a module on nfs mounted /usr/src I got an unkillable process waiting forever in bo_wwait. Small note: iface on NFS server has mtu changed from 1500 to 1450. Can this be a source of the problem? This is 100% reproducible. Lock in the same place. Any hints? awk -f @/tools/vnode_if.awk @/kern/vnode_if.src -p load: 1.08 cmd: awk 37581 [bo_wwait] 0.00u 0.00s 0% 1472k Setting mtu 1500 on NFS server side network interface fixes the issue. # make Warning: Object directory not changed from original /usr/src/sys/modules/linux @ - /usr/src/sys machine - /usr/src/sys/i386/include cc -c -O2 -fno-strict-aliasing -pipe -Werror -D_KERNEL -DKLD_MODULE -nostdinc -I- -I. -I@ -I@/contrib/altq -I@/../include -I/usr/include -finline-limit=8000 -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -ffreestanding -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 /usr/src/sys/modules/linux/../../i386/linux/linux_genassym.c sh @/kern/genassym.sh linux_genassym.o linux_assym.h echo #define COMPAT_43 1 opt_compat.h echo #define INET6 1 opt_inet6.h : opt_mac.h : opt_vmpage.h awk -f @/tools/vnode_if.awk @/kern/vnode_if.src -p load: 1.08 cmd: awk 37581 [bo_wwait] 0.00u 0.00s 0% 1472k All others subsystems seems to work. db bt 37581 Tracing pid 37581 tid 100364 td 0xc93c7b60 sched_switch(c93c7b60,0,1) at sched_switch+0x143 mi_switch(1,0,c93c7b60,eed95a24,c06ce6f0,...) at mi_switch+0x1ba sleepq_switch(ce138854) at sleepq_switch+0x87 sleepq_wait(ce138854,0,c93c7b60,ce138830,0,...) at sleepq_wait+0x5c msleep(ce138854,ce1387ec,4d,c096823e,0) at msleep+0x269 bufobj_wwait(ce138830,0,0,0,ce1387ec,...) at bufobj_wwait+0x37 nfs_flush(ce138770,1,c93c7b60,0,c93c7b60,...) 
at nfs_flush+0x8c8 nfs_close(eed95b80) at nfs_close+0xfd VOP_CLOSE_APV(c09ec5c0,eed95b80) at VOP_CLOSE_APV+0x38 vn_close(ce138770,2,cd769100,c93c7b60) at vn_close+0x5a vn_closefile(c9094900,c93c7b60) at vn_closefile+0xea fdrop_locked(c9094900,c93c7b60,cf054600,eed95ca8,c06875f3,...) at fdrop_locked+0xd0 fdrop(c9094900,c93c7b60,c93c7b60,eed95c64,1,...) at fdrop+0x41 closef(c9094900,c93c7b60,0,eed95d38,c949ea78,...) at closef+0x42f kern_close(c93c7b60,3,eed95d30,c08e1d4b,c93c7b60,...) at kern_close+0x20d close(c93c7b60,eed95d04) at close+0x10 syscall(3b,808003b,bfbf003b,0,28190a20,...) at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (6, FreeBSD ELF32, close), eip = 0x2816c1e7, esp = 0xbfbfeb1c, ebp = 0xbfbfeb38 --- db show lockedvnods Locked vnodes 0xce138770: tag nfs, type VREG usecount 1, writecount 0, refcount 3 mountedhere 0 flags () v_object 0xcd0a2528 ref 0 pages 1 lock type nfs: EXCL (count 1) by thread 0xc93c7b60 (pid 37581) fileid 1372174 fsid 0x100ff05 db show lockedbufs buf at 0xdbf92d08 b_flags = 0x2024vmio,cache,async b_error = 0, b_bufsize = 2048, b_bcount = 1779, b_resid = 0 b_bufobj = (0xce138830), b_data = 0xe2e99000, b_blkno = 0 b_npages = 1, pages(OBJ, IDX, PA): (0xcd0a2528, 0x0, 0xa8067000) db show proc 37581 Process 37581 (awk) at 0xc949ea78: state: NORMAL uid: 0 gids: 0, 0, 2, 3, 4, 5, 20, 31 parent: pid 37557 at 0xc949e860 ABI: FreeBSD ELF32 arguments: awk threads: 1 100364 D bo_wwait 0xce138854 awk Next. # umount /usr/src load: 0.36 cmd: umount 37888 [nfs] 0.00u 0.04s 0% 900k db bt 37888 Tracing pid 37888 tid 100130 td 0xc93c84e0 sched_switch(c93c84e0,0,1) at sched_switch+0x143 mi_switch(1,0,c93c84e0,eeda4aa0,c06ce6f0,...) at mi_switch+0x1ba sleepq_switch(ce1387c8) at sleepq_switch+0x87 sleepq_wait(ce1387c8,0,c93c84e0,ce1387c8,4,...) at sleepq_wait+0x5c msleep(ce1387c8,c0a4af54,50,c09729b5,0,...) at msleep+0x269 acquire(eeda4b20,40,6,c93c84e0,0,...) at acquire+0x7b lockmgr(ce1387c8,2002,ce1387ec,c93c84e0,eeda4b44,...) 
at lockmgr+0x3fe vop_stdlock(eeda4b68) at vop_stdlock+0x1e VOP_LOCK_APV(c09ec5c0,eeda4b68) at VOP_LOCK_APV+0x43 vn_lock(ce138770,2002,c93c84e0) at vn_lock+0xf4 vflush(cf4f8cf8,1,0,c93c84e0) at vflush+0x136 nfs_unmount(cf4f8cf8,800,c93c84e0) at nfs_unmount+0x3c dounmount(cf4f8cf8,800,c93c84e0) at dounmount+0x3fa unmount(c93c84e0,eeda4d04) at unmount+0x279 syscall(3b,3b,3b,804a4aa,804de10,...) at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (22, FreeBSD ELF32, unmount), eip = 0x280be967, esp = 0xbfbfe56c, ebp = 0xbfbfe618 --- db show lockedvnods Locked vnodes 0xca176aa0: tag ufs, type VDIR usecount 1, writecount 0, refcount 1 mountedhere 0xcf4f8cf8 flags () v_object 0xcb111294 ref 0 pages 0 lock type ufs: EXCL (count 1) by thread 0xc93c84e0 (pid 37888) ino 1436672, on dev aacd0s1f 0xce138770: tag nfs, type VREG usecount 1, writecount 0, refcount 4
Re: [nfs] process locks in bo_wwait on 6.4
2009/6/29 Attilio Rao atti...@freebsd.org: 2009/6/29 pluknet pluk...@gmail.com: 2009/6/26 pluknet pluk...@gmail.com: 2009/6/26 pluknet pluk...@gmail.com: Hello. While building a module on nfs mounted /usr/src I got an unkillable process waiting forever in bo_wwait. Small note: iface on NFS server has mtu changed from 1500 to 1450. Can this be a source of the problem? This is 100% reproducible. Lock in the same place. Any hints? Can you also show the value of ps? A precise map of what processes are doing would give an help. Also would be useful to printout traces for other threads and not only the stucked one. From another run: db ps pid ppid pgrp uid state wmesg wchancmd 1228 1205 1205 0 S+ bo_wwait 0xc9887c10 awk 1205 1196 1205 0 S+ wait 0xc893ea78 make 1202 0 0 0 SL nfsreq 0xc9637a00 [nfsiod 0] 1196 1125 1196 0 Ss+ wait 0xc8942648 bash 1194 1 1194 0 Ss+ ttyin0xc82e9010 getty 1193 1 1193 0 Ss+ ttyin0xc82d7c10 getty 1192 1 1192 0 Ss+ ttyin0xc82f7010 getty 1191 1 1191 0 Ss+ ttyin0xc82f7410 getty 1190 1 1190 0 Ss+ ttyin0xc82f2410 getty 1189 1 1189 0 Ss+ ttyin0xc82e9810 getty 1188 1 1188 0 Ss+ ttyin0xc82e7410 getty 1187 1 1187 0 Ss+ ttyin0xc82d7410 getty 1186 1 1186 0 Ss+ ttyin0xc82f0410 getty 1185 1 1185 0 Ss+ ttyin0xc82f1810 getty 1171 1 1171 0 Ss select 0xc0a8d044 inetd 1134 1 1134 0 Ss nanslp 0xc0a3b4ec cron 1125 1064 1125 0 Ss select 0xc0a8d044 sshd 1064 1 1064 0 Ss select 0xc0a8d044 sshd 901 1 901 0 Ss select 0xc0a8d044 ntpd 796 1 796 0 Ss select 0xc0a8d044 syslogd 767 0 0 0 SL -0xc0a38b40 [accounting] 734 1 734 0 Ss select 0xc0a8d044 devd 50 0 0 0 SL sdflush 0xc0a9ae74 [softdepflush] 49 0 0 0 SL syncer 0xc0a3b25c [syncer] 48 0 0 0 SL vlruwt 0xc828ba78 [vnlru] 47 0 0 0 SL psleep 0xc0a8d5c0 [bufdaemon] 46 0 0 0 SL pgzero 0xc0a9bea4 [pagezero] 45 0 0 0 SL psleep 0xc0a9b9b4 [vmdaemon] 44 0 0 0 SL psleep 0xc0a9b968 [pagedaemon] 43 0 0 0 WL [irq1: atkbd0] 42 0 0 0 WL [swi0: sio] 41 0 0 0 WL [irq15: ata1] 40 0 0 0 WL [irq14: ata0] 39 0 0 0 SL usbevt 0xc81eb210 [usb4] 
38 0 0 0 SL usbevt 0xc82c4210 [usb3] 37 0 0 0 SL usbevt 0xc82ab210 [usb2] 36 0 0 0 SL usbevt 0xc82af210 [usb1] 35 0 0 0 WL [irq22: uhci1 uhci3] 34 0 0 0 SL usbtsk 0xc0a37ba4 [usbtask] 33 0 0 0 SL usbevt 0xc8278210 [usb0] 32 0 0 0 WL [irq23: uhci0 uhci+] 31 0 0 0 WL [irq257: bce1] 30 0 0 0 WL [irq256: bce0] 29 0 0 0 SL aifthd 0xc828b218 [aac0aif] 28 0 0 0 WL [irq17: aac0] 27 0 0 0 WL [irq9: acpi0] 26 0 0 0 WL [swi5: +] 25 0 0 0 SL -0xc8150100 [thread taskq] 24 0 0 0 WL [swi6: Giant taskq] 9 0 0 0 SL -0xc8150280 [acpi_task_2] 8 0 0 0 SL -0xc8150280 [acpi_task_1] 7 0 0 0 SL -0xc8150280 [acpi_task_0] 23 0 0 0 WL [swi6: task queue] 6 0 0 0 SL -0xc8150400 [kqueue taskq] 22 0 0 0 WL [swi2: cambio] 5 0 0 0 SL ccb_scan 0xc0a1e204 [xpt_thrd] 21 0 0 0 SL -0xc0a358c0 [yarrow] 4 0 0 0 SL -0xc0a38488 [g_down] 3 0 0 0 SL -0xc0a38484 [g_up] 2 0 0 0 SL -0xc0a3847c [g_event] 20 0 0 0 WL [swi1: net] 19 0 0 0 WL [swi3: vm] 18 0 0 0 WL [swi4: clock sio] 17 0 0 0 RL CPU 0 [idle: cpu0] 16 0 0 0 RL CPU 1 [idle: cpu1] 15 0 0 0 RL CPU 2 [idle: cpu2] 14 0 0 0 RL CPU 3 [idle: cpu3] 13 0 0 0 RL
Re: [nfs] process locks in bo_wwait on 6.4
2009/6/29 Attilio Rao atti...@freebsd.org: 2009/6/29 pluknet pluk...@gmail.com: 2009/6/29 Attilio Rao atti...@freebsd.org: 2009/6/29 pluknet pluk...@gmail.com: 2009/6/26 pluknet pluk...@gmail.com: 2009/6/26 pluknet pluk...@gmail.com: Hello. While building a module on nfs mounted /usr/src I got an unkillable process waiting forever in bo_wwait. Small note: iface on NFS server has mtu changed from 1500 to 1450. Can this be a source of the problem? This is 100% reproducible. Lock in the same place. Any hints? Can you also show the value of ps? A precise map of what processes are doing would give an help. Also would be useful to printout traces for other threads and not only the stucked one. From another run: I'm unable to see who would be locking the buffer object in question. Do you have INVARIANT_SUPPORT/INVARIANTS on? Yes, I do both. What revision of /usr/src/sys/kern/vfs_bio.c are you running with? As of 6.4-R: CVS rev 1.491.2.12.4.1 / SVN rev 183531. -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: [nfs] process locks in bo_wwait on 6.4
2009/6/29 Attilio Rao atti...@freebsd.org: 2009/6/29 pluknet pluk...@gmail.com: 2009/6/29 Attilio Rao atti...@freebsd.org: 2009/6/29 pluknet pluk...@gmail.com: 2009/6/29 Attilio Rao atti...@freebsd.org: 2009/6/29 pluknet pluk...@gmail.com: 2009/6/26 pluknet pluk...@gmail.com: 2009/6/26 pluknet pluk...@gmail.com: Hello. While building a module on nfs mounted /usr/src I got an unkillable process waiting forever in bo_wwait. Small note: iface on NFS server has mtu changed from 1500 to 1450. Can this be a source of the problem? This is 100% reproducible. Lock in the same place. Any hints? Can you also show the value of ps? A precise map of what processes are doing would give an help. Also would be useful to printout traces for other threads and not only the stucked one. From another run: I'm unable to see who would be locking the buffer object in question. Do you have INVARIANT_SUPPORT/INVARIANTS on? Yes, I do both. What revision of /usr/src/sys/kern/vfs_bio.c are you running with? As of 6.4-R: CVS rev 1.491.2.12.4.1 / SVN rev 183531. Please try this patch and report. Thanks, Attilio --- src/sys/nfsclient/nfs_vnops.c 2008/02/13 20:44:18 1.281 +++ src/sys/nfsclient/nfs_vnops.c 2008/03/22 09:15:15 1.282 @@ -33,7 +33,7 @@ */ #include sys/cdefs.h -__FBSDID($FreeBSD: /usr/local/www/cvsroot/FreeBSD/src/sys/nfsclient/nfs_vnops.c,v 1.281 2008/02/13 20:44:18 attilio Exp $); +__FBSDID($FreeBSD: /usr/local/www/cvsroot/FreeBSD/src/sys/nfsclient/nfs_vnops.c,v 1.282 2008/03/22 09:15:15 jeff Exp $); Do you refer to the whole svn r177493, or is its nfs part will be enough? This only vfs_vnops.c diff seems not applicable without underneath kernel part changes. I'll try. Thanks. -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: [nfs] process locks in bo_wwait on 6.4
2009/6/29 Attilio Rao atti...@freebsd.org: 2009/6/29 pluknet pluk...@gmail.com: 2009/6/29 Attilio Rao atti...@freebsd.org: 2009/6/29 pluknet pluk...@gmail.com: 2009/6/29 Attilio Rao atti...@freebsd.org: 2009/6/29 pluknet pluk...@gmail.com: 2009/6/29 Attilio Rao atti...@freebsd.org: 2009/6/29 pluknet pluk...@gmail.com: 2009/6/26 pluknet pluk...@gmail.com: 2009/6/26 pluknet pluk...@gmail.com: Hello. While building a module on nfs mounted /usr/src I got an unkillable process waiting forever in bo_wwait. Small note: iface on NFS server has mtu changed from 1500 to 1450. Can this be a source of the problem? This is 100% reproducible. Lock in the same place. Any hints? Can you also show the value of ps? A precise map of what processes are doing would give an help. Also would be useful to printout traces for other threads and not only the stucked one. From another run: I'm unable to see who would be locking the buffer object in question. Do you have INVARIANT_SUPPORT/INVARIANTS on? Yes, I do both. What revision of /usr/src/sys/kern/vfs_bio.c are you running with? As of 6.4-R: CVS rev 1.491.2.12.4.1 / SVN rev 183531. Please try this patch and report. Thanks, Attilio --- src/sys/nfsclient/nfs_vnops.c 2008/02/13 20:44:18 1.281 +++ src/sys/nfsclient/nfs_vnops.c 2008/03/22 09:15:15 1.282 @@ -33,7 +33,7 @@ */ #include sys/cdefs.h -__FBSDID($FreeBSD: /usr/local/www/cvsroot/FreeBSD/src/sys/nfsclient/nfs_vnops.c,v 1.281 2008/02/13 20:44:18 attilio Exp $); +__FBSDID($FreeBSD: /usr/local/www/cvsroot/FreeBSD/src/sys/nfsclient/nfs_vnops.c,v 1.282 2008/03/22 09:15:15 jeff Exp $); Do you refer to the whole svn r177493, or is its nfs part will be enough? This only vfs_vnops.c diff seems not applicable without underneath kernel part changes. I'll try. Thanks. The NFS part should be enough, though I don't understand why it doesn't trigger a panic on STABLE_6 as long as, at least in my revision, there is an assert for the buffer object lock to be held in bufobj_wwait(). 
What's your sys/kern/vfs_bio.c rev? As of 6.4-R. $FreeBSD: src/sys/kern/vfs_bio.c,v 1.491.2.12.4.1 2008/10/02 02:57:24 kensmith Exp $ -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: [nfs] process locks in bo_wwait on 6.4
2009/6/29 Kostik Belousov kostik...@gmail.com:
> On Mon, Jun 29, 2009 at 05:18:03PM +0400, pluknet wrote:
>> 2009/6/29 Attilio Rao atti...@freebsd.org:
>>> 2009/6/29 pluknet pluk...@gmail.com:
>>>> 2009/6/29 Attilio Rao atti...@freebsd.org:
>>>>> 2009/6/29 pluknet pluk...@gmail.com:
>>>>>> 2009/6/26 pluknet pluk...@gmail.com:
>>>>>>> 2009/6/26 pluknet pluk...@gmail.com:
>>>>>>>> Hello.
>>>>>>>>
>>>>>>>> While building a module on nfs mounted /usr/src
>>>>>>>> I got an unkillable process waiting forever in bo_wwait.
>>>>>>>
>>>>>>> Small note: iface on NFS server has mtu changed from 1500 to 1450.
>>>>>>> Can this be a source of the problem?
>>>>>>
>>>>>> This is 100% reproducible. Lock in the same place.
>>>>>> Any hints?
>>>>>
>>>>> Can you also show the value of ps?
>>>>> A precise map of what processes are doing would give an help.
>>>>> Also would be useful to printout traces for other threads and not
>>>>> only the stucked one.
>>>>
>>>> From another run:
>>>
>>> I'm unable to see who would be locking the buffer object in question.
>>> Do you have INVARIANT_SUPPORT/INVARIANTS on?
>>
>> Yes, I do both.
>>
>>> What revision of /usr/src/sys/kern/vfs_bio.c are you running with?
>>
>> As of 6.4-R: CVS rev 1.491.2.12.4.1 / SVN rev 183531.
>
> It seems that your changes of MTU cause nfs requests to never reach network.
> bo_wwait is the state where thread waits for all outstanding i/o on bufobj
> to drain.

It appears that you are right.
I found in tcpdump that the nfs client tries to send UDP packets of 1500 bytes.

19:40:13.937085 IP (tos 0x0, ttl 64, id 4658, offset 0, flags [+], proto: UDP (17), length: 1500) client.1662412076 > server.nfs: 1472 write fh 1145,216955/1372174 1779 (1779) bytes @ 0 unstable

While here I reverted the mtu on the NFS server back to 1500; after some seconds the locked-up NFS client box continued to build the module as if there had been no locking problems at all.
So I understand this as defined behavior.

-- 
wbr,
pluknet
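The numbers in that tcpdump line can be checked with a little arithmetic. The following is a minimal sketch in userland C, not anything from the NFS client; the helper names are made up for illustration, and it assumes a 20-byte IPv4 header without options. It shows why a first fragment sized for a 1500-byte MTU carries 1472 bytes of UDP payload (the "1472" tcpdump prints), and that a 1500-byte datagram does not fit an interface whose MTU is 1450.

```c
#include <assert.h>
#include <stddef.h>

#define IPV4_HDR_LEN	20	/* assumed: IPv4 header without options */
#define UDP_HDR_LEN	8

/* Does an IP datagram of ip_len bytes fit an interface with this MTU? */
static int
datagram_fits(size_t ip_len, size_t mtu)
{
	return (ip_len <= mtu);
}

/*
 * UDP payload bytes carried by the first fragment of a large UDP
 * datagram when fragments are sized for the given MTU.
 */
static size_t
first_fragment_udp_payload(size_t mtu)
{
	size_t frag_data;

	assert(mtu > IPV4_HDR_LEN + UDP_HDR_LEN);
	/* Every fragment but the last carries a multiple of 8 data bytes. */
	frag_data = (mtu - IPV4_HDR_LEN) & ~(size_t)7;
	/* The UDP header travels in the first fragment only. */
	return (frag_data - UDP_HDR_LEN);
}
```

With an MTU of 1500 the first fragment carries 1472 payload bytes; with 1450 it would carry 1416. The sketch says nothing about where the oversized packets were actually lost, only that they could not fit the reduced MTU.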
[nfs] process locks in bo_wwait on 6.4
: 0, 0, 2, 3, 4, 5, 20, 31
parent: pid 37812 at 0xc936ea78
ABI: FreeBSD ELF32
arguments: umount
threads: 1
100130 D nfs 0xce1387c8 umount

-- 
wbr,
pluknet
Re: [nfs] process locks in bo_wwait on 6.4
2009/6/26 pluknet pluk...@gmail.com: Hello. While building a module on nfs mounted /usr/src I got an unkillable process waiting forever in bo_wwait. Small note: iface on NFS server has mtu changed from 1500 to 1450. Can this be a source of the problem? # make Warning: Object directory not changed from original /usr/src/sys/modules/linux @ - /usr/src/sys machine - /usr/src/sys/i386/include cc -c -O2 -fno-strict-aliasing -pipe -Werror -D_KERNEL -DKLD_MODULE -nostdinc -I- -I. -I@ -I@/contrib/altq -I@/../include -I/usr/include -finline-limit=8000 -mno-align-long-strings -mpreferred-stack-boundary=2 -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -ffreestanding -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -fformat-extensions -std=c99 /usr/src/sys/modules/linux/../../i386/linux/linux_genassym.c sh @/kern/genassym.sh linux_genassym.o linux_assym.h echo #define COMPAT_43 1 opt_compat.h echo #define INET6 1 opt_inet6.h : opt_mac.h : opt_vmpage.h awk -f @/tools/vnode_if.awk @/kern/vnode_if.src -p load: 1.08 cmd: awk 37581 [bo_wwait] 0.00u 0.00s 0% 1472k All others subsystems seems to work. db bt 37581 Tracing pid 37581 tid 100364 td 0xc93c7b60 sched_switch(c93c7b60,0,1) at sched_switch+0x143 mi_switch(1,0,c93c7b60,eed95a24,c06ce6f0,...) at mi_switch+0x1ba sleepq_switch(ce138854) at sleepq_switch+0x87 sleepq_wait(ce138854,0,c93c7b60,ce138830,0,...) at sleepq_wait+0x5c msleep(ce138854,ce1387ec,4d,c096823e,0) at msleep+0x269 bufobj_wwait(ce138830,0,0,0,ce1387ec,...) at bufobj_wwait+0x37 nfs_flush(ce138770,1,c93c7b60,0,c93c7b60,...) at nfs_flush+0x8c8 nfs_close(eed95b80) at nfs_close+0xfd VOP_CLOSE_APV(c09ec5c0,eed95b80) at VOP_CLOSE_APV+0x38 vn_close(ce138770,2,cd769100,c93c7b60) at vn_close+0x5a vn_closefile(c9094900,c93c7b60) at vn_closefile+0xea fdrop_locked(c9094900,c93c7b60,cf054600,eed95ca8,c06875f3,...) at fdrop_locked+0xd0 fdrop(c9094900,c93c7b60,c93c7b60,eed95c64,1,...) 
at fdrop+0x41 closef(c9094900,c93c7b60,0,eed95d38,c949ea78,...) at closef+0x42f kern_close(c93c7b60,3,eed95d30,c08e1d4b,c93c7b60,...) at kern_close+0x20d close(c93c7b60,eed95d04) at close+0x10 syscall(3b,808003b,bfbf003b,0,28190a20,...) at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (6, FreeBSD ELF32, close), eip = 0x2816c1e7, esp = 0xbfbfeb1c, ebp = 0xbfbfeb38 --- db show lockedvnods Locked vnodes 0xce138770: tag nfs, type VREG usecount 1, writecount 0, refcount 3 mountedhere 0 flags () v_object 0xcd0a2528 ref 0 pages 1 lock type nfs: EXCL (count 1) by thread 0xc93c7b60 (pid 37581) fileid 1372174 fsid 0x100ff05 db show lockedbufs buf at 0xdbf92d08 b_flags = 0x2024vmio,cache,async b_error = 0, b_bufsize = 2048, b_bcount = 1779, b_resid = 0 b_bufobj = (0xce138830), b_data = 0xe2e99000, b_blkno = 0 b_npages = 1, pages(OBJ, IDX, PA): (0xcd0a2528, 0x0, 0xa8067000) db show proc 37581 Process 37581 (awk) at 0xc949ea78: state: NORMAL uid: 0 gids: 0, 0, 2, 3, 4, 5, 20, 31 parent: pid 37557 at 0xc949e860 ABI: FreeBSD ELF32 arguments: awk threads: 1 100364 D bo_wwait 0xce138854 awk Next. # umount /usr/src load: 0.36 cmd: umount 37888 [nfs] 0.00u 0.04s 0% 900k db bt 37888 Tracing pid 37888 tid 100130 td 0xc93c84e0 sched_switch(c93c84e0,0,1) at sched_switch+0x143 mi_switch(1,0,c93c84e0,eeda4aa0,c06ce6f0,...) at mi_switch+0x1ba sleepq_switch(ce1387c8) at sleepq_switch+0x87 sleepq_wait(ce1387c8,0,c93c84e0,ce1387c8,4,...) at sleepq_wait+0x5c msleep(ce1387c8,c0a4af54,50,c09729b5,0,...) at msleep+0x269 acquire(eeda4b20,40,6,c93c84e0,0,...) at acquire+0x7b lockmgr(ce1387c8,2002,ce1387ec,c93c84e0,eeda4b44,...) 
at lockmgr+0x3fe vop_stdlock(eeda4b68) at vop_stdlock+0x1e VOP_LOCK_APV(c09ec5c0,eeda4b68) at VOP_LOCK_APV+0x43 vn_lock(ce138770,2002,c93c84e0) at vn_lock+0xf4 vflush(cf4f8cf8,1,0,c93c84e0) at vflush+0x136 nfs_unmount(cf4f8cf8,800,c93c84e0) at nfs_unmount+0x3c dounmount(cf4f8cf8,800,c93c84e0) at dounmount+0x3fa unmount(c93c84e0,eeda4d04) at unmount+0x279 syscall(3b,3b,3b,804a4aa,804de10,...) at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (22, FreeBSD ELF32, unmount), eip = 0x280be967, esp = 0xbfbfe56c, ebp = 0xbfbfe618 --- db show lockedvnods Locked vnodes 0xca176aa0: tag ufs, type VDIR usecount 1, writecount 0, refcount 1 mountedhere 0xcf4f8cf8 flags () v_object 0xcb111294 ref 0 pages 0 lock type ufs: EXCL (count 1) by thread 0xc93c84e0 (pid 37888) ino 1436672, on dev aacd0s1f 0xce138770: tag nfs, type VREG usecount 1, writecount 0, refcount 4 mountedhere 0 flags () v_object 0xcd0a2528 ref 0 pages 1 lock type nfs: EXCL (count 1) by thread 0xc93c7b60 (pid 37581) with 1 pending fileid 1372174 fsid 0x100ff05 db show lockedbufs buf at 0xdbf92d08 b_flags = 0x2024vmio,cache,async b_error = 0, b_bufsize = 2048
Re: lock up in 6.2 (procs massively stuck in Giant)
2009/5/13 John Baldwin j...@freebsd.org: On Wednesday 13 May 2009 11:41:22 am pluknet wrote: 2009/5/13 John Baldwin j...@freebsd.org: On Wednesday 13 May 2009 2:40:33 am pluknet wrote: 2009/5/13 pluknet pluk...@gmail.com: 2009/5/13 John Baldwin j...@freebsd.org: On Tuesday 12 May 2009 4:59:19 pm pluknet wrote: Hi. From just another box (not from the first two mentioned earlier) with a similar locking issue. If it would make sense, since there are possibly a bit different conditions. clock proc here is on swi4, I hope it's a non-important difference. 18 0 0 0 LL *Giant 0xd0a6b140 [swi4: clock sio] db bt 18 Ok, this is a known issue in 6.x. It is fixed in 6.4. Looking at the face of kern_timeout.c I suspect that was fixed in r181012. No, this particular issue is fixed by a change to sched_4bsd.c in r179975. Gah.. We constrained to use ule scheduler on 6.x (yes, I know that it's known to be broken (c)), since we have had a very bad interactivity on 4bsd on our workload. Ok, that's just another reason to move to 7.x. Hmmm I would have thought ULE wouldn't have suffered from this bug. The problem on 4BSD was if softclock ever blocked on Giant and the thread that held Giant was on a run queue and pinned to a specific CPU but that another userland thread was running on that CPU already, the userland thread would never yield the CPU so long as it kept busy since the round robin timeout would never run. -- John Baldwin That's another sort of lockup on 6.2 we experience often. May that be connected to ULE on 6.x? I regret if this info is not enough. 
db ps pid ppid pgrp uid state wmesg wchancmd 74606 74602 68315 0 R stat 74605 74601 68315 0 S piperd 0xcd7b0198 head 74603 74601 68315 0 S piperd 0xc8ca2198 sort 74602 74601 68315 0 S wait 0xcaed6000 find 74601 68319 68315 0 S wait 0xd0f5e860 sh 74588 7495 7495 13581 S lockf0xd1919dc0 httpd 74587 7495 7495 13581 S lockf0xce42b400 httpd 74586 8016 8016 7336 R httpd 74585 8016 8016 7336 R httpd 74584 9498 9498 26316 R httpd 74341 3399 8150 13289 R CPU 7 perl5.8.8 74020 7495 7495 13581 S lockf0xccf31180 httpd 74019 8247 8247 26256 R httpd 74018 8016 8016 7336 R CPU 4 httpd 72732 9190 9190 26291 RL CPU 1 httpd 72731 9190 9190 26291 S accept 0xcd31572e httpd 72729 8693 8693 26404 R httpd 72727 9190 9190 26291 S accept 0xcd31572e httpd 72726 9396 9396 26262 R httpd 72088 7495 7495 13581 S kqread 0xcb2f9400 httpd 72087 9190 9190 26291 S accept 0xcd31572e httpd 72085 9190 9190 26291 S accept 0xcd31572e httpd 72084 8162 8162 18538 R httpd 71402 7495 7495 13581 S lockf0xccfab3c0 httpd 71401 8162 8162 18538 R httpd 71400 9190 9190 26291 S accept 0xcd31572e httpd 71399 8716 8716 26278 R CPU 3 httpd 70063 7574 7574 11303 S lockf0xccf312c0 httpd 69417 8371 8371 25968 R httpd 69416 9030 9030 39658 R httpd 68319 68318 68315 0 S piperd 0xd1b9f198 sh 68318 68315 68315 0 S wait 0xc82f7648 lockf 68315 68313 68315 0 Ss wait 0xca914430 sh 68313 34501 34501 0 S piperd 0xcfbef000 cron 68310 8016 8016 7336 R httpd 68309 64318 64318 14620 R httpd 68308 9111 9111 26280 S lockf0xca51cc00 httpd 68302 8595 8595 26129 RL httpd 68301 9190 9190 26291 S accept 0xcd31572e httpd 68300 8483 8483 26049 R httpd 68296 8747 8747 33525 R httpd 68287 8952 8952 26340 R httpd 68282 9110 9110 26102 R httpd 68280 9110 9110 26102 R httpd 68272 8339 8339 17137 S accept 0xcc5159f6 httpd 68271 8595 8595 26129 R httpd 68269 9470 9470 26006 R httpd 68268 9030 9030 39658 S sbwait 0xc89d0da4 httpd 68251 36391 36391 38054 R httpd 68249 7527 7527 16760 R httpd 68247 9030 9030 39658 R httpd 68245 8901 8901 26031 S accept 
0xcd3159f6 httpd 68239 8928 8928 26128 R httpd 68238 8928 8928 26128 S lockf0xd1659c40 httpd 68214 7619 7619 6478 S accept 0xcb25219e httpd 68210 8675 8675 26171 S
Re: panic on 6.4-R in ioapic_get_vector() during device probe
2009/6/18 John Baldwin j...@freebsd.org: On Wednesday 17 June 2009 8:13:31 am pluknet wrote: Hi. This is on 6.4-RELEASE-p5 Early in boot (probably due to network outage):: Hit [Enter] to boot immediately, or any other key for command prompt. Booting [/boot/kernel/kernel]... /boot/kernel/acpi.ko text*0x44f40 | readin failed elf32*loadimage: read failed GDB: no debug ports present and then.. Timecounter i8254 frequency 1193182 Hz quality 0 CPU: Intel(R) Xeon(R) CPU E5440 @ 2.83GHz (2826.26-MHz 686-class CPU) Origin = GenuineIntel Id = 0x1067a Stepping = 10 Features=0xbfebfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,C MOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE Features2=0x40ce3bdSSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA ,b19,b26 AMD Features=0x2000LM AMD Features2=0x1LAHF Cores per package: 4 real memory = 3220992000 (3071 MB) avail memory = 3150835712 (3004 MB) FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 cpu4 (AP): APIC ID: 4 cpu5 (AP): APIC ID: 5 cpu6 (AP): APIC ID: 6 cpu7 (AP): APIC ID: 7 user VMEM accounting on ioapic0: Assuming intbase of 0 MPTable: Ignoring interrupt entry for missing ioapic0 ioapic0 Version 2.0 irqs 0-23 on motherboard The 'ignoring interrupt entry' message is very odd. Can you get output from 'mptable'? I'm afraid that panic was only once and due to acpi.ko network load problem. I can boot this box with acpi opted out explicitly if it makes sense, also in order to reproduce those conditions. Are you able to boot with ACPI enabled? Of course. These boxes boot always fine with ACPI enabled. 
Below is part of related dmesg (now from from 7.2) with ACPI enabled: --- FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 cpu2 (AP): APIC ID: 2 cpu3 (AP): APIC ID: 3 cpu4 (AP): APIC ID: 4 cpu5 (AP): APIC ID: 5 cpu6 (AP): APIC ID: 6 cpu7 (AP): APIC ID: 7 This module (opensolaris) contains code covered by the Common Development and Distribution License (CDDL) see http://opensolaris.org/os/licensing/opensolaris_license/ ioapic0 Version 2.0 irqs 0-23 on motherboard --- At this point I would not be surprised if the MP Table was just flat wrong on modern machines as it seems many BIOS vendors do not test it anymore but only test the ACPI tables. : # mptable === MPTable --- MP Floating Pointer Structure: location: EBDA physical address: 0x0009ad40 signature:'_MP_' length: 16 bytes version: 1.4 checksum: 0xc9 mode: Virtual Wire --- MP Config Table Header: physical address: 0x0009be10 signature:'PCMP' base table length:716 version: 1.4 checksum: 0xd6 OEM ID: 'IBM ENSW' Product ID: 'x3650 SMP ' OEM table pointer:0x OEM table size: 0 entry count: 72 local APIC address: 0xfee0 extended table length:328 extended table checksum: 217 --- MP Config Base Table Entries: -- Processors: APIC ID Version State Family Model StepFlags 0 0x14BSP, usable 6 7 10 0x0301 1 0x14AP, usable 6 7 10 0x0301 2 0x14AP, usable 6 7 10 0x0301 3 0x14AP, usable 6 7 10 0x0301 4 0x14AP, usable 6 7 10 0x0301 5 0x14AP, usable 6 7 10 0x0301 6 0x14AP, usable 6 7 10 0x0301 7 0x14AP, usable 6 7 10 0x0301 -- Bus:Bus ID Type 0 PCI 1 PCI 2 PCI 3 PCI 4 PCI 5 PCI 6 PCI 7 PCI 8 PCI 9 PCI 10 PCI 11 PCI 12 PCI 13 PCI 14 PCI 15 PCI 16 PCI 17
Re: 6.2 sporadically locks up
2009/6/19 Adrian Chadd adr...@freebsd.org:
> Just modify the driver slightly to hijack a different device prefix :)

Hi, Adrian.

That's where I just go if I should have to.

-	.d_name = "aac",
+	.d_name = "aacu",
(or vice versa)

While here, I'd like to give some summary about the lockups in the irq17: bce1 aacu0 vs arcconf scenario.

Abstract: we have a number of boxes with IBM ServeRAID 8k on 6.2.
That scenario takes place only with aacu b15753, and not with b15411 (at least none noticed).

We decided some time ago to move some boxes to 6.4 (and leave vendor aacu b15753 there as is) to see how it goes.
Until now (2 or 3 weeks) there has been no lockup. I hope it stays that way.

> Adrian
>
> 2009/6/17 pluknet pluk...@gmail.com:
>> 2009/6/17 Ed Maste ema...@freebsd.org:
>>> On Tue, Jun 16, 2009 at 07:03:34PM +0400, pluknet wrote:
>>>
>>>> As for allpcpu, I often see the picture, when one CPU runs the
>>>> irq17: bce1 aacu0 thread and another one runs arcconf. I wonder if
>>>> that might be a source of bad locking or races, or..
>>>> The arcconf utility uses ioctl that goes into aac/aacu(4) internals.
>>>
>>> Do you see the same result w/ the in-tree aac(4) driver as opposed to
>>> Adaptec's version?
>>>
>>> -Ed
>>
>> [It's quite hard to move back to aac(4) as that requires fstab update
>> [aacdu0 -> aacd0] and instant reboot, because we use quotas and
>> quotacheck looks into /etc/fstab. Such preparations as fstab update and
>> commenting out load_aacu="YES" will give discrepancy between fstab and
>> actual mount points.]
>>
>> I will try anyway.
>> Thank you for your help.

-- 
wbr,
pluknet
panic on 6.4-R in ioapic_get_vector() during device probe
ric_attach+0x16
nexus_attach(c7d49680) at nexus_attach+0x13
device_attach(c7d49680,c06cfe58,c7d49680,c0a173f0,c25000,...) at device_attach+0x58
device_probe_and_attach(c7d49680) at device_probe_and_attach+0xc4
root_bus_configure(c0c20d88,c067ab96,0,c1ec00,c1e000,...) at root_bus_configure+0x16
configure(0,c1ec00,c1e000,0,c0452555,...) at configure+0x9
mi_startup() at mi_startup+0x96
begin() at begin+0x2c

-- 
wbr,
pluknet
Re: 6.2 sporadically locks up
2009/6/17 Ed Maste ema...@freebsd.org:
> On Tue, Jun 16, 2009 at 07:03:34PM +0400, pluknet wrote:
>
>> As for allpcpu, I often see the picture, when one CPU runs the
>> irq17: bce1 aacu0 thread and another one runs arcconf. I wonder if
>> that might be a source of bad locking or races, or..
>> The arcconf utility uses ioctl that goes into aac/aacu(4) internals.
>
> Do you see the same result w/ the in-tree aac(4) driver as opposed to
> Adaptec's version?
>
> -Ed

[It's quite hard to move back to aac(4) as that requires fstab update
[aacdu0 -> aacd0] and instant reboot, because we use quotas and
quotacheck looks into /etc/fstab. Such preparations as fstab update and
commenting out load_aacu="YES" will give discrepancy between fstab and
actual mount points.]

I will try anyway.
Thank you for your help.

-- 
wbr,
pluknet
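To make the fstab concern concrete: switching from the vendor aacu driver to the in-tree aac(4) changes the disk device names from aacdu0 to aacd0, so every matching /etc/fstab entry has to be renamed before the reboot. The following is only a rough sketch of that rename for a single fstab line, in userland C; the function name, the return convention and the "/dev/aacdu" and "/dev/aacd" prefixes passed in by the caller are assumptions for illustration, not part of any FreeBSD tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Rewrite one fstab line so that a device under old_prefix
 * (e.g. "/dev/aacdu") is named under new_prefix (e.g. "/dev/aacd").
 * Lines that do not start with old_prefix are copied unchanged.
 * Returns 1 if the line was rewritten, 0 if it was copied as-is,
 * and -1 if the result does not fit into out.
 */
static int
fstab_rename_dev(const char *line, const char *old_prefix,
    const char *new_prefix, char *out, size_t outlen)
{
	size_t oldlen;
	int n;

	assert(line != NULL && old_prefix != NULL && new_prefix != NULL);
	oldlen = strlen(old_prefix);

	if (strncmp(line, old_prefix, oldlen) != 0) {
		n = snprintf(out, outlen, "%s", line);
		return ((n < 0 || (size_t)n >= outlen) ? -1 : 0);
	}
	/* Keep everything after the old prefix: unit, slice, mount fields. */
	n = snprintf(out, outlen, "%s%s", new_prefix, line + oldlen);
	return ((n < 0 || (size_t)n >= outlen) ? -1 : 1);
}
```

Called with "/dev/aacdu" and "/dev/aacd", a line starting with /dev/aacdu0s1a comes out starting with /dev/aacd0s1a, while unrelated entries pass through untouched. The sketch only matches the prefix at the very start of a line, so entries with leading whitespace would need extra handling.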
6.2 sporadically locks up
(3231536928,4004568076) at VOP_READ_APV+56
ufs_readdir(4004568208) at ufs_readdir+209
VOP_READDIR_APV(3231536928,4004568208) at VOP_READDIR_APV+56
getdirentries(3374408080,4004568324) at getdirentries+347
syscall(59,59,59,134693312,134688816,...) at syscall+703
Xint0x80_syscall() at Xint0x80_syscall+31
--- syscall (196, FreeBSD ELF32, getdirentries), eip = 672506011, esp = 3217023708, ebp = 3217023752 ---

-- 
wbr,
pluknet
Re: 6.2 sporadically locks up
2009/6/16 John Baldwin j...@freebsd.org: On Tuesday 16 June 2009 6:23:47 am pluknet wrote: Hi all. This is one of livelocks we have on a weekly basis. Yes, we do still use ULE scheduler on 6.2 and not moved to 7 yet. Any thought? db ps pid ppid pgrp uid state wmesg wchan cmd 70304 69700 69670 0 R sh 70303 70292 93818 3572 RL CPU 2 chrsh 70302 70294 93818 3572 R crond 70299 93818 93818 0 R CPU 1 crond 70298 93818 93818 0 R crond 70294 93818 93818 3572 S piperd 0xd1d8d330 crond 70292 93818 93818 3572 R crond 70284 70279 70040 10229 S biord 0xdbe2e4e8 perl5.8.8 70283 70278 93818 10229 SL biord 0xdbd70710 exim-4.63-0 70279 70040 70040 10229 S wait 0xc9005860 sh 70278 69996 93818 10229 S wait 0xcaf4ac90 sh 70191 4680 4680 9738 S select 0xc0a12944 httpd 70190 4796 4796 10008 R httpd 70188 5043 5043 30532 RL httpd 70043 6 70043 3572 Ss select 0xc0a12944 wget 70042 7 70042 3572 Ss select 0xc0a12944 wget 70041 70001 70041 3572 Ss select 0xc0a12944 wget 70040 69996 70040 10229 Ss piperd 0xca35e990 perl5.8.8 70039 70002 70039 3572 Ss select 0xc0a12944 wget This is not a full listing so one cannot assume it is a deadlock. Ok, usually that listing doesn't show anything interesting in this sort of lockup. I'll share a full ps output next time (sure, rather soon). db show lockchain Giant thread -3420549 (pid 434, ) ??? (0xc099cb0c) You would use 'show lock' or perhaps 'show turnstile' with specific lock variables. 'show lockchain' needs a TID or PID. Ok. As for turnstile, it showed nothing at all, hence omitted. 
db show allpcpu cpuid = 0 curthread = 0xc7cfec80: pid 18 swi4: clock sio cpuid = 1 curthread = 0xc99f9960: pid 70299 crond cpuid = 2 curthread = 0xc99f9af0: pid 70303 chrsh cpuid = 3 curthread = 0xd087d320: pid 69700 sh cpuid = 4 curthread = 0xc98f84b0: pid 69604 httpd cpuid = 5 curthread = 0xcaebe190: pid 69598 httpd cpuid = 6 curthread = 0xc7cfe960: pid 27 irq17: bce1 aacu0 cpuid = 7 curthread = 0xc837fe10: pid 69711 arcconf This is far more useful output than the truncated 'ps'. From this, all of the CPUs are busy (in at least some deadlocks, all the CPUs would be idle instead). There are several deadlocks fixed since 6.2 that I am aware of, but this doesn't look like any of those. I'm not sure why you aren't getting useful stack traces of running threads. I'll do next time. I thought it would be similar to bt PID output and simply didn't include. As for allpcpu, I often see the picture, when one CPU runs the irq17: bce1 aacu0 thread and another one runs arcconf. I wonder if that might be a source of bad locking or races, or.. The arcconf utility uses ioctl that goes into aac/aacu(4) internals. Perhaps DDB in 6.2 doesn't know to look in stoppcbs[]. Hmm, looks like 6.2 only does that if you are using KDB_STOP_NMI. Are you using that kernel option? If not, you probably want to. No, I'm not. Will that add an additional visible overhead on a running system? -- John Baldwin Thank you. -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
lockup on 6.4 while bce in MGETHDR
Hi.

This is on 6.4-S as of April with uptime about 2 months.
Machine could not be accessed via network, didn't reply on ping.

Looking at backtrace it's here:
if_bce.c::bce_get_buf():
	/* This is a new mbuf allocation. */
	MGETHDR(m_new, M_DONTWAIT, MT_DATA);

From brk-seq:

db> bt
Tracing pid 31 tid 100034 td 0xc833ad00
kdb_enter(c097ef95) at kdb_enter+0x2b
siointr1(c83a3c00) at siointr1+0xce
siointr(c83a3c00) at siointr+0x5e
intr_execute_handlers(c80ee4c8,e8963abc,4,e8963b0c,c08cba63,...) at intr_execute_handlers+0xe1
lapic_handle_intr(38) at lapic_handle_intr+0x2e
Xapic_isr1() at Xapic_isr1+0x33
--- interrupt, eip = 0xc06a393e, esp = 0xe8963b00, ebp = 0xe8963b0c ---
_mtx_lock_sleep(c0a655c0,c833ad00,0,0,0) at _mtx_lock_sleep+0xb6
kmem_malloc(c14680c0,1000,101,e8963b8c,c082cadd,...) at kmem_malloc+0x328
page_alloc(c1456000,1000,e8963b7f,101,c8b10016,...) at page_alloc+0x1a
slab_zalloc(c1456000,101,0,d278ca90,c915aa3c,...) at slab_zalloc+0xdd
uma_zone_slab(c1456000,1) at uma_zone_slab+0xf0
uma_zalloc_bucket(c1456000,1) at uma_zalloc_bucket+0x15c
uma_zalloc_arg(c1456000,d0729e00,1) at uma_zalloc_arg+0x292
bce_get_buf(c8363000,0,e8963c7c,e8963c7e,e8963c80) at bce_get_buf+0xef
bce_fill_rx_chain(c8363000,7047,d0729000,cbd86000,59375b37,...) at bce_fill_rx_chain+0x48
bce_rx_intr(c8363000) at bce_rx_intr+0x301
bce_intr(c8363000) at bce_intr+0xf4
ithread_execute_handlers(c8337c90,c8230a80) at ithread_execute_handlers+0x125
ithread_loop(c835f860,e8963d38) at ithread_loop+0x55
fork_exit(c0694a38,c835f860,e8963d38) at fork_exit+0x71
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe8963d6c, ebp = 0 ---

db> ps
  pid  ppid  pgrp   uid  state  wmesg     wchan       cmd
17102 17100 15549     0  L      *vm page  0xcce93480  awk
17101 17100 15549     0  L      *Giant    0xc8737480  grep
17100 17098 15549     0  S      wait      0xc8726a78  sh
17098 15570 15549     0  S      wait      0xc98c2a78  sh
16935 30771 30771  8382  RL                           httpd
16656 16349 16349 10346  S      biord     0xdc389368  php
16564 16460 16460 18332  SL     vmpfw     0xc365da98  lynx
16563 16351 16351 18332  LL     *vm page  0xcce93480  lynx
16460 16451 16460 18332  Ss     wait      0xc8ebb218  bash
16451 11589 11589 18332  S      piperd    0xd264a198  crond
16360 16345 16345 37174  SL     vnread    0xdc14e7f0  php
16351 16343 16351 18332  Ss     wait      0xc9236c90  bash
16349 16340 16349 10346  Ss     wait      0xce3f3218  bash
16345 16337 16345 37174  Ss     wait      0xcf6b1860  bash
16343 11589 11589 18332  S      piperd    0xca389990  crond
16340 11589 11589 10346  S      piperd    0xcce447f8  crond
16337 11589 11589 37174  S      piperd    0xcc6fb198  crond
15996 37504 37504 27220  S      select    0xc0a56dc4  httpd
15611 15608 15561     0  S      piperd    0xd222f4c8  awk

-- 
wbr,
pluknet
coretemp(4) lockups on 6-stable
This is 6.4-stable from April.
System locks up while in `sysctl dev.cpu` (with coretemp kldloaded).

So as far as I understand sched_bind() binds an executing thread to nonexistent CPU 255.
Same behavior on coretemp built on 6.2.

db> ps
  pid  ppid  pgrp   uid  state  wmesg    wchan  cmd
34381 34380 34381     0  R+     CPU 255         sysctl
[...]

db> bt 34381
Tracing pid 34381 tid 100166 td 0xc8634680
sched_switch(c8634680,0,1) at sched_switch+0x143
mi_switch(1,0,c86347e0,4,c0a4e510,...) at mi_switch+0x1ba
sched_bind(c8634680,4,c856f3b0,0,c0836b3b,...) at sched_bind+0x52
coretemp_get_temp_sysctl(c8ef56c0,c908c200,0,eebebc04,c8ef56c0,...) at coretemp_get_temp_sysctl+0x47
sysctl_root(0,eebebc74,4,eebebc04) at sysctl_root+0x107
userland_sysctl(c8634680,eebebc74,4,0,bfbfda8c,0,0,0,eebebc70,0) at userland_sysctl+0x112
__sysctl(c8634680,eebebd04) at __sysctl+0x93
syscall(3b,3b,3b,4,bfbfda8c,...) at syscall+0x2bf
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x2812407b, esp = 0xbfbfd9fc, ebp = 0xbfbfda38 ---

static int
coretemp_get_temp(device_t dev)
{
	uint64_t msr;
	int temp;
	int cpu = device_get_unit(dev);
	struct coretemp_softc *sc = device_get_softc(dev);
	char stemp[16];

	mtx_lock_spin(&sched_lock);
	sched_bind(curthread, cpu);
	^^^
	mtx_unlock_spin(&sched_lock);

-- 
wbr,
pluknet
Re: coretemp(4) lockups on 6-stable
2009/6/15 pluknet pluk...@gmail.com:
> This is 6.4-stable from April.
> System locks up while in `sysctl dev.cpu` (with coretemp kldloaded).

Small follow-up:
Just one of my wild guesses is that coretemp doesn't play nice with
ncpu > 4. The problem box is 8-way. I always observe this lockup when
sched_bind(curthread, cpu) is called with cpu==4, while on another box
with only 4 CPU cores `sysctl dev.cpu` works fine.

> So as far as I understand, sched_bind() binds an executing thread to
> nonexistent CPU 255. Same behavior on coretemp built on 6.2.
>
> db> ps
>   pid  ppid  pgrp   uid  state  wmesg    wchan  cmd
> 34381 34380 34381     0  R+     CPU 255         sysctl
> [...]
> db> bt 34381
> Tracing pid 34381 tid 100166 td 0xc8634680
> sched_switch(c8634680,0,1) at sched_switch+0x143
> mi_switch(1,0,c86347e0,4,c0a4e510,...) at mi_switch+0x1ba
> sched_bind(c8634680,4,c856f3b0,0,c0836b3b,...) at sched_bind+0x52
> coretemp_get_temp_sysctl(c8ef56c0,c908c200,0,eebebc04,c8ef56c0,...) at coretemp_get_temp_sysctl+0x47
> sysctl_root(0,eebebc74,4,eebebc04) at sysctl_root+0x107
> userland_sysctl(c8634680,eebebc74,4,0,bfbfda8c,0,0,0,eebebc70,0) at userland_sysctl+0x112
> __sysctl(c8634680,eebebd04) at __sysctl+0x93
> syscall(3b,3b,3b,4,bfbfda8c,...) at syscall+0x2bf
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x2812407b, esp = 0xbfbfd9fc, ebp = 0xbfbfda38 ---
>
> static int
> coretemp_get_temp(device_t dev)
> {
>         uint64_t msr;
>         int temp;
>         int cpu = device_get_unit(dev);
>         struct coretemp_softc *sc = device_get_softc(dev);
>         char stemp[16];
>
>         mtx_lock_spin(&sched_lock);
>         sched_bind(curthread, cpu);
>                               ^^^
>         mtx_unlock_spin(&sched_lock);

--
wbr,
pluknet
Re: coretemp(4) lockups on 6-stable
2009/6/15 pluknet pluk...@gmail.com:
> 2009/6/15 pluknet pluk...@gmail.com:
>> This is 6.4-stable from April.
>> System locks up while in `sysctl dev.cpu` (with coretemp kldloaded).
>
> Small follow-up:
> Just one of my wild guesses is that coretemp doesn't play nice with
> ncpu > 4. The problem box is 8-way. I always observe this lockup when
> sched_bind(curthread, cpu) is called with cpu==4, while on another box
> with only 4 CPU cores `sysctl dev.cpu` works fine.

And yet another small one:
coretemp(4) works OK under 7.0+ on the same h/w.

>> So as far as I understand, sched_bind() binds an executing thread to
>> nonexistent CPU 255. Same behavior on coretemp built on 6.2.
>>
>> db> ps
>>   pid  ppid  pgrp   uid  state  wmesg    wchan  cmd
>> 34381 34380 34381     0  R+     CPU 255         sysctl
>> [...]
>> db> bt 34381
>> Tracing pid 34381 tid 100166 td 0xc8634680
>> sched_switch(c8634680,0,1) at sched_switch+0x143
>> mi_switch(1,0,c86347e0,4,c0a4e510,...) at mi_switch+0x1ba
>> sched_bind(c8634680,4,c856f3b0,0,c0836b3b,...) at sched_bind+0x52
>> coretemp_get_temp_sysctl(c8ef56c0,c908c200,0,eebebc04,c8ef56c0,...) at coretemp_get_temp_sysctl+0x47
>> sysctl_root(0,eebebc74,4,eebebc04) at sysctl_root+0x107
>> userland_sysctl(c8634680,eebebc74,4,0,bfbfda8c,0,0,0,eebebc70,0) at userland_sysctl+0x112
>> __sysctl(c8634680,eebebd04) at __sysctl+0x93
>> syscall(3b,3b,3b,4,bfbfda8c,...) at syscall+0x2bf
>> Xint0x80_syscall() at Xint0x80_syscall+0x1f
>> --- syscall (202, FreeBSD ELF32, __sysctl), eip = 0x2812407b, esp = 0xbfbfd9fc, ebp = 0xbfbfda38 ---
>>
>> static int
>> coretemp_get_temp(device_t dev)
>> {
>>         uint64_t msr;
>>         int temp;
>>         int cpu = device_get_unit(dev);
>>         struct coretemp_softc *sc = device_get_softc(dev);
>>         char stemp[16];
>>
>>         mtx_lock_spin(&sched_lock);
>>         sched_bind(curthread, cpu);
>>                               ^^^
>>         mtx_unlock_spin(&sched_lock);

--
wbr,
pluknet
system locks up in vmmaps
() at Xint0x80_syscall+0x1f --- syscall (95, FreeBSD ELF32, fsync), eip = 0x28147ab7, esp = 0xbfbfda3c, ebp = 0xbfbfee78 --- db bt 10283 Tracing pid 10283 tid 100447 td 0xcf701190 sched_switch(cf701190,0,1) at sched_switch+0x143 mi_switch(1,0,cf701190,ef1687ac,c06a484c,...) at mi_switch+0x1ba sleepq_switch(c8d9d8d8) at sleepq_switch+0x87 sleepq_wait(c8d9d8d8,0,cf701190,c8d9d8d8,4,...) at sleepq_wait+0x5c msleep(c8d9d8d8,c0a06a08,50,c0928d0d,0,...) at msleep+0x269 acquire(ef16882c,40,6,cf701190,0,...) at acquire+0x7b lockmgr(c8d9d8d8,2002,c8d9d8fc,cf701190) at lockmgr+0x3fe ffs_lock(ef16) at ffs_lock+0x88 VOP_LOCK_APV(c09d5e60,ef16) at VOP_LOCK_APV+0x43 vn_lock(c8d9d880,2002,cf701190,c8d9d880) at vn_lock+0xf4 vget(c8d9d880,2002,cf701190) at vget+0xbe cache_lookup(c85f8440,ef168be0,ef168bf4) at cache_lookup+0x458 vfs_cache_lookup(ef1689c8) at vfs_cache_lookup+0x8f VOP_LOOKUP_APV(c09d5e60,ef1689c8) at VOP_LOOKUP_APV+0x43 lookup(ef168bcc) at lookup+0x4c1 namei(ef168bcc) at namei+0x39a vn_open_cred(ef168bcc,ef168ccc,1a4,ce4db200,4,...) at vn_open_cred+0x2ad vn_open(ef168bcc,ef168ccc,1a4,4) at vn_open+0x1e kern_open(cf701190,805120c,0,1,1b6,...) at kern_open+0xb6 open(cf701190,ef168d04) at open+0x1a syscall(3b,3b,3b,4,28242df8,...) at syscall+0x2bf Xint0x80_syscall() at Xint0x80_syscall+0x1f --- syscall (5, FreeBSD ELF32, open), eip = 0x282114b3, esp = 0xbfbfe8fc, ebp = 0xbfbfe928 --- db bt Tracing pid 16 tid 10 td 0xc7cfe000 kdb_enter(c09408b4) at kdb_enter+0x2b siointr1(c7f93000) at siointr1+0xce siointr(c7f93000) at siointr+0x5e intr_execute_handlers(c7cf24c8,e687ac94,4,e687acd8,c0899743,...) at intr_execute_handlers+0xe1 lapic_handle_intr(37) at lapic_handle_intr+0x2e Xapic_isr1() at Xapic_isr1+0x33 --- interrupt, eip = 0xc0b96165, esp = 0xe687acd8, ebp = 0xe687acd8 --- acpi_cpu_c1(7b404985,35f2a28c,c7cfe000,c7cfe000,2,...) at acpi_cpu_c1+0x5 acpi_cpu_idle(e687ad10,c066c435,c7cfc000,c066c3a0,e687ad24,...) 
at acpi_cpu_idle+0x152 cpu_idle(c7cfc000,c066c3a0,e687ad24,c066c121,0,...) at cpu_idle+0x28 idle_proc(0,e687ad38) at idle_proc+0x95 fork_exit(c066c3a0,0,e687ad38) at fork_exit+0x71 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xe687ad6c, ebp = 0 --- All 8 CPUs are in idle state. -- wbr, pluknet ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
Re: ZFS MFC heads down
2009/5/21 Kip Macy km...@freebsd.org:
> On Wed, May 20, 2009 at 2:59 PM, Kip Macy km...@freebsd.org wrote:
>> I will be MFC'ing the newer ZFS support some time this afternoon.
>> Both world and kernel will need to be re-built. Existing pools will
>> continue to work without upgrade. If you choose to upgrade a pool to
>> take advantage of new features you will no longer be able to use it
>> with sources prior to today. 'zfs send/recv' is not expected to
>> inter-operate between different pool versions.
>
> The MFC went in r192498. Please let me know if you have any problems.

Please fix the fourfold repetition of the entire file contents in
stable/7/cddl/compat/opensolaris/include/libshare.h.

Thanks!

--
wbr,
pluknet
Re: ZFS MFC heads down
2009/5/21 pluknet pluk...@gmail.com:
> 2009/5/21 Kip Macy km...@freebsd.org:
>> On Wed, May 20, 2009 at 2:59 PM, Kip Macy km...@freebsd.org wrote:
>>> I will be MFC'ing the newer ZFS support some time this afternoon.
>>> Both world and kernel will need to be re-built. Existing pools will
>>> continue to work without upgrade. If you choose to upgrade a pool to
>>> take advantage of new features you will no longer be able to use it
>>> with sources prior to today. 'zfs send/recv' is not expected to
>>> inter-operate between different pool versions.
>>
>> The MFC went in r192498. Please let me know if you have any problems.
>
> Please fix the fourfold repetition of the entire file contents in
> stable/7/cddl/compat/opensolaris/include/libshare.h.

The same applies to:
stable/7/sys/cddl/compat/opensolaris/sys/pathname.h
stable/7/sys/cddl/compat/opensolaris/sys/kidmap.h
stable/7/sys/cddl/compat/opensolaris/sys/file.h

--
wbr,
pluknet
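For anyone who wants to check other headers for the same kind of damage, here is a minimal sketch in C. It is not from the thread, and `repeat_count` is a hypothetical helper name. It assumes the file has already been read into memory and that the copies are byte-for-byte identical with nothing between them; it reports how many back-to-back copies of the shortest repeating prefix make up the buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of FreeBSD): return k such that buf
 * consists of exactly k back-to-back copies of its shortest repeating
 * prefix. Returns 1 for content that does not repeat and 0 for an
 * empty buffer.
 */
size_t
repeat_count(const char *buf, size_t len)
{
	size_t p;

	for (p = 1; p <= len; p++) {
		/* Only periods that divide the length give whole copies. */
		if (len % p != 0)
			continue;
		/* buf has period p iff buf[i] == buf[i + p] for all valid i. */
		if (memcmp(buf, buf + p, len - p) == 0)
			return (len / p);
	}
	return (0);
}
```

A header whose contents were pasted four times in a row, as reported above, would yield 4 under these assumptions; any stray byte between the copies makes the result fall back to 1. The scan is quadratic in the worst case, which should be acceptable for files the size of a header.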
Re: So where are we at with bce and lagg then ?
2009/5/19 Pete French petefre...@ticketswitch.com:
> Just wondering if there was any update to this? I seem to be
> the only one who actually has the problem, but I have gone as far
> as I can trying to diagnose it unless someone can send me patches
> to test.

I guess it was fixed in -current in r191923.

--
wbr,
pluknet
Re: lock up in 6.2 (procs massively stuck in Giant)
2009/5/13 pluknet pluk...@gmail.com:
> 2009/5/13 John Baldwin j...@freebsd.org:
>> On Tuesday 12 May 2009 4:59:19 pm pluknet wrote:
>>> Hi.
>>>
>>> From just another box (not from the first two mentioned earlier)
>>> with a similar locking issue. If it would make sense, since there
>>> are possibly a bit different conditions.
>>> clock proc here is on swi4, I hope it's a non-important difference.
>>>
>>>    18     0     0     0  LL  *Giant  0xd0a6b140  [swi4: clock sio]
>>> db> bt 18
>>
>> Ok, this is a known issue in 6.x. It is fixed in 6.4.

Looking at kern_timeout.c, I suspect that was fixed in r181012.

Thanks again for the tips.

--
wbr,
pluknet
Re: lock up in 6.2 (procs massively stuck in Giant)
2009/5/13 John Baldwin j...@freebsd.org:
> On Wednesday 13 May 2009 2:40:33 am pluknet wrote:
>> 2009/5/13 pluknet pluk...@gmail.com:
>>> 2009/5/13 John Baldwin j...@freebsd.org:
>>>> On Tuesday 12 May 2009 4:59:19 pm pluknet wrote:
>>>>> Hi.
>>>>>
>>>>> From just another box (not from the first two mentioned earlier)
>>>>> with a similar locking issue. If it would make sense, since there
>>>>> are possibly a bit different conditions.
>>>>> clock proc here is on swi4, I hope it's a non-important difference.
>>>>>
>>>>>    18     0     0     0  LL  *Giant  0xd0a6b140  [swi4: clock sio]
>>>>> db> bt 18
>>>>
>>>> Ok, this is a known issue in 6.x. It is fixed in 6.4.
>>
>> Looking at kern_timeout.c, I suspect that was fixed in r181012.
>
> No, this particular issue is fixed by a change to sched_4bsd.c in r179975.

Gah.. We are constrained to use the ULE scheduler on 6.x (yes, I know
that it's known to be broken (c)), since we have had very bad
interactivity with 4BSD under our workload.
OK, that's just another reason to move to 7.x.

Thanks.

--
wbr,
pluknet
Re: lock up in 6.2 (procs massively stuck in Giant)
2009/5/11 John Baldwin j...@freebsd.org:
> On Monday 04 May 2009 11:41:35 pm pluknet wrote:
>> 2009/5/1 John Baldwin j...@freebsd.org:
>>> On Thursday 30 April 2009 2:36:34 am pluknet wrote:
>>>> Hi folks.
>>>>
>>>> Today I got a new locking issue.
>>>> This is the first time I got it, and it's merely reproduced.
>>>>
>>>> The box has lost both remote connection and local access.
>>>> No SIGINFO output on the local console even.
>>>> Jumping in ddb shows the next:
>>>>
>>>> 1) first, this is a 8-way web server. No processes on runqueue
>>>> except one httpd (i.e. ps shows R in its state):
>>>
>>> You need to find who owns Giant and what that thread is doing. You
>>> can try using 'show lock Giant' as well as 'show lockchain 11568'.
>>
>> Hi, John!
>>
>> Just reproduced now on another box.
>> Hmm.. stack of the process owning Giant looks garbled.
>>
>> db> show lock Giant
>>  class: sleep mutex
>>  name: Giant
>>  flags: {DEF, RECURSE}
>>  state: {OWNED, CONTESTED}
>>  owner: 0xd0d79320 (tid 102754, pid 34594, httpd)
>> db> show lockchain 34594
>> thread 102754 (pid 34594, httpd) running on CPU 7
>> db> show lockchain 102754
>> thread 102754 (pid 34594, httpd) running on CPU 7
>
> The thread is running, so we don't know what its top of stack is and
> you can't get a good stack trace in that case. None of your CPUs are
> idle, so I don't think you have any sort of deadlock. You might have
> a livelock.
>
> --
> John Baldwin

I'm curious if it could be caused by heavy load.
I can't say for sure what it is, as it's non-trivial for me to
determine the cause of a livelock and to debug it.

So I think it may make sense to try 7.x, as a lot of locking work
has been done there.

Thank you.

--
wbr,
pluknet
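As a side note for triaging captured console logs like the ones in this thread: threads blocked on a mutex show up in ddb `ps` output with the lock name prefixed by `*` in the wmesg column (e.g. `*Giant`). Below is a rough sketch in C for counting such lines in a saved log. It is not from the thread; `count_lock_waiters` is a hypothetical name, and the matching is a plain substring search, so it assumes `*<name>` does not appear elsewhere on a line.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the lines in a saved ddb "ps" listing
 * that contain "*<lock>", i.e. threads shown as blocked on that lock.
 * Only the prefix is matched, so a listing where the wchan column has
 * run into the name (e.g. "*Giant0xc8737480") is still counted.
 */
size_t
count_lock_waiters(const char *ps_text, const char *lock)
{
	size_t n = 0, lock_len = strlen(lock);
	const char *line = ps_text;

	while (*line != '\0') {
		const char *eol = strchr(line, '\n');
		size_t len = (eol != NULL) ? (size_t)(eol - line) : strlen(line);
		size_t i;

		/* Look for '*' immediately followed by the lock name. */
		for (i = 0; i + 1 + lock_len <= len; i++) {
			if (line[i] == '*' &&
			    strncmp(line + i + 1, lock, lock_len) == 0) {
				n++;
				break;	/* count each line at most once */
			}
		}
		line += len;
		if (*line == '\n')
			line++;
	}
	return (n);
}
```

This only tells you how many threads are waiting; finding the owner still needs 'show lock Giant' in ddb, as John describes.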
Re: lock up in 6.2 (procs massively stuck in Giant)
2009/5/12 John Baldwin j...@freebsd.org: On Tuesday 12 May 2009 2:12:27 am pluknet wrote: 2009/5/11 John Baldwin j...@freebsd.org: On Monday 04 May 2009 11:41:35 pm pluknet wrote: 2009/5/1 John Baldwin j...@freebsd.org: On Thursday 30 April 2009 2:36:34 am pluknet wrote: Hi folks. Today I got a new locking issue. This is the first time I got it, and it's merely reproduced. The box has lost both remote connection and local access. No SIGINFO output on the local console even. Jumping in ddb shows the next: 1) first, this is a 8-way web server. No processes on runqueue except one httpd (i.e. ps shows R in its state): You need to find who owns Giant and what that thread is doing. You can try using 'show lock Giant' as well as 'show lockchain 11568'. Hi, John! Just reproduced now on another box. Hmm.. stack of the process owing Giant looks garbled. db show lock Giant class: sleep mutex name: Giant flags: {DEF, RECURSE} state: {OWNED, CONTESTED} owner: 0xd0d79320 (tid 102754, pid 34594, httpd) db show lockchain 34594 thread 102754 (pid 34594, httpd) running on CPU 7 db show lockchain 102754 thread 102754 (pid 34594, httpd) running on CPU 7 The thread is running, so we don't know what it's top of stack is and you can't a good stack trace in that case. None of your CPUs are idle, so I don't think you have any sort of deadlock. You might have a livelock. -- John Baldwin I'm curious if it could be caused by heavy load. I don't know what it might be definitely, as it's non-trivial for me to determine the reason of a livelock, and to debug it. So I think it may have sense to try 7.x, as there has been done much locking work. It may be worth trying 7. Also, what is the state of the 'swi7: clock' process? -- John Baldwin Hi. From just another box (not from the first two mentioned earlier) with a similar locking issue. If it would make sense, since there are possibly a bit different conditions. clock proc here is on swi4, I hope it's a non-important difference. 
18 0 0 0 LL *Giant0xd0a6b140 [swi4: clock sio] db bt 18 Tracing pid 18 tid 100015 td 0xc7cfec80 sched_switch(c7cfec80,0,1) at sched_switch+0x143 mi_switch(1,0) at mi_switch+0x1ba turnstile_wait(c0a06c60,cb77ee10) at turnstile_wait+0x2f7 _mtx_lock_sleep(c0a06c60,c7cfec80,0,0,0) at _mtx_lock_sleep+0xfc softclock(0) at softclock+0x231 ithread_execute_handlers(c7d07218,c7d4a100) at ithread_execute_handlers+0x125 ithread_loop(c7cb69f0,e6892d38) at ithread_loop+0x55 fork_exit(c066d3e4,c7cb69f0,e6892d38) at fork_exit+0x71 fork_trampoline() at fork_trampoline+0x8 --- trap 0x1, eip = 0, esp = 0xe6892d6c, ebp = 0 --- db show lock Giant class: sleep mutex name: Giant flags: {DEF, RECURSE} state: {OWNED, CONTESTED} owner: 0xcb77ee10 (tid 101174, pid 8611, httpd) db show lockchain 101174 thread 101174 (pid 8611, httpd) running on CPU 4 db bt 101174 Tracing pid 8611 tid 101174 td 0xcb77ee10 sched_switch(cb77ee10,c7f3de10,6) at sched_switch+0x143 mi_switch(ca6d82e8,6,c0a0baf0,ca6d82e8,c0a0a0b0,...) at mi_switch kseq_move(c0a0baf0,6) at kseq_move+0xc1 sched_balance_pair(ef879bb0,ef879bb0,c08a2adf,cb77ef68,cb77b360,. lance_pair+0x91 sched_lock(0,cbd1f658,0,cb77b36c,0,...) at sched_lock _end(cb77b360,cb77b364,cb77ee10,cb77ee18,0,...) at 0xcb77b360 _end(d0a49a80,d0a49a84,c84cf7d0,c84cf7d8,0,...) at 0xc7f97648 _end(ca6dbcc0,ca6dbcc4,ca6d54b0,ca6d54b8,0,...) at 0xcbd1f648 _end(cbcad780,cbcad784,cc8a2190,cc8a2198,0,...) at 0xc8514430 _end(cab883c0,cab883c4,ca9417d0,ca9417d8,0,...) at 0xca6dc000 _end(cc67c4e0,cc67c4e4,cd6fd000,cd6fd008,0,...) at 0xcc8abc90 _end(cd3a9120,cd3a9124,cd3b1320,cd3b1328,0,...) at 0xcad68218 _end(cd130c60,cd130c64,d00ca320,d00ca328,0,...) at 0xca71e860 _end(cbcac240,cbcac244,cbf6e4b0,cbf6e4b8,0,...) at 0xcd472a78 _end(cb73c960,cb73c964,cb4f44b0,cb4f44b8,0,...) at 0xd00cfa78 _end(ca348b40,ca348b44,ca420af0,ca420af8,0,...) at 0xcc0e9c90 _end(d0310ea0,d0310ea4,cd3ad4b0,cd3ad4b8,0,...) at 0xcc7ec218 _end(ca5ddd20,ca5ddd24,ca6d8c80,ca6d8c88,0,...) 
at 0xca426c90 _end(c998aa20,c998aa24,ca2bb320,ca2bb328,0,...) at 0xd030fc90 [...] oh, i saw that earlier somewhere.. don't remember where. db c and waiting some moments shows a little different picture: db bt 101174 Tracing pid 8611 tid 101174 td 0xcb77ee10 sched_switch(cb77ee10,c7f3de10,6) at sched_switch+0x143 mi_switch(cf177608,7,c0a0b460,cf177608,c0a0a0b0,...) at mi_switch+0x1ba kseq_move(c0a0b460,7) at kseq_move+0xc1 sched_balance_pair(cb77ef68,ef879bb8,c0694edf,cb77ef68,cb77b360,...) at sched_balance_pair+0x91 _end(cbd1f650,cb77ee10,cb77ee20,0,cb77b374,...) at 0xcb77b360 MAXCPU(cb77b360,cb77b364,cb77ee10,cb77ee18,0,...) at 0 _end(d0a49a80,d0a49a84,c84cf7d0,c84cf7d8,0,...) at 0xc7f97648 _end(ca6dbcc0,ca6dbcc4,ca6d54b0,ca6d54b8,0,...) at 0xcbd1f648 _end(cbcad780,cbcad784,cc8a2190,cc8a2198,0,...) at 0xc8514430 _end(cab883c0,cab883c4,ca9417d0