Re: RCTL and VIMAGE for 11.0-RELEASE
Mark Felder wrote on 08/24/2015 21:29:
On Mon, Aug 24, 2015, at 14:18, Bjoern A. Zeeb wrote:
On 24 Aug 2015, at 19:08, Mark Felder wrote:
What is preventing RCTL from being enabled right now? Any known/serious blockers?
It's enabled in GENERIC.

If RCTL is in GENERIC, can somebody look into this problem with swapuse?
https://lists.freebsd.org/pipermail/freebsd-stable/2015-March/082019.html
IMHO it should be fixed or better documented in the man pages.

Miroslav Lachman
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Panic on kldload/kldunload in/near callout
On 09/13/15 at 08:51P, Alexander V. Chernikov wrote:
>
>
> 12.09.2015, 20:30, "hiren panchasara":
> > On 09/12/15 at 03:32P, Alexander V. Chernikov wrote:
> >> 12.09.2015, 02:22, "hiren panchasara":
> >> > On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
> >
> > [skip]
> >> > I'll try to get it. Meanwhile I am getting another panic on idle box:
> >> > http://pastebin.com/9qJTFMik
> >> The easiest explanation could be lack of lla_create() result check, fixed
> >> in r286945.
> >> This panic is triggered by fast interface down-up (or just up), when an ARP
> >> packet is received but there is no (matching) IPv4 prefix on the
> >> interface.
> >> If this is not the case (e.g. it panicked w/o any interface changes and
> >> there were no other subnets in given L2 segment) I'd be happy to debug
> >> this further.
> >
> > Just hit another last night. (Box goes to db> ; let me know if you want
> > to debug anything when that happens.)
> Would you mind showing full backtrace for that core? (e.g. situation has to
> be different for newer -current).
Sure:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x20
fault code = supervisor write data, page not present
instruction pointer = 0x20:0x80f214d6
stack pointer = 0x28:0xfe3d0620
frame pointer = 0x28:0xfe3d0630
code segment = base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq264: igb0:que 0)
[ thread pid 12 tid 100035 ]
Stopped at memcpy+0x16: repe movsb (%rsi),%es:(%rdi)
db> bt full
Symbol not found
KDB: reentering
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+43/frame 0xfe3cff10
kdb_reenter() at kdb_reenter+51/frame 0xfe3cff20
db_term() at db_term+136/frame 0xfe3cff40
db_unary() at db_unary+116/frame 0xfe3cff60
db_mult_expr() at db_mult_expr+27/frame 0xfe3cffa0
db_add_expr() at db_add_expr+27/frame 0xfe3cffe0
db_expression() at db_expression+29/frame 0xfe3d0030
db_stack_trace() at db_stack_trace+48/frame 0xfe3d0060
db_command() at db_command+865/frame 0xfe3d0120
db_command_loop() at db_command_loop+100/frame 0xfe3d0130
db_trap() at db_trap+219/frame 0xfe3d01c0
kdb_trap() at kdb_trap+404/frame 0xfe3d0250
trap_fatal() at trap_fatal+789/frame 0xfe3d02b0
trap_pfault() at trap_pfault+806/frame 0xfe3d0350
trap() at trap+1124/frame 0xfe3d0560
calltrap() at calltrap+8/frame 0xfe3d0560
--- trap 12, rip = 18446744071577933014, rsp = 18446741874690295344, rbp = 18446741874690295344 ---
memcpy() at memcpy+22/frame 0xfe3d0630
arpintr() at arpintr+2951/frame 0xfe3d0750
netisr_dispatch_src() at netisr_dispatch_src+97/frame 0xfe3d07c0
ether_demux() at ether_demux+345/frame 0xfe3d07f0
ether_nh_input() at ether_nh_input+888/frame 0xfe3d0850
netisr_dispatch_src() at netisr_dispatch_src+97/frame 0xfe3d08c0
ether_input() at ether_input+38/frame 0xfe3d08e0
igb_rxeof() at igb_rxeof+1764/frame 0xfe3d0990
igb_msix_que() at igb_msix_que+352/frame 0xfe3d09e0
intr_event_execute_handlers() at intr_event_execute_handlers+474/frame 0xfe3d0a20
ithread_loop() at ithread_loop+166/frame 0xfe3d0a70
fork_exit() at fork_exit+156/frame 0xfe3d0ab0
fork_trampoline() at fork_trampoline+14/frame 0xfe3d0ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db>

Cheers,
Hiren
kernel dtrace and current
Hello all,

I keep running into
"dtrace: failed to compile script: "/usr/lib/dtrace/psinfo.d", line 39: failed to copy type of 'pr_uid': Type information is in parent and unavailable"
message more and more often while trying to trace different -current kernels.

Typically the reason behind that is the number of types embedded in kernel CTF:
ctfdump -S /boot/kernel/kernel | awk '$0~/of types/{print$6}'
37160

We are bound to 32k types by the CTF format (numbers above 32k, i.e. with the highest bit set, are considered "child" types with the information stored in a "parent").
ctfmerge ignores this fact and, instead of yelling, emits type indices above 32k. On the other hand, libctf sees such indices while parsing sections, and since there is no "parent" for the kernel, it emits the error above and stops.

Thankfully, r287234 really improved the situation for ctfmerge, but there are still several thousand identical structures and the total number is close to 32k.

Personally I solved this by removing unneeded devices from GENERIC-inherited configs.
I wonder, however, if this can be handled better.

E.g. either show a better error in dtrace(1), or make ctfmerge fail, causing the kernel build to stop (since we asked for dtrace but in reality it wouldn't work), or remove some stale devices from GENERIC, or something totally different?
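
[Editorial sketch] The limit described above can be checked before running dtrace. The helper names below (ctf_id_kind, check_type_count) are hypothetical and not part of any FreeBSD tool; the boundary value 32767 (0x7fff) is an assumption based on the "highest bit set" description in the message, with IDs above it treated as child types.

```shell
#!/bin/sh
# Assumed boundary: 16-bit type IDs, highest bit marks a "child" type.
CTF_PARENT_MAX=32767    # 0x7fff

# ctf_id_kind ID: print "parent" or "child" for a numeric type ID.
ctf_id_kind() {
    if [ "$1" -le "$CTF_PARENT_MAX" ]; then
        echo parent
    else
        echo child
    fi
}

# check_type_count N: succeed if N types fit below the boundary,
# otherwise print a warning to stderr and return non-zero.
check_type_count() {
    if [ "$1" -gt "$CTF_PARENT_MAX" ]; then
        echo "CTF type count $1 exceeds $CTF_PARENT_MAX; dtrace may fail" >&2
        return 1
    fi
    return 0
}

# Possible use on a live system, with the pipeline from the message:
#   check_type_count "$(ctfdump -S /boot/kernel/kernel | awk '$0~/of types/{print$6}')"
```

With the count reported in the message, check_type_count 37160 would warn, since 37160 lies above the assumed boundary.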
Kernel panic with my kernel config. Applies to 9.x up to 10.2
Hi,

I have a custom kernel config: https://gist.github.com/dmilith/234b3e6b65b6fa606e27

If I uncomment VIMAGE and epair on lines 10-11, the kernel panics each time I try to launch any jail with vnet. (The HBSD options can be omitted; they change nothing in this case. The panics also happened without HBSD patches on vanilla FreeBSD.)

It's 100% reproducible on all my testing hosts, and applies from 9.x to the latest 10.2 (didn't check 11.0 yet). I hope it helps someone track VIMAGE/epair bugs!

best regards
Daniel (dmilith) Dettlaff
Re: em broken on current amd64
On 09/12/15 13:45, Mark R V Murray wrote:
>
>> On 8 Sep 2015, at 19:02, Mark R V Murray wrote:
>>
>>
>>> On 8 Sep 2015, at 17:22, Sean Bruno wrote:
>>>
>>>
>>
>> I’m also seeing breakage with the em0 device; this isn’t a kernel
>> hang, it is a failure to move data after about 10-15 minutes. The
>> symptom is that my WAN ethernet no longer moves traffic, no pings,
>> nothing. Booting looks normal:
>>
>> em0: port 0x30c0-0x30df mem 0x5030-0x5031,0x50324000-0x50324fff irq 20 at device 25.0 on pci0
>> em0: Using an MSI interrupt
>> em0: Ethernet address: 00:16:76:d3:e1:5b
>> em0: netmap queues/slots: TX 1/1024, RX 1/1024
>>
>> Fixing it is as easy as …
>>
>> # ifconfig em0 down ; service ipfw restart ; ifconfig em0 up
>>
>> :-)
>>
>> I’m running CURRENT, r287538. This last worked for me a month or so
>> ago at my previous build.
>>
>> M
>>
>
>
> Just so I'm clear, the original problem reported was a failure to
> attach (you were among several folks reporting breakage). Is that fixed?

I did not report the failure to attach, and I am not seeing it as I don’t think I built a kernel that had that particular failure. I am having the “failure after 10-15 minutes” problem; this is on an em0 device.

M

>>>
>>>
>>> Hrm, that's odd. That sounds like a hole where interrupts aren't being
>>> reset for "reasons" that I cannot fathom.
>>>
>>> What hardware (pciconf -lv) does your system actually have? The em(4)
>>> driver doesn't identify components which is frustrating.
>>
>> pciconf -lv output below:
>>
>> hostb0@pci0:0:0:0: class=0x06 card=0x514d8086 chip=0x29a08086 rev=0x02 hdr=0x00
>>     vendor = 'Intel Corporation'
>>     device = '82P965/G965 Memory Controller Hub'
>>     class = bridge
>>     subclass = HOST-PCI
>
> I just caught this, on today’s build:
>
> em0: Watchdog timeout Queue[0]-- resetting
> Interface is RUNNING and ACTIVE
> em0: TX Queue 0 --
> em0: hw tdh = 127, hw tdt = 139
> em0: Tx Queue Status = -2147483648
> em0: TX descriptors avail = 1012
> em0: Tx Descriptors avail failure = 0
> em0: RX Queue 0 --
> em0: hw rdh = 0, hw rdt = 1023
> em0: RX discarded packets = 0
> em0: RX Next to Check = 0
> em0: RX Next to Refresh = 1023
>
> [graveyard] /usr/ports 09:42 pm # uname -a
> FreeBSD graveyard.grondar.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r287705:
> Sat Sep 12 15:07:54 BST 2015
> r...@graveyard.grondar.org:/b/obj/usr/src/sys/G_AMD64_GATE amd64
>
> M
>

Any chance you can turn TSO off if it's on and see what your results are?

sean
Re: em broken on current amd64
> On 13 Sep 2015, at 16:45, Sean Bruno wrote:
> Any chance you can turn TSO off if it's on and see what your results are?

Only TSO4 was on. I turned it off; no difference.

M
--
Mark R V Murray
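
[Editorial sketch] One way to see which TSO capabilities are enabled before turning them off is to read the options line of ifconfig output. The helper name tso_flags is hypothetical, and the parsing assumes the usual "options=...<FLAG,FLAG,...>" layout, taking only the first such line. The -tso flag mentioned in the comment is documented in ifconfig(8); it needs root and a real interface, so it is shown as a comment only.

```shell
#!/bin/sh
# tso_flags: read ifconfig output on stdin and print the enabled TSO
# capability names (TSO4, TSO6), one per line. Exits non-zero if none.
# VLAN_HWTSO is deliberately not matched; it is a separate capability.
tso_flags() {
    grep 'options=' | head -n 1 | tr '<>,' '\n\n\n' | grep -E '^TSO[46]$'
}

# Possible use on a live system (em0 taken from the thread):
#   ifconfig em0 | tso_flags     # e.g. lists TSO4 when only TSO4 is on
#   ifconfig em0 -tso            # disable TSO on the interface (as root)
```

This only reports what the interface advertises as enabled; as noted in the reply above, disabling TSO4 made no difference to the stall in this case.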
Re: Panic on kldload/kldunload in/near callout
On 09/13/15 at 01:56P, hiren panchasara wrote:
> On 09/13/15 at 08:51P, Alexander V. Chernikov wrote:
> >
> >
> > 12.09.2015, 20:30, "hiren panchasara":
> > > On 09/12/15 at 03:32P, Alexander V. Chernikov wrote:
> > >> 12.09.2015, 02:22, "hiren panchasara":
> > >> > On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
> > >
> > > [skip]
> > >> > I'll try to get it. Meanwhile I am getting another panic on idle box:
> > >> > http://pastebin.com/9qJTFMik
> > >> The easiest explanation could be lack of lla_create() result check,
> > >> fixed in r286945.
> > >> This panic is triggered by fast interface down-up (or just up), when an
> > >> ARP packet is received but there is no (matching) IPv4 prefix on the
> > >> interface.
> > >> If this is not the case (e.g. it panicked w/o any interface changes and
> > >> there were no other subnets in given L2 segment) I'd be happy to debug
> > >> this further.
> > >
> > > Just hit another last night. (Box goes to db> ; let me know if you want
> > > to debug anything when that happens.)
> > Would you mind showing full backtrace for that core? (e.g. situation has to
> > be different for newer -current).

Apparently I was using an older current than r286945. :-(
Apologies for the false alarm. I'll update again if I see any issues.

Hiren
Re: em broken on current amd64
On Sat, 12 Sep 2015, Mark R V Murray wrote:

> I just caught this, on today’s build:
>
> em0: Watchdog timeout Queue[0]-- resetting
> Interface is RUNNING and ACTIVE
> em0: TX Queue 0 --
> em0: hw tdh = 127, hw tdt = 139
> em0: Tx Queue Status = -2147483648
> em0: TX descriptors avail = 1012
> em0: Tx Descriptors avail failure = 0
> em0: RX Queue 0 --
> em0: hw rdh = 0, hw rdt = 1023
> em0: RX discarded packets = 0
> em0: RX Next to Check = 0
> em0: RX Next to Refresh = 1023
>
> [graveyard] /usr/ports 09:42 pm # uname -a
> FreeBSD graveyard.grondar.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r287705:
> Sat Sep 12 15:07:54 BST 2015
> r...@graveyard.grondar.org:/b/obj/usr/src/sys/G_AMD64_GATE amd64

That happened on an amd64 10-STABLE (r287148) system here a couple of days ago. That system has both I217-V and 82574L cards, but I did not save the message and can't say which one had the error. I had never seen it before, and it has not happened again.
Re: kernel dtrace and current
On Sun, Sep 13, 2015 at 04:23:23PM +0300, Alexander V. Chernikov wrote:
> Hello all,
>
> I keep running into
> "dtrace: failed to compile script: "/usr/lib/dtrace/psinfo.d", line 39:
> failed to copy type of 'pr_uid': Type information is in parent and
> unavailable"
> message more and more often while trying to trace different -current kernels.
>
> Typically the reason behind that is the number of types embedded in kernel
> CTF:
> ctfdump -S /boot/kernel/kernel | awk '$0~/of types/{print$6}'
> 37160
>
> We are bound to 32k types by the CTF format (numbers above 32k, i.e. with the
> highest bit set, are considered "child" types with the information stored in
> a "parent").
> ctfmerge ignores this fact and, instead of yelling, emits type indices above
> 32k. On the other hand, libctf sees such indices while parsing sections and
> since there is no "parent" for the kernel, it emits the error above and stops.
>
> Thankfully, r287234 really improved the situation for ctfmerge, but there are
> still several thousand identical structures and the total number is close
> to 32k.

r281797 and r287234 should have fixed most instances of duplicate type definitions. At the moment, amd64 GENERIC and GENERIC-NODEBUG have roughly 25K types in their respective CTF containers; there is a small handful of duplicates, but at least some of them are legitimate (some pairs of drivers redefine the same types, e.g. aac(4)/aacraid(4) or mps(4)/mpr(4)).

Could you post a config that results in the large number of duplicates you mention above?

>
> Personally I solved this by removing unneeded devices from GENERIC-inherited
> configs.
> I wonder, however, if this can be handled better.

FWIW, removing old drivers from GENERIC would be straightforward if we could auto-load KLDs based on device IDs.

>
> E.g. either show a better error in dtrace(1), or make ctfmerge fail, causing
> the kernel build to stop (since we asked for dtrace but in reality it wouldn't
> work), or remove some stale devices from GENERIC, or something totally
> different?

One more radical option is to extend the width of CTF type IDs. I've been holding off on doing this for a few reasons:
- Doing so would change the binary format, making us incompatible with the reference CTF code in illumos.
- Type IDs are embedded in quite a few places in the various CTF structures, so enlarging them from 16 bits to 32 bits will bloat CTF containers somewhat.
- I was under the impression that r287234 addressed the problem sufficiently for now.

If type ID space is still a problem post-r287234, I think it's time to just go ahead and change the format. But first I'd like to understand the cause of the duplication you're seeing.
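
[Editorial sketch] The trade-off in widening type IDs from 16 to 32 bits can be put in rough numbers. This is only arithmetic under stated assumptions: it assumes the highest bit would still mark "child" types after widening (the message does not say so), and the function names and the reference count used in the example are hypothetical, not measurements from any kernel.

```shell
#!/bin/sh
# ctf_parent_capacity BITS: number of usable non-child type IDs when the
# top bit of a BITS-wide ID is reserved as the child marker (assumption).
# Needs shell arithmetic wider than 32 bits for BITS=32.
ctf_parent_capacity() {
    echo $(( (1 << ($1 - 1)) - 1 ))
}

# ctf_id_bytes REFS BITS: bytes spent on REFS embedded type ID fields
# when each field is BITS wide.
ctf_id_bytes() {
    echo $(( $1 * $2 / 8 ))
}

# Example with a made-up figure of 100000 embedded ID references:
#   ctf_id_bytes 100000 16    # bytes used by 16-bit IDs
#   ctf_id_bytes 100000 32    # twice as many with 32-bit IDs
```

Under these assumptions the ID fields themselves double in size, while the ID space grows from 32767 to 2147483647 parent types; how much the whole container grows depends on how much of it consists of ID fields.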