Re: RCTL and VIMAGE for 11.0-RELEASE

2015-09-13 Thread Miroslav Lachman

Mark Felder wrote on 08/24/2015 21:29:



On Mon, Aug 24, 2015, at 14:18, Bjoern A. Zeeb wrote:

On 24 Aug 2015, at 19:08 , Mark Felder  wrote:

What is preventing RCTL from being enabled right now? Any known/serious
blockers?


It’s enabled in GENERIC.


If RCTL is in GENERIC, can somebody look into this problem with swapuse?
https://lists.freebsd.org/pipermail/freebsd-stable/2015-March/082019.html

IMHO it should be fixed or better documented in the man pages.

Miroslav Lachman

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Panic on kldload/kldunload in/near callout

2015-09-13 Thread hiren panchasara
On 09/13/15 at 08:51P, Alexander V. Chernikov wrote:
> 
> 
> 12.09.2015, 20:30, "hiren panchasara" :
> > On 09/12/15 at 03:32P, Alexander V. Chernikov wrote:
> >> 12.09.2015, 02:22, "hiren panchasara" :
> >> > On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
> >
> > [skip]
> >> > I'll try to get it. Meanwhile I am getting another panic on idle box:
> >> > http://pastebin.com/9qJTFMik
> >> The easiest explanation could be lack of lla_create() result check, fixed 
> >> in r286945.
> >> This panic is triggered by fast interface down-up (or just up), when ARP 
> >> packet is received but there is no (matching) IPv4 prefix on the 
> >> interface.
> >> If this is not the case (e.g. it panicked w/o any interface changes and 
> >> there were no other subnets in given L2 segment) I'd be happy to debug 
> >> this further.
> >
> > Just hit another last night. (Box goes to db> ; let me know if you want
> > to debug anything when that happens.)
> Would you mind showing full backtrace for that core? (e.g. situation has to 
> be different for newer -current).

Sure:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x20
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0x80f214d6
stack pointer           = 0x28:0xfe3d0620
frame pointer           = 0x28:0xfe3d0630
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq264: igb0:que 0)
[ thread pid 12 tid 100035 ]
Stopped at      memcpy+0x16:    repe movsb      (%rsi),%es:(%rdi)
db> bt full
Symbol not found
KDB: reentering
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+43/frame 0xfe3cff10
kdb_reenter() at kdb_reenter+51/frame 0xfe3cff20
db_term() at db_term+136/frame 0xfe3cff40
db_unary() at db_unary+116/frame 0xfe3cff60
db_mult_expr() at db_mult_expr+27/frame 0xfe3cffa0
db_add_expr() at db_add_expr+27/frame 0xfe3cffe0
db_expression() at db_expression+29/frame 0xfe3d0030
db_stack_trace() at db_stack_trace+48/frame 0xfe3d0060
db_command() at db_command+865/frame 0xfe3d0120
db_command_loop() at db_command_loop+100/frame 0xfe3d0130
db_trap() at db_trap+219/frame 0xfe3d01c0
kdb_trap() at kdb_trap+404/frame 0xfe3d0250
trap_fatal() at trap_fatal+789/frame 0xfe3d02b0
trap_pfault() at trap_pfault+806/frame 0xfe3d0350
trap() at trap+1124/frame 0xfe3d0560
calltrap() at calltrap+8/frame 0xfe3d0560
--- trap 12, rip = 18446744071577933014, rsp = 18446741874690295344, rbp = 18446741874690295344 ---
memcpy() at memcpy+22/frame 0xfe3d0630
arpintr() at arpintr+2951/frame 0xfe3d0750
netisr_dispatch_src() at netisr_dispatch_src+97/frame 0xfe3d07c0
ether_demux() at ether_demux+345/frame 0xfe3d07f0
ether_nh_input() at ether_nh_input+888/frame 0xfe3d0850
netisr_dispatch_src() at netisr_dispatch_src+97/frame 0xfe3d08c0
ether_input() at ether_input+38/frame 0xfe3d08e0
igb_rxeof() at igb_rxeof+1764/frame 0xfe3d0990
igb_msix_que() at igb_msix_que+352/frame 0xfe3d09e0
intr_event_execute_handlers() at intr_event_execute_handlers+474/frame 0xfe3d0a20
ithread_loop() at ithread_loop+166/frame 0xfe3d0a70
fork_exit() at fork_exit+156/frame 0xfe3d0ab0
fork_trampoline() at fork_trampoline+14/frame 0xfe3d0ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db> 
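
The "--- trap 12 ---" line prints the saved registers in decimal, which makes them hard to compare with the hexadecimal addresses in the rest of the trace. A minimal sketch of the conversion follows; the register values are copied from the trace above, and the helper name `to_hex64` is only illustrative.

```python
# Register values copied from the "--- trap 12 ---" line of the trace.
TRAP_FRAME = {
    "rip": 18446744071577933014,
    "rsp": 18446741874690295344,
}


def to_hex64(value: int) -> str:
    """Format an unsigned 64-bit value as a zero-padded hex string."""
    return f"0x{value & 0xFFFFFFFFFFFFFFFF:016x}"


for name, value in TRAP_FRAME.items():
    print(name, to_hex64(value))
# rip 0xffffffff80f214d6
# rsp 0xfffffe00003d0630
```

The low 32 bits of rip come out as 0x80f214d6, the same value shown on the "instruction pointer" line of the trap message.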

Cheers,
Hiren




kernel dtrace and current

2015-09-13 Thread Alexander V . Chernikov
Hello all,

I keep running into the
"dtrace: failed to compile script: "/usr/lib/dtrace/psinfo.d", line 39: failed 
to copy type of 'pr_uid': Type information is in parent and unavailable"
message more and more often while trying to trace different -current kernels.

Typically the reason behind that is the number of types embedded in kernel CTF:
ctfdump -S /boot/kernel/kernel | awk '$0~/of types/{print$6}'
37160

The CTF format limits us to 32k types: IDs above 32k (i.e. with the highest 
bit set) are treated as "child" types whose information is stored in a 
"parent" container.
ctfmerge ignores this limit and, instead of complaining, emits type indices 
above 32k. libctf then sees such indices while parsing the sections and, since 
the kernel has no "parent", emits the error above and stops.

Thankfully, r287234 really improved the situation for ctfmerge, but there are 
still several thousands of identical structures and the total number is close 
to 32k.
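
To make the limit concrete, here is a minimal sketch, assuming 16-bit type IDs whose top bit marks a "child" type as described above. The constant and function names are illustrative and not taken from the CTF headers, and the sample summary line is a hypothetical layout chosen so that the count is the sixth whitespace-separated field, as in the awk one-liner.

```python
# Assumption: 16-bit CTF type IDs; the top bit flags a "child" type.
CHILD_BIT = 0x8000
MAX_PARENT_TYPES = 0x7FFF  # largest ID that still has the top bit clear


def is_child_id(type_id: int) -> bool:
    """True if libctf would look for this type's data in a parent container."""
    return bool(type_id & CHILD_BIT)


def local_index(type_id: int) -> int:
    """Index of the type with the child bit masked off."""
    return type_id & 0x7FFF


def count_from_ctfdump(summary: str) -> int:
    """Mirror the awk filter: 6th field of the line matching 'of types'."""
    for line in summary.splitlines():
        if "of types" in line:
            return int(line.split()[5])
    raise ValueError("no type count found")


sample = "  total number of types      = 37160"  # hypothetical line layout
total = count_from_ctfdump(sample)
overflow = max(0, total - MAX_PARENT_TYPES)
print(total, overflow, is_child_id(total), local_index(total))
```

With the count reported above, every ID past 32767 has the top bit set, so it is read as a reference into a parent that the kernel container does not have.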

Personally I solved this by removing unneeded devices from GENERIC-inherited 
configs.
I wonder, however, whether this can be handled better.

E.g. either show a better error in dtrace(1), or make ctfmerge fail so that the 
kernel build stops (since we asked for dtrace but in reality it wouldn't work), 
or remove some stale devices from GENERIC, or something totally different?




Kernel panic with my kernel config. Applies to 9.x up to 10.2

2015-09-13 Thread Daniel Dettlaff
Hi, I have custom kernel config: 
https://gist.github.com/dmilith/234b3e6b65b6fa606e27
If I uncomment VIMAGE and epair on lines 10-11, the kernel panics each time I 
try to launch any jail with vnet. (The HBSD options can be omitted; they change 
nothing in this case, and the panics also happened without the HBSD patches on 
vanilla FreeBSD.)

Basically it’s 100% reproducible on all my testing hosts, and applies from 9.x 
to the latest 10.2 (I didn’t check 11.0 yet).

I hope it helps someone track VIMAGE/epair bugs!


best regards
Daniel (dmilith) Dettlaff




Re: em broken on current amd64

2015-09-13 Thread Sean Bruno


On 09/12/15 13:45, Mark R V Murray wrote:
> 
>> On 8 Sep 2015, at 19:02, Mark R V Murray  wrote:
>>
>>
>>> On 8 Sep 2015, at 17:22, Sean Bruno  wrote:
>>>
>>>
>>
>> I’m also seeing breakage with the em0 device; this isn’t a kernel
>> hang, it is a failure to move data after about 10-15 minutes. The
>> symptom is that my WAN ethernet no longer moves traffic, no pings,
>> nothing. Booting looks normal:
>>
>> em0:  port 0x30c0-0x30df mem 0x5030-0x5031,0x50324000-0x50324fff irq 20 at device 25.0 on pci0
>> em0: Using an MSI interrupt
>> em0: Ethernet address: 00:16:76:d3:e1:5b
>> em0: netmap queues/slots: TX 1/1024, RX 1/1024
>>
>> Fixing it is as easy as …
>>
>> # ifconfig em0 down ; service ipfw restart ; ifconfig em0 up
>>
>> :-)
>>
>> I’m running CURRENT, r287538. This last worked for me a month or so
>> ago at my previous build.
>>
>> M
>>
>
>
> Just so I'm clear, the original problem reported was a failure to
> attach (you were among several folks reporting breakage).  Is that fixed
> ?

 I did not report the failure to attach, and I am not seeing it as I don’t
 think I built a kernel that had that particular failure. I am having the
 “failure after 10-15 minutes” problem; this is on an em0 device.

 M

>>>
>>>
>>> Hrm, that's odd.  That sounds like a hole where interrupts aren't being
>>> reset for "reasons" that I cannot fathom.
>>>
>>> What hardware (pciconf -lv) does your system actually have?  The em(4)
>>> driver doesn't identify components which is frustrating.
>>
>> pciconf -lv output below:
>>
>> hostb0@pci0:0:0:0:   class=0x06 card=0x514d8086 chip=0x29a08086 rev=0x02 
>> hdr=0x00
>>vendor = 'Intel Corporation'
>>device = '82P965/G965 Memory Controller Hub'
>>class  = bridge
>>subclass   = HOST-PCI
> 
> I just caught this, on today’s build:
> 
> em0: Watchdog timeout Queue[0]-- resetting
> Interface is RUNNING and ACTIVE
> em0: TX Queue 0 --
> em0: hw tdh = 127, hw tdt = 139
> em0: Tx Queue Status = -2147483648
> em0: TX descriptors avail = 1012
> em0: Tx Descriptors avail failure = 0
> em0: RX Queue 0 --
> em0: hw rdh = 0, hw rdt = 1023
> em0: RX discarded packets = 0
> em0: RX Next to Check = 0
> em0: RX Next to Refresh = 1023
> 
> [graveyard] /usr/ports 09:42 pm # uname -a
> FreeBSD graveyard.grondar.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r287705: 
> Sat Sep 12 15:07:54 BST 2015 
> r...@graveyard.grondar.org:/b/obj/usr/src/sys/G_AMD64_GATE  amd64
> 
> M
> 
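
The counters in the watchdog message can be cross-checked with a little arithmetic. This is a minimal sketch, assuming the TX descriptor ring has the same 1024 slots that the netmap line in the boot messages reports, and that tdh/tdt are the ring's head and tail indices; the function names are illustrative.

```python
RING_SLOTS = 1024  # assumption: taken from "netmap queues/slots: TX 1/1024"


def pending_descriptors(head: int, tail: int, slots: int = RING_SLOTS) -> int:
    """Descriptors queued between head and tail on a circular ring."""
    return (tail - head) % slots


def as_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


pending = pending_descriptors(head=127, tail=139)
free_slots = RING_SLOTS - pending
status = as_unsigned32(-2147483648)
print(pending, free_slots, hex(status))
# 12 1012 0x80000000
```

Twelve descriptors sit between head and tail, which is consistent with the reported "TX descriptors avail = 1012", and the negative "Tx Queue Status" is the value 0x80000000 printed as a signed integer.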

Any chance you can turn TSO off if it's on and see what your results are?

sean

Re: em broken on current amd64

2015-09-13 Thread Mark R V Murray

> On 13 Sep 2015, at 16:45, Sean Bruno  wrote:
> Any chance you can turn TSO off if it's on and see what your results are?

Only TSO4 was on. I turned it off; no difference.

M
-- 
Mark R V Murray



Re: Panic on kldload/kldunload in/near callout

2015-09-13 Thread hiren panchasara
On 09/13/15 at 01:56P, hiren panchasara wrote:
> On 09/13/15 at 08:51P, Alexander V. Chernikov wrote:
> > 
> > 
> > 12.09.2015, 20:30, "hiren panchasara" :
> > > On 09/12/15 at 03:32P, Alexander V. Chernikov wrote:
> > >> 12.09.2015, 02:22, "hiren panchasara" :
> > >> > On 09/11/15 at 09:06P, Hans Petter Selasky wrote:
> > >
> > > [skip]
> > >> > I'll try to get it. Meanwhile I am getting another panic on idle box:
> > >> > http://pastebin.com/9qJTFMik
> > >> The easiest explanation could be lack of lla_create() result check, 
> > >> fixed in r286945.
> > >> This panic is triggered by fast interface down-up (or just up), when 
> > >> ARP packet is received but there is no (matching) IPv4 prefix on the 
> > >> interface.
> > >> If this is not the case (e.g. it panicked w/o any interface changes and 
> > >> there were no other subnets in given L2 segment) I'd be happy to debug 
> > >> this further.
> > >
> > > Just hit another last night. (Box goes to db> ; let me know if you want
> > > to debug anything when that happens.)
> > Would you mind showing full backtrace for that core? (e.g. situation has to 
> > be different for newer -current).

Apparently I was using an older current than r286945. :-(
Apologies for the false alarm. I'll update again if I see any issues.

Hiren




Re: em broken on current amd64

2015-09-13 Thread Warren Block

On Sat, 12 Sep 2015, Mark R V Murray wrote:


I just caught this, on today’s build:

em0: Watchdog timeout Queue[0]-- resetting
Interface is RUNNING and ACTIVE
em0: TX Queue 0 --
em0: hw tdh = 127, hw tdt = 139
em0: Tx Queue Status = -2147483648
em0: TX descriptors avail = 1012
em0: Tx Descriptors avail failure = 0
em0: RX Queue 0 --
em0: hw rdh = 0, hw rdt = 1023
em0: RX discarded packets = 0
em0: RX Next to Check = 0
em0: RX Next to Refresh = 1023

[graveyard] /usr/ports 09:42 pm # uname -a
FreeBSD graveyard.grondar.org 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r287705: Sat 
Sep 12 15:07:54 BST 2015 
r...@graveyard.grondar.org:/b/obj/usr/src/sys/G_AMD64_GATE  amd64


That happened on an amd64 10-STABLE (r287148) system here a couple of 
days ago.  Both I217-V and 82574L cards in that system, but I did not 
save the message and can't say which had the error.  Never seen before, 
has not happened again.


Re: kernel dtrace and current

2015-09-13 Thread Mark Johnston
On Sun, Sep 13, 2015 at 04:23:23PM +0300, Alexander V. Chernikov wrote:
> Hello all,
> 
> I keep running into the
> "dtrace: failed to compile script: "/usr/lib/dtrace/psinfo.d", line 39: 
> failed to copy type of 'pr_uid': Type information is in parent and 
> unavailable"
> message more and more often while trying to trace different -current kernels.
> 
> Typically the reason behind that is the number of types embedded in kernel 
> CTF:
> ctfdump -S /boot/kernel/kernel | awk '$0~/of types/{print$6}'
> 37160
> 
> The CTF format limits us to 32k types: IDs above 32k (i.e. with the highest 
> bit set) are treated as "child" types whose information is stored in a 
> "parent" container.
> ctfmerge ignores this limit and, instead of complaining, emits type indices 
> above 32k. libctf then sees such indices while parsing the sections and, 
> since the kernel has no "parent", emits the error above and stops.
> 
> Thankfully, r287234 really improved the situation for ctfmerge, but there are 
> still several thousands of identical structures and the total number is close 
> to 32k.

r281797 and r287234 should have fixed most instances of duplicate type
definitions. At the moment, amd64 GENERIC and GENERIC-NODEBUG have
roughly 25K types in their respective CTF containers; there is a small
handful of duplicates, but at least some of them are legitimate (some
pairs of drivers redefine the same types, e.g. aac(4)/aacraid(4) or
mps(4)/mpr(4)).

Could you post a config that results in the large number of duplicates
you mention above?

> 
> Personally I solved this by removing unneeded devices from GENERIC-inherited 
> configs.
> I wonder, however if this can be handled better.

FWIW, removing old drivers from GENERIC would be straightforward if we
could auto-load KLDs based on device IDs.

> 
> E.g. either show a better error in dtrace(1), or make ctfmerge fail so that 
> the kernel build stops (since we asked for dtrace but in reality it wouldn't 
> work), or remove some stale devices from GENERIC, or something totally 
> different?

One more radical option is to extend the width of CTF type IDs. I've
been holding off on doing this for a few reasons:
- Doing so would change the binary format, making us incompatible with
  the reference CTF code in illumos.
- Type IDs are embedded in quite a few places in the various CTF
  structures, so enlarging them from 16 bits to 32 bits will bloat
  CTF containers somewhat.
- I was under the impression that r287234 addressed the problem
  sufficiently for now.

If type ID space is still a problem post-r287234, I think it's time to
just go ahead and change the format. But first I'd like to understand
the cause of the duplication you're seeing.
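
As a rough illustration of the size trade-off mentioned above, here is a minimal sketch assuming an array-style record that holds two type IDs plus a 32-bit element count. The layout is illustrative rather than copied from the CTF headers, and the widened variant is purely hypothetical.

```python
import struct

# Assumption: an array-style record with two type IDs and a 32-bit count.
NARROW = struct.Struct("<HHI")  # 16-bit type IDs
WIDE = struct.Struct("<III")    # hypothetical 32-bit type IDs

# ID values with the top (parent/child) bit clear in each width,
# assuming the same top-bit convention is kept after widening.
narrow_ids = 1 << 15
wide_ids = 1 << 31

print(NARROW.size, WIDE.size)  # 8 12
print(narrow_ids, wide_ids)    # 32768 2147483648
```

In this sketch each such record grows from 8 to 12 bytes, while the usable ID space grows from about 32k to about 2 billion values.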