On Sat, May 06, 2017 at 17:35 +0200, Mark Kettenis wrote:
> > Date: Fri, 5 May 2017 22:09:03 +0200 (CEST)
> > From: Mark Kettenis <[email protected]>
> >
> > Just got this panic on armv7; got a very similar panic on hppa
> > yesterday that I didn't have time to look into any further. This is
> > completely reproducible.
> >
> > setting tty flags
> > pf enabled
> > kern.allowkmem: 0 -> 1
> > starting network
> > panic: pool_do_get: mbufpl free list modified: page 0xc56a4000; item addr
> > 0xc56a4400; offset 0x0=0x0 != 0x24a4c1a
> > Stopped at $d: ldrb r15, [r15, r15, ror r15]!
> > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > *364338 59716 0 0x100003 0 0 route
> > panic+0x18
> > scp=0xc03cae90 rlv=0xc03c761c ($d)
> > rsp=0xcc574bf0 rfp=0xcc574c2c
> > pool_do_get+0xc
> > scp=0xc03c73ac rlv=0xc03c6f1c (pool_get+0x7c)
> > rsp=0xcc574c30 rfp=0xcc574c8c
> > r7=0x00000000 r6=0x00000002 r5=0xc0726408 r4=0x000000ac
> > pool_get+0x10
> > scp=0xc03c6eb0 rlv=0xc03e075c (m_get+0x2c)
> > rsp=0xcc574c90 rfp=0xcc574cbc
> > r8=0x00000044 r7=0x00000000 r6=0xc56a4300 r5=0x000000ac
> > r4=0x000000ac
> > m_get+0x10
> > scp=0xc03e0740 rlv=0xc03e1964 (m_copyback+0x1a8)
> > rsp=0xcc574cc0 rfp=0xcc574cfc
> > r10=0xc53cb300 r8=0x00000044 r7=0x00000000 r6=0xc56a4300
> > r5=0x000000ac r4=0x000000ac
> > m_copyback+0x10
> > scp=0xc03e17cc rlv=0xc044326c (route_output+0x350)
> > rsp=0xcc574d00 rfp=0xcc574d8c
> > r10=0xca435000 r9=0x00000000 r8=0xc53cb300 r7=0x00000000
> > r6=0xc56a4300 r5=0x00000001 r4=0x00000001
> > route_output+0xc
> > scp=0xc0442f28 rlv=0xc043ce04 ($a+0x154)
> > rsp=0xcc574d90 rfp=0xcc574dbc
> > r10=0xc56a4300 r9=0xcc574ea0 r8=0x00000000 r7=0xc5866780
> > r6=0xca435000 r5=0x00000009 r4=0x00000000
> > raw_usrreq+0xc
> > scp=0xc043cc00 rlv=0xc03e56ec (sosend+0x290)
> > rsp=0xcc574dc0 rfp=0xcc574e1c
> > r10=0x00000000 r8=0x00000000 r7=0xffffffd6 r6=0x00001f5c
> > r5=0xca435000 r4=0x00000000
> > sosend+0xc
> > scp=0xc03e5468 rlv=0xc03d247c (soo_write+0x2c)
> > rsp=0xcc574e20 rfp=0xcc574e3c
> > r10=0xca494140 r9=0x000000a4 r8=0xcc574f0c r7=0x00000003
> > r6=0x00000001 r5=0xcc574fb4 r4=0xcc574f0c
> > soo_write+0xc
> > scp=0xc03d245c rlv=0xc03cfa1c (dofilewritev+0x1a4)
> > rsp=0xcc574e40 rfp=0xcc574ef4
> > dofilewritev+0xc
> > scp=0xc03cf884 rlv=0xc03cfc68 (sys_write+0x80)
> > rsp=0xcc574ef8 rfp=0xcc574f3c
> > r10=0x00000028 r9=0x00000004 r8=0xcc574f74 r7=0x00000003
> > r6=0xca4a3cf4 r5=0xcc574fb4 r4=0xca494158
> > sys_write+0xc
> > scp=0xc03cfbf4 rlv=0xc054173c (swi_handler+0x174)
> > rsp=0xcc574f40 rfp=0xcc574fac
> > r8=0x00000004 r7=0xca4a3cf4 r6=0xcc574fb0 r5=0x00000003
> > r4=0xcc574fb4
> > swi_handler+0xc
> > scp=0xc05415d4 rlv=0xc0543fe8 (swi_entry+0x28)
> > rsp=0xcc574fb0 rfp=0xbffc9264
> > r10=0x00000028 r9=0x04e5abc0 r8=0x04e50f24 r7=0x04e5ac60
> > r6=0x04e5aef8 r5=0x00000000 r4=0x4e10c000
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports. Insufficient info makes it difficult to find and fix bugs.
> > ddb> ps
> > PID TID PPID UID S FLAGS WAIT COMMAND
> > *59716 364338 77999 0 7 0x100003 route
> > 77999 518782 1669 0 3 0x10008b pause sh
> > 1669 76642 1 0 3 0x10008b pause sh
> > 2702 287769 0 0 3 0x14200 pgzero zerothread
> > 58941 172655 0 0 3 0x14200 aiodoned aiodoned
> > 17148 492245 0 0 3 0x14200 syncer update
> > 86850 185874 0 0 3 0x14200 cleaner cleaner
> > 74158 246265 0 0 3 0x14200 reaper reaper
> > 96587 269786 0 0 3 0x14200 pgdaemon pagedaemon
> > 22576 337424 0 0 3 0x14200 bored crynlk
> > 17600 522232 0 0 3 0x14200 bored crypto
> > 19634 523615 0 0 3 0x14200 pftm pfpurge
> > 58943 420327 0 0 3 0x14200 usbtsk usbtask
> > 73136 234376 0 0 3 0x14200 usbatsk usbatsk
> > 47223 147942 0 0 3 0x14200 mmctsk sdmmc0
> > 56815 511243 0 0 3 0x14200 bored softnet
> > 95568 281092 0 0 3 0x14200 bored systqmp
> > 69722 514447 0 0 3 0x14200 bored systq
> > 83570 426789 0 0 3 0x40014200 bored softclock
> > 94516 155392 0 0 3 0x40014200 idle0
> > 89150 250451 0 0 3 0x14200 kmalloc kmthread
> > 1 308598 0 0 3 0x82 wait init
> > 0 0 -1 0 3 0x10200 scheduler swapper
>
> And the change that introduced the panic was:
>
> CVSROOT: /cvs
> Module name: src
> Changes by: [email protected] 2017/05/03 11:51:57
>
> Modified files:
> sys/sys : mbuf.h
>
> Log message:
> Provide a signed 64 bit integer timestamp in the mbuf packet header
>
> The precision of the timestamp is not fixed yet, but there's a strong
> argument to measure it in nanoseconds.
>
> With suggestions from kettenis, dlg, miod and deraadt.
> OK deraadt@, sthen@
>
> So there is something in the tree that doesn't like the mbuf packet
> header growth and decides to color outside the lines.
>
Mark and I looked into this, and he found that the size of struct
mbuf on armv7 and hppa now exceeds MSIZE (256 bytes). These
architectures require 8-byte alignment for 64-bit integer types, so
the compiler inserts padding between m_hdr and pkthdr inside struct
mbuf. That padding is not accounted for when MLEN and MHLEN are
calculated, so we end up with an incorrectly sized mbuf.
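
To make the padding visible, here is a minimal, self-contained sketch.
It is not the real mbuf.h: hdr32, pkt32 and mbuf_model are hypothetical
stand-ins that use fixed-width types to imitate a 32-bit platform, and
_Alignas(8) is an assumption that mimics the 8-byte alignment of 64-bit
integers on armv7 and hppa. Only the way MLEN and MHLEN are derived
from MSIZE follows the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MSIZE	256

/* Hypothetical stand-in for m_hdr on a 32-bit platform: 20 bytes. */
struct hdr32 {
	uint32_t next, nextpkt, data;	/* 32-bit "pointers" */
	uint32_t len;
	uint16_t type, flags;
};

/* Hypothetical stand-in for pkthdr with a 64-bit timestamp. */
struct pkt32 {
	uint32_t cookie;
	uint32_t tags;
	_Alignas(8) int64_t timestamp;	/* assumed 8-byte alignment */
	int32_t len;
};

/* Lengths derived from sizeof() alone, so padding is never counted. */
#define MLEN	(MSIZE - sizeof(struct hdr32))
#define MHLEN	(MLEN - sizeof(struct pkt32))

struct mbuf_model {
	struct hdr32 hdr;
	struct pkt32 pkt;	/* compiler pads in front of this member */
	char dat[MHLEN];
};

/* Print where the hidden bytes come from in this model. */
void
report_layout(void)
{
	printf("sizeof(hdr32)       = %zu\n", sizeof(struct hdr32));
	printf("offsetof(pkt)       = %zu\n",
	    offsetof(struct mbuf_model, pkt));
	printf("hdr + pkt + MHLEN   = %zu\n",
	    sizeof(struct hdr32) + sizeof(struct pkt32) + MHLEN);
	printf("sizeof(mbuf_model)  = %zu (MSIZE is %d)\n",
	    sizeof(struct mbuf_model), MSIZE);
}
```

Calling report_layout() from a main() shows that, in this model, the
three parts add up to exactly MSIZE while sizeof(struct mbuf_model) is
larger, because pkt starts at offset 24 rather than 20 and the struct
is rounded up to a multiple of 8.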

We don't see an easy fix for this right now, so I propose either
converting the timestamp to a 32-bit integer (for now?) or removing
it completely.

Opinions?

Index: sys/sys/mbuf.h
===================================================================
RCS file: /home/cvs/src/sys/sys/mbuf.h,v
retrieving revision 1.225
diff -u -p -r1.225 mbuf.h
--- sys/sys/mbuf.h 3 May 2017 17:51:57 -0000 1.225
+++ sys/sys/mbuf.h 7 May 2017 13:04:22 -0000
@@ -122,7 +122,7 @@ struct pkthdr_pf {
struct pkthdr {
void *ph_cookie; /* additional data */
SLIST_HEAD(, m_tag) ph_tags; /* list of packet tags */
- int64_t ph_timestamp; /* packet timestamp */
+ uint32_t ph_timestamp; /* packet timestamp */
int len; /* total packet length */
u_int16_t ph_tagsset; /* mtags attached */
u_int16_t ph_flowid; /* pseudo unique flow id */