Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-23 Thread Andriy Gapon
On 23/08/2015 15:54, Konstantin Belousov wrote:
> After looking at your data closely, I think you are right.  The panic
> occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT).  This is the
> only case in the tree where filter uses knlist_remove_inevent() to detach
> processed note, so indeed the slist is modified under the iterator.
> 
> Below is the patch with the suggested change and unrelated cleanup of
> the uma(9) KPI use.  Please test, everybody who has a panic with the
> backtrace pointing to the sys_exit().

Thank you very much!
I no longer get the panic in the test case that previously triggered it.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-23 Thread John-Mark Gurney
Konstantin Belousov wrote this message on Sun, Aug 23, 2015 at 15:54 +0300:
> On Sun, Aug 23, 2015 at 12:08:16PM +0300, Konstantin Belousov wrote:
> > On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote:
> > > On 12/08/2015 17:11, Lawrence Stewart wrote:
> > > > On 08/07/15 07:33, Pawel Pekala wrote:
> > > >> Hi K.,
> > > >>
> > > >> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
> > > >>> Is this still happening?
> > > >>
> > > >> Still crashes:
> > > > 
> > > > +1 for me running r286617
> > > 
> > > Here is another +1 with r286922.
> > > I can add a couple of bits of debugging data:
> > > 
> > > (kgdb) fr 8
> > > #8  0x80639d60 in knote (list=0xf8019a733ea0,
> > > hint=2147483648, lockflags=) at
> > > /usr/src/sys/kern/kern_event.c:1964
> > > 1964    } else if ((lockflags & KNF_NOKQLOCK) != 0) {
> > > (kgdb) p *list
> > > $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
> > > , kl_unlock = 0x8063a200 ,
> > >   kl_assert_locked = 0x8063a220 ,
> > > kl_assert_unlocked = 0x8063a240 ,
> > >   kl_lockarg = 0xf8019a733bb0}
> > > (kgdb) disassemble
> > > Dump of assembler code for function knote:
> > > 0x80639d00:  push   %rbp
> > > 0x80639d01:  mov    %rsp,%rbp
> > > 0x80639d04:  push   %r15
> > > 0x80639d06:  push   %r14
> > > 0x80639d08:  push   %r13
> > > 0x80639d0a:  push   %r12
> > > 0x80639d0c:  push   %rbx
> > > 0x80639d0d:  sub    $0x18,%rsp
> > > 0x80639d11:  mov    %edx,%r12d
> > > 0x80639d14:  mov    %rsi,-0x30(%rbp)
> > > 0x80639d18:  mov    %rdi,%rbx
> > > 0x80639d1b:  test   %rbx,%rbx
> > > 0x80639d1e:  je     0x80639ef6
> > > 0x80639d24:  mov    %r12d,%eax
> > > 0x80639d27:  and    $0x1,%eax
> > > 0x80639d2a:  mov    %eax,-0x3c(%rbp)
> > > 0x80639d2d:  mov    0x28(%rbx),%rdi
> > > 0x80639d31:  je     0x80639d38
> > > 0x80639d33:  callq  *0x18(%rbx)
> > > 0x80639d36:  jmp    0x80639d42
> > > 0x80639d38:  callq  *0x20(%rbx)
> > > 0x80639d3b:  mov    0x28(%rbx),%rdi
> > > 0x80639d3f:  callq  *0x8(%rbx)
> > > 0x80639d42:  mov    %rbx,-0x38(%rbp)
> > > 0x80639d46:  mov    (%rbx),%rbx
> > > 0x80639d49:  test   %rbx,%rbx
> > > 0x80639d4c:  je     0x80639ee5
> > > 0x80639d52:  and    $0x2,%r12d
> > > 0x80639d56:  nopw   %cs:0x0(%rax,%rax,1)
> > > 0x80639d60:  mov    0x28(%rbx),%r14
> > > 
> > > Panic is in the last quoted instruction.
> > > And:
> > > (kgdb) i reg
> > > rax            0x246               582
> > > rbx            0xdeadc0dedeadc0de  -2401050962867404578
> > > rcx            0x0                 0
> > > rdx            0x12e               302
> > > rsi            0x80a26a5a          -2136839590
> > > rdi            0x80e81b80          -2132272256
> > > rbp            0xfe02b7efea20      0xfe02b7efea20
> > > rsp            0xfe02b7efe9e0      0xfe02b7efe9e0
> > > r8             0x80a269ce          -2136839730
> > > r9             0x80e82838          -2132269000
> > > r10            0x10000             65536
> > > r11            0x80fabd10          -2131051248
> > > r12            0x0                 0
> > > r13            0xf801ff84a818      -8787511171048
> > > r14            0xf801ff84a800      -8787511171072
> > > r15            0xf8019a6974f0      -8789207452432
> > > rip            0x80639d60          0x80639d60
> > > eflags         0x10286             66182
> > > 
> > > I think that $rbx stands out here (this is a kernel with INVARIANTS).
> > > 
> > > Looking at the code, is it possible that one of the calls from within
> > > the loop's body modifies the list?  If that is so and provided that is a
> > > valid behavior, then maybe using SLIST_FOREACH_SAFE would help.
> > 
> > This is the first time useful debugging data was posted.
> > 
> > The 0x28 offset may indicate either kn_kq member access of the struct
> > knote, or kq_list of the struct kqueue.
> > 
> > kl_list.slh_first of the list parameter is NULL, how would a list
> > iteration loop even start ?  Can you look up the list argument value
> > from the previous frame (%rdi is overwritten, so debugger might be
> > confused) ?
> 
> After looking at your data closely, I think you are right.  The panic
> occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT).  This is the
> only case in the tree where filter uses knlist_remove_inevent() to detach
> processed note, so indeed the slist is modified under the iterator.
> 
> Below is the patch with the suggested change and unrelated cleanup of
> the uma(9) KPI use.  Please test, everybody who has a panic with the
> backtrace pointing to the sys_exit().
> 
> diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
> index a4388aa..2f15f7f 100644
> --- a/sys/kern/kern_event.c
> +++ b/sys/kern/kern_

Re: Kernel panic with fresh current, probably nfs related

2015-08-23 Thread Yonghyeon PYUN
On Sat, Aug 22, 2015 at 11:25:58AM -0700, Sean Bruno wrote:
> 
> 
> 
> > I'm going to guess that you're using an "em" net driver, since that is the
> > only one that sets if_hw_tsomax > IP_MAXPACKET (65535) from what I can see.
> > 
> > Sean, EM_TSO_SIZE is defined as (65535 + sizeof(struct ether_vlan_header)),
> > which makes it > IP_MAXPACKET. The value of if_hw_tsomax must be <=
> > IP_MAXPACKET and I'm guessing this is what caused the above panic. (Someday
> > it would be nice if TSO segments > IP_MAXPACKET could be handled, but that
> > will take changes in the ip layer and router software so that a bogus
> > ip_len field doesn't cause problems.)
> > 
> > if_hw_tsomax needs to be the maximum segment size that the driver can accept
> > from IP. Since the driver adds any MAC header after accepting the TSO
> > segment from the IP layer, it shouldn't include MAC header(s) in the value
> > for if_hw_tsomax. (If its limit includes MAC header(s), it needs to subtract
> > those out when setting if_hw_tsomax, not add them.)
> > 
> > Since I am working up a patch for the value of if_hw_tsomaxsegcount, I
> > think I'll add a check for > IP_MAXPACKET for if_hw_tsomax as well.
> > 
> > rick
> 
> Huh, ok.  You want to try something like this then?
> 
> sean
> 
> 
> Index: if_em.h
> ===
> --- if_em.h   (revision 286991)
> +++ if_em.h   (working copy)
> @@ -268,7 +268,7 @@
> 
>  #define EM_MAX_SCATTER   64
>  #define EM_VFTA_SIZE 128
> -#define EM_TSO_SIZE  (65535 + sizeof(struct ether_vlan_header))
> +#define EM_TSO_SIZE  (65535 - sizeof(struct ether_vlan_header))
>  #define EM_TSO_SEG_SIZE  4096/* Max dma segment size */
>  #define EM_MSIX_MASK 0x01F0 /* For 82574 use */
>  #define EM_MSIX_LINK 0x0100 /* For 82574 use */

I don't remember the TSO details of em(4) controllers at the moment (it
has been a long time since I last touched the driver), but I think the
controller has no additional limit on TSO size (it claims to support
MS Large Send Offload, so it should support up to a 64KB IP datagram),
so the change would be sub-optimal.
I've attached a new diff.  It has not been tested, though; I don't have
any em(4) controllers.


> Index: if_lem.h
> ===
> --- if_lem.h  (revision 286991)
> +++ if_lem.h  (working copy)
> @@ -238,7 +238,7 @@
> 
>  #define EM_MAX_SCATTER   64
>  #define EM_VFTA_SIZE 128
> -#define EM_TSO_SIZE  (65535 + sizeof(struct ether_vlan_header))
> +#define EM_TSO_SIZE  (65535 - sizeof(struct ether_vlan_header))
>  #define EM_TSO_SEG_SIZE  4096/* Max dma segment size */
>  #define EM_MSIX_MASK 0x01F0 /* For 82574 use */
>  #define ETH_ZLEN 60
> 

I think lem(4) does not support TSO, so the change would have no
effect.  Actually, all references to TSO in lem(4) should probably be
removed.
Index: sys/dev/e1000/if_em.c
===
--- sys/dev/e1000/if_em.c	(revision 287087)
+++ sys/dev/e1000/if_em.c	(working copy)
@@ -3044,7 +3044,7 @@ em_setup_interface(device_t dev, struct adapter *a
 	if_setioctlfn(ifp, em_ioctl);
 	if_setgetcounterfn(ifp, em_get_counter);
 	/* TSO parameters */
-	ifp->if_hw_tsomax = EM_TSO_SIZE;
+	ifp->if_hw_tsomax = IP_MAXPACKET;
 	ifp->if_hw_tsomaxsegcount = EM_MAX_SCATTER;
 	ifp->if_hw_tsomaxsegsize = EM_TSO_SEG_SIZE;
 

racct crash/Linux Emulation

2015-08-23 Thread Larry Rosenman
Got the panic below on exit of a Linux (World Community Grid) process.


borg.lerctr.org dumped core - see /var/crash/vmcore.5

Sun Aug 23 20:14:24 CDT 2015

FreeBSD borg.lerctr.org 11.0-CURRENT FreeBSD 11.0-CURRENT #46 r287028: Sat Aug 
22 18:34:59 CDT 2015 r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER  amd64

panic: racct_sub: freeing 1 of resource 11, which is more than allocated 0 for 
wcgrid_fahv_vina_pr (pid 1140)

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: racct_sub: freeing 1 of resource 11, which is more than allocated 0 for 
wcgrid_fahv_vina_pr (pid 1140)
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe2eb3859920
vpanic() at vpanic+0x189/frame 0xfe2eb38599a0
kassert_panic() at kassert_panic+0x132/frame 0xfe2eb3859a10
racct_sub() at racct_sub+0x13e/frame 0xfe2eb3859a50
exit1() at exit1+0xd4/frame 0xfe2eb3859ad0
linux_exit_group() at linux_exit_group+0xd/frame 0xfe2eb3859ae0
ia32_syscall() at ia32_syscall+0x28b/frame 0xfe2eb3859bf0
Xint0x80_syscall() at Xint0x80_syscall+0x95/frame 0xfe2eb3859bf0
--- syscall (252, Linux ELF32, linux_exit_group), rip = 0x817a9d7, rsp = 
0xca3c, rbp = 0xca58 ---
Uptime: 2m22s
Dumping 2881 out of 64454 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/kernel/linux_common.ko.symbols...done.
Loaded symbols for /boot/kernel/linux_common.ko.symbols
Reading symbols from /boot/kernel/if_lagg.ko.symbols...done.
Loaded symbols for /boot/kernel/if_lagg.ko.symbols
Reading symbols from /boot/kernel/snd_envy24ht.ko.symbols...done.
Loaded symbols for /boot/kernel/snd_envy24ht.ko.symbols
Reading symbols from /boot/kernel/snd_spicds.ko.symbols...done.
Loaded symbols for /boot/kernel/snd_spicds.ko.symbols
Reading symbols from /boot/kernel/coretemp.ko.symbols...done.
Loaded symbols for /boot/kernel/coretemp.ko.symbols
Reading symbols from /boot/kernel/ichsmb.ko.symbols...done.
Loaded symbols for /boot/kernel/ichsmb.ko.symbols
Reading symbols from /boot/kernel/smbus.ko.symbols...done.
Loaded symbols for /boot/kernel/smbus.ko.symbols
Reading symbols from /boot/kernel/ichwd.ko.symbols...done.
Loaded symbols for /boot/kernel/ichwd.ko.symbols
Reading symbols from /boot/kernel/cpuctl.ko.symbols...done.
Loaded symbols for /boot/kernel/cpuctl.ko.symbols
Reading symbols from /boot/kernel/cryptodev.ko.symbols...done.
Loaded symbols for /boot/kernel/cryptodev.ko.symbols
Reading symbols from /boot/kernel/dtraceall.ko.symbols...done.
Loaded symbols for /boot/kernel/dtraceall.ko.symbols
Reading symbols from /boot/kernel/profile.ko.symbols...done.
Loaded symbols for /boot/kernel/profile.ko.symbols
Reading symbols from /boot/kernel/dtrace.ko.symbols...done.
Loaded symbols for /boot/kernel/dtrace.ko.symbols
Reading symbols from /boot/kernel/systrace_freebsd32.ko.symbols...done.
Loaded symbols for /boot/kernel/systrace_freebsd32.ko.symbols
Reading symbols from /boot/kernel/systrace.ko.symbols...done.
Loaded symbols for /boot/kernel/systrace.ko.symbols
Reading symbols from /boot/kernel/sdt.ko.symbols...done.
Loaded symbols for /boot/kernel/sdt.ko.symbols
Reading symbols from /boot/kernel/fasttrap.ko.symbols...done.
Loaded symbols for /boot/kernel/fasttrap.ko.symbols
Reading symbols from /boot/kernel/fbt.ko.symbols...done.
Loaded symbols for /boot/kernel/fbt.ko.symbols
Reading symbols from /boot/kernel/dtnfscl.ko.symbols...done.
Loaded symbols for /boot/kernel/dtnfscl.ko.symbols
Reading symbols from /boot/kernel/dtmalloc.ko.symbols...done.
Loaded symbols for /boot/kernel/dtmalloc.ko.symbols
Reading symbols from /boot/modules/nvidia.ko...done.
Loaded symbols for /boot/modules/nvidia.ko
Reading symbols from /boot/kernel/ipmi.ko.symbols...done.
Loaded symbols for /boot/kernel/ipmi.ko.symbols
Reading symbols from /boot/kernel/ipmi_linux.ko.symbols...done.
Loaded symbols for /boot/kernel/ipmi_linux.ko.symbols
Reading symbols from /boot/kernel/radeonkms.ko.symbols...done.
Loaded symbols for /boot/kernel/radeonkms.ko.symbols
Reading symbols from /boot/kernel/iicbb.ko.symbols...done.
Loaded symbols for /boot/kernel/iicbb.ko.symbols
Reading symbols from /boot/kernel/iicbus.ko.symbols...done.
Loaded symbols for /boot/kernel/iicbus.ko.symbols
Reading symbols from /boot/kernel/iic.ko.symbols...done.
Loaded symbols for /boot/kernel/iic.ko.symbols
Reading symbols from /boot/kernel/drm2.ko.symbols...done.
Loaded symbols for /boot/kernel/drm2.ko.symbols
Reading symbols from /boot/kernel/radeonk

Re: Read-only /usr/obj/ no longer kosher?

2015-08-23 Thread Xin Li


On 8/23/15 14:55, Pawel Jakub Dawidek wrote:
> I used to build world and kernel on one machine and export both /usr/src/ and
> /usr/obj read-only to other machines. It doesn't work anymore (this is from
> 'make installworld'):
> 
> ===> bin/freebsd-version (install)
> eval $(egrep '^(TYPE|REVISION|BRANCH)=' 
> /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ;  if ! sed -e " 
> s/@@TYPE@@/${TYPE}/g;  s/@@REVISION@@/${REVISION}/g;  
> s/@@BRANCH@@/${BRANCH}/g;  " 
> /usr/src/bin/freebsd-version/freebsd-version.sh.in >freebsd-version.sh ; then 
>  rm -f freebsd-version.sh ;  exit 1 ;  fi
> cannot create freebsd-version.sh: Permission denied
> rm: freebsd-version.sh: Read-only file system
> *** Error code 1

What are the modification times of
/usr/obj/usr/bin/freebsd-version/freebsd-version.sh,
/usr/src/bin/freebsd-version/freebsd-version.sh and
/usr/src/sys/conf/newvers.sh?

Cheers,





Read-only /usr/obj/ no longer kosher?

2015-08-23 Thread Pawel Jakub Dawidek
I used to build world and kernel on one machine and export both /usr/src/ and
/usr/obj read-only to other machines. It doesn't work anymore (this is from
'make installworld'):

===> bin/freebsd-version (install)
eval $(egrep '^(TYPE|REVISION|BRANCH)=' 
/usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ;  if ! sed -e " 
s/@@TYPE@@/${TYPE}/g;  s/@@REVISION@@/${REVISION}/g;  s/@@BRANCH@@/${BRANCH}/g; 
 " /usr/src/bin/freebsd-version/freebsd-version.sh.in >freebsd-version.sh ; 
then  rm -f freebsd-version.sh ;  exit 1 ;  fi
cannot create freebsd-version.sh: Permission denied
rm: freebsd-version.sh: Read-only file system
*** Error code 1

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-23 Thread Konstantin Belousov
On Sun, Aug 23, 2015 at 12:08:16PM +0300, Konstantin Belousov wrote:
> On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote:
> > On 12/08/2015 17:11, Lawrence Stewart wrote:
> > > On 08/07/15 07:33, Pawel Pekala wrote:
> > >> Hi K.,
> > >>
> > >> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
> > >>> Is this still happening?
> > >>
> > >> Still crashes:
> > > 
> > > +1 for me running r286617
> > 
> > Here is another +1 with r286922.
> > I can add a couple of bits of debugging data:
> > 
> > (kgdb) fr 8
> > #8  0x80639d60 in knote (list=0xf8019a733ea0,
> > hint=2147483648, lockflags=) at
> > /usr/src/sys/kern/kern_event.c:1964
> > 1964    } else if ((lockflags & KNF_NOKQLOCK) != 0) {
> > (kgdb) p *list
> > $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
> > , kl_unlock = 0x8063a200 ,
> >   kl_assert_locked = 0x8063a220 ,
> > kl_assert_unlocked = 0x8063a240 ,
> >   kl_lockarg = 0xf8019a733bb0}
> > (kgdb) disassemble
> > Dump of assembler code for function knote:
> > 0x80639d00:  push   %rbp
> > 0x80639d01:  mov    %rsp,%rbp
> > 0x80639d04:  push   %r15
> > 0x80639d06:  push   %r14
> > 0x80639d08:  push   %r13
> > 0x80639d0a:  push   %r12
> > 0x80639d0c:  push   %rbx
> > 0x80639d0d:  sub    $0x18,%rsp
> > 0x80639d11:  mov    %edx,%r12d
> > 0x80639d14:  mov    %rsi,-0x30(%rbp)
> > 0x80639d18:  mov    %rdi,%rbx
> > 0x80639d1b:  test   %rbx,%rbx
> > 0x80639d1e:  je     0x80639ef6
> > 0x80639d24:  mov    %r12d,%eax
> > 0x80639d27:  and    $0x1,%eax
> > 0x80639d2a:  mov    %eax,-0x3c(%rbp)
> > 0x80639d2d:  mov    0x28(%rbx),%rdi
> > 0x80639d31:  je     0x80639d38
> > 0x80639d33:  callq  *0x18(%rbx)
> > 0x80639d36:  jmp    0x80639d42
> > 0x80639d38:  callq  *0x20(%rbx)
> > 0x80639d3b:  mov    0x28(%rbx),%rdi
> > 0x80639d3f:  callq  *0x8(%rbx)
> > 0x80639d42:  mov    %rbx,-0x38(%rbp)
> > 0x80639d46:  mov    (%rbx),%rbx
> > 0x80639d49:  test   %rbx,%rbx
> > 0x80639d4c:  je     0x80639ee5
> > 0x80639d52:  and    $0x2,%r12d
> > 0x80639d56:  nopw   %cs:0x0(%rax,%rax,1)
> > 0x80639d60:  mov    0x28(%rbx),%r14
> > 
> > Panic is in the last quoted instruction.
> > And:
> > (kgdb) i reg
> > rax            0x246               582
> > rbx            0xdeadc0dedeadc0de  -2401050962867404578
> > rcx            0x0                 0
> > rdx            0x12e               302
> > rsi            0x80a26a5a          -2136839590
> > rdi            0x80e81b80          -2132272256
> > rbp            0xfe02b7efea20      0xfe02b7efea20
> > rsp            0xfe02b7efe9e0      0xfe02b7efe9e0
> > r8             0x80a269ce          -2136839730
> > r9             0x80e82838          -2132269000
> > r10            0x10000             65536
> > r11            0x80fabd10          -2131051248
> > r12            0x0                 0
> > r13            0xf801ff84a818      -8787511171048
> > r14            0xf801ff84a800      -8787511171072
> > r15            0xf8019a6974f0      -8789207452432
> > rip            0x80639d60          0x80639d60
> > eflags         0x10286             66182
> > 
> > I think that $rbx stands out here (this is a kernel with INVARIANTS).
> > 
> > Looking at the code, is it possible that one of the calls from within
> > the loop's body modifies the list?  If that is so and provided that is a
> > valid behavior, then maybe using SLIST_FOREACH_SAFE would help.
> 
> This is the first time useful debugging data was posted.
> 
> The 0x28 offset may indicate either kn_kq member access of the struct
> knote, or kq_list of the struct kqueue.
> 
> kl_list.slh_first of the list parameter is NULL, how would a list
> iteration loop even start ?  Can you look up the list argument value
> from the previous frame (%rdi is overwritten, so debugger might be
> confused) ?

After looking at your data closely, I think you are right.  The panic
occurs when the exit1(9) does KNOTE_LOCKED(NOTE_EXIT).  This is the
only case in the tree where filter uses knlist_remove_inevent() to detach
processed note, so indeed the slist is modified under the iterator.

Below is the patch with the suggested change and unrelated cleanup of
the uma(9) KPI use.  Please test, everybody who has a panic with the
backtrace pointing to the sys_exit().

diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
index a4388aa..2f15f7f 100644
--- a/sys/kern/kern_event.c
+++ b/sys/kern/kern_event.c
@@ -1106,7 +1106,12 @@ kqueue_register(struct kqueue *kq, struct kevent *kev, 
struct thread *td, int wa
return EINVAL;
 
if (kev->flags & EV_ADD)
-   tkn = knote_alloc(waitok);  /* prevent waiting with locks */
+   /*
+* 

Re: Instant panic while trying run ports-mgmt/poudriere

2015-08-23 Thread Konstantin Belousov
On Sun, Aug 23, 2015 at 09:54:28AM +0300, Andriy Gapon wrote:
> On 12/08/2015 17:11, Lawrence Stewart wrote:
> > On 08/07/15 07:33, Pawel Pekala wrote:
> >> Hi K.,
> >>
> >> On 2015-08-06 12:33 -0700, "K. Macy"  wrote:
> >>> Is this still happening?
> >>
> >> Still crashes:
> > 
> > +1 for me running r286617
> 
> Here is another +1 with r286922.
> I can add a couple of bits of debugging data:
> 
> (kgdb) fr 8
> #8  0x80639d60 in knote (list=0xf8019a733ea0,
> hint=2147483648, lockflags=) at
> /usr/src/sys/kern/kern_event.c:1964
> 1964    } else if ((lockflags & KNF_NOKQLOCK) != 0) {
> (kgdb) p *list
> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0x8063a1e0
> , kl_unlock = 0x8063a200 ,
>   kl_assert_locked = 0x8063a220 ,
> kl_assert_unlocked = 0x8063a240 ,
>   kl_lockarg = 0xf8019a733bb0}
> (kgdb) disassemble
> Dump of assembler code for function knote:
> 0x80639d00:  push   %rbp
> 0x80639d01:  mov    %rsp,%rbp
> 0x80639d04:  push   %r15
> 0x80639d06:  push   %r14
> 0x80639d08:  push   %r13
> 0x80639d0a:  push   %r12
> 0x80639d0c:  push   %rbx
> 0x80639d0d:  sub    $0x18,%rsp
> 0x80639d11:  mov    %edx,%r12d
> 0x80639d14:  mov    %rsi,-0x30(%rbp)
> 0x80639d18:  mov    %rdi,%rbx
> 0x80639d1b:  test   %rbx,%rbx
> 0x80639d1e:  je     0x80639ef6
> 0x80639d24:  mov    %r12d,%eax
> 0x80639d27:  and    $0x1,%eax
> 0x80639d2a:  mov    %eax,-0x3c(%rbp)
> 0x80639d2d:  mov    0x28(%rbx),%rdi
> 0x80639d31:  je     0x80639d38
> 0x80639d33:  callq  *0x18(%rbx)
> 0x80639d36:  jmp    0x80639d42
> 0x80639d38:  callq  *0x20(%rbx)
> 0x80639d3b:  mov    0x28(%rbx),%rdi
> 0x80639d3f:  callq  *0x8(%rbx)
> 0x80639d42:  mov    %rbx,-0x38(%rbp)
> 0x80639d46:  mov    (%rbx),%rbx
> 0x80639d49:  test   %rbx,%rbx
> 0x80639d4c:  je     0x80639ee5
> 0x80639d52:  and    $0x2,%r12d
> 0x80639d56:  nopw   %cs:0x0(%rax,%rax,1)
> 0x80639d60:  mov    0x28(%rbx),%r14
> 
> Panic is in the last quoted instruction.
> And:
> (kgdb) i reg
> rax            0x246               582
> rbx            0xdeadc0dedeadc0de  -2401050962867404578
> rcx            0x0                 0
> rdx            0x12e               302
> rsi            0x80a26a5a          -2136839590
> rdi            0x80e81b80          -2132272256
> rbp            0xfe02b7efea20      0xfe02b7efea20
> rsp            0xfe02b7efe9e0      0xfe02b7efe9e0
> r8             0x80a269ce          -2136839730
> r9             0x80e82838          -2132269000
> r10            0x10000             65536
> r11            0x80fabd10          -2131051248
> r12            0x0                 0
> r13            0xf801ff84a818      -8787511171048
> r14            0xf801ff84a800      -8787511171072
> r15            0xf8019a6974f0      -8789207452432
> rip            0x80639d60          0x80639d60
> eflags         0x10286             66182
> 
> I think that $rbx stands out here (this is a kernel with INVARIANTS).
> 
> Looking at the code, is it possible that one of the calls from within
> the loop's body modifies the list?  If that is so and provided that is a
> valid behavior, then maybe using SLIST_FOREACH_SAFE would help.

This is the first time useful debugging data was posted.

The 0x28 offset may indicate either kn_kq member access of the struct
knote, or kq_list of the struct kqueue.

kl_list.slh_first of the list parameter is NULL, how would a list
iteration loop even start ?  Can you look up the list argument value
from the previous frame (%rdi is overwritten, so debugger might be
confused) ?