Re: stable/11 r321349 crashing immediately
On 22 Jul, To: pz-freebsd-sta...@ziemba.us wrote: > On 22 Jul, G. Paul Ziemba wrote: >> My previous table had an error in the cumulative size column >> (keyboard made "220" into "20" when I was plugging into the hex >> calculator), so thet stack is 0x200 bigger than I originally thought: >> >> Frame Stack Pointer sz cumu function >> - - --- >> 44 0xfe085cfa8a10 amd64_syscall >> 43 0xfe085cfa88b0 160 160 syscallenter >> 42 0xfe085cfa87f0 220 380 sys_execve >> 41 0xfe085cfa87c0 30 3B0 kern_execve >> 40 0xfe085cfa8090 730 AE0 do_execve >> 39 0xfe085cfa7ec0 1D0 CB0 namei >> 38 0xfe085cfa7d40 180 E30 lookup >> 37 0xfe085cfa7cf0 50 E80 VOP_LOOKUP >> 36 0xfe085cfa7c80 70 EF0 VOP_LOOKUP_APV >> 35 0xfe085cfa7650 630 1520 nfs_lookup >> 34 0xfe085cfa75f0 60 1580 VOP_ACCESS >> 33 0xfe085cfa7580 70 15F0 VOP_ACCESS_APV >> 32 0xfe085cfa7410 170 1760 nfs_access >> 31 0xfe085cfa7240 1D0 1930 nfs34_access_otw >> 30 0xfe085cfa7060 1E0 1B10 nfsrpc_accessrpc >> 29 0xfe085cfa6fb0 B0 1BC0 nfscl_request >> 28 0xfe085cfa6b20 490 2050 newnfs_request >> 27 0xfe085cfa6980 1A0 21F0 clnt_reconnect_call >> 26 0xfe085cfa6520 460 2650 clnt_vc_call >> 25 0xfe085cfa64c0 60 26B0 sosend >> 24 0xfe085cfa6280 240 28F0 sosend_generic >> 23 0xfe085cfa6110 170 2A60 tcp_usr_send >> 22 0xfe085cfa5ca0 470 2ED0 tcp_output >> 21 0xfe085cfa5900 3A0 3270 ip_output >> 20 0xfe085cfa5880 80 32F0 looutput >> 19 0xfe085cfa5800 80 3370 if_simloop >> 18 0xfe085cfa57d0 30 33A0 netisr_queue >> 17 0xfe085cfa5780 50 33F0 netisr_queue_src >> 16 0xfe085cfa56f0 90 3480 netisr_queue_internal >> 15 0xfe085cfa56a0 50 34D0 swi_sched >> 14 0xfe085cfa5620 80 3550 intr_event_schedule_thread >> 13 0xfe085cfa55b0 70 35C0 sched_add >> 12 0xfe085cfa5490 120 36E0 sched_pickcpu >> 11 0xfe085cfa5420 70 3750 sched_lowest >> 10 0xfe085cfa52a0 180 38D0 cpu_search_lowest >> 9 0xfe085cfa52a00 38D0 cpu_search >> 8 0xfe085cfa5120 180 3A50 cpu_search_lowest >> 7 0xfe085cfa51200 3A50 cpu_search >> 6 0xfe085cfa4fa0 180 3BD0 cpu_search_lowest >> 5 
0xfe0839778f40 signal handler > > The stack is aligned to a 4096 (0x1000) boundary. The first access to a > local variable below 0xfe085cfa5000 is what triggered the trap. The > other end of the stack must be at 0xfe085cfa9000 less a bit. I don't > know why the first stack pointer value in the trace is > 0xfe085cfa8a10. That would seem to indicate that amd64_syscall is > using ~1500 bytes of stack space. Actually there could be quite a bit of CPU context that gets saved. That could be sizeable on amd64. ___ freebsd-stable@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: stable/11 r321349 crashing immediately
On 22 Jul, G. Paul Ziemba wrote: > My previous table had an error in the cumulative size column > (keyboard made "220" into "20" when I was plugging into the hex > calculator), so thet stack is 0x200 bigger than I originally thought: > > Frame Stack Pointer sz cumu function > - - --- > 44 0xfe085cfa8a10 amd64_syscall > 43 0xfe085cfa88b0 160 160 syscallenter > 42 0xfe085cfa87f0 220 380 sys_execve > 41 0xfe085cfa87c0 30 3B0 kern_execve > 40 0xfe085cfa8090 730 AE0 do_execve > 39 0xfe085cfa7ec0 1D0 CB0 namei > 38 0xfe085cfa7d40 180 E30 lookup > 37 0xfe085cfa7cf0 50 E80 VOP_LOOKUP > 36 0xfe085cfa7c80 70 EF0 VOP_LOOKUP_APV > 35 0xfe085cfa7650 630 1520 nfs_lookup > 34 0xfe085cfa75f0 60 1580 VOP_ACCESS > 33 0xfe085cfa7580 70 15F0 VOP_ACCESS_APV > 32 0xfe085cfa7410 170 1760 nfs_access > 31 0xfe085cfa7240 1D0 1930 nfs34_access_otw > 30 0xfe085cfa7060 1E0 1B10 nfsrpc_accessrpc > 29 0xfe085cfa6fb0 B0 1BC0 nfscl_request > 28 0xfe085cfa6b20 490 2050 newnfs_request > 27 0xfe085cfa6980 1A0 21F0 clnt_reconnect_call > 26 0xfe085cfa6520 460 2650 clnt_vc_call > 25 0xfe085cfa64c0 60 26B0 sosend > 24 0xfe085cfa6280 240 28F0 sosend_generic > 23 0xfe085cfa6110 170 2A60 tcp_usr_send > 22 0xfe085cfa5ca0 470 2ED0 tcp_output > 21 0xfe085cfa5900 3A0 3270 ip_output > 20 0xfe085cfa5880 80 32F0 looutput > 19 0xfe085cfa5800 80 3370 if_simloop > 18 0xfe085cfa57d0 30 33A0 netisr_queue > 17 0xfe085cfa5780 50 33F0 netisr_queue_src > 16 0xfe085cfa56f0 90 3480 netisr_queue_internal > 15 0xfe085cfa56a0 50 34D0 swi_sched > 14 0xfe085cfa5620 80 3550 intr_event_schedule_thread > 13 0xfe085cfa55b0 70 35C0 sched_add > 12 0xfe085cfa5490 120 36E0 sched_pickcpu > 11 0xfe085cfa5420 70 3750 sched_lowest > 10 0xfe085cfa52a0 180 38D0 cpu_search_lowest > 9 0xfe085cfa52a00 38D0 cpu_search > 8 0xfe085cfa5120 180 3A50 cpu_search_lowest > 7 0xfe085cfa51200 3A50 cpu_search > 6 0xfe085cfa4fa0 180 3BD0 cpu_search_lowest > 5 0xfe0839778f40 signal handler The stack is aligned to a 4096 (0x1000) boundary. 
The first access to a local variable below 0xfe085cfa5000 is what triggered the trap. The other end of the stack must be at 0xfe085cfa9000 less a bit. I don't know why the first stack pointer value in the trace is 0xfe085cfa8a10. That would seem to indicate that amd64_syscall is using ~1500 bytes of stack space.
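The "sz" column in the table above is just the difference between adjacent stack pointer values. As a sketch, here is that arithmetic for a few frames of the trace, using the abbreviated addresses exactly as they appear in the table (only the differences matter):

```python
# Recompute the per-frame sizes ("sz" column) from adjacent stack
# pointer values in the trace: each frame's size is the caller's
# %rsp minus its own %rsp.
frames = [
    ("do_execve",      0xfe085cfa8090),
    ("namei",          0xfe085cfa7ec0),
    ("lookup",         0xfe085cfa7d40),
    ("VOP_LOOKUP",     0xfe085cfa7cf0),
    ("VOP_LOOKUP_APV", 0xfe085cfa7c80),
]

for (caller, caller_sp), (callee, callee_sp) in zip(frames, frames[1:]):
    print(f"{callee:15s} sz = {caller_sp - callee_sp:#x}")
# namei 0x1d0, lookup 0x180, VOP_LOOKUP 0x50, VOP_LOOKUP_APV 0x70,
# matching the sz column in the trace.
```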
Re: stable/11 r321349 crashing immediately
My previous table had an error in the cumulative size column (keyboard made "220" into "20" when I was plugging into the hex calculator), so the stack is 0x200 bigger than I originally thought:

Frame  Stack Pointer     sz  cumu  function
-----  --------------  ----  ----  --------
  44   0xfe085cfa8a10              amd64_syscall
  43   0xfe085cfa88b0   160   160  syscallenter
  42   0xfe085cfa87f0   220   380  sys_execve
  41   0xfe085cfa87c0    30   3B0  kern_execve
  40   0xfe085cfa8090   730   AE0  do_execve
  39   0xfe085cfa7ec0   1D0   CB0  namei
  38   0xfe085cfa7d40   180   E30  lookup
  37   0xfe085cfa7cf0    50   E80  VOP_LOOKUP
  36   0xfe085cfa7c80    70   EF0  VOP_LOOKUP_APV
  35   0xfe085cfa7650   630  1520  nfs_lookup
  34   0xfe085cfa75f0    60  1580  VOP_ACCESS
  33   0xfe085cfa7580    70  15F0  VOP_ACCESS_APV
  32   0xfe085cfa7410   170  1760  nfs_access
  31   0xfe085cfa7240   1D0  1930  nfs34_access_otw
  30   0xfe085cfa7060   1E0  1B10  nfsrpc_accessrpc
  29   0xfe085cfa6fb0    B0  1BC0  nfscl_request
  28   0xfe085cfa6b20   490  2050  newnfs_request
  27   0xfe085cfa6980   1A0  21F0  clnt_reconnect_call
  26   0xfe085cfa6520   460  2650  clnt_vc_call
  25   0xfe085cfa64c0    60  26B0  sosend
  24   0xfe085cfa6280   240  28F0  sosend_generic
  23   0xfe085cfa6110   170  2A60  tcp_usr_send
  22   0xfe085cfa5ca0   470  2ED0  tcp_output
  21   0xfe085cfa5900   3A0  3270  ip_output
  20   0xfe085cfa5880    80  32F0  looutput
  19   0xfe085cfa5800    80  3370  if_simloop
  18   0xfe085cfa57d0    30  33A0  netisr_queue
  17   0xfe085cfa5780    50  33F0  netisr_queue_src
  16   0xfe085cfa56f0    90  3480  netisr_queue_internal
  15   0xfe085cfa56a0    50  34D0  swi_sched
  14   0xfe085cfa5620    80  3550  intr_event_schedule_thread
  13   0xfe085cfa55b0    70  35C0  sched_add
  12   0xfe085cfa5490   120  36E0  sched_pickcpu
  11   0xfe085cfa5420    70  3750  sched_lowest
  10   0xfe085cfa52a0   180  38D0  cpu_search_lowest
   9   0xfe085cfa52a0     0  38D0  cpu_search
   8   0xfe085cfa5120   180  3A50  cpu_search_lowest
   7   0xfe085cfa5120     0  3A50  cpu_search
   6   0xfe085cfa4fa0   180  3BD0  cpu_search_lowest
   5   0xfe0839778f40              signal handler

-- 
G.
Paul Ziemba FreeBSD unix: 5:46PM up 6:38, 8 users, load averages: 3.31, 3.79, 2.25
Re: stable/11 r321349 crashing immediately
On Sat, Jul 22, 2017 at 01:12:29PM -0700, Don Lewis wrote:
> On 21 Jul, G. Paul Ziemba wrote:
> >> Your best bet for a quick workaround for the stack overflow would be to
> >> rebuild the kernel with a larger value of KSTACK_PAGES. You can find
> >> the default in /usr/src/sys//conf/NOTES.

I bumped it from the default 4 to 5 in /boot/loader.conf:

kern.kstack_pages=5

and that prevented this crash. Uptime 5.5 hours at this point (instead of 1.5 minutes).

So what's the down-side of increasing kstack_pages? What if I made it 10? I see comments elsewhere about reducing space for user-mode threads, but I'm not sure what that means in practical terms, or if there is some other overarching tuning parameter that should also be increased.

> Page size is 4096.

Ah, I forgot to count the 2^0 bit.

> It's interesting that you are running into this on amd64. Usually i386
> is the problem child.

Maybe stack frames are bigger due to 64-bit variables? (And of course we get paid mostly for adding code, not so much for removing it.)

> >> It would probably be a good idea to compute the differences in the stack
> >> pointer values between adjacent stack frames to see if any of them are
> >> consuming an excessive amount of stack space.

For our collective amusement, I noted the stack pointer for each frame and calculated frame size and cumulative stack consumption.
If there is some other stack overhead not shown in the trace, I can see it going over 0x4000:

Frame  Stack Pointer     sz  cumu  function
-----  --------------  ----  ----  --------
  44   0xfe085cfa8a10              amd64_syscall
  43   0xfe085cfa88b0   160   160  syscallenter
  42   0xfe085cfa87f0   220   180  sys_execve
  41   0xfe085cfa87c0    30   1B0  kern_execve
  40   0xfe085cfa8090   730   8E0  do_execve
  39   0xfe085cfa7ec0   1D0   AB0  namei
  38   0xfe085cfa7d40   180   C30  lookup
  37   0xfe085cfa7cf0    50   C80  VOP_LOOKUP
  36   0xfe085cfa7c80    70   CF0  VOP_LOOKUP_APV
  35   0xfe085cfa7650   630  1320  nfs_lookup
  34   0xfe085cfa75f0    60  1380  VOP_ACCESS
  33   0xfe085cfa7580    70  13F0  VOP_ACCESS_APV
  32   0xfe085cfa7410   170  1560  nfs_access
  31   0xfe085cfa7240   1D0  1730  nfs34_access_otw
  30   0xfe085cfa7060   1E0  1910  nfsrpc_accessrpc
  29   0xfe085cfa6fb0    B0  19C0  nfscl_request
  28   0xfe085cfa6b20   490  1E50  newnfs_request
  27   0xfe085cfa6980   1A0  1FF0  clnt_reconnect_call
  26   0xfe085cfa6520   460  2450  clnt_vc_call
  25   0xfe085cfa64c0    60  24B0  sosend
  24   0xfe085cfa6280   240  26F0  sosend_generic
  23   0xfe085cfa6110   170  2860  tcp_usr_send
  22   0xfe085cfa5ca0   470  2CD0  tcp_output
  21   0xfe085cfa5900   3A0  3070  ip_output
  20   0xfe085cfa5880    80  30F0  looutput
  19   0xfe085cfa5800    80  3170  if_simloop
  18   0xfe085cfa57d0    30  31A0  netisr_queue
  17   0xfe085cfa5780    50  31F0  netisr_queue_src
  16   0xfe085cfa56f0    90  3280  netisr_queue_internal
  15   0xfe085cfa56a0    50  32D0  swi_sched
  14   0xfe085cfa5620    80  3350  intr_event_schedule_thread
  13   0xfe085cfa55b0    70  33C0  sched_add
  12   0xfe085cfa5490   120  34E0  sched_pickcpu
  11   0xfe085cfa5420    70  3550  sched_lowest
  10   0xfe085cfa52a0   180  36D0  cpu_search_lowest
   9   0xfe085cfa52a0     0  36D0  cpu_search
   8   0xfe085cfa5120   180  3850  cpu_search_lowest
   7   0xfe085cfa5120     0  3850  cpu_search
   6   0xfe085cfa4fa0   180  39D0  cpu_search_lowest
   5   0xfe0839778f40              signal handler

-- 
G. Paul Ziemba FreeBSD unix: 4:36PM up 5:28, 8 users, load averages: 6.53, 7.79, 7.94
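Putting the thread's numbers together as a back-of-the-envelope sketch: with the corrected cumulative figure from the follow-up table (0x3BD0 at frame 6), plus the entry overhead inferred from the gap between the presumed stack top (0xfe085cfa9000) and the first %rsp in the trace (0xfe085cfa8a10), the default 4-page stack overflows while 5 pages fit:

```python
PAGE_SIZE = 4096          # bytes per page on amd64

used_by_frames = 0x3BD0   # cumulative use at frame 6 (corrected table)
entry_overhead = 0x5F0    # 0x...9000 - 0x...8a10: saved syscall/trap context
total_used = used_by_frames + entry_overhead

for pages in (4, 5):      # default kern.kstack_pages vs. the bumped value
    size = pages * PAGE_SIZE
    verdict = "overflows" if total_used > size else "fits"
    print(f"kstack_pages={pages}: stack {size:#x}, demand {total_used:#x} -> {verdict}")
# 4 pages (0x4000) overflow; 5 pages (0x5000) fit with ~0xE40 headroom.
```

This is consistent with the report that setting kern.kstack_pages=5 in /boot/loader.conf made the crash go away.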
Re: stable/11 r321349 crashing immediately
On 22 Jul, David Wolfskill wrote: > On Fri, Jul 21, 2017 at 04:53:18AM +, G. Paul Ziemba wrote: >> ... >> >It looks like you are trying to execute a program from an NFS file >> >system that is exported by the same host. This isn't exactly optimal >> >... >> >> Perhaps not optimal for the implementation, but I think it's a >> common NFS scenario: define a set of NFS-provided paths for files >> and use those path names on all hosts, regardless of whether they >> happen to be serving the files in question or merely clients. > > Back when I was doing sysadmin stuff for a group of engineers, my > usual approach for that sort of thing was to use amd (this was late > 1990s - 2001) to have maps so it would set up NFS mounts if the > file system being served was from a different host (from the one > running amd), but instantiating a symlink instead if the file system > resided on the current host. Same here. It's a bit messy to do this manually, but you could either use a symlink or a nullfs mount for the filesystems that are local.
Re: stable/11 r321349 crashing immediately
On 21 Jul, G. Paul Ziemba wrote: > truck...@freebsd.org (Don Lewis) writes: > >>On 21 Jul, G. Paul Ziemba wrote: >>> GENERIC kernel r321349 results in the following about a minute after >>> multiuser boot completes. >>> >>> What additional information should I provide to assist in debugging? >>> >>> Many thanks! >>> >>> [Extracted from /var/crash/core.txt.NNN] >>> >>> KDB: stack backtrace: >>> #0 0x810f6ed7 at kdb_backtrace+0xa7 >>> #1 0x810872a9 at vpanic+0x249 >>> #2 0x81087060 at vpanic+0 >>> #3 0x817d9aca at dblfault_handler+0x10a >>> #4 0x817ae93c at Xdblfault+0xac >>> #5 0x810cf76e at cpu_search_lowest+0x35e >>> #6 0x810cf76e at cpu_search_lowest+0x35e >>> #7 0x810d5b36 at sched_lowest+0x66 >>> #8 0x810d1d92 at sched_pickcpu+0x522 >>> #9 0x810d2b03 at sched_add+0xd3 >>> #10 0x8101df5c at intr_event_schedule_thread+0x18c >>> #11 0x8101ddb0 at swi_sched+0xa0 >>> #12 0x81261643 at netisr_queue_internal+0x1d3 >>> #13 0x81261212 at netisr_queue_src+0x92 >>> #14 0x81261677 at netisr_queue+0x27 >>> #15 0x8123da5a at if_simloop+0x20a >>> #16 0x8123d83b at looutput+0x22b >>> #17 0x8131c4c6 at ip_output+0x1aa6 >>> >>> doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 >>> 298 dumptid = curthread->td_tid; >>> (kgdb) #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 >>> #1 0x810867e8 in kern_reboot (howto=260) >>> at /usr/src/sys/kern/kern_shutdown.c:366 >>> #2 0x810872ff in vpanic (fmt=0x81e5f7e0 "double fault", >>> ap=0xfe0839778ec0) at /usr/src/sys/kern/kern_shutdown.c:759 >>> #3 0x81087060 in panic (fmt=0x81e5f7e0 "double fault") >>> at /usr/src/sys/kern/kern_shutdown.c:690 >>> #4 0x817d9aca in dblfault_handler (frame=0xfe0839778f40) >>> at /usr/src/sys/amd64/amd64/trap.c:828 >>> #5 >>> #6 0x810cf422 in cpu_search_lowest ( >>> cg=0x826ccd98, >>> low=>> 0xfe085cfa4 >>> ff8>) at /usr/src/sys/kern/sched_ule.c:782 >>> #7 0x810cf76e in cpu_search (cg=0x826cccb8 , >>> low=0xfe085cfa53b8, high=0x0, match=1) >>> at /usr/src/sys/kern/sched_ule.c:710 
>>> #8 cpu_search_lowest (cg=0x826cccb8 , >>> low=0xfe085cfa53b8) at /usr/src/sys/kern/sched_ule.c:783 >>> #9 0x810cf76e in cpu_search (cg=0x826ccc80 , >>> low=0xfe085cfa5430, high=0x0, match=1) >>> at /usr/src/sys/kern/sched_ule.c:710 >>> #10 cpu_search_lowest (cg=0x826ccc80 , >>> low=0xfe085cfa5430) >>> at /usr/src/sys/kern/sched_ule.c:783 >>> #11 0x810d5b36 in sched_lowest (cg=0x826ccc80 , >>> mask=..., pri=28, maxload=2147483647, prefer=4) >>> at /usr/src/sys/kern/sched_ule.c:815 >>> #12 0x810d1d92 in sched_pickcpu (td=0xf8000a3a9000, flags=4) >>> at /usr/src/sys/kern/sched_ule.c:1292 >>> #13 0x810d2b03 in sched_add (td=0xf8000a3a9000, flags=4) >>> at /usr/src/sys/kern/sched_ule.c:2447 >>> #14 0x8101df5c in intr_event_schedule_thread (ie=0xf80007e7ae00) >>> at /usr/src/sys/kern/kern_intr.c:917 >>> #15 0x8101ddb0 in swi_sched (cookie=0xf8000a386880, flags=0) >>> at /usr/src/sys/kern/kern_intr.c:1163 >>> #16 0x81261643 in netisr_queue_internal (proto=1, >>> m=0xf80026d00500, cpuid=0) at /usr/src/sys/net/netisr.c:1022 >>> #17 0x81261212 in netisr_queue_src (proto=1, source=0, >>> m=0xf80026d00500) at /usr/src/sys/net/netisr.c:1056 >>> #18 0x81261677 in netisr_queue (proto=1, m=0xf80026d00500) >>> at /usr/src/sys/net/netisr.c:1069 >>> #19 0x8123da5a in if_simloop (ifp=0xf800116eb000, >>> m=0xf80026d00500, af=2, hlen=0) at /usr/src/sys/net/if_loop.c:358 >>> #20 0x8123d83b in looutput (ifp=0xf800116eb000, >>> m=0xf80026d00500, dst=0xf80026ed6550, ro=0xf80026ed6530) >>> at /usr/src/sys/net/if_loop.c:265 >>> #21 0x8131c4c6 in ip_output (m=0xf80026d00500, opt=0x0, >>> ro=0xf80026ed6530, flags=0, imo=0x0, inp=0xf80026ed63a0) >>> at /usr/src/sys/netinet/ip_output.c:655 >>> #22 0x8142e1c7 in tcp_output (tp=0xf80026eb2820) >>> at /usr/src/sys/netinet/tcp_output.c:1447 >>> #23 0x81447700 in tcp_usr_send (so=0xf80011ec2360, flags=0, >>> m=0xf80026d14d00, nam=0x0, control=0x0, td=0xf80063ba1000) >>> at /usr/src/sys/netinet/tcp_usrreq.c:967 >>> #24 0x811776f1 in sosend_generic 
(so=0xf80011ec2360, addr=0x0, >>> uio=0x0, top=0xf80026d14d00, control=0x0, flags=0, >>> td=0xf80063ba1000) at /usr/src/sys/kern/uipc_socket.c:1360 >>> #25 0x811779bd in sosend (so=0xf80011ec2360, addr=0x0, uio=0x0, >>> top=0xf80026d14d00, control=0x0, flags=0,
Re: cannot destroy faulty zvol
Hi,

On 22.07.2017 17:08, Eugene M. Zheganin wrote:
> is this weird error "cannot destroy: already exists" related to the fact
> that the zvol is faulty ? Does it indicate that metadata is probably
> faulty too ? Anyway, is there a way to destroy this dataset ?

Follow-up: I sent a similar zvol of exactly the same size into the faulty one; the zpool errors are gone, but I still cannot destroy the zvol. Is this a zfs bug?

Eugene.
Re: stable/11 r321349 crashing immediately
On Fri, Jul 21, 2017 at 04:53:18AM +, G. Paul Ziemba wrote:
> ...
> >It looks like you are trying to execute a program from an NFS file
> >system that is exported by the same host. This isn't exactly optimal
> >...
>
> Perhaps not optimal for the implementation, but I think it's a
> common NFS scenario: define a set of NFS-provided paths for files
> and use those path names on all hosts, regardless of whether they
> happen to be serving the files in question or merely clients.

Back when I was doing sysadmin stuff for a group of engineers, my usual approach for that sort of thing was to use amd (this was late 1990s - 2001) to have maps so it would set up NFS mounts if the file system being served was from a different host (from the one running amd), but instantiating a symlink instead if the file system resided on the current host. IIRC, this was a fairly common practice with amd (and the like).

> Peace,
> david

-- 
David H. Wolfskill                da...@catwhisker.org
What kind of "investigation" would it be if it didn't "follow the money?"

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
Re: stable/11 r321349 crashing immediately
truck...@freebsd.org (Don Lewis) writes: >On 21 Jul, G. Paul Ziemba wrote: >> GENERIC kernel r321349 results in the following about a minute after >> multiuser boot completes. >> >> What additional information should I provide to assist in debugging? >> >> Many thanks! >> >> [Extracted from /var/crash/core.txt.NNN] >> >> KDB: stack backtrace: >> #0 0x810f6ed7 at kdb_backtrace+0xa7 >> #1 0x810872a9 at vpanic+0x249 >> #2 0x81087060 at vpanic+0 >> #3 0x817d9aca at dblfault_handler+0x10a >> #4 0x817ae93c at Xdblfault+0xac >> #5 0x810cf76e at cpu_search_lowest+0x35e >> #6 0x810cf76e at cpu_search_lowest+0x35e >> #7 0x810d5b36 at sched_lowest+0x66 >> #8 0x810d1d92 at sched_pickcpu+0x522 >> #9 0x810d2b03 at sched_add+0xd3 >> #10 0x8101df5c at intr_event_schedule_thread+0x18c >> #11 0x8101ddb0 at swi_sched+0xa0 >> #12 0x81261643 at netisr_queue_internal+0x1d3 >> #13 0x81261212 at netisr_queue_src+0x92 >> #14 0x81261677 at netisr_queue+0x27 >> #15 0x8123da5a at if_simloop+0x20a >> #16 0x8123d83b at looutput+0x22b >> #17 0x8131c4c6 at ip_output+0x1aa6 >> >> doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 >> 298 dumptid = curthread->td_tid; >> (kgdb) #0 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298 >> #1 0x810867e8 in kern_reboot (howto=260) >> at /usr/src/sys/kern/kern_shutdown.c:366 >> #2 0x810872ff in vpanic (fmt=0x81e5f7e0 "double fault", >> ap=0xfe0839778ec0) at /usr/src/sys/kern/kern_shutdown.c:759 >> #3 0x81087060 in panic (fmt=0x81e5f7e0 "double fault") >> at /usr/src/sys/kern/kern_shutdown.c:690 >> #4 0x817d9aca in dblfault_handler (frame=0xfe0839778f40) >> at /usr/src/sys/amd64/amd64/trap.c:828 >> #5 >> #6 0x810cf422 in cpu_search_lowest ( >> cg=0x826ccd98, >> low=> 0xfe085cfa4 >> ff8>) at /usr/src/sys/kern/sched_ule.c:782 >> #7 0x810cf76e in cpu_search (cg=0x826cccb8 , >> low=0xfe085cfa53b8, high=0x0, match=1) >> at /usr/src/sys/kern/sched_ule.c:710 >> #8 cpu_search_lowest (cg=0x826cccb8 , >> low=0xfe085cfa53b8) at 
/usr/src/sys/kern/sched_ule.c:783 >> #9 0x810cf76e in cpu_search (cg=0x826ccc80 , >> low=0xfe085cfa5430, high=0x0, match=1) >> at /usr/src/sys/kern/sched_ule.c:710 >> #10 cpu_search_lowest (cg=0x826ccc80 , low=0xfe085cfa5430) >> at /usr/src/sys/kern/sched_ule.c:783 >> #11 0x810d5b36 in sched_lowest (cg=0x826ccc80 , >> mask=..., pri=28, maxload=2147483647, prefer=4) >> at /usr/src/sys/kern/sched_ule.c:815 >> #12 0x810d1d92 in sched_pickcpu (td=0xf8000a3a9000, flags=4) >> at /usr/src/sys/kern/sched_ule.c:1292 >> #13 0x810d2b03 in sched_add (td=0xf8000a3a9000, flags=4) >> at /usr/src/sys/kern/sched_ule.c:2447 >> #14 0x8101df5c in intr_event_schedule_thread (ie=0xf80007e7ae00) >> at /usr/src/sys/kern/kern_intr.c:917 >> #15 0x8101ddb0 in swi_sched (cookie=0xf8000a386880, flags=0) >> at /usr/src/sys/kern/kern_intr.c:1163 >> #16 0x81261643 in netisr_queue_internal (proto=1, >> m=0xf80026d00500, cpuid=0) at /usr/src/sys/net/netisr.c:1022 >> #17 0x81261212 in netisr_queue_src (proto=1, source=0, >> m=0xf80026d00500) at /usr/src/sys/net/netisr.c:1056 >> #18 0x81261677 in netisr_queue (proto=1, m=0xf80026d00500) >> at /usr/src/sys/net/netisr.c:1069 >> #19 0x8123da5a in if_simloop (ifp=0xf800116eb000, >> m=0xf80026d00500, af=2, hlen=0) at /usr/src/sys/net/if_loop.c:358 >> #20 0x8123d83b in looutput (ifp=0xf800116eb000, >> m=0xf80026d00500, dst=0xf80026ed6550, ro=0xf80026ed6530) >> at /usr/src/sys/net/if_loop.c:265 >> #21 0x8131c4c6 in ip_output (m=0xf80026d00500, opt=0x0, >> ro=0xf80026ed6530, flags=0, imo=0x0, inp=0xf80026ed63a0) >> at /usr/src/sys/netinet/ip_output.c:655 >> #22 0x8142e1c7 in tcp_output (tp=0xf80026eb2820) >> at /usr/src/sys/netinet/tcp_output.c:1447 >> #23 0x81447700 in tcp_usr_send (so=0xf80011ec2360, flags=0, >> m=0xf80026d14d00, nam=0x0, control=0x0, td=0xf80063ba1000) >> at /usr/src/sys/netinet/tcp_usrreq.c:967 >> #24 0x811776f1 in sosend_generic (so=0xf80011ec2360, addr=0x0, >> uio=0x0, top=0xf80026d14d00, control=0x0, flags=0, >> td=0xf80063ba1000) at 
/usr/src/sys/kern/uipc_socket.c:1360 >> #25 0x811779bd in sosend (so=0xf80011ec2360, addr=0x0, uio=0x0, >> top=0xf80026d14d00, control=0x0, flags=0, td=0xf80063ba1000) >> at /usr/src/sys/kern/uipc_socket.c:1405 >> #26 0x815276a2 in clnt_vc_call (cl=0xf80063ca0980, >>
cannot destroy faulty zvol
Hi,

I cannot destroy a zvol for a reason that I don't understand:

[root@san1:~]# zfs list -t all | grep worker182
zfsroot/userdata/worker182-bad   1,38G  1,52T   708M  -
[root@san1:~]# zfs destroy -R zfsroot/userdata/worker182-bad
cannot destroy 'zfsroot/userdata/worker182-bad': dataset already exists
[root@san1:~]#

Also notice that this zvol is faulty:

  pool: zfsroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Sat Jul 22 15:01:37 2017
        18,7G scanned out of 130G at 75,0M/s, 0h25m to go
        0 repaired, 14,43% done
config:

        NAME          STATE   READ WRITE CKSUM
        zfsroot       ONLINE     0     0     4
          mirror-0    ONLINE     0     0     8
            gpt/zroot0  ONLINE   0     0     8
            gpt/zroot1  ONLINE   0     0     8

errors: Permanent errors have been detected in the following files:

        zfsroot/userdata/worker182-bad:<0x1>
        <0xc7>:<0x1>

Is this weird error "cannot destroy: already exists" related to the fact that the zvol is faulty? Does it indicate that metadata is probably faulty too? Anyway, is there a way to destroy this dataset?

Thanks.
Re: stable/11 r321349 crashing immediately
On 22.07.2017 16:57, Konstantin Belousov wrote:
> From this description, I would be not even surprised if your machine
> load fits into the kstacks cache, despite the cache's quite conservative
> settings. In other words, almost definitely your machine is not
> representative for the problematic load. Something that creates a lot of
> short- and middle-lived threads would be.

I'm having trouble imagining a real-world task for today's i386 hardware or a virtual machine involving heavy usage of many short/middle-lived threads. Perhaps you have at least a synthetic benchmark, so I could try to reproduce the kstack/KVA fragmentation-related problem using my i386 hardware? It has 1G RAM, a local 16G CompactFlash (13 GB unpartitioned) as ada0, over 200GB free within a local IDE HDD (ada1) and the mentioned 2TB USB 2.0 HDD. Pretty many resources for a 2007 AMD Geode system, eh? :-) It runs 11.1-PRERELEASE r318642 currently.
Re: stable/11 r321349 crashing immediately
On Sat, Jul 22, 2017 at 03:37:30PM +0700, Eugene Grosbein wrote:
> On 22.07.2017 15:00, Konstantin Belousov wrote:
> > On Sat, Jul 22, 2017 at 02:40:59PM +0700, Eugene Grosbein wrote:
> >> Also, I've always wondered what load pattern one should have
> >> to exhibit real kernel stack problems due to KVA fragmentation
> >> and KSTACK_PAGES>2 on i386?
> > In fact each stack consumes 3 contiguous pages because there is also
> > the guard, which catches the double faults.
> >
> > You need to use the machine, e.g. to run something that creates and
> > destroys kernel threads, while doing something that consumes
> > kernel_arena KVA. Plain malloc/zmalloc is enough.
>
> Does an i386 box running PPPoE connection to an ISP (mpd5) plus several
> IPSEC tunnels plus PPTP tunnel plus WiFi access point plus
> "transmission" torrent client with 2TB UFS volume over GEOM_CACHE
> over GEOM_JOURNAL over USB qualify? There are ospfd, racoon,
> sendmail, ssh and several periodic cron jobs too.

I doubt that any tunnels activity causes creation and destruction of threads. Same for hostapd or routing daemons or UFS over really weird geom classes. Sendmail and cron indeed cause process creation, but the overhead of processing of these programs typically prevents a high turnaround of new processes. No idea about your torrent client.

From this description, I would be not even surprised if your machine load fits into the kstacks cache, despite the cache's quite conservative settings. In other words, almost definitely your machine is not representative for the problematic load. Something that creates a lot of short- and middle-lived threads would be.

> > In other words, any non-static load would cause fragmentation
> > preventing allocations of the kernel stacks for new threads.
> >
> >> How can I get the ddb backtrace you asked for? I'm not very familiar
> >> with ddb. I have serial console to such i386 system.
> >
> > bt command for the given thread provides the backtrace. I have no idea
> > how you obtained the numbers that you show.
>
> Not sure which kernel thread I ought to trace... If you just need the
> name of the function:
>
> $ objdump -d vm_object.o | grep -B 8 'sub .*0x...,%esp' | less
>
> 00003b30 <sysctl_vm_object_list>:
>     3b30: 55                    push   %ebp
>     3b31: 89 e5                 mov    %esp,%ebp
>     3b33: 53                    push   %ebx
>     3b34: 57                    push   %edi
>     3b35: 56                    push   %esi
>     3b36: 83 e4 f8              and    $0xfffffff8,%esp
>     3b39: 81 ec 30 05 00 00     sub    $0x530,%esp
>
> It uses stack for the pretty large struct kinfo_vmobject (which includes
> char kvo_path[PATH_MAX]) and several others.

I see. It is enough information to fix your observation for vm_object.o. The patch below reduces the frame size for sysctl_vm_object_list from 1.3K to 200 bytes. This function is only executed by explicit user query.

diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
index 6c6137d5fb2..b92d31c3e60 100644
--- a/sys/vm/vm_object.c
+++ b/sys/vm/vm_object.c
@@ -2275,7 +2315,7 @@ vm_object_vnode(vm_object_t object)
 static int
 sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
 {
-	struct kinfo_vmobject kvo;
+	struct kinfo_vmobject *kvo;
 	char *fullpath, *freepath;
 	struct vnode *vp;
 	struct vattr va;
@@ -2300,6 +2340,7 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
 		    count * 11 / 10));
 	}
 
+	kvo = malloc(sizeof(*kvo), M_TEMP, M_WAITOK);
 	error = 0;
 
 	/*
@@ -2317,13 +2358,13 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
 			continue;
 		}
 		mtx_unlock(&vm_object_list_mtx);
-		kvo.kvo_size = ptoa(obj->size);
-		kvo.kvo_resident = obj->resident_page_count;
-		kvo.kvo_ref_count = obj->ref_count;
-		kvo.kvo_shadow_count = obj->shadow_count;
-		kvo.kvo_memattr = obj->memattr;
-		kvo.kvo_active = 0;
-		kvo.kvo_inactive = 0;
+		kvo->kvo_size = ptoa(obj->size);
+		kvo->kvo_resident = obj->resident_page_count;
+		kvo->kvo_ref_count = obj->ref_count;
+		kvo->kvo_shadow_count = obj->shadow_count;
+		kvo->kvo_memattr = obj->memattr;
+		kvo->kvo_active = 0;
+		kvo->kvo_inactive = 0;
 		TAILQ_FOREACH(m, &obj->memq, listq) {
 			/*
 			 * A page may belong to the object but be
@@ -2335,46 +2376,46 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
 			 * approximation of the system anyway.
 			 */
 			if (vm_page_active(m))
-				kvo.kvo_active++;
+				kvo->kvo_active++;
 			else if (vm_page_inactive(m))
-				kvo.kvo_inactive++;
+
Re: stable/11 r321349 crashing immediately
On 22.07.2017 15:00, Konstantin Belousov wrote:
> On Sat, Jul 22, 2017 at 02:40:59PM +0700, Eugene Grosbein wrote:
>> Also, I've always wondered what load pattern one should have
>> to exhibit real kernel stack problems due to KVA fragmentation
>> and KSTACK_PAGES>2 on i386?
> In fact each stack consumes 3 contiguous pages because there is also
> the guard, which catches the double faults.
>
> You need to use the machine, e.g. to run something that creates and
> destroys kernel threads, while doing something that consumes
> kernel_arena KVA. Plain malloc/zmalloc is enough.

Does an i386 box running a PPPoE connection to an ISP (mpd5) plus several IPSEC tunnels plus a PPTP tunnel plus a WiFi access point plus the "transmission" torrent client with a 2TB UFS volume over GEOM_CACHE over GEOM_JOURNAL over USB qualify? There are ospfd, racoon, sendmail, ssh and several periodic cron jobs too.

> In other words, any non-static load would cause fragmentation preventing
> allocations of the kernel stacks for new threads.
>
>> How can I get the ddb backtrace you asked for? I'm not very familiar
>> with ddb. I have serial console to such i386 system.
>
> bt command for the given thread provides the backtrace. I have no idea
> how you obtained the numbers that you show.

Not sure which kernel thread I ought to trace... If you just need the name of the function:

$ objdump -d vm_object.o | grep -B 8 'sub .*0x...,%esp' | less

00003b30 <sysctl_vm_object_list>:
    3b30: 55                    push   %ebp
    3b31: 89 e5                 mov    %esp,%ebp
    3b33: 53                    push   %ebx
    3b34: 57                    push   %edi
    3b35: 56                    push   %esi
    3b36: 83 e4 f8              and    $0xfffffff8,%esp
    3b39: 81 ec 30 05 00 00     sub    $0x530,%esp

It uses stack for the pretty large struct kinfo_vmobject (which includes char kvo_path[PATH_MAX]) and several others.
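The manual grep above can be automated. Here is a small Python sketch of the same idea, scanning `objdump -d` output for prologues that reserve a large frame with `sub $0xNNN,%esp` (or `%rsp` on amd64); the regexes assume the standard objdump listing format, and the embedded sample reuses the listing shown above:

```python
import re

# Match symbol headers like "00003b30 <sysctl_vm_object_list>:" and
# prologue instructions like "sub $0x530,%esp".
SYM_RE = re.compile(r'^[0-9a-f]+ <(?P<name>[^>]+)>:')
SUB_RE = re.compile(r'\bsub\s+\$(?P<size>0x[0-9a-f]+),%[er]sp')

def large_frames(disasm, threshold=0x200):
    """Return (function, frame_size) pairs whose prologue reserves
    at least `threshold` bytes of stack."""
    current, found = None, []
    for line in disasm.splitlines():
        m = SYM_RE.match(line)
        if m:
            current = m.group('name')
            continue
        m = SUB_RE.search(line)
        if m and current:
            size = int(m.group('size'), 16)
            if size >= threshold:
                found.append((current, size))
    return found

sample = """\
00003b30 <sysctl_vm_object_list>:
    3b30: 55                    push   %ebp
    3b36: 83 e4 f8              and    $0xfffffff8,%esp
    3b39: 81 ec 30 05 00 00     sub    $0x530,%esp
"""
print(large_frames(sample))   # -> [('sysctl_vm_object_list', 1328)]
```

In practice one would feed it `objdump -d vm_object.o` for every object file in the kernel build tree and sort by frame size.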
Re: stable/11 r321349 crashing immediately
On Sat, Jul 22, 2017 at 02:40:59PM +0700, Eugene Grosbein wrote:
> Also, I've always wondered what load pattern one should have
> to exhibit real kernel stack problems due to KVA fragmentation
> and KSTACK_PAGES>2 on i386?

In fact each stack consumes 3 contiguous pages because there is also the guard, which catches the double faults.

You need to use the machine, e.g. to run something that creates and destroys kernel threads, while doing something that consumes kernel_arena KVA. Plain malloc/zmalloc is enough. In other words, any non-static load would cause fragmentation preventing allocations of the kernel stacks for new threads.

> How can I get the ddb backtrace you asked for? I'm not very familiar
> with ddb. I have serial console to such i386 system.

bt command for the given thread provides the backtrace. I have no idea how you obtained the numbers that you show.
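The guard-page point above can be put in numbers. A trivial sketch, taking the one-guard-page figure from the reply (each stack needs kstack_pages + 1 *contiguous* pages of KVA, since the guard page is what turns an overflow into a catchable double fault):

```python
PAGE_SIZE = 4096  # bytes per page

def kstack_kva(kstack_pages, guard_pages=1):
    # Contiguous KVA reserved per kernel thread: the stack itself
    # plus the unmapped guard page below it.
    return (kstack_pages + guard_pages) * PAGE_SIZE

for pages in (2, 4):  # i386 default vs. the bumped value discussed here
    print(f"KSTACK_PAGES={pages}: {kstack_kva(pages)} bytes of contiguous KVA per thread")
```

The larger the contiguous run required, the harder it is to satisfy once the kernel arena is fragmented, which is why bumping KSTACK_PAGES is risky on KVA-starved i386.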
Re: stable/11 r321349 crashing immediately
22.07.2017 14:05, Konstantin Belousov wrote:
> On Sat, Jul 22, 2017 at 12:51:01PM +0700, Eugene Grosbein wrote:
>> Also, there is https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219476
>
> I strongly disagree with the idea of increasing the default kernel
> stack size, it will cause systematic problems for all users instead of
> current state where some workloads are problematic. Finding contig
> KVA ranges for larger stacks on KVA-starved architectures is not going
> to work.

My practice shows that increasing the default kernel stack size for an i386 system using IPSEC and ZFS with compression, with KVA_PAGES=512/KSTACK_PAGES=4, does work. No stack-related problems have been observed with such parameters. On the contrary, problems quickly arise if one does not increase the default kernel stack size for such an i386 system. I have used several such systems for years.

We have src/UPDATING entries 20121223 and 20150728 stating the same. Those are linked from the Errata Notes of every release since 10.2 as open issues. How many releases are we going to keep this "open"?

Also, I've always wondered what load pattern one should have to exhibit real kernel stack problems due to KVA fragmentation and KSTACK_PAGES>2 on i386?

> The real solution is to move allocations from stack to heap, one by one.

That was not done since 10.2-RELEASE and I see that this is only getting worse.

> You claimed that vm/vm_object.o consumes 1.5K of stack, can you show
> the ddb backtrace of this situation ?

These data were collected with machine object code inspection and only some of the numbers were verified by hand. I admit there may be some false positives.

How can I get the ddb backtrace you asked for? I'm not very familiar with ddb. I have serial console to such i386 system.
Re: stable/11 r321349 crashing immediately
On Sat, Jul 22, 2017 at 12:51:01PM +0700, Eugene Grosbein wrote:
> Also, there is https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219476

I strongly disagree with the idea of increasing the default kernel stack size; it will cause systematic problems for all users instead of the current state where some workloads are problematic. Finding contig KVA ranges for larger stacks on KVA-starved architectures is not going to work.

The real solution is to move allocations from stack to heap, one by one.

You claimed that vm/vm_object.o consumes 1.5K of stack; can you show the ddb backtrace of this situation?
Re: stable/11 r321349 crashing immediately
On Fri, Jul 21, 2017 at 10:42:42PM -0700, Don Lewis wrote:
> Your best bet for a quick workaround for the stack overflow would be to
> rebuild the kernel with a larger value of KSTACK_PAGES. You can find
> the default in /usr/src/sys//conf/NOTES.

Or set the tunable kern.kstack_pages to the desired number of pages from the loader prompt.