Re: stable/11 r321349 crashing immediately

2017-07-22 Thread Don Lewis
On 22 Jul, To: pz-freebsd-sta...@ziemba.us wrote:
> On 22 Jul, G. Paul Ziemba wrote:
>> My previous table had an error in the cumulative size column
>> (keyboard made "220" into "20" when I was plugging into the hex
>> calculator), so the stack is 0x200 bigger than I originally thought:
>> 
>> Frame   Stack Pointer   sz  cumu function
>> -   -   ---  
>>  44 0xfe085cfa8a10   amd64_syscall
>>  43 0xfe085cfa88b0  160  160 syscallenter
>>  42 0xfe085cfa87f0  220  380 sys_execve
>>  41 0xfe085cfa87c0   30  3B0 kern_execve
>>  40 0xfe085cfa8090  730  AE0 do_execve
>>  39 0xfe085cfa7ec0  1D0  CB0 namei
>>  38 0xfe085cfa7d40  180  E30 lookup
>>  37 0xfe085cfa7cf0   50  E80 VOP_LOOKUP
>>  36 0xfe085cfa7c80   70  EF0 VOP_LOOKUP_APV
>>  35 0xfe085cfa7650  630 1520 nfs_lookup
>>  34 0xfe085cfa75f0   60 1580 VOP_ACCESS
>>  33 0xfe085cfa7580   70 15F0 VOP_ACCESS_APV
>>  32 0xfe085cfa7410  170 1760 nfs_access
>>  31 0xfe085cfa7240  1D0 1930 nfs34_access_otw
>>  30 0xfe085cfa7060  1E0 1B10 nfsrpc_accessrpc
>>  29 0xfe085cfa6fb0   B0 1BC0 nfscl_request
>>  28 0xfe085cfa6b20  490 2050 newnfs_request
>>  27 0xfe085cfa6980  1A0 21F0 clnt_reconnect_call
>>  26 0xfe085cfa6520  460 2650 clnt_vc_call
>>  25 0xfe085cfa64c0   60 26B0 sosend
>>  24 0xfe085cfa6280  240 28F0 sosend_generic
>>  23 0xfe085cfa6110  170 2A60 tcp_usr_send
>>  22 0xfe085cfa5ca0  470 2ED0 tcp_output
>>  21 0xfe085cfa5900  3A0 3270 ip_output
>>  20 0xfe085cfa5880   80 32F0 looutput
>>  19 0xfe085cfa5800   80 3370 if_simloop
>>  18 0xfe085cfa57d0   30 33A0 netisr_queue
>>  17 0xfe085cfa5780   50 33F0 netisr_queue_src
>>  16 0xfe085cfa56f0   90 3480 netisr_queue_internal
>>  15 0xfe085cfa56a0   50 34D0 swi_sched
>>  14 0xfe085cfa5620   80 3550 intr_event_schedule_thread
>>  13 0xfe085cfa55b0   70 35C0 sched_add
>>  12 0xfe085cfa5490  120 36E0 sched_pickcpu
>>  11 0xfe085cfa5420   70 3750 sched_lowest
>>  10 0xfe085cfa52a0  180 38D0 cpu_search_lowest
>>   9 0xfe085cfa52a0    0 38D0 cpu_search
>>   8 0xfe085cfa5120  180 3A50 cpu_search_lowest
>>   7 0xfe085cfa5120    0 3A50 cpu_search
>>   6 0xfe085cfa4fa0  180 3BD0 cpu_search_lowest
>>   5 0xfe0839778f40  signal handler
> 
> The stack is aligned to a 4096 (0x1000) boundary.  The first access to a
> local variable below 0xfe085cfa5000 is what triggered the trap.  The
> other end of the stack must be at 0xfe085cfa9000 less a bit. I don't
> know why the first stack pointer value in the trace is
> 0xfe085cfa8a10. That would seem to indicate that amd64_syscall is
> using ~1500 bytes of stack space.

Actually there could be quite a bit of CPU context that gets saved. That
could be sizeable on amd64.
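
For illustration, a minimal sketch of that arithmetic (assuming the default
kstack_pages=4 and PAGE_SIZE=4096, and taking the addresses exactly as they
are printed in the trace):

/*
 * Sketch only: derive the stack bounds and the space used before
 * syscallenter.  kstack_pages=4 is the amd64 default mentioned later
 * in the thread.
 */
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    uint64_t trap_page  = 0xfe085cfa5000ULL;      /* lowest stack page; the trap fired just below it */
    uint64_t stack_size = 4 * 4096;               /* kstack_pages * PAGE_SIZE = 0x4000 */
    uint64_t stack_top  = trap_page + stack_size; /* 0xfe085cfa9000, "less a bit" */
    uint64_t first_sp   = 0xfe085cfa8a10ULL;      /* frame 44, amd64_syscall */

    /* prints 0x5f0 = 1520 bytes, i.e. the ~1500 bytes attributed to amd64_syscall */
    printf("used above frame 44: 0x%jx\n", (uintmax_t)(stack_top - first_sp));
    return (0);
}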



Re: stable/11 r321349 crashing immediately

2017-07-22 Thread Don Lewis
On 22 Jul, G. Paul Ziemba wrote:
> My previous table had an error in the cumulative size column
> (keyboard made "220" into "20" when I was plugging into the hex
> calculator), so the stack is 0x200 bigger than I originally thought:
> 
> Frame   Stack Pointer   sz  cumu function
> -   -   ---  
>  44 0xfe085cfa8a10   amd64_syscall
>  43 0xfe085cfa88b0  160  160 syscallenter
>  42 0xfe085cfa87f0  220  380 sys_execve
>  41 0xfe085cfa87c0   30  3B0 kern_execve
>  40 0xfe085cfa8090  730  AE0 do_execve
>  39 0xfe085cfa7ec0  1D0  CB0 namei
>  38 0xfe085cfa7d40  180  E30 lookup
>  37 0xfe085cfa7cf0   50  E80 VOP_LOOKUP
>  36 0xfe085cfa7c80   70  EF0 VOP_LOOKUP_APV
>  35 0xfe085cfa7650  630 1520 nfs_lookup
>  34 0xfe085cfa75f0   60 1580 VOP_ACCESS
>  33 0xfe085cfa7580   70 15F0 VOP_ACCESS_APV
>  32 0xfe085cfa7410  170 1760 nfs_access
>  31 0xfe085cfa7240  1D0 1930 nfs34_access_otw
>  30 0xfe085cfa7060  1E0 1B10 nfsrpc_accessrpc
>  29 0xfe085cfa6fb0   B0 1BC0 nfscl_request
>  28 0xfe085cfa6b20  490 2050 newnfs_request
>  27 0xfe085cfa6980  1A0 21F0 clnt_reconnect_call
>  26 0xfe085cfa6520  460 2650 clnt_vc_call
>  25 0xfe085cfa64c0   60 26B0 sosend
>  24 0xfe085cfa6280  240 28F0 sosend_generic
>  23 0xfe085cfa6110  170 2A60 tcp_usr_send
>  22 0xfe085cfa5ca0  470 2ED0 tcp_output
>  21 0xfe085cfa5900  3A0 3270 ip_output
>  20 0xfe085cfa5880   80 32F0 looutput
>  19 0xfe085cfa5800   80 3370 if_simloop
>  18 0xfe085cfa57d0   30 33A0 netisr_queue
>  17 0xfe085cfa5780   50 33F0 netisr_queue_src
>  16 0xfe085cfa56f0   90 3480 netisr_queue_internal
>  15 0xfe085cfa56a0   50 34D0 swi_sched
>  14 0xfe085cfa5620   80 3550 intr_event_schedule_thread
>  13 0xfe085cfa55b0   70 35C0 sched_add
>  12 0xfe085cfa5490  120 36E0 sched_pickcpu
>  11 0xfe085cfa5420   70 3750 sched_lowest
>  10 0xfe085cfa52a0  180 38D0 cpu_search_lowest
>   9 0xfe085cfa52a0    0 38D0 cpu_search
>   8 0xfe085cfa5120  180 3A50 cpu_search_lowest
>   7 0xfe085cfa5120    0 3A50 cpu_search
>   6 0xfe085cfa4fa0  180 3BD0 cpu_search_lowest
>   5 0xfe0839778f40  signal handler

The stack is aligned to a 4096 (0x1000) boundary.  The first access to a
local variable below 0xfe085cfa5000 is what triggered the trap.  The
other end of the stack must be at 0xfe085cfa9000 less a bit. I don't
know why the first stack pointer value in the trace is
0xfe085cfa8a10. That would seem to indicate that amd64_syscall is
using ~1500 bytes of stack space.



Re: stable/11 r321349 crashing immediately

2017-07-22 Thread G. Paul Ziemba
My previous table had an error in the cumulative size column
(keyboard made "220" into "20" when I was plugging into the hex
calculator), so the stack is 0x200 bigger than I originally thought:

Frame   Stack Pointer   sz  cumu function
-   -   ---  
 44 0xfe085cfa8a10   amd64_syscall
 43 0xfe085cfa88b0  160  160 syscallenter
 42 0xfe085cfa87f0  220  380 sys_execve
 41 0xfe085cfa87c0   30  3B0 kern_execve
 40 0xfe085cfa8090  730  AE0 do_execve
 39 0xfe085cfa7ec0  1D0  CB0 namei
 38 0xfe085cfa7d40  180  E30 lookup
 37 0xfe085cfa7cf0   50  E80 VOP_LOOKUP
 36 0xfe085cfa7c80   70  EF0 VOP_LOOKUP_APV
 35 0xfe085cfa7650  630 1520 nfs_lookup
 34 0xfe085cfa75f0   60 1580 VOP_ACCESS
 33 0xfe085cfa7580   70 15F0 VOP_ACCESS_APV
 32 0xfe085cfa7410  170 1760 nfs_access
 31 0xfe085cfa7240  1D0 1930 nfs34_access_otw
 30 0xfe085cfa7060  1E0 1B10 nfsrpc_accessrpc
 29 0xfe085cfa6fb0   B0 1BC0 nfscl_request
 28 0xfe085cfa6b20  490 2050 newnfs_request
 27 0xfe085cfa6980  1A0 21F0 clnt_reconnect_call
 26 0xfe085cfa6520  460 2650 clnt_vc_call
 25 0xfe085cfa64c0   60 26B0 sosend
 24 0xfe085cfa6280  240 28F0 sosend_generic
 23 0xfe085cfa6110  170 2A60 tcp_usr_send
 22 0xfe085cfa5ca0  470 2ED0 tcp_output
 21 0xfe085cfa5900  3A0 3270 ip_output
 20 0xfe085cfa5880   80 32F0 looutput
 19 0xfe085cfa5800   80 3370 if_simloop
 18 0xfe085cfa57d0   30 33A0 netisr_queue
 17 0xfe085cfa5780   50 33F0 netisr_queue_src
 16 0xfe085cfa56f0   90 3480 netisr_queue_internal
 15 0xfe085cfa56a0   50 34D0 swi_sched
 14 0xfe085cfa5620   80 3550 intr_event_schedule_thread
 13 0xfe085cfa55b0   70 35C0 sched_add
 12 0xfe085cfa5490  120 36E0 sched_pickcpu
 11 0xfe085cfa5420   70 3750 sched_lowest
 10 0xfe085cfa52a0  180 38D0 cpu_search_lowest
  9 0xfe085cfa52a0    0 38D0 cpu_search
  8 0xfe085cfa5120  180 3A50 cpu_search_lowest
  7 0xfe085cfa5120    0 3A50 cpu_search
  6 0xfe085cfa4fa0  180 3BD0 cpu_search_lowest
  5 0xfe0839778f40  signal handler


-- 
G. Paul Ziemba
FreeBSD unix:
 5:46PM  up  6:38, 8 users, load averages: 3.31, 3.79, 2.25


Re: stable/11 r321349 crashing immediately

2017-07-22 Thread G. Paul Ziemba
On Sat, Jul 22, 2017 at 01:12:29PM -0700, Don Lewis wrote:
> On 21 Jul, G. Paul Ziemba wrote:

> >>Your best bet for a quick workaround for the stack overflow would be to
> >>rebuild the kernel with a larger value of KSTACK_PAGES.  You can find
> >>the default in /usr/src/sys//conf/NOTES.

I bumped it from the default 4 to 5 in /boot/loader.conf:

kern.kstack_pages=5

and that prevented this crash. Uptime 5.5 hours at this point (instead of
1.5 minutes).

So what's the down-side of increasing kstack_pages? What if I made it 10?
I see comments elsewhere about reducing space for user-mode threads but I'm
not sure what that means in practical terms, or if there is some other
overarching tuning parameter that should also be increased.
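
As a rough, back-of-the-envelope sketch of the cost side (my numbers, not from
the thread): every kernel thread carries its own kernel stack of
kern.kstack_pages pages, normally wired, so raising the value is paid once per
thread.  The thread count below is just an assumption for illustration:

#include <stdio.h>

int
main(void)
{
    int page = 4096, nthreads = 1000;   /* assumed number of kernel threads */
    int oldp = 4, newp = 10;            /* kstack_pages before and after */

    /* 1000 threads * 6 extra pages * 4 KiB = 24000 KiB, roughly 23 MiB */
    printf("extra wired memory: %d KiB\n",
        nthreads * (newp - oldp) * page / 1024);
    return (0);
}

On amd64 the kernel address space is large, so the visible cost is mainly the
extra wired memory per thread; the KVA-fragmentation concerns discussed
elsewhere in the thread are mostly an i386 issue.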

> Page size is 4096.

Ah, I forgot to count the 2^0 bit.

> It's interesting that you are running into this on amd64.  Usually i386
> is the problem child.

Maybe stack frames are bigger due to 64-bit variables? (And of course
we get paid mostly for adding code, not so much for removing it)

> >>It would probably be a good idea to compute the differences in the stack
> >>pointer values between adjacent stack frames to see of any of them are
> >>consuming an excessive amount of stack space.

For our collective amusement, I noted the stack pointer for each frame and
calculated the frame size and cumulative stack consumption (a small sketch of
the calculation follows the table). If there is some other stack overhead not
shown in the trace, I can see it going over 0x4000:

Frame   Stack Pointer   sz  cumu function
-   -   ---  
 44 0xfe085cfa8a10   amd64_syscall
 43 0xfe085cfa88b0  160  160 syscallenter
 42 0xfe085cfa87f0  220  180 sys_execve
 41 0xfe085cfa87c0   30  1B0 kern_execve
 40 0xfe085cfa8090  730  8E0 do_execve
 39 0xfe085cfa7ec0  1D0  AB0 namei
 38 0xfe085cfa7d40  180  C30 lookup
 37 0xfe085cfa7cf0   50  C80 VOP_LOOKUP
>  36 0xfe085cfa7c80   70  CF0 VOP_LOOKUP_APV
 35 0xfe085cfa7650  630 1320 nfs_lookup
 34 0xfe085cfa75f0   60 1380 VOP_ACCESS
 33 0xfe085cfa7580   70 13F0 VOP_ACCESS_APV
 32 0xfe085cfa7410  170 1560 nfs_access
 31 0xfe085cfa7240  1D0 1730 nfs34_access_otw
 30 0xfe085cfa7060  1E0 1910 nfsrpc_accessrpc
 29 0xfe085cfa6fb0   B0 19C0 nfscl_request
 28 0xfe085cfa6b20  490 1E50 newnfs_request
 27 0xfe085cfa6980  1A0 1FF0 clnt_reconnect_call
 26 0xfe085cfa6520  460 2450 clnt_vc_call
 25 0xfe085cfa64c0   60 24B0 sosend
 24 0xfe085cfa6280  240 26F0 sosend_generic
 23 0xfe085cfa6110  170 2860 tcp_usr_send
 22 0xfe085cfa5ca0  470 2CD0 tcp_output
 21 0xfe085cfa5900  3A0 3070 ip_output
 20 0xfe085cfa5880   80 30F0 looutput
 19 0xfe085cfa5800   80 3170 if_simloop
 18 0xfe085cfa57d0   30 31A0 netisr_queue
 17 0xfe085cfa5780   50 31F0 netisr_queue_src
 16 0xfe085cfa56f0   90 3280 netisr_queue_internal
 15 0xfe085cfa56a0   50 32D0 swi_sched
 14 0xfe085cfa5620   80 3350 intr_event_schedule_thread
 13 0xfe085cfa55b0   70 33C0 sched_add
 12 0xfe085cfa5490  120 34E0 sched_pickcpu
 11 0xfe085cfa5420   70 3550 sched_lowest
 10 0xfe085cfa52a0  180 36D0 cpu_search_lowest
>   9 0xfe085cfa52a0    0 36D0 cpu_search
  8 0xfe085cfa5120  180 3850 cpu_search_lowest
>   7 0xfe085cfa5120    0 3850 cpu_search
  6 0xfe085cfa4fa0  180 39D0 cpu_search_lowest
  5 0xfe0839778f40  signal handler
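
A small sketch of how the sz column can be recomputed from the saved stack
pointers, under the assumption that each entry is simply the difference
between adjacent frames' stack pointers (the stack grows down on amd64).
It reproduces e.g. the namei, lookup and VOP_LOOKUP rows:

#include <stdint.h>
#include <stdio.h>

struct frame { uint64_t sp; const char *fn; };

int
main(void)
{
    /* frames 40..37 from the table; addresses taken as printed in the trace */
    struct frame f[] = {
        { 0xfe085cfa8090ULL, "do_execve"  },
        { 0xfe085cfa7ec0ULL, "namei"      },
        { 0xfe085cfa7d40ULL, "lookup"     },
        { 0xfe085cfa7cf0ULL, "VOP_LOOKUP" },
    };

    /* frame i's size = caller's saved stack pointer minus frame i's */
    for (size_t i = 1; i < sizeof(f) / sizeof(f[0]); i++)
        printf("%-12s sz 0x%jx\n", f[i].fn,
            (uintmax_t)(f[i - 1].sp - f[i].sp));
    return (0);
}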

-- 
G. Paul Ziemba
FreeBSD unix:
 4:36PM  up  5:28, 8 users, load averages: 6.53, 7.79, 7.94


Re: stable/11 r321349 crashing immediately

2017-07-22 Thread Don Lewis
On 22 Jul, David Wolfskill wrote:
> On Fri, Jul 21, 2017 at 04:53:18AM +, G. Paul Ziemba wrote:
>> ...
>> >It looks like you are trying to execute a program from an NFS file
>> >system that is exported by the same host.  This isn't exactly optimal
>> >...
>> 
>> Perhaps not optimal for the implementation, but I think it's a
>> common NFS scenario: define a set of NFS-provided paths for files
>> and use those path names on all hosts, regardless of whether they
>> happen to be serving the files in question or merely clients.
> 
> Back when I was doing sysadmin stuff for a group of engineers, my
> usual approach for that sort of thing was to use amd (this was late
> 1990s - 2001) to have maps so it would set up NFS mounts if the
> file system being served was from a different host (from the one
> running amd), but instantiating a symlink instead if the file system
> resided on the current host.

Same here.

It's a bit messy to do this manually, but you could either use a symlink
or a nullfs mount for the filesystems that are local.



Re: stable/11 r321349 crashing immediately

2017-07-22 Thread Don Lewis
On 21 Jul, G. Paul Ziemba wrote:
> truck...@freebsd.org (Don Lewis) writes:
> 
>>On 21 Jul, G. Paul Ziemba wrote:
>>> GENERIC kernel r321349 results in the following about a minute after
>>> multiuser boot completes.
>>> 
>>> What additional information should I provide to assist in debugging?
>>> 
>>> Many thanks!
>>> 
>>> [Extracted from /var/crash/core.txt.NNN]
>>> 
>>> KDB: stack backtrace:
>>> #0 0x810f6ed7 at kdb_backtrace+0xa7
>>> #1 0x810872a9 at vpanic+0x249
>>> #2 0x81087060 at vpanic+0
>>> #3 0x817d9aca at dblfault_handler+0x10a
>>> #4 0x817ae93c at Xdblfault+0xac
>>> #5 0x810cf76e at cpu_search_lowest+0x35e
>>> #6 0x810cf76e at cpu_search_lowest+0x35e
>>> #7 0x810d5b36 at sched_lowest+0x66
>>> #8 0x810d1d92 at sched_pickcpu+0x522
>>> #9 0x810d2b03 at sched_add+0xd3
>>> #10 0x8101df5c at intr_event_schedule_thread+0x18c
>>> #11 0x8101ddb0 at swi_sched+0xa0
>>> #12 0x81261643 at netisr_queue_internal+0x1d3
>>> #13 0x81261212 at netisr_queue_src+0x92
>>> #14 0x81261677 at netisr_queue+0x27
>>> #15 0x8123da5a at if_simloop+0x20a
>>> #16 0x8123d83b at looutput+0x22b
>>> #17 0x8131c4c6 at ip_output+0x1aa6
>>> 
>>> doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
>>> 298 dumptid = curthread->td_tid;
>>> (kgdb) #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
>>> #1  0x810867e8 in kern_reboot (howto=260)
>>> at /usr/src/sys/kern/kern_shutdown.c:366
>>> #2  0x810872ff in vpanic (fmt=0x81e5f7e0 "double fault", 
>>> ap=0xfe0839778ec0) at /usr/src/sys/kern/kern_shutdown.c:759
>>> #3  0x81087060 in panic (fmt=0x81e5f7e0 "double fault")
>>> at /usr/src/sys/kern/kern_shutdown.c:690
>>> #4  0x817d9aca in dblfault_handler (frame=0xfe0839778f40)
>>> at /usr/src/sys/amd64/amd64/trap.c:828
>>> #5  <signal handler called>
>>> #6  0x810cf422 in cpu_search_lowest (
>>> cg=0x826ccd98 , 
>>> low=<... 0xfe085cfa4ff8>) at /usr/src/sys/kern/sched_ule.c:782
>>> #7  0x810cf76e in cpu_search (cg=0x826cccb8 , 
>>> low=0xfe085cfa53b8, high=0x0, match=1)
>>> at /usr/src/sys/kern/sched_ule.c:710
>>> #8  cpu_search_lowest (cg=0x826cccb8 , 
>>> low=0xfe085cfa53b8) at /usr/src/sys/kern/sched_ule.c:783
>>> #9  0x810cf76e in cpu_search (cg=0x826ccc80 , 
>>> low=0xfe085cfa5430, high=0x0, match=1)
>>> at /usr/src/sys/kern/sched_ule.c:710
>>> #10 cpu_search_lowest (cg=0x826ccc80 , 
>>> low=0xfe085cfa5430)
>>> at /usr/src/sys/kern/sched_ule.c:783
>>> #11 0x810d5b36 in sched_lowest (cg=0x826ccc80 , 
>>> mask=..., pri=28, maxload=2147483647, prefer=4)
>>> at /usr/src/sys/kern/sched_ule.c:815
>>> #12 0x810d1d92 in sched_pickcpu (td=0xf8000a3a9000, flags=4)
>>> at /usr/src/sys/kern/sched_ule.c:1292
>>> #13 0x810d2b03 in sched_add (td=0xf8000a3a9000, flags=4)
>>> at /usr/src/sys/kern/sched_ule.c:2447
>>> #14 0x8101df5c in intr_event_schedule_thread (ie=0xf80007e7ae00)
>>> at /usr/src/sys/kern/kern_intr.c:917
>>> #15 0x8101ddb0 in swi_sched (cookie=0xf8000a386880, flags=0)
>>> at /usr/src/sys/kern/kern_intr.c:1163
>>> #16 0x81261643 in netisr_queue_internal (proto=1, 
>>> m=0xf80026d00500, cpuid=0) at /usr/src/sys/net/netisr.c:1022
>>> #17 0x81261212 in netisr_queue_src (proto=1, source=0, 
>>> m=0xf80026d00500) at /usr/src/sys/net/netisr.c:1056
>>> #18 0x81261677 in netisr_queue (proto=1, m=0xf80026d00500)
>>> at /usr/src/sys/net/netisr.c:1069
>>> #19 0x8123da5a in if_simloop (ifp=0xf800116eb000, 
>>> m=0xf80026d00500, af=2, hlen=0) at /usr/src/sys/net/if_loop.c:358
>>> #20 0x8123d83b in looutput (ifp=0xf800116eb000, 
>>> m=0xf80026d00500, dst=0xf80026ed6550, ro=0xf80026ed6530)
>>> at /usr/src/sys/net/if_loop.c:265
>>> #21 0x8131c4c6 in ip_output (m=0xf80026d00500, opt=0x0, 
>>> ro=0xf80026ed6530, flags=0, imo=0x0, inp=0xf80026ed63a0)
>>> at /usr/src/sys/netinet/ip_output.c:655
>>> #22 0x8142e1c7 in tcp_output (tp=0xf80026eb2820)
>>> at /usr/src/sys/netinet/tcp_output.c:1447
>>> #23 0x81447700 in tcp_usr_send (so=0xf80011ec2360, flags=0, 
>>> m=0xf80026d14d00, nam=0x0, control=0x0, td=0xf80063ba1000)
>>> at /usr/src/sys/netinet/tcp_usrreq.c:967
>>> #24 0x811776f1 in sosend_generic (so=0xf80011ec2360, addr=0x0, 
>>> uio=0x0, top=0xf80026d14d00, control=0x0, flags=0, 
>>> td=0xf80063ba1000) at /usr/src/sys/kern/uipc_socket.c:1360
>>> #25 0x811779bd in sosend (so=0xf80011ec2360, addr=0x0, uio=0x0, 
>>> top=0xf80026d14d00, control=0x0, flags=0, 

Re: cannot destroy faulty zvol

2017-07-22 Thread Eugene M. Zheganin

Hi,

On 22.07.2017 17:08, Eugene M. Zheganin wrote:


Is this weird error "cannot destroy: already exists" related to the
fact that the zvol is faulty? Does it indicate that the metadata is
probably faulty too? Anyway, is there a way to destroy this dataset?
Follow-up: I sent a similar zvol of exactly the same size into the
faulty one; the zpool errors are gone, but I still cannot destroy the zvol.
Is this a zfs bug?


Eugene.


Re: stable/11 r321349 crashing immediately

2017-07-22 Thread David Wolfskill
On Fri, Jul 21, 2017 at 04:53:18AM +, G. Paul Ziemba wrote:
> ...
> >It looks like you are trying to execute a program from an NFS file
> >system that is exported by the same host.  This isn't exactly optimal
> >...
> 
> Perhaps not optimal for the implementation, but I think it's a
> common NFS scenario: define a set of NFS-provided paths for files
> and use those path names on all hosts, regardless of whether they
> happen to be serving the files in question or merely clients.

Back when I was doing sysadmin stuff for a group of engineers, my
usual approach for that sort of thing was to use amd (this was late
1990s - 2001) to have maps so it would set up NFS mounts if the
file system being served was from a different host (from the one
running amd), but instantiating a symlink instead if the file system
resided on the current host.

IIRC, this was a fairly common practice with amd (and the like).

> 

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
What kind of "investigation" would it be if it didn't "follow the money?"

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: stable/11 r321349 crashing immediately

2017-07-22 Thread G. Paul Ziemba
truck...@freebsd.org (Don Lewis) writes:

>On 21 Jul, G. Paul Ziemba wrote:
>> GENERIC kernel r321349 results in the following about a minute after
>> multiuser boot completes.
>> 
>> What additional information should I provide to assist in debugging?
>> 
>> Many thanks!
>> 
>> [Extracted from /var/crash/core.txt.NNN]
>> 
>> KDB: stack backtrace:
>> #0 0x810f6ed7 at kdb_backtrace+0xa7
>> #1 0x810872a9 at vpanic+0x249
>> #2 0x81087060 at vpanic+0
>> #3 0x817d9aca at dblfault_handler+0x10a
>> #4 0x817ae93c at Xdblfault+0xac
>> #5 0x810cf76e at cpu_search_lowest+0x35e
>> #6 0x810cf76e at cpu_search_lowest+0x35e
>> #7 0x810d5b36 at sched_lowest+0x66
>> #8 0x810d1d92 at sched_pickcpu+0x522
>> #9 0x810d2b03 at sched_add+0xd3
>> #10 0x8101df5c at intr_event_schedule_thread+0x18c
>> #11 0x8101ddb0 at swi_sched+0xa0
>> #12 0x81261643 at netisr_queue_internal+0x1d3
>> #13 0x81261212 at netisr_queue_src+0x92
>> #14 0x81261677 at netisr_queue+0x27
>> #15 0x8123da5a at if_simloop+0x20a
>> #16 0x8123d83b at looutput+0x22b
>> #17 0x8131c4c6 at ip_output+0x1aa6
>> 
>> doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
>> 298 dumptid = curthread->td_tid;
>> (kgdb) #0  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
>> #1  0x810867e8 in kern_reboot (howto=260)
>> at /usr/src/sys/kern/kern_shutdown.c:366
>> #2  0x810872ff in vpanic (fmt=0x81e5f7e0 "double fault", 
>> ap=0xfe0839778ec0) at /usr/src/sys/kern/kern_shutdown.c:759
>> #3  0x81087060 in panic (fmt=0x81e5f7e0 "double fault")
>> at /usr/src/sys/kern/kern_shutdown.c:690
>> #4  0x817d9aca in dblfault_handler (frame=0xfe0839778f40)
>> at /usr/src/sys/amd64/amd64/trap.c:828
>> #5  <signal handler called>
>> #6  0x810cf422 in cpu_search_lowest (
>> cg=0x826ccd98 , 
>> low=<... 0xfe085cfa4ff8>) at /usr/src/sys/kern/sched_ule.c:782
>> #7  0x810cf76e in cpu_search (cg=0x826cccb8 , 
>> low=0xfe085cfa53b8, high=0x0, match=1)
>> at /usr/src/sys/kern/sched_ule.c:710
>> #8  cpu_search_lowest (cg=0x826cccb8 , 
>> low=0xfe085cfa53b8) at /usr/src/sys/kern/sched_ule.c:783
>> #9  0x810cf76e in cpu_search (cg=0x826ccc80 , 
>> low=0xfe085cfa5430, high=0x0, match=1)
>> at /usr/src/sys/kern/sched_ule.c:710
>> #10 cpu_search_lowest (cg=0x826ccc80 , low=0xfe085cfa5430)
>> at /usr/src/sys/kern/sched_ule.c:783
>> #11 0x810d5b36 in sched_lowest (cg=0x826ccc80 , 
>> mask=..., pri=28, maxload=2147483647, prefer=4)
>> at /usr/src/sys/kern/sched_ule.c:815
>> #12 0x810d1d92 in sched_pickcpu (td=0xf8000a3a9000, flags=4)
>> at /usr/src/sys/kern/sched_ule.c:1292
>> #13 0x810d2b03 in sched_add (td=0xf8000a3a9000, flags=4)
>> at /usr/src/sys/kern/sched_ule.c:2447
>> #14 0x8101df5c in intr_event_schedule_thread (ie=0xf80007e7ae00)
>> at /usr/src/sys/kern/kern_intr.c:917
>> #15 0x8101ddb0 in swi_sched (cookie=0xf8000a386880, flags=0)
>> at /usr/src/sys/kern/kern_intr.c:1163
>> #16 0x81261643 in netisr_queue_internal (proto=1, 
>> m=0xf80026d00500, cpuid=0) at /usr/src/sys/net/netisr.c:1022
>> #17 0x81261212 in netisr_queue_src (proto=1, source=0, 
>> m=0xf80026d00500) at /usr/src/sys/net/netisr.c:1056
>> #18 0x81261677 in netisr_queue (proto=1, m=0xf80026d00500)
>> at /usr/src/sys/net/netisr.c:1069
>> #19 0x8123da5a in if_simloop (ifp=0xf800116eb000, 
>> m=0xf80026d00500, af=2, hlen=0) at /usr/src/sys/net/if_loop.c:358
>> #20 0x8123d83b in looutput (ifp=0xf800116eb000, 
>> m=0xf80026d00500, dst=0xf80026ed6550, ro=0xf80026ed6530)
>> at /usr/src/sys/net/if_loop.c:265
>> #21 0x8131c4c6 in ip_output (m=0xf80026d00500, opt=0x0, 
>> ro=0xf80026ed6530, flags=0, imo=0x0, inp=0xf80026ed63a0)
>> at /usr/src/sys/netinet/ip_output.c:655
>> #22 0x8142e1c7 in tcp_output (tp=0xf80026eb2820)
>> at /usr/src/sys/netinet/tcp_output.c:1447
>> #23 0x81447700 in tcp_usr_send (so=0xf80011ec2360, flags=0, 
>> m=0xf80026d14d00, nam=0x0, control=0x0, td=0xf80063ba1000)
>> at /usr/src/sys/netinet/tcp_usrreq.c:967
>> #24 0x811776f1 in sosend_generic (so=0xf80011ec2360, addr=0x0, 
>> uio=0x0, top=0xf80026d14d00, control=0x0, flags=0, 
>> td=0xf80063ba1000) at /usr/src/sys/kern/uipc_socket.c:1360
>> #25 0x811779bd in sosend (so=0xf80011ec2360, addr=0x0, uio=0x0, 
>> top=0xf80026d14d00, control=0x0, flags=0, td=0xf80063ba1000)
>> at /usr/src/sys/kern/uipc_socket.c:1405
>> #26 0x815276a2 in clnt_vc_call (cl=0xf80063ca0980, 
>> 

cannot destroy faulty zvol

2017-07-22 Thread Eugene M. Zheganin

Hi,


I cannot destroy a zvol for a reason that I don't understand:


[root@san1:~]# zfs list -t all | grep worker182
zfsroot/userdata/worker182-bad   1,38G  1,52T   708M  -
[root@san1:~]# zfs destroy -R zfsroot/userdata/worker182-bad
cannot destroy 'zfsroot/userdata/worker182-bad': dataset already exists
[root@san1:~]#


Also notice that this zvol is faulty:


  pool: zfsroot
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Sat Jul 22 15:01:37 2017
18,7G scanned out of 130G at 75,0M/s, 0h25m to go
0 repaired, 14,43% done
config:

NAME            STATE     READ WRITE CKSUM
zfsroot         ONLINE       0     0     4
  mirror-0      ONLINE       0     0     8
    gpt/zroot0  ONLINE       0     0     8
    gpt/zroot1  ONLINE       0     0     8

errors: Permanent errors have been detected in the following files:

zfsroot/userdata/worker182-bad:<0x1>
<0xc7>:<0x1>


Is this weird error "cannot destroy: already exists" related to the fact
that the zvol is faulty? Does it indicate that the metadata is probably
faulty too? Anyway, is there a way to destroy this dataset?



Thanks.



Re: stable/11 r321349 crashing immediately

2017-07-22 Thread Eugene Grosbein
On 22.07.2017 16:57, Konstantin Belousov wrote:

> From this description, I would not even be surprised if your machine
> load fits into the kstacks cache, despite the cache's quite conservative
> settings. In other words, your machine is almost certainly not
> representative of the problematic load. Something that creates a lot of
> short- and middle-lived threads would be.

I'm having trouble imagining a real-world task for today's
i386 hardware or virtual machines involving heavy usage of many
short/middle-lived threads.

Perhaps you have at least a synthetic benchmark, so I could try to reproduce
the kstack/KVA fragmentation-related problem on my i386 hardware?

It has 1G RAM, a local 16G CompactFlash (13 GB unpartitioned) as ada0,
over 200GB free on a local IDE HDD (ada1), and the aforementioned 2TB USB 2.0 HDD.
Plenty of resources for a 2007-era AMD Geode system, eh? :-)

It runs 11.1-PRERELEASE r318642 currently.




Re: stable/11 r321349 crashing immediately

2017-07-22 Thread Konstantin Belousov
On Sat, Jul 22, 2017 at 03:37:30PM +0700, Eugene Grosbein wrote:
> On 22.07.2017 15:00, Konstantin Belousov wrote:
> > On Sat, Jul 22, 2017 at 02:40:59PM +0700, Eugene Grosbein wrote:
> >> Also, I've always wondered what load pattern one should have
> >> to exhibit real kernel stack problems due to KVA fragmentation
> >> and KSTACK_PAGES>2 on i386?
> > In fact each stack consumes 3 contiguous pages because there is also
> > the guard, which catches the double faults.
> > 
> > You need to use the machine, e.g. to run something that creates and destroys
> > kernel threads, while doing something that consumes kernel_arena KVA.
> > Plain malloc/zmalloc is enough.
> 
> Does an i386 box running a PPPoE connection to an ISP (mpd5), plus several
> IPSEC tunnels, a PPTP tunnel, a WiFi access point, and the
> "transmission" torrent client with a 2TB UFS volume over GEOM_CACHE
> over GEOM_JOURNAL over USB qualify? There are ospfd, racoon,
> sendmail, ssh and several periodic cron jobs too.
I doubt that any tunnel activity causes creation and destruction of threads.
The same goes for hostapd, routing daemons, or UFS over really weird geom classes.

Sendmail and cron do cause process creation, but the processing overhead of
these programs typically prevents a high turnaround of new processes.
No idea about your torrent client.

From this description, I would not even be surprised if your machine
load fits into the kstacks cache, despite the cache's quite conservative
settings. In other words, your machine is almost certainly not
representative of the problematic load. Something that creates a lot of
short- and middle-lived threads would be.


> 
> > In other words, any non-static load would cause fragmentation preventing
> > allocations of the kernel stacks for new threads.
> > 
> >> How can I get the ddb backtrace you asked for? I'm not very familiar with ddb.
> >> I have a serial console to such an i386 system.
> > 
> > The bt command for the given thread provides the backtrace.  I have no idea
> > how you obtained the numbers that you show.
> 
> Not sure which kernel thread I ought to trace... If you just need the name
> of the function:
> 
> $ objdump -d vm_object.o | grep -B 8 'sub .*0x...,%esp' |less
> 
> 3b30 <sysctl_vm_object_list>:
> 3b30:   55                      push   %ebp
> 3b31:   89 e5                   mov    %esp,%ebp
> 3b33:   53                      push   %ebx
> 3b34:   57                      push   %edi
> 3b35:   56                      push   %esi
> 3b36:   83 e4 f8                and    $0xfffffff8,%esp
> 3b39:   81 ec 30 05 00 00       sub    $0x530,%esp
> 
> It uses the stack for a pretty large struct kinfo_vmobject (which includes
> char kvo_path[PATH_MAX]) and several others.
I see.  That is enough information to fix your observation for vm_object.o.
The patch below reduces the frame size of sysctl_vm_object_list from 1.3K
to 200 bytes.  This function is only executed on an explicit user query.

diff --git a/sys/vm/vm_object.c b/sys/vm/vm_object.c
index 6c6137d5fb2..b92d31c3e60 100644
--- a/sys/vm/vm_object.c
+++ b/sys/vm/vm_object.c
@@ -2275,7 +2315,7 @@ vm_object_vnode(vm_object_t object)
 static int
 sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
 {
-   struct kinfo_vmobject kvo;
+   struct kinfo_vmobject *kvo;
char *fullpath, *freepath;
struct vnode *vp;
struct vattr va;
@@ -2300,6 +2340,7 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
count * 11 / 10));
}
 
+   kvo = malloc(sizeof(*kvo), M_TEMP, M_WAITOK);
error = 0;
 
/*
@@ -2317,13 +2358,13 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
continue;
}
mtx_unlock(&vm_object_list_mtx);
-   kvo.kvo_size = ptoa(obj->size);
-   kvo.kvo_resident = obj->resident_page_count;
-   kvo.kvo_ref_count = obj->ref_count;
-   kvo.kvo_shadow_count = obj->shadow_count;
-   kvo.kvo_memattr = obj->memattr;
-   kvo.kvo_active = 0;
-   kvo.kvo_inactive = 0;
+   kvo->kvo_size = ptoa(obj->size);
+   kvo->kvo_resident = obj->resident_page_count;
+   kvo->kvo_ref_count = obj->ref_count;
+   kvo->kvo_shadow_count = obj->shadow_count;
+   kvo->kvo_memattr = obj->memattr;
+   kvo->kvo_active = 0;
+   kvo->kvo_inactive = 0;
TAILQ_FOREACH(m, &obj->memq, listq) {
/*
 * A page may belong to the object but be
@@ -2335,46 +2376,46 @@ sysctl_vm_object_list(SYSCTL_HANDLER_ARGS)
 * approximation of the system anyway.
 */
if (vm_page_active(m))
-   kvo.kvo_active++;
+   kvo->kvo_active++;
else if (vm_page_inactive(m))
-   kvo.kvo_inactive++;
+ 

Re: stable/11 r321349 crashing immediately

2017-07-22 Thread Eugene Grosbein
On 22.07.2017 15:00, Konstantin Belousov wrote:
> On Sat, Jul 22, 2017 at 02:40:59PM +0700, Eugene Grosbein wrote:
>> Also, I've always wondered what load pattern one should have
>> to exhibit real kernel stack problems due to KVA fragmentation
>> and KSTACK_PAGES>2 on i386?
> In fact each stack consumes 3 contiguous pages because there is also
> the guard, which catches the double faults.
> 
> You need to use the machine, e.g. to run something that creates and destroys
> kernel threads, while doing something that consumes kernel_arena KVA.
> Plain malloc/zmalloc is enough.

Does an i386 box running a PPPoE connection to an ISP (mpd5), plus several
IPSEC tunnels, a PPTP tunnel, a WiFi access point, and the
"transmission" torrent client with a 2TB UFS volume over GEOM_CACHE
over GEOM_JOURNAL over USB qualify? There are ospfd, racoon,
sendmail, ssh and several periodic cron jobs too.

> In other words, any non-static load would cause fragmentation preventing
> allocations of the kernel stacks for new threads.
> 
>> How can I get the ddb backtrace you asked for? I'm not very familiar with ddb.
>> I have a serial console to such an i386 system.
> 
> The bt command for the given thread provides the backtrace.  I have no idea
> how you obtained the numbers that you show.

Not sure which kernel thread I ought to trace... If you just need the name
of the function:

$ objdump -d vm_object.o | grep -B 8 'sub .*0x...,%esp' |less

3b30 <sysctl_vm_object_list>:
3b30:   55                      push   %ebp
3b31:   89 e5                   mov    %esp,%ebp
3b33:   53                      push   %ebx
3b34:   57                      push   %edi
3b35:   56                      push   %esi
3b36:   83 e4 f8                and    $0xfffffff8,%esp
3b39:   81 ec 30 05 00 00       sub    $0x530,%esp

It uses the stack for a pretty large struct kinfo_vmobject (which includes
char kvo_path[PATH_MAX]) and several others.
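
To put that 0x530 (1328-byte) frame in perspective: PATH_MAX is 1024 on
FreeBSD, so one such on-stack buffer accounts for most of it.  The structure
below is only a hypothetical stand-in for struct kinfo_vmobject, sized to
make the point:

#include <limits.h>
#include <stdio.h>

struct kvo_like {                   /* mock-up, not the real kinfo_vmobject */
    int  kvo_misc[64];              /* counters, attributes, ... (assumed) */
    char kvo_path[PATH_MAX];        /* the big consumer: 1024 bytes */
};

int
main(void)
{
    /* prints 1280 bytes (0x500), most of one 0x530 stack frame */
    printf("sizeof(struct kvo_like) = %zu (0x%zx)\n",
        sizeof(struct kvo_like), sizeof(struct kvo_like));
    return (0);
}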



Re: stable/11 r321349 crashing immediately

2017-07-22 Thread Konstantin Belousov
On Sat, Jul 22, 2017 at 02:40:59PM +0700, Eugene Grosbein wrote:
> Also, I've always wondered what load pattern one should have
> to exhibit real kernel stack problems due to KVA fragmentation
> and KSTACK_PAGES>2 on i386?
In fact each stack consumes 3 contiguous pages because there is also
the guard, which catches the double faults.

You need to use the machine, e.g. to run something that creates and destroys
kernel threads, while doing something that consumes kernel_arena KVA.
Plain malloc/zmalloc is enough.

In other words, any non-static load would cause fragmentation preventing
allocations of the kernel stacks for new threads.
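
For what it's worth, a hypothetical user-level load generator of that shape
might look like the sketch below (an assumption of what would qualify, not a
tested reproducer).  FreeBSD's 1:1 libthr backs every pthread with a kernel
thread and its own kernel stack, so a tight create/join loop keeps allocating
and freeing kstacks; the kernel_arena KVA consumption would still have to come
from in-kernel allocations, as described above:

#include <err.h>
#include <pthread.h>
#include <unistd.h>

#define NTHR 64

static void *
worker(void *arg)
{
    (void)arg;
    usleep(1000);               /* "middle-lived": roughly 1 ms per thread */
    return (NULL);
}

int
main(void)
{
    pthread_t tid[NTHR];

    for (;;) {
        /* each pthread_create allocates a fresh kernel stack ... */
        for (int i = 0; i < NTHR; i++)
            if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
                errx(1, "pthread_create failed");
        /* ... and each join lets the thread and its stack be reclaimed */
        for (int i = 0; i < NTHR; i++)
            pthread_join(tid[i], NULL);
    }
}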

> How can I get the ddb backtrace you asked for? I'm not very familiar with ddb.
> I have a serial console to such an i386 system.

The bt command for the given thread provides the backtrace.  I have no idea
how you obtained the numbers that you show.


Re: stable/11 r321349 crashing immediately

2017-07-22 Thread Eugene Grosbein
22.07.2017 14:05, Konstantin Belousov wrote:

> On Sat, Jul 22, 2017 at 12:51:01PM +0700, Eugene Grosbein wrote:
>> Also, there is https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219476
> 
> I strongly disagree with the idea of increasing the default kernel
> stack size; it will cause systematic problems for all users instead of
> the current state where some workloads are problematic.  Finding contig
> KVA ranges for larger stacks on KVA-starved architectures is not going
> to work.

My practice shows that increasing the default kernel stack size for an i386
system using IPSEC and ZFS with compression, with KVA_PAGES=512/KSTACK_PAGES=4,
does work. No stack-related problems have been observed with such parameters.

On the contrary, problems quickly arise if one does not increase the default
kernel stack size for such an i386 system. I have used several such systems
for years.

We have src/UPDATING entries 20121223 and 20150728 stating the same.
Those are linked from the Errata Notes of every release since 10.2 as open issues.
For how many releases are we going to keep this "open"?

Also, I've always wondered what load pattern one should have
to exhibit real kernel stack problems due to KVA fragmentation
and KSTACK_PAGES>2 on i386?
 
> The real solution is to move allocations from stack to heap, one by one.

That has not been done since 10.2-RELEASE, and I see that this is only getting worse.
 
> You claimed that vm/vm_object.o consumes 1.5K of stack; can you show
> the ddb backtrace of this situation?

These data were collected with machine object code inspection and
only some of numbers were verified by hand. I admit there may be some false 
positives.

How can I get the ddb backtrace you asked for? I'm not very familiar with ddb.
I have a serial console to such an i386 system.



Re: stable/11 r321349 crashing immediately

2017-07-22 Thread Konstantin Belousov
On Sat, Jul 22, 2017 at 12:51:01PM +0700, Eugene Grosbein wrote:
> Also, there is https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219476

I strongly disagree with the idea of increasing the default kernel
stack size; it will cause systematic problems for all users instead of
the current state where some workloads are problematic.  Finding contig
KVA ranges for larger stacks on KVA-starved architectures is not going
to work.

The real solution is to move allocations from stack to heap, one by one.

You claimed that vm/vm_object.o consumes 1.5K of stack; can you show
the ddb backtrace of this situation?


Re: stable/11 r321349 crashing immediately

2017-07-22 Thread Konstantin Belousov
On Fri, Jul 21, 2017 at 10:42:42PM -0700, Don Lewis wrote:
> Your best bet for a quick workaround for the stack overflow would be to
> rebuild the kernel with a larger value of KSTACK_PAGES.  You can find
> the default in /usr/src/sys//conf/NOTES.
Or set the tunable kern.kstack_pages to the desired number of pages
from the loader prompt.