Re: Deadlocks / hangs in ZFS

2018-07-13 Thread Slawa Olhovchenkov
On Thu, Jul 12, 2018 at 02:42:29PM +0200, Alexander Leidinger wrote:

> __curthread () at ./machine/pcpu.h:230
> 230 __asm("movq %%gs:%1,%0" : "=r" (td)
> (kgdb) bt
> #0  __curthread () at ./machine/pcpu.h:230
> #1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:366
> #2  0x80485e11 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:446
> #3  0x804863f3 in vpanic (fmt=, ap=0xfe457870) at /usr/src/sys/kern/kern_shutdown.c:863
> #4  0x80486443 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:790
> #5  0x8075279f in trap_fatal (frame=0xfe457a50, eva=32) at /usr/src/sys/amd64/amd64/trap.c:892
> #6  0x80752812 in trap_pfault (frame=0xfe457a50, usermode=) at /usr/src/sys/amd64/amd64/trap.c:728
> #7  0x80751e1a in trap (frame=0xfe457a50) at /usr/src/sys/amd64/amd64/trap.c:427
> #8  
> #9  0x81391fbe in arc_check_uma_cache (lowest=-1011712) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4532
> #10 arc_reclaim_thread (unused=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4657
> #11 0x8044ca74 in fork_exit (callout=0x81391b90 , arg=0x0, frame=0xfe457c00) at /usr/src/sys/kern/kern_fork.c:1057
> #12 
> (kgdb) up 9
> #9  0x81391fbe in arc_check_uma_cache (lowest=-1011712) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4532
> 4532                            lowest += uma_zone_get_free_size(zio_data_buf_cache[n]->kc_zone);
> (kgdb) list
> 4527                    int iter = 4;
> 4528                    int step = 1 << (SPA_MAXBLOCKSHIFT - SPA_MINBLOCKSHIFT - 3);
> 4529                    int n = (SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) - 1;
> 4530
> 4531                    while (n >= 0) {
> 4532                            lowest += uma_zone_get_free_size(zio_data_buf_cache[n]->kc_zone);
> 4533                            if (lowest >= 0)
> 4534                                    return lowest;
> 4535                            n -= step;
> 4536                            if(--iter == 0) {
> (kgdb) print n
> $1 = 32767
> (kgdb) print zio_data_buf_cache[n]
> $2 = (kmem_cache_t *) 0x0
> (kgdb)

Very strange, zio_data_buf_cache[] can't be NULL, as asserted in
zio_init.
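
For context (paraphrased, not quoted verbatim from the tree):
zio_buf_cache[] and zio_data_buf_cache[] have SPA_MAXBLOCKSIZE >>
SPA_MINBLOCKSHIFT entries (32768 with 16 MB large-block support, so
n = 32767 as printed above is the last valid index), and the tail of
zio_init() back-fills every slot that did not get its own cache:

---snip---
	/*
	 * Rough paraphrase of the end of zio_init() in zio.c: any slot
	 * still NULL after the creation loop inherits the next larger
	 * cache, so no entry should ever remain NULL.
	 */
	while (--c != 0) {
		ASSERT(zio_buf_cache[c] != NULL);
		if (zio_buf_cache[c - 1] == NULL)
			zio_buf_cache[c - 1] = zio_buf_cache[c];

		ASSERT(zio_data_buf_cache[c] != NULL);
		if (zio_data_buf_cache[c - 1] == NULL)
			zio_data_buf_cache[c - 1] = zio_data_buf_cache[c];
	}
---snip---

The fault address of 0x20 in the trap Alexander posted is at least
consistent with reading ->kc_zone through a NULL kmem_cache_t pointer
(assuming kc_zone sits right behind the 32-byte kc_name in the FreeBSD
kmem shim), i.e. exactly the NULL slot kgdb shows above.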


Re: Deadlocks / hangs in ZFS

2018-07-13 Thread Andriy Gapon
On 12/07/2018 15:42, Alexander Leidinger wrote:
> #9  0x81391fbe in arc_check_uma_cache (lowest=-1011712)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4532

Do you have any local modifications to ZFS code?
I cannot find that function.

-- 
Andriy Gapon


Re: Deadlocks / hangs in ZFS

2018-07-13 Thread Alexander Leidinger


Quoting Andriy Gapon  (from Fri, 13 Jul 2018 14:50:48 +0300):


On 12/07/2018 15:42, Alexander Leidinger wrote:

#9  0x81391fbe in arc_check_uma_cache (lowest=-1011712)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4532


Do you have any local modifications to ZFS code?


Yes, this is with https://reviews.freebsd.org/D7538

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Deadlocks / hangs in ZFS

2018-07-12 Thread Alexander Leidinger


Quoting Alexander Leidinger  (from Mon, 04  
Jun 2018 22:31:08 +0200):


Quoting Slawa Olhovchenkov  (from Sun, 3 Jun 2018  
22:28:14 +0300):



On Sun, Jun 03, 2018 at 09:14:50PM +0200, Alexander Leidinger wrote:


Quoting Alexander Leidinger  (from Mon, 28
May 2018 09:02:01 +0200):


Quoting Slawa Olhovchenkov  (from Mon, 28 May 2018
01:06:12 +0300):


On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:


On 05/22, Slawa Olhovchenkov wrote:

> It has been a while since I tried Karl's patch the last time, and I
> stopped because it didn't apply to -current anymore at some point.
> Will what is provided right now in the patch work on -current?

I mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g.
I don't know how to have two distinct patches (for stable and for
current) in one review.


I'm experiencing these issues sporadically as well; would you mind
publishing this patch for a fresh -current?


A week ago I adapted and published the patch for fresh current and stable; is
another adaptation needed?


I applied the patch in the review yesterday to rev 333966, it
applied OK (with some fuzz). I will try to reproduce my issue with
the patch.


The behavior changed (or the system was long enough in this state
without me noticing it). I have a panic now:
panic: deadlkres: possible deadlock detected for 0xf803766db580,
blocked for 1803003 ticks


Hmm, maybe first determine the locked function:

addr2line -ie /boot/kernel/kernel 0xf803766db580

or

kgdb
x/10i 0xf803766db580


Both don't produce any sensible output:
(kgdb) x/10i 0xf803766db580
0xf803766db580: subb   $0x80,-0x78(%rsi)
0xf803766db584: (bad)
0xf803766db585: (bad)
0xf803766db586: (bad)
0xf803766db587: incl   -0x7f7792(%rax)
0xf803766db58d: (bad)
0xf803766db58e: (bad)
0xf803766db58f: incl   -0x7f7792(%rax)
0xf803766db595: (bad)
0xf803766db596: (bad)


Seems I need to provoke a real kernel dump instead of a textdump for this.


Finally... time to recompile the kernel with crashdump-compress
support, switch from textdump to a normal dump, and install a
recent gdb from ports...


The dump is with r336194 and the zfs patch as of 20180527.

---snip---
# kgdb -c /var/crash/vmcore.2 /boot/kernel/kernel
GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...Reading symbols from  
/usr/lib/debug//boot/kernel/kernel.debug...done.

done.

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x20
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x81391fbe
stack pointer   = 0x0:0xfe457b10
frame pointer   = 0x0:0xfe457bb0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 15 (arc_reclaim_thread)
trap number = 12
panic: page fault
cpuid = 1
time = 1531394214
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe4577d0
vpanic() at vpanic+0x1a3/frame 0xfe457830
panic() at panic+0x43/frame 0xfe457890
trap_fatal() at trap_fatal+0x35f/frame 0xfe4578e0
trap_pfault() at trap_pfault+0x62/frame 0xfe457930
trap() at trap+0x2ba/frame 0xfe457a40
calltrap() at calltrap+0x8/frame 0xfe457a40
--- trap 0xc, rip = 0x81391fbe, rsp = 0xfe457b10, rbp  
= 0xfe457bb0 ---

arc_reclaim_thread() at arc_reclaim_thread+0x42e/frame 0xfe457bb0
fork_exit() at fork_exit+0x84/frame 0xfe457bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe457bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 38m3s
Dumping 2378 out of 8037 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at ./machine/pcpu.h:230
230 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0x80485e11 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:446

#3  

Re: Deadlocks / hangs in ZFS

2018-06-04 Thread Alexander Leidinger
Quoting Slawa Olhovchenkov  (from Sun, 3 Jun 2018  
22:28:14 +0300):



On Sun, Jun 03, 2018 at 09:14:50PM +0200, Alexander Leidinger wrote:


Quoting Alexander Leidinger  (from Mon, 28
May 2018 09:02:01 +0200):

> Quoting Slawa Olhovchenkov  (from Mon, 28 May 2018
> 01:06:12 +0300):
>
>> On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:
>>
>>> On 05/22, Slawa Olhovchenkov wrote:
 > It has been a while since I tried Karl's patch the last time, and I
 > stopped because it didn't apply to -current anymore at some point.
 > Will what is provided right now in the patch work on -current?

 I mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g.
 I don't know how to have two distinct patches (for stable and for
 current) in one review.
>>>
>>> I'm experiencing these issues sporadically as well; would you mind
>>> publishing this patch for a fresh -current?
>>
>> A week ago I adapted and published the patch for fresh current and stable; is
>> another adaptation needed?
>
> I applied the patch in the review yesterday to rev 333966, it
> applied OK (with some fuzz). I will try to reproduce my issue with
> the patch.

The behavior changed (or the system was long enough in this state
without me noticing it). I have a panic now:
panic: deadlkres: possible deadlock detected for 0xf803766db580,
blocked for 1803003 ticks


Hmm, maybe first determine the locked function:

addr2line -ie /boot/kernel/kernel 0xf803766db580

or

kgdb
x/10i 0xf803766db580


Both don't produce any sensible output:
(kgdb) x/10i 0xf803766db580
0xf803766db580: subb   $0x80,-0x78(%rsi)
0xf803766db584: (bad)
0xf803766db585: (bad)
0xf803766db586: (bad)
0xf803766db587: incl   -0x7f7792(%rax)
0xf803766db58d: (bad)
0xf803766db58e: (bad)
0xf803766db58f: incl   -0x7f7792(%rax)
0xf803766db595: (bad)
0xf803766db596: (bad)


Seems I need to provoke a real kernel dump instead of a textdump for this.
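
(For reference: the pointer in a deadlkres panic is the struct thread *
of the blocked thread, not a code address, which is why addr2line and
"x/10i" on it only decode garbage. Roughly, the check in
sys/kern/kern_clock.c looks like the sketch below -- a paraphrase, not
the verbatim source -- so a more useful next step is to inspect that
thread in the dump, e.g. with "info threads" / "thread apply all bt" in
kgdb or "show thread <addr>" in ddb, rather than disassembling the
address.)

---snip---
	/*
	 * Paraphrase of deadlkres(): for every thread, if it has been
	 * blocked on a turnstile (mutex/rwlock) or sleeping for too
	 * long, panic and put the thread pointer into the message.
	 */
	FOREACH_THREAD_IN_PROC(p, td) {
		thread_lock(td);
		if (TD_ON_LOCK(td)) {
			tticks = ticks - td->td_blktick;
			if (tticks > blkticks)
				panic("%s: possible deadlock detected "
				    "for %p, blocked for %d ticks\n",
				    __func__, td, tticks);
		}
		/* ... similar check for threads on a sleepqueue ... */
		thread_unlock(td);
	}
---snip---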

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Deadlocks / hangs in ZFS

2018-06-03 Thread Slawa Olhovchenkov
On Sun, Jun 03, 2018 at 09:14:50PM +0200, Alexander Leidinger wrote:

> Quoting Alexander Leidinger  (from Mon, 28  
> May 2018 09:02:01 +0200):
> 
> > Quoting Slawa Olhovchenkov  (from Mon, 28 May 2018  
> > 01:06:12 +0300):
> >
> >> On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:
> >>
> >>> On 05/22, Slawa Olhovchenkov wrote:
>  > It has been a while since I tried Karl's patch the last time, and I
>  > stopped because it didn't apply to -current anymore at some point.
>  > Will what is provided right now in the patch work on -current?
> 
>  I mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g.
>  I don't know how to have two distinct patches (for stable and for
>  current) in one review.
> >>>
> >>> I'm experiencing these issues sporadically as well; would you mind
> >>> publishing this patch for a fresh -current?
> >>
> >> A week ago I adapted and published the patch for fresh current and stable; is
> >> another adaptation needed?
> >
> > I applied the patch in the review yesterday to rev 333966, it  
> > applied OK (with some fuzz). I will try to reproduce my issue with  
> > the patch.
> 
> The behavior changed (or the system was long enough in this state  
> without me noticing it). I have a panic now:
> panic: deadlkres: possible deadlock detected for 0xf803766db580,  
> blocked for 1803003 ticks

Hmm, maybe first determine the locked function:

addr2line -ie /boot/kernel/kernel 0xf803766db580

or

kgdb
x/10i 0xf803766db580


> I only have the textdump. Is anyone up to debugging this? If yes, I'll switch
> to normal dumps; just tell me what I should check for.
> 
> db:0:kdb.enter.panic>  run lockinfo
> db:1:lockinfo> show locks
> No such command; use "help" to list available commands
> db:1:lockinfo>  show alllocks
> No such command; use "help" to list available commands
> db:1:lockinfo>  show lockedvnods
> Locked vnodes
> db:0:kdb.enter.panic>  show pcpu
> cpuid= 6
> dynamic pcpu = 0xfe008f03e840
> curthread= 0xf80370c82000: pid 0 tid 100218 "deadlkres"
> curpcb   = 0xfe0116472cc0
> fpcurthread  = none
> idlethread   = 0xf803700b9580: tid 18 "idle: cpu6"
> curpmap  = 0x80d28448
> tssp = 0x80d96d90
> commontssp   = 0x80d96d90
> rsp0 = 0xfe0116472cc0
> gs32p= 0x80d9d9c8
> ldt  = 0x80d9da08
> tss  = 0x80d9d9f8
> db:0:kdb.enter.panic>  bt
> Tracing pid 0 tid 100218 td 0xf80370c82000
> kdb_enter() at kdb_enter+0x3b/frame 0xfe0116472aa0
> vpanic() at vpanic+0x1c0/frame 0xfe0116472b00
> panic() at panic+0x43/frame 0xfe0116472b60
> deadlkres() at deadlkres+0x3a6/frame 0xfe0116472bb0
> fork_exit() at fork_exit+0x84/frame 0xfe0116472bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe0116472bf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> 
> 
> Bye,
> Alexander.
> 
> -- 
> http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
> http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Deadlocks / hangs in ZFS

2018-06-03 Thread Alexander Leidinger
Quoting Alexander Leidinger  (from Mon, 28  
May 2018 09:02:01 +0200):


Quoting Slawa Olhovchenkov  (from Mon, 28 May 2018  
01:06:12 +0300):



On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:


On 05/22, Slawa Olhovchenkov wrote:

> It has been a while since I tried Karl's patch the last time, and I
> stopped because it didn't apply to -current anymore at some point.
> Will what is provided right now in the patch work on -current?

I mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g.
I don't know how to have two distinct patches (for stable and for
current) in one review.


I'm experiencing these issues sporadically as well; would you mind
publishing this patch for a fresh -current?


A week ago I adapted and published the patch for fresh current and stable; is
another adaptation needed?


I applied the patch in the review yesterday to rev 333966, it  
applied OK (with some fuzz). I will try to reproduce my issue with  
the patch.


The behavior changed (or the system was long enough in this state  
without me noticing it). I have a panic now:
panic: deadlkres: possible deadlock detected for 0xf803766db580,  
blocked for 1803003 ticks


I only have the textdump. Is anyone up to debugging this? If yes, I'll switch
to normal dumps; just tell me what I should check for.


db:0:kdb.enter.panic>  run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo>  show alllocks
No such command; use "help" to list available commands
db:1:lockinfo>  show lockedvnods
Locked vnodes
db:0:kdb.enter.panic>  show pcpu
cpuid= 6
dynamic pcpu = 0xfe008f03e840
curthread= 0xf80370c82000: pid 0 tid 100218 "deadlkres"
curpcb   = 0xfe0116472cc0
fpcurthread  = none
idlethread   = 0xf803700b9580: tid 18 "idle: cpu6"
curpmap  = 0x80d28448
tssp = 0x80d96d90
commontssp   = 0x80d96d90
rsp0 = 0xfe0116472cc0
gs32p= 0x80d9d9c8
ldt  = 0x80d9da08
tss  = 0x80d9d9f8
db:0:kdb.enter.panic>  bt
Tracing pid 0 tid 100218 td 0xf80370c82000
kdb_enter() at kdb_enter+0x3b/frame 0xfe0116472aa0
vpanic() at vpanic+0x1c0/frame 0xfe0116472b00
panic() at panic+0x43/frame 0xfe0116472b60
deadlkres() at deadlkres+0x3a6/frame 0xfe0116472bb0
fork_exit() at fork_exit+0x84/frame 0xfe0116472bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe0116472bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Deadlocks / hangs in ZFS

2018-05-28 Thread Slawa Olhovchenkov
On Mon, May 28, 2018 at 09:02:01AM +0200, Alexander Leidinger wrote:

> Quoting Slawa Olhovchenkov  (from Mon, 28 May 2018  
> 01:06:12 +0300):
> 
> > On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:
> >
> >> On 05/22, Slawa Olhovchenkov wrote:
> >> > > It has been a while since I tried Karl's patch the last time, and I
> >> > > stopped because it didn't apply to -current anymore at some point.
> >> > > Will what is provided right now in the patch work on -current?
> >> >
> >> > I mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g.
> >> > I don't know how to have two distinct patches (for stable and for
> >> current) in one review.
> >>
> >> I'm experiencing these issues sporadically as well; would you mind
> >> publishing this patch for a fresh -current?
> >
> > A week ago I adapted and published the patch for fresh current and stable; is
> > another adaptation needed?
> 
> I applied the patch in the review yesterday to rev 333966, it applied  
> OK (with some fuzz). I will try to reproduce my issue with the patch.
> 
> Some thoughts I had after looking a little bit at the output of top...
> half of the RAM of my machine is in use, the other half is listed as
> free. Swap gets used while there is plenty of free RAM. I have NUMA in
> my kernel (it's a 2-socket Xeon system). I don't see any NUMA-specific
> code in the diff (and I don't expect anything there), but could it be
> that some NUMA-related behavior comes into play here too? Does it make
> sense to try without NUMA in the kernel?

Good question; NUMA in FreeBSD is too new, nobody knows it well yet.
On Linux such an effect exists: exhausting all memory in one NUMA domain
can cause a memory deficit (swapping/allocation failures/etc.) simultaneously
with lots of free memory in the other NUMA domain.

Yes, try without NUMA; this may be interesting for the NUMA developers.
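
(A small userland check along these lines -- an illustrative sketch,
not part of any patch, using only the "vm.ndomains" and
"vm.stats.vm.v_free_count" sysctls -- shows whether the kernel is
running with more than one VM domain and how much is globally free;
per-domain free counters, where the running kernel exposes them, would
be the place to look for the "one domain exhausted, the other mostly
free" imbalance described above.)

---snip---
/* numa_free_check.c -- sketch: report VM domain count and global free RAM. */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int ndomains;
	u_int free_pages;
	size_t len;

	len = sizeof(ndomains);
	if (sysctlbyname("vm.ndomains", &ndomains, &len, NULL, 0) != 0)
		ndomains = 1;	/* kernel without VM domain/NUMA support */

	len = sizeof(free_pages);
	if (sysctlbyname("vm.stats.vm.v_free_count", &free_pages, &len,
	    NULL, 0) != 0)
		return (1);

	printf("%d VM domain(s), %ju MB free in total\n", ndomains,
	    (uintmax_t)free_pages * (unsigned)getpagesize() / (1024 * 1024));
	return (0);
}
---snip---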


Re: Deadlocks / hangs in ZFS

2018-05-28 Thread Alexander Leidinger
Quoting Slawa Olhovchenkov  (from Mon, 28 May 2018  
01:06:12 +0300):



On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:


On 05/22, Slawa Olhovchenkov wrote:
> > It has been a while since I tried Karl's patch the last time, and I
> > stopped because it didn't apply to -current anymore at some point.
> > Will what is provided right now in the patch work on -current?
>
> I mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g.
> I don't know how to have two distinct patches (for stable and for
current) in one review.


I'm experiencing these issues sporadically as well; would you mind
publishing this patch for a fresh -current?


A week ago I adapted and published the patch for fresh current and stable; is
another adaptation needed?


I applied the patch in the review yesterday to rev 333966, it applied  
OK (with some fuzz). I will try to reproduce my issue with the patch.


Some thoughts I had after looking a little bit at the output of top...
half of the RAM of my machine is in use, the other half is listed as
free. Swap gets used while there is plenty of free RAM. I have NUMA in
my kernel (it's a 2-socket Xeon system). I don't see any NUMA-specific
code in the diff (and I don't expect anything there), but could it be
that some NUMA-related behavior comes into play here too? Does it make
sense to try without NUMA in the kernel?


Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Deadlocks / hangs in ZFS

2018-05-27 Thread Slawa Olhovchenkov
On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:

> On 05/22, Slawa Olhovchenkov wrote:
> > > It has been a while since I tried Karl's patch the last time, and I  
> > > stopped because it didn't apply to -current anymore at some point.
> > > Will what is provided right now in the patch work on -current?
> > 
> > I mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g.
> > I don't know how to have two distinct patches (for stable and for current) in
> > one review.
>  
> I'm experiencing these issues sporadically as well; would you mind
> publishing this patch for a fresh -current?

A week ago I adapted and published the patch for fresh current and stable; is
another adaptation needed?


Re: Deadlocks / hangs in ZFS

2018-05-27 Thread Kirill Ponomarev
On 05/22, Slawa Olhovchenkov wrote:
> > It has been a while since I tried Karl's patch the last time, and I  
> > stopped because it didn't apply to -current anymore at some point.
> > Will what is provided right now in the patch work on -current?
> 
> I mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g.
> I don't know how to have two distinct patches (for stable and for current) in
> one review.
 
I'm experiencing these issues sporadically as well; would you mind
publishing this patch for a fresh -current?




Re: Deadlocks / hangs in ZFS

2018-05-26 Thread Alexander Leidinger


Quoting Steve Wills  (from Tue, 22 May 2018  
08:17:00 -0400):


I may be seeing similar issues. Have you tried leaving top -SHa  
running and seeing what threads are using CPU when it hangs? I did  
and saw pid 17 [zfskern{txg_thread_enter}] using lots of CPU but no  
disk activity happening. Do you see similar?


For me it is a different ZFS process/kthread, l2arc_feed_thread.
Please note that there is still 31 GB free, so it doesn't look like
resource exhaustion. What I consider strange is the swap usage. I
watched the system and it started to use swap while there were >30 GB
listed as free (in/out rates visible from time to time, and plenty of
RAM free... ???).


last pid: 93392;  load averages:  0.16,  0.44,  1.03    up 1+15:36:34  22:35:45

1509 processes: 17 running, 1392 sleeping, 3 zombie, 97 waiting
CPU:  0.1% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.9% idle
Mem: 597M Active, 1849M Inact, 6736K Laundry, 25G Wired, 31G Free
ARC: 20G Total, 9028M MFU, 6646M MRU, 2162M Anon, 337M Header, 1935M Other
 14G Compressed, 21G Uncompressed, 1.53:1 Ratio
Swap: 4096M Total, 1640M Used, 2455M Free, 40% Inuse

  PID   JID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   10     0 root      155 ki31     0K   256K CPU1    1  35.4H 100.00% [idle{idle: cpu1}]
   10     0 root      155 ki31     0K   256K CPU11  11  35.2H 100.00% [idle{idle: cpu11}]
   10     0 root      155 ki31     0K   256K CPU3    3  35.2H 100.00% [idle{idle: cpu3}]
   10     0 root      155 ki31     0K   256K CPU15  15  35.1H 100.00% [idle{idle: cpu15}]
   10     0 root      155 ki31     0K   256K RUN     9  35.1H 100.00% [idle{idle: cpu9}]
   10     0 root      155 ki31     0K   256K CPU5    5  35.0H 100.00% [idle{idle: cpu5}]
   10     0 root      155 ki31     0K   256K CPU14  14  35.0H 100.00% [idle{idle: cpu14}]
   10     0 root      155 ki31     0K   256K CPU0    0  35.8H  99.12% [idle{idle: cpu0}]
   10     0 root      155 ki31     0K   256K CPU6    6  35.3H  98.79% [idle{idle: cpu6}]
   10     0 root      155 ki31     0K   256K CPU8    8  35.1H  98.31% [idle{idle: cpu8}]
   10     0 root      155 ki31     0K   256K CPU12  12  35.0H  97.24% [idle{idle: cpu12}]
   10     0 root      155 ki31     0K   256K CPU4    4  35.4H  96.71% [idle{idle: cpu4}]
   10     0 root      155 ki31     0K   256K CPU10  10  35.0H  92.37% [idle{idle: cpu10}]
   10     0 root      155 ki31     0K   256K CPU7    7  35.2H  92.20% [idle{idle: cpu7}]
   10     0 root      155 ki31     0K   256K CPU13  13  35.1H  91.90% [idle{idle: cpu13}]
   10     0 root      155 ki31     0K   256K CPU2    2  35.4H  90.97% [idle{idle: cpu2}]
   11     0 root      -60    -     0K   816K WAIT    0  15:08   0.82% [intr{swi4: clock (0)}]
   31     0 root      -16    -     0K    80K pwait   0  44:54   0.60% [pagedaemon{dom0}]
45453     0 root       20    0 16932K  7056K CPU9    9   4:12   0.24% top -SHaj
   24     0 root       -8    -     0K   256K l2arc_  0   4:12   0.21% [zfskern{l2arc_feed_thread}]
 2375     0 root       20    0 16872K  6868K select 11   3:52   0.20% top -SHua
 7007 12235            20    0 18017M   881M uwait  12   0:00   0.19% [java{ESH-thingHandler-35}]
   32     0 root      -16    -     0K    16K psleep 15   5:03   0.11% [vmdaemon]
41037     0 netchild   27    0 18036K  9136K select  4   2:20   0.09% tmux: server (/tmp/tmux-1001/default) (t
   36     0 root      -16    -     0K    16K -       6   2:02   0.09% [racctd]
 7007 12235            20    0 18017M   881M uwait   9   1:24   0.07% [java{java}]
 4746     0 root       20    0 13020K  3792K nanslp  8   0:52   0.05% zpool iostat space 1
    0     0 root      -76    -     0K 10304K -       4   0:16   0.05% [kernel{if_io_tqg_4}]
 5550  8933            20    0  2448M   607M uwait   8   0:41   0.03% [java{java}]
 5550  8933            20    0  2448M   607M uwait  13   0:03   0.03% [java{Timer-1}]
 7007 12235            20    0 18017M   881M uwait   0   0:39   0.02% [java{java}]
 5655  8560            20    0 21524K  4840K select  6   0:21   0.02% /usr/local/sbin/hald{hald}
   30     0 root      -16    -     0K    16K -       4   0:25   0.01% [rand_harvestq]
 1259     0 root       20    0 18780K 18860K select 14   0:19   0.01% /usr/sbin/ntpd -c /etc/ntp.conf -p /var/
    0     0 root      -76    -     0K 10304K -      12   0:19   0.01% [kernel{if_config_tqg_0}]
   31     0 root      -16    -     0K    80K psleep  0   0:38   0.01% [pagedaemon{dom1}]
    0     0 root      -76    -     0K 10304K -       5   0:04   0.01% [kernel{if_io_tqg_5}]
 7007 12235            20    0 18017M   881M uwait   1   0:16   0.01% 

Re: Deadlocks / hangs in ZFS

2018-05-22 Thread Slawa Olhovchenkov
On Tue, May 22, 2018 at 04:16:32PM +0200, Alexander Leidinger wrote:

> 
> Quoting Slawa Olhovchenkov  (from Tue, 22 May 2018  
> 15:29:24 +0300):
> 
> > On Tue, May 22, 2018 at 08:17:00AM -0400, Steve Wills wrote:
> >
> >> I may be seeing similar issues. Have you tried leaving top -SHa running
> >> and seeing what threads are using CPU when it hangs? I did and saw pid
> >> 17 [zfskern{txg_thread_enter}] using lots of CPU but no disk activity
> >> happening. Do you see similar?
> 
> I will try and report back.
> 
> > Can you try https://reviews.freebsd.org/D7538 and report?
> 
> The patch says it is against -STABLE; we're talking about -current here.

ZFS doesn't change this.

> It has been a while since I tried Karl's patch the last time, and I  
> stopped because it didn't apply to -current anymore at some point.
> Will what is provided right now in the patch work on -current?

I mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g.
I don't know how to have two distinct patches (for stable and for current) in one
review.
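
(To spell out what that substitution amounts to -- a sketch only, not
the actual patch, with an invented HAVE_VM_FREE_COUNT guard purely for
illustration: stable/11 reads the free-page count straight out of the
global vm_cnt, while -CURRENT, since the per-NUMA-domain counter work,
goes through the vm_free_count() accessor.)

---snip---
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/vmmeter.h>	/* vm_cnt; vm_free_count() should also come from here on -CURRENT */

static u_int
free_page_count(void)
{
#ifdef HAVE_VM_FREE_COUNT
	/* -CURRENT: counters are per NUMA domain, read via the accessor. */
	return (vm_free_count());
#else
	/* stable/11: the free count is still a field of the global vm_cnt. */
	return (vm_cnt.v_free_count);
#endif
}
---snip---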

> As a data point, the system I talked about at the start of the thread
> has 64 GB RAM and the ARC is not limited via sysctl.

Currently the vanilla ARC is poorly limited via sysctl, and with ABD even more so.
It may be interesting to test

./sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:boolean_t zfs_abd_scatter_enabled = B_FALSE;

(no sysctl to change this exists)


Re: Deadlocks / hangs in ZFS

2018-05-22 Thread Alexander Leidinger


Quoting Slawa Olhovchenkov  (from Tue, 22 May 2018  
15:29:24 +0300):



On Tue, May 22, 2018 at 08:17:00AM -0400, Steve Wills wrote:


I may be seeing similar issues. Have you tried leaving top -SHa running
and seeing what threads are using CPU when it hangs? I did and saw pid
17 [zfskern{txg_thread_enter}] using lots of CPU but no disk activity
happening. Do you see similar?


I will try and report back.


Can you try https://reviews.freebsd.org/D7538 and report?


The patch says it is against -STABLE; we're talking about -current here.
It has been a while since I tried Karl's patch the last time, and I  
stopped because it didn't apply to -current anymore at some point.

Will what is provided right now in the patch work on -current?

As a data point, the system I talked about at the start of the thread
has 64 GB RAM and the ARC is not limited via sysctl.




Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Deadlocks / hangs in ZFS

2018-05-22 Thread Andrea Venturoli

On 05/22/18 10:17, Alexander Leidinger wrote:

Hi,

does someone else experience deadlocks / hangs in ZFS?


Yes, in conjunction with Poudriere, probably when it builds/activates jails.
Not sure this is the same problem you are seeing.

 bye
av.


Re: Deadlocks / hangs in ZFS

2018-05-22 Thread Luciano Mannucci
On Tue, 22 May 2018 10:17:49 +0200
Alexander Leidinger  wrote:

> does someone else experience deadlocks / hangs in ZFS?
I did experience ZFS hangs under heavy load on relatively big iron (using
rsync, in my case). That was cured by reducing the amount of RAM available
to the ZFS caching mechanism. The /boot/loader.conf parameters
vfs.zfs.vdev.cache.size and vfs.zfs.arc_max may be your friends.
On a 16G machine no longer showing the symptoms I have set:

kern.maxusers="4096"
vfs.zfs.vdev.cache.size="5G"
vfs.zfs.arc_min="122880"
vfs.zfs.arc_max="983040"

hope it helps,

Luciano.
-- 
 /"\ /Via A. Salaino, 7 - 20144 Milano (Italy)
 \ /  ASCII RIBBON CAMPAIGN / PHONE : +39 2 485781 FAX: +39 2 48578250
  X   AGAINST HTML MAIL/  E-MAIL: posthams...@sublink.sublink.org
 / \  AND POSTINGS/   WWW: http://www.lesassaie.IT/


Re: Deadlocks / hangs in ZFS

2018-05-22 Thread Slawa Olhovchenkov
On Tue, May 22, 2018 at 08:17:00AM -0400, Steve Wills wrote:

> I may be seeing similar issues. Have you tried leaving top -SHa running 
> and seeing what threads are using CPU when it hangs? I did and saw pid 
> 17 [zfskern{txg_thread_enter}] using lots of CPU but no disk activity 
> happening. Do you see similar?

Can you try https://reviews.freebsd.org/D7538 and report?

> On 05/22/18 04:17, Alexander Leidinger wrote:
> > Hi,
> > 
> > does someone else experience deadlocks / hangs in ZFS?
> > 
> > What I see is that if on a 2-socket / 4-core -> 16-thread system I do
> > a lot in parallel (e.g. updating ports in several jails), then the
> > system may get into a state where I can log in, but any exit (e.g. from
> > top) or logout of a shell blocks somewhere. Sometimes it helps to CTRL-C
> > all updates to get the system into good shape again, but most of the
> > time it doesn't.
> > 
> > On another system at the same rev (333966) with far fewer CPUs (and AMD
> > instead of Intel), I don't see such behavior.
> > 
> > Bye,
> > Alexander.
> > 


Re: Deadlocks / hangs in ZFS

2018-05-22 Thread Steve Wills
I may be seeing similar issues. Have you tried leaving top -SHa running 
and seeing what threads are using CPU when it hangs? I did and saw pid 
17 [zfskern{txg_thread_enter}] using lots of CPU but no disk activity 
happening. Do you see similar?


Steve

On 05/22/18 04:17, Alexander Leidinger wrote:

Hi,

does someone else experience deadlocks / hangs in ZFS?

What I see is that if on a 2-socket / 4-core -> 16-thread system I do
a lot in parallel (e.g. updating ports in several jails), then the
system may get into a state where I can log in, but any exit (e.g. from
top) or logout of a shell blocks somewhere. Sometimes it helps to CTRL-C
all updates to get the system into good shape again, but most of the
time it doesn't.


On another system at the same rev (333966) with far fewer CPUs (and AMD
instead of Intel), I don't see such behavior.


Bye,
Alexander.

