Re: Deadlocks / hangs in ZFS

2018-07-13 Thread Slawa Olhovchenkov
On Thu, Jul 12, 2018 at 02:42:29PM +0200, Alexander Leidinger wrote:

> __curthread () at ./machine/pcpu.h:230
> 230 __asm("movq %%gs:%1,%0" : "=r" (td)
> (kgdb) bt
> #0  __curthread () at ./machine/pcpu.h:230
> #1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:366
> #2  0x80485e11 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:446
> #3  0x804863f3 in vpanic (fmt=, ap=0xfe457870)
>     at /usr/src/sys/kern/kern_shutdown.c:863
> #4  0x80486443 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:790
> #5  0x8075279f in trap_fatal (frame=0xfe457a50, eva=32) at /usr/src/sys/amd64/amd64/trap.c:892
> #6  0x80752812 in trap_pfault (frame=0xfe457a50, usermode=)
>     at /usr/src/sys/amd64/amd64/trap.c:728
> #7  0x80751e1a in trap (frame=0xfe457a50) at /usr/src/sys/amd64/amd64/trap.c:427
> #8  
> #9  0x81391fbe in arc_check_uma_cache (lowest=-1011712)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4532
> #10 arc_reclaim_thread (unused=)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4657
> #11 0x8044ca74 in fork_exit (callout=0x81391b90, arg=0x0,
>     frame=0xfe457c00) at /usr/src/sys/kern/kern_fork.c:1057
> #12 
> (kgdb) up 9
> #9  0x81391fbe in arc_check_uma_cache (lowest=-1011712)
>  at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4532
> 4532            lowest += uma_zone_get_free_size(zio_data_buf_cache[n]->kc_zone);
> (kgdb) list
> 4527            int iter = 4;
> 4528            int step = 1 << (SPA_MAXBLOCKSHIFT - SPA_MINBLOCKSHIFT - 3);
> 4529            int n = (SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) - 1;
> 4530
> 4531            while (n >= 0) {
> 4532                    lowest += uma_zone_get_free_size(zio_data_buf_cache[n]->kc_zone);
> 4533                    if (lowest >= 0)
> 4534                            return lowest;
> 4535                    n -= step;
> 4536                    if(--iter == 0) {
> (kgdb) print n
> $1 = 32767
> (kgdb) print zio_data_buf_cache[n]
> $2 = (kmem_cache_t *) 0x0
> (kgdb)

Very strange: an entry of zio_data_buf_cache[] can't be NULL, since
zio_init asserts that.
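
For reference, the index arithmetic in the quoted listing can be replayed
outside the kernel. The following is a minimal Python sketch, not the D7538
code. It assumes SPA_MAXBLOCKSHIFT = 24 and SPA_MINBLOCKSHIFT = 9 (these
values reproduce the n = 32767 that kgdb printed), models
zio_data_buf_cache[] as a plain list, and only walks the visible part of the
loop. The helper names are hypothetical.

```python
# Assumed values; they reproduce the n = 32767 printed by kgdb.
SPA_MINBLOCKSHIFT = 9
SPA_MAXBLOCKSHIFT = 24
SPA_MAXBLOCKSIZE = 1 << SPA_MAXBLOCKSHIFT


def start_index_and_step():
    """Mirror lines 4528-4529 of the quoted listing."""
    step = 1 << (SPA_MAXBLOCKSHIFT - SPA_MINBLOCKSHIFT - 3)
    n = (SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) - 1
    return n, step


def first_unset_slot(cache, visits=4):
    """Return the first visited index whose slot is None, or None if all are set.

    `cache` is a hypothetical stand-in for zio_data_buf_cache[]; `visits`
    bounds the walk because the rest of the loop body is not shown.
    """
    n, step = start_index_and_step()
    for _ in range(visits):
        if n < 0:
            break
        if cache[n] is None:
            return n
        n -= step
    return None
```

Under these assumptions the walk starts at index 32767 and steps down by
4096. The very first slot it touches is the one kgdb shows as NULL, and the
reported fault address 0x20 is consistent with reading a field at offset
0x20 from a NULL pointer.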
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Deadlocks / hangs in ZFS

2018-07-13 Thread Andriy Gapon
On 12/07/2018 15:42, Alexander Leidinger wrote:
> #9  0x81391fbe in arc_check_uma_cache (lowest=-1011712)
>     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4532

Do you have any local modifications to ZFS code?
I cannot find that function.

-- 
Andriy Gapon


Re: Deadlocks / hangs in ZFS

2018-07-13 Thread Alexander Leidinger


Quoting Andriy Gapon  (from Fri, 13 Jul 2018 14:50:48 +0300):


On 12/07/2018 15:42, Alexander Leidinger wrote:

#9  0x81391fbe in arc_check_uma_cache (lowest=-1011712)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4532


Do you have any local modifications to ZFS code?


Yes, this is with https://reviews.freebsd.org/D7538

Bye,
Alexander.

--
http://www.Leidinger.net  alexan...@leidinger.net : PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org    : PGP 0x8F31830F9F2772BF




Re: Deadlocks / hangs in ZFS

2018-07-12 Thread Alexander Leidinger


Quoting Alexander Leidinger  (from Mon, 04  
Jun 2018 22:31:08 +0200):


Quoting Slawa Olhovchenkov  (from Sun, 3 Jun 2018  
22:28:14 +0300):



On Sun, Jun 03, 2018 at 09:14:50PM +0200, Alexander Leidinger wrote:


Quoting Alexander Leidinger  (from Mon, 28
May 2018 09:02:01 +0200):


Quoting Slawa Olhovchenkov  (from Mon, 28 May 2018
01:06:12 +0300):


On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:


On 05/22, Slawa Olhovchenkov wrote:

> It has been a while since I tried Karl's patch the last time, and I
> stopped because it didn't apply to -current anymore at some point.
> Will what is provided right now in the patch work on -current?

I am mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g
I am don't know how to have two distinct patch (for stable and
current) in one review.


I'm experiencing these issues sporadically as well, would you mind
to publish this patch for fresh current?


Week ago I am adopt and publish patch to fresh current and stable, is
adopt need again?


I applied the patch in the review yesterday to rev 333966, it
applied OK (with some fuzz). I will try to reproduce my issue with
the patch.


The behavior changed (or the system was long enough in this state
without me noticing it). I have a panic now:
panic: deadlkres: possible deadlock detected for 0xf803766db580,
blocked for 1803003 ticks


Hmm, may be first determinate locked function

addr2line -ie /boot/kernel/kernel 0xf803766db580

or

kgdb
x/10i 0xf803766db580


Both don't produce any sensible output:
(kgdb) x/10i 0xf803766db580
0xf803766db580: subb   $0x80,-0x78(%rsi)
0xf803766db584: (bad)
0xf803766db585: (bad)
0xf803766db586: (bad)
0xf803766db587: incl   -0x7f7792(%rax)
0xf803766db58d: (bad)
0xf803766db58e: (bad)
0xf803766db58f: incl   -0x7f7792(%rax)
0xf803766db595: (bad)
0xf803766db596: (bad)


Seems I need to provoke a real kernel dump instead of a textdump for this.


Finally... time to recompile the kernel with crashdump-compress  
support and changing from textdump to normal dump and to install a  
recent gdb from ports...


The dump is with r336194 and the zfs patch as of 20180527.

---snip---
# kgdb -c /var/crash/vmcore.2 /boot/kernel/kernel
GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
Reading symbols from /boot/kernel/kernel...Reading symbols from  
/usr/lib/debug//boot/kernel/kernel.debug...done.

done.

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x20
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x81391fbe
stack pointer   = 0x0:0xfe457b10
frame pointer   = 0x0:0xfe457bb0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 15 (arc_reclaim_thread)
trap number = 12
panic: page fault
cpuid = 1
time = 1531394214
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe4577d0
vpanic() at vpanic+0x1a3/frame 0xfe457830
panic() at panic+0x43/frame 0xfe457890
trap_fatal() at trap_fatal+0x35f/frame 0xfe4578e0
trap_pfault() at trap_pfault+0x62/frame 0xfe457930
trap() at trap+0x2ba/frame 0xfe457a40
calltrap() at calltrap+0x8/frame 0xfe457a40
--- trap 0xc, rip = 0x81391fbe, rsp = 0xfe457b10, rbp = 0xfe457bb0 ---

arc_reclaim_thread() at arc_reclaim_thread+0x42e/frame 0xfe457bb0
fork_exit() at fork_exit+0x84/frame 0xfe457bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe457bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 38m3s
Dumping 2378 out of 8037 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at ./machine/pcpu.h:230
230 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0x80485e11 in kern_reboot (howto=260) at  
/usr/src/sys/kern/kern_shutdown.c:446

#3  0x804863f3

Re: Deadlocks / hangs in ZFS

2018-06-04 Thread Alexander Leidinger
Quoting Slawa Olhovchenkov  (from Sun, 3 Jun 2018  
22:28:14 +0300):



On Sun, Jun 03, 2018 at 09:14:50PM +0200, Alexander Leidinger wrote:


Quoting Alexander Leidinger  (from Mon, 28
May 2018 09:02:01 +0200):

> Quoting Slawa Olhovchenkov  (from Mon, 28 May 2018
> 01:06:12 +0300):
>
>> On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:
>>
>>> On 05/22, Slawa Olhovchenkov wrote:
 > It has been a while since I tried Karl's patch the last time, and I
 > stopped because it didn't apply to -current anymore at some point.
 > Will what is provided right now in the patch work on -current?

 I am mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g
 I am don't know how to have two distinct patch (for stable and
 current) in one review.
>>>
>>> I'm experiencing these issues sporadically as well, would you mind
>>> to publish this patch for fresh current?
>>
>> Week ago I am adopt and publish patch to fresh current and stable, is
>> adopt need again?
>
> I applied the patch in the review yesterday to rev 333966, it
> applied OK (with some fuzz). I will try to reproduce my issue with
> the patch.

The behavior changed (or the system was long enough in this state
without me noticing it). I have a panic now:
panic: deadlkres: possible deadlock detected for 0xf803766db580,
blocked for 1803003 ticks


Hmm, may be first determinate locked function

addr2line -ie /boot/kernel/kernel 0xf803766db580

or

kgdb
x/10i 0xf803766db580


Both don't produce any sensible output:
(kgdb) x/10i 0xf803766db580
0xf803766db580: subb   $0x80,-0x78(%rsi)
0xf803766db584: (bad)
0xf803766db585: (bad)
0xf803766db586: (bad)
0xf803766db587: incl   -0x7f7792(%rax)
0xf803766db58d: (bad)
0xf803766db58e: (bad)
0xf803766db58f: incl   -0x7f7792(%rax)
0xf803766db595: (bad)
0xf803766db596: (bad)


Seems I need to provoke a real kernel dump instead of a textdump for this.

Bye,
Alexander.



Re: Deadlocks / hangs in ZFS

2018-06-03 Thread Slawa Olhovchenkov
On Sun, Jun 03, 2018 at 09:14:50PM +0200, Alexander Leidinger wrote:

> Quoting Alexander Leidinger  (from Mon, 28  
> May 2018 09:02:01 +0200):
> 
> > Quoting Slawa Olhovchenkov  (from Mon, 28 May 2018  
> > 01:06:12 +0300):
> >
> >> On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:
> >>
> >>> On 05/22, Slawa Olhovchenkov wrote:
>  > It has been a while since I tried Karl's patch the last time, and I
>  > stopped because it didn't apply to -current anymore at some point.
>  > Will what is provided right now in the patch work on -current?
> 
>  I am mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g
>  I am don't know how to have two distinct patch (for stable and  
>  current) in one review.
> >>>
> >>> I'm experiencing these issues sporadically as well, would you mind
> >>> to publish this patch for fresh current?
> >>
> >> Week ago I am adopt and publish patch to fresh current and stable, is
> >> adopt need again?
> >
> > I applied the patch in the review yesterday to rev 333966, it  
> > applied OK (with some fuzz). I will try to reproduce my issue with  
> > the patch.
> 
> The behavior changed (or the system was long enough in this state  
> without me noticing it). I have a panic now:
> panic: deadlkres: possible deadlock detected for 0xf803766db580,  
> blocked for 1803003 ticks

Hmm, maybe first determine which function is blocked:

addr2line -ie /boot/kernel/kernel 0xf803766db580

or

kgdb
x/10i 0xf803766db580


> I only have the textdump. Is nayone up to debug this? If yes, I switch  
> to normal dumps, just tell me what I shall check for.
> 
> db:0:kdb.enter.panic>  run lockinfo
> db:1:lockinfo> show locks
> No such command; use "help" to list available commands
> db:1:lockinfo>  show alllocks
> No such command; use "help" to list available commands
> db:1:lockinfo>  show lockedvnods
> Locked vnodes
> db:0:kdb.enter.panic>  show pcpu
> cpuid= 6
> dynamic pcpu = 0xfe008f03e840
> curthread= 0xf80370c82000: pid 0 tid 100218 "deadlkres"
> curpcb   = 0xfe0116472cc0
> fpcurthread  = none
> idlethread   = 0xf803700b9580: tid 18 "idle: cpu6"
> curpmap  = 0x80d28448
> tssp = 0x80d96d90
> commontssp   = 0x80d96d90
> rsp0 = 0xfe0116472cc0
> gs32p= 0x80d9d9c8
> ldt  = 0x80d9da08
> tss  = 0x80d9d9f8
> db:0:kdb.enter.panic>  bt
> Tracing pid 0 tid 100218 td 0xf80370c82000
> kdb_enter() at kdb_enter+0x3b/frame 0xfe0116472aa0
> vpanic() at vpanic+0x1c0/frame 0xfe0116472b00
> panic() at panic+0x43/frame 0xfe0116472b60
> deadlkres() at deadlkres+0x3a6/frame 0xfe0116472bb0
> fork_exit() at fork_exit+0x84/frame 0xfe0116472bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe0116472bf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> 
> 
> Bye,
> Alexander.
> 




Re: Deadlocks / hangs in ZFS

2018-06-03 Thread Alexander Leidinger
Quoting Alexander Leidinger  (from Mon, 28  
May 2018 09:02:01 +0200):


Quoting Slawa Olhovchenkov  (from Mon, 28 May 2018  
01:06:12 +0300):



On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:


On 05/22, Slawa Olhovchenkov wrote:

> It has been a while since I tried Karl's patch the last time, and I
> stopped because it didn't apply to -current anymore at some point.
> Will what is provided right now in the patch work on -current?

I am mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g
I am don't know how to have two distinct patch (for stable and  
current) in one review.


I'm experiencing these issues sporadically as well, would you mind
to publish this patch for fresh current?


Week ago I am adopt and publish patch to fresh current and stable, is
adopt need again?


I applied the patch in the review yesterday to rev 333966, it  
applied OK (with some fuzz). I will try to reproduce my issue with  
the patch.


The behavior changed (or the system was long enough in this state  
without me noticing it). I have a panic now:
panic: deadlkres: possible deadlock detected for 0xf803766db580,  
blocked for 1803003 ticks
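
To put the tick count into perspective, here is a small Python sketch. The
tick rate of 1000 per second is an assumption (it is the usual kern.hz
default, but the machine's actual setting is not given in this thread), and
ticks_to_seconds is a hypothetical helper:

```python
def ticks_to_seconds(ticks, hz=1000):
    """Convert scheduler ticks to seconds for a given tick rate (ticks per second)."""
    return ticks / hz


blocked_s = ticks_to_seconds(1803003)   # seconds, under the hz=1000 assumption
blocked_min = blocked_s / 60             # the same duration in minutes
```

Under that assumption the thread had been blocked for roughly half an hour
when the watchdog fired.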


I only have the textdump. Is anyone up to debugging this? If yes, I will
switch to normal dumps; just tell me what I should check for.


db:0:kdb.enter.panic>  run lockinfo
db:1:lockinfo> show locks
No such command; use "help" to list available commands
db:1:lockinfo>  show alllocks
No such command; use "help" to list available commands
db:1:lockinfo>  show lockedvnods
Locked vnodes
db:0:kdb.enter.panic>  show pcpu
cpuid= 6
dynamic pcpu = 0xfe008f03e840
curthread= 0xf80370c82000: pid 0 tid 100218 "deadlkres"
curpcb   = 0xfe0116472cc0
fpcurthread  = none
idlethread   = 0xf803700b9580: tid 18 "idle: cpu6"
curpmap  = 0x80d28448
tssp = 0x80d96d90
commontssp   = 0x80d96d90
rsp0 = 0xfe0116472cc0
gs32p= 0x80d9d9c8
ldt  = 0x80d9da08
tss  = 0x80d9d9f8
db:0:kdb.enter.panic>  bt
Tracing pid 0 tid 100218 td 0xf80370c82000
kdb_enter() at kdb_enter+0x3b/frame 0xfe0116472aa0
vpanic() at vpanic+0x1c0/frame 0xfe0116472b00
panic() at panic+0x43/frame 0xfe0116472b60
deadlkres() at deadlkres+0x3a6/frame 0xfe0116472bb0
fork_exit() at fork_exit+0x84/frame 0xfe0116472bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe0116472bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---


Bye,
Alexander.



Re: Deadlocks / hangs in ZFS

2018-05-28 Thread Slawa Olhovchenkov
On Mon, May 28, 2018 at 09:02:01AM +0200, Alexander Leidinger wrote:

> Quoting Slawa Olhovchenkov  (from Mon, 28 May 2018  
> 01:06:12 +0300):
> 
> > On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:
> >
> >> On 05/22, Slawa Olhovchenkov wrote:
> >> > > It has been a while since I tried Karl's patch the last time, and I
> >> > > stopped because it didn't apply to -current anymore at some point.
> >> > > Will what is provided right now in the patch work on -current?
> >> >
> >> > I am mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g
> >> > I am don't know how to have two distinct patch (for stable and  
> >> current) in one review.
> >>
> >> I'm experiencing these issues sporadically as well, would you mind
> >> to publish this patch for fresh current?
> >
> > Week ago I am adopt and publish patch to fresh current and stable, is
> > adopt need again?
> 
> I applied the patch in the review yesterday to rev 333966, it applied  
> OK (with some fuzz). I will try to reproduce my issue with the patch.
> 
> Some thoughts I had after looking a little bit at the output of top...  
> half of the RAM of my machine is in use, the other half is listed as  
> free. Swap gets used while there is plenty of free RAM. I have NUMA in  
> my kernel (it's 2 socket Xeon system). I don't see any NUMA specific  
> code in the diff (and I don't expect something there), but could it be  
> that some NUMA related behavior comes into play here too? Does it make  
> sense to try without NUMA in the kernel?

Good question. NUMA support in FreeBSD is too new; nobody knows it well yet.
On Linux such an effect exists: exhausting all memory in one NUMA domain
can cause a memory shortage (swapping, allocation failures, etc.) while
plenty of memory is still free in another NUMA domain.

Yes, try without NUMA; the result may be interesting for the NUMA developers.


Re: Deadlocks / hangs in ZFS

2018-05-28 Thread Alexander Leidinger
Quoting Slawa Olhovchenkov  (from Mon, 28 May 2018  
01:06:12 +0300):



On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:


On 05/22, Slawa Olhovchenkov wrote:
> > It has been a while since I tried Karl's patch the last time, and I
> > stopped because it didn't apply to -current anymore at some point.
> > Will what is provided right now in the patch work on -current?
>
> I am mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g
> I am don't know how to have two distinct patch (for stable and  
current) in one review.


I'm experiencing these issues sporadically as well, would you mind
to publish this patch for fresh current?


Week ago I am adopt and publish patch to fresh current and stable, is
adopt need again?


I applied the patch in the review yesterday to rev 333966, it applied  
OK (with some fuzz). I will try to reproduce my issue with the patch.


Some thoughts I had after looking a little bit at the output of top...  
half of the RAM of my machine is in use, the other half is listed as  
free. Swap gets used while there is plenty of free RAM. I have NUMA in  
my kernel (it's 2 socket Xeon system). I don't see any NUMA specific  
code in the diff (and I don't expect something there), but could it be  
that some NUMA related behavior comes into play here too? Does it make  
sense to try without NUMA in the kernel?


Bye,
Alexander.


Re: Deadlocks / hangs in ZFS

2018-05-27 Thread Slawa Olhovchenkov
On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:

> On 05/22, Slawa Olhovchenkov wrote:
> > > It has been a while since I tried Karl's patch the last time, and I  
> > > stopped because it didn't apply to -current anymore at some point.
> > > Will what is provided right now in the patch work on -current?
> > 
> > I am mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g
> > I am don't know how to have two distinct patch (for stable and current) in 
> > one review.
>  
> I'm experiencing these issues sporadically as well, would you mind
> to publish this patch for fresh current?

A week ago I adapted the patch and published it for fresh -current and
-stable; does it need adapting again?


Re: Deadlocks / hangs in ZFS

2018-05-27 Thread Kirill Ponomarev
On 05/22, Slawa Olhovchenkov wrote:
> > It has been a while since I tried Karl's patch the last time, and I  
> > stopped because it didn't apply to -current anymore at some point.
> > Will what is provided right now in the patch work on -current?
> 
> I am mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g
> I am don't know how to have two distinct patch (for stable and current) in 
> one review.
 
I'm experiencing these issues sporadically as well, would you mind
to publish this patch for fresh current?




Re: Deadlocks / hangs in ZFS

2018-05-26 Thread Alexander Leidinger
raf Lock Monitor }]
12622   2 88 200  1963M   247M uwait   7   0:13   0.01% [mysqld{mysqld}]
27043   0 netchild   200 18964K  9124K select  6   0:01   0.01% sshd: netchild@pts/0 (sshd)
 7007  12235 200 18017M   881M uwait   8   0:10   0.01% [java{openHAB-job-schedul}]
 7007  12235 200 18017M   881M uwait   6   0:10   0.01% [java{openHAB-job-schedul}]




On 05/22/18 04:17, Alexander Leidinger wrote:

Hi,

does someone else experience deadlocks / hangs in ZFS?

What I see is that if on a 2 socket / 4 cores -> 16 threads system  
I do a lot in parallel (e.g. updating ports in several jails), then  
the system may get into a state were I can login, but any exit  
(e.g. from top) or logout of shell blocks somewhere. Sometimes it  
helps to CTRL-C all updates to get the system into a good shape  
again, but most of the times it doesn't.


On another system at the same rev (333966) with a lot less CPUs  
(and AMD instead of Intel), I don't see such a behavior.


Bye,
Alexander.







Re: Deadlocks / hangs in ZFS

2018-05-22 Thread Slawa Olhovchenkov
On Tue, May 22, 2018 at 04:16:32PM +0200, Alexander Leidinger wrote:

> 
> Quoting Slawa Olhovchenkov  (from Tue, 22 May 2018  
> 15:29:24 +0300):
> 
> > On Tue, May 22, 2018 at 08:17:00AM -0400, Steve Wills wrote:
> >
> >> I may be seeing similar issues. Have you tried leaving top -SHa running
> >> and seeing what threads are using CPU when it hangs? I did and saw pid
> >> 17 [zfskern{txg_thread_enter}] using lots of CPU but no disk activity
> >> happening. Do you see similar?
> 
> I will try and report back.
> 
> > Can you try https://reviews.freebsd.org/D7538 and report?
> 
> The patch tells it is against -STABLE, we're talking -current here.

ZFS doesn't change this.

> It has been a while since I tried Karl's patch the last time, and I  
> stopped because it didn't apply to -current anymore at some point.
> Will what is provided right now in the patch work on -current?

I mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g
I don't know how to keep two distinct patches (for stable and current) in one
review.
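
The substitution above can be applied mechanically to the patch text before
running patch(1). A minimal Python sketch, assuming the patch has been saved
locally; adapt_for_current is a hypothetical helper and the sample line is
made up for illustration:

```python
import re

# Same rewrite as s/vm_cnt.v_free_count/vm_free_count()/g, with the dot
# escaped so that only the literal field access is matched.
_OLD = re.compile(r"vm_cnt\.v_free_count")


def adapt_for_current(patch_text):
    """Replace the vm_cnt.v_free_count field access with a vm_free_count() call."""
    return _OLD.sub("vm_free_count()", patch_text)


sample = "+\tif (vm_cnt.v_free_count < needed)\n"   # illustrative line, not from D7538
adapted = adapt_for_current(sample)
```

Whether this single rewrite is enough for a given -current revision is not
something the sketch can tell; the patch may still apply with fuzz or fail.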

> As a data point, the system I talk about in the start of the thread  
> has 64 GB RAM and the ARC is not limited via sysctl.

Currently the vanilla ARC is only poorly limited via sysctl, and ABD adds to that.
It may be interesting to test with

./sys/cddl/contrib/opensolaris/uts/common/fs/zfs/abd.c:boolean_t zfs_abd_scatter_enabled = B_FALSE;

(no sysctl exists to change this)


Re: Deadlocks / hangs in ZFS

2018-05-22 Thread Alexander Leidinger


Quoting Slawa Olhovchenkov  (from Tue, 22 May 2018  
15:29:24 +0300):



On Tue, May 22, 2018 at 08:17:00AM -0400, Steve Wills wrote:


I may be seeing similar issues. Have you tried leaving top -SHa running
and seeing what threads are using CPU when it hangs? I did and saw pid
17 [zfskern{txg_thread_enter}] using lots of CPU but no disk activity
happening. Do you see similar?


I will try and report back.


Can you try https://reviews.freebsd.org/D7538 and report?


The patch tells it is against -STABLE, we're talking -current here.
It has been a while since I tried Karl's patch the last time, and I  
stopped because it didn't apply to -current anymore at some point.

Will what is provided right now in the patch work on -current?

As a data point, the system I talk about in the start of the thread  
has 64 GB RAM and the ARC is not limited via sysctl.




Bye,
Alexander.


Re: Deadlocks / hangs in ZFS

2018-05-22 Thread Andrea Venturoli

On 05/22/18 10:17, Alexander Leidinger wrote:

Hi,

does someone else experience deadlocks / hangs in ZFS?


Yes, in conjunction with Poudriere, probably when it builds/activates jails.
Not sure this is the same problem you are seeing.

 bye
av.


Re: Deadlocks / hangs in ZFS

2018-05-22 Thread Luciano Mannucci
On Tue, 22 May 2018 10:17:49 +0200
Alexander Leidinger  wrote:

> does someone else experience deadlocks / hangs in ZFS?
I did experience ZFS hangs under heavy load on relatively big iron (using
rsync, in my case). That was cured by reducing the amount of RAM available
to the ZFS caching mechanism. The /boot/loader.conf parameters
vfs.zfs.vdev.cache.size and vfs.zfs.arc_max may be your friends.
On a 16G machine that no longer shows the symptoms I have set:

kern.maxusers="4096"
vfs.zfs.vdev.cache.size="5G"
vfs.zfs.arc_min="122880"
vfs.zfs.arc_max="983040"
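
For orientation, the sizes above can be converted to bytes. The sketch below
is Python and assumes the tunables are byte counts with an optional K/M/G/T
suffix in powers of 1024; parse_size is a hypothetical helper, not part of
any FreeBSD tool.

```python
_SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(value):
    """Parse a size such as "5G" or "983040" into bytes (assumed 1024-based suffixes)."""
    text = value.strip().upper()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


settings = {
    "vfs.zfs.vdev.cache.size": "5G",
    "vfs.zfs.arc_min": "122880",
    "vfs.zfs.arc_max": "983040",
}
in_bytes = {name: parse_size(raw) for name, raw in settings.items()}
```

Read that way, the two ARC values come out at about 120 KiB and 960 KiB, so
they may be worth double-checking against what was intended.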

hope it helps,

Luciano.
-- 
 /"\ /Via A. Salaino, 7 - 20144 Milano (Italy)
 \ /  ASCII RIBBON CAMPAIGN / PHONE : +39 2 485781 FAX: +39 2 48578250
  X   AGAINST HTML MAIL/  E-MAIL: posthams...@sublink.sublink.org
 / \  AND POSTINGS/   WWW: http://www.lesassaie.IT/


Re: Deadlocks / hangs in ZFS

2018-05-22 Thread Slawa Olhovchenkov
On Tue, May 22, 2018 at 08:17:00AM -0400, Steve Wills wrote:

> I may be seeing similar issues. Have you tried leaving top -SHa running 
> and seeing what threads are using CPU when it hangs? I did and saw pid 
> 17 [zfskern{txg_thread_enter}] using lots of CPU but no disk activity 
> happening. Do you see similar?

Can you try https://reviews.freebsd.org/D7538 and report?

> On 05/22/18 04:17, Alexander Leidinger wrote:
> > Hi,
> > 
> > does someone else experience deadlocks / hangs in ZFS?
> > 
> > What I see is that if on a 2 socket / 4 cores -> 16 threads system I do 
> > a lot in parallel (e.g. updating ports in several jails), then the 
> > system may get into a state were I can login, but any exit (e.g. from 
> > top) or logout of shell blocks somewhere. Sometimes it helps to CTRL-C 
> > all updates to get the system into a good shape again, but most of the 
> > times it doesn't.
> > 
> > On another system at the same rev (333966) with a lot less CPUs (and AMD 
> > instead of Intel), I don't see such a behavior.
> > 
> > Bye,
> > Alexander.
> > 


Re: Deadlocks / hangs in ZFS

2018-05-22 Thread Steve Wills
I may be seeing similar issues. Have you tried leaving top -SHa running 
and seeing what threads are using CPU when it hangs? I did and saw pid 
17 [zfskern{txg_thread_enter}] using lots of CPU but no disk activity 
happening. Do you see similar?


Steve

On 05/22/18 04:17, Alexander Leidinger wrote:

Hi,

does someone else experience deadlocks / hangs in ZFS?

What I see is that if on a 2 socket / 4 cores -> 16 threads system I do 
a lot in parallel (e.g. updating ports in several jails), then the 
system may get into a state were I can login, but any exit (e.g. from 
top) or logout of shell blocks somewhere. Sometimes it helps to CTRL-C 
all updates to get the system into a good shape again, but most of the 
times it doesn't.


On another system at the same rev (333966) with a lot less CPUs (and AMD 
instead of Intel), I don't see such a behavior.


Bye,
Alexander.




Deadlocks / hangs in ZFS

2018-05-22 Thread Alexander Leidinger

Hi,

does someone else experience deadlocks / hangs in ZFS?

What I see is that if I do a lot in parallel (e.g. updating ports in
several jails) on a 2 socket / 4 cores -> 16 threads system, the system
may get into a state where I can log in, but any exit (e.g. from top) or
logout from a shell blocks somewhere. Sometimes it helps to CTRL-C all
updates to get the system back into good shape, but most of the time it
doesn't.


On another system at the same rev (333966) with far fewer CPUs (and
AMD instead of Intel), I don't see such behavior.


Bye,
Alexander.
