Re: ZFS txg implementation flaw

2013-10-28 Thread Slawa Olhovchenkov
On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:

 
 On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
 
  I may be wrong.
  As far as I can see, ZFS creates a separate thread for each txg write,
  and also for writing to the L2ARC.
  As a result -- up to several thousand threads created and destroyed per
  second, and hundreds of thousands of page allocations, zeroings,
  mappings, unmappings and frees per second. Very high overhead.

  In systat -vmstat I see totfr up to 60, prcfr up to 20.

  Estimated overhead -- 30% of system time.

  Can anybody implement a thread pool and page pool for txg writing?
 
 Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?

vfs.zfs.txg.timeout: 5

Only a 5x reduction at best (less in the real case, with bursty writes), and
more fragmentation on write, etc.
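
For reference, vfs.zfs.txg.timeout is an ordinary read/write sysctl, so the
experiment above can be tried directly; a minimal sketch, assuming a stock
FreeBSD system (the value 1 is only an illustration of the "5x" figure):

    # show the current txg commit interval, in seconds
    sysctl vfs.zfs.txg.timeout
    # lower it at runtime for a quick experiment
    sysctl vfs.zfs.txg.timeout=1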


Re: ZFS txg implementation flaw

2013-10-28 Thread aurfalien

On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:

 I may be wrong.
 As far as I can see, ZFS creates a separate thread for each txg write,
 and also for writing to the L2ARC.
 As a result -- up to several thousand threads created and destroyed per
 second, and hundreds of thousands of page allocations, zeroings,
 mappings, unmappings and frees per second. Very high overhead.

 In systat -vmstat I see totfr up to 60, prcfr up to 20.

 Estimated overhead -- 30% of system time.

 Can anybody implement a thread pool and page pool for txg writing?

Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?

- aurf


Re: ZFS txg implementation flaw

2013-10-28 Thread Allan Jude
On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
 On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:

 On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:

 I may be wrong.
 As far as I can see, ZFS creates a separate thread for each txg write,
 and also for writing to the L2ARC.
 As a result -- up to several thousand threads created and destroyed per
 second, and hundreds of thousands of page allocations, zeroings,
 mappings, unmappings and frees per second. Very high overhead.

 In systat -vmstat I see totfr up to 60, prcfr up to 20.

 Estimated overhead -- 30% of system time.

 Can anybody implement a thread pool and page pool for txg writing?
 Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
 vfs.zfs.txg.timeout: 5

 Only a 5x reduction at best (less in the real case, with bursty
 writes), and more fragmentation on write, etc.
From my understanding, increasing the timeout, so that you are doing fewer
transaction groups, would actually be the way to increase performance,
at the cost of 'bursty' writing and the associated uneven latency.
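
A minimal sketch of that suggestion, assuming the tunable is writable at
runtime as on stock FreeBSD; the value 10 is an assumption for
illustration, not a tested recommendation:

    # fewer, larger transaction groups: higher aggregate write
    # throughput, burstier flushes
    sysctl vfs.zfs.txg.timeout=10
    # make the setting persistent across reboots
    echo 'vfs.zfs.txg.timeout=10' >> /etc/sysctl.conf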

-- 
Allan Jude



Re: ZFS txg implementation flaw

2013-10-28 Thread aurfalien

On Oct 28, 2013, at 11:16 AM, Slawa Olhovchenkov wrote:

 On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
 
 
 On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
 
  I may be wrong.
  As far as I can see, ZFS creates a separate thread for each txg write,
  and also for writing to the L2ARC.
  As a result -- up to several thousand threads created and destroyed per
  second, and hundreds of thousands of page allocations, zeroings,
  mappings, unmappings and frees per second. Very high overhead.

  In systat -vmstat I see totfr up to 60, prcfr up to 20.

  Estimated overhead -- 30% of system time.

  Can anybody implement a thread pool and page pool for txg writing?
 
 Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
 
 vfs.zfs.txg.timeout: 5
 
  Only a 5x reduction at best (less in the real case, with bursty
  writes), and more fragmentation on write, etc.

So leave it at the default, in other words.

Good to know.

- aurf



Re: ZFS txg implementation flaw

2013-10-28 Thread Allan Jude
On 2013-10-28 14:25, aurfalien wrote:
 On Oct 28, 2013, at 11:16 AM, Slawa Olhovchenkov wrote:

 On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:

 On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:

 I may be wrong.
 As far as I can see, ZFS creates a separate thread for each txg write,
 and also for writing to the L2ARC.
 As a result -- up to several thousand threads created and destroyed per
 second, and hundreds of thousands of page allocations, zeroings,
 mappings, unmappings and frees per second. Very high overhead.

 In systat -vmstat I see totfr up to 60, prcfr up to 20.

 Estimated overhead -- 30% of system time.

 Can anybody implement a thread pool and page pool for txg writing?
 Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
 vfs.zfs.txg.timeout: 5

 Only a 5x reduction at best (less in the real case, with bursty
 writes), and more fragmentation on write, etc.
 So leave it at the default, in other words.

 Good to know.

 - aurf

The default is the default for a reason, although the original default
was 30 seconds.

-- 
Allan Jude



Re: ZFS txg implementation flaw

2013-10-28 Thread Slawa Olhovchenkov
On Mon, Oct 28, 2013 at 02:28:04PM -0400, Allan Jude wrote:

 On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
  On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
 
  On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
 
   I may be wrong.
   As far as I can see, ZFS creates a separate thread for each txg write,
   and also for writing to the L2ARC.
   As a result -- up to several thousand threads created and destroyed per
   second, and hundreds of thousands of page allocations, zeroings,
   mappings, unmappings and frees per second. Very high overhead.

   In systat -vmstat I see totfr up to 60, prcfr up to 20.

   Estimated overhead -- 30% of system time.

   Can anybody implement a thread pool and page pool for txg writing?
  Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
  vfs.zfs.txg.timeout: 5
 
   Only a 5x reduction at best (less in the real case, with bursty
   writes), and more fragmentation on write, etc.
  From my understanding, increasing the timeout, so that you are doing fewer
  transaction groups, would actually be the way to increase performance,
  at the cost of 'bursty' writing and the associated uneven latency.

This (increasing the timeout) dramatically decreases read performance,
because of the very high I/O bursts.


Re: ZFS txg implementation flaw

2013-10-28 Thread Allan Jude
On 2013-10-28 16:48, Slawa Olhovchenkov wrote:
 On Mon, Oct 28, 2013 at 02:28:04PM -0400, Allan Jude wrote:

 On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
 On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:

 On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:

 I may be wrong.
 As far as I can see, ZFS creates a separate thread for each txg write,
 and also for writing to the L2ARC.
 As a result -- up to several thousand threads created and destroyed per
 second, and hundreds of thousands of page allocations, zeroings,
 mappings, unmappings and frees per second. Very high overhead.

 In systat -vmstat I see totfr up to 60, prcfr up to 20.

 Estimated overhead -- 30% of system time.

 Can anybody implement a thread pool and page pool for txg writing?
 Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
 vfs.zfs.txg.timeout: 5

 Only a 5x reduction at best (less in the real case, with bursty
 writes), and more fragmentation on write, etc.
 From my understanding, increasing the timeout, so that you are doing fewer
 transaction groups, would actually be the way to increase performance,
 at the cost of 'bursty' writing and the associated uneven latency.
 This (increasing the timeout) dramatically decreases read performance,
 because of the very high I/O bursts.
It shouldn't affect read performance, except during the flush operations
(every txg.timeout seconds).

If you watch with 'gstat' (or 'gstat -f ada.$' to filter on the disks),
you should see the cycle: reads proceed quickly, then every txg.timeout
seconds (and possibly for longer) ZFS flushes the entire transaction
group (which may be hundreds of MBs) to disk, and this high write load
can slow reads until the flush finishes.

Over the course of a full 60 seconds this should still give a higher
total read throughput, although it will be uneven -- slower during the
write cycle.
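
A minimal sketch of the monitoring loop described above; the device name
pattern is an assumption, adjust it to the disks backing the pool:

    # refresh once per second, showing only the whole-disk providers,
    # so the alternating read/flush cycle is visible
    gstat -I 1s -f '^ada[0-9]+$'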

-- 
Allan Jude



Re: ZFS txg implementation flaw

2013-10-28 Thread Slawa Olhovchenkov
On Mon, Oct 28, 2013 at 04:51:02PM -0400, Allan Jude wrote:

 On 2013-10-28 16:48, Slawa Olhovchenkov wrote:
  On Mon, Oct 28, 2013 at 02:28:04PM -0400, Allan Jude wrote:
 
  On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
  On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
 
  On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
 
   I may be wrong.
   As far as I can see, ZFS creates a separate thread for each txg write,
   and also for writing to the L2ARC.
   As a result -- up to several thousand threads created and destroyed per
   second, and hundreds of thousands of page allocations, zeroings,
   mappings, unmappings and frees per second. Very high overhead.

   In systat -vmstat I see totfr up to 60, prcfr up to 20.

   Estimated overhead -- 30% of system time.

   Can anybody implement a thread pool and page pool for txg writing?
  Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
  vfs.zfs.txg.timeout: 5
 
   Only a 5x reduction at best (less in the real case, with bursty
   writes), and more fragmentation on write, etc.
  From my understanding, increasing the timeout, so that you are doing fewer
  transaction groups, would actually be the way to increase performance,
  at the cost of 'bursty' writing and the associated uneven latency.
  This (increasing the timeout) dramatically decreases read performance,
  because of the very high I/O bursts.
 It shouldn't affect read performance, except during the flush operations
 (every txg.timeout seconds).

Yes, I am talking about that time.

 If you watch with 'gstat' (or 'gstat -f ada.$' to filter on the disks),
 you should see the cycle: reads proceed quickly, then every txg.timeout
 seconds (and possibly for longer) ZFS flushes the entire transaction
 group (which may be hundreds of MBs) to disk, and this high write load
 can slow reads until the flush finishes.

Yes. And reads may be delayed for several seconds.
This is unacceptable for my case.



Re: ZFS txg implementation flaw

2013-10-28 Thread Slawa Olhovchenkov
On Mon, Oct 28, 2013 at 02:22:16PM -0700, Jordan Hubbard wrote:

 
 On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov s...@zxy.spb.ru wrote:
 
  As far as I can see, ZFS creates a separate thread for each txg write,
  and also for writing to the L2ARC.
  As a result -- up to several thousand threads created and destroyed per
  second, and hundreds of thousands of page allocations, zeroings,
  mappings, unmappings and frees per second. Very high overhead.
 
 How are you measuring the number of threads being created / destroyed?   This 
 claim seems erroneous given how the ZFS thread pool mechanism actually works 
 (and yes, there are thread pools already).
 
 It would be helpful to see both your measurement methodology and the workload
 you are using in your tests.

Semi-indirectly:

dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }'

After a few (2-3) seconds:

  kernel`vnode_destroy_vobject+0xb9
  zfs.ko`zfs_freebsd_reclaim+0x2e
  kernel`VOP_RECLAIM_APV+0x78
  kernel`vgonel+0x134
  kernel`vnlru_free+0x362
  kernel`vnlru_proc+0x61e
  kernel`fork_exit+0x11f
  kernel`0x80cdbfde
 2490

I don't have user processes creating threads, nor doing fork/exit.
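
A companion one-liner for measuring the rate rather than the call
stacks; a sketch using only the same fbt probe plus the standard tick
provider:

    # print and reset the per-second count of vm_object_terminate() calls
    dtrace -n 'fbt:kernel:vm_object_terminate:entry { @n = count(); }
        tick-1s { printa("vm_object_terminate/sec: %@d\n", @n); clear(@n); }'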


Re: ZFS txg implementation flaw

2013-10-28 Thread Xin Li

On 10/28/13 14:32, Slawa Olhovchenkov wrote:
 On Mon, Oct 28, 2013 at 02:22:16PM -0700, Jordan Hubbard wrote:
 
 
 On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov s...@zxy.spb.ru
 wrote:
 
 As far as I can see, ZFS creates a separate thread for each txg write,
 and also for writing to the L2ARC. As a result -- up to several thousand
 threads created and destroyed per second, and hundreds of thousands of
 page allocations, zeroings, mappings, unmappings and frees per second.
 Very high overhead.
 
 How are you measuring the number of threads being created /
 destroyed?   This claim seems erroneous given how the ZFS thread
 pool mechanism actually works (and yes, there are thread pools
 already).
 
 It would be helpful to see both your measurement methodology and
 the workload you are using in your tests.
 
 Semi-indirectly: dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }'

 After a few (2-3) seconds:

 kernel`vnode_destroy_vobject+0xb9
 zfs.ko`zfs_freebsd_reclaim+0x2e
 kernel`VOP_RECLAIM_APV+0x78
 kernel`vgonel+0x134
 kernel`vnlru_free+0x362
 kernel`vnlru_proc+0x61e
 kernel`fork_exit+0x11f
 kernel`0x80cdbfde
   2490
 
 I don't have user processes creating threads, nor doing fork/exit.

This has nothing to do with fork/exit, but it does suggest that you are
running out of vnodes.  What does sysctl -a | grep vnode say?

Cheers,
-- 
Xin LI  delp...@delphij.net  https://www.delphij.net/
FreeBSD - The Power to Serve!   Live free or die


Re: ZFS txg implementation flaw

2013-10-28 Thread Slawa Olhovchenkov
On Mon, Oct 28, 2013 at 02:38:30PM -0700, Xin Li wrote:

 
 On 10/28/13 14:32, Slawa Olhovchenkov wrote:
  On Mon, Oct 28, 2013 at 02:22:16PM -0700, Jordan Hubbard wrote:
  
  
  On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov s...@zxy.spb.ru
  wrote:
  
  As far as I can see, ZFS creates a separate thread for each txg write,
  and also for writing to the L2ARC. As a result -- up to several
  thousand threads created and destroyed per second, and hundreds of
  thousands of page allocations, zeroings, mappings, unmappings and
  frees per second. Very high overhead.
  
  How are you measuring the number of threads being created /
  destroyed?   This claim seems erroneous given how the ZFS thread
  pool mechanism actually works (and yes, there are thread pools
  already).
  
  It would be helpful to see both your measurement methodology and
  the workload you are using in your tests.
  
  Semi-indirectly: dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }'

  After a few (2-3) seconds:

  kernel`vnode_destroy_vobject+0xb9
  zfs.ko`zfs_freebsd_reclaim+0x2e
  kernel`VOP_RECLAIM_APV+0x78
  kernel`vgonel+0x134
  kernel`vnlru_free+0x362
  kernel`vnlru_proc+0x61e
  kernel`fork_exit+0x11f
  kernel`0x80cdbfde
    2490

0x80cdbfd0 fork_trampoline:    mov    %r12,%rdi
0x80cdbfd3 fork_trampoline+3:  mov    %rbx,%rsi
0x80cdbfd6 fork_trampoline+6:  mov    %rsp,%rdx
0x80cdbfd9 fork_trampoline+9:  callq  0x808db560 fork_exit
0x80cdbfde fork_trampoline+14: jmpq   0x80cdca80 doreti
0x80cdbfe3 fork_trampoline+19: nopw   0x0(%rax,%rax,1)
0x80cdbfe9 fork_trampoline+25: nopl   0x0(%rax)


  I don't have user processes creating threads, nor doing fork/exit.
 
 This has nothing to do with fork/exit, but it does suggest that you are
 running out of vnodes.  What does sysctl -a | grep vnode say?

kern.maxvnodes: 1095872
kern.minvnodes: 273968
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 62399
vm.stats.vm.v_vnodeout: 0
vm.stats.vm.v_vnodein: 10680
vfs.freevnodes: 275107
vfs.wantfreevnodes: 273968
vfs.numvnodes: 316321
debug.sizeof.vnode: 504


Re: ZFS txg implementation flaw

2013-10-28 Thread Xin Li

On 10/28/13 14:45, Slawa Olhovchenkov wrote:
 On Mon, Oct 28, 2013 at 02:38:30PM -0700, Xin Li wrote:
 
 
 On 10/28/13 14:32, Slawa Olhovchenkov wrote:
 On Mon, Oct 28, 2013 at 02:22:16PM -0700, Jordan Hubbard
 wrote:
 
 
 On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov
 s...@zxy.spb.ru wrote:
 
 As far as I can see, ZFS creates a separate thread for each txg write,
 and also for writing to the L2ARC. As a result -- up to several
 thousand threads created and destroyed per second, and hundreds of
 thousands of page allocations, zeroings, mappings, unmappings and
 frees per second. Very high overhead.
 
 How are you measuring the number of threads being created / 
 destroyed?   This claim seems erroneous given how the ZFS
 thread pool mechanism actually works (and yes, there are
 thread pools already).
 
 It would be helpful to see both your measurement methodology
 and the workload you are using in your tests.
 
 Semi-indirectly: dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }'

 After a few (2-3) seconds:

 kernel`vnode_destroy_vobject+0xb9
 zfs.ko`zfs_freebsd_reclaim+0x2e
 kernel`VOP_RECLAIM_APV+0x78
 kernel`vgonel+0x134
 kernel`vnlru_free+0x362
 kernel`vnlru_proc+0x61e
 kernel`fork_exit+0x11f
 kernel`0x80cdbfde
   2490
 
 0x80cdbfd0 fork_trampoline:    mov    %r12,%rdi
 0x80cdbfd3 fork_trampoline+3:  mov    %rbx,%rsi
 0x80cdbfd6 fork_trampoline+6:  mov    %rsp,%rdx
 0x80cdbfd9 fork_trampoline+9:  callq  0x808db560 fork_exit
 0x80cdbfde fork_trampoline+14: jmpq   0x80cdca80 doreti
 0x80cdbfe3 fork_trampoline+19: nopw   0x0(%rax,%rax,1)
 0x80cdbfe9 fork_trampoline+25: nopl   0x0(%rax)
 
 
 I don't have user processes creating threads, nor doing fork/exit.
 
 This has nothing to do with fork/exit, but it does suggest that you
 are running out of vnodes.  What does sysctl -a | grep vnode say?
 
 kern.maxvnodes: 1095872
 kern.minvnodes: 273968
 vm.stats.vm.v_vnodepgsout: 0
 vm.stats.vm.v_vnodepgsin: 62399
 vm.stats.vm.v_vnodeout: 0
 vm.stats.vm.v_vnodein: 10680
 vfs.freevnodes: 275107
 vfs.wantfreevnodes: 273968
 vfs.numvnodes: 316321
 debug.sizeof.vnode: 504

Try setting vfs.wantfreevnodes to 547936 (double it).
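
A minimal sketch of applying that suggestion, with the doubled value
taken from the sysctl output quoted above (assuming vfs.wantfreevnodes
is writable at runtime, as on stock FreeBSD):

    # raise the free-vnode target so vnlru reclaims vnodes less aggressively
    sysctl vfs.wantfreevnodes=547936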

Cheers,
-- 
Xin LI  delp...@delphij.net  https://www.delphij.net/
FreeBSD - The Power to Serve!   Live free or die


Re: ZFS txg implementation flaw

2013-10-28 Thread Jordan Hubbard

On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov s...@zxy.spb.ru wrote:

 As far as I can see, ZFS creates a separate thread for each txg write,
 and also for writing to the L2ARC.
 As a result -- up to several thousand threads created and destroyed per
 second, and hundreds of thousands of page allocations, zeroings,
 mappings, unmappings and frees per second. Very high overhead.

How are you measuring the number of threads being created / destroyed?   This 
claim seems erroneous given how the ZFS thread pool mechanism actually works 
(and yes, there are thread pools already).

It would be helpful to see both your measurement methodology and the workload
you are using in your tests.

- Jordan




Re: ZFS txg implementation flaw

2013-10-28 Thread Slawa Olhovchenkov
On Mon, Oct 28, 2013 at 02:56:17PM -0700, Xin Li wrote:

  Semi-indirectly: dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }'

  After a few (2-3) seconds:

  kernel`vnode_destroy_vobject+0xb9
  zfs.ko`zfs_freebsd_reclaim+0x2e
  kernel`VOP_RECLAIM_APV+0x78
  kernel`vgonel+0x134
  kernel`vnlru_free+0x362
  kernel`vnlru_proc+0x61e
  kernel`fork_exit+0x11f
  kernel`0x80cdbfde
    2490
  
  0x80cdbfd0 fork_trampoline:    mov    %r12,%rdi
  0x80cdbfd3 fork_trampoline+3:  mov    %rbx,%rsi
  0x80cdbfd6 fork_trampoline+6:  mov    %rsp,%rdx
  0x80cdbfd9 fork_trampoline+9:  callq  0x808db560 fork_exit
  0x80cdbfde fork_trampoline+14: jmpq   0x80cdca80 doreti
  0x80cdbfe3 fork_trampoline+19: nopw   0x0(%rax,%rax,1)
  0x80cdbfe9 fork_trampoline+25: nopl   0x0(%rax)
  
  
  I don't have user processes creating threads, nor doing fork/exit.
  
  This has nothing to do with fork/exit, but it does suggest that you
  are running out of vnodes.  What does sysctl -a | grep vnode say?
  
  kern.maxvnodes: 1095872
  kern.minvnodes: 273968
  vm.stats.vm.v_vnodepgsout: 0
  vm.stats.vm.v_vnodepgsin: 62399
  vm.stats.vm.v_vnodeout: 0
  vm.stats.vm.v_vnodein: 10680
  vfs.freevnodes: 275107
  vfs.wantfreevnodes: 273968
  vfs.numvnodes: 316321
  debug.sizeof.vnode: 504
 
 Try setting vfs.wantfreevnodes to 547936 (double it).

Now the fork_trampoline stack is gone, but I still see prcfr (and
zfod/totfr too). Traffic is currently at half of peak, so I can't check
the impact on IRQ handling.

kern.maxvnodes: 1095872
kern.minvnodes: 547936
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 63134
vm.stats.vm.v_vnodeout: 0
vm.stats.vm.v_vnodein: 10836
vfs.freevnodes: 481873
vfs.wantfreevnodes: 547936
vfs.numvnodes: 517331
debug.sizeof.vnode: 504

Now dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }' shows:

  kernel`vm_object_deallocate+0x520
  kernel`vm_map_entry_deallocate+0x4c
  kernel`vm_map_process_deferred+0x3d
  kernel`sys_munmap+0x16c
  kernel`amd64_syscall+0x5ea
  kernel`0x80cdbd97
   56

I think this is nginx memory management (allocation/deallocation). Can
I tune malloc so that it does not return freed pages to the kernel?
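
One possible knob, assuming the system malloc is jemalloc (as shipped
with FreeBSD 10); lg_dirty_mult:-1 is meant to disable purging of dirty
pages back to the kernel -- treat both the option name and its effect
as assumptions to check against malloc.conf(5) for the installed
version, and the nginx path (the usual ports location) as an assumption
too:

    # per-process: keep nginx from returning dirty pages to the kernel
    MALLOC_CONF='lg_dirty_mult:-1' /usr/local/sbin/nginx
    # or system-wide, via the /etc/malloc.conf symlink
    ln -s 'lg_dirty_mult:-1' /etc/malloc.conf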