Re: ZFS txg implementation flaw

2013-10-28 Thread Slawa Olhovchenkov
On Mon, Oct 28, 2013 at 02:56:17PM -0700, Xin Li wrote:

> >>> Semi-indirect. dtrace -n 'fbt:kernel:vm_object_terminate:entry
> >>> { @traces[stack()] = count(); }'
> >>> 
> >>> After some (2-3) seconds
> >>> 
> >>> kernel`vnode_destroy_vobject+0xb9
> >>> zfs.ko`zfs_freebsd_reclaim+0x2e kernel`VOP_RECLAIM_APV+0x78
> >>> kernel`vgonel+0x134 kernel`vnlru_free+0x362
> >>> kernel`vnlru_proc+0x61e kernel`fork_exit+0x11f
> >>> kernel`0x80cdbfde 2490
> > 
> > 0x80cdbfd0 :   mov    %r12,%rdi
> > 0x80cdbfd3 :   mov    %rbx,%rsi
> > 0x80cdbfd6 :   mov    %rsp,%rdx
> > 0x80cdbfd9 :   callq  0x808db560
> > 0x80cdbfde :   jmpq   0x80cdca80
> > 0x80cdbfe3 :   nopw   0x0(%rax,%rax,1)
> > 0x80cdbfe9 :   nopl   0x0(%rax)
> > 
> > 
> >>> I don't have user processes creating threads, nor fork/exit activity.
> >> 
> >> This has nothing to do with fork/exit but does suggest that you
> >> are running out of vnodes.  What does sysctl -a | grep vnode say?
> > 
> > kern.maxvnodes: 1095872 kern.minvnodes: 273968 
> > vm.stats.vm.v_vnodepgsout: 0 vm.stats.vm.v_vnodepgsin: 62399 
> > vm.stats.vm.v_vnodeout: 0 vm.stats.vm.v_vnodein: 10680 
> > vfs.freevnodes: 275107 vfs.wantfreevnodes: 273968 vfs.numvnodes:
> > 316321 debug.sizeof.vnode: 504
> 
> Try setting vfs.wantfreevnodes to 547936 (double it).

Now fork_trampoline is gone, but I still see prcfr (and zfod/totfr
too). We are currently at half of peak traffic, so I can't check the
impact on IRQ handling yet.

kern.maxvnodes: 1095872
kern.minvnodes: 547936
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 63134
vm.stats.vm.v_vnodeout: 0
vm.stats.vm.v_vnodein: 10836
vfs.freevnodes: 481873
vfs.wantfreevnodes: 547936
vfs.numvnodes: 517331
debug.sizeof.vnode: 504

Now dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] =
count(); }' shows:

  kernel`vm_object_deallocate+0x520
  kernel`vm_map_entry_deallocate+0x4c
  kernel`vm_map_process_deferred+0x3d
  kernel`sys_munmap+0x16c
  kernel`amd64_syscall+0x5ea
  kernel`0x80cdbd97
   56

I think this is nginx memory management (allocation/deallocation). Can
I tune malloc so that it does not return freed pages to the kernel?
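
One possible direction, as a sketch only: if nginx is linked against the base
system jemalloc, its page handling can be tuned via MALLOC_CONF or the
/etc/malloc.conf symlink. The option below exists in jemalloc 3.x (check the
installed version), and whether it helps depends on whether these frees come
from jemalloc's dirty-page purging or from direct mmap/munmap of large
allocations.

  # keep dirty pages cached in the allocator instead of returning them to the kernel
  MALLOC_CONF="lg_dirty_mult:-1" /usr/local/sbin/nginx   # path assumes the ports install
  # or system-wide, via the malloc.conf symlink:
  ln -fs 'lg_dirty_mult:-1' /etc/malloc.conf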


Re: ZFS txg implementation flaw

2013-10-28 Thread Jordan Hubbard

On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov  wrote:

> As far as I can see, ZFS creates a separate thread for each txg write.
> Also for writing to L2ARC.
> As a result -- up to several thousand threads created and destroyed per
> second, and hundreds of thousands of page allocations, zeroings, mappings,
> unmappings and frees per second. Very high overhead.

How are you measuring the number of threads being created / destroyed?   This 
claim seems erroneous given how the ZFS thread pool mechanism actually works 
(and yes, there are thread pools already).

It would be helpful to both see your measurement methodology and the workload 
you are using in your tests.

- Jordan




Re: ZFS txg implementation flaw

2013-10-28 Thread Xin Li

On 10/28/13 14:45, Slawa Olhovchenkov wrote:
> On Mon, Oct 28, 2013 at 02:38:30PM -0700, Xin Li wrote:
> 
>> 
>> On 10/28/13 14:32, Slawa Olhovchenkov wrote:
>>> On Mon, Oct 28, 2013 at 02:22:16PM -0700, Jordan Hubbard
>>> wrote:
>>> 
 
 On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov
  wrote:
 
> As far as I can see, ZFS creates a separate thread for each txg write.
> Also for writing to L2ARC. As a result -- up to several
> thousand threads created and destroyed per second, and
> hundreds of thousands of page allocations, zeroings, mappings,
> unmappings and frees per second. Very high overhead.
 
 How are you measuring the number of threads being created / 
 destroyed?   This claim seems erroneous given how the ZFS
 thread pool mechanism actually works (and yes, there are
 thread pools already).
 
 It would be helpful to both see your measurement methodology
 and the workload you are using in your tests.
>>> 
>>> Semi-indirect. dtrace -n 'fbt:kernel:vm_object_terminate:entry
>>> { @traces[stack()] = count(); }'
>>> 
>>> After some (2-3) seconds
>>> 
>>> kernel`vnode_destroy_vobject+0xb9
>>> zfs.ko`zfs_freebsd_reclaim+0x2e kernel`VOP_RECLAIM_APV+0x78
>>> kernel`vgonel+0x134 kernel`vnlru_free+0x362
>>> kernel`vnlru_proc+0x61e kernel`fork_exit+0x11f
>>> kernel`0x80cdbfde 2490
> 
> 0x80cdbfd0 :   mov    %r12,%rdi
> 0x80cdbfd3 :   mov    %rbx,%rsi
> 0x80cdbfd6 :   mov    %rsp,%rdx
> 0x80cdbfd9 :   callq  0x808db560
> 0x80cdbfde :   jmpq   0x80cdca80
> 0x80cdbfe3 :   nopw   0x0(%rax,%rax,1)
> 0x80cdbfe9 :   nopl   0x0(%rax)
> 
> 
>>> I don't have user processes creating threads, nor fork/exit activity.
>> 
>> This has nothing to do with fork/exit but does suggest that you
>> are running out of vnodes.  What does sysctl -a | grep vnode say?
> 
> kern.maxvnodes: 1095872 kern.minvnodes: 273968 
> vm.stats.vm.v_vnodepgsout: 0 vm.stats.vm.v_vnodepgsin: 62399 
> vm.stats.vm.v_vnodeout: 0 vm.stats.vm.v_vnodein: 10680 
> vfs.freevnodes: 275107 vfs.wantfreevnodes: 273968 vfs.numvnodes:
> 316321 debug.sizeof.vnode: 504

Try setting vfs.wantfreevnodes to 547936 (double it).
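
A minimal sketch of applying that, assuming the sysctl is writable at runtime
(the follow-up output elsewhere in the thread shows kern.minvnodes tracking
the same value):

  sysctl vfs.wantfreevnodes=547936
  echo 'vfs.wantfreevnodes=547936' >> /etc/sysctl.conf   # persist across reboots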

Cheers,
-- 
Xin LI https://www.delphij.net/
FreeBSD - The Power to Serve!   Live free or die


Re: ZFS txg implementation flaw

2013-10-28 Thread Slawa Olhovchenkov
On Mon, Oct 28, 2013 at 02:38:30PM -0700, Xin Li wrote:

> 
> On 10/28/13 14:32, Slawa Olhovchenkov wrote:
> > On Mon, Oct 28, 2013 at 02:22:16PM -0700, Jordan Hubbard wrote:
> > 
> >> 
> >> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov 
> >> wrote:
> >> 
> >>> As far as I can see, ZFS creates a separate thread for each txg write.
> >>> Also for writing to L2ARC. As a result -- up to several thousand
> >>> threads created and destroyed per second, and hundreds of thousands
> >>> of page allocations, zeroings, mappings, unmappings and frees per
> >>> second. Very high overhead.
> >> 
> >> How are you measuring the number of threads being created /
> >> destroyed?   This claim seems erroneous given how the ZFS thread
> >> pool mechanism actually works (and yes, there are thread pools
> >> already).
> >> 
> >> It would be helpful to both see your measurement methodology and
> >> the workload you are using in your tests.
> > 
> > Semi-indirect. dtrace -n 'fbt:kernel:vm_object_terminate:entry {
> > @traces[stack()] = count(); }'
> > 
> > After some (2-3) seconds
> > 
> > kernel`vnode_destroy_vobject+0xb9 zfs.ko`zfs_freebsd_reclaim+0x2e 
> > kernel`VOP_RECLAIM_APV+0x78 kernel`vgonel+0x134 
> > kernel`vnlru_free+0x362 kernel`vnlru_proc+0x61e 
> > kernel`fork_exit+0x11f kernel`0x80cdbfde 2490

0x80cdbfd0 :   mov    %r12,%rdi
0x80cdbfd3 :   mov    %rbx,%rsi
0x80cdbfd6 :   mov    %rsp,%rdx
0x80cdbfd9 :   callq  0x808db560
0x80cdbfde :   jmpq   0x80cdca80
0x80cdbfe3 :   nopw   0x0(%rax,%rax,1)
0x80cdbfe9 :   nopl   0x0(%rax)


> > I don't have user processes creating threads, nor fork/exit activity.
> 
> This has nothing to do with fork/exit but does suggest that you are
> running out of vnodes.  What does sysctl -a | grep vnode say?

kern.maxvnodes: 1095872
kern.minvnodes: 273968
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 62399
vm.stats.vm.v_vnodeout: 0
vm.stats.vm.v_vnodein: 10680
vfs.freevnodes: 275107
vfs.wantfreevnodes: 273968
vfs.numvnodes: 316321
debug.sizeof.vnode: 504


Re: ZFS txg implementation flaw

2013-10-28 Thread Xin Li

On 10/28/13 14:32, Slawa Olhovchenkov wrote:
> On Mon, Oct 28, 2013 at 02:22:16PM -0700, Jordan Hubbard wrote:
> 
>> 
>> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov 
>> wrote:
>> 
>>> As far as I can see, ZFS creates a separate thread for each txg write.
>>> Also for writing to L2ARC. As a result -- up to several thousand
>>> threads created and destroyed per second, and hundreds of thousands
>>> of page allocations, zeroings, mappings, unmappings and frees per
>>> second. Very high overhead.
>> 
>> How are you measuring the number of threads being created /
>> destroyed?   This claim seems erroneous given how the ZFS thread
>> pool mechanism actually works (and yes, there are thread pools
>> already).
>> 
>> It would be helpful to both see your measurement methodology and
>> the workload you are using in your tests.
> 
> Semi-indirect. dtrace -n 'fbt:kernel:vm_object_terminate:entry {
> @traces[stack()] = count(); }'
> 
> After some (2-3) seconds
> 
> kernel`vnode_destroy_vobject+0xb9 zfs.ko`zfs_freebsd_reclaim+0x2e 
> kernel`VOP_RECLAIM_APV+0x78 kernel`vgonel+0x134 
> kernel`vnlru_free+0x362 kernel`vnlru_proc+0x61e 
> kernel`fork_exit+0x11f kernel`0x80cdbfde 2490
> 
> I don't have user processes creating threads, nor fork/exit activity.

This has nothing to do with fork/exit but does suggest that you are
running out of vnodes.  What does sysctl -a | grep vnode say?

Cheers,
-- 
Xin LI https://www.delphij.net/
FreeBSD - The Power to Serve!   Live free or die


Re: ZFS txg implementation flaw

2013-10-28 Thread Slawa Olhovchenkov
On Mon, Oct 28, 2013 at 02:22:16PM -0700, Jordan Hubbard wrote:

> 
> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov  wrote:
> 
> > As far as I can see, ZFS creates a separate thread for each txg write.
> > Also for writing to L2ARC.
> > As a result -- up to several thousand threads created and destroyed per
> > second, and hundreds of thousands of page allocations, zeroings, mappings,
> > unmappings and frees per second. Very high overhead.
> 
> How are you measuring the number of threads being created / destroyed?   This 
> claim seems erroneous given how the ZFS thread pool mechanism actually works 
> (and yes, there are thread pools already).
> 
> It would be helpful to both see your measurement methodology and the workload 
> you are using in your tests.

Semi-indirect.
dtrace -n 'fbt:kernel:vm_object_terminate:entry { @traces[stack()] = count(); }'

After some (2-3) seconds 

  kernel`vnode_destroy_vobject+0xb9
  zfs.ko`zfs_freebsd_reclaim+0x2e
  kernel`VOP_RECLAIM_APV+0x78
  kernel`vgonel+0x134
  kernel`vnlru_free+0x362
  kernel`vnlru_proc+0x61e
  kernel`fork_exit+0x11f
  kernel`0x80cdbfde
 2490

I don't have user processes creating threads, nor fork/exit activity.
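
A more direct check of the thread-creation claim, as a rough sketch (assuming
fbt probes for thread_alloc/thread_free are available on this kernel), would
be to count the allocations themselves for a few seconds:

dtrace -n '
    fbt:kernel:thread_alloc:entry { @created[stack()] = count(); }
    fbt:kernel:thread_free:entry  { @destroyed = count(); }
    tick-10s { exit(0); }'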


Re: ZFS txg implementation flaw

2013-10-28 Thread Slawa Olhovchenkov
On Mon, Oct 28, 2013 at 04:51:02PM -0400, Allan Jude wrote:

> On 2013-10-28 16:48, Slawa Olhovchenkov wrote:
> > On Mon, Oct 28, 2013 at 02:28:04PM -0400, Allan Jude wrote:
> >
> >> On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
> >>> On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
> >>>
>  On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
> 
> > I can be wrong.
> > As far as I can see, ZFS creates a separate thread for each txg write.
> > Also for writing to L2ARC.
> > As a result -- up to several thousand threads created and destroyed per
> > second, and hundreds of thousands of page allocations, zeroings, mappings,
> > unmappings and frees per second. Very high overhead.
> >
> > In systat -vmstat I see totfr up to 60, prcfr up to 20.
> >
> > Estimated overhead -- 30% of system time.
> >
> > Can anybody implement thread and page pool for txg?
>  Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
> >>> vfs.zfs.txg.timeout: 5
> >>>
> >>> That is only a 5x reduction at most (less in the real case, with bursty
> >>> writes), and it means more fragmentation on write, etc.
> >> From my understanding, increasing the timeout, so that you are doing fewer
> >> transaction groups, would actually be the way to increase performance,
> >> at the cost of 'bursty' writes and the associated uneven latency.
> > This (increasing the timeout) dramatically decreases read
> > performance because of the very high I/O bursts.
> It shouldn't affect read performance, except during the flush operations
> (every txg.timeout seconds).

Yes, that is exactly the time I am talking about.

> If you watch with 'gstat' or 'gstat -f ada.$' you should see the cycle:
> reading quickly, then every txg.timeout seconds (and maybe for longer)
> it flushes the entire transaction group (possibly 100s of MB) to the
> disk; this high write load may make reads slow until it finishes.

Yes. And reads may be delayed for several seconds.
This is unacceptable in my case.



Re: ZFS txg implementation flaw

2013-10-28 Thread Allan Jude
On 2013-10-28 16:48, Slawa Olhovchenkov wrote:
> On Mon, Oct 28, 2013 at 02:28:04PM -0400, Allan Jude wrote:
>
>> On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
>>> On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
>>>
 On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:

> I can be wrong.
> As far as I can see, ZFS creates a separate thread for each txg write.
> Also for writing to L2ARC.
> As a result -- up to several thousand threads created and destroyed per
> second, and hundreds of thousands of page allocations, zeroings, mappings,
> unmappings and frees per second. Very high overhead.
>
> In systat -vmstat I see totfr up to 60, prcfr up to 20.
>
> Estimated overhead -- 30% of system time.
>
> Can anybody implement thread and page pool for txg?
 Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
>>> vfs.zfs.txg.timeout: 5
>>>
>>> That is only a 5x reduction at most (less in the real case, with bursty
>>> writes), and it means more fragmentation on write, etc.
>> From my understanding, increasing the timeout, so that you are doing fewer
>> transaction groups, would actually be the way to increase performance,
>> at the cost of 'bursty' writes and the associated uneven latency.
> This (increasing the timeout) dramatically decreases read
> performance because of the very high I/O bursts.
It shouldn't affect read performance, except during the flush operations
(every txg.timeout seconds).

If you watch with 'gstat' or 'gstat -f ada.$' you should see the cycle:
reading quickly, then every txg.timeout seconds (and maybe for longer)
it flushes the entire transaction group (possibly 100s of MB) to the
disk; this high write load may make reads slow until it finishes.
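
For example, something like this (a sketch; the regex is an assumption,
adjust it to the pool's actual providers) makes the periodic write burst
easy to see:

  # refresh once per second, showing only the ada disks
  gstat -I 1s -f 'ada[0-9]+$'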

Over the course of a full 60 seconds, this should result in higher
total read performance, although it will be uneven: slower during the
write cycle.

-- 
Allan Jude



Re: ZFS txg implementation flaw

2013-10-28 Thread Slawa Olhovchenkov
On Mon, Oct 28, 2013 at 02:28:04PM -0400, Allan Jude wrote:

> On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
> > On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
> >
> >> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
> >>
> >>> I can be wrong.
> >>> As far as I can see, ZFS creates a separate thread for each txg write.
> >>> Also for writing to L2ARC.
> >>> As a result -- up to several thousand threads created and destroyed per
> >>> second, and hundreds of thousands of page allocations, zeroings, mappings,
> >>> unmappings and frees per second. Very high overhead.
> >>>
> >>> In systat -vmstat I see totfr up to 60, prcfr up to 20.
> >>>
> >>> Estimated overhead -- 30% of system time.
> >>>
> >>> Can anybody implement thread and page pool for txg?
> >> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
> > vfs.zfs.txg.timeout: 5
> >
> > That is only a 5x reduction at most (less in the real case, with bursty
> > writes), and it means more fragmentation on write, etc.
> From my understanding, increasing the timeout, so that you are doing fewer
> transaction groups, would actually be the way to increase performance,
> at the cost of 'bursty' writes and the associated uneven latency.

This (increasing the timeout) dramatically decreases read
performance because of the very high I/O bursts.


Re: ZFS txg implementation flaw

2013-10-28 Thread Allan Jude
On 2013-10-28 14:25, aurfalien wrote:
> On Oct 28, 2013, at 11:16 AM, Slawa Olhovchenkov wrote:
>
>> On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
>>
>>> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
>>>
 I can be wrong.
 As far as I can see, ZFS creates a separate thread for each txg write.
 Also for writing to L2ARC.
 As a result -- up to several thousand threads created and destroyed per
 second, and hundreds of thousands of page allocations, zeroings, mappings,
 unmappings and frees per second. Very high overhead.

 In systat -vmstat I see totfr up to 60, prcfr up to 20.

 Estimated overhead -- 30% of system time.

 Can anybody implement thread and page pool for txg?
>>> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
>> vfs.zfs.txg.timeout: 5
>>
>> That is only a 5x reduction at most (less in the real case, with bursty
>> writes), and it means more fragmentation on write, etc.
> So leave it at the default, in other words.
>
> Good to know.
>
> - aurf
>
The default is the default for a reason, although the original default
was 30 seconds.

-- 
Allan Jude



Re: ZFS txg implementation flaw

2013-10-28 Thread aurfalien

On Oct 28, 2013, at 11:16 AM, Slawa Olhovchenkov wrote:

> On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
> 
>> 
>> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
>> 
>>> I can be wrong.
>>> As far as I can see, ZFS creates a separate thread for each txg write.
>>> Also for writing to L2ARC.
>>> As a result -- up to several thousand threads created and destroyed per
>>> second, and hundreds of thousands of page allocations, zeroings, mappings,
>>> unmappings and frees per second. Very high overhead.
>>> 
>>> In systat -vmstat I see totfr up to 60, prcfr up to 20.
>>> 
>>> Estimated overhead -- 30% of system time.
>>> 
>>> Can anybody implement thread and page pool for txg?
>> 
>> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
> 
> vfs.zfs.txg.timeout: 5
> 
> That is only a 5x reduction at most (less in the real case, with bursty
> writes), and it means more fragmentation on write, etc.

So leave it at the default, in other words.

Good to know.

- aurf



Re: ZFS txg implementation flaw

2013-10-28 Thread Allan Jude
On 2013-10-28 14:16, Slawa Olhovchenkov wrote:
> On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:
>
>> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
>>
>>> I can be wrong.
>>> As far as I can see, ZFS creates a separate thread for each txg write.
>>> Also for writing to L2ARC.
>>> As a result -- up to several thousand threads created and destroyed per
>>> second, and hundreds of thousands of page allocations, zeroings, mappings,
>>> unmappings and frees per second. Very high overhead.
>>>
>>> In systat -vmstat I see totfr up to 60, prcfr up to 20.
>>>
>>> Estimated overhead -- 30% of system time.
>>>
>>> Can anybody implement thread and page pool for txg?
>> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?
> vfs.zfs.txg.timeout: 5
>
> That is only a 5x reduction at most (less in the real case, with bursty
> writes), and it means more fragmentation on write, etc.
From my understanding, increasing the timeout, so that you are doing fewer
transaction groups, would actually be the way to increase performance,
at the cost of 'bursty' writes and the associated uneven latency.
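
A minimal sketch of trying that, assuming vfs.zfs.txg.timeout is writable at
runtime on this branch (otherwise set it as a loader tunable):

  sysctl vfs.zfs.txg.timeout=10                           # flush every 10s instead of 5s
  echo 'vfs.zfs.txg.timeout="10"' >> /boot/loader.conf    # persist across reboots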

-- 
Allan Jude



Re: ZFS txg implementation flaw

2013-10-28 Thread aurfalien

On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:

> I can be wrong.
> As far as I can see, ZFS creates a separate thread for each txg write.
> Also for writing to L2ARC.
> As a result -- up to several thousand threads created and destroyed per
> second, and hundreds of thousands of page allocations, zeroings, mappings,
> unmappings and frees per second. Very high overhead.
> 
> In systat -vmstat I see totfr up to 60, prcfr up to 20.
> 
> Estimated overhead -- 30% of system time.
> 
> Can anybody implement thread and page pool for txg?

Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?

- aurf


Re: ZFS txg implementation flaw

2013-10-28 Thread Slawa Olhovchenkov
On Mon, Oct 28, 2013 at 10:45:02AM -0700, aurfalien wrote:

> 
> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
> 
> > I can be wrong.
> > As far as I can see, ZFS creates a separate thread for each txg write.
> > Also for writing to L2ARC.
> > As a result -- up to several thousand threads created and destroyed per
> > second, and hundreds of thousands of page allocations, zeroings, mappings,
> > unmappings and frees per second. Very high overhead.
> > 
> > In systat -vmstat I see totfr up to 60, prcfr up to 20.
> > 
> > Estimated overhead -- 30% of system time.
> > 
> > Can anybody implement thread and page pool for txg?
> 
> Would lowering vfs.zfs.txg.timeout be a way to tame or mitigate this?

vfs.zfs.txg.timeout: 5

That is only a 5x reduction at most (less in the real case, with bursty
writes), and it means more fragmentation on write, etc.


ZFS txg implementation flaw

2013-10-28 Thread Slawa Olhovchenkov
I can be wrong.
As far as I can see, ZFS creates a separate thread for each txg write.
Also for writing to L2ARC.
As a result -- up to several thousand threads created and destroyed per
second, and hundreds of thousands of page allocations, zeroings, mappings,
unmappings and frees per second. Very high overhead.

In systat -vmstat I see totfr up to 60, prcfr up to 20.

Estimated overhead -- 30% of system time.

Can anybody implement thread and page pool for txg?
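
A rough way to sanity-check the thread-churn estimate, as a sketch: watch the
THREAD UMA zone in vmstat -z; the REQ column is cumulative, so the per-second
delta is the allocation rate.

  while true; do vmstat -z | grep -w THREAD; sleep 1; done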