Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
On 12-07-11 13:40, Jim Klimov wrote:
> Even if I batch background RM's so a hundred processes hang and then
> they all at once complete in a minute or two.

Hmmm. I only run one rm process at a time. You think running more
processes at the same time would be faster?

-- 
No part of this copyright message may be reproduced, read or seen, dead
or alive or by any means, including but not limited to telepathy,
without the benevolence of the author.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
On 2011-07-14 11:54, Frank Van Damme wrote:
> On 12-07-11 13:40, Jim Klimov wrote:
>> Even if I batch background RM's so a hundred processes hang and then
>> they all at once complete in a minute or two.
>
> Hmmm. I only run one rm process at a time. You think running more
> processes at the same time would be faster?

Yes, quite often it seems so. Whenever my slow dcpool decides to accept
a write, it processes a hundred pending deletions instead of one ;)

Even so, it took quite a few pool or iscsi hangs, reboots of both
server and client, and about a week overall, to remove a 50Gb dir with
400k small files from a deduped pool served over iscsi from a volume in
a physical pool. Just completed this night ;)
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
On 14-07-11 12:28, Jim Klimov wrote:
> Yes, quite often it seems so. Whenever my slow dcpool decides to
> accept a write, it processes a hundred pending deletions instead of
> one ;) Even so, it took quite a few pool or iscsi hangs, reboots of
> both server and client, and about a week overall, to remove a 50Gb
> dir with 400k small files from a deduped pool served over iscsi from
> a volume in a physical pool. Just completed this night ;)

It seems counter-intuitive (you'd think concurrent disk access would
only make things slower), but it turns out to be true. I'm deleting a
dozen times faster than before. How completely ridiculous. Thank you :-)
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
On 2011-07-14 15:48, Frank Van Damme wrote:
> It seems counter-intuitive (you'd think concurrent disk access would
> only make things slower), but it turns out to be true. I'm deleting a
> dozen times faster than before. How completely ridiculous. Thank you :-)

Well, look at it this way: it is not only about singular disk accesses
(i.e. unlike other FSes, you do not modify a directory entry in place).
With ZFS COW it is about rewriting a tree of block pointers, with any
new writes going into free (currently unreferenced) disk blocks anyway.
So by hoarding writes you have a chance to reduce the mechanical IOPS
required for your tasks. Until you run out of RAM ;)

Just in case it helps: to quickly fire up removals of the specific
directory after yet another reboot of the box, and not overwhelm it
with hundreds of thousands of queued rm processes either, I made this
script as /bin/RM:

===
#!/bin/sh
SLEEP=10
[ x"$1" != x ] && SLEEP="$1"
A=0
# To rm small files only: find ... -size -10
find /export/OLD/PATH/TO/REMOVE -type f | while read LINE; do
    du -hs "$LINE"
    rm -f "$LINE" &
    A=$(($A+1))
    [ $A -ge 100 ] && ( date
        while [ `ps -ef | grep -wc rm` -gt 50 ]; do
            echo Sleep $SLEEP...
            ps -ef | grep -wc rm
            sleep $SLEEP
        done
        date )
    A=`ps -ef | grep -wc rm`
done
date
===

Essentially, after firing up 100 background rm attempts it waits for
the rm process count to go below 50, then goes on. Sizing may vary
between systems, phase of the moon and the computer's attitude.
Sometimes I had 700 processes stacked and processed quickly. Sometimes
it hung on 50...

HTH,
//Jim
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
um, this is what xargs -P is for ...

-- 
Dan.

On Thu, Jul 14, 2011 at 07:24:52PM +0400, Jim Klimov wrote:
> Essentially, after firing up 100 background rm attempts it waits for
> the rm process count to go below 50, then goes on. [...]
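For readers following along, a minimal sketch of what Dan is suggesting:
GNU xargs with -P does the same batched, concurrent deletion as the
/bin/RM loop but keeps the process-count bookkeeping itself, with no
ps|grep polling. The wrapper name parallel_rm and the concurrency/batch
numbers are illustrative, not from the thread; -P requires GNU xargs
(on OpenSolaris, /usr/gnu/bin/xargs rather than /usr/bin/xargs).

```shell
#!/bin/sh
# Sketch: batched parallel delete via GNU xargs -P.
# parallel_rm is a hypothetical wrapper name for this illustration.
parallel_rm() {
    # $1 = directory to purge: up to 50 concurrent rm's, 100 files each
    find "$1" -type f -print0 | xargs -0 -n 100 -P 50 rm -f
}
```

Usage would be e.g. `parallel_rm /export/OLD/PATH/TO/REMOVE`; tuning -P
and -n per system is the same trial-and-error as sizing the 100/50
thresholds in the script above.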
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Edward Ned Harvey

I understand the argument: the DDT must be stored in the primary
storage pool so you can increase the size of the storage pool without
running out of space to hold the DDT... But it's a fatal design flaw as
long as you care about performance. If you don't care about
performance, you might as well use the NetApp and do offline dedup. The
point of online dedup is to gain performance, so in ZFS you have to
care about the performance.

There are only two possible ways to fix the problem. Either...

The DDT must be changed so it can be stored entirely in a designated
sequential area of disk, and maintained entirely in RAM, so all DDT
reads/writes can be infrequent and serial in nature. This would solve
the case of async writes and large sync writes, but would still perform
poorly for small sync writes. And it would be memory intensive. But it
should perform very nicely given those limitations. ;-)

Or...

The DDT stays as it is now, highly scattered small blocks, and there
needs to be an option to store it entirely on low-latency devices such
as dedicated SSDs, eliminating the need for the DDT to reside on the
slow primary storage pool disks. I understand you must consider what
happens when the dedicated SSD gets full. The obvious choices would be
either (a) dedup turns off whenever the metadata device is full, or
(b) it defaults to writing blocks in the main storage pool. Maybe that
could even be a configurable behavior. Either way, there's a very
realistic use case here. For some people in some situations, it may be
acceptable to say: "I have a 32G mirrored metadata device; divided by
137 bytes per entry, I can dedup up to a maximum of 218M unique blocks
in the pool, and if I estimate a 100K average block size, that means up
to 20T of primary pool storage. If I reach that limit, I'll add more
metadata device."
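The back-of-envelope sizing above can be spelled out in a few lines of
shell. This is only a sketch of the arithmetic: the 137-byte entry size
and 100K average block size are the assumptions quoted in the message,
and binary (GiB/TiB) units are assumed for "32G"; with those inputs the
result lands in the same order of magnitude as the quoted 218M/20T
figures, with the per-entry size being the sensitive knob.

```shell
#!/bin/sh
# Rough DDT capacity estimate for a dedicated metadata device.
# All figures are assumptions taken from the discussion above.
SSD_BYTES=$((32 * 1024 * 1024 * 1024))   # 32G mirrored metadata device
ENTRY_BYTES=137                          # assumed size of one DDT entry
AVG_BLOCK=$((100 * 1024))                # assumed 100K average block size

MAX_ENTRIES=$(($SSD_BYTES / $ENTRY_BYTES))
MAX_POOL=$(($MAX_ENTRIES * $AVG_BLOCK))

echo "max unique blocks: $MAX_ENTRIES"
echo "addressable pool:  $(($MAX_POOL / 1024 / 1024 / 1024 / 1024)) TiB"
```

With these inputs the device caps out around 250M entries, i.e. roughly
23 TiB of 100K blocks before dedup would have to spill or stop.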
Both of those options would also go a long way toward eliminating the
surprise delete performance black hole.

Is anyone from Oracle reading this? I understand if you can't say what
you're working on and stuff like that. But I am merely hopeful this
work isn't going into a black hole... Anyway. Thanks for listening
(I hope.)

ttyl
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
On 2011-07-15 6:21, Daniel Carosone wrote:
> um, this is what xargs -P is for ...

Thanks for the hint. True, I don't often use xargs. However, from the
man pages I don't see a -P option on OpenSolaris boxes of different
releases; there is only a -p (prompt) mode, and I am not eager to enter
"yes" 40 times ;)

The way I had this script in practice, I could enter RM once and it
worked till the box hung. Even then, a watchdog script could often have
it rebooted without my interaction, so it could continue in the next
lifetime ;)
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
On Fri, Jul 15, 2011 at 07:56:25AM +0400, Jim Klimov wrote:
> On 2011-07-15 6:21, Daniel Carosone wrote:
>> um, this is what xargs -P is for ...
>
> Thanks for the hint. True, I don't often use xargs. However, from the
> man pages I don't see a -P option on OpenSolaris boxes of different
> releases; there is only a -p (prompt) mode, and I am not eager to
> enter "yes" 40 times ;)

you want the /usr/gnu/{bin,share/man} version, at least in this case.

-- 
Dan.