Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Frank Van Damme
On 12-07-11 13:40, Jim Klimov wrote:
 Even if I batch background RM's so a hundred processes hang
 and then they all at once complete in a minute or two.

Hmmm. I only run one rm process at a time. You think running more
processes at the same time would be faster?

-- 
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Jim Klimov

On 2011-07-14 11:54, Frank Van Damme wrote:

On 12-07-11 13:40, Jim Klimov wrote:

Even if I batch background RM's so a hundred processes hang
and then they all at once complete in a minute or two.

Hmmm. I only run one rm process at a time. You think running more
processes at the same time would be faster?

Yes, quite often it seems so.
Whenever my slow dcpool decides to accept a write,
it processes a hundred pending deletions instead of one ;)

Even so, it took quite a few pool or iSCSI hangs and then
reboots of both server and client, and about a week overall,
to remove a 50 GB dir with 400k small files from a deduped
pool served over iSCSI from a volume in a physical pool.

Just completed this night ;)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Frank Van Damme
On 14-07-11 12:28, Jim Klimov wrote:

 Yes, quite often it seems so.
 Whenever my slow dcpool decides to accept a write,
 it processes a hundred pending deletions instead of one ;)
 
 Even so, it took quite a few pool or iscsi hangs and then
 reboots of both server and client, and about a week overall,
 to remove a 50Gb dir with 400k small files from a deduped
 pool served over iscsi from a volume in a physical pool.
 
 Just completed this night ;)

It seems counter-intuitive (you'd expect concurrent disk access to only
make things slower), but it turns out to be true. I'm deleting a dozen
times faster than before. How completely ridiculous.

Thank you :-)

-- 
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Jim Klimov

On 2011-07-14 15:48, Frank Van Damme wrote:
It seems counter-intuitive - you'd say: concurrent disk access makes 
things only slower - , but it turns out to be true. I'm deleting a 
dozen times faster than before. How completely ridiculous. Thank you :-)


Well, look at it this way: it is not only about individual disk accesses
(i.e. unlike other filesystems, you do not modify a directory entry in place);
with ZFS COW it is about rewriting a tree of block pointers, with any
new writes going into free (currently unreferenced) disk blocks anyway.

So by hoarding writes you have a chance to reduce mechanical
IOPS required for your tasks. Until you run out of RAM ;)

Just in case it helps: to quickly fire up removals of the specific directory
after yet another reboot of the box, and not overwhelm it with hundreds
of thousands of queued rm processes either, I made this script as /bin/RM:

===
#!/bin/sh

# Polling interval (seconds) between checks of the rm process count;
# may be overridden by the first argument.
SLEEP=10
[ "x$1" != "x" ] && SLEEP="$1"

A=0
# To rm small files only: find ... -size -10
find /export/OLD/PATH/TO/REMOVE -type f | while read LINE; do
  du -hs "$LINE"
  # Fire each removal off into the background so deletions queue up
  rm -f "$LINE" &
  A=$(($A+1))
  # After ~100 queued removals, wait until fewer than 50 rm processes remain
  [ $A -ge 100 ] && ( date; while [ `ps -ef | grep -wc rm` -gt 50 ]; do
      echo "Sleep $SLEEP..."; ps -ef | grep -wc rm; sleep $SLEEP; ps -ef | grep -wc rm
    done
    date ) && A=`ps -ef | grep -wc rm`
done; date
===
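
Invocation is then trivial (assuming the script really is installed as
/bin/RM, as above):

===
RM        # use the default 10-second polling interval
RM 30     # re-check the rm process count every 30 seconds instead
===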

Essentially, after firing up 100 rm attempts it waits for the rm
process count to go below 50, then goes on. Sizing may vary
between systems, phase of the moon and computer's attitude.
Sometimes I had 700 processes stacked and processed quickly.
Sometimes it hung on 50...

HTH,
//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Daniel Carosone
um, this is what xargs -P is for ...
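
For instance, something along these lines (only a sketch, assuming GNU
find/xargs; the -n 10 batch size and -P 50 process limit are purely
illustrative numbers):

===
# Feed NUL-delimited file names to GNU xargs, which keeps up to 50
# parallel "rm -f" invocations running, 10 files per invocation:
find /export/OLD/PATH/TO/REMOVE -type f -print0 | xargs -0 -n 10 -P 50 rm -f
===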

--
Dan.

On Thu, Jul 14, 2011 at 07:24:52PM +0400, Jim Klimov wrote:
 On 2011-07-14 15:48, Frank Van Damme wrote:
 It seems counter-intuitive - you'd say: concurrent disk access makes  
 things only slower - , but it turns out to be true. I'm deleting a  
 dozen times faster than before. How completely ridiculous. Thank you 
 :-)

 Well, look at it this way: it is not only about singular disk accesses
 (i.e. unlike other FSes, you do not in-place modify a directory entry),
 with ZFS COW it is about rewriting a tree of block pointers, with any
 new writes going into free (unreferenced ATM) disk blocks anyway.

 So by hoarding writes you have a chance to reduce mechanical
 IOPS required for your tasks. Until you run out of RAM ;)

 Just in case it helps, to quickly fire up removals of the specific  
 directory
 after yet another reboot of the box, and not overwhelm it with hundreds
 of thousands queued rmprocesses either, I made this script as /bin/RM:

 ===
 #!/bin/sh

 SLEEP=10
 [ "x$1" != "x" ] && SLEEP="$1"

 A=0
 # To rm small files: find ... -size -10
 find /export/OLD/PATH/TO/REMOVE -type f | while read LINE; do
   du -hs "$LINE"
   rm -f "$LINE" &
   A=$(($A+1))
   [ $A -ge 100 ] && ( date; while [ `ps -ef | grep -wc rm` -gt 50 ]; do
       echo "Sleep $SLEEP..."; ps -ef | grep -wc rm; sleep $SLEEP; ps -ef | grep -wc rm
     done
     date ) && A=`ps -ef | grep -wc rm`
 done; date
 ===

 Essentially, after firing up 100 rm attempts it waits for the rm
 process count to go below 50, then goes on. Sizing may vary
 between systems, phase of the moon and computer's attitude.
 Sometimes I had 700 processes stacked and processed quickly.
 Sometimes it hung on 50...

 HTH,
 //Jim

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


pgprXDuV2KRuK.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
 
 I understand the argument, DDT must be stored in the primary storage pool so
 you can increase the size of the storage pool without running out of space
 to hold the DDT...  But it's a fatal design flaw as long as you care about
 performance...  If you don't care about performance, you might as well use
 the netapp and do offline dedup.  The point of online dedup is to gain
 performance.  So in ZFS you have to care about the performance.
 
 There are only two possible ways to fix the problem.
 Either ...
 The DDT must be changed so it can be stored entirely in a designated
 sequential area of disk, and maintained entirely in RAM, so all DDT
 reads/writes can be infrequent and serial in nature...  This would solve the
 case of async writes and large sync writes, but would still perform poorly
 for small sync writes.  And it would be memory intensive.  But it should
 perform very nicely given those limitations.  ;-)
 Or ...
 The DDT stays as it is now, highly scattered small blocks, and there needs
 to be an option to store it entirely on low latency devices such as
 dedicated SSD's.  Eliminate the need for the DDT to reside on the slow
 primary storage pool disks.  I understand you must consider what happens
 when the dedicated SSD gets full.  The obvious choices would be either (a)
 dedup turns off whenever the metadatadevice is full or (b) it defaults to
 writing blocks in the main storage pool.  Maybe that could even be a
 configurable behavior.  Either way, there's a very realistic use case here.
 For some people in some situations, it may be acceptable to say "I have 32G
 mirrored metadatadevice, divided by 137 bytes per entry I can dedup up to a
 maximum of 218M unique blocks in pool, and if I estimate 100K average block
 size that means up to 20T primary pool storage.  If I reach that limit, I'll
 add more metadatadevice."
 
 Both of those options would also go a long way toward eliminating the
 surprise delete performance black hole.
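
As a back-of-the-envelope check of the sizing estimate quoted above (a sketch
only; the 32G device, the 137-byte entry size and the 100K average block size
are the assumptions stated there, not authoritative DDT figures):

===
# Needs a shell with 64-bit $((...)) arithmetic, e.g. ksh93 or bash.
METADEV_BYTES=$(( 32 * 1024 * 1024 * 1024 ))   # 32G mirrored metadata device
ENTRY_BYTES=137                                # assumed size of one DDT entry
AVG_BLOCK=$(( 100 * 1024 ))                    # assumed 100K average block size

MAX_BLOCKS=$(( METADEV_BYTES / ENTRY_BYTES ))
POOL_TB=$(( MAX_BLOCKS * AVG_BLOCK / 1000000000000 ))

echo "max unique blocks: $MAX_BLOCKS"    # same ballpark as the ~218M above
echo "addressable data:  ~${POOL_TB} TB" # same ballpark as the ~20T above
===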

Is anyone from Oracle reading this?  I understand if you can't say what
you're working on and stuff like that.  But I am merely hopeful this work
isn't going into a black hole...  

Anyway.  Thanks for listening (I hope.)   ttyl

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Jim Klimov

On 2011-07-15 6:21, Daniel Carosone wrote:

um, this is what xargs -P is for ...


Thanks for the hint. True, I don't often use xargs.

However, from the man pages I don't see a -P option
on OpenSolaris boxes of different releases; there
is only a -p (prompt) mode, and I am not eager to enter
"yes" 40 times ;)

The way I used this script in practice, I could enter RM
once and it kept working until the box hung. Even then, a watchdog
script could often reboot it without my interaction,
so it could continue in the next lifetime ;)



--
Dan.

On Thu, Jul 14, 2011 at 07:24:52PM +0400, Jim Klimov wrote:

On 2011-07-14 15:48, Frank Van Damme wrote:

It seems counter-intuitive - you'd say: concurrent disk access makes
things only slower - , but it turns out to be true. I'm deleting a
dozen times faster than before. How completely ridiculous. Thank you
:-)

Well, look at it this way: it is not only about singular disk accesses
(i.e. unlike other FSes, you do not in-place modify a directory entry),
with ZFS COW it is about rewriting a tree of block pointers, with any
new writes going into free (unreferenced ATM) disk blocks anyway.

So by hoarding writes you have a chance to reduce mechanical
IOPS required for your tasks. Until you run out of RAM ;)

Just in case it helps, to quickly fire up removals of the specific
directory
after yet another reboot of the box, and not overwhelm it with hundreds
of thousands queued rmprocesses either, I made this script as /bin/RM:

===
#!/bin/sh

SLEEP=10
[ "x$1" != "x" ] && SLEEP="$1"

A=0
# To rm small files: find ... -size -10
find /export/OLD/PATH/TO/REMOVE -type f | while read LINE; do
  du -hs "$LINE"
  rm -f "$LINE" &
  A=$(($A+1))
  [ $A -ge 100 ] && ( date; while [ `ps -ef | grep -wc rm` -gt 50 ]; do
      echo "Sleep $SLEEP..."; ps -ef | grep -wc rm; sleep $SLEEP; ps -ef | grep -wc rm
    done
    date ) && A=`ps -ef | grep -wc rm`
done; date
===

Essentially, after firing up 100 rm attempts it waits for the rm
process count to go below 50, then goes on. Sizing may vary
between systems, phase of the moon and computer's attitude.
Sometimes I had 700 processes stacked and processed quickly.
Sometimes it hung on 50...

HTH,
//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Daniel Carosone
On Fri, Jul 15, 2011 at 07:56:25AM +0400, Jim Klimov wrote:
 On 2011-07-15 6:21, Daniel Carosone wrote:
 um, this is what xargs -P is for ...

 Thanks for the hint. True, I don't often use xargs.

 However from the man pages, I don't see a -P option
 on OpenSolaris boxes of different releases, and there
 is only a -p (prompt) mode. I am not eager to enter
 yes 40 times ;)

you want the /usr/gnu/{bin,share/man} version, at least in this case.

--
Dan.


pgpItiuUybbdI.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss