Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-06 Thread Frank Van Damme
On 06-05-11 05:44, Richard Elling wrote:
 As the size of the data grows, the need to have the whole DDT in RAM or L2ARC
 decreases. With one notable exception: destroying a dataset or snapshot
 requires the DDT entries for the destroyed blocks to be updated. This is why
 people can go for months or years and not see a problem, until they try to
 destroy a dataset.

So what you're saying is: with your RAM-starved system, don't even try to
start using snapshots. Right?

-- 
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.


Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-06 Thread Casper . Dik

On 06-05-11 05:44, Richard Elling wrote:
 As the size of the data grows, the need to have the whole DDT in RAM or L2ARC
 decreases. With one notable exception: destroying a dataset or snapshot
 requires the DDT entries for the destroyed blocks to be updated. This is why
 people can go for months or years and not see a problem, until they try to
 destroy a dataset.

So what you're saying is: with your RAM-starved system, don't even try to
start using snapshots. Right?


I think it's more like don't use dedup when you don't have RAM.

(It is not possible to not use snapshots in Solaris; they are used for
everything)

Casper



Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-06 Thread Erik Trimble

On 5/6/2011 1:37 AM, casper@oracle.com wrote:

On 06-05-11 05:44, Richard Elling wrote:

As the size of the data grows, the need to have the whole DDT in RAM or L2ARC
decreases. With one notable exception: destroying a dataset or snapshot
requires the DDT entries for the destroyed blocks to be updated. This is why
people can go for months or years and not see a problem, until they try to
destroy a dataset.

So what you're saying is: with your RAM-starved system, don't even try to
start using snapshots. Right?


I think it's more like don't use dedup when you don't have RAM.

(It is not possible to not use snapshots in Solaris; they are used for
everything)

Casper

Casper and Richard are correct - RAM starvation seriously impacts 
snapshot or dataset deletion when a pool has dedup enabled.  The reason 
behind this is that ZFS needs to scan the entire DDT to check to see if 
it can actually delete each block in the to-be-deleted snapshot/dataset, 
or if it just needs to update the dedup reference count. If it can't 
store the entire DDT in either the ARC or L2ARC, it will be forced to do 
considerable I/O to disk, as it brings in the appropriate DDT entry.   
Worst case for insufficient ARC/L2ARC space can increase deletion times 
by many orders of magnitude. E.g. days, weeks, or even months to do a 
deletion.



If dedup isn't enabled, snapshot and data deletion is very light on RAM 
requirements, and generally won't need to do much (if any) disk I/O.  
Such deletion should take milliseconds to a minute or so.




--

Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-06 Thread Tomas Ögren
On 06 May, 2011 - Erik Trimble sent me these 1,8K bytes:

 If dedup isn't enabled, snapshot and data deletion is very light on RAM  
 requirements, and generally won't need to do much (if any) disk I/O.   
 Such deletion should take milliseconds to a minute or so.

.. or hours. We've had problems on an old raidz2 where a recursive
snapshot creation over ~800 filesystems could take quite some time, up
until the SATA-SCSI disk box ate the pool. Now we're using raid10 on a
SCSI box, and it takes 3-15 minutes or so, during which sync writes (NFS)
are almost unusable. We're using 2 fast USB sticks as L2ARC and waiting for
a Vertex2EX and a Vertex3 to arrive for ZIL/L2ARC testing. I/O to the
filesystems is quite low (50 writes, 500k of data per second on average), but
snapshot times go way up during backups.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] Faster copy from UFS to ZFS

2011-05-06 Thread Ian Collins

 On 05/ 5/11 10:02 PM, Joerg Schilling wrote:

Ian Collins <i...@ianshome.com> wrote:


*ufsrestore works fine on ZFS filesystems (although I haven't tried it
with any POSIX ACLs on the original ufs filesystem, which would probably
simply get lost).

star -copy -no-fsync is typically 30% faster than ufsdump | ufsrestore.


Does it preserve ACLs?

Star supports ACLs from the withdrawn POSIX draft.

So star would work moving data from UFS to ZFS, assuming it uses 
acl_get/set to read and write the ACLs.



Star would already support ZFS ACLs if Sun had offered a correctly working
ACL support library when they introduced ZFS ACLs. Unfortunately, it took some
time until this library was fixed, and since then I have had other projects
that took my time. ZFS ACLs are not forgotten, however.

Um, I thought the acl_totext and acl_fromtext functions had been around 
for many years.


--
Ian.



Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-06 Thread Edward Ned Harvey
 From: Richard Elling [mailto:richard.ell...@gmail.com]
 
  --- To calculate size of DDT ---
   zdb -S poolname
Look at total blocks allocated.  It is rounded and uses a suffix like K, M, or
G, but in decimal (powers of 10) notation, so you have to remember that...  I
prefer the zdb -D method below, but this works too.  Total blocks allocated *
memory requirement per DDT entry gives you the memory needed to hold the whole
DDT in RAM.


   zdb -DD poolname
This gives you both the -S output and the -D output in one go.  So I recommend
using -DD and basing your calculations on #duplicate and #unique, as mentioned
below.  Consider the histogram to be informational.

   zdb -D poolname
This gives you the number of duplicate blocks and the number of unique blocks.
Add them to get the total number of blocks.  Multiply by the memory requirement
per DDT entry, and you have the memory requirement to hold the whole DDT in RAM.
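To make that concrete, here is a rough worked example (the block counts are
made-up numbers, and the ~320 bytes per DDT entry is just the figure commonly
quoted on this list; check the in-core sizes that zdb -DD reports on your own
build):

   # hypothetical numbers for illustration only
   duplicate=5000000          # duplicate blocks reported by zdb -D
   unique=10000000            # unique blocks reported by zdb -D
   bytes_per_entry=320        # assumed in-core size of one DDT entry
   echo $(( (duplicate + unique) * bytes_per_entry / 1024 / 1024 )) MB
   # prints roughly 4577 MB of ARC/L2ARC needed just for the DDT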



Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-06 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
 
  zdb -DD poolname
 This just gives you the -S output, and the -D output all in one go.  So I

Sorry, zdb -DD only works for pools that are already dedup'd.
If you want to get a measurement for a pool that is not already dedup'd, you
have to use -S



[zfs-discuss] Recommended eSATA PCI cards

2011-05-06 Thread Rich Teer
Hi all,

I'm looking at replacing my old D1000 array with some new external drives,
most likely these: http://www.g-technology.com/products/g-drive.cfm .  In
the immediate term, I'm planning to use USB 2.0 connections, but the drive
I'm considering also supports eSATA, which is MUCH faster than USB, but
also (I think, please correct me if I'm wrong) more reliable.

Neither of the machines I'll be using as my server (currently an SB1000 but
will be an Ultra 20 M2 soon; this is my home network, very light workload)
has an integrated eSATA port, so I must turn to add-on PCI cards.  What are
people recommending?  I need to attach at least two drives (I'll be mirroring
them), preferably three or more.

The machines are currently running SXCE snv_b130, with an upgrade to Solaris
Express 11 not too far away.

Thanks!

-- 
Rich Teer, Publisher
Vinylphile Magazine

www.vinylphilemag.com


Re: [zfs-discuss] Extremely Slow ZFS Performance

2011-05-06 Thread Garrett D'Amore
Sounds like a nasty bug, and not one I've seen in illumos or
NexentaStor.  What build are you running?

- Garrett

On Wed, 2011-05-04 at 15:40 -0700, Adam Serediuk wrote:
 Dedup is disabled (confirmed to be). Doing some digging, it looks like this is
 a very similar issue to
 http://forums.oracle.com/forums/thread.jspa?threadID=2200577&tstart=0.
 
 
 
 On May 4, 2011, at 2:26 PM, Garrett D'Amore wrote:
 
  My first thought is dedup... perhaps you've got dedup enabled and
  the DDT no longer fits in RAM?  That would create a huge performance
  cliff.
  
  -Original Message-
  From: zfs-discuss-boun...@opensolaris.org on behalf of Eric D.
  Mudama
  Sent: Wed 5/4/2011 12:55 PM
  To: Adam Serediuk
  Cc: zfs-discuss@opensolaris.org
  Subject: Re: [zfs-discuss] Extremely Slow ZFS Performance
  
  On Wed, May  4 at 12:21, Adam Serediuk wrote:
  Both iostat and zpool iostat show very little to zero load on the
  devices even while blocking.
  
  Any suggestions on avenues of approach for troubleshooting?
  
  is 'iostat -en' error free?
  
  
  --
  Eric D. Mudama
  edmud...@bounceswoosh.org
  




Re: [zfs-discuss] Deduplication Memory Requirements

2011-05-06 Thread Ray Van Dolson
On Wed, May 04, 2011 at 08:49:03PM -0700, Edward Ned Harvey wrote:
  From: Tim Cook [mailto:t...@cook.ms]
  
  That's patently false.  VM images are the absolute best use-case for dedup
  outside of backup workloads.  I'm not sure who told you/where you got the
  idea that VM images are not ripe for dedup, but it's wrong.
 
 Well, I got that idea from this list.  I said a little bit about why I
 believed it was true ... about dedup being ineffective for VM's ... Would
 you care to describe a use case where dedup would be effective for a VM?  Or
 perhaps cite something specific, instead of just wiping the whole thing and
 saying patently false?  I don't feel like this comment was productive...
 

We use dedupe on our VMware datastores and typically see 50% savings, often
more.  We do of course keep like VMs on the same volume (at this point nothing
more than groups of Windows VMs, Linux VMs, and so on).

Note that this isn't on ZFS (yet), but we hope to begin experimenting
with it soon (using NexentaStor).

Apologies for devolving the conversation too much in the NetApp
direction -- simply was a point of reference for me to get a better
understanding of things on the ZFS side. :)

Ray


Re: [zfs-discuss] Recommended eSATA PCI cards

2011-05-06 Thread Mark Danico

 Hi Rich,
With the Ultra 20M2 there is a very cheap/easy alternative
that might work for you (until you need to expand past 2
more external devices anyway)

Pick up an eSATA pci bracket cable adapter, something like this-
http://www.newegg.com/Product/Product.aspx?Item=N82E16812226003cm_re=eSATA-_-12-226-003-_-Product
(I haven't used this specific product but it was the first example I found)

The U20M2 has slots for just 2 internal SATA drives, but the
motherboard has a total of 4 SATA connectors, so two of them
normally go unused. Connect those two spare connectors to the
bracket, and connect your external eSATA enclosures to the
bracket's ports. You'll get two eSATA ports without needing to
use any PCI slots, and I believe that if you use the very bottom
PCI slot opening you won't even block any of the actual PCI
slots from future use.

-Mark D.



On 05/ 6/11 12:04 PM, Rich Teer wrote:

Hi all,

I'm looking at replacing my old D1000 array with some new external drives,
most likely these: http://www.g-technology.com/products/g-drive.cfm .  In
the immediate term, I'm planning to use USB 2.0 connections, but the drive
I'm considering also supports eSATA, which is MUCH faster than USB, but
also (I think, please correct me if I'm wrong) more reliable.

Neither of the machines I'll be using as my server (currently an SB1000 but
will be an Ultra 20 M2 soon; this is my home network, very light workload)
has an integrated eSATA port, so I must turn to add-on PCI cards.  What are
people recommending?  I need to attach at least two drives (I'll be mirroring
them), preferably three or more.

The machines are currently running SXCE snv_b130, with an upgrade to Solaris
Express 11 not too far away.

Thanks!





Re: [zfs-discuss] Deduplication Memory Requirements

2011-05-06 Thread Brandon High
On Fri, May 6, 2011 at 9:15 AM, Ray Van Dolson rvandol...@esri.com wrote:

 We use dedupe on our VMware datastores and typically see 50% savings,
 often times more.  We do of course keep like VM's on the same volume

I think NetApp uses 4k blocks by default, so the block size and
alignment should match up for most filesystems and yield better
savings.

Your server's resource requirements for ZFS and dedup will be much
higher due to the large DDT, as you initially suspected.

If bp_rewrite is ever completed and released, this might change. It
should allow for offline dedup, which may make dedup usable in more
situations.

 Apologies for devolving the conversation too much in the NetApp
 direction -- simply was a point of reference for me to get a better
 understanding of things on the ZFS side. :)

It's good to compare the two, since they have a pretty large overlap
in functionality but sometimes very different implementations.

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-06 Thread Yaverot
One of the quoted participants is Richard Elling, the other is Edward Ned 
Harvey, but my quoting was screwed up enough that I don't know which is which.  
Apologies.

 zdb -DD poolname
 This just gives you the -S output, and the -D output all in one go.  So I

Sorry, zdb -DD only works for pools that are already dedup'd.
If you want to get a measurement for a pool that is not already dedup'd, you 
have to use -S

And since zdb -S runs for 2 hours and dumps core (without results), the correct 
answer remains:
zdb -bb poolname | grep 'bp count'
as was given in the summary.
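For what it's worth, that bp count can be fed into the same back-of-the-envelope
math (a made-up count and the commonly quoted ~320 bytes per DDT entry; the bp
count also includes metadata blocks, so treat the result as a rough upper
bound):

   zdb -bb poolname | grep 'bp count'   # suppose it reports a bp count of 15000000
   echo $(( 15000000 * 320 / 1024 / 1024 )) MB   # roughly 4577 MB to hold the DDT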

The theoretical output of zdb -S may be superior if you have a version that
works, but I haven't seen anyone mention on-list which version(s) that is, or
if/how it can be obtained, short of recompiling it yourself.



Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-06 Thread Richard Elling
On May 6, 2011, at 3:24 AM, Erik Trimble erik.trim...@oracle.com wrote:

 On 5/6/2011 1:37 AM, casper@oracle.com wrote:
 On 06-05-11 05:44, Richard Elling wrote:
 As the size of the data grows, the need to have the whole DDT in RAM or L2ARC
 decreases. With one notable exception: destroying a dataset or snapshot
 requires the DDT entries for the destroyed blocks to be updated. This is why
 people can go for months or years and not see a problem, until they try to
 destroy a dataset.
 So what you're saying is: with your RAM-starved system, don't even try to
 start using snapshots. Right?
 
 I think it's more like don't use dedup when you don't have RAM.
 
 (It is not possible to not use snapshots in Solaris; they are used for
 everything)

:-)

 
 Casper
 
 Casper and Richard are correct - RAM starvation seriously impacts snapshot or 
 dataset deletion when a pool has dedup enabled.  The reason behind this is 
 that ZFS needs to scan the entire DDT to check to see if it can actually 
 delete each block in the to-be-deleted snapshot/dataset, or if it just needs 
 to update the dedup reference count.

AIUI, the issue is not that the DDT is scanned; it is an AVL tree for a reason.
The issue is that each reference update means that one small bit of data is
changed. If the reference is not already in the ARC, then a small, probably
random read is needed. If you have a typical consumer disk, especially a green
disk, and have not tuned zfs_vdev_max_pending, then that itty-bitty read can
easily take more than 100 milliseconds(!) Consider that you can have thousands
or millions of reference updates to do during a zfs destroy, and the math gets
ugly. This is why fast SSDs make good dedup candidates.
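To put rough numbers on that (made-up figures, using the ~100 ms worst-case
read from above):

   updates=2000000      # DDT reference updates needed by the destroy
   ms_per_read=100      # worst-case random read on an untuned green disk
   echo $(( updates * ms_per_read / 1000 / 3600 )) hours   # about 55 hours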

 If it can't store the entire DDT in either the ARC or L2ARC, it will be 
 forced to do considerable I/O to disk, as it brings in the appropriate DDT 
 entry.   Worst case for insufficient ARC/L2ARC space can increase deletion 
 times by many orders of magnitude. E.g. days, weeks, or even months to do a 
 deletion.

I've never seen months, but I have seen days, especially for low-perf disks.

 
 If dedup isn't enabled, snapshot and data deletion is very light on RAM 
 requirements, and generally won't need to do much (if any) disk I/O.  Such 
 deletion should take milliseconds to a minute or so.

Yes, perhaps a bit longer for recursive destruction, but everyone here knows 
recursion is evil, right? :-)
 -- richard



Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-06 Thread Erik Trimble

On 5/6/2011 5:46 PM, Richard Elling wrote:

On May 6, 2011, at 3:24 AM, Erik Trimble <erik.trim...@oracle.com> wrote:


Casper and Richard are correct - RAM starvation seriously impacts snapshot or 
dataset deletion when a pool has dedup enabled.  The reason behind this is that 
ZFS needs to scan the entire DDT to check to see if it can actually delete each 
block in the to-be-deleted snapshot/dataset, or if it just needs to update the 
dedup reference count.

AIUI, the issue is not that the DDT is scanned; it is an AVL tree for a reason.
The issue is that each reference update means that one small bit of data is
changed. If the reference is not already in the ARC, then a small, probably
random read is needed. If you have a typical consumer disk, especially a green
disk, and have not tuned zfs_vdev_max_pending, then that itty-bitty read can
easily take more than 100 milliseconds(!) Consider that you can have thousands
or millions of reference updates to do during a zfs destroy, and the math gets
ugly. This is why fast SSDs make good dedup candidates.

Just out of curiosity - I'm assuming that a delete works like this:

(1) find list of blocks associated with file to be deleted
(2) using the DDT, find out if any other files are using those blocks
(3) delete/update any metadata associated with the file (dirents, ACLs, etc.)
(4) for each block in the file:
    (4a) if the DDT indicates there ARE other files using this block,
         update the DDT entry to change the refcount
    (4b) if the DDT indicates there AREN'T any other files, move the
         physical block to the free list, and delete the DDT entry



In a bulk delete scenario (not just snapshot deletion), I'd presume #1 above
almost always causes random I/O to disk, as the relevant metadata for every
to-be-deleted file is unlikely to be stored in the ARC.  If you can't fit the
DDT in ARC/L2ARC, #2 above would require you to pull in the remainder of the
DDT info from disk, right?  #3 and #4 can be batched up, so they don't hurt
that much.


Is that a (roughly) correct deletion methodology? Or can someone give a 
more accurate view of what's actually going on?





If it can't store the entire DDT in either the ARC or L2ARC, it will be forced 
to do considerable I/O to disk, as it brings in the appropriate DDT entry.   
Worst case for insufficient ARC/L2ARC space can increase deletion times by many 
orders of magnitude. E.g. days, weeks, or even months to do a deletion.

I've never seen months, but I have seen days, especially for low-perf disks.

I've seen an estimate of 5 weeks for removing a snapshot on a 1TB dedup pool
made up of 1 disk.


Not an optimal set up.

:-)


If dedup isn't enabled, snapshot and data deletion is very light on RAM 
requirements, and generally won't need to do much (if any) disk I/O.  Such 
deletion should take milliseconds to a minute or so.

Yes, perhaps a bit longer for recursive destruction, but everyone here knows 
recursion is evil, right? :-)
  -- richard
You, my friend, have obviously never worshipped at the Temple of the Lambda
Calculus, nor been exposed to the Holy Writ that is Structure and
Interpretation of Computer Programs
(http://mitpress.mit.edu/sicp/full-text/book/book.html).


I sentence you to a semester of 6.001 problem sets, written by Prof 
Sussman sometime in the 1980s.


(yes, I went to MIT.)

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
