[zfs-discuss] deduplication requirements

2011-02-07 Thread Michael
Hi guys,

I'm currently running 2 zpools, each in a raidz1 configuration, totalling
around 16TB of usable data. I'm running it all on an OpenSolaris-based box
with 2GB of memory and an old Athlon 64 3700 CPU. I understand this is very
poor and underpowered for deduplication, so I'm looking at building a new
system, but wanted some advice first. Here is what I've planned so far:

Core i7 2600 CPU
16gb DDR3 Memory
64GB SSD for ZIL (optional)

Would this produce decent results for deduplication of 16TB worth of pools
or would I need more RAM still?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] deduplication requirements

2011-02-07 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Michael
>
> Core i7 2600 CPU
> 16gb DDR3 Memory
> 64GB SSD for ZIL (optional)
>
> Would this produce decent results for deduplication of 16TB worth of pools
> or would I need more RAM still?

What matters is the amount of unique data in your pool.  I'll just assume
it's all unique, but of course that's ridiculous because if it's all unique
then why would you want to enable dedup.  But anyway, I'm assuming 16T of
unique data.  

The rule is a little less than 3G of ram for every 1T of unique data.  In
your case, 16*2.8 = 44.8G ram required in addition to your base ram
configuration.  You need at least 48G of ram.  Or less unique data.
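
As a quick check of the arithmetic above, here is a minimal Python sketch. The
2.8 figure is simply the rule of thumb quoted in this message; the function and
constant names are hypothetical and chosen only for illustration.

```python
# Rule of thumb from this message: roughly 2.8 GB of RAM per TB of unique data.
# This is an estimate, not a measured value for any particular pool.
GB_RAM_PER_TB_UNIQUE = 2.8


def dedup_ram_gb(unique_tb, gb_per_tb=GB_RAM_PER_TB_UNIQUE):
    """Extra RAM (in GB) for dedup, on top of the base RAM configuration."""
    return unique_tb * gb_per_tb


if __name__ == "__main__":
    # 16 TB is the fully-unique worst case; smaller values show how the
    # requirement shrinks as the amount of unique data drops.
    for unique_tb in (16, 8, 4):
        extra = dedup_ram_gb(unique_tb)
        print(f"{unique_tb} TB unique -> {extra:.1f} GB extra RAM")
```

The estimate scales linearly, so halving the unique data halves the extra RAM,
which is the "or less unique data" alternative.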



Re: [zfs-discuss] deduplication requirements

2011-02-07 Thread Erik Trimble

On 2/7/2011 1:06 PM, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Michael
>>
>> Core i7 2600 CPU
>> 16gb DDR3 Memory
>> 64GB SSD for ZIL (optional)
>>
>> Would this produce decent results for deduplication of 16TB worth of pools
>> or would I need more RAM still?
>
> What matters is the amount of unique data in your pool.  I'll just assume
> it's all unique, but of course that's ridiculous because if it's all unique
> then why would you want to enable dedup.  But anyway, I'm assuming 16T of
> unique data.
>
> The rule is a little less than 3G of ram for every 1T of unique data.  In
> your case, 16*2.8 = 44.8G ram required in addition to your base ram
> configuration.  You need at least 48G of ram.  Or less unique data.


To follow up on Ned's estimation, please let us know what kind of data 
you're planning on putting in the Dedup'd zpool. That can really give us 
a better idea as to the number of slabs that the pool will have, which 
is what drives dedup RAM and L2ARC usage.
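
A rough sketch of why the block count matters, under three assumptions that are
not stated in this thread: that "slabs" here means ZFS blocks (records), that
the dedup table holds one entry per unique block, and that each entry costs
about 320 bytes of RAM. The 320-byte figure and all names below are
illustrative only.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3

# Assumption for illustration only: in-RAM cost of one dedup table entry.
ASSUMED_ENTRY_BYTES = 320


def ddt_ram_bytes(unique_bytes, avg_block_bytes, bytes_per_entry=ASSUMED_ENTRY_BYTES):
    """Approximate dedup table size: one entry per unique block."""
    unique_blocks = -(-unique_bytes // avg_block_bytes)  # ceiling division
    return unique_blocks * bytes_per_entry


# Same 16 TiB of unique data, stored with different average block sizes.
for block_kib in (128, 64, 8):
    ram = ddt_ram_bytes(16 * TIB, block_kib * 1024)
    print(f"{block_kib:>3} KiB blocks -> {ram / GIB:.0f} GiB for the dedup table")
```

Under these assumptions, 16 TiB of unique data in 128 KiB blocks comes to
40 GiB, while the same data in 8 KiB blocks comes to 640 GiB. That is why the
kind of data, and hence the typical block size, changes the answer so much.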


You also want to use an SSD for L2ARC, NOT for ZIL (though, you *might* 
also want one for ZIL, depending on your write patterns).



In all honesty, these days, it doesn't pay to dedup a pool unless you 
can count on large amounts of common data.  Virtual Machine images, 
incremental backups, ISO images of data CD/DVDs, and some Video are your 
best bet. Pretty much everything else is going to cost you more in 
RAM/L2ARC than it's worth.



IMHO, you don't want Dedup unless you can *count* on a 10x savings factor.
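
To make "savings factor" concrete, here is a minimal sketch that estimates it
for a byte string by splitting it into fixed-size blocks and comparing SHA-256
digests. It is an illustration of the idea, not how ZFS itself is implemented;
the 128 KiB default block size and the function name are assumptions.

```python
import hashlib


def dedup_ratio(data: bytes, block_size: int = 128 * 1024) -> float:
    """Total blocks divided by unique blocks, comparing blocks by SHA-256."""
    digests = [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]
    if not digests:
        return 1.0  # empty input: nothing to save
    return len(digests) / len(set(digests))


if __name__ == "__main__":
    block = 4096
    # Ten identical blocks plus ten distinct blocks: 20 total, 11 unique.
    sample = b"A" * block * 10 + b"".join(bytes([i]) * block for i in range(10))
    print(f"savings factor: {dedup_ratio(sample, block):.2f}x")
```

A 10x factor would mean that, on average, every unique block is referenced ten
times, which is plausible for many near-identical VM images but not for
general file storage.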


Also, for reasons discussed here before, I would not recommend a Core i7 
for use as a fileserver CPU. It's an Intel Desktop CPU, and almost 
certainly won't support ECC RAM on your motherboard, and it's seriously 
overpowered for your use.


See if you can find a nice socket AM3+ motherboard for a low-range 
Athlon X3/X4.  You can get ECC RAM for it (even in a desktop 
motherboard), it will cost less, and perform at least as well.


Dedup is not CPU intensive. Compression is, and you may very well want 
to enable that, but you're still very unlikely to hit a CPU bottleneck 
before RAM starvation or disk wait occurs.


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] deduplication requirements

2011-02-07 Thread taemun
On 6 February 2011 01:34, Michael michael.armstr...@gmail.com wrote:

> Hi guys,
>
> I'm currently running 2 zpools, each in a raidz1 configuration, totalling
> around 16TB of usable data. I'm running it all on an OpenSolaris-based box
> with 2GB of memory and an old Athlon 64 3700 CPU. I understand this is very
> poor and underpowered for deduplication, so I'm looking at building a new
> system, but wanted some advice first. Here is what I've planned so far:
>
> Core i7 2600 CPU
> 16gb DDR3 Memory
> 64GB SSD for ZIL (optional)


http://ark.intel.com/Product.aspx?id=52213

The desktop Core i* range doesn't support ECC ram at all, this could
potentially be a pool breaker if you get a flipped bit in the wrong place (a
significant metadata block). Just something to keep in mind. Also, Intel have
issued a recall (ish) for all of the 6 series chipsets released so far, the
PLL unit for the 3gbit SATA ports on the chipset is driven too hard and will
likely degrade over time (5~15% failure rate over three years). They are
talking about a March~April time to fix in the channel. If you don't plan on
using the 3gbit SATA ports, then you're fine.

Intel will make 1155 Xeons at some point, i.e.
http://en.wikipedia.org/wiki/List_of_future_Intel_microprocessors#.22Sandy_Bridge.22_.2832_nm.29_8
They support ECC (just check for a specific QVL after launch, DDR3 ECC
isn't necessarily the only thing you need to look for). I think the Feb 20
release date may have been pushed for the chipset respin.

Cheers,