Re: [zfs-discuss] Migrating zpool to new drives with 4K Sectors

2011-01-06 Thread taemun
zpool replace will copy the data across onto the new disk with the same old
ashift=9, whereas you want ashift=12 for 4KB-sector drives. (sector size = 2^ashift)

You'd need to make a new pool (or add a vdev to an existing pool) with the
modified tools in order to get proper performance out of 4KB drives.
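For reference, later OpenZFS builds let you force the alignment at creation time
with an ashift property; the stock 2011-era Solaris tools discussed in this thread
do not, which is why a patched zpool binary is needed. A minimal sketch of the
migration (pool and device names are hypothetical):

    # create the new pool forcing 4K alignment (OpenZFS-style; the stock
    # 2011 tools need the patched binary instead)
    zpool create -o ashift=12 newtank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
        c1t4d0 c1t5d0 c1t6d0 c1t7d0

    # copy the data over, e.g. with a recursive snapshot and send/receive
    zfs snapshot -r tank@migrate
    zfs send -R tank@migrate | zfs receive -Fd newtank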

On 7 January 2011 17:43, Matthew Angelo  wrote:

> Hi ZFS Discuss,
>
> I have an 8x 1TB RAIDZ running on Samsung 1TB 5400rpm drives with
> 512-byte sectors.
>
> I will be replacing all of these with 8x Western Digital 2TB drives
> with support for 4K sectors.  The replacement plan is to swap out
> each of the 8 drives until all are replaced and the new size (~16TB)
> is available with a `zpool scrub`.
>
> My question is, how do I do this and also factor in the new 4K sector
> size?  Or should I find a 2TB drive that still uses 512-byte sectors?
>
>
> Thanks
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Migrating zpool to new drives with 4K Sectors

2011-01-06 Thread Matthew Angelo
Hi ZFS Discuss,

I have an 8x 1TB RAIDZ running on Samsung 1TB 5400rpm drives with 512-byte sectors.

I will be replacing all of these with 8x Western Digital 2TB drives
with support for 4K sectors.  The replacement plan is to swap out
each of the 8 drives until all are replaced and the new size (~16TB)
is available with a `zpool scrub`.

My question is, how do I do this and also factor in the new 4K sector
size?  Or should I find a 2TB drive that still uses 512-byte sectors?


Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Michael DeMan
At the end of the day this issue essentially is about mathematical 
improbability versus certainty?

To be quite honest, I too am skeptical about using de-dupe based on SHA256 
alone.  In prior posts the potential adopter of the technology was asked to 
provide the mathematical reason NOT to use SHA-256 only.  However, if Oracle 
believes that it is adequate to do that, would it be possible for somebody to 
provide:

(A) The theoretical documents and associated mathematics specific to say one 
simple use case?
(A1) Total data size is 1PB (let's say the zpool is 2PB so we don't have to worry 
about that part of it).
(A2) Daily, 10TB of data is updated, 1TB of data is deleted, and 1TB of data is 
'new'.
(A3) Out of the dataset, 25% of the data is capable of being de-duplicated
(A4) Between A2 and A3 above, the 25% rule from A3 also applies to everything 
in A2.


I think the above would be a pretty 'soft' case for justifying that SHA-256 
alone works?  I would presume some simple scenario like this was run through the 
math long ago by somebody inside Oracle/Sun when first proposing that ZFS be 
funded internally at all?
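As a rough back-of-the-envelope for the scenario above (assuming, purely for 
illustration, a 128KB average block size - nothing in A1-A4 pins one down):

    blocks in 1PB:      n  = 2^50 / 2^17 = 2^33   (~8.6e9 blocks)
    accidental collision anywhere in the pool:
                        P ~= (n^2 / 2) * 2^-256 = 2^65 * 2^-256 = 2^-191 ~= 3e-58
    worst case, 4KB blocks:  n = 2^38,  P ~= 2^-181 ~= 3e-55

The daily churn in A2 barely moves those exponents, so the 'soft case' really 
comes down to whether numbers of that order satisfy the business.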


Then there is the other side of things: the 'black swan' event.  At some 
point, given the percentages on a scenario like the example above, one simply 
has to make the business-justification case internally at one's own company 
about whether to go SHA-256 only or Fletcher+Verification.  Add Murphy's Law to 
the 'black swan' event and of course the only data that is lost is the .01% of 
your data that is the most critical?



Not trying to be aggressive or combative here at all against people's opinions 
and understandings of it all - I would just like to see some hard information 
about it; it must exist somewhere already?

Thanks,
 
- Mike




On Jan 6, 2011, at 10:05 PM, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Peter Taps
>> 
>> Perhaps (Sha256+NoVerification) would work 99.99% of the time. But
> 
> Append 50 more 9's on there. 
> 99.9999999999999999999999999999999999999999999999999999%
> 
> See below.
> 
> 
>> I have been told that the checksum value returned by Sha256 is almost
>> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a
>> bigger problem such as memory corruption, etc. Essentially, adding
>> verification to sha256 is an overkill.
> 
> Someone please correct me if I'm wrong.  I assume ZFS dedup matches both the
> blocksize and the checksum right?  A simple checksum collision (which is
> astronomically unlikely) is still not sufficient to produce corrupted data.
> It's even more unlikely than that.
> 
> Using the above assumption, here's how you calculate the probability of
> corruption if you're not using verification:
> 
> Suppose every single block in your whole pool is precisely the same size
> (which is unrealistic in the real world, but I'm trying to calculate worst
> case.)  Suppose the block is 4K, which is again, unrealistically worst case.
> Suppose your dataset is purely random or sequential ... with no duplicated
> data ... which is unrealistic because if your data is like that, then why in
> the world are you enabling dedupe?  But again, assuming worst case
> scenario...  At this point we'll throw in some evil clowns, spit on a voodoo
> priestess, and curse the heavens for some extra bad luck.
> 
> If you have astronomically infinite quantities of data, then your
> probability of corruption approaches 100%.  With infinite data, eventually
> you're guaranteed to have a collision.  So the probability of corruption is
> directly related to the total amount of data you have, and the new question
> is:  For anything Earthly, how near are you to 0% probability of collision
> in reality?
> 
> Suppose you have 128TB of data.  That is ...  you have 2^35 unique 4k blocks
> of uniformly sized data.  Then the probability you have any collision in
> your whole dataset is (sum(1 thru 2^35))*2^-256 
> Note: sum of integers from 1 to N is  (N*(N+1))/2
> Note: 2^35 * (2^35+1) = 2^35 * 2^35 + 2^35 = 2^70 + 2^35
> Note: (N*(N+1))/2 in this case = 2^69 + 2^34
> So the probability of data corruption in this case, is 2^-187 + 2^-222 ~=
> 5.1E-57 + 1.5E-67
> 
> ~= 5.1E-57
> 
> In other words, even in the absolute worst case, cursing the gods, running
> without verification, using data that's specifically formulated to try and
> cause errors, on a dataset that I bet is larger than what you're doing, ...
> 
> Before we go any further ... The total number of bits stored on all the
> storage in the whole planet is a lot smaller than the total number of
> molecules in the planet.
> 
> There are an estimated 8.87 * 10^49 molecules in planet Earth.
> 
> The probability of a collision in your worst-case unrealistic dataset as
> described, is even 100 million times less likely than randomly finding a
> single specific molecule in the whole planet Earth

Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Michael Sullivan
Ed, with all due respect to your math,

I've seen rsync bomb due to an SHA256 collision, so I know it can and does 
happen.

I respect my data, so even with checksumming and comparing the block size, I'll 
still do a full comparison check if those two match.  Otherwise you can end up 
with silent data corruption, which could affect you in so many ways.

Do you want to stake your career and reputation on that?  With a client or 
employer's data? I sure don't.

"Those who walk on the razor's edge are destined to be cut to ribbons…" Someone 
I used to work with said that, not me.

For my home media server, maybe, but even then I'd hate to lose any of my 
family photos or video due to a hash collision.

I'll play it safe if I dedup.

Mike

---
Michael Sullivan   
michael.p.sulli...@me.com
http://www.kamiogi.net/
Mobile: +1-662-202-7716
US Phone: +1-561-283-2034
JP Phone: +81-50-5806-6242

On 7 Jan 2011, at 00:05 , Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Peter Taps
>> 
>> Perhaps (Sha256+NoVerification) would work 99.99% of the time. But
> 
> Append 50 more 9's on there. 
> 99.9999999999999999999999999999999999999999999999999999%
> 
> See below.
> 
> 
>> I have been told that the checksum value returned by Sha256 is almost
>> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a
>> bigger problem such as memory corruption, etc. Essentially, adding
>> verification to sha256 is an overkill.
> 
> Someone please correct me if I'm wrong.  I assume ZFS dedup matches both the
> blocksize and the checksum right?  A simple checksum collision (which is
> astronomically unlikely) is still not sufficient to produce corrupted data.
> It's even more unlikely than that.
> 
> Using the above assumption, here's how you calculate the probability of
> corruption if you're not using verification:
> 
> Suppose every single block in your whole pool is precisely the same size
> (which is unrealistic in the real world, but I'm trying to calculate worst
> case.)  Suppose the block is 4K, which is again, unrealistically worst case.
> Suppose your dataset is purely random or sequential ... with no duplicated
> data ... which is unrealistic because if your data is like that, then why in
> the world are you enabling dedupe?  But again, assuming worst case
> scenario...  At this point we'll throw in some evil clowns, spit on a voodoo
> priestess, and curse the heavens for some extra bad luck.
> 
> If you have astronomically infinite quantities of data, then your
> probability of corruption approaches 100%.  With infinite data, eventually
> you're guaranteed to have a collision.  So the probability of corruption is
> directly related to the total amount of data you have, and the new question
> is:  For anything Earthly, how near are you to 0% probability of collision
> in reality?
> 
> Suppose you have 128TB of data.  That is ...  you have 2^35 unique 4k blocks
> of uniformly sized data.  Then the probability you have any collision in
> your whole dataset is (sum(1 thru 2^35))*2^-256 
> Note: sum of integers from 1 to N is  (N*(N+1))/2
> Note: 2^35 * (2^35+1) = 2^35 * 2^35 + 2^35 = 2^70 + 2^35
> Note: (N*(N+1))/2 in this case = 2^69 + 2^34
> So the probability of data corruption in this case, is 2^-187 + 2^-222 ~=
> 5.1E-57 + 1.5E-67
> 
> ~= 5.1E-57
> 
> In other words, even in the absolute worst case, cursing the gods, running
> without verification, using data that's specifically formulated to try and
> cause errors, on a dataset that I bet is larger than what you're doing, ...
> 
> Before we go any further ... The total number of bits stored on all the
> storage in the whole planet is a lot smaller than the total number of
> molecules in the planet.
> 
> There are an estimated 8.87 * 10^49 molecules in planet Earth.
> 
> The probability of a collision in your worst-case unrealistic dataset as
> described, is even 100 million times less likely than randomly finding a
> single specific molecule in the whole planet Earth by pure luck.
> 
> 
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Peter Taps
> 
> Perhaps (Sha256+NoVerification) would work 99.99% of the time. But

Append 50 more 9's on there. 
99.9999999999999999999999999999999999999999999999999999%

See below.


> I have been told that the checksum value returned by Sha256 is almost
> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a
> bigger problem such as memory corruption, etc. Essentially, adding
> verification to sha256 is an overkill.

Someone please correct me if I'm wrong.  I assume ZFS dedup matches both the
blocksize and the checksum right?  A simple checksum collision (which is
astronomically unlikely) is still not sufficient to produce corrupted data.
It's even more unlikely than that.

Using the above assumption, here's how you calculate the probability of
corruption if you're not using verification:

Suppose every single block in your whole pool is precisely the same size
(which is unrealistic in the real world, but I'm trying to calculate worst
case.)  Suppose the block is 4K, which is again, unrealistically worst case.
Suppose your dataset is purely random or sequential ... with no duplicated
data ... which is unrealistic because if your data is like that, then why in
the world are you enabling dedupe?  But again, assuming worst case
scenario...  At this point we'll throw in some evil clowns, spit on a voodoo
priestess, and curse the heavens for some extra bad luck.

If you have astronomically infinite quantities of data, then your
probability of corruption approaches 100%.  With infinite data, eventually
you're guaranteed to have a collision.  So the probability of corruption is
directly related to the total amount of data you have, and the new question
is:  For anything Earthly, how near are you to 0% probability of collision
in reality?

Suppose you have 128TB of data.  That is ...  you have 2^35 unique 4k blocks
of uniformly sized data.  Then the probability you have any collision in
your whole dataset is (sum(1 thru 2^35))*2^-256 
Note: sum of integers from 1 to N is  (N*(N+1))/2
Note: 2^35 * (2^35+1) = 2^35 * 2^35 + 2^35 = 2^70 + 2^35
Note: (N*(N+1))/2 in this case = 2^69 + 2^34
So the probability of data corruption in this case, is 2^-187 + 2^-222 ~=
5.1E-57 + 1.5E-67

~= 5.1E-57

In other words, even in the absolute worst case, cursing the gods, running
without verification, using data that's specifically formulated to try and
cause errors, on a dataset that I bet is larger than what you're doing, ...

Before we go any further ... The total number of bits stored on all the
storage in the whole planet is a lot smaller than the total number of
molecules in the planet.

There are an estimated 8.87 * 10^49 molecules in planet Earth.

The probability of a collision in your worst-case unrealistic dataset as
described, is still millions of times less likely than randomly finding a
single specific molecule in the whole planet Earth by pure luck.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2011-01-06 Thread Jeff Bacon
> From: Edward Ned Harvey
> 
> > From: Khushil Dep [mailto:khushil@gmail.com]
> >
> > I've deployed large SAN's on both SuperMicro 825/826/846 and Dell
> > R610/R710's and I've not found any issues so far. I always make a
> point of
> > installing Intel chipset NIC's on the DELL's and disabling the
> Broadcom ones
> > but other than that it's always been plain sailing - hardware-wise
> anyway.
> 
> "not found any issues," "except the broadcom one which causes the
> system to crash regularly in the default factory configuration."
> 
> How did you learn about the broadcom issue for the first time?  I had
> to learn the hard way, and with all the involvement of both Dell and
> Oracle support teams, nobody could tell me what I needed to change.  We
> literally replaced every component of the server twice over a period of
> 1 year, and I spent mandays upgrading and downgrading firmwares
> randomly trying to find a stable configuration.  I scoured the internet
> to find this little tidbit about replacing the broadcom NIC, and
> randomly guessed, and replaced my nic with an intel card to make the
> problem go away.

20 years of doing this c*(# has taught me that most things only
get learned the hard way. I certainly won't bet my career solely
on the ability of the vendor to support the product, because they're
hardly omniscient. Testing, testing, and generous return policies
(and/or R&D budget) 

> The same system doesn't have a problem running RHEL/centos.

Then you're not pushing it hard enough, or your stars are just
aligned nicely.

We have massive piles of Dell hardware, all types. Running CentOS
since at least 4.5. Every single one of those Dells has an Intel
NIC in it, and the Broadcoms disabled in the BIOS. Because every
time we do something stupid like let ourselves think "oh, we could
maybe use those extra Broadcom ports for X", we get burned. 

High-volume financial trading system. Blew up on the bcoms.
Didn't matter what driver or tweak or fix. Plenty of man-days 
wasted debugging. Went with the advice from the net, put in Intel NICs.
No more problems. That was 3 years ago.  

Thought we could use the bcoms for our fileservers. Nope.

Thought we could use the bcoms for the dedicated drbd links
for our xen cluster. Nope. 

And we know we're not alone in this evaluation.

We could have spent forever chasing support to get someone
to "fix" it I suppose... but we have better things to do. 

> See my point?  Next time I buy a server, I do not have confidence to
> simply expect solaris on dell to work reliably.  The same goes for
> solaris derivatives, and all non-sun hardware.  There simply is not an
> adequate qualification and/or support process.

I'm not convinced ANYONE really has such a thing. Or that it's even
necessarily possible. 

In fact, I'm sure they don't. Cuz that's what it says in the fine
print on the support contracts and the purchase agreements - "we do
not guarantee..." 

I just prefer not to have any confidence for the most part.
It's easier and safer.

-bacon
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on top of ZFS iSCSI share

2011-01-06 Thread Edward Ned Harvey
> From: Brandon High [mailto:bh...@freaks.com]
> 
> On Thu, Jan 6, 2011 at 5:33 AM, Edward Ned Harvey
>  wrote:
> > But the conclusion remains the same:  Redundancy is not needed at the
> > client, because any data corruption the client could possibly see from
the
> > server would be transient and self-correcting.
> 
> Weren't you just chastising someone else for not using redundancy over
> iSCSI?

I wouldn't say chastising...  But yes.  But that was different.  The
difference is whether or not the iscsi target is using ZFS.  If the iscsi
target is a typical SAN made of typical hardware raid, then there is no
checksumming happening at the per-disk per-block level, and the raid
redundancy only protects against hardware-detected complete disk failure.
Any data corruption undetected by hardware is uncorrectable by software in
that case.

The situation is much better when your iscsi target is in fact a ZFS server.
Because if there's a checksum error on a disk, it's detected and correctable
by ZFS.  So the iscsi initiator will not see any corrupt data.

The point that I keep emphasizing is:  Let ZFS manage your raid.  No
hardware raid.  

As mentioned, sure there's always the possibility of an error being
introduced in the network between initiator & target, but ultimately the
nonvolatile storage is disk, which has good data.  So the possibility of
transient network errors, at least for me, is much less risky than the
possibility of undetected error on disk.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Nicolas Williams
On Thu, Jan 06, 2011 at 06:07:47PM -0500, David Magda wrote:
> On Jan 6, 2011, at 15:57, Nicolas Williams wrote:
> 
> > Fletcher is faster than SHA-256, so I think that must be what you're
> > asking about: "can Fletcher+Verification be faster than
> > Sha256+NoVerification?"  Or do you have some other goal?
> 
> > Would running on recent T-series servers, which have on-die
> crypto units, help any in this regard?

Yes, particularly for larger blocks.

Hash collisions don't matter as long as ZFS verifies dups, so the real
question is: what is the false positive dup rate (i.e., the accidental
collision rate).  But that's going to vary a lot by {hash function,
working data set}, thus it's not possible to make exact determinations,
just estimates.

For me the biggest issue is that as good as Fletcher is for a CRC, I'd
rather have a cryptographic hash function because I've seen incredibly
odd CRC failures before.  There's a famous case from within SWAN a few
years ago where a switch flipped pairs of bits such that all too often
the various CRCs that applied to the moving packets failed to detect the
bit flips, and we discovered this when an SCCS file in a clone of the ON
gate got corrupted.  Such failures (collisions) wouldn't affect dedup,
but they would mask corruption of non-deduped blocks.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread David Magda
On Jan 6, 2011, at 15:57, Nicolas Williams wrote:

> Fletcher is faster than SHA-256, so I think that must be what you're
> asking about: "can Fletcher+Verification be faster than
> Sha256+NoVerification?"  Or do you have some other goal?

Would running on recent T-series servers, which have on-die crypto units, 
help any in this regard?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Nicolas Williams
On Thu, Jan 06, 2011 at 11:44:31AM -0800, Peter Taps wrote:
> I have been told that the checksum value returned by Sha256 is almost
> guaranteed to be unique.

All hash functions are guaranteed to have collisions [for inputs larger
than their output anyways].

>  In fact, if Sha256 fails in some case, we
> have a bigger problem such as memory corruption, etc. Essentially,
> adding verification to sha256 is an overkill.

What makes a hash function cryptographically secure is not impossibility
of collisions, but computational difficulty of finding arbitrary
colliding input pairs, collisions for known inputs, second pre-images,
and first pre-images.  Just because you can't easily find collisions on
purpose doesn't mean that you can't accidentally find collisions.

That said, if the distribution of SHA-256 is even enough then your
chances of finding a collision by accident are so remote (one in 2^128)
that you could reasonably decide that you don't care.

> Perhaps (Sha256+NoVerification) would work 99.99% of the time. But
> (Fletcher+Verification) would work 100% of the time.

Fletcher is faster than SHA-256, so I think that must be what you're
asking about: "can Fletcher+Verification be faster than
Sha256+NoVerification?"  Or do you have some other goal?

Assuming I guessed correctly...  The speed of the hash function isn't
significant compared to the cost of the verification I/O, period, end of
story.  So, SHA-256 w/o verification will be faster than Fletcher +
Verification -- lots faster if you have particularly deduplicatious data
to write.  Moreover, SHA-256 + verification will likely be somewhat
faster than Fletcher + verification because SHA-256 will likely have
fewer collisions than Fletcher, and the cost of I/O dominates the cost
of the hash functions.

> Which one of the two is a better deduplication strategy?
> 
> If we do not use verification with Sha256, what is the worst case
> scenario? Is it just more disk space occupied (because of failure to
> detect duplicate blocks) or there is a chance of actual data
> corruption (because two blocks were assumed to be duplicate although
> they are not)?

If you don't verify then you run the risk of corruption on collision,
NOT the risk of using too much disk space.

> Or, if I go with (Sha256+Verification), how much is the overhead of
> verification on the overall process?
> 
> If I do go with verification, it seems (Fletcher+Verification) is more
> efficient than (Sha256+Verification). And both are 100% accurate in
> detecting duplicate blocks.

You're confused.  Fletcher may be faster to compute than SHA-256, but
the run-time of both is as nothing compared to latency of the disk I/O
needed for verification, which means that the hash function's rate of
collisions is more important than its computational cost.

(Now, Fletcher is thought to not be a cryptographically secure hash
function, while SHA-256 is, for now, considered cryptographically
secure.  That probably means that the distribution of Fletcher's outputs
over random inputs is not as even as that of SHA-256, which probably
means you can expect more collisions with Fletcher than with SHA-256.
Note that I made no absolute statements in the previous sentence --
that's because I've not read any studies of Fletcher's performance
relative to SHA-256, thus I'm not certain of anything stated in the
previous sentence.)

David Magda's advice is spot on.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Richard Elling
On Jan 6, 2011, at 11:44 AM, Peter Taps wrote:

> Folks,
> 
> I have been told that the checksum value returned by Sha256 is almost 
> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a 
> bigger problem such as memory corruption, etc. Essentially, adding 
> verification to sha256 is an overkill.

I disagree. I do not believe you can uniquely identify all possible 
permutations of 1 million
bits using only 256 bits.

> Perhaps (Sha256+NoVerification) would work 99.99% of the time. But 
> (Fletcher+Verification) would work 100% of the time.
> 
> Which one of the two is a better deduplication strategy?

If you love your data, always use verify=on

> If we do not use verification with Sha256, what is the worst case scenario? 
> Is it just more disk space occupied (because of failure to detect duplicate 
> blocks) or there is a chance of actual data corruption (because two blocks 
> were assumed to be duplicate although they are not)?

If you do not use verify=on, you risk repeatable data corruption.  

In some postings you will find claims of the odds being "1 in 2^256, +/-" for a 
collision.  This is correct.  However, those posts then compare this to the odds 
of a disk read error.  There is an important difference -- the disk error is 
likely to be noticed, but a collision is completely silent without the verify 
option.  This is why it is a repeatable problem, different from hardware failures, 
which are not repeatable.  Accepting repeatable and silent data corruption is a 
very bad tradeoff, IMNSHO.

> Or, if I go with (Sha256+Verification), how much is the overhead of 
> verification on the overall process?

In my experience, I see little chance that a verification will be used. As 
above,
you might run into a collision, but it will be rare.

> If I do go with verification, it seems (Fletcher+Verification) is more 
> efficient than (Sha256+Verification). And both are 100% accurate in detecting 
> duplicate blocks.

Yes.  Fletcher with verification will be more performant than sha-256.
However, that option is not available in the Solaris releases.
 -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Robert Milkowski

On 01/06/11 07:44 PM, Peter Taps wrote:

> Folks,
>
> I have been told that the checksum value returned by Sha256 is almost
> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a
> bigger problem such as memory corruption, etc. Essentially, adding
> verification to sha256 is an overkill.
>
> Perhaps (Sha256+NoVerification) would work 99.99% of the time. But
> (Fletcher+Verification) would work 100% of the time.
>
> Which one of the two is a better deduplication strategy?
>
> If we do not use verification with Sha256, what is the worst case scenario?
> Is it just more disk space occupied (because of failure to detect duplicate
> blocks) or there is a chance of actual data corruption (because two blocks
> were assumed to be duplicate although they are not)?


Yes, there is a possibility of data corruption.


> Or, if I go with (Sha256+Verification), how much is the overhead of
> verification on the overall process?


It really depends on your specific workload.
If your application is mostly reading data then it may well be that you 
won't even notice verify.


Sha256 is supposed to be almost bullet-proof, but...
At the end of the day it is all about how much you value your data.
As I wrote before, try with verify and see if the performance is 
acceptable. It may well be.

You can always disable verify at any time.
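For concreteness, a minimal sketch of the knobs being discussed (dataset name 
is hypothetical):

    zfs set dedup=sha256,verify tank/data   # dedup with byte-for-byte verify on checksum match
    zfs get dedup tank/data                 # check the current setting
    zfs set dedup=on tank/data              # plain sha256 dedup, no verify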


> If I do go with verification, it seems (Fletcher+Verification) is more
> efficient than (Sha256+Verification). And both are 100% accurate in detecting
> duplicate blocks.

I don't believe that fletcher is still allowed for dedup - right now it 
is only sha256.


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread David Magda
On Thu, January 6, 2011 14:44, Peter Taps wrote:
> I have been told that the checksum value returned by Sha256 is almost
> guaranteed to be unique. In fact, if Sha256 fails in some case, we have a
> bigger problem such as memory corruption, etc. Essentially, adding
> verification to sha256 is an overkill.
>
> Perhaps (Sha256+NoVerification) would work 99.99% of the time. But
> (Fletcher+Verification) would work 100% of the time.
>
> Which one of the two is a better deduplication strategy?

The ZFS default is what you should be using unless you can explain
(technically, and preferably mathematically) why you should use something
else.

I'm guessing you're using "99.99%" as a 'literary gesture', and
haven't done the math. The above means that you have a 0.01% or 10^-4
chance of having a collision.

The reality is that the odds are actually 10^-77 (~ 2^-256; see [1] though):

http://blogs.sun.com/bonwick/entry/zfs_dedup

As a form of comparison, the odds of having a non-recoverable bit error
from a hard disk are about 10^-15 for SAS disks and 10^-14 for SATA disks.
So you're about sixty orders of magnitude more likely to have a disk read
error than to get a collision from SHA-256.

If you're not worried about disk read errors (and/or are not experiencing
them), then you shouldn't be worried about hash collisions.

TL;DR: set "dedup=on" and forget about it.

Some more discussion as it relates to some backup dedupe appliances (the
principles are the same):

http://tinyurl.com/36369pb
http://www.backupcentral.com/mr-backup-blog-mainmenu-47/13-mr-backup-blog/145-de-dupe-hash-collisions.html


[1] It may actually be 10^-38 (2^-128) or so because of the birthday
paradox, but we're still talking unlikely. You have a better chance of
dying from lightning or being attacked by a mountain lion:

http://www.blog.joelx.com/odds-chances-of-dying/877/
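To put the birthday-paradox footnote in concrete terms (worked numbers assume 
4KB blocks, worst case, just for scale):

    P(collision among n blocks)  ~= n^2 / 2^257
    P reaches ~50% only around    n ~= 2^128 blocks
    at 4KB per block that is      2^140 bytes ~= 1.4e42 bytes (~10^21 zettabytes)

In other words, you would need on the order of 2^128 blocks before a collision 
becomes likely; real pools are dozens of orders of magnitude short of that.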

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Single VDEV pool permanent and checksum errors after replace

2011-01-06 Thread Chris Murray
On 6 January 2011 20:02, Chris Murray  wrote:
> On 5 January 2011 13:26, Edward Ned Harvey
>  wrote:
>> One comment about etiquette though:
>>
>
>
> I'll certainly bear your comments in mind in future, however I'm not
> sure what happened to the subject, as I used the interface at
> http://opensolaris.org/jive/. I thought that would keep the subject
> the same. Plus, my gmail account appears to have joined up my reply
> from the web interface with the original thread too? Anyhow, I do see
> your point about quoting, and will do so from now on.
>
> For anyone wondering about the extent of checksum problems in my VMDK
> files, they range from only 128KB worth in some, to 640KB in others.
> Unfortunately it appears that the bad parts are in critical parts of
> the filesystem, but it's not a ZFS matter so I'll see what can be done
> by way of repair with Windows/NTFS inside each affected VM. So
> whatever went wrong, it was only a small amount of data.
>
> Thanks again,
> Chris
>

I'll get the hang of this e-mail lark one of these days, I'm sure  :-)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Single VDEV pool permanent and checksum errors after replace

2011-01-06 Thread Chris Murray
On 5 January 2011 13:26, Edward Ned Harvey
 wrote:
> One comment about etiquette though:
>


I'll certainly bear your comments in mind in future, however I'm not
sure what happened to the subject, as I used the interface at
http://opensolaris.org/jive/. I thought that would keep the subject
the same. Plus, my gmail account appears to have joined up my reply
from the web interface with the original thread too? Anyhow, I do see
your point about quoting, and will do so from now on.

For anyone wondering about the extent of checksum problems in my VMDK
files, they range from only 128KB worth in some, to 640KB in others.
Unfortunately it appears that the bad parts are in critical parts of
the filesystem, but it's not a ZFS matter so I'll see what can be done
by way of repair with Windows/NTFS inside each affected VM. So
whatever went wrong, it was only a small amount of data.

Thanks again,
Chris
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] (Fletcher+Verification) versus (Sha256+No Verification)

2011-01-06 Thread Peter Taps
Folks,

I have been told that the checksum value returned by Sha256 is almost 
guaranteed to be unique. In fact, if Sha256 fails in some case, we have a 
bigger problem such as memory corruption, etc. Essentially, adding verification 
to sha256 is overkill.

Perhaps (Sha256+NoVerification) would work 99.99% of the time. But 
(Fletcher+Verification) would work 100% of the time.

Which one of the two is a better deduplication strategy?

If we do not use verification with Sha256, what is the worst-case scenario? Is 
it just more disk space occupied (because of failure to detect duplicate 
blocks), or is there a chance of actual data corruption (because two blocks were 
assumed to be duplicates although they are not)?

Or, if I go with (Sha256+Verification), how much is the overhead of 
verification on the overall process?

If I do go with verification, it seems (Fletcher+Verification) is more 
efficient than (Sha256+Verification). And both are 100% accurate in detecting 
duplicate blocks.

Thank you in advance for your help.

Peter
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] BOOT, ZIL, L2ARC one one SSD?

2011-01-06 Thread David Dyer-Bennet

On Thu, December 23, 2010 22:45, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Bill Werner
>>
>> on a single 60GB SSD drive, use FDISK to create 3 physical partitions, a 20GB
>> for boot, a 30GB for L2ARC and a 10GB for ZIL?   Or is 3 physical Solaris
>> partitions on a disk not considered the entire disk as far as ZFS is
>> concerned?
>
> You can do that.  Other people have before.  But IMHO, it demonstrates a
> faulty way of thinking.
>
> "SSD's are big and cheap now, so I can buy one of these high performance
> things, and slice it up!"  In all honesty, GB availability is not your
> limiting factor.  Speed is your limiting factor.  That's the whole point
> of
> buying the thing in the first place.  If you have 3 SSD's, they're each
> able
> to talk 3Gbit/sec at the same time.  But if you buy one SSD which is 3x
> larger, you save money but you get 1/3 the speed.

Boot, at least, largely doesn't overlap with any significant traffic to
ZIL, for example.

And where I come from, even at work, money doesn't grow on trees.  Sure,
three separate SSDs will clearly perform better.  They will also cost 3x
as much.  (Or more, if you don't have three free bays and controller
ports.)

The question we often have to address is, "what's the biggest performance
increase we can get for $500".  I considered multiple rotating disks vs.
one SSD for that reason, for example.

Yeah, anybody quibbling about $500 isn't building top-performance
enterprise-grade storage.  We do know this.  It's still where a whole lot
of us live -- especially those running a home NAS.
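For completeness, once the slices exist, the pool-side half of that single-SSD 
setup is just two adds; pool and slice names below are hypothetical, and boot 
sits on its own slice under the installer's control:

    zpool add tank log c1t1d0s3     # ~10GB slice as dedicated ZIL (slog)
    zpool add tank cache c1t1d0s4   # ~30GB slice as L2ARC

Ed's caveat still applies: all three consumers end up contending for the same 
device's IOPS.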

> That's not to say there's never a situation where it makes sense.  Other
> people have done it, and maybe it makes sense for you.  But probably not.

Yeah, okay, maybe we're not completely disagreeing.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on top of ZFS iSCSI share

2011-01-06 Thread Brandon High
On Thu, Jan 6, 2011 at 5:33 AM, Edward Ned Harvey
 wrote:
> But the conclusion remains the same:  Redundancy is not needed at the
> client, because any data corruption the client could possibly see from the
> server would be transient and self-correcting.

Weren't you just chastising someone else for not using redundancy over iSCSI?

The rules don't really change for zfs-backed iSCSI disks vs. SAN
iSCSI. If you don't let the client (initiator) manage redundancy, then
there is no way for it to recover if there is a network, memory, or
other error.
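For what it's worth, giving the initiator its own redundancy is just a mirrored 
pool across two independent LUNs, ideally exported by different servers; a 
minimal sketch with placeholder device names:

    zpool create vmpool mirror <iscsi-lun-from-serverA> <iscsi-lun-from-serverB>

Then a corrupt or dropped block from one target can be repaired from the other.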

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2011-01-06 Thread Richard Elling
On Jan 5, 2011, at 4:14 PM, Edward Ned Harvey wrote:

>> From: Richard Elling [mailto:richard.ell...@nexenta.com]
>> 
>>> I'll agree to call Nexenta "a major commerical interest," in regards to
>> contribution to the open source ZFS tree, if they become an officially
>> supported OS on Dell, HP, and/or IBM hardware.
>> 
>> NexentaStor is officially supported on Dell, HP, and IBM hardware.  The
> only
>> question is, "what is your definition of 'support'"?  Many NexentaStor
> 
> I don't want to argue about this, but I'll just try to clarify what I meant:
> 
> Presently, I have a dell server with officially supported solaris, and it's
> as unreliable as pure junk.  It's just the backup server, so I'm free to
> frequently create & destroy it... And as such, I frequently do recreate and
> destroy it.  It is entirely stable running RHEL (centos) because Dell and
> RedHat have a partnership with a serious number of human beings and machines
> looking for and fixing any compatibility issues.  For my solaris
> instability, I blame the fact that solaris developers don't do significant
> quality assurance on non-sun hardware.  To become "officially" compatible,
> the whole qualification process is like this:  Somebody installs it, doesn't
> see any problems, and then calls it "certified."  They reformat with
> something else, and move on.  They don't build their business on that
> platform, so they don't detect stability issues like the ones reported...
> System crashes once per week and so forth.  Solaris therefore passes the
> test, and becomes one of the options available on the drop-down menu for
> OSes with a new server.  (Of course that's been discontinued by oracle, but
> that's how it was in the past.)

If I understand correctly, you want Dell, HP, and IBM to run OSes other
than Microsoft and RHEL.  For the thousands of other OSes out there,
this is a significant barrier to entry. One can argue that the most significant
innovations in the past 5 years came from none of those companies -- they
came from Google, Apple, Amazon, Facebook, and the other innovators
who did not spend their efforts trying to beat Microsoft and get into the 
manufacturing floor of the big vendors.

> Developers need to "eat their own food."  

I agree, but neither Dell, HP, nor IBM develop Windows...

> Smoke your own crack.  Hardware
> engineers at Dell need to actually use your OS on their hardware, for their
> development efforts.  I would be willing to bet Sun hardware engineers use a
> significant percentage of solaris servers for their work...  And guess what
> solaris engineers don't use?  Non-sun hardware.  

I'm not sure of the current state, but many of the Solaris engineers develop
on laptops and Sun did not offer a laptop product line.

> Pretty safe bet you won't
> find any Dell servers in the server room where solaris developers do their
> thing.

You will find them where Nexenta developers live :-)

> If you want to be taken seriously as an alternative storage option, you've
> got to at LEAST be listed as a factory-distributed OS that is an option to
> ship with the new server, and THEN, when people such as myself buy those
> things, we've got to have a good enough experience that we don't all bitch
> and flame about it afterward.

Wait a minute... this is patently false.  The big storage vendors: NetApp,
EMC, Hitachi, Fujitsu, LSI... none run on HP, IBM, or Dell servers.

> Nexenta, you need a real and serious partnership with Dell, HP, IBM.  Get
> their developers to run YOUR OS on the servers which they use for
> development.  Get them to sell your product bundled with their product.  And
> dedicate real and serious engineering into bugfixes working with customers,
> to truly identify root causes of instability, with a real OS development and
> engineering and support group.  It's got to be STABLE, that's the #1
> requirement.

There are many marketing activities are in progress towards this end.
One of Nexenta's major OEMs (Compellent) is being purchased by Dell. 
The deal is not done, so there is no public information on future plans,
to my knowledge.

> I previously made the comparison...  Even close-source solaris & ZFS is a
> better alternative to close-source netapp & wafl.  So for now, those are the
> only two enterprise supportable options I'm willing to stake my career on,
> and I'll buy Sun hardware with Solaris.  But I really wish I could feel
> confident buying a cheaper Dell server and running ZFS on it.  Nexenta, if
> you make yourself look like a serious competitor against solaris, and really
> truly form an awesome stable partnership with Dell, I will happily buy your
> stuff instead of Oracle.  Even if you are a little behind in feature
> offering.  But I will not buy your stuff if I can't feel perfectly confident
> in its stability.

I can assure you that we take stability very seriously.  And since you seem
to think the big box vendors are infallible, a sampling of those things we
(Nexenta) have t

Re: [zfs-discuss] A few questions

2011-01-06 Thread Richard Elling
On Jan 5, 2011, at 7:44 AM, Edward Ned Harvey wrote:

>> From: Khushil Dep [mailto:khushil@gmail.com]
>> 
>> We do have a major commercial interest - Nexenta. It's been quiet but I do
>> look forward to seeing something come out of that stable this year? :-)
> 
> I'll agree to call Nexenta "a major commerical interest," in regards to 
> contribution to the open source ZFS tree, if they become an officially 
> supported OS on Dell, HP, and/or IBM hardware.  

NexentaStor is officially supported on Dell, HP, and IBM hardware.  The only
question is, "what is your definition of 'support'"?  Many NexentaStor customers
today appear to be deploying on SuperMicro and Quanta systems, for obvious
cost reasons. Nexenta has good working relationships with these major vendors
and others.

As for investment, Nexenta has been and continues to hire the best engineers
and professional services people we can find. We see a lot of demand in the 
market and have been growing at an astonishing rate. If you'd like to contribute
to making software storage solutions rather than whining about what Oracle won't
discuss, check us out and send me your resume :-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Hard Errors on HDDs

2011-01-06 Thread Benji
For anyone that is interested, here's a progress report.

I created a new pool with only one mirror vdev of 2 disks, namely the new 
SAMSUNG HD204UI. These drives, along with the older HD203WI, use Advanced 
Format Technology (i.e. 4K sectors). Only these drives had hard errors in my 
pool, as opposed to the old Seagates and WDs. 

To create the new pool, I recompiled the zpool command to set an ashift value of 
12 so that the new pool is aligned to 4K instead of 512 bytes (see here: 
http://www.solarismen.de/archives/5-Solaris-and-the-new-4K-Sector-Disks-e.g.-WDxxEARS-Part-2.html).

So I filled this new 4K-aligned pool with 1.5TB of data and scrubbed it: no 
errors. I checked the log and there were no hard errors either. Usually after a 
scrub I get some hard errors.

Maybe the pool needs more vdevs in it to really stress the HBA and produce hard 
errors, but it's a strange coincidence nonetheless that only the 4K drives had 
errors, and that once they were used in a 4K-aligned pool, the errors stopped.

I'll probably re-create my original pool with only 4K drives in a 4K aligned 
pool and see what happens.
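For anyone wanting to double-check the alignment they ended up with: zdb prints 
the ashift recorded for each top-level vdev, so something like the line below 
(pool name hypothetical) should report 12 for the 4K-aligned pool and 9 for an 
old 512-byte-aligned one; exact output layout varies between builds:

    zdb -C tank | grep ashift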
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2011-01-06 Thread Khushil Dep
Twofold really - firstly I remember the headaches I used to have
configuring Broadcom cards properly under Debian/Ubuntu, and the sweetness
of using an Intel NIC instead. The bottom line for me was that Intel
drivers have been around longer than Broadcom drivers, so it made sense
to ensure that we had Intel NICs in the server. Secondly, I asked
Andy Bennett from Nexenta, who told me it would make sense - always good to
get a second opinion :-)

There were/are reports all over Google about Broadcom issues with
Solaris/OpenSolaris so I didn't want to risk it. For a couple of hundred for
a quad port gig NIC - it's worth it when the entire solution is 90K+.

Sometimes (like the issue with bus-resets when some brands/firmware-rev's of
SSD's are used) the knowledge comes from people you work with (Nexenta rode
to the rescue here again - plug! plug! plug!) :-)

These are deployed at a couple of universities and at a very large data
capture/marketing company I used to work for, and I know they work really well,
and (plug! plug! plug!) I know the dedicated support I got from the Nexenta
guys.

The difference as I see it is that OpenSolaris/ZFS/Dtrace/FMA allow you to
build your own solution to your own problem. Thinking of storage in a
completely new way instead of "just a block of storage" it becomes an
integrated part of performance engineering - certainly has been for the last
two installs I've been involved in.

I know why folks want a "Certified" solution with the likes of Dell/HP etc
but from my point of view (and all points of view are valid here), I know I
can deliver a cheaper, more focussed (and when I say that I'm not just doing
some marketing bs) solution for the requirement at hand. It's sometimes a
struggle to get customers/end-users to think of storage as more than just
storage. There's quite a lot of entrenched thinking to get around/over in
our field (try getting a Java dev to think clearly about thread handling and
massive SMP drawbacks for example).

Anyway - not trying to engage in an argument but it's always interesting to
find out why someone went for certain solutions over others.

My 2p. YMMV.

*goes off to collect cheque from Nexenta* ;-)

---
W. A. Khushil Dep - khushil@gmail.com -  07905374843
Windows - Linux - Solaris - ZFS - Nexenta - Development - Consulting &
Contracting
http://www.khushil.com/ - http://www.facebook.com/GlobalOverlord





On 6 January 2011 13:28, Edward Ned Harvey <
opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

> > From: Khushil Dep [mailto:khushil@gmail.com]
> >
> > I've deployed large SAN's on both SuperMicro 825/826/846 and Dell
> > R610/R710's and I've not found any issues so far. I always make a point
> of
> > installing Intel chipset NIC's on the DELL's and disabling the Broadcom
> ones
> > but other than that it's always been plain sailing - hardware-wise
> anyway.
>
> "not found any issues," "except the broadcom one which causes the system to
> crash regularly in the default factory configuration."
>
> How did you learn about the broadcom issue for the first time?  I had to
> learn the hard way, and with all the involvement of both Dell and Oracle
> support teams, nobody could tell me what I needed to change.  We literally
> replaced every component of the server twice over a period of 1 year, and I
> spent mandays upgrading and downgrading firmwares randomly trying to find a
> stable configuration.  I scoured the internet to find this little tidbit
> about replacing the broadcom NIC, and randomly guessed, and replaced my nic
> with an intel card to make the problem go away.
>
> The same system doesn't have a problem running RHEL/centos.
>
> What will be the new problem in the next line of servers?  Why, during my
> internet scouring, did I find a lot of other reports, of people who needed
> to disable c-states (didn't work for me) and lots of false leads indicating
> firmware downgrade would fix my broadcom issue?
>
> See my point?  Next time I buy a server, I do not have confidence to simply
> expect solaris on dell to work reliably.  The same goes for solaris
> derivatives, and all non-sun hardware.  There simply is not an adequate
> qualification and/or support process.
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on top of ZFS iSCSI share

2011-01-06 Thread Edward Ned Harvey
> From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us]
> >
> > But that's precisely why it's an impossible situation.  In order for the
> > client to see a checksum error, it must have read some corrupt data from the
> > pool storage, but the server will never allow that to happen.  So the short
> > answer is No.  You don't need to add the redundancy at the client, unless
> > you want the client to continue working (without pause) in the event the
> > server is unavailable.
> 
> I don't agree with the above.  It is quite possible for the server or
> network to cause an error.  Computers are not error free.  Network

I agree with Bob.  When I said "impossible," of course that's unrealistic.
But the conclusion remains the same:  Redundancy is not needed at the
client, because any data corruption the client could possibly see from the
server would be transient and self-correcting.

Out of curiosity ...  Let's suppose ZFS reads some corrupt data from a
device (in this case an iscsi target).  Does ZFS immediately mark it as a
checksum error without retrying?  Or does ZFS attempt to re-read the data
first?  As long as a re-read is attempted, the probability of the client
experiencing any checksum error at all would be very very low.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2011-01-06 Thread Edward Ned Harvey
> From: Khushil Dep [mailto:khushil@gmail.com]
> 
> I've deployed large SAN's on both SuperMicro 825/826/846 and Dell
> R610/R710's and I've not found any issues so far. I always make a point of
> installing Intel chipset NIC's on the DELL's and disabling the Broadcom ones
> but other than that it's always been plain sailing - hardware-wise anyway.

"not found any issues," "except the broadcom one which causes the system to 
crash regularly in the default factory configuration."

How did you learn about the broadcom issue for the first time?  I had to learn 
the hard way, and with all the involvement of both Dell and Oracle support 
teams, nobody could tell me what I needed to change.  We literally replaced 
every component of the server twice over a period of 1 year, and I spent 
mandays upgrading and downgrading firmwares randomly trying to find a stable 
configuration.  I scoured the internet to find this little tidbit about 
replacing the broadcom NIC, and randomly guessed, and replaced my nic with an 
intel card to make the problem go away.

The same system doesn't have a problem running RHEL/centos.

What will be the new problem in the next line of servers?  Why, during my 
internet scouring, did I find a lot of other reports, of people who needed to 
disable c-states (didn't work for me) and lots of false leads indicating 
firmware downgrade would fix my broadcom issue?

See my point?  Next time I buy a server, I do not have confidence to simply 
expect solaris on dell to work reliably.  The same goes for solaris 
derivatives, and all non-sun hardware.  There simply is not an adequate 
qualification and/or support process.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2011-01-06 Thread Edward Ned Harvey
> From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us]
> 
> On Wed, 5 Jan 2011, Edward Ned Harvey wrote:
> > with regards to ZFS and all the other projects relevant to solaris.)
> >
> > I know in the case of SGE/OGE, it's officially closed source now.  As of
Dec
> > 31st, sunsource is being decomissioned, and the announcement of
officially
> > closing the SGE source and decomissioning the open source community
> went out
> > on Dec 24th.  So all of this leads me to believe, with very little
> > reservation, that the new developments beyond zpool 28 are closed
> source
> > moving forward.  There's very little breathing room remaining for hope
of
> > that being open sourced again.
> 
> I have no idea what you are talking about.  Best I can tell, SGE/OGE
> is a reference to Sun Grid Engine, which has nothing to do with zfs.
> The only annoucement and discussion I can find via Google is written
> by you.  It was pretty clear even a year ago that Sun Grid Engine was
> going away.

Agreed, SGE/OGE has nothing to do with ZFS, unless you believe there's an
oracle culture which might apply to both.

The only thing written by me, as I recall, included links to the original
official announcements.  Following those links now, I see the archives have
been decommissioned.  So there ya go.  Since it's still in my inbox, I just
saved a copy for you here...  It is long-winded, and the main points are:
SGE (now called OGE) is officially closed-source, and sunsource.net
decommissioned.  There is an open source fork, which will not share code
development with the closed-source product.
http://dl.dropbox.com/u/543241/SGE_officially_closed/GE%20users%20GE%20announce%20Changes%20for%20a%20Bright%20Future%20at%20Oracle.txt


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2011-01-06 Thread J.P. King


This is a silly argument, but...

> Haven't seen any underdog proven solid enough for me to deploy in
> enterprise yet.


I haven't seen any "over"dog proven solid enough for me to be able to rely 
on either.  Certainly not Solaris.  Don't get me wrong, I like(d) Solaris.
But every so often you'd find a bug and they'd take an age to fix it (or 
to declare that they wouldn't fix it).  In one case we had 18 months 
between reporting a problem and Sun fixing it.  In another case it was 
around 3 months and because we happened to have the source code we even 
told them where the bug was and what a fix could be.


Solaris (and the other "over"dogs) are worth it when you want someone else 
to do the grunt work and someone else to point at and blame, but let's not 
romanticize how good it or any of the others are.  What made Solaris (10 
at least) worth deploying were its features (dtrace, zfs, SMF, etc).


Julian
--
Julian King
Computer Officer, University of Cambridge, Unix Support
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2011-01-06 Thread Edward Ned Harvey
> From: Richard Elling [mailto:richard.ell...@nexenta.com]
> 
> If I understand correctly, you want Dell, HP, and IBM to run OSes other
> 
> I agree, but neither Dell, HP, nor IBM develop Windows...
> 
> I'm not sure of the current state, but many of the Solaris engineers
> develop on laptops and Sun did not offer a laptop product line.
> 
> You will find them where Nexenta developers live :-)
> 
> Wait a minute... this is patently false.  The big storage vendors: NetApp,
> EMC, Hitachi, Fujitsu, LSI... none run on HP, IBM, or Dell servers.

Like I said, not interested in arguing.  This is mostly just a bunch of
contradictions to what I said.

To each his own.  My conclusion is that I am not willing to stake my career
on the underdog alternative, when I know I can safely buy the sun hardware
and solaris.  I experimented once by buying solaris on dell.  It was a
proven failure, but that's why I did it on a cheap noncritical backup system
experimentally before expecting it to work in production.  Haven't seen any
underdog proven solid enough for me to deploy in enterprise yet.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2011-01-06 Thread Khushil Dep
I've deployed large SANs on both SuperMicro 825/826/846 and Dell
R610/R710s and I've not found any issues so far. I always make a point of
installing Intel chipset NICs in the Dells and disabling the Broadcom ones,
but other than that it's always been plain sailing - hardware-wise anyway.
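
(For completeness, here is a sketch of what "disabling the Broadcom ones" can
look like, assuming a NexentaStor / OpenSolaris build where the onboard ports
bind to the bnx driver -- confirm that on your own box first:

  dladm show-phys    # list physical NICs and the driver each one uses
  rem_drv bnx        # unbind the Broadcom NetXtreme II driver entirely

The less invasive route is simply to disable the onboard ports in the BIOS,
or just never plumb the bnx links at all.)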

I've always found that the real issue is formulating SOPs to match what the
organisation is used to with legacy storage systems, educating the admins
who will manage it going forward, and doing the technical hand-over to folks
who may not know, or want to know, a whole lot about *nix land.

My 2p. YMMV.

---
W. A. Khushil Dep - khushil@gmail.com -  07905374843
Windows - Linux - Solaris - ZFS - Nexenta - Development - Consulting &
Contracting
http://www.khushil.com/ - http://www.facebook.com/GlobalOverlord





On 6 January 2011 00:14, Edward Ned Harvey <
opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:

> > From: Richard Elling [mailto:richard.ell...@nexenta.com]
> >
> > > I'll agree to call Nexenta "a major commercial interest," in regards to
> > > contribution to the open source ZFS tree, if they become an officially
> > > supported OS on Dell, HP, and/or IBM hardware.
> >
> > NexentaStor is officially supported on Dell, HP, and IBM hardware.  The only
> > question is, "what is your definition of 'support'"?  Many NexentaStor
>
> I don't want to argue about this, but I'll just try to clarify what I
> meant:
>
> Presently, I have a dell server with officially supported solaris, and it's
> as unreliable as pure junk.  It's just the backup server, so I'm free to
> frequently create & destroy it... And as such, I frequently do recreate and
> destroy it.  It is entirely stable running RHEL (centos) because Dell and
> RedHat have a partnership with a serious number of human beings and
> machines
> looking for and fixing any compatibility issues.  For my solaris
> instability, I blame the fact that solaris developers don't do significant
> quality assurance on non-sun hardware.  To become "officially" compatible,
> the whole qualification process is like this:  Somebody installs it,
> doesn't
> see any problems, and then calls it "certified."  They reformat with
> something else, and move on.  They don't build their business on that
> platform, so they don't detect stability issues like the ones reported...
> System crashes once per week and so forth.  Solaris therefore passes the
> test, and becomes one of the options available on the drop-down menu for
> OSes with a new server.  (Of course that's been discontinued by oracle, but
> that's how it was in the past.)
>
> Developers need to "eat their own food."  Smoke your own crack.  Hardware
> engineers at Dell need to actually use your OS on their hardware, for their
> development efforts.  I would be willing to bet Sun hardware engineers use
> a
> significant percentage of solaris servers for their work...  And guess what
> solaris engineers don't use?  Non-sun hardware.  Pretty safe bet you won't
> find any Dell servers in the server room where solaris developers do their
> thing.
>
> If you want to be taken seriously as an alternative storage option, you've
> got to at LEAST be listed as a factory-distributed OS that is an option to
> ship with the new server, and THEN, when people such as myself buy those
> things, we've got to have a good enough experience that we don't all bitch
> and flame about it afterward.
>
> Nexenta, you need a real and serious partnership with Dell, HP, IBM.  Get
> their developers to run YOUR OS on the servers which they use for
> development.  Get them to sell your product bundled with their product.
>  And
> dedicate real and serious engineering into bugfixes working with customers,
> to truly identify root causes of instability, with a real OS development
> and
> engineering and support group.  It's got to be STABLE, that's the #1
> requirement.
>
> I previously made the comparison...  Even closed-source solaris & ZFS is a
> better alternative to closed-source netapp & wafl.  So for now, those are
> the
> only two enterprise supportable options I'm willing to stake my career on,
> and I'll buy Sun hardware with Solaris.  But I really wish I could feel
> confident buying a cheaper Dell server and running ZFS on it.  Nexenta, if
> you make yourself look like a serious competitor against solaris, and
> really
> truly form an awesome stable partnership with Dell, I will happily buy your
> stuff instead of Oracle.  Even if you are a little behind in feature
> offering.  But I will not buy your stuff if I can't feel perfectly
> confident
> in its stability.
>
> Ever heard the phrase "Nobody ever got fired for buying IBM."  You're the
> little guys.  If you want to compete against the big guys, you've got to
> kick ass.  And don't get sued into oblivion.
>
> Even today's feature set is perfectly adequate for at least a couple of
> years to come.  If you put all your effort into stability and bugfixes,
> serious partnerships with Dell, HP, IB

Re: [zfs-discuss] A few questions

2011-01-06 Thread Darren J Moffat

On 06/01/2011 00:14, Edward Ned Harvey wrote:

solaris engineers don't use?  Non-sun hardware.  Pretty safe bet you won't
find any Dell servers in the server room where solaris developers do their
thing.


You would lose that bet: not only would you find Dell, you would find many
other "big names" as well as white-box hand-built systems too.


Solaris developers use a lot of different hardware - Sun never made
laptops, so many of us have Apple (running Solaris on the metal and/or
under virtualisation), Toshiba, or Fujitsu laptops.  There are also many
workstations and servers around the company that aren't Sun hardware.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss