On Wed, Sep 2, 2009 at 3:02 PM, Frank Middleton <f.middle...@apogeect.com> wrote:

> On 09/02/09 02:17 PM, Jeff Victor wrote:
>
>> Just to expand on that: there are now three levels of testing (and
>> therefore stability) in [Open]Solaris:
>> * Nevada builds - I don't know the details, but it's what BobF referred
>> to with "simple sanity checks" and, I think, what he meant by
>> "OpenSolaris users become part of a new testing process."
>>
>> * OpenSolaris distro (e.g. 2009.06) - this goes through significant
>> testing, but not as much as Solaris 10 updates. OpenSolaris users (that
>> is, users of the OpenSolaris distro) benefit from this testing.
>>
>> * Solaris 10 goes through "Sun's legendary testing process" :-)
>>
>
> OK, I stand corrected. So the new snv121 checksum bug somehow made it
> through the "simple sanity checks". Based on this thread, I wonder if
> it is still doing so (my intuition is that the problem still doesn't
> show up on Sun hardware). No doubt there's someone out there itching
> to prove me wrong :-)
>
> Note that the "old" checksum bug evidently hasn't shown up much at
> all, although with the right (grotty) hardware it is quite reproducible
> even though iostat -Ene shows no hard errors at all...
>
> In the context of bug id 6848079, the only time new files get added
> to the list of the invisible checksum errors is after reboot of
> an otherwise read only file system. The new files show up with
> a checksum failure that a scrub clears, but zcksummon shows that
> scrub still finds them with checksum failures and supposedly
> repairs them (until next time). What's the betting (lottery
> aside) that fixing 6848079 will also fix the problem I found?
> Note also that 6848079 was reported against snv115. Baffling.
>
> My question here is - if this bug isn't triggered by some kind
> of (soft) hardware glitch, how come it isn't affecting more systems?
> After all, you have to reboot to actually run snv121, and there
> must be quite a few folks who use ZFS who must have done so by now.
>
>
Define "more systems".  How many people do you think are on 121?  And of
those, how many are on the zfs mailing list?  And of those, how many have
done a scrub recently to see the checksum errors?  Do you have some proof to
validate your beliefs?


REGARDLESS, had you read all the posts to this thread, you'd know you've
already been proven wrong:

On Wed, Sep 2, 2009 at 11:15 AM, Brent Jones <br...@servuhome.net> wrote:
I see this issue on each of my X4540's, 64GB of ECC memory, 1TB drives.
Rolling back to snv_118 does not reveal any checksum errors, only snv_121

So, the commodity hardware here doesn't hold up, unless Sun isn't
validating their equipment (not likely, as these servers have had no
hardware issues prior to this build)


--
Brent Jones
br...@servuhome.net




--Tim
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
