Re: [zfs-discuss] GUI support for ZFS root?

2008-08-12 Thread Rich Teer
On Tue, 12 Aug 2008, Lori Alt wrote:

> There are no plans to add zfs root support to the existing
> install GUI.  GUI install support for zfs root will be
> provided by the new Caiman installer.

Thanks for the info.  Follow-up question: is there an ETA for when
Caiman will be integrated into Nevada (I use SXCE in preference to 
that which was previously known as Project Indiana)?

Thanks again,

-- 
Rich Teer, SCSA, SCNA, SCSECA

CEO,
My Online Home Inventory

URLs: http://www.rite-group.com/rich
  http://www.linkedin.com/in/richteer
  http://www.myonlinehomeinventory.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)

2008-08-12 Thread Anton B. Rang
Reed-Solomon could correct multiple-bit errors, but an effective Reed-Solomon 
code for 128K blocks of data would be very slow if implemented in software 
(and, for that matter, take a lot of hardware to implement). A multi-bit 
Hamming code would be simpler, but I suspect that undetected multi-bit errors 
are quite rare.

I've seen a fair number of single-bit errors coming from SATA drives because 
the data is often not parity-protected through the whole data path within the 
drive. Some enterprise-class SATA disks have data protected (with a 
parity-equivalent) through the write data path, and more of these models will 
have this feature soon. All SAS and FibreChannel drives (that I am aware of) 
have data protected with ECC through the whole path for both reads and writes.

Single-bit errors can also be introduced in non-ECC DRAM, of course. In this 
case, it can happen either before the checksum computation (=> undetected data 
corruption) or after it (=> checksum failure on a later read).
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-12 Thread Miles Nordin
> "cs" == Cromar Scott <[EMAIL PROTECTED]> writes:

cs> It appears that the metadata on that pool became corrupted
cs> when the processor failed.  The exact mechanism is a bit of a
cs> mystery,

[...]

cs> We were told that the probability of metadata corruption would
cs> have been reduced but not eliminated by having a mirrored LUN.
cs> We were also told that the issue will be fixed in U6.

how can one fix an issue which is a bit of a mystery?  Or do you mean
the lazy-panic issue is fixed, but the corruption issue is not?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] GUI support for ZFS root?

2008-08-12 Thread Lori Alt
Rich Teer wrote:
> Hi all,
>
> I recently installed b95 and ZFS root is great!  I used the
> CLI installer because I remember reading that the GUI installer
> doesn't yet support ZFS root.  So my question is, what's the
> ETA for support in the GUI installer for ZFS root?
>
> TIA,
>
>   
There are no plans to add zfs root support to the existing
install GUI.  GUI install support for zfs root will be
provided by the new Caiman installer.

Lori
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] GUI support for ZFS root?

2008-08-12 Thread Rich Teer
Hi all,

I recently installed b95 and ZFS root is great!  I used the
CLI installer because I remember reading that the GUI installer
doesn't yet support ZFS root.  So my question is, what's the
ETA for support in the GUI installer for ZFS root?

TIA,

-- 
Rich Teer, SCSA, SCNA, SCSECA

CEO,
My Online Home Inventory

URLs: http://www.rite-group.com/rich
  http://www.linkedin.com/in/richteer
  http://www.myonlinehomeinventory.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper, ZFS and performance

2008-08-12 Thread John-Paul Drawneek
Config 1: it has 4 vdevs, so ZFS will stripe across them, versus the single 
vdev in Config 2 - so Config 1 wins for speed.

Reliability: both configs are probably about the same; sod's law says the 
second disk to fail will be in the same vdev.

If you want the space: Config 2, but with raidz2.

If you want speed: 24 mirrors (rough sketch below).

See the other posts for more knowledgeable explanations.
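
For illustration only, a rough sketch of the mirrored layout (the pool name is
made up, and I'm using just the 24 disks listed in the original post, which
gives 12 two-way mirrors -- a full Thumper would give you 24):

  zpool create tank \
    mirror c0t1d0 c1t1d0  mirror c4t1d0 c5t1d0  mirror c6t1d0 c7t1d0 \
    mirror c0t2d0 c1t2d0  mirror c4t2d0 c5t2d0  mirror c6t2d0 c7t2d0 \
    mirror c0t3d0 c1t3d0  mirror c4t3d0 c5t3d0  mirror c6t3d0 c7t3d0 \
    mirror c0t4d0 c1t4d0  mirror c4t4d0 c5t4d0  mirror c6t4d0 c7t4d0

The space-oriented variant would be the same command with a single raidz2 vdev
listing all 24 disks instead of the mirror pairs.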
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-12 Thread Cromar Scott
From: Richard Elling <[EMAIL PROTECTED]>
Cromar Scott wrote:
> Chris Siebenmann <[EMAIL PROTECTED]>
>
>  I'm not Anton Rang, but:
> | How would you describe the difference between the data recovery
> | utility and ZFS's normal data recovery process?
>
> cks> The data recovery utility should not panic 
> cks> my entire system if it runs into some situation 
> cks> that it utterly cannot handle. Solaris 10 U5 
> cks> kernel ZFS code does not have this property; 
> cks> it is possible to wind up with ZFS pools that 
> cks> will panic your system when you try to touch them.
> ...
>
> I'll go you one worse.  Imagine a Sun Cluster with several resource
> groups and several zpools.  You blow a proc on one of the servers.  As a
> result, the metadata on one of the pools becomes corrupted.
>   

re> This failure mode affects all shared-storage 
re> clusters.  I don't see how ZFS should or should 
re> not be any different than raw, UFS, et.al.

Absolutely true.  The file system definitely had a problem.

>
http://mail.opensolaris.org/pipermail/zfs-discuss/2008-April/046951.html
>
> Now, each of the servers in your cluster attempts to import the
> zpool--and panics.
>
> As a result of a single part failure on a single server, your entire
> cluster (and all the services on it) are sitting in a smoking heap on
> your machine room floor.
>   

re> Yes, but your data is corrupted.  

My data was only corrupted on ONE of the zpools.  In a cluster with
several zpools and several resource groups, we ended up with ALL of the
pools and ALL of the resource groups offline as one node after another
panicked.

re> If you were my bank, then I would greatly 
re> appreciate you getting the data corrected 
re> prior to bringing my account online.  

Fair enough, but do we have to take Fred's and Joe's accounts offline
too?  

re> If you study highly available clusters and services
re> then you will see many cases where human interaction 
re> is preferred to automation for just such cases. 

I see your point about requiring intervention to deal with a potentially
corrupt file system.

I would have preferred a behavior more like we get with VxVM and VxFS,
where the corrupted file system fails to mount without human
intervention, but the nodes don't panic on the failed vxdg import.  That
particular service group and that particular file system are offline,
but everything else keeps running because none of the other nodes
panics.

We handled the issue of not corrupting the file system further by
panicking the original node, but I don't understand why we need to panic
every successive node in the cluster.  Why can't we just refuse to
import automatically?

> I'm just glad that our pool corruption experience happened during
> testing, and not after the system had gone into production.  Not exactly
> a resume-enhancing experience.

re> I'm glad you found this in testing.  

I'm a believer.  Some people wanted us to just throw the box into
production, but I insisted on keeping our test schedule.  I'm glad I
did.

re> BTW, what was the root cause?

It appears that the metadata on that pool became corrupted when the
processor failed.  The exact mechanism is a bit of a mystery, since we
didn't get a valid crash dump.

The other pools were fine, once we imported them after a boot -x.

We ended up converting to VxVM and VxFS on that server because we could
not guarantee that the same thing wouldn't just happen again after we
went into production.  

If we had a tool that had allowed us to roll back to a previous snapshot
or something, it might have made a difference.  

We were told that the probability of metadata corruption would have been
reduced but not eliminated by having a mirrored LUN.  We were also told
that the issue will be fixed in U6.

--Scott
 
 
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper, ZFS and performance

2008-08-12 Thread Bob Friesenhahn
On Tue, 12 Aug 2008, John Malick wrote:

> There is a thread quite similar to this but it did not provide a clear answer 
> to the question which was worded a bit odd..
>
> I have a Thumper and am trying to determine, for performance, which 
> is the best ZFS configuration of the two shown below. Any issues 
> other than performance that anyone may see to steer me in one 
> direction or another would be helpful as well. Thanks.

The joy of ZFS is that it takes just minutes to create various 
configurations and test them yourself.  The pool design is a pretty 
fundamental decision whereas filesystems are easily tuned.  The 
configuration with more vdevs should be both faster and more reliable. 
ZFS load-shares across vdevs, so with more vdevs you can support more 
simultaneous users.  Your entire pool will be no stronger than the 
weakest vdev, so make sure that your vdevs are sufficiently reliable.
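
For example, a quick-and-dirty streaming-write comparison (a sketch only: the 
pool name, disks and sizes here are placeholders, and a single dd stream is no 
substitute for testing with your real workload):

  zpool create testpool raidz c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0 \
                        raidz c0t2d0 c1t2d0 c4t2d0 c5t2d0 c6t2d0 c7t2d0
  ptime dd if=/dev/zero of=/testpool/bigfile bs=1024k count=8192
  zpool destroy testpool

Then repeat with the alternative layout and compare the times.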

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-12 Thread Richard Elling
Cromar Scott wrote:
> Chris Siebenmann <[EMAIL PROTECTED]>
>
>  I'm not Anton Rang, but:
> | How would you describe the difference between the data recovery
> | utility and ZFS's normal data recovery process?
>
> cks> The data recovery utility should not panic 
> cks> my entire system if it runs into some situation 
> cks> that it utterly cannot handle. Solaris 10 U5 
> cks> kernel ZFS code does not have this property; 
> cks> it is possible to wind up with ZFS pools that 
> cks> will panic your system when you try to touch them.
> ...
>
> I'll go you one worse.  Imagine a Sun Cluster with several resource
> groups and several zpools.  You blow a proc on one of the servers.  As a
> result, the metadata on one of the pools becomes corrupted.
>   

This failure mode affects all shared-storage clusters.  I don't see how
ZFS should or should not be any different than raw, UFS, et.al.

> http://mail.opensolaris.org/pipermail/zfs-discuss/2008-April/046951.html
>
> Now, each of the servers in your cluster attempts to import the
> zpool--and panics.
>
> As a result of a single part failure on a single server, your entire
> cluster (and all the services on it) are sitting in a smoking heap on
> your machine room floor.
>   

Yes, but your data is corrupted.  If you were my bank, then I would
greatly appreciate you getting the data corrected prior to bringing my
account online.  If you study highly available clusters and services
then you will see many cases where human interaction is preferred to
automation for just such cases.  You will also find that a combination
of shared storage and non-shared storage cluster technology is used
for truly important data.  For example, we would use Solaris Cluster
for the local shared-storage framework and Solaris Cluster Geographic
Edition for a remote site (no shared hardware components with the
local cluster).

> | Nobody thinks that an answer of "sorry, we lost all of your data" is
> | acceptable.  However, there are failures which will result in loss of
> | data no matter how clever the file system is.
>
> cks> The problem is that there are currently ways to 
> cks> make ZFS lose all your data when there are no 
> cks> hardware faults or failures, merely people or
> cks> software mis-handling pools. This is especially 
> cks> frustrating when the only thing that is likely 
> cks> to be corrupted is ZFS metadata and the vast
> cks> majority (or all) of the data in the pool is intact, 
> cks> readable, and so on.
>
> I'm just glad that our pool corruption experience happened during
> testing, and not after the system had gone into production.  Not exactly
> a resume-enhancing experience.
>   

I'm glad you found this in testing.  BTW, what was the root cause?
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] corrupt zfs stream? checksum mismatch

2008-08-12 Thread Miles Nordin
> "mp" == Mattias Pantzare <[EMAIL PROTECTED]> writes:

mp> Or the file was corrupted when you transfered it.

he stored the backup streams on ZFS, so obviously they couldn't
possibly be corrupt.   :p

Jonathan, does 'zfs receive -nv' also detect the checksum error, or is
it only detected when you actually receive onto a pool without -n?

in addition to skipping to the next header of corrupted tarballs, tar
can validate a tarball's checksums without extracting it, so it's
possible to write a tape, then read it to see if it's ok.  The 'tar t'
read test checks for medium errors, driver bugs, and bugs inside tar
itself.
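
Something like this, I mean (file and dataset names made up; with -n nothing
is actually written to the pool):

  tar tf /backup/home.tar > /dev/null
  zfs receive -nv tank/restored < /backup/home.zfs

the question being whether the second command trips over the bad checksum the
way the first one would.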

so it sounds like: brrk, brrk, danger, do not use zfs send/receive for
backups---use only for moving filesystems from one pool to another.
This brings back the question ``how is it possible to back up and
restore a heavily-cloned/snapshotted system?''  because upon restore
the clone inheritance tree is lost, and you'll never have enough space
in the pool to fit what was there before.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper, ZFS and performance

2008-08-12 Thread Richard Elling
John Malick wrote:
> There is a thread quite similar to this but it did not provide a clear answer 
> to the question which was worded a bit odd..
>
> I have a Thumper and am trying to determine, for performance, which is the 
> best ZFS configuration of the two shown below. Any issues other than 
> performance that anyone may see to steer me in one direction or another would 
> be helpful as well. Thanks.
>   

Do config 1, please do not do config 2.
 From zpool(1):
 A raidz group with N disks of size X with P parity disks
 can  hold  approximately (N-P)*X bytes and can withstand
 one device failing before data integrity is compromised.
 The  minimum  number  of devices in a raidz group is one
 more than the number of parity  disks.  The  recommended
 number is between 3 and 9.
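
So, something along these lines (a sketch only: the pool name is arbitrary,
the device names are taken from your config 1, and each 6-wide single-parity
group holds roughly (6-1)*X, i.e. about 5 disks' worth of usable space):

  zpool create tank \
    raidz c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0 \
    raidz c0t2d0 c1t2d0 c4t2d0 c5t2d0 c6t2d0 c7t2d0 \
    raidz c0t3d0 c1t3d0 c4t3d0 c5t3d0 c6t3d0 c7t3d0 \
    raidz c0t4d0 c1t4d0 c4t4d0 c5t4d0 c6t4d0 c7t4d0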

 -- richard

> ZFS Config 1:
>
> zpool status
>   pool: rpool
>  state: ONLINE
>  scrub: none requested
> config:
>
> NAME        STATE     READ WRITE CKSUM
> rpool   ONLINE   0 0 0
>   raidz   ONLINE   0 0 0
> c0t1d0  ONLINE   0 0 0
> c1t1d0  ONLINE   0 0 0
> c4t1d0  ONLINE   0 0 0
> c5t1d0  ONLINE   0 0 0
> c6t1d0  ONLINE   0 0 0
> c7t1d0  ONLINE   0 0 0
>   raidz   ONLINE   0 0 0
> c0t2d0  ONLINE   0 0 0
> c1t2d0  ONLINE   0 0 0
> c4t2d0  ONLINE   0 0 0
> c5t2d0  ONLINE   0 0 0
> c6t2d0  ONLINE   0 0 0
> c7t2d0  ONLINE   0 0 0
>   raidz   ONLINE   0 0 0
> c0t3d0  ONLINE   0 0 0
> c1t3d0  ONLINE   0 0 0
> c4t3d0  ONLINE   0 0 0
> c5t3d0  ONLINE   0 0 0
> c6t3d0  ONLINE   0 0 0
> c7t3d0  ONLINE   0 0 0
>   raidz   ONLINE   0 0 0
> c0t4d0  ONLINE   0 0 0
> c1t4d0  ONLINE   0 0 0
> c4t4d0  ONLINE   0 0 0
> c5t4d0  ONLINE   0 0 0
> c6t4d0  ONLINE   0 0 0
> c7t4d0  ONLINE   0 0 0
>
>
> versus 
>
> ZFS Config 2:
>
> zpool status
>   pool: rpool
>  state: ONLINE
>  scrub: none requested
> config:
>
> NAME        STATE     READ WRITE CKSUM
> rpool   ONLINE   0 0 0
>   raidz1  ONLINE   0 0 0
> c0t1d0  ONLINE   0 0 0
> c1t1d0  ONLINE   0 0 0
> c4t1d0  ONLINE   0 0 0
> c5t1d0  ONLINE   0 0 0
> c6t1d0  ONLINE   0 0 0
> c7t1d0  ONLINE   0 0 0
> c0t2d0  ONLINE   0 0 0
> c1t2d0  ONLINE   0 0 0
> c4t2d0  ONLINE   0 0 0
> c5t2d0  ONLINE   0 0 0
> c6t2d0  ONLINE   0 0 0
> c7t2d0  ONLINE   0 0 0
> c0t3d0  ONLINE   0 0 0
> c1t3d0  ONLINE   0 0 0
> c4t3d0  ONLINE   0 0 0
> c5t3d0  ONLINE   0 0 0
> c6t3d0  ONLINE   0 0 0
> c7t3d0  ONLINE   0 0 0
> c0t4d0  ONLINE   0 0 0
> c1t4d0  ONLINE   0 0 0
> c4t4d0  ONLINE   0 0 0
> c5t4d0  ONLINE   0 0 0
> c6t4d0  ONLINE   0 0 0
> c7t4d0  ONLINE   0 0 0
>
> In a nutshell, for performance reasons, is it better to have multiple raidz 
> vdevs in the pool or just one raidz vdev. The number of disks used is the 
> same in either case.
>
> Thanks again.
>  
>  
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, SATA, LSI and stability

2008-08-12 Thread Miles Nordin
ff> I have check the drives with smartctl:

ff> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
ff>   1 Raw_Read_Error_Rate     0x000f   115   075   006    Pre-fail  Always       -       94384069
ff>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
ff> 195 Hardware_ECC_Recovered  0x001a   065   056   000    Old_age   Always       -       173161329
ff> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

ff> But with no UDMA_CRC_Errors I believe the disks are fine.

No, UDMA_CRC_Error_Count counts checksum errors on PATA cables.  I cannot
confirm or deny whether it counts CRC errors on SATA cables (and even if it
did, this is complicated because there are weird scsi-emulation
proprietary drivers, port multipliers, u.s.w.).  So, if you are having
problems and that parameter is increasing, then it's probably a cabling
problem, not a drive problem.

The other three values I quoted are the ones that matter.  The VALUE
is scaled by constants defined by the manufacturer and used for the
``overall health assessment'', but the constants they use are always
way too forgiving, so it's worthless.  The RAW_VALUE looks bigger than
I'm used to, but this may also be meaningless.  The only way I know to
get information out of the report is:  How do the RAW_VALUE's of the
three parameters I quoted compare with other drives of the same model,
or to this drive before it started failing?

There is another section of the smartctl -a report that logs the last
5 or so errors the drive has reported to the host.  IIRC you will see
errors called 'ICRC' or 'UNC' on failing drives.

this experience is all PATA/SATA-specific.
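
To pull just those bits out for comparison, something like this should do
(the device path is only an example, and depending on the driver you may
need an extra option such as '-d sat'):

  smartctl -A /dev/rdsk/c4t5d0         # attribute table; compare RAW_VALUEs across drives
  smartctl -l error /dev/rdsk/c4t5d0   # the drive's own error log (look for ICRC/UNC)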


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Thumper, ZFS and performance

2008-08-12 Thread John Malick
There is a thread quite similar to this but it did not provide a clear answer 
to the question, which was worded a bit oddly.

I have a Thumper and am trying to determine, for performance, which is the best 
ZFS configuration of the two shown below. Any issues other than performance 
that anyone may see to steer me in one direction or another would be helpful as 
well. Thanks.

ZFS Config 1:

zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
rpool   ONLINE   0 0 0
  raidz   ONLINE   0 0 0
c0t1d0  ONLINE   0 0 0
c1t1d0  ONLINE   0 0 0
c4t1d0  ONLINE   0 0 0
c5t1d0  ONLINE   0 0 0
c6t1d0  ONLINE   0 0 0
c7t1d0  ONLINE   0 0 0
  raidz   ONLINE   0 0 0
c0t2d0  ONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
c4t2d0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c6t2d0  ONLINE   0 0 0
c7t2d0  ONLINE   0 0 0
  raidz   ONLINE   0 0 0
c0t3d0  ONLINE   0 0 0
c1t3d0  ONLINE   0 0 0
c4t3d0  ONLINE   0 0 0
c5t3d0  ONLINE   0 0 0
c6t3d0  ONLINE   0 0 0
c7t3d0  ONLINE   0 0 0
  raidz   ONLINE   0 0 0
c0t4d0  ONLINE   0 0 0
c1t4d0  ONLINE   0 0 0
c4t4d0  ONLINE   0 0 0
c5t4d0  ONLINE   0 0 0
c6t4d0  ONLINE   0 0 0
c7t4d0  ONLINE   0 0 0


versus 

ZFS Config 2:

zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
rpool   ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c0t1d0  ONLINE   0 0 0
c1t1d0  ONLINE   0 0 0
c4t1d0  ONLINE   0 0 0
c5t1d0  ONLINE   0 0 0
c6t1d0  ONLINE   0 0 0
c7t1d0  ONLINE   0 0 0
c0t2d0  ONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
c4t2d0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c6t2d0  ONLINE   0 0 0
c7t2d0  ONLINE   0 0 0
c0t3d0  ONLINE   0 0 0
c1t3d0  ONLINE   0 0 0
c4t3d0  ONLINE   0 0 0
c5t3d0  ONLINE   0 0 0
c6t3d0  ONLINE   0 0 0
c7t3d0  ONLINE   0 0 0
c0t4d0  ONLINE   0 0 0
c1t4d0  ONLINE   0 0 0
c4t4d0  ONLINE   0 0 0
c5t4d0  ONLINE   0 0 0
c6t4d0  ONLINE   0 0 0
c7t4d0  ONLINE   0 0 0

In a nutshell, for performance reasons, is it better to have multiple raidz 
vdevs in the pool or just one raidz vdev?  The number of disks used is the same 
in either case.

Thanks again.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-12 Thread eric kustarz

On Aug 7, 2008, at 10:25 PM, Anton B. Rang wrote:

>> How would you describe the difference between the file system
>> checking utility and zpool scrub?  Is zpool scrub lacking in its
>> verification of the data?
>
> To answer the second question first, yes, zpool scrub is lacking, at  
> least to the best of my knowledge (I haven't looked at the ZFS  
> source in a few months). It does not verify that any internal data  
> structures are correct; rather, it simply verifies that data and  
> metadata blocks match their checksums.

Hey Anton,

What do you mean by "internal data structures"?  Are you referring to  
things like space maps, props, history obj, etc. (basically anything  
other than user data and the indirect blocks that point to user data)?

eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] URGENT: ZFS issue - can't import in degraded state

2008-08-12 Thread Robert Milkowski
Hello Robert,


It was probably related to bug 6436000.
A kernel upgrade which includes the fix for the above helped, and I
was then able to import the pool without any issues.


-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Forensic analysis [was: more ZFS recovery]

2008-08-12 Thread [EMAIL PROTECTED]
Darren J Moffat wrote:
> [EMAIL PROTECTED] wrote:
>   
>>> As others have noted, the COW nature of ZFS means that there is a
>>> good chance that on a mostly-empty pool, previous data is still intact
>>> long after you might think it is gone. A utility to recover such data is
>>> (IMHO) more likely to be in the category of forensic analysis than
>>> a mount (import) process. There is more than enough information
>>> publically available for someone to build such a tool (hint, hint :-)
>>>  -- richard
>>>   
>>   Veritas,  the makers of vxfs, whom I consider ZFS to be trying to
>> compete against has higher level (normal) support engineers that have
>> access to tools that let them scan the disk for inodes and other filesystem
>> fragments and recover.  When you log a support call on a faulty filesystem
>> (in one such case I was involved in zeroed out 100mb of the first portion
>> of the volume killing off both top OLT's -- bad bad) they can actually help
>> you at a very low level dig data out of the filesystem or even recover from
>> pretty nasty issues.  They can scan for inodes (marked by a magic number),
>> have utilities to pull out files from those inodes (including indirect
>> blocks/extents).  Given the tools and help from their support I was able to
>> pull back 500 gb of files (99%) from a filesystem that emc killed during a
>> botched powerpath upgrade.  Can Sun's support engineers,  or is their
>> answer pull from tape?  (hint, hint ;-)
>> 
>
> Sounds like a good topic for here:
>
> http://opensolaris.org/os/project/forensics/
>   
I took a look at this project, specifically 
http://opensolaris.org/os/project/forensics/ZFS-Forensics/.
Is there any reason that the paper and slides I presented at the 
OpenSolaris Developers Conference
on the ZFS on-disk format are not mentioned?  The paper is at: 
http://www.osdevcon.org/2008/files/osdevcon2008-proceedings.pdf
starting on page 36, and the slides are at: 
http://www.osdevcon.org/2008/files/osdevcon2008-max.pdf

thanks,
max



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)

2008-08-12 Thread paul
Although I don't know for sure that most such errors are in fact single-bit in
nature, I can only surmise that, statistically, they most likely are, absent
detection of anything otherwise. With the exception of error-corrected memory
systems and/or checksummed communication channels, each transition of data
between hardware interfaces at ever-increasing clock rates correspondingly
increases the probability of an otherwise non-detectable soft single-bit error
being injected at those boundaries. Although the probability of any one such
error is small enough that it is not easily detectable or classifiable as a
hardware failure, over the course of days/weeks/years and trillions of bits
these errors will be observable, and should be expected and planned for
within reason.

Utilizing a strong error-correcting code, in combination with or in lieu of a
strong hash code, would seem like a good way to better warrant that data's
representation in memory at the time of its computation remains resilient
through transmission and subsequent retrieval. But I suspect that over time,
as technology continues to push clock rates and corresponding data pool sizes
ever higher, some form of uniform data-integrity mechanism will need to be
incorporated within all the processing and communications interface data paths
within systems, in order to improve data's resilience to transmission and
processing errors, even though the probability for any single bit is
statistically very small.

> Anton B. Rang wrote:
> > That brings up another interesting idea.
> >
> > ZFS currently uses a 128-bit checksum for blocks of
> up to 1048576 bits.
> >
> > If 20-odd bits of that were a Hamming code, you'd
> have something slightly stronger than SECDED, and ZFS
> could correct any single-bit errors encountered.
> >   
> 
> Yes.  But I'm not convinced that we will see single
> bit errors, since
> there is already a large number of single-bit-error
> detection and (often)
> correction capability in modern systems.  It seems
> that when we lose
> a block of data, we lose more than a single bit. 
> 
> It should be relatively easy to add code to the
> current protection schemes
> which will compare a bad block to a reconstructed,
> good block and
> deliver this information for us. I'll add an RFE.
>  -- richard
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discu
> ss
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Forensic analysis [was: more ZFS recovery]

2008-08-12 Thread Chris Siebenmann
| As others have noted, the COW nature of ZFS means that there is a good
| chance that on a mostly-empty pool, previous data is still intact long
| after you might think it is gone.

 In the cases I am thinking of I am sure that the data was there.
Kernel panics just didn't let me get at it. Fortunately it was only
testing data, but I am now concerned about it happening in production.

| A utility to recover such data is (IMHO) more likely to be in the
| category of forensic analysis than a mount (import) process. There is
| more than enough information publically available for someone to build
| such a tool (hint, hint :-)

 To put it crudely, if I wanted to write my own software for this sort
of thing I would run Linux.

- cks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Forensic analysis [was: more ZFS recovery]

2008-08-12 Thread Darren J Moffat
[EMAIL PROTECTED] wrote:
>> As others have noted, the COW nature of ZFS means that there is a
>> good chance that on a mostly-empty pool, previous data is still intact
>> long after you might think it is gone. A utility to recover such data is
>> (IMHO) more likely to be in the category of forensic analysis than
>> a mount (import) process. There is more than enough information
>> publically available for someone to build such a tool (hint, hint :-)
>>  -- richard
> 
>   Veritas,  the makers of vxfs, whom I consider ZFS to be trying to
> compete against has higher level (normal) support engineers that have
> access to tools that let them scan the disk for inodes and other filesystem
> fragments and recover.  When you log a support call on a faulty filesystem
> (in one such case I was involved in zeroed out 100mb of the first portion
> of the volume killing off both top OLT's -- bad bad) they can actually help
> you at a very low level dig data out of the filesystem or even recover from
> pretty nasty issues.  They can scan for inodes (marked by a magic number),
> have utilities to pull out files from those inodes (including indirect
> blocks/extents).  Given the tools and help from their support I was able to
> pull back 500 gb of files (99%) from a filesystem that emc killed during a
> botched powerpath upgrade.  Can Sun's support engineers,  or is their
> answer pull from tape?  (hint, hint ;-)

Sounds like a good topic for here:

http://opensolaris.org/os/project/forensics/


-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Forensic analysis [was: more ZFS recovery]

2008-08-12 Thread Wade . Stuart

>
> As others have noted, the COW nature of ZFS means that there is a
> good chance that on a mostly-empty pool, previous data is still intact
> long after you might think it is gone. A utility to recover such data is
> (IMHO) more likely to be in the category of forensic analysis than
> a mount (import) process. There is more than enough information
> publically available for someone to build such a tool (hint, hint :-)
>  -- richard

      Veritas, the makers of vxfs (whom I consider ZFS to be competing
against), have higher-level (normal) support engineers with access to tools
that let them scan the disk for inodes and other filesystem fragments and
recover them.  When you log a support call on a faulty filesystem (in one such
case I was involved in, something zeroed out the first 100 MB of the volume,
killing off both top OLTs -- bad, bad), they can actually help you dig data
out of the filesystem at a very low level, or even recover from pretty nasty
issues.  They can scan for inodes (marked by a magic number) and have
utilities to pull out files from those inodes (including indirect
blocks/extents).  Given the tools and help from their support, I was able to
pull back 500 GB of files (99%) from a filesystem that EMC killed during a
botched PowerPath upgrade.  Can Sun's support engineers do the same, or is
their answer "restore from tape"?  (hint, hint ;-)

-Wade

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)

2008-08-12 Thread Richard Elling
Anton B. Rang wrote:
> That brings up another interesting idea.
>
> ZFS currently uses a 128-bit checksum for blocks of up to 1048576 bits.
>
> If 20-odd bits of that were a Hamming code, you'd have something slightly 
> stronger than SECDED, and ZFS could correct any single-bit errors encountered.
>   

Yes.  But I'm not convinced that we will see single bit errors, since
there is already a large amount of single-bit-error detection and (often)
correction capability in modern systems.  It seems that when we lose
a block of data, we lose more than a single bit. 

It should be relatively easy to add code to the current protection schemes
which will compare a bad block to a reconstructed, good block and
deliver this information for us. I'll add an RFE.
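
(As a back-of-the-envelope check on the "20-odd bits" figure -- my arithmetic,
not from Anton's post: a Hamming code over 2^20 = 1048576 data bits needs r
check bits with 2^r >= 1048576 + r + 1, so r = 21; one extra overall parity
bit for SECDED makes it about 22 bits, which fits comfortably inside the
128-bit checksum field.)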
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, SATA, LSI and stability

2008-08-12 Thread Thomas Maier-Komor
Frank Fischer wrote:
> After having massive problems with a supermicro X7DBE box using AOC-SAT2-MV8 
> Marvell controllers and opensolaris snv79 (same as described here: 
> http://sunsolve.sun.com/search/document.do?assetkey=1-66-233341-1) we just 
> start over using new hardware and opensolaris 2008.05 upgraded to snv94. We 
> used again a supermicro X7DBE but now with two LSI SAS3081E SAS controllers. 
> And guess what? Now we get these error-messages in /var/adm/messages:
> 
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL 
> PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL 
> PROTECTED],0 (sd11):
> Aug 11 18:20:52 thumper2Error for Command: read(10)
> Error Level: Retryable
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  Requested Block: 
> 1423173120Error Block: 1423173120
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  Vendor: ATA   
>  Serial Number:  WD-WCAP
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  Sense Key: 
> Unit_Attention
> Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, 
> reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
> 
> Along whit these messages there are a lot of this messages:
> 
> Aug 11 18:20:51 thumper2 scsi: [ID 365881 kern.info] /[EMAIL 
> PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt1):
> Aug 11 18:20:51 thumper2Log info 0x31123000 received for target 5.
> Aug 11 18:20:51 thumper2scsi_status=0x0, ioc_status=0x804b, 
> scsi_state=0xc
> 
> 
> I would believe having a faulty disk, but not two:
> 
> Aug 11 17:47:47 thumper2 scsi: [ID 365881 kern.info] /[EMAIL 
> PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt1):
> Aug 11 17:47:47 thumper2Log info 0x31123000 received for target 4.
> Aug 11 17:47:47 thumper2scsi_status=0x0, ioc_status=0x804b, 
> scsi_state=0xc
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL 
> PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL 
> PROTECTED],0 (sd10):
> Aug 11 17:47:48 thumper2Error for Command: read(10)
> Error Level: Retryable
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  Requested Block: 
> 252165120 Error Block: 252165120
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  Vendor: ATA   
>  Serial Number:
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  Sense Key: 
> Unit_Attention
> Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, 
> reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
> Aug 11 17:48:34 thumper2 scsi: [ID 243001 kern.warning] WARNING: /[EMAIL 
> PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):
> 
> 
> Does somebody know what is going on here?
> I have checked the disks with iostat -En :
> 
> -bash-3.2# iostat -En
> ...
> c4t0d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
> Vendor: FUJITSU  Product: MBA3073RCRevision: 0103 Serial No:  
> Size: 73.54GB <73543163904 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
> Illegal Request: 0 Predictive Failure Analysis: 0 
> c4t5d0   Soft Errors: 4 Hard Errors: 24 Transport Errors: 179 
> Vendor: ATA  Product: ST3750330NS  Revision: SN04 Serial No:  
> Size: 750.16GB <750156374016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 22 Recoverable: 4 
> Illegal Request: 0 Predictive Failure Analysis: 0 
> c4t6d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
> Vendor: ATA  Product: WDC WD7500AYYS-0 Revision: 4G30 Serial No:  
> Size: 750.16GB <750156374016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
> Illegal Request: 0 Predictive Failure Analysis: 0 
> c6t4d0   Soft Errors: 6 Hard Errors: 17 Transport Errors: 466 
> Vendor: ATA  Product: ST3750640NS  Revision: GSerial No:  
> Size: 750.16GB <750156374016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 17 Recoverable: 6 
> Illegal Request: 0 Predictive Failure Analysis: 0 
> c6t5d0   Soft Errors: 2 Hard Errors: 23 Transport Errors: 539 
> Vendor: ATA  Product: WDC WD7500AYYS-0 Revision: 4G30 Serial No:  
> Size: 750.16GB <750156374016 bytes>
> Media Error: 0 Device Not Ready: 0 No Device: 23 Recoverable: 2 
> Illegal Request: 0 Predictive Failure Analysis: 0 
> 
> I have check the drives with smartctl:
> 
> ID# ATTRIBUTE_NAME  FLAG VALUE WORST THRESH TYPE  UPDATED  
> WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate 0x000f   115   075   006Pre-fail  Always  
>  -   94384069
>   3 Spin_Up_Time0x0003   093   093   000Pre-fail  Always  
>  -   0
>   4 Start_Stop_Count0x0032   100   100   020Old_age   Always  
>  -   15
>   5 Reallocated_Sector_Ct   0x0033   100   100  

[zfs-discuss] ZFS, SATA, LSI and stability

2008-08-12 Thread Frank Fischer
After having massive problems with a supermicro X7DBE box using AOC-SAT2-MV8 
Marvell controllers and opensolaris snv79 (same as described here: 
http://sunsolve.sun.com/search/document.do?assetkey=1-66-233341-1) we just 
start over using new hardware and opensolaris 2008.05 upgraded to snv94. We 
used again a supermicro X7DBE but now with two LSI SAS3081E SAS controllers. 
And guess what? Now we get these error-messages in /var/adm/messages:

Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL 
PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 (sd11):
Aug 11 18:20:52 thumper2Error for Command: read(10)
Error Level: Retryable
Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  Requested Block: 
1423173120Error Block: 1423173120
Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  Vendor: ATA 
   Serial Number:  WD-WCAP
Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  Sense Key: 
Unit_Attention
Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, 
reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

Along with these messages, there are a lot of messages like this:

Aug 11 18:20:51 thumper2 scsi: [ID 365881 kern.info] /[EMAIL 
PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt1):
Aug 11 18:20:51 thumper2Log info 0x31123000 received for target 5.
Aug 11 18:20:51 thumper2scsi_status=0x0, ioc_status=0x804b, 
scsi_state=0xc


I would believe I had a faulty disk, but not two:

Aug 11 17:47:47 thumper2 scsi: [ID 365881 kern.info] /[EMAIL 
PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt1):
Aug 11 17:47:47 thumper2Log info 0x31123000 received for target 4.
Aug 11 17:47:47 thumper2scsi_status=0x0, ioc_status=0x804b, 
scsi_state=0xc
Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL 
PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 (sd10):
Aug 11 17:47:48 thumper2Error for Command: read(10)
Error Level: Retryable
Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  Requested Block: 
252165120 Error Block: 252165120
Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  Vendor: ATA 
   Serial Number:
Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  Sense Key: 
Unit_Attention
Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice]  ASC: 0x29 (power on, 
reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Aug 11 17:48:34 thumper2 scsi: [ID 243001 kern.warning] WARNING: /[EMAIL 
PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0):


Does somebody know what is going on here?
I have checked the disks with iostat -En :

-bash-3.2# iostat -En
...
c4t0d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: FUJITSU  Product: MBA3073RCRevision: 0103 Serial No:  
Size: 73.54GB <73543163904 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c4t5d0   Soft Errors: 4 Hard Errors: 24 Transport Errors: 179 
Vendor: ATA  Product: ST3750330NS  Revision: SN04 Serial No:  
Size: 750.16GB <750156374016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 22 Recoverable: 4 
Illegal Request: 0 Predictive Failure Analysis: 0 
c4t6d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA  Product: WDC WD7500AYYS-0 Revision: 4G30 Serial No:  
Size: 750.16GB <750156374016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 0 Predictive Failure Analysis: 0 
c6t4d0   Soft Errors: 6 Hard Errors: 17 Transport Errors: 466 
Vendor: ATA  Product: ST3750640NS  Revision: GSerial No:  
Size: 750.16GB <750156374016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 17 Recoverable: 6 
Illegal Request: 0 Predictive Failure Analysis: 0 
c6t5d0   Soft Errors: 2 Hard Errors: 23 Transport Errors: 539 
Vendor: ATA  Product: WDC WD7500AYYS-0 Revision: 4G30 Serial No:  
Size: 750.16GB <750156374016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 23 Recoverable: 2 
Illegal Request: 0 Predictive Failure Analysis: 0 

I have checked the drives with smartctl:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   075   006    Pre-fail  Always       -       94384069
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       15
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       263091894
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always

Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.

2008-08-12 Thread Frank Fischer
James, one question: do you know whether, and if so in which version of 
OpenSolaris, this issue is solved? We have the exact same problems using a 
Supermicro X7DBE with two Supermicro AOC-SAT2-MV8 controllers (we are on snv79).

Thanks,

Frank
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-12 Thread Cromar Scott
Chris Siebenmann <[EMAIL PROTECTED]>

 I'm not Anton Rang, but:
| How would you describe the difference between the data recovery
| utility and ZFS's normal data recovery process?

cks> The data recovery utility should not panic 
cks> my entire system if it runs into some situation 
cks> that it utterly cannot handle. Solaris 10 U5 
cks> kernel ZFS code does not have this property; 
cks> it is possible to wind up with ZFS pools that 
cks> will panic your system when you try to touch them.
...

I'll go you one worse.  Imagine a Sun Cluster with several resource
groups and several zpools.  You blow a proc on one of the servers.  As a
result, the metadata on one of the pools becomes corrupted.

http://mail.opensolaris.org/pipermail/zfs-discuss/2008-April/046951.html

Now, each of the servers in your cluster attempts to import the
zpool--and panics.

As a result of a single part failure on a single server, your entire
cluster (and all the services on it) are sitting in a smoking heap on
your machine room floor.

| Nobody thinks that an answer of "sorry, we lost all of your data" is
| acceptable.  However, there are failures which will result in loss of
| data no matter how clever the file system is.

cks> The problem is that there are currently ways to 
cks> make ZFS lose all your data when there are no 
cks> hardware faults or failures, merely people or
cks> software mis-handling pools. This is especially 
cks> frustrating when the only thing that is likely 
cks> to be corrupted is ZFS metadata and the vast
cks> majority (or all) of the data in the pool is intact, 
cks> readable, and so on.

I'm just glad that our pool corruption experience happened during
testing, and not after the system had gone into production.  Not exactly
a resume-enhancing experience.

--Scott
 
 
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] more ZFS recovery

2008-08-12 Thread Cromar Scott
From: Richard Elling <[EMAIL PROTECTED]>
Miles Nordin wrote:
>> "re" == Richard Elling <[EMAIL PROTECTED]> writes:
>> "tb" == Tom Bird <[EMAIL PROTECTED]> writes:
>> 
>
...
>
> re> In general, ZFS can only repair conditions for which it owns
> re> data redundancy.
tb> If that's really the excuse for this situation, then ZFS is not
tb> ``always consistent on the disk'' for single-VDEV pools.

re> I disagree with your assessment.  The on-disk 
re> format (any on-disk format) necessarily assumes 
re> no faults on the media.  The difference between ZFS
re> on-disk format and most other file systems is that 
re> the metadata will be consistent to some point in time 
re> because it is COW.  
...


tb> There was no loss of data here, just an interruption in the connection
tb> to the target, like power loss or any other unplanned shutdown.
tb> Corruption in this scenario is a significant regression w.r.t. UFS:

re> I see no evidence that the data is or is not correct.  
...

re> However, I will bet a steak dinner that if this device 
re> was mirrored to another, the pool will import just fine, 
re> with the affected device in a faulted or degraded state.

tb> http://mail.opensolaris.org/pipermail/zfs-discuss/2008-June/048375.html

re> I have no idea what Eric is referring to, and it does 
re> not match my experience.

We had a similar problem in our environment.  We lost a CPU on the
server, resulting in metadata corruption and an unrecoverable pool.  We
were told that we were seeing a known bug that will be fixed in S10u6.

http://mail.opensolaris.org/pipermail/zfs-discuss/2008-April/046951.html

From: Tom Bird <[EMAIL PROTECTED]>

tb> On any other file system though, I could probably kick 
tb> off a fsck and get back most of the data.  I see the 
tb> argument a lot that ZFS "doesn't need" a fsck utility, 
tb> however I would be inclined to disagree, if not a
tb> full on fsck then something that can patch it up to the 
tb> point where I can mount it and then get some data off or 
tb> run a scrub.

If not that, then we need some sort of recovery tool.  We ought to be
able to have some sort of recovery mode that allows us to read off the
known good data or roll back to a snapshot or something.  

When you have a really big file system, telling us (as Sun support told
us) that our only option was to re-build the zpool and restore from
tape, it becomes really difficult to justify using the product in
certain production environments.  

(For example, consider an environment where the available storage is on
a hardware RAID-5 system, and where mirroring large amounts of already
RAID-ed space adds up to more cost than a VxFS license.  Not every type
of data requires more protection than you get with standard
hardware-based RAID-5.)

--Scott
 
 
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] corrupt zfs stream? checksum mismatch

2008-08-12 Thread Mattias Pantzare
2008/8/10 Jonathan Wheeler <[EMAIL PROTECTED]>:
> Hi Folks,
>
> I'm in the very unsettling position of fearing that I've lost all of my data 
> via a zfs send/receive operation, despite ZFS's legendary integrity.
>
> The error that I'm getting on restore is:
> receiving full stream of faith/[EMAIL PROTECTED] into Z/faith/[EMAIL 
> PROTECTED]
> cannot receive: invalid stream (checksum mismatch)
>
> Background:
> I was running snv_91, and decided to upgrade to snv_95 converting to the much 
> awaited zfs-root in the process.

You could try to restore on a snv_91 system. zfs send streams are not
meant for backups. This is from the zfs man page:

The format of the stream is evolving. No backwards
compatibility is guaranteed. You may not be able to receive
your streams on future versions of ZFS.

Or the file was corrupted when you transferred it.
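
An easy way to rule the transfer in or out (path made up; use whatever
checksum tool you have on both ends):

  digest -a md5 /backup/faith.zfs

Run that on the sending and receiving side and compare; if the checksums
differ, the stream was damaged in transit rather than by zfs send.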
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] URGENT: ZFS issue - can't import in degraded state

2008-08-12 Thread Robert Milkowski
Hello zfs-discuss,


S10, Generic_125100-10, SPARC


# zpool import
  pool: mail
id: 7518613205838351076
 state: DEGRADED
status: One or more devices are missing from the system.
action: The pool can be imported despite missing or damaged devices.  The
fault tolerance of the pool may be compromised if imported.
The pool may be active on another system, but can be imported using
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-2Q
config:

mail   DEGRADED
  mirror   ONLINE
c2t0d0 ONLINE
c3t0d0 ONLINE
  mirror   ONLINE
c2t1d0 ONLINE
c3t1d0 ONLINE
  mirror   ONLINE
c2t2d0 ONLINE
c3t2d0 ONLINE
  mirror   ONLINE
c2t3d0 ONLINE
c3t3d0 ONLINE
  mirror   ONLINE
c2t8d0 ONLINE
c3t8d0 ONLINE
  mirror   DEGRADED
spare  DEGRADED
  c2t9d0   ONLINE
  c2t11d0  UNAVAIL   cannot open
c3t9d0 ONLINE
  mirror   ONLINE
c2t10d0ONLINE
c3t10d0ONLINE
spares
  c2t11d0
  c3t11d0
#

# zpool import -f mail
cannot import 'mail': no such device in pool
#


Why?


c2t11 was physically replaced with other disk.
Even if I power-off all c2 devices (one d1000) I can see all mirrors
in a degraded mode but I can't import the pool anyway.

??

-- 
Best regards,
 Robert Milkowski  mailto:[EMAIL PROTECTED]
 http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] corrupt zfs stream? "checksum mismatch"

2008-08-12 Thread Jonathan Wheeler
Hi folks,

Perhaps I was a little verbose in my first post, putting a few people off. 
Does anyone else have any ideas on this one?
I can't be the first person to have had a problem with a zfs backup stream. Is 
there nothing that can be done to recover at least some of the stream?

As another helpful chap pointed out, if tar encounters an error in the 
bitstream it just moves on until it finds usable data again. Can zfs not do 
something similar?

I'll take whatever i can get!
Jonathan
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver in progress - which disk is inconsistent?

2008-08-12 Thread Justin Vassallo
>I know this is too late to help you now, but...  Doesn't "zpool status -v"
>do what you want?

Hi,

No indeed it does not. At the top it just says that resilvering is happening
and that's it. Let me guess... it's to do with the zfs version I'm using?
(I'm on 3)

justin




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)

2008-08-12 Thread Mario Goebbels (iPhone)
I suppose an error-correcting code like 256-bit Hamming or Reed-Solomon  
can't substitute as a reliable checksum at the level of the default  
Fletcher2/4? If it can, it could be offered as an alternative algorithm  
where necessary, and ZFS could react accordingly, or not?

Regards,
-mg

On 12-août-08, at 08:48, "Anton B. Rang" <[EMAIL PROTECTED]> wrote:

> That brings up another interesting idea.
>
> ZFS currently uses a 128-bit checksum for blocks of up to 1048576  
> bits.
>
> If 20-odd bits of that were a Hamming code, you'd have something  
> slightly stronger than SECDED, and ZFS could correct any single-bit  
> errors encountered.
>
> This could be done without changing the ZFS on-disk format.
>
>
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss