Re: [zfs-discuss] GUI support for ZFS root?
On Tue, 12 Aug 2008, Lori Alt wrote:

> There are no plans to add zfs root support to the existing
> install GUI. GUI install support for zfs root will be
> provided by the new Caiman installer.

Thanks for the info. Follow-up question: is there an ETA for when Caiman will be integrated into Nevada (I use SXCE in preference to that which was previously known as Project Indiana)?

Thanks again,

--
Rich Teer, SCSA, SCNA, SCSECA
CEO, My Online Home Inventory
URLs: http://www.rite-group.com/rich
      http://www.linkedin.com/in/richteer
      http://www.myonlinehomeinventory.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)
Reed-Solomon could correct multiple-bit errors, but an effective Reed-Solomon code for 128K blocks of data would be very slow if implemented in software (and, for that matter, would take a lot of hardware to implement). A multi-bit Hamming code would be simpler, but I suspect that undetected multi-bit errors are quite rare.

I've seen a fair number of single-bit errors coming from SATA drives, because the data is often not parity-protected through the whole data path within the drive. Some enterprise-class SATA disks have data protected (with a parity equivalent) through the write data path, and more of these models will have this feature soon. All SAS and Fibre Channel drives (that I am aware of) have data protected with ECC through the whole path, for both reads and writes.

Single-bit errors can also be introduced in non-ECC DRAM, of course. In that case, the error can happen either before the checksum computation (=> undetected data corruption) or after it (=> checksum failure on a later read).
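For concreteness, here is a hedged sketch (plain Python, not ZFS code; all names are mine) of the Hamming-code idea being discussed in this thread: a handful of parity bits whose syndrome pinpoints, and therefore corrects, a single flipped bit in a block.

```python
def hamming_encode(data):
    """Encode a list of 0/1 data bits with a Hamming code: data bits go
    at non-power-of-two positions (1-indexed), and parity bits at
    positions 1, 2, 4, ... are set so that the XOR of the positions of
    all set bits is zero."""
    m = len(data)
    r = 0
    while (1 << r) < m + r + 1:       # need 2^r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)              # index 0 unused; positions 1..n
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):           # not a power of two -> data slot
            code[pos] = next(bits)
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    for i in range(r):                # zero the syndrome via parity bits
        if syndrome & (1 << i):
            code[1 << i] = 1
    return code[1:]

def hamming_correct(code):
    """Return (corrected codeword, syndrome). A nonzero syndrome is the
    1-indexed position of a single flipped bit, which is flipped back.
    (True SECDED adds one extra overall-parity bit to distinguish single
    from double errors; this sketch corrects single-bit errors only.)"""
    syndrome = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= pos
    if syndrome:
        code[syndrome - 1] ^= 1
    return code, syndrome
```

For a 1048576-bit (128K) block, 21 parity bits suffice by the 2^r >= m + r + 1 bound, which matches the "20-odd bits" figure mentioned elsewhere in this thread.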
Re: [zfs-discuss] more ZFS recovery
> "cs" == Cromar Scott <[EMAIL PROTECTED]> writes: cs> It appears that the metadata on that pool became corrupted cs> when the processor failed. The exact mechanism is a bit of a cs> mystery, [...] cs> We were told that the probability of metadata corruption would cs> have been reduced but not eliminated by having a mirrored LUN. cs> We were also told that the issue will be fixed in U6. how can one fix an issue which is a bit of a mystery? Or do you mean the lazy-panic issue is fixed, but the corruption issue is not? pgpOQP9yzsfsR.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] GUI support for ZFS root?
Rich Teer wrote:

> Hi all,
>
> I recently installed b95 and ZFS root is great! I used the
> CLI installer because I remember reading that the GUI installer
> doesn't yet support ZFS root. So my question is, what's the
> ETA for support in the GUI installer for ZFS root?
>
> TIA,

There are no plans to add zfs root support to the existing install GUI. GUI install support for zfs root will be provided by the new Caiman installer.

Lori
[zfs-discuss] GUI support for ZFS root?
Hi all,

I recently installed b95 and ZFS root is great! I used the CLI installer because I remember reading that the GUI installer doesn't yet support ZFS root. So my question is, what's the ETA for support in the GUI installer for ZFS root?

TIA,

--
Rich Teer, SCSA, SCNA, SCSECA
CEO, My Online Home Inventory
Re: [zfs-discuss] Thumper, ZFS and performance
Config 1: since it has 4 vdevs, ZFS will stripe across them, versus the single vdev of Config 2.

For reliability, both configs are probably about the same; Sod's law says the second disk to fail will be in the same vdev. If you want the space, use Config 2 but with raidz2. If you want speed, make the 24 disks into mirrors.

See the other posts for more knowledgeable explanations.
Re: [zfs-discuss] more ZFS recovery
Richard Elling <[EMAIL PROTECTED]> wrote:

> Cromar Scott wrote:
> > Chris Siebenmann <[EMAIL PROTECTED]> wrote:
> >
> > I'm not Anton Rang, but:
> > | How would you describe the difference between the data recovery
> > | utility and ZFS's normal data recovery process?
> >
> > cks> The data recovery utility should not panic
> > cks> my entire system if it runs into some situation
> > cks> that it utterly cannot handle. Solaris 10 U5
> > cks> kernel ZFS code does not have this property;
> > cks> it is possible to wind up with ZFS pools that
> > cks> will panic your system when you try to touch them.
> > ...
> >
> > I'll go you one worse. Imagine a Sun Cluster with several resource
> > groups and several zpools. You blow a proc on one of the servers. As a
> > result, the metadata on one of the pools becomes corrupted.

re> This failure mode affects all shared-storage
re> clusters. I don't see how ZFS should or should
re> not be any different than raw, UFS, et al.

Absolutely true. The file system definitely had a problem.

> > http://mail.opensolaris.org/pipermail/zfs-discuss/2008-April/046951.html
> >
> > Now, each of the servers in your cluster attempts to import the
> > zpool--and panics.
> >
> > As a result of a single part failure on a single server, your entire
> > cluster (and all the services on it) are sitting in a smoking heap on
> > your machine room floor.

re> Yes, but your data is corrupted.

My data was only corrupted on ONE of the zpools. In a cluster with several zpools and several resource groups, we ended up with ALL of the pools and ALL of the resource groups offline as one node after another panicked.

re> If you were my bank, then I would greatly
re> appreciate you getting the data corrected
re> prior to bringing my account online.

Fair enough, but do we have to take Fred's and Joe's accounts offline too?

re> If you study highly available clusters and services
re> then you will see many cases where human interaction
re> is preferred to automation for just such cases.

I see your point about requiring intervention to deal with a potentially corrupt file system. I would have preferred a behavior more like we get with VxVM and VxFS, where the corrupted file system fails to mount without human intervention, but the nodes don't panic on the failed vxdg import. That particular service group and that particular file system are offline, but everything else keeps running because none of the other nodes panics.

We handled the issue of not corrupting the file system further by panicking the original node, but I don't understand why we need to panic each successive node in the cluster. Why can't we just refuse to import automatically?

> > I'm just glad that our pool corruption experience happened during
> > testing, and not after the system had gone into production. Not exactly
> > a resume-enhancing experience.

re> I'm glad you found this in testing.

I'm a believer. Some people wanted us to just throw the box into production, but I insisted on keeping our test schedule. I'm glad I did.

re> BTW, what was the root cause?

It appears that the metadata on that pool became corrupted when the processor failed. The exact mechanism is a bit of a mystery, since we didn't get a valid crash dump. The other pools were fine, once we imported them after a boot -x.

We ended up converting to VxVM and VxFS on that server because we could not guarantee that the same thing wouldn't just happen again after we went into production. If we had a tool that had allowed us to roll back to a previous snapshot or something, it might have made a difference.

We were told that the probability of metadata corruption would have been reduced but not eliminated by having a mirrored LUN. We were also told that the issue will be fixed in U6.

--Scott
Re: [zfs-discuss] Thumper, ZFS and performance
On Tue, 12 Aug 2008, John Malick wrote:

> There is a thread quite similar to this but it did not provide a clear answer
> to the question, which was worded a bit oddly.
>
> I have a Thumper and am trying to determine, for performance, which
> is the best ZFS configuration of the two shown below. Any issues
> other than performance that anyone may see to steer me in one
> direction or another would be helpful as well. Thanks.

The joy of ZFS is that it takes just minutes to create various configurations and test them yourself. The pool design is a pretty fundamental decision, whereas filesystems are easily tuned.

The configuration with more vdevs should be both faster and more reliable. ZFS load-shares across vdevs, so with more vdevs you can support more simultaneous users. Your entire pool will be no stronger than the weakest vdev, so make sure that your vdevs are sufficiently reliable.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] more ZFS recovery
Cromar Scott wrote:

> Chris Siebenmann <[EMAIL PROTECTED]> wrote:
>
> I'm not Anton Rang, but:
> | How would you describe the difference between the data recovery
> | utility and ZFS's normal data recovery process?
>
> cks> The data recovery utility should not panic
> cks> my entire system if it runs into some situation
> cks> that it utterly cannot handle. Solaris 10 U5
> cks> kernel ZFS code does not have this property;
> cks> it is possible to wind up with ZFS pools that
> cks> will panic your system when you try to touch them.
> ...
>
> I'll go you one worse. Imagine a Sun Cluster with several resource
> groups and several zpools. You blow a proc on one of the servers. As a
> result, the metadata on one of the pools becomes corrupted.

This failure mode affects all shared-storage clusters. I don't see how ZFS should or should not be any different than raw, UFS, et al.

> http://mail.opensolaris.org/pipermail/zfs-discuss/2008-April/046951.html
>
> Now, each of the servers in your cluster attempts to import the
> zpool--and panics.
>
> As a result of a single part failure on a single server, your entire
> cluster (and all the services on it) are sitting in a smoking heap on
> your machine room floor.

Yes, but your data is corrupted. If you were my bank, then I would greatly appreciate you getting the data corrected prior to bringing my account online.

If you study highly available clusters and services, then you will see many cases where human interaction is preferred to automation for just such cases. You will also find that a combination of shared-storage and non-shared-storage cluster technology is used for truly important data. For example, we would use Solaris Cluster for the local shared-storage framework and Solaris Cluster Geographic Edition for a remote site (no shared hardware components with the local cluster).

> | Nobody thinks that an answer of "sorry, we lost all of your data" is
> | acceptable. However, there are failures which will result in loss of
> | data no matter how clever the file system is.
>
> cks> The problem is that there are currently ways to
> cks> make ZFS lose all your data when there are no
> cks> hardware faults or failures, merely people or
> cks> software mis-handling pools. This is especially
> cks> frustrating when the only thing that is likely
> cks> to be corrupted is ZFS metadata and the vast
> cks> majority (or all) of the data in the pool is intact,
> cks> readable, and so on.
>
> I'm just glad that our pool corruption experience happened during
> testing, and not after the system had gone into production. Not exactly
> a resume-enhancing experience.

I'm glad you found this in testing. BTW, what was the root cause?
-- richard
Re: [zfs-discuss] corrupt zfs stream? checksum mismatch
> "mp" == Mattias Pantzare <[EMAIL PROTECTED]> writes: mp> Or the file was corrupted when you transfered it. he stored the backup streams on ZFS, so obviously they couldn't possibly be corrupt. :p Jonathan, does 'zfs receive -nv' also detect the checksum error, or is it only detected when you actually receive onto a pool without -n? in addition to skipping to the next header of corrupted tarballs, tar can validate a tarball's checksums without extracting it, so it's possible to write a tape, then read it to see if it's ok. The 'tar t' read test checks for medium errors, driver bugs, and bugs inside tar itself. so it sounds like: brrk, brrk, danger, do not use zfs send/receive for backups---use only for moving filesystems from one pool to another. This brings back the question ``how is it possible to back up and restore a heavily-cloned/snapshotted system?'' because upon restore the clone inheritance tree is lost, and you'll never have enough space in the pool to fit what was there before. pgpKRGlG0Pbzz.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Thumper, ZFS and performance
John Malick wrote:

> There is a thread quite similar to this but it did not provide a clear answer
> to the question, which was worded a bit oddly.
>
> I have a Thumper and am trying to determine, for performance, which is the
> best ZFS configuration of the two shown below. Any issues other than
> performance that anyone may see to steer me in one direction or another would
> be helpful as well. Thanks.
>
> [zpool status listings for both configs trimmed; see the original post]
>
> In a nutshell, for performance reasons, is it better to have multiple raidz
> vdevs in the pool or just one raidz vdev? The number of disks used is the
> same in either case.
>
> Thanks again.

Do config 1; please do not do config 2. From zpool(1):

  A raidz group with N disks of size X with P parity disks can hold
  approximately (N-P)*X bytes and can withstand one device failing
  before data integrity is compromised. The minimum number of devices
  in a raidz group is one more than the number of parity disks. The
  recommended number is between 3 and 9.

-- richard
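The (N-P)*X formula quoted from zpool(1) makes the space/redundancy trade-off between the two configs easy to check. A back-of-the-envelope sketch (my own arithmetic, in units of one disk's capacity, since the drive size isn't given in this thread):

```python
def raidz_usable(n_disks, n_parity):
    """Usable space of one raidz vdev, in units of a single disk,
    per the (N-P)*X formula quoted from zpool(1)."""
    return n_disks - n_parity

# Config 1: four 6-disk raidz1 vdevs (24 disks total)
config1 = 4 * raidz_usable(6, 1)   # 4 * 5 = 20 disks of usable space

# Config 2: one 24-disk raidz1 vdev (same 24 disks)
config2 = 1 * raidz_usable(24, 1)  # 23 disks of usable space
```

So config 2 buys roughly 15% more space, but has only one parity disk covering all 24 drives and, since ZFS load-shares across vdevs, only one vdev's worth of random-I/O concurrency, which is why the replies favor config 1.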
Re: [zfs-discuss] ZFS, SATA, LSI and stability
ff> I have checked the drives with smartctl:

ff> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
ff>   1 Raw_Read_Error_Rate     0x000f   115   075   006    Pre-fail Always  -           94384069
ff>   5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail Always  -           0
ff> 195 Hardware_ECC_Recovered  0x001a   065   056   000    Old_age  Always  -           173161329
ff> 199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age  Always  -           0

ff> But with no UDMA_CRC_Errors I believe the disks are fine.

No: UDMA_CRC_Error_Count counts checksum errors on PATA cables. I cannot confirm or deny whether it counts CRC errors on SATA cables (and even if it did, this is complicated, because there are weird SCSI-emulation proprietary drivers, port multipliers, and so on). So, if you are having problems and that parameter is increasing, then it's probably a cabling problem, not a drive problem.

The other three values I quoted are the ones that matter. The VALUE is scaled by constants defined by the manufacturer and used for the ``overall health assessment'', but the constants they use are always way too forgiving, so it's worthless. The RAW_VALUE looks bigger than I'm used to, but this may also be meaningless. The only way I know to get information out of the report is: how do the RAW_VALUEs of the three parameters I quoted compare with other drives of the same model, or with this drive before it started failing?

There is another section of the smartctl -a report that logs the last 5 or so errors the drive has reported to the host. IIRC you will see errors called 'ICRC' or 'UNC' on failing drives.

This experience is all PATA/SATA-specific.
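The suggested comparison -- same attributes, same drive model, known-good baseline -- is easy to script. A hedged sketch (my own helper names; it assumes the common `smartctl -A` table layout where the attribute ID is the first field and RAW_VALUE the last):

```python
# Attribute IDs singled out above as the ones worth comparing across drives.
INTERESTING = {1, 5, 195, 199}

def raw_values(report):
    """Extract {attribute_id: RAW_VALUE} from the attribute-table lines
    of a `smartctl -A` report. Assumes the usual 10-column layout with
    RAW_VALUE as the last whitespace-separated field."""
    vals = {}
    for line in report.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit() and fields[-1].isdigit():
            attr_id = int(fields[0])
            if attr_id in INTERESTING:
                vals[attr_id] = int(fields[-1])
    return vals

def suspicious(candidate, baseline, factor=10):
    """Flag attribute IDs whose RAW_VALUE is far above a known-good
    drive of the same model (the comparison suggested in the post).
    The factor of 10 is an arbitrary illustration, not a standard."""
    return [a for a, v in candidate.items()
            if v > factor * max(baseline.get(a, 0), 1)]
```

As the post notes, RAW_VALUE scales are vendor-specific, so this only means anything between drives of the same model (or the same drive over time).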
[zfs-discuss] Thumper, ZFS and performance
There is a thread quite similar to this but it did not provide a clear answer to the question, which was worded a bit oddly.

I have a Thumper and am trying to determine, for performance, which is the best ZFS configuration of the two shown below. Any issues other than performance that anyone may see to steer me in one direction or another would be helpful as well. Thanks.

ZFS Config 1:

 zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0

versus

ZFS Config 2:

 zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0

In a nutshell, for performance reasons, is it better to have multiple raidz vdevs in the pool or just one raidz vdev? The number of disks used is the same in either case.

Thanks again.
Re: [zfs-discuss] more ZFS recovery
On Aug 7, 2008, at 10:25 PM, Anton B. Rang wrote:

>> How would you describe the difference between the file system
>> checking utility and zpool scrub? Is zpool scrub lacking in its
>> verification of the data?
>
> To answer the second question first, yes, zpool scrub is lacking, at
> least to the best of my knowledge (I haven't looked at the ZFS
> source in a few months). It does not verify that any internal data
> structures are correct; rather, it simply verifies that data and
> metadata blocks match their checksums.

Hey Anton,

What do you mean by "internal data structures"? Are you referring to things like space maps, props, history obj, etc. (basically anything other than user data and the indirect blocks that point to user data)?

eric
Re: [zfs-discuss] URGENT: ZFS issue - can't import in degraded state
Hello Robert,

Probably it was related to 6436000. A kernel upgrade which includes the fix for the above has helped, and now I was able to import the pool without any issues.

--
Best regards,
Robert          mailto:[EMAIL PROTECTED]
                http://milek.blogspot.com
Re: [zfs-discuss] Forensic analysis [was: more ZFS recovery]
Darren J Moffat wrote:

> [EMAIL PROTECTED] wrote:
>
> >> As others have noted, the COW nature of ZFS means that there is a
> >> good chance that on a mostly-empty pool, previous data is still intact
> >> long after you might think it is gone. [...] There is more than enough
> >> information publicly available for someone to build such a tool
> >> (hint, hint :-)
> >> -- richard
> >
> > Veritas, the makers of vxfs, [...] Given the tools and help from their
> > support, I was able to pull back 500 GB of files (99%) from a filesystem
> > that EMC killed during a botched PowerPath upgrade. Can Sun's support
> > engineers do the same, or is their answer "pull from tape"? (hint, hint ;-)
>
> Sounds like a good topic for here:
>
> http://opensolaris.org/os/project/forensics/

I took a look at this project, specifically http://opensolaris.org/os/project/forensics/ZFS-Forensics/. Is there any reason that the paper and slides I presented at the OpenSolaris Developers Conference on the zfs on-disk format are not mentioned?

The paper is at http://www.osdevcon.org/2008/files/osdevcon2008-proceedings.pdf, starting on page 36, and the slides are at http://www.osdevcon.org/2008/files/osdevcon2008-max.pdf.

thanks,
max
Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)
Although I don't know for sure that most such errors are in fact single-bit in nature, I can only surmise that they most likely are, statistically, absent detection of anything else. With the exception of error-corrected memory systems and checksummed communication channels, each transition of data between hardware interfaces at ever-increasing clock rates correspondingly increases the probability of an otherwise non-detectable soft single-bit error being injected at these boundaries. Although the probability of any one occurrence is small enough not to be easily detectable or classifiable as a hardware failure, over the course of days/weeks/years and trillions of bits such errors will be observable, and should be expected and planned for within reason.

Utilizing a strong error-correcting code, in combination with or in lieu of a strong hash code, would seem like a good way to more strongly warrant that data's representation in memory at the time of its computation is resilient to transmission and subsequent retrieval. But I suspect that as technology continues to push clock rates and corresponding data pool sizes ever higher, some form of uniform data integrity mechanism will need to be incorporated within all the processing and communications interface data paths within systems, in order to improve data's resilience to transmission and processing errors, albeit statistically very small for any single bit.

> Anton B. Rang wrote:
> > That brings up another interesting idea.
> >
> > ZFS currently uses a 128-bit checksum for blocks of up to 1048576 bits.
> >
> > If 20-odd bits of that were a Hamming code, you'd have something slightly
> > stronger than SECDED, and ZFS could correct any single-bit errors encountered.
>
> Yes. But I'm not convinced that we will see single-bit errors, since
> there is already a large amount of single-bit-error detection and (often)
> correction capability in modern systems. It seems that when we lose
> a block of data, we lose more than a single bit.
>
> It should be relatively easy to add code to the current protection schemes
> which will compare a bad block to a reconstructed, good block and
> deliver this information for us. I'll add an RFE.
> -- richard
Re: [zfs-discuss] Forensic analysis [was: more ZFS recovery]
| As others have noted, the COW nature of ZFS means that there is a good
| chance that on a mostly-empty pool, previous data is still intact long
| after you might think it is gone.

In the cases I am thinking of, I am sure that the data was there. Kernel panics just didn't let me get at it. Fortunately it was only testing data, but I am now concerned about it happening in production.

| A utility to recover such data is (IMHO) more likely to be in the
| category of forensic analysis than a mount (import) process. There is
| more than enough information publicly available for someone to build
| such a tool (hint, hint :-)

To put it crudely, if I wanted to write my own software for this sort of thing I would run Linux.

- cks
Re: [zfs-discuss] Forensic analysis [was: more ZFS recovery]
[EMAIL PROTECTED] wrote:

>> As others have noted, the COW nature of ZFS means that there is a
>> good chance that on a mostly-empty pool, previous data is still intact
>> long after you might think it is gone. A utility to recover such data is
>> (IMHO) more likely to be in the category of forensic analysis than
>> a mount (import) process. There is more than enough information
>> publicly available for someone to build such a tool (hint, hint :-)
>> -- richard
>
> Veritas, the makers of vxfs, [...] Given the tools and help from their
> support, I was able to pull back 500 GB of files (99%) from a filesystem
> that EMC killed during a botched PowerPath upgrade. Can Sun's support
> engineers do the same, or is their answer "pull from tape"? (hint, hint ;-)

Sounds like a good topic for here:

http://opensolaris.org/os/project/forensics/

--
Darren J Moffat
Re: [zfs-discuss] Forensic analysis [was: more ZFS recovery]
> As others have noted, the COW nature of ZFS means that there is a
> good chance that on a mostly-empty pool, previous data is still intact
> long after you might think it is gone. A utility to recover such data is
> (IMHO) more likely to be in the category of forensic analysis than
> a mount (import) process. There is more than enough information
> publicly available for someone to build such a tool (hint, hint :-)
> -- richard

Veritas, the makers of vxfs, whom I consider ZFS to be trying to compete against, has higher-level (normal) support engineers that have access to tools that let them scan the disk for inodes and other filesystem fragments and recover. When you log a support call on a faulty filesystem (in one such case I was involved in, 100 MB of the first portion of the volume was zeroed out, killing off both top OLTs -- bad, bad), they can actually help you at a very low level dig data out of the filesystem, or even recover from pretty nasty issues. They can scan for inodes (marked by a magic number) and have utilities to pull out files from those inodes (including indirect blocks/extents). Given the tools and help from their support, I was able to pull back 500 GB of files (99%) from a filesystem that EMC killed during a botched PowerPath upgrade. Can Sun's support engineers do the same, or is their answer "pull from tape"? (hint, hint ;-)

-Wade
Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)
Anton B. Rang wrote:

> That brings up another interesting idea.
>
> ZFS currently uses a 128-bit checksum for blocks of up to 1048576 bits.
>
> If 20-odd bits of that were a Hamming code, you'd have something slightly
> stronger than SECDED, and ZFS could correct any single-bit errors encountered.

Yes. But I'm not convinced that we will see single-bit errors, since there is already a large amount of single-bit-error detection and (often) correction capability in modern systems. It seems that when we lose a block of data, we lose more than a single bit.

It should be relatively easy to add code to the current protection schemes which will compare a bad block to a reconstructed, good block and deliver this information for us. I'll add an RFE.
-- richard
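The proposed RFE -- comparing a bad block against a reconstructed good copy and reporting what actually differed -- would be a small amount of code. A hypothetical sketch (plain Python, names are mine, not ZFS's): XOR the two copies and report each differing byte plus a count of flipped bits, which is enough to tell a single-bit flip from wholesale damage.

```python
def corruption_report(bad, good):
    """XOR a checksum-failing block against its reconstructed good copy.
    Returns ([(byte_offset, xor_mask), ...], total_flipped_bits) so one
    can see whether the damage was a single bit or something larger."""
    assert len(bad) == len(good), "copies must be the same size"
    diffs = []
    flipped = 0
    for off, (a, b) in enumerate(zip(bad, good)):
        x = a ^ b
        if x:
            diffs.append((off, x))
            flipped += bin(x).count("1")
    return diffs, flipped
```

For example, a block differing from its good copy in exactly one bit would yield one (offset, mask) pair with a single-bit mask and a flipped-bit count of 1.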
[zfs-discuss] ZFS, SATA, LSI and stability
After having massive problems with a supermicro X7DBE box using AOC-SAT2-MV8 Marvell controllers and opensolaris snv79 (same as described here: http://sunsolve.sun.com/search/document.do?assetkey=1-66-233341-1) we just started over using new hardware and opensolaris 2008.05 upgraded to snv94. We used again a supermicro X7DBE but now with two LSI SAS3081E SAS controllers. And guess what? Now we get these error messages in /var/adm/messages: Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd11): Aug 11 18:20:52 thumper2Error for Command: read(10) Error Level: Retryable Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice] Requested Block: 1423173120Error Block: 1423173120 Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number: WD-WCAP Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention Aug 11 18:20:52 thumper2 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0 Along with these messages there are a lot of these messages: Aug 11 18:20:51 thumper2 scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt1): Aug 11 18:20:51 thumper2Log info 0x31123000 received for target 5. Aug 11 18:20:51 thumper2scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc I would believe having a faulty disk, but not two: Aug 11 17:47:47 thumper2 scsi: [ID 365881 kern.info] /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt1): Aug 11 17:47:47 thumper2Log info 0x31123000 received for target 4. 
Aug 11 17:47:47 thumper2scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (sd10): Aug 11 17:47:48 thumper2Error for Command: read(10) Error Level: Retryable Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice] Requested Block: 252165120 Error Block: 252165120 Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number: Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention Aug 11 17:47:48 thumper2 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0 Aug 11 17:48:34 thumper2 scsi: [ID 243001 kern.warning] WARNING: /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci1000,[EMAIL PROTECTED] (mpt0): Does somebody know what is going on here? I have checked the disks with iostat -En : -bash-3.2# iostat -En ... c4t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: FUJITSU Product: MBA3073RCRevision: 0103 Serial No: Size: 73.54GB <73543163904 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 c4t5d0 Soft Errors: 4 Hard Errors: 24 Transport Errors: 179 Vendor: ATA Product: ST3750330NS Revision: SN04 Serial No: Size: 750.16GB <750156374016 bytes> Media Error: 0 Device Not Ready: 0 No Device: 22 Recoverable: 4 Illegal Request: 0 Predictive Failure Analysis: 0 c4t6d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Vendor: ATA Product: WDC WD7500AYYS-0 Revision: 4G30 Serial No: Size: 750.16GB <750156374016 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 c6t4d0 Soft Errors: 6 Hard Errors: 17 Transport Errors: 466 Vendor: ATA Product: ST3750640NS Revision: GSerial No: Size: 750.16GB <750156374016 bytes> Media Error: 0 Device Not Ready: 0 No Device: 17 Recoverable: 6 Illegal 
Request: 0 Predictive Failure Analysis: 0 c6t5d0 Soft Errors: 2 Hard Errors: 23 Transport Errors: 539 Vendor: ATA Product: WDC WD7500AYYS-0 Revision: 4G30 Serial No: Size: 750.16GB <750156374016 bytes> Media Error: 0 Device Not Ready: 0 No Device: 23 Recoverable: 2 Illegal Request: 0 Predictive Failure Analysis: 0 I have checked the drives with smartctl: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 115 075 006Pre-fail Always - 94384069 3 Spin_Up_Time0x0003 093 093 000Pre-fail Always - 0 4 Start_Stop_Count0x0032 100 100 020Old_age Always - 15 5 Reallocated_Sector_Ct 0x0033 100 100 036Pre-fail Always - 0 7 Seek_Error_Rate 0x000f 084 060 030Pre-fail Always - 263091894 9 Power_On_Hours 0x0032 096 096 000Old_age Always
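Output like the `iostat -En` listing above can be triaged mechanically. A small sketch (assuming the text layout shown in this post; not a supported tool) that flags devices whose transport-error count looks suspicious:

```python
import re

def flag_suspect_drives(iostat_text: str, transport_threshold: int = 100):
    """Scan `iostat -En`-style output and return (device, transport_errors)
    for every device whose transport-error count exceeds the threshold."""
    pat = re.compile(r"(c\d+t\d+d\d+)\s+Soft Errors:\s*(\d+)\s+"
                     r"Hard Errors:\s*(\d+)\s+Transport Errors:\s*(\d+)")
    return [(dev, int(te))
            for dev, se, he, te in pat.findall(iostat_text)
            if int(te) > transport_threshold]

sample = ("c4t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0\n"
          "c4t5d0 Soft Errors: 4 Hard Errors: 24 Transport Errors: 179\n")
print(flag_suspect_drives(sample))  # [('c4t5d0', 179)]
```

Hundreds of transport errors across several drives at once, as here, usually points at a shared component (controller, backplane, cabling) rather than the disks themselves.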
Re: [zfs-discuss] x4500 dead HDD, hung server, unable to boot.
James, one question: Do you know if, and if yes, in which version of OpenSolaris this issue is solved? We have the exact same problems using a Supermicro X7DBE with two Supermicro AOC-SAT2-MV8 (we are on snv79). Thanks, Frank
Re: [zfs-discuss] more ZFS recovery
Chris Siebenmann <[EMAIL PROTECTED]> I'm not Anton Rang, but: | How would you describe the difference between the data recovery | utility and ZFS's normal data recovery process? cks> The data recovery utility should not panic cks> my entire system if it runs into some situation cks> that it utterly cannot handle. Solaris 10 U5 cks> kernel ZFS code does not have this property; cks> it is possible to wind up with ZFS pools that cks> will panic your system when you try to touch them. ... I'll go you one worse. Imagine a Sun Cluster with several resource groups and several zpools. You blow a proc on one of the servers. As a result, the metadata on one of the pools becomes corrupted. http://mail.opensolaris.org/pipermail/zfs-discuss/2008-April/046951.html Now, each of the servers in your cluster attempts to import the zpool--and panics. As a result of a single part failure on a single server, your entire cluster (and all the services on it) is sitting in a smoking heap on your machine room floor. | Nobody thinks that an answer of "sorry, we lost all of your data" is | acceptable. However, there are failures which will result in loss of | data no matter how clever the file system is. cks> The problem is that there are currently ways to cks> make ZFS lose all your data when there are no cks> hardware faults or failures, merely people or cks> software mis-handling pools. This is especially cks> frustrating when the only thing that is likely cks> to be corrupted is ZFS metadata and the vast cks> majority (or all) of the data in the pool is intact, cks> readable, and so on. I'm just glad that our pool corruption experience happened during testing, and not after the system had gone into production. Not exactly a resume-enhancing experience. --Scott This message may contain information that is confidential or privileged. If you are not the intended recipient, please advise the sender immediately and delete this message. 
Re: [zfs-discuss] more ZFS recovery
From: Richard Elling <[EMAIL PROTECTED]> Miles Nordin wrote: >> "re" == Richard Elling <[EMAIL PROTECTED]> writes: >> "tb" == Tom Bird <[EMAIL PROTECTED]> writes: >> > ... > > re> In general, ZFS can only repair conditions for which it owns > re> data redundancy. tb> If that's really the excuse for this situation, then ZFS is not tb> ``always consistent on the disk'' for single-VDEV pools. re> I disagree with your assessment. The on-disk re> format (any on-disk format) necessarily assumes re> no faults on the media. The difference between ZFS re> on-disk format and most other file systems is that re> the metadata will be consistent to some point in time re> because it is COW. ... tb> There was no loss of data here, just an interruption in the connection tb> to the target, like power loss or any other unplanned shutdown. tb> Corruption in this scenario is is a significant regression w.r.t. UFS: re> I see no evidence that the data is or is not correct. ... re> However, I will bet a steak dinner that if this device re> was mirrored to another, the pool will import just fine, re> with the affected device in a faulted or degraded state. tb> http://mail.opensolaris.org/pipermail/zfs-discuss/2008-June/048375.html re> I have no idea what Eric is referring to, and it does re> not match my experience. We had a similar problem in our environment. We lost a CPU on the server, resulting in metadata corruption and an unrecoverable pool. We were told that we were seeing a known bug that will be fixed in S10u6. http://mail.opensolaris.org/pipermail/zfs-discuss/2008-April/046951.html From: Tom Bird <[EMAIL PROTECTED]> tb> On any other file system though, I could probably kick tb> off a fsck and get back most of the data. I see the tb> argument a lot that ZFS "doesn't need" a fsck utility, tb> however I would be inclined to disagree, if not a tb> full on fsck then something that can patch it up to the tb> point where I can mount it and then get some data off or tb> run a scrub. 
If not that, then we need some sort of recovery tool. We ought to be able to have some sort of recovery mode that allows us to read off the known good data or roll back to a snapshot or something. When you have a really big file system, telling us (as Sun support told us) that our only option was to re-build the zpool and restore from tape, it becomes really difficult to justify using the product in certain production environments. (For example, consider an environment where the available storage is on a hardware RAID-5 system, and where mirroring large amounts of already RAID-ed space adds up to more cost than a VxFS license. Not every type of data requires more protection than you get with standard hardware-based RAID-5.) --Scott
Re: [zfs-discuss] corrupt zfs stream? checksum mismatch
2008/8/10 Jonathan Wheeler <[EMAIL PROTECTED]>: > Hi Folks, > > I'm in the very unsettling position of fearing that I've lost all of my data > via a zfs send/receive operation, despite ZFS's legendary integrity. > > The error that I'm getting on restore is: > receiving full stream of faith/[EMAIL PROTECTED] into Z/faith/[EMAIL > PROTECTED] > cannot receive: invalid stream (checksum mismatch) > > Background: > I was running snv_91, and decided to upgrade to snv_95 converting to the much > awaited zfs-root in the process. You could try to restore on a snv_91 system. zfs send streams are not meant for backups. This is from the zfs man page: The format of the stream is evolving. No backwards compatibility is guaranteed. You may not be able to receive your streams on future versions of ZFS. Or the file was corrupted when you transferred it.
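One way to rule the transfer in or out as the culprit is an external check, independent of the stream's own internal checksums: record a digest of the saved stream file on the sending side and compare it after the copy. A minimal sketch:

```python
import hashlib

def file_sha256(path: str, chunk: int = 1 << 20) -> str:
    """Digest a saved `zfs send` stream file in 1 MB chunks, so the
    copy on the target host can be compared before `zfs receive`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            h.update(buf)
    return h.hexdigest()
```

If the digests match on both ends, the corruption happened before the stream was written out, not in transit.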
[zfs-discuss] URGENT: ZFS issue - can't import in degraded state
Hello zfs-discuss, S10, Generic_125100-10, SPARC # zpool import pool: mail id: 7518613205838351076 state: DEGRADED status: One or more devices are missing from the system. action: The pool can be imported despite missing or damaged devices. The fault tolerance of the pool may be compromised if imported. The pool may be active on another system, but can be imported using the '-f' flag. see: http://www.sun.com/msg/ZFS-8000-2Q config: mail DEGRADED mirror ONLINE c2t0d0 ONLINE c3t0d0 ONLINE mirror ONLINE c2t1d0 ONLINE c3t1d0 ONLINE mirror ONLINE c2t2d0 ONLINE c3t2d0 ONLINE mirror ONLINE c2t3d0 ONLINE c3t3d0 ONLINE mirror ONLINE c2t8d0 ONLINE c3t8d0 ONLINE mirror DEGRADED spare DEGRADED c2t9d0 ONLINE c2t11d0 UNAVAIL cannot open c3t9d0 ONLINE mirror ONLINE c2t10d0ONLINE c3t10d0ONLINE spares c2t11d0 c3t11d0 # # zpool import -f mail cannot import 'mail': no such device in pool # Why? c2t11 was physically replaced with another disk. Even if I power-off all c2 devices (one d1000) I can see all mirrors in a degraded mode but I can't import the pool anyway. ?? -- Best regards, Robert Milkowski mailto:[EMAIL PROTECTED] http://milek.blogspot.com
Re: [zfs-discuss] corrupt zfs stream? "checksum mismatch"
Hi folks, Perhaps I was a little verbose in my first post, putting a few people off. Does anyone else have any ideas on this one? I can't be the first person to have had a problem with a zfs backup stream. Is there nothing that can be done to recover at least some of the stream? As another helpful chap pointed out, if tar encounters an error in the bitstream it just moves on until it finds usable data again. Can zfs not do something similar? I'll take whatever I can get! Jonathan
Re: [zfs-discuss] resilver in progress - which disk is inconsistent?
>I know this is too late to help you now, but... Doesn't "zpool status -v" >do what you want? Hi, No indeed it does not. At the top it just says that resilvering is happening and that's it. Let me guess... it's to do with the zfs version I'm using? (I'm on 3) justin
Re: [zfs-discuss] integrated failure recovery thoughts (single-bit correction)
I suppose an error-correcting code like 256-bit Hamming or Reed-Solomon can't substitute as a reliable checksum on the level of the default Fletcher2/4? If it can, it could be offered as an alternative algorithm where necessary and let ZFS react accordingly, or not? Regards, -mg On 12 Aug 2008, at 08:48, "Anton B. Rang" <[EMAIL PROTECTED]> wrote: > That brings up another interesting idea. > > ZFS currently uses a 128-bit checksum for blocks of up to 1048576 > bits. > > If 20-odd bits of that were a Hamming code, you'd have something > slightly stronger than SECDED, and ZFS could correct any single-bit > errors encountered. > This could be done without changing the ZFS on-disk format.
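For contrast with an error-correcting code, here is a simplified Fletcher-style checksum in the spirit of ZFS's fletcher4 (four cascaded running sums over 32-bit little-endian words; ZFS's exact framing is not reproduced here). The cascaded sums make any corruption very likely to change the result, but the value carries no information about which bit flipped, which is why a Fletcher checksum can detect errors yet never correct them on its own:

```python
def fletcher4_like(data: bytes):
    """Simplified Fletcher-style checksum: four cascaded running sums
    over 32-bit little-endian words, each truncated to 64 bits."""
    a = b = c = d = 0
    mask = (1 << 64) - 1
    padded = data + b"\x00" * (-len(data) % 4)   # pad to whole words
    for i in range(0, len(padded), 4):
        w = int.from_bytes(padded[i:i + 4], "little")
        a = (a + w) & mask
        b = (b + a) & mask
        c = (c + b) & mask
        d = (d + c) & mask
    return (a, b, c, d)

# Detects a single-character change, but says nothing about where it is:
print(fletcher4_like(b"hello zfs") != fletcher4_like(b"hellp zfs"))  # True
```

A Hamming or Reed-Solomon code would instead encode the error's location into the syndrome, at the cost of more computation per block.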