Re: [zfs-discuss] more ZFS recovery

2008-10-09 Thread Ross
Victor, thanks for posting that. It really is interesting to see exactly what happened, and to read about how zfs pools can be recovered. Your work on these forums has done much to reassure me that ZFS is stable enough for us to be using on a live server, and I look forward to seeing automate

Re: [zfs-discuss] more ZFS recovery

2008-10-09 Thread Victor Latushkin
Borys Saulyak wrote: >> As a follow up to the whole story, with the fantastic help of >> Victor, the failed pool is now imported and functional thanks to >> the redundancy in the meta data. > It would be really useful if you could publish the steps to recover > the pools. Here it is: Executive s

Re: [zfs-discuss] more ZFS recovery

2008-08-27 Thread Borys Saulyak
> As a follow up to the whole story, with the fantastic > help of Victor, > the failed pool is now imported and functional thanks > to the redundancy > in the meta data. It would be really useful if you could publish the steps to recover the pools.

Re: [zfs-discuss] more ZFS recovery

2008-08-26 Thread Tom Bird
Victor Latushkin wrote: > Hi Tom and all, >> [EMAIL PROTECTED]:~# uname -a >> SunOS cs3.kw 5.10 Generic_127127-11 sun4v sparc SUNW,Sun-Fire-T200 > > Btw, have you considered opening support call for this issue? As a follow up to the whole story, with the fantastic help of Victor, the failed pool

Re: [zfs-discuss] more ZFS recovery

2008-08-19 Thread [EMAIL PROTECTED]
Hi Robert, et al., I have blogged about a method I used to recover a removed file from a zfs file system at http://mbruning.blogspot.com. Be forewarned, it is very long... All comments are welcome. max Robert Milkowski wrote: > Hello max, > > Sunday, August 17, 2008, 1:02:05 PM, you wrote: > > m

Re: [zfs-discuss] more ZFS recovery

2008-08-18 Thread Robert Milkowski
Hello max, Sunday, August 17, 2008, 1:02:05 PM, you wrote: mbc> A Darren Dunham wrote: >> >> If the most recent uberblock appears valid, but doesn't have useful >> data, I don't think there's any way currently to see what the tree of an >> older uberblock looks like. It would be nice to see if t

Re: [zfs-discuss] more ZFS recovery

2008-08-17 Thread [EMAIL PROTECTED]
A Darren Dunham wrote: > > If the most recent uberblock appears valid, but doesn't have useful > data, I don't think there's any way currently to see what the tree of an > older uberblock looks like. It would be nice to see if that data > appears valid and try to create a view that would be > read

Re: [zfs-discuss] more ZFS recovery

2008-08-13 Thread Cromar Scott
Miles Nordin <[EMAIL PROTECTED]> > "cs" == Cromar Scott <[EMAIL PROTECTED]> writes: cs> We opened a call with Sun support. We were told that the cs> corruption issue was due to a race condition within ZFS. We cs> were also told that the issue was known and was scheduled for c

Re: [zfs-discuss] more ZFS recovery

2008-08-13 Thread Miles Nordin
> "cs" == Cromar Scott <[EMAIL PROTECTED]> writes: cs> We opened a call with Sun support. We were told that the cs> corruption issue was due to a race condition within ZFS. We cs> were also told that the issue was known and was scheduled for cs> a fix in S10U6. nice. Is the

Re: [zfs-discuss] more ZFS recovery

2008-08-13 Thread Cromar Scott
Miles Nordin <[EMAIL PROTECTED]> > "cs" == Cromar Scott <[EMAIL PROTECTED]> writes: cs> It appears that the metadata on that pool became corrupted cs> when the processor failed. The exact mechanism is a bit of a cs> mystery, [...] cs> We were told that the probability of me

Re: [zfs-discuss] more ZFS recovery

2008-08-12 Thread Miles Nordin
> "cs" == Cromar Scott <[EMAIL PROTECTED]> writes: cs> It appears that the metadata on that pool became corrupted cs> when the processor failed. The exact mechanism is a bit of a cs> mystery, [...] cs> We were told that the probability of metadata corruption would cs> ha

Re: [zfs-discuss] more ZFS recovery

2008-08-12 Thread Cromar Scott
Richard Elling <[EMAIL PROTECTED]> Cromar Scott wrote: > Chris Siebenmann <[EMAIL PROTECTED]> > > I'm not Anton Rang, but: > | How would you describe the difference between the data recovery > | utility and ZFS's normal data recovery process? > > cks> The data recovery utility should not panic >

Re: [zfs-discuss] more ZFS recovery

2008-08-12 Thread Richard Elling
Cromar Scott wrote: > Chris Siebenmann <[EMAIL PROTECTED]> > > I'm not Anton Rang, but: > | How would you describe the difference between the data recovery > | utility and ZFS's normal data recovery process? > > cks> The data recovery utility should not panic > cks> my entire system if it runs in

Re: [zfs-discuss] more ZFS recovery

2008-08-12 Thread eric kustarz
On Aug 7, 2008, at 10:25 PM, Anton B. Rang wrote: >> How would you describe the difference between the file system >> checking utility and zpool scrub? Is zpool scrub lacking in its >> verification of the data? > > To answer the second question first, yes, zpool scrub is lacking, at > least to

Re: [zfs-discuss] more ZFS recovery

2008-08-12 Thread Cromar Scott
Chris Siebenmann <[EMAIL PROTECTED]> I'm not Anton Rang, but: | How would you describe the difference between the data recovery | utility and ZFS's normal data recovery process? cks> The data recovery utility should not panic cks> my entire system if it runs into some situation cks> that it ut

Re: [zfs-discuss] more ZFS recovery

2008-08-12 Thread Cromar Scott
From: Richard Elling <[EMAIL PROTECTED]> Miles Nordin wrote: >> "re" == Richard Elling <[EMAIL PROTECTED]> writes: >> "tb" == Tom Bird <[EMAIL PROTECTED]> writes: >> > ... > > re> In general, ZFS can only repair conditions for which it owns > re> data redundancy. tb

Re: [zfs-discuss] more ZFS recovery

2008-08-11 Thread Richard Elling
Claus Guttesen wrote: >> | How would you describe the difference between the data recovery >> | utility and ZFS's normal data recovery process? >> >> The data recovery utility should not panic my entire system if it runs >> into some situation that it utterly cannot handle. Solaris 10 U5 kernel >>

Re: [zfs-discuss] more ZFS recovery

2008-08-11 Thread Claus Guttesen
> | How would you describe the difference between the data recovery > | utility and ZFS's normal data recovery process? > > The data recovery utility should not panic my entire system if it runs > into some situation that it utterly cannot handle. Solaris 10 U5 kernel > ZFS code does not have this

Re: [zfs-discuss] more ZFS recovery

2008-08-11 Thread Chris Siebenmann
I'm not Anton Rang, but: | How would you describe the difference between the data recovery | utility and ZFS's normal data recovery process? The data recovery utility should not panic my entire system if it runs into some situation that it utterly cannot handle. Solaris 10 U5 kernel ZFS code doe

Re: [zfs-discuss] more ZFS recovery

2008-08-11 Thread Tom Bird
Victor Latushkin wrote: > Hi Tom and all, > > Tom Bird wrote: >> Hi, >> >> Have a problem with a ZFS on a single device, this device is 48 1T SATA >> drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had >> a ZFS on it as a single device. >> >> There was a problem with the SAS b

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Anton B. Rang
> How would you describe the difference between the file system > checking utility and zpool scrub? Is zpool scrub lacking in its > verification of the data? To answer the second question first, yes, zpool scrub is lacking, at least to the best of my knowledge (I haven't looked at the ZFS source

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Victor Latushkin
Miles Nordin wrote: >> "r" == Ross <[EMAIL PROTECTED]> writes: > > r> Tom wrote "There was a problem with the SAS bus which caused > r> various errors including the inevitable kernel panic". It's > r> the various errors part that catches my eye, > > yeah, possibly, but there

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread A Darren Dunham
On Thu, Aug 07, 2008 at 11:34:12AM -0700, Richard Elling wrote: > Anton B. Rang wrote: > > First, there are two types of utilities which might be useful in the > > situation where a ZFS pool has become corrupted. The first is a file system > > checking utility (call it zfsck); the second is a dat

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Richard Elling
[I think Miles and I seem to be talking about two different topics] Miles Nordin wrote: >> "re" == Richard Elling <[EMAIL PROTECTED]> writes: >> > > re> If your pool is not redundant, the chance that data > re> corruption can render some or all of your data inacces

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Bill Sommerfeld
On Thu, 2008-08-07 at 11:34 -0700, Richard Elling wrote: > How would you describe the difference between the data recovery > utility and ZFS's normal data recovery process? I'm not Anton but I think I see what he's getting at. Assume you have disks which once contained a pool but all of the uberb

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Bob Friesenhahn
On Thu, 7 Aug 2008, Miles Nordin wrote: I must apologize that I was not able to read your complete email due to local buffer overflow ... > someone who knows ZFS well like Pavel. Also, there is enough concern > for people designing paranoid systems to approach them with the view, > ``ZFS is not

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Richard Elling
Anton B. Rang wrote: >> From the ZFS Administration Guide, Chapter 11, Data Repair section: >> Given that the fsck utility is designed to repair known pathologies >> specific to individual file systems, writing such a utility for a file >> system with no known pathologies is impossible. >> > >

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Miles Nordin
> "r" == Ross <[EMAIL PROTECTED]> writes: r> Tom wrote "There was a problem with the SAS bus which caused r> various errors including the inevitable kernel panic". It's r> the various errors part that catches my eye, yeah, possibly, but there are checksums on the SAS bus, and

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Cindy . Swearingen
Hi Richard, Yes, sure. We can add that scenario. What's been on my todo list is a ZFS troubleshooting wiki. I've been collecting issues. Let's talk soon. Cindy Richard Elling wrote: > Tom Bird wrote: > >> Richard Elling wrote: >> >> >> >>> I see no evidence that the data is or is not correct

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Volker A. Brandt
Anton B. Rang writes: > dumping out the raw data structures and looking at > them by hand is the only way to determine what > ZFS doesn't like and deduce what went wrong (and > how to fix it). http://www.osdevcon.org/2008/files/osdevcon2008-max.pdf :-) --

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Victor Latushkin
Miles Nordin wrote: >> "re" == Richard Elling <[EMAIL PROTECTED]> writes: >> "tb" == Tom Bird <[EMAIL PROTECTED]> writes: > > tb> There was a problem with the SAS bus which caused various > tb> errors including the inevitable kernel panic, the thing came > tb> back up with 3 ou

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Ross
Hi folks, Miles, I don't know if you have more information about this problem than I'm seeing, but from what Tom wrote I don't see how you can assume this is such a simple problem as an unclean shutdown? Tom wrote "There was a problem with the SAS bus which caused various errors including the i

Re: [zfs-discuss] more ZFS recovery

2008-08-07 Thread Victor Latushkin
> Would be grateful for any ideas, relevant output here: > > [EMAIL PROTECTED]:~# zpool import > pool: content > id: 14205780542041739352 > state: FAULTED > status: The pool metadata is corrupted. > action: The pool cannot be imported due to damaged devices or data. > The pool may b

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Victor Latushkin
Hi Tom and all, Tom Bird wrote: Hi, Have a problem with a ZFS on a single device, this device is 48 1T SATA drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had a ZFS on it as a single device. There was a problem with the SAS bus which caused various errors including the in

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Miles Nordin
> "nw" == Nicolas Williams <[EMAIL PROTECTED]> writes: nw> Without ZFS the OP would have had silent, undetected (by the nw> OS that is) data corruption. It sounds to me more like the system would have panicked as soon as he pulled the cord, and when it rebooted, it would have rolled t

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Miles Nordin
> "re" == Richard Elling <[EMAIL PROTECTED]> writes: re> If your pool is not redundant, the chance that data re> corruption can render some or all of your data inaccessible is re> always present. 1. data corruption != unclean shutdown 2. other filesystems do not need a mirror

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Nicolas Williams
On Wed, Aug 06, 2008 at 03:44:08PM -0400, Miles Nordin wrote: > > "re" == Richard Elling <[EMAIL PROTECTED]> writes: > > c> If that's really the excuse for this situation, then ZFS is > c> not ``always consistent on the disk'' for single-VDEV pools. > > re> I disagree with your

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Nicolas Williams
On Wed, Aug 06, 2008 at 02:23:44PM -0400, Will Murnane wrote: > On Wed, Aug 6, 2008 at 13:57, Miles Nordin <[EMAIL PROTECTED]> wrote: > > If that's really the excuse for this situation, then ZFS is not > > ``always consistent on the disk'' for single-VDEV pools. > Well, yes. If data is sent, but c

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Anton B. Rang
> As others have explained, if ZFS does not have a > config with data redundancy - there is not much that > can be learned - except that it "just broke". Plenty can be learned by just looking at the pool. Unfortunately ZFS currently doesn't have tools which make that easy; as I understand it, zdb

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Anton B. Rang
> From the ZFS Administration Guide, Chapter 11, Data Repair section: > Given that the fsck utility is designed to repair known pathologies > specific to individual file systems, writing such a utility for a file > system with no known pathologies is impossible. That's a fallacy (and is incorrect

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Richard Elling
Tom Bird wrote: > Richard Elling wrote: > > >> I see no evidence that the data is or is not correct. What we know is that >> ZFS is attempting to read something and the device driver is returning EIO. >> Unfortunately, EIO is a catch-all error code, so more digging to find the >> root cause is

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Al Hopper
On Wed, Aug 6, 2008 at 8:20 AM, Tom Bird <[EMAIL PROTECTED]> wrote: > Hi, > > Have a problem with a ZFS on a single device, this device is 48 1T SATA > drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had > a ZFS on it as a single device. > > There was a problem with the SAS bus

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Miles Nordin
> "re" == Richard Elling <[EMAIL PROTECTED]> writes: c> If that's really the excuse for this situation, then ZFS is c> not ``always consistent on the disk'' for single-VDEV pools. re> I disagree with your assessment. The on-disk format (any re> on-disk format) necessarily

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Tom Bird
Richard Elling wrote: > I see no evidence that the data is or is not correct. What we know is that > ZFS is attempting to read something and the device driver is returning EIO. > Unfortunately, EIO is a catch-all error code, so more digging to find the > root cause is needed. I'm currently check

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Richard Elling
Miles Nordin wrote: >> "re" == Richard Elling <[EMAIL PROTECTED]> writes: >> "tb" == Tom Bird <[EMAIL PROTECTED]> writes: >> > > tb> There was a problem with the SAS bus which caused various > tb> errors including the inevitable kernel panic, the thing came > tb

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Will Murnane
On Wed, Aug 6, 2008 at 13:57, Miles Nordin <[EMAIL PROTECTED]> wrote: >> "re" == Richard Elling <[EMAIL PROTECTED]> writes: >> "tb" == Tom Bird <[EMAIL PROTECTED]> writes: > >tb> There was a problem with the SAS bus which caused various >tb> errors including the inevitable kernel pa

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Miles Nordin
> "re" == Richard Elling <[EMAIL PROTECTED]> writes: > "tb" == Tom Bird <[EMAIL PROTECTED]> writes: tb> There was a problem with the SAS bus which caused various tb> errors including the inevitable kernel panic, the thing came tb> back up with 3 out of 4 zfs mounted. re> I

Re: [zfs-discuss] more ZFS recovery

2008-08-06 Thread Richard Elling
Tom Bird wrote: > Hi, > > Have a problem with a ZFS on a single device, this device is 48 1T SATA > drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had > a ZFS on it as a single device. > > There was a problem with the SAS bus which caused various errors > including the inevita

[zfs-discuss] more ZFS recovery

2008-08-06 Thread Tom Bird
Hi, Have a problem with a ZFS on a single device, this device is 48 1T SATA drives presented as a 42T LUN via hardware RAID 6 on a SAS bus which had a ZFS on it as a single device. There was a problem with the SAS bus which caused various errors including the inevitable kernel panic, the thing ca