Re: [zfs-discuss] ZFS tale of woe and fail
It was blogged about on Joyent, Tim:

http://www.joyent.com/joyeurblog/2008/01/16/strongspace-and-bingodisk-update/
http://bugs.opensolaris.org/view_bug.do?bug_id=6458218
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS tale of woe and fail
Ross wrote:
> Yup, that one was down to a known (and fixed) bug though, so it isn't
> the normal story of ZFS problems.

Got a bug ID or anything for that, just out of interest?

As an update on my storage situation, I've got some JBODs now, we'll see
how that goes.

--
Tom // www.portfast.co.uk -- internet services and consultancy
// hosting from 1.65 per domain
Re: [zfs-discuss] ZFS tale of woe and fail
Victor Latushkin wrote:
> This issue (and the previous one reported by Tom) has got some
> publicity recently - see here:
> http://www.uknof.org.uk/uknof13/Bird-Redux.pdf
> So I feel I need to provide a little more information about the
> outcome (sorry that it is delayed and not as full as the previous
> one).

Morning,

Right, the PDF on there doesn't really give the full story of the
presentation, which is unfortunate as it seems to have got around a bit.
In the actual presentation I wasn't perhaps as harsh as it seems on the
slides!

> The first permanent error means that the root block of the filesystem
> named 'content' was corrupted (all copies), so it was not possible to
> open it and access any of its contents. Fortunately there was not too
> much activity on the pool, so we decided to try previous states of
> the pool. I do not remember the exact txg number we tried, but it was
> something like a hundred txg back or so. We checked it with zdb and
> discovered that that state was more or less good - at least the
> filesystem 'content' was openable and it was possible to access its
> contents - so we decided to reactivate that previous state. The pool
> imported fine and the contents of 'content' were there. A subsequent
> scrub did find some errors, but I do not remember exactly how many.
> Tom may have the exact number.

I can't remember how many errors the check found, however all the data
copied off successfully, as far as we know.

--
Tom // www.portfast.co.uk -- internet services and consultancy
// hosting from 1.65 per domain
Re: [zfs-discuss] ZFS tale of woe and fail
On Fri, August 14, 2009 09:02, Tom Bird wrote:
> I can't remember how many errors the check found, however all the data
> copied off successfully, as far as we know.

I would think that you'd be fairly confident of the integrity of the
data, since everything would be checksummed.

Joyent also had a fairly public issue with ZFS a while ago:

http://tinyurl.com/ptt5zp
http://www.joyent.com/joyeurblog/2008/01/22/bingodisk-and-strongspace-what-happened/

http://tinyurl.com/qlzsw6
http://www.joyent.com/joyeurblog/2008/01/24/new-podcast-quad-core-episode-2/
Re: [zfs-discuss] ZFS tale of woe and fail
Yup, that one was down to a known (and fixed) bug though, so it isn't
the normal story of ZFS problems.
Re: [zfs-discuss] ZFS tale of woe and fail
Mhh, I think I'm afraid too, as I also need to use ZFS on a single,
large LUN.
Re: [zfs-discuss] ZFS tale of woe and fail
It is a ZFS issue. My understanding is that ZFS keeps multiple copies of
the uberblock, but only tries to use the most recent one on import,
meaning that on rare occasions it's possible to lose access to the pool
even though the vast majority of your data is fine. I believe there is
work going on to create automatic recovery tools that will warn you of
uberblock corruption and attempt to automatically use an older copy, but
I have no idea of the bug number or status, I'm afraid.
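A rough sketch of what such a manual recovery might look like, assuming
a pool named 'content' on c2t8d0 and a zdb/zpool recent enough to
support these options (the rewind import in particular only appeared in
later builds); this is illustrative, not a tested recovery procedure:

```shell
# Sketch only: assumes pool 'content' on c2t8d0 and tool versions
# that support these flags - treat as illustrative, not authoritative.

# Show the currently active uberblock (its txg and timestamp):
zdb -u content

# Dump the on-disk labels; each label holds a ring of older
# uberblocks that may still be intact:
zdb -l /dev/rdsk/c2t8d0s0

# Check whether the pool looks consistent at an earlier txg before
# committing to it (1234567 is a placeholder txg number):
zdb -e -t 1234567 content

# On builds with pool-rewind support, attempt a rewind import:
zpool import -F content
```

The point of checking with zdb first is that the rewind is destructive:
once an older txg is reactivated, everything written after it is gone.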
Re: [zfs-discuss] ZFS tale of woe and fail
On 19.01.09 12:09, Tom Bird wrote:
> Toby Thain wrote:
>> On 18-Jan-09, at 6:12 PM, Nathan Kroenert wrote:
>>> Hey, Tom -
>>> Correct me if I'm wrong here, but it seems you are not allowing ZFS
>>> any sort of redundancy to manage.
>
> Every other file system out there runs fine on a single LUN; when
> things go wrong you have a fsck utility that patches it up and the
> world keeps on turning. I can't find anywhere that will sell me a 48
> drive SATA JBOD with all the drives presented on a single SAS channel,
> so running on a single giant LUN is a real world scenario that ZFS
> should be able to cope with, as this is how the hardware I am stuck
> with is arranged.
>
>> Which is particularly catastrophic when one's 'content' is organized
>> as a monolithic file, as it is here - unless, of course, you have
>> some way of scavenging that file based on internal structure.
>
> No, it's not a monolithic file; the point I was making there is that
> no files are showing up.
>
> r...@cs4:~# find /content
> /content
> r...@cs4:~#
>
> (yes that really is it)

This issue (and the previous one reported by Tom) has got some publicity
recently - see here:

http://www.uknof.org.uk/uknof13/Bird-Redux.pdf

So I feel I need to provide a little more information about the outcome
(sorry that it is delayed and not as full as the previous one).

First, it looked like this:

r...@cs4:~# zpool list
NAME      SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
content  62.5T  59.9T  2.63T  95%  ONLINE  -

r...@cs4:~# zpool status -v
  pool: content
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore
        the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME      STATE   READ WRITE CKSUM
        content   ONLINE     0     0    32
          c2t8d0  ONLINE     0     0    32

errors: Permanent errors have been detected in the following files:

        content:<0x0>
        content:<0x2c898>

The first permanent error means that the root block of the filesystem
named 'content' was corrupted (all copies), so it was not possible to
open it and access any of its contents.

Fortunately there was not too much activity on the pool, so we decided
to try previous states of the pool. I do not remember the exact txg
number we tried, but it was something like a hundred txg back or so. We
checked it with zdb and discovered that that state was more or less
good - at least the filesystem 'content' was openable and it was
possible to access its contents - so we decided to reactivate that
previous state. The pool imported fine and the contents of 'content'
were there. A subsequent scrub did find some errors, but I do not
remember exactly how many. Tom may have the exact number.

Victor
Re: [zfs-discuss] ZFS tale of woe and fail
On Jul 1, 2009, at 12:37, Victor Latushkin wrote:
> This issue (and the previous one reported by Tom) has got some
> publicity recently - see here:
> http://www.uknof.org.uk/uknof13/Bird-Redux.pdf

Joyent also had issues a while back as well:

http://tinyurl.com/ytyzs6
http://www.joyeur.com/2008/01/22/bingodisk-and-strongspace-what-happened

A lot of people billed it as a ZFS issue, but it should be noted that
because of all the checksumming going on, when you get data back you can
be fairly sure that it hasn't been corrupted.
Re: [zfs-discuss] ZFS tale of woe and fail
Toby Thain wrote:
> On 18-Jan-09, at 6:12 PM, Nathan Kroenert wrote:
>> Hey, Tom -
>> Correct me if I'm wrong here, but it seems you are not allowing ZFS
>> any sort of redundancy to manage.

Every other file system out there runs fine on a single LUN; when things
go wrong you have a fsck utility that patches it up and the world keeps
on turning. I can't find anywhere that will sell me a 48 drive SATA JBOD
with all the drives presented on a single SAS channel, so running on a
single giant LUN is a real world scenario that ZFS should be able to
cope with, as this is how the hardware I am stuck with is arranged.

> Which is particularly catastrophic when one's 'content' is organized
> as a monolithic file, as it is here - unless, of course, you have some
> way of scavenging that file based on internal structure.

No, it's not a monolithic file; the point I was making there is that no
files are showing up.

r...@cs4:~# find /content
/content
r...@cs4:~#

(yes that really is it)

thanks

--
Tom // www.portfast.co.uk -- internet services and consultancy
// hosting from 1.65 per domain
Re: [zfs-discuss] ZFS tale of woe and fail
You can get a sort of redundancy by creating multiple filesystems with
'copies' enabled on the ones that need some sort of self-healing in case
of bad blocks.

Is it possible to at least present your disks as several LUNs? If you
must have an abstraction layer between ZFS and the block device,
presenting ZFS with a plurality of abstracted devices would let you get
some sort of parity... or is this device live and in production?

I do think that, though ZFS doesn't need fsck in the traditional sense,
some sort of recovery tool would make storage admins even happier about
using ZFS.

cheers,
Blake

On Mon, Jan 19, 2009 at 4:09 AM, Tom Bird t...@marmot.org.uk wrote:
> No, it's not a monolithic file; the point I was making there is that
> no files are showing up.
>
> r...@cs4:~# find /content
> /content
> r...@cs4:~#
>
> (yes that really is it)
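For reference, the 'copies' suggestion above looks like this in
practice (a sketch; 'content/important' is a hypothetical dataset
name):

```shell
# Keep two copies of every data block for a dataset that needs some
# self-healing on a non-redundant LUN ('content/important' is a
# hypothetical dataset name used for illustration):
zfs create -o copies=2 content/important

# Raising copies on an existing dataset only affects newly written
# data; blocks written earlier keep their original copy count:
zfs set copies=2 content/important
```

Note that extra copies guard against localized bad blocks; they do not
help if the whole device is lost or the pool itself will not import.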
Re: [zfs-discuss] ZFS tale of woe and fail
>>>>> "nk" == Nathan Kroenert nathan.kroen...@sun.com writes:
>>>>> "b" == Blake blake.ir...@gmail.com writes:

    nk> I'm not sure how you can class it a ZFS fail when the Disk
    nk> subsystem has failed...

The disk subsystem did not fail and lose all its contents. It just
rebooted a few times.

     b> You can get a sort of redundancy by creating multiple
     b> filesystems with 'copies' enabled on the ones that need some
     b> sort of self-healing in case of bad blocks.

Won't work here. The pool won't import at all. The type of bad block
fixing you're talking about applies to cases where the pool imports,
but 'zpool status' reports files with bad blocks in them.
Re: [zfs-discuss] ZFS tale of woe and fail
Miles, that's correct - I got muddled in the details of the thread.

I'm not necessarily suggesting this, but is this an occasion when
removing the zfs cache file located at /etc/zfs/zpool.cache might be an
emergency workaround? Tom, please don't try this until someone more
expert replies to my question.

cheers,
Blake

On Mon, Jan 19, 2009 at 1:43 PM, Miles Nordin car...@ivy.net wrote:
>      b> You can get a sort of redundancy by creating multiple
>      b> filesystems with 'copies' enabled on the ones that need some
>      b> sort of self-healing in case of bad blocks.
>
> Won't work here. The pool won't import at all. The type of bad block
> fixing you're talking about applies to cases where the pool imports,
> but 'zpool status' reports files with bad blocks in them.
Re: [zfs-discuss] ZFS tale of woe and fail
>>>>> "b" == Blake blake.ir...@gmail.com writes:

     b> removing the zfs cache file located at /etc/zfs/zpool.cache
     b> might be an emergency workaround?

Just the opposite. There seem to be fewer checks blocking the
auto-import of pools listed in zpool.cache than on 'zpool import' manual
imports. I'd expect the reverse, for some forceable 'zpool import' to
accept pools that don't auto-import, but at least Ross found
zpool.cache could auto-import a pool with a missing slog, while 'zpool
import' tells you to recreate from backup.
[zfs-discuss] ZFS tale of woe and fail
Morning,

For those of you who remember last time, this is a different Solaris,
different disk box and different host, but the epic nature of the fail
is similar.

The RAID box that is the 63T LUN has a hardware fault and has been
crashing; up to now the box and host got restarted and both came up
fine. However, just now, as I have got replacement hardware in position
and was ready to start copying, it went bang and my data has all gone.
Ideas?

r...@cs4:~# zpool list
NAME      SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
content  62.5T  59.9T  2.63T  95%  ONLINE  -

r...@cs4:~# zpool status -v
  pool: content
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore
        the entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME      STATE   READ WRITE CKSUM
        content   ONLINE     0     0    32
          c2t8d0  ONLINE     0     0    32

errors: Permanent errors have been detected in the following files:

        content:<0x0>
        content:<0x2c898>

r...@cs4:~# find /content
/content
r...@cs4:~#

(yes that really is it)

r...@cs4:~# uname -a
SunOS cs4.kw 5.11 snv_99 sun4v sparc SUNW,Sun-Fire-T200

from format:

2. c2t8d0 IFT-S12S-G1033-363H-62.76TB
   /p...@7c0/p...@0/p...@8/LSILogic,s...@0/s...@8,0

Also, content does not show in df output.

thanks

--
Tom // www.portfast.co.uk -- internet services and consultancy
// hosting from 1.65 per domain
Re: [zfs-discuss] ZFS tale of woe and fail
On Sun, Jan 18, 2009 at 8:02 AM, Tom Bird t...@marmot.org.uk wrote:
> errors: Permanent errors have been detected in the following files:
>
>         content:<0x0>
>         content:<0x2c898>
>
> r...@cs4:~# find /content
> /content
> r...@cs4:~#
>
> (yes that really is it)

Those are supposedly the two inodes that are corrupt. The 0x0 is a bit
scary... you should be able to find out what file(s) they're tied to
(if any) with:

find /content -inum 0
find /content -inum 182424

If you can live without those files, delete them, export the pool,
re-import, and resilver, and you should be good to go.

--Tim
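As an aside on where 182424 comes from: the object numbers that 'zpool
status -v' prints are hexadecimal, so they need converting to decimal
before being fed to find -inum. A quick sketch:

```shell
# Convert the hex object number from 'zpool status -v' output into
# the decimal inode number that find -inum expects:
printf '%d\n' 0x2c898
# prints 182424
```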
Re: [zfs-discuss] ZFS tale of woe and fail
Tim t...@tcsac.net wrote:
> Those are supposedly the two inodes that are corrupt. The 0x0 is a bit
> scary... you should be able to find out what file(s) they're tied to
> (if any) with:
>
> find /content -inum 0
> find /content -inum 182424

Using find to search for inodes with st_ino == 0 is not something you
may rely on to work as expected.

Jörg

--
EMail: jo...@schily.isdn.cs.tu-berlin.de (home)  Jörg Schilling
       j...@cs.tu-berlin.de (uni)                 D-13353 Berlin
       joerg.schill...@fokus.fraunhofer.de (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
Re: [zfs-discuss] ZFS tale of woe and fail
Tim wrote:
> Those are supposedly the two inodes that are corrupt. The 0x0 is a bit
> scary... you should be able to find out what file(s) they're tied to
> (if any) with:
>
> find /content -inum 0
> find /content -inum 182424
>
> If you can live without those files, delete them, export the pool,
> re-import, and resilver, and you should be good to go.

Hi, well one of the problems is that find doesn't find anything, as the
filesystem is not presenting any files, so I can't delete anything.

I've exported the pool, but on re-import I get the same error as I was
getting last time something popped:

r...@cs4:~# zpool import content
cannot open 'content': I/O error

Last time, Victor Latushkin fixed it by modifying the file system to
point to an older copy of the data. I've not really been following the
list of late - any more sign of a fsck.zfs...?

thanks

--
Tom // www.portfast.co.uk -- internet services and consultancy
// hosting from 1.65 per domain
Re: [zfs-discuss] ZFS tale of woe and fail
Hey, Tom -

Correct me if I'm wrong here, but it seems you are not allowing ZFS any
sort of redundancy to manage.

I'm not sure how you can class it a ZFS fail when the disk subsystem
has failed...

Or - did I miss something? :)

Nathan.

Tom Bird wrote:
> The RAID box that is the 63T LUN has a hardware fault and has been
> crashing; up to now the box and host got restarted and both came up
> fine. However, just now, as I have got replacement hardware in
> position and was ready to start copying, it went bang and my data has
> all gone. Ideas?

--
//////////////////////////////////////////////////////////////////
// Nathan Kroenert              nathan.kroen...@sun.com          //
// Senior Systems Engineer      Phone: +61 3 9869 6255           //
// Global Systems Engineering   Fax:   +61 3 9869 6288           //
// Level 7, 476 St. Kilda Road                                   //
// Melbourne 3004   Victoria    Australia                        //
//////////////////////////////////////////////////////////////////
Re: [zfs-discuss] ZFS tale of woe and fail
On 18-Jan-09, at 6:12 PM, Nathan Kroenert wrote:
> Hey, Tom -
> Correct me if I'm wrong here, but it seems you are not allowing ZFS
> any sort of redundancy to manage.

Which is particularly catastrophic when one's 'content' is organized as
a monolithic file, as it is here - unless, of course, you have some way
of scavenging that file based on internal structure.

--Toby

> I'm not sure how you can class it a ZFS fail when the Disk subsystem
> has failed...
>
> Or - did I miss something? :)
>
> Nathan.
>
> Tom Bird wrote:
>> The RAID box that is the 63T LUN has a hardware fault and has been
>> crashing; up to now the box and host got restarted and both came up
>> fine. However, just now, as I have got replacement hardware in
>> position and was ready to start copying, it went bang and my data
>> has all gone. Ideas?