Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2009-02-07 Thread Gino
FYI, I'm working on a workaround for broken devices. As you note, some disks flat-out lie: you issue the synchronize-cache command, they say "got it, boss", yet the data is still not on stable storage. Why do they do this? Because it performs better. Well, duh -- you can make stuff

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-11-30 Thread Ray Clark
It would be extremely helpful to know which brands/models of disks lie and which don't. This information could be provided diplomatically, simply as threads documenting problems you are working on and stating the facts. Use of a specific string of words would make searching for it easy. There

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-13 Thread Wade . Stuart
[EMAIL PROTECTED] wrote on 10/11/2008 09:36:02 PM: On Oct 10, 2008, at 7:55 PM, David Magda wrote: If someone finds themselves in this position, what advice can be followed to minimize risks? Can you ask for two LUNs on different physical SAN devices and have an expectation

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-13 Thread Mike Gerdts
On Thu, Oct 9, 2008 at 10:33 PM, Mike Gerdts [EMAIL PROTECTED] wrote: On Thu, Oct 9, 2008 at 10:18 AM, Mike Gerdts [EMAIL PROTECTED] wrote: On Thu, Oct 9, 2008 at 10:10 AM, Greg Shaw [EMAIL PROTECTED] wrote: Nevada isn't production code. For real ZFS testing, you must use a production

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-11 Thread Keith Bierman
On Oct 10, 2008, at 7:55 PM, David Magda wrote: If someone finds themselves in this position, what advice can be followed to minimize risks? Can you ask for two LUNs on different physical SAN devices and have an expectation of getting it? -- Keith H. Bierman [EMAIL

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Timh Bergström
2008/10/9 Bob Friesenhahn [EMAIL PROTECTED]: On Thu, 9 Oct 2008, Miles Nordin wrote: catastrophically. If this is really the situation, then ZFS needs to give the sysadmin a way to isolate and fix the problems deterministically before filling the pool with data, not just blame the sysadmin

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Marcelo Leal
Hello all, I think the problem here is ZFS's capacity for recovery from a failure. Forgive me, but in aiming to create code without failures, maybe the hackers forgot that other people can make mistakes (even if they can't). - ZFS does not need fsck. Ok, that's a great statement,

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Jeff Bonwick
The circumstances where I have lost data have been when ZFS has not handled a layer of redundancy. However, I am not terribly optimistic of the prospects of ZFS on any device that hasn't committed writes that ZFS thinks are committed. FYI, I'm working on a workaround for broken devices. As

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Ricardo M. Correia
Hi Jeff, On Fri, 2008-10-10 at 01:26 -0700, Jeff Bonwick wrote: The circumstances where I have lost data have been when ZFS has not handled a layer of redundancy. However, I am not terribly optimistic of the prospects of ZFS on any device that hasn't committed writes that ZFS thinks are

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Ross
That sounds like a great idea for a tool Jeff. Would it be possible to build that in as a zpool recover command? Being able to run a tool like that to see just how bad the corruption is, while knowing it's possible to recover an older version, would be great. Is there any chance of outputting

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Miles Nordin
jb == Jeff Bonwick [EMAIL PROTECTED] writes: rmc == Ricardo M Correia [EMAIL PROTECTED] writes: jb We need a little more Code of Hammurabi in the storage jb industry. It seems like most of the work people have to do now is cleaning up after the sloppiness of others. At least it takes

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Eric Schrock
On Fri, Oct 10, 2008 at 06:15:16AM -0700, Marcelo Leal wrote: - ZFS does not need fsck. Ok, that's a great statement, but I think ZFS needs one. Really does. And in my opinion an enhanced zdb would be the solution. Flexibility. Options. About 99% of the problems reported as "I need ZFS fsck"

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Victor Latushkin
Eric Schrock wrote: On Fri, Oct 10, 2008 at 06:15:16AM -0700, Marcelo Leal wrote: - ZFS does not need fsck. Ok, that's a great statement, but I think ZFS needs one. Really does. And in my opinion an enhanced zdb would be the solution. Flexibility. Options. About 99% of the problems

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Timh Bergström
2008/10/10 Richard Elling [EMAIL PROTECTED]: Timh Bergström wrote: 2008/10/9 Bob Friesenhahn [EMAIL PROTECTED]: On Thu, 9 Oct 2008, Miles Nordin wrote: catastrophically. If this is really the situation, then ZFS needs to give the sysadmin a way to isolate and fix the problems

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Marcelo Leal
On Fri, Oct 10, 2008 at 06:15:16AM -0700, Marcelo Leal wrote: - ZFS does not need fsck. Ok, that's a great statement, but I think ZFS needs one. Really does. And in my opinion an enhanced zdb would be the solution. Flexibility. Options. About 99% of the problems reported as I

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Ricardo M. Correia
On Fri, 2008-10-10 at 11:23 -0700, Eric Schrock wrote: But I haven't actually heard a reasonable proposal for what a fsck-like tool (i.e. one that could repair things automatically) would actually *do*, let alone how it would work in the variety of situations it needs to (compressed RAID-Z?)

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Richard Elling
Timh Bergström wrote: 2008/10/10 Richard Elling [EMAIL PROTECTED]: Timh Bergström wrote: 2008/10/9 Bob Friesenhahn [EMAIL PROTECTED]: On Thu, 9 Oct 2008, Miles Nordin wrote: catastrophically. If this is really the situation, then ZFS needs to give the sysadmin

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread David Magda
On Oct 10, 2008, at 15:48, Victor Latushkin wrote: I've mostly seen (2), because despite all the best practices out there, single vdev pools are quite common. In all such cases that I had my hands on, it was possible to recover the pool by going back by one or two txgs. For better or worse
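The rollback Victor describes, discarding the last transaction group or two, was later exposed as a supported import option; it did not exist when this thread was written and the exact behaviour varies by release, so the following is only a sketch of the eventual interface:

    # Recovery-mode import: if the newest uberblocks are unusable, fall back
    # to a slightly older consistent state (recent writes are lost).
    zpool import -F tank

    # Dry run: report whether such a rewind would succeed, without importing.
    zpool import -Fn tank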

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Jeff Bonwick
Or is there a way to mitigate a checksum error on non-redundant zpool? It's just like the difference between non-parity, parity, and ECC memory. Most filesystems don't have checksums (non-parity), so they don't even know when they're returning corrupt data. ZFS without any replication can
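Even without redundancy, ZFS's checksums make corruption visible, which is the detection half of the analogy above. A minimal sketch of surfacing that information (the pool name is illustrative):

    # Verify every allocated block's checksum in the background.
    zpool scrub tank

    # The CKSUM column counts detected checksum errors; on a pool with no
    # redundancy, -v also lists files with permanent (uncorrectable) errors.
    zpool status -v tank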

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-10 Thread Mike Gerdts
On Fri, Oct 10, 2008 at 9:14 PM, Jeff Bonwick [EMAIL PROTECTED] wrote: Note: even in a single-device pool, ZFS metadata is replicated via ditto blocks at two or three different places on the device, so that a localized media failure can be both detected and corrected. If you have two or more
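For user data on a single-device pool, the same ditto-block mechanism can be requested explicitly with the copies property; the extra copies live on the same device, so this guards against localized media errors, not whole-disk loss. A brief sketch, with the dataset name purely illustrative:

    # Keep two copies of each data block written to this dataset from now on
    # (pool metadata already gets two or three ditto copies automatically).
    zfs set copies=2 tank/home
    zfs get copies tank/home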

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-09 Thread .
His explanation: he invalidated the incorrect uberblocks and forced zfs to revert to an earlier state that was consistent. Would someone be willing to document the steps required in order to do this please? I have a disk in a similar state: # zpool import pool: tank id:
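While the exact invalidation steps are what is being asked for here, the read-only inspection side can be sketched with zdb; the device path is an example and the amount of uberblock detail printed varies between builds, so treat this as a starting point rather than the procedure Victor used:

    # Print the vdev labels from one of the pool's disks; each label carries
    # the pool config and an array of uberblocks.
    zdb -l /dev/dsk/c1t0d0s0

    # Dump the active uberblock of a pool (extra -u flags add detail on some
    # builds); an exported pool may need the -e option.
    zdb -uuu tank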

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-09 Thread Mike Gerdts
On Thu, Oct 9, 2008 at 4:53 AM, . [EMAIL PROTECTED] wrote: While it's clearly my own fault for taking the risks I did, it's still pretty frustrating knowing that all my data is likely still intact and nicely checksummed on the disk but that none of it is accessible due to some tiny filesystem

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-09 Thread Wilkinson, Alex
On Thu, Oct 09, 2008 at 06:37:23AM -0500, Mike Gerdts wrote: FWIW, I believe that I have hit the same type of bug as the OP in the following combinations: - T2000, LDoms 1.0, various builds of Nevada in control and guest domains. - Laptop, VirtualBox 1.6.2, Windows

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-09 Thread Mike Gerdts
On Thu, Oct 9, 2008 at 7:44 AM, Ahmed Kamal [EMAIL PROTECTED] wrote: In the past year I've lost more ZFS file systems than I have any other type of file system in the past 5 years. With other file systems I can almost always get some data back. With ZFS I can't get any back.

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-09 Thread Timh Bergström
Unfortunately I can only agree with the doubts about running ZFS in production environments. I've lost ditto blocks, I've gotten corrupted pools and a bunch of other failures, even in mirror/raidz/raidz2 setups with or without hardware mirrors/raid5/6. Plus the insecurity of a sudden crash/reboot will

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-09 Thread Greg Shaw
Perhaps I misunderstand, but the issues below are all based on Nevada, not Solaris 10. Nevada isn't production code. For real ZFS testing, you must use a production release, currently Solaris 10 (update 5, soon to be update 6). In the last 2 years, I've stored everything in my environment

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-09 Thread Mike Gerdts
On Thu, Oct 9, 2008 at 10:10 AM, Greg Shaw [EMAIL PROTECTED] wrote: Nevada isn't production code. For real ZFS testing, you must use a production release, currently Solaris 10 (update 5, soon to be update 6). I misstated before in my LDoms case. The corrupted pool was on Solaris 10, with

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-09 Thread Miles Nordin
gs == Greg Shaw [EMAIL PROTECTED] writes: gs Nevada isn't production code. For real ZFS testing, you must gs use a production release, currently Solaris 10 (update 5, soon gs to be update 6). based on list feedback, my impression is that the results of a ``test'' confined to s10,

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-09 Thread Bob Friesenhahn
On Thu, 9 Oct 2008, Miles Nordin wrote: catastrophically. If this is really the situation, then ZFS needs to give the sysadmin a way to isolate and fix the problems deterministically before filling the pool with data, not just blame the sysadmin based on nebulous speculatory hindsight

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-09 Thread Mike Gerdts
On Thu, Oct 9, 2008 at 10:18 AM, Mike Gerdts [EMAIL PROTECTED] wrote: On Thu, Oct 9, 2008 at 10:10 AM, Greg Shaw [EMAIL PROTECTED] wrote: Nevada isn't production code. For real ZFS testing, you must use a production release, currently Solaris 10 (update 5, soon to be update 6). I misstated

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-06 Thread Darren J Moffat
Fajar A. Nugraha wrote: On Fri, Oct 3, 2008 at 10:37 PM, Vasile Dumitrescu [EMAIL PROTECTED] wrote: VMWare 6.0.4 running on Debian unstable, Linux bigsrv 2.6.26-1-amd64 #1 SMP Wed Sep 24 13:59:41 UTC 2008 x86_64 GNU/Linux Solaris is vanilla snv_90 installed with no GUI. in summary:

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-04 Thread Fajar A. Nugraha
On Fri, Oct 3, 2008 at 10:37 PM, Vasile Dumitrescu [EMAIL PROTECTED] wrote: VMWare 6.0.4 running on Debian unstable, Linux bigsrv 2.6.26-1-amd64 #1 SMP Wed Sep 24 13:59:41 UTC 2008 x86_64 GNU/Linux Solaris is vanilla snv_90 installed with no GUI. in summary: physical disks, assigned

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-03 Thread Vasile Dumitrescu
Hi folks, I just wanted to share the end of my adventure here and especially take the time to thank Victor for helping me out of this mess. I will let him explain the technical details (I am out of my depth here) but bottom line he spent a couple of hours with me on the machine and sorted me

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-03 Thread Darren J Moffat
Vasile Dumitrescu wrote: Hi folks, I just wanted to share the end of my adventure here and especially take the time to thank Victor for helping me out of this mess. I will let him explain the technical details (I am out of my depth here) but bottom line he spent a couple of hours with

Re: [zfs-discuss] Solved - a big THANKS to Victor Latushkin @ Sun / Moscow

2008-10-03 Thread Vasile Dumitrescu
Which VM solution was this? VMware, VirtualBox, Xen, other? How were the disks presented to the guest? What are the disks in the host: real disks, files, something else? -- Darren J Moffat