Re: [zfs-discuss] Idea: ZFS and on-disk ECC for blocks

2012-01-12 Thread Jim Klimov
2012-01-13 5:30, Daniel Carosone wrote: Corrupted file data that is then accurately checksummed and readable as valid? Speaking of which, is there currently any simple way to disable checksum validation during data reads (and not cause a kernel panic when reading garbage under the guise of metad

Re: [zfs-discuss] ZFS Dedup and bad checksums

2012-01-12 Thread Jim Klimov
2012-01-13 4:26, Richard Elling wrote: On Jan 12, 2012, at 4:12 PM, Jim Klimov wrote: The problem was solved by disabling dedup for the dataset involved and rsync-updating the file in-place. After the dedup feature was disabled and new blocks were uniquely written, everything was readable (and m

Re: [zfs-discuss] Injection of ZFS snapshots into existing data, and replacement of older snapshots with zfs recv without truncating newer ones

2012-01-12 Thread Jim Klimov
2012-01-13 7:26, Steve Gonczi wrote: Jim, Any modified block (in the absence of a snapshot) gets re-written to a new location and the original block is freed. So the earlier state you want to go back and snapshot is no longer there. The essence of taking a snapshot is keeping the original blocks i

Re: [zfs-discuss] zfs defragmentation via resilvering?

2012-01-12 Thread Edward Ned Harvey
> From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us] > > > Suppose you write a 1G file to disk. It is a database store. Now you start > > running your db server. It starts performing transactions all over the > > place. It overwrites the middle 4k of the file, and it overwrites 512b >

Re: [zfs-discuss] Idea: ZFS and on-disk ECC for blocks

2012-01-12 Thread Jim Klimov
2012-01-13 5:01, Richard Elling wrote: On Jan 12, 2012, at 2:34 PM, Jim Klimov wrote: Metadata is at least doubly redundant and checksummed. True, and this helps if it is valid in the first place (in RAM). >> As has been >> reported by many blog-posts researching ZDB, there do >> happen case

Re: [zfs-discuss] Idea: ZFS and on-disk ECC for blocks

2012-01-12 Thread Jim Klimov
2012-01-13 5:30, Daniel Carosone wrote: On Thu, Jan 12, 2012 at 05:01:48PM -0800, Richard Elling wrote: This thread is about checksums - namely, now, what are our options when they mismatch the data? As has been reported by many blog-posts researching ZDB, there do happen cases when checksums ar

Re: [zfs-discuss] ZFS Dedup and bad checksums

2012-01-12 Thread Jim Klimov
2012-01-13 5:34, Daniel Carosone wrote: On Fri, Jan 13, 2012 at 05:16:36AM +0400, Jim Klimov wrote: Either I misunderstand some of the above, or I fail to see how verification would eliminate this failure mode (namely, as per my suggestion, replace the bad block with a good one and have all refe

Re: [zfs-discuss] ZFS Dedup and bad checksums

2012-01-12 Thread Daniel Carosone
On Fri, Jan 13, 2012 at 05:16:36AM +0400, Jim Klimov wrote: > 2012-01-13 4:26, Richard Elling wrote: >> On Jan 12, 2012, at 4:12 PM, Jim Klimov wrote: >>> Alternatively (opportunistically), a flag might be set >>> in the DDT entry requesting that a new write matching >>> this stored checksum shoul

Re: [zfs-discuss] Idea: ZFS and on-disk ECC for blocks

2012-01-12 Thread Daniel Carosone
On Thu, Jan 12, 2012 at 05:01:48PM -0800, Richard Elling wrote: > > This thread is about checksums - namely, now, what are > > our options when they mismatch the data? As has been > > reported by many blog-posts researching ZDB, there do > > happen cases when checksums are broken (i.e. bitrot in >

Re: [zfs-discuss] ZFS Dedup and bad checksums

2012-01-12 Thread Jim Klimov
2012-01-13 4:26, Richard Elling wrote: On Jan 12, 2012, at 4:12 PM, Jim Klimov wrote: Alternatively (opportunistically), a flag might be set in the DDT entry requesting that a new write matching this stored checksum should get committed to disk - thus "repairing" all files which reference the b
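
The "repair on matching write" idea quoted above can be illustrated with a toy model. This is only a sketch, not ZFS code: `ToyDedupTable`, `DedupEntry`, `needs_repair` and `scrub` are hypothetical names, an in-memory dict stands in for the DDT, and SHA-256 is assumed as the checksum purely for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class DedupEntry:
    stored: bytes               # bytes currently "on disk" (may have rotted)
    refcount: int = 1           # how many writers reference this block
    needs_repair: bool = False  # hypothetical flag from the proposal


class ToyDedupTable:
    """In-memory stand-in for a dedup table keyed by block checksum."""

    def __init__(self):
        self.entries = {}

    def scrub(self, digest):
        """Re-checksum the stored bytes; flag the entry if they mismatch."""
        entry = self.entries[digest]
        ok = hashlib.sha256(entry.stored).hexdigest() == digest
        entry.needs_repair = not ok
        return ok

    def write(self, data):
        """Return (digest, outcome) for an incoming block."""
        digest = hashlib.sha256(data).hexdigest()
        entry = self.entries.get(digest)
        if entry is None:
            self.entries[digest] = DedupEntry(stored=data)
            return digest, "new"
        entry.refcount += 1
        if entry.needs_repair:
            # The incoming data matches the checksum, so commit it and
            # thereby fix every existing reference to this block.
            entry.stored = data
            entry.needs_repair = False
            return digest, "repaired"
        return digest, "deduped"  # normal dedup: no data is rewritten


table = ToyDedupTable()
good = b"block contents"
digest, first = table.write(good)
table.entries[digest].stored = b"block c0ntents"  # simulate on-disk corruption
before = table.scrub(digest)                      # mismatch sets the flag
_, second = table.write(good)                     # matching write repairs it
after = table.scrub(digest)
print(first, before, second, after)
```

In this model a block that fails the scrub stays flagged until some later write happens to carry data with the same checksum, which is the opportunistic part of the suggestion.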

Re: [zfs-discuss] Idea: ZFS and on-disk ECC for blocks

2012-01-12 Thread Richard Elling
On Jan 12, 2012, at 2:34 PM, Jim Klimov wrote: > I guess I have another practical rationale for a second > checksum, be it ECC or not: my scrubbing pool found some > "unrecoverable errors". Luckily, for those files I still > have external originals, so I rsynced them over. Still, > there is one fi

Re: [zfs-discuss] Idea: ZFS and on-disk ECC for blocks

2012-01-12 Thread Daniel Carosone
On Fri, Jan 13, 2012 at 04:48:44AM +0400, Jim Klimov wrote: > As Richard reminded me in another thread, both metadata > and DDT can contain checksums, hopefully of the same data > block. So for deduped data we may already have a means > to test whether the data or the checksum is incorrect... It's

[zfs-discuss] Injection of ZFS snapshots into existing data, and replacement of older snapshots with zfs recv without truncating newer ones

2012-01-12 Thread Jim Klimov
While reading about zfs on-disk formats, I wondered once again why it is not possible to create a snapshot of existing data, not at the current TXG but at some older point in time. From what I gathered, definition of a snapshot requires the cut-off TXG number existence of some blocks in this data

Re: [zfs-discuss] Idea: ZFS and on-disk ECC for blocks

2012-01-12 Thread Jim Klimov
2012-01-13 2:34, Jim Klimov wrote: I guess I have another practical rationale for a second checksum, be it ECC or not: my scrubbing pool found some "unrecoverable errors". ...Applications need to know whether the digest has been changed. As Richard reminded me in another thread, both metadata a

Re: [zfs-discuss] ZFS Dedup and bad checksums

2012-01-12 Thread Jim Klimov
2012-01-13 4:26, Richard Elling wrote: On Jan 12, 2012, at 4:12 PM, Jim Klimov wrote: As I recently wrote, my data pool has experienced some "unrecoverable errors". It seems that a userdata block of deduped data got corrupted and no longer matches the stored checksum. For whatever reason, raidz

Re: [zfs-discuss] ZFS Dedup and bad checksums

2012-01-12 Thread Richard Elling
On Jan 12, 2012, at 4:12 PM, Jim Klimov wrote: > As I recently wrote, my data pool has experienced some > "unrecoverable errors". It seems that a userdata block > of deduped data got corrupted and no longer matches the > stored checksum. For whatever reason, raidz2 did not > help in recovery of th

[zfs-discuss] ZFS Dedup and bad checksums

2012-01-12 Thread Jim Klimov
As I recently wrote, my data pool has experienced some "unrecoverable errors". It seems that a userdata block of deduped data got corrupted and no longer matches the stored checksum. For whatever reason, raidz2 did not help in recovery of this data, so I rsync'ed the files over from another copy.

Re: [zfs-discuss] Idea: ZFS and on-disk ECC for blocks

2012-01-12 Thread Jim Klimov
I guess I have another practical rationale for a second checksum, be it ECC or not: my scrubbing pool found some "unrecoverable errors". Luckily, for those files I still have external originals, so I rsynced them over. Still, there is one file whose broken prehistory is referenced in snapshots, an

Re: [zfs-discuss] Idea: ZFS and on-disk ECC for blocks

2012-01-12 Thread David Magda
On Wed, January 11, 2012 11:40, Nico Williams wrote: > I don't find this terribly attractive, but maybe I'm just not looking > at it the right way. Perhaps there is a killer enterprise feature for > ECC here: stretching MTTDL in the face of a device failure in a mirror > or raid-z configuration (

Re: [zfs-discuss] Do the disks in a zpool have a private region that I can read to get a zpool name or id?

2012-01-12 Thread Jim Klimov
2012-01-12 20:21, adele@oracle.com wrote: Jim and LaoTsao, Thanks a lot for your quick response. "zdb" might be good enough to get the zpool name or id from any disk in the zpool. CU will try it this afternoon. As for the second situation - one host is AIX and the second disk is Solaris, CU wonders if "

Re: [zfs-discuss] Do the disks in a zpool have a private region that I can read to get a zpool name or id?

2012-01-12 Thread adele....@oracle.com
Jim and LaoTsao, Thanks a lot for your quick response. "zdb" might be good enough to get the zpool name or id from any disk in the zpool. CU will try it this afternoon. As for the second situation - one host is AIX and the second disk is Solaris, CU wonders if "zdb" will show anything. I will also post

Re: [zfs-discuss] Do the disks in a zpool have a private region that I can read to get a zpool name or id?

2012-01-12 Thread Hung-Sheng Tsao (laoTsao)
If the disks are assigned to two hosts, maybe just running zpool import should show the zpool on the other host? Not sure. As for AIX, control hdd, zpool will need a partition that Solaris could understand; I do not know what AIX uses for partitions, it should not be the same as for Solaris. Sent from my iPad On Ja

Re: [zfs-discuss] Do the disks in a zpool have a private region that I can read to get a zpool name or id?

2012-01-12 Thread Jim Klimov
Followup: namely, the 'hostname' field should report the host which last imported (or currently has imported) the pool, and the 'name' field is the pool name as of the last import (it can be changed with, e.g., "zpool import pool1 testpool2"). HTH, //Jim Klimov
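
For scripting this check, the label fields can be pulled out of saved `zdb -l` output. The sketch below is only an illustration: it assumes the output consists of `key: value` lines like the excerpt quoted in this thread, and `parse_label_fields` and `SAMPLE_LABEL` are hypothetical names, not part of any ZFS tool. It keeps the first occurrence of each key, on the assumption that the same keys may repeat when several labels are dumped.

```python
import re

# Sample text shaped like the `zdb -l` excerpt quoted in this thread.
SAMPLE_LABEL = """\
    name: 'rpool'
    pool_guid: 12076177533503245216
    hostid: 13583512
    hostname: 'bofh-sol'
"""


def parse_label_fields(text, wanted=("name", "pool_guid", "hostid", "hostname")):
    """Collect the first occurrence of each wanted key from label text."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s*(.*)$", line)
        if not match:
            continue  # not a simple "key: value" line
        key = match.group(1)
        value = match.group(2).strip().strip("'")
        if key in wanted and key not in fields:
            fields[key] = value
    return fields


label = parse_label_fields(SAMPLE_LABEL)
print(label["name"], label["hostname"], label["pool_guid"])
```

Comparing 'pool_guid' and 'hostname' taken from the same LUN on each host would show whether both hosts are looking at one pool and which host imported it last.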

Re: [zfs-discuss] Do the disks in a zpool have a private region that I can read to get a zpool name or id?

2012-01-12 Thread Jim Klimov
Take a look at ZDB (the "ZFS Debugger"). Probably the "zdb -l" (label dump) option would suffice for your task, i.e.: # zdb -l /dev/dsk/c4t1d0s0 | egrep 'host|uid|name|devid|path' name: 'rpool' pool_guid: 12076177533503245216 hostid: 13583512 hostname: 'bofh-sol' top_guid: 179

[zfs-discuss] Do the disks in a zpool have a private region that I can read to get a zpool name or id?

2012-01-12 Thread adele....@oracle.com
Hi all, My cu has the following question. Assume I have allocated a LUN from external storage to two hosts (by mistake). I create a zpool with this LUN on host1 with no errors. On host2, when I try to create a zpool by using the same disk (which is allocated to host2 as well), zpool create -