Re: [zfs-discuss] Repairing corrupted ZFS pool
On 11/16/2012 07:15 PM, Peter Jeremy wrote:
> I have been tracking down a problem with zfs diff that reveals itself variously as a hang (unkillable process), a panic or an error, depending on the ZFS kernel version, but seems to be caused by corruption within the pool. I am using FreeBSD, but the issue looks to be generic ZFS rather than FreeBSD-specific. The hang and panic are related to the rw_enter() in opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk(). The error is:
>
>   Unable to determine path or stats for object 2128453 in tank/beckett/home@20120518: Invalid argument

Is the pool importing properly at least? Maybe you can create another volume and transfer the data over to that volume, then destroy the damaged one? There are also special things you can do with import, where you can roll back to a certain txg on the import if you know the damage is recent.
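[Editor's note: a minimal sketch of the rewind-on-import idea mentioned above, with the pool name assumed. -F is the documented rewind option; -T takes an explicit txg but is undocumented and implementation-dependent, so a read-only import is the safer way to try it:

    zpool export tank
    zpool import -F tank                         # discard the last few txgs if that
                                                 # makes the pool importable
    zpool import -o readonly=on -T <txg> tank    # roll back to a specific txg,
                                                 # read-only, to inspect the result
]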
Re: [zfs-discuss] suggestions for e-SATA HBA card on x86/x64
Hello Gregg,

I acquired one of these Intel RAID controller cards, SATA/SAS PCI-E x8 with 8 internal ports (SASUC8I), from your newegg link below, and then acquired the necessary cables to get everything hooked up. After multiple executions of devfsadm and reconfigure boots, the OS sees only one of my 4 drives. The drives are 2 TB Seagate drives. Did you need to do anything special to get your card to work correctly? Did you need to do a firmware upgrade or anything? I am running an up-to-date version of OpenIndiana b151a7.

Thank you,
Jerry

On 10/26/12 10:02 AM, Gregg Wonderly wrote:
> I've been using this card http://www.newegg.com/Product/Product.aspx?Item=N82E16816117157 for my Solaris/OpenIndiana installations because it has 8 ports. One of the issues that this card seems to have is that certain failures can cause secondary problems in other drives on the same SAS connector. I use mirrors for my storage machines with 4 pairs, and just put half of each mirror on one side and the other drive on the other side. This, in general, has solved my problems. When a drive fails, I might see more than one drive not functioning. I can remove a drive (I use hot-swap bays such as http://www.newegg.com/Product/Product.aspx?Item=N82E16817994097) and restore the other to the pool to find which of the failed drives is actually the problem. What had happened before was that my case was not moving enough air, and the hot drives had caused odd failures. For the money, and given my experience with these controllers, I'd still use them; they are 3Gb/s controllers. If you want 6Gb/s controllers, then some of the other suggestions might be a better choice for you.
> Gregg
Re: [zfs-discuss] Repairing corrupted ZFS pool
On 2012-Nov-19 11:02:06 -0500, Ray Arachelian r...@arachelian.com wrote:
> Is the pool importing properly at least? Maybe you can create another volume and transfer the data over to that volume, then destroy it?

The pool is imported and passes all tests except zfs diff. Creating another pool _is_ an option, but I'm not sure how to transfer the data across: zfs send | zfs recv replicates the corruption, and tar -c | tar -x loses all the snapshots.

> There are special things you can do with import where you can roll back to a certain txg on the import if you know the damage is recent.

The damage exists in the oldest snapshot for that filesystem.

--
Peter Jeremy
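[Editor's note: for context, the send/recv that carries the corruption along is the usual full-replication form, roughly this sketch ("@migrate" and "newpool" are assumed names):

    # -R replicates every snapshot and property of the dataset - and,
    # in this case, the corrupt object with them:
    zfs snapshot tank/beckett/home@migrate
    zfs send -R tank/beckett/home@migrate | zfs recv -d newpool
]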
Re: [zfs-discuss] Repairing corrupted ZFS pool
On Mon, Nov 19, 2012 at 9:03 AM, Peter Jeremy pe...@rulingia.com wrote:
> On 2012-Nov-19 11:02:06 -0500, Ray Arachelian r...@arachelian.com wrote:
>> Is the pool importing properly at least? Maybe you can create another volume and transfer the data over for that volume, then destroy it?
> The pool is imported and passes all tests except zfs diff. Creating another pool _is_ an option but I'm not sure how to transfer the data across - using zfs send | zfs recv replicates the corruption and tar -c | tar -x loses all the snapshots.

1. Create a new pool.
2. Create a new filesystem.
3. rsync data from /path/to/filesystem/.zfs/snapshot/snapname/ to the new filesystem.
4. Snapshot the new filesystem.
5. rsync data from /path/to/filesystem/.zfs/snapshot/snapname+1/ to the new filesystem.
6. Snapshot the new filesystem.
7. See if zfs diff works.

If it does, repeat the rsync/snapshot steps for the rest of the snapshots (see the sketch below).

--
Freddie Cash
fjwc...@gmail.com
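[Editor's note: a minimal shell sketch of that loop, using the pool/dataset names from earlier in the thread and an assumed new disk; it relies on the snapshots having date-style names so that alphabetical order is chronological:

    zpool create newtank da1            # "da1" is an assumed device
    zfs create newtank/home
    # ls sorts alphabetically, which for YYYYMMDD snapshot names is oldest-first
    for snap in $(ls /tank/beckett/home/.zfs/snapshot); do
        # -aH preserves hard links; --delete removes files that
        # disappeared between snapshots
        rsync -aH --delete /tank/beckett/home/.zfs/snapshot/$snap/ /newtank/home/
        zfs snapshot newtank/home@$snap
    done
    zfs diff newtank/home@20120518 newtank/home@20120519   # spot-check a pair
]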
Re: [zfs-discuss] Repairing corrupted ZFS pool
On 11/19/2012 12:03 PM, Peter Jeremy wrote:
> The damage exists in the oldest snapshot for that filesystem.

Are you able to delete that snapshot?
Re: [zfs-discuss] Repairing corrupted ZFS pool
On 2012-Nov-19 13:47:01 -0500, Ray Arachelian r...@arachelian.com wrote:
> On 11/19/2012 12:03 PM, Peter Jeremy wrote:
>> The damage exists in the oldest snapshot for that filesystem.
> Are you able to delete that snapshot?

Yes, but it doesn't help - the corrupt object also exists in the current pool, so deleting an old snapshot has no effect. What I was hoping was that someone would have a suggestion for removing the corruption in place - using zdb, zhack or similar.

--
Peter Jeremy
Re: [zfs-discuss] Repairing corrupted ZFS pool
On 11/16/12 17:15, Peter Jeremy wrote:
> I have been tracking down a problem with zfs diff that reveals itself variously as a hang (unkillable process), panic or error, depending on the ZFS kernel version but seems to be caused by corruption within the pool. I am using FreeBSD but the issue looks to be generic ZFS, rather than FreeBSD-specific. The hang and panic are related to the rw_enter() in opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk()

There is probably nothing wrong with the snapshots. This is a bug in zfs diff. The ZPL parent pointer is only guaranteed to be correct for directory objects. What you probably have is a file that was hard linked multiple times, and the parent pointer (i.e. directory) was recycled and is now a file.

> The error is:
>
>   Unable to determine path or stats for object 2128453 in tank/beckett/home@20120518: Invalid argument
>
> A scrub reports no issues:
>
> root@FB10-64:~ # zpool status
>   pool: tank
>  state: ONLINE
> status: The pool is formatted using a legacy on-disk format. The pool can still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool will no longer be accessible on software that does not support feature flags.
>   scan: scrub repaired 0 in 3h24m with 0 errors on Wed Nov 14 01:58:36 2012
> config:
>
>         NAME    STATE   READ WRITE CKSUM
>         tank    ONLINE     0     0     0
>           ada2  ONLINE     0     0     0
>
> errors: No known data errors
>
> But zdb says that object is the child of a plain file - which isn't sane:
>
> root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2128453
> Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=0:266a0efa00:200 DVA[1]=0:31b07fbc00:200 [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79
>
>     Object  lvl   iblk   dblk  dsize  lsize  %full  type
>    2128453    1    16K  1.50K  1.50K  1.50K 100.00  ZFS plain file
>                                         264  bonus  ZFS znode
>       dnode flags: USED_BYTES USERUSED_ACCOUNTED
>       dnode maxblkid: 0
>       path    ???<object#2128453>
>       uid     1000
>       gid     1000
>       atime   Fri Mar 23 16:34:52 2012
>       mtime   Sat Oct 22 16:13:42 2011
>       ctime   Sun Oct 23 21:09:02 2011
>       crtime  Sat Oct 22 16:13:42 2011
>       gen     2237174
>       mode    100444
>       size    1089
>       parent  2242171
>       links   1
>       pflags  4080004
>       xattr   0
>       rdev    0x0000000000000000
>
> root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2242171
> Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=0:266a0efa00:200 DVA[1]=0:31b07fbc00:200 [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79
>
>     Object  lvl   iblk   dblk  dsize  lsize  %full  type
>    2242171    3    16K   128K  25.4M  25.5M 100.00  ZFS plain file
>                                         264  bonus  ZFS znode
>       dnode flags: USED_BYTES USERUSED_ACCOUNTED
>       dnode maxblkid: 203
>       path    /jashank/Pictures/sch/pdm-a4-11/stereo-pair-2.png
>       uid     1000
>       gid     1000
>       atime   Fri Mar 23 16:41:53 2012
>       mtime   Mon Oct 24 21:15:56 2011
>       ctime   Mon Oct 24 21:15:56 2011
>       crtime  Mon Oct 24 21:15:37 2011
>       gen     2286679
>       mode    100644
>       size    26625731
>       parent  7001490
>       links   1
>       pflags  4080004
>       xattr   0
>       rdev    0x0000000000000000
>
> root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 7001490
> Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, rootbp DVA[0]=0:266a0efa00:200 DVA[1]=0:31b07fbc00:200 [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79
>
>     Object  lvl   iblk   dblk  dsize  lsize  %full  type
>    7001490    1    16K    512     1K    512 100.00  ZFS directory
>                                         264  bonus  ZFS znode
>       dnode flags: USED_BYTES USERUSED_ACCOUNTED
>       dnode maxblkid: 0
>       path    /jashank/Pictures/sch/pdm-a4-11
>       uid     1000
>       gid     1000
>       atime   Thu May 17 03:38:32 2012
>       mtime   Mon Oct 24 21:15:37 2011
>       ctime   Mon Oct 24 21:15:37 2011
>       crtime  Fri Oct 14 22:17:44 2011
>       gen     2088407
>       mode    40755
>       size    6
>       parent  6370559
>       links   2
>       pflags  4080144
>       xattr   0
>       rdev    0x0000000000000000
>       microzap: 512 bytes, 4 entries
Re: [zfs-discuss] Repairing corrupted ZFS pool
On 2012-11-19 20:28, Peter Jeremy wrote:
> Yep - that's the fallback solution. With 1874 snapshots spread over 54 filesystems (including a couple of clones), that's a major undertaking. (And it loses timestamp information).

Well, as long as you have and know the base snapshots for the clones, you can recreate them at the same branching point on the new copy too (see the sketch below).

Remember to use something like

    rsync -cavPHK --delete-after --inplace src/ dst/

to do the copy, so that files removed from the source snapshot are removed on the target, changes are detected by file checksum verification (not only size and timestamp), and changes take place within the target's copy of the file (not via rsync's default copy-and-rewrite), so that the retained snapshot history remains sensible and space-saving.

Also, while you are at it, you can use different settings on the new pool, based on what you have since learned about your data - perhaps using better compression (IMHO stale old data that has become mostly read-only is a good candidate for gzip-9), setting proper block sizes for files of databases and disk images, maybe setting better checksums, and, if your RAM vastness and data similarity permit, perhaps employing dedup (run zdb -S on the source pool to simulate dedup; if you get better than about 3x savings, it may become worthwhile).

But, yes, this will take quite a while, as it effectively walks your pool several thousand times if you do the plain rsync from each snapdir. Perhaps, if zfs diff does perform reasonably for you, you can feed its output as the list of objects to replicate in rsync's input and save many cycles that way.

Good luck,
//Jim Klimov
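[Editor's note: a sketch of recreating a clone at the same branch point on the new pool; all dataset and snapshot names here are assumed for illustration:

    # After replaying snapshots up to the clone's origin onto the copy:
    zfs snapshot newtank/home@base          # same point where the old clone branched
    zfs clone newtank/home@base newtank/home-clone
    # ...then replay the clone's own snapshots into newtank/home-clone
    # with the same rsync/snapshot loop as for the parent filesystem.
]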
Re: [zfs-discuss] Repairing corrupted ZFS pool
On 2012-11-19 20:58, Mark Shellenbaum wrote:
> There is probably nothing wrong with the snapshots. This is a bug in ZFS diff. The ZPL parent pointer is only guaranteed to be correct for directory objects. What you probably have is a file that was hard linked multiple times and the parent pointer (i.e. directory) was recycled and is now a file

Interesting... so ZPL files in ZFS do keep pointers to their parents? How, given COW transactional semantics, could the parent directory be removed but not the pointer to it from the files inside it? Is this possible in current ZFS, or could this be a leftover in the pool from its history with older releases?

Thanks,
//Jim
Re: [zfs-discuss] Repairing corrupted ZFS pool
Oh, and one more thing: rsync is only good if your filesystems don't really rely on ZFS/NFSv4-style ACLs. If you need those, you are stuck with Solaris tar or Solaris cpio to carry the files over, or you have to script up replication of ACLs after rsync somehow.

You should also replicate the local zfs attributes of your datasets, zfs allow permissions, and ACLs on .zfs/shares/* (if any, for CIFS) - at least their currently relevant live values, which is also not fatally difficult scripting. (I don't know if it is possible to fetch older attribute values from snapshots - the values in force at that past moment in time; if somebody knows anything about this, please write.)

On another note, to speed up the rsyncs you can try to save on the encryption (if you do this within a trusted LAN): use rsh, or ssh with the arcfour or none encryption algorithms, or perhaps rsync over NFS as if you were in the local filesystem.

HTH,
//Jim
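[Editor's note: a sketch of the cheap-transport variants, with hostnames and paths assumed; arcfour was still shipped by OpenSSH of that era, and "none" requires a server built to permit it:

    # ssh with a cheap cipher:
    rsync -cavPHK --delete-after --inplace -e 'ssh -c arcfour' \
        /tank/fs/.zfs/snapshot/snap1/ newhost:/newtank/fs/
    # or entirely over an NFS automount, no ssh involved:
    rsync -cavPHK --delete-after --inplace \
        /tank/fs/.zfs/snapshot/snap1/ /net/newhost/newtank/fs/
]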
Re: [zfs-discuss] Repairing corrupted ZFS pool
On 19 November, 2012 - Jim Klimov sent me these 1,1K bytes:
> Oh, and one more thing: rsync is only good if your filesystems don't really rely on ZFS/NFSv4-style ACLs. If you need those, you are stuck with Solaris tar or Solaris cpio to carry the files over, or you have to script up replication of ACLs after rsync somehow.

An ugly hack that seems to do the trick for us is to first rsync, then:

#!/usr/local/bin/perl -w
for my $oldfile (@ARGV) {
    my $newfile = $oldfile;
    $newfile =~ s{/export}{/newdir/export};
    next if -l $oldfile;                  # skip symlinks
    open(F, "-|", "/bin/ls", "-ladV", "--", $oldfile);
    my @a = <F>;
    close(F);
    my $crap = shift @a;                  # filename line
    chomp(@a);
    for (@a) { $_ =~ s/ //g; }            # strip spaces from the ACL lines
    my $acl = join(",", @a);
    system("/bin/chmod", "A=" . $acl, $newfile);
}

invoked as:

/bin/find /export -acl -print0 | xargs -0 /blah/aclcopy.pl

/Tomas
--
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
Re: [zfs-discuss] Repairing corrupted ZFS pool
On 11/19/12 1:14 PM, Jim Klimov wrote:
> On 2012-11-19 20:58, Mark Shellenbaum wrote:
>> There is probably nothing wrong with the snapshots. This is a bug in ZFS diff. The ZPL parent pointer is only guaranteed to be correct for directory objects. What you probably have is a file that was hard linked multiple times and the parent pointer (i.e. directory) was recycled and is now a file
> Interesting... do the ZPL files in ZFS keep pointers to parents?

The parent pointer for hard linked files is always set to the last link to be created:

$ mkdir dir.1
$ mkdir dir.2
$ touch dir.1/a
$ ln dir.1/a dir.2/a.linked
$ rm -rf dir.2

Now the parent pointer for "a" will reference a removed directory. The parent pointer is a single 64-bit quantity that can't track all the possible parents a hard linked file could have. When the original dir.2 object number is recycled, you could have a situation where the parent pointer for "a" points to a non-directory.

The ZPL never uses the parent pointer internally. It is only used by zfs diff and other utility code to translate object numbers to full pathnames. The ZPL has always set the parent pointer, but it is more for debugging purposes.
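[Editor's note: a hedged aside - on ZFS the inode number reported by ls -i is the object number, so the stale pointer from the demo above can be inspected directly with zdb; "tank/fs" is an assumed dataset name:

    sync    # zdb reads from disk, so flush the change first
    zdb -vvv tank/fs $(ls -i dir.1/a | awk '{print $1}')
    # ...then compare the "parent" field in the znode dump against the
    # (now removed) object number of dir.2.
]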
Re: [zfs-discuss] Repairing corrupted ZFS pool
On 2012-11-19 22:38, Mark Shellenbaum wrote:
> The parent pointer is a single 64 bit quantity that can't track all the possible parents a hard linked file could have.

I believe it is the inode number of the parent, or something similar to that - and an available inode number can get recycled and used by newer objects?

> Now when the original dir.2 object number is recycled you could have a situation where the parent pointer for "a" points to a non-directory. The ZPL never uses the parent pointer internally. It is only used by zfs diff and other utility code to translate object numbers to full pathnames. The ZPL has always set the parent pointer, but it is more for debugging purposes.

Thanks, very interesting! Now that this value is used and somewhat exposed to users, isn't it time to replace it with an nvlist or a different object type that would hold all such parent pointers for hardlinked files (perhaps moving from a single integer to an nvlist when there is more than one link to a file's inode)? At least it would make zfs diff more consistent and reliable, though at a cost of some complexity... inodes already track their reference counts; if we keep track of one referrer explicitly, why not track them all?

Thanks for the info,
//Jim
Re: [zfs-discuss] Repairing corrupted ZFS pool
On 2012-Nov-19 21:10:56 +0100, Jim Klimov jimkli...@cos.ru wrote:
> Well, as long as you have and know the base snapshots for the clones, you can recreate them at the same branching point on the new copy too.

Yes, it's just painful.

> Also, while you are at it, you can use different settings on the new pool, based on your achieved knowledge of your data

This pool has a rebuild in its future anyway, so I have this planned.

> - perhaps using better compression (IMHO stale old data that became mostly read-only is a good candidate for gzip-9), setting proper block sizes for files of databases and disk images, maybe setting better checksums, and if your RAM vastness and data similarity permit - perhaps employing dedup

After reading the horror stories and reading up on how dedup works, this is definitely not on the list.

> (run zdb -S on source pool to simulate dedup and see if you get any better than 3x savings - then it may become worthwhile).

Not without lots more RAM - and that would mean a whole new box.

> Perhaps, if the zfs diff does perform reasonably for you, you can feed its output as the list of objects to replicate in rsync's input and save many cycles this way.

The starting point of this saga was that zfs diff failed, so that isn't an option.

On 2012-Nov-19 21:24:19 +0100, Jim Klimov jimkli...@cos.ru wrote:
> ... which is also not fatally difficult scripting (I don't know if it is possible to fetch the older attribute values from snapshots - which were in force at that past moment of time; if somebody knows anything on this - plz write).

The best way to identify past attributes is probably to parse zpool history output, though that won't help for received attributes.

--
Peter Jeremy
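[Editor's note: a small sketch of that, with the pool name assumed - property changes made with "zfs set" are recorded in the pool history, so past values can often be grepped out:

    zpool history tank | grep 'zfs set'
    # "zpool history -l" adds the user and host per entry, and "-i"
    # includes internally-logged events, which can help date a change.
]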
Re: [zfs-discuss] Repairing corrupted ZFS pool
On 2012-Nov-19 14:38:30 -0700, Mark Shellenbaum mark.shellenb...@oracle.com wrote:
>> There is probably nothing wrong with the snapshots. This is a bug in ZFS diff. The ZPL parent pointer is only guaranteed to be correct for directory objects. What you probably have is a file that was hard linked multiple times and the parent pointer (i.e. directory) was recycled and is now a file

Ah, thank you for that. I knew about the parent pointer; I wasn't aware that ZFS didn't manage it correctly.

> The parent pointer for hard linked files is always set to the last link to be created.
> $ mkdir dir.1
> $ mkdir dir.2
> $ touch dir.1/a
> $ ln dir.1/a dir.2/a.linked
> $ rm -rf dir.2
> Now the parent pointer for "a" will reference a removed directory.

I've done some experimenting and confirmed this behaviour. I gather zdb bypasses the ARC, because the change of parent pointer after the ln(1) only becomes visible after a sync.

> The ZPL never uses the parent pointer internally. It is only used by zfs diff and other utility code to translate object numbers to full pathnames. The ZPL has always set the parent pointer, but it is more for debugging purposes.

I didn't realise that. I agree that the above scenario can't be tracked with a single parent pointer, but I had assumed that ZFS reset the parent to "unknown" rather than leaving it as a pointer to a random, no-longer-valid object. This probably needs to be documented as a caveat on zfs diff - especially since it can cause hangs and panics with older kernel code.

--
Peter Jeremy