Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Ray Arachelian
On 11/16/2012 07:15 PM, Peter Jeremy wrote:
 I have been tracking down a problem with zfs diff that reveals
 itself variously as a hang (unkillable process), panic or error,
 depending on the ZFS kernel version but seems to be caused by
 corruption within the pool. I am using FreeBSD but the issue looks to
 be generic ZFS, rather than FreeBSD-specific.

 The hang and panic are related to the rw_enter() in
 opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk()

 The error is:
 Unable to determine path or stats for object 2128453 in
tank/beckett/home@20120518: Invalid argument


Is the pool importing properly at least?  Maybe you can create another
volume and transfer the data over to that volume, then destroy the damaged one?

There are special things you can do with import where you can roll back
to a certain txg on the import if you know the damage is recent.
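
For example, a minimal sketch of that kind of rollback import, assuming the
pool is named tank and is not the root pool, so it can be exported first:

  zpool export tank
  # dry run: report whether discarding the last few txgs would make it importable
  zpool import -nF tank
  # actual rewind import, discarding the most recent transactions
  zpool import -F tank

Some builds also accept -X and -T <txg> for a deeper or explicit rewind, but
those are undocumented and riskier, so treat the above as a starting point only.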

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] suggestions for e-SATA HBA card on x86/x64

2012-11-19 Thread Jerry Kemp
Hello Gregg,

I acquired one of these

Intel RAID Controller Card SATA/SAS PCI-E x8 8 internal ports (SASUC8I)

from your newegg link below, and then acquired the necessary cables to
get everything hooked up.  After multiple executions of devfsadm and
reconfigure boots, the OS only sees one of my 4 drives.  The drives are 2 TB
Seagate drives.

Did you need to do anything special to get your card to work correctly?
 Did you need to do a firmware upgrade or anything?

I am running an up-to-date version of OpenIndiana b151a7.

Thank you,

Jerry




On 10/26/12 10:02 AM, Gregg Wonderly wrote:
 I've been using this card
 
 http://www.newegg.com/Product/Product.aspx?Item=N82E16816117157
 
 for my Solaris/Open Indiana installations because it has 8 ports.  One of the 
 issues that this card seems to have, is that certain failures can cause other 
 secondary problems in other drives on the same SAS connector.  I use mirrors 
 for my storage machines with 4 pairs, and just put half the mirror on one 
 side and the other drive on the other side.  This, in general, has solved my 
 problems.  When a drive fails, I might see more than one drive not 
 functioning.  I can remove (I use hot swap bays such as 
 http://www.newegg.com/Product/Product.aspx?Item=N82E16817994097) a drive, and 
 restore the other to the pool to find which of the failed drives is actually 
 the problem.  What had happened before, was that my case was not moving 
 enough air, and the hot drives had caused odd problems with failure.
 
 For the money, and the experience I have with these controllers, I'd still 
 use them; they are 3Gb/s controllers.  If you want 6Gb/s controllers, then some 
 of the other suggestions might be a better choice for you.
 
 Gregg
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Peter Jeremy
On 2012-Nov-19 11:02:06 -0500, Ray Arachelian r...@arachelian.com wrote:
Is the pool importing properly at least?  Maybe you can create another
volume and transfer the data over for that volume, then destroy it?

The pool is imported and passes all tests except zfs diff.  Creating
another pool _is_ an option but I'm not sure how to transfer the data
across - using zfs send | zfs recv replicates the corruption and
tar -c | tar -x loses all the snapshots.

There are special things you can do with import where you can roll back
to a certain txg on the import if you know the damage is recent.

The damage exists in the oldest snapshot for that filesystem.

-- 
Peter Jeremy


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Freddie Cash
On Mon, Nov 19, 2012 at 9:03 AM, Peter Jeremy pe...@rulingia.com wrote:
 On 2012-Nov-19 11:02:06 -0500, Ray Arachelian r...@arachelian.com wrote:
Is the pool importing properly at least?  Maybe you can create another
volume and transfer the data over for that volume, then destroy it?

 The pool is imported and passes all tests except zfs diff.  Creating
 another pool _is_ an option but I'm not sure how to transfer the data
 across - using zfs send | zfs recv replicates the corruption and
 tar -c | tar -x loses all the snapshots.

Create new pool.
Create new filesystem.
rsync data from /path/to/filesystem/.zfs/snapshot/snapname/ to new filesystem
Snapshot new filesystem.
rsync data from /path/to/filesystem/.zfs/snapshot/snapname+1/ to new filesystem
Snapshot new filesystem

See if zfs diff works.

If it does, repeat the rsync/snapshot steps for the rest of the snapshots.
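
As a rough sketch (pool, filesystem and device names below are placeholders,
and it assumes default mountpoints and snapshots sorted by creation time):

  zpool create newtank da3
  zfs create newtank/home

  for snap in $(zfs list -H -t snapshot -o name -s creation -r tank/beckett/home \
                | sed 's/.*@//'); do
      rsync -aH /tank/beckett/home/.zfs/snapshot/$snap/ /newtank/home/
      zfs snapshot newtank/home@$snap
  done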

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Ray Arachelian
On 11/19/2012 12:03 PM, Peter Jeremy wrote:
 On 2012-Nov-19 11:02:06 -0500, Ray Arachelian r...@arachelian.com wrote:


 The damage exists in the oldest snapshot for that filesystem.


Are you able to delete that snapshot?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Peter Jeremy
On 2012-Nov-19 13:47:01 -0500, Ray Arachelian r...@arachelian.com wrote:
On 11/19/2012 12:03 PM, Peter Jeremy wrote:
 The damage exists in the oldest snapshot for that filesystem.
Are you able to delete that snapshot?

Yes but it has no effect - the corrupt object exists in the current
pool so deleting an old snapshot has no effect.

What I was hoping was that someone would have a suggestion on removing
the corruption in-place - using zdb, zhack or similar.

-- 
Peter Jeremy


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Mark Shellenbaum

On 11/16/12 17:15, Peter Jeremy wrote:

I have been tracking down a problem with zfs diff that reveals
itself variously as a hang (unkillable process), panic or error,
depending on the ZFS kernel version but seems to be caused by
corruption within the pool.  I am using FreeBSD but the issue looks to
be generic ZFS, rather than FreeBSD-specific.

The hang and panic are related to the rw_enter() in
opensolaris/uts/common/fs/zfs/zap.c:zap_get_leaf_byblk()



There is probably nothing wrong with the snapshots.  This is a bug in 
ZFS diff.  The ZPL parent pointer is only guaranteed to be correct for 
directory objects.  What you probably have is a file that was hard 
linked multiple times, and its parent pointer (i.e. the directory) was 
recycled and is now a file.




The error is:
Unable to determine path or stats for object 2128453 in 
tank/beckett/home@20120518: Invalid argument

A scrub reports no issues:
root@FB10-64:~ # zpool status
   pool: tank
  state: ONLINE
status: The pool is formatted using a legacy on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on software that does not support
        feature flags.
   scan: scrub repaired 0 in 3h24m with 0 errors on Wed Nov 14 01:58:36 2012
config:

 NAMESTATE READ WRITE CKSUM
 tankONLINE   0 0 0
   ada2  ONLINE   0 0 0

errors: No known data errors

But zdb says that object is the child of a plain file - which isn't sane:

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2128453
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, 
rootbp DVA[0]=0:266a0efa00:200  DVA[1]=0:31b07fbc00:200  [L0 DMU objset] 
fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 
cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   2128453    1    16K  1.50K  1.50K  1.50K  100.00  ZFS plain file
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    ???<object#2128453>
        uid     1000
        gid     1000
        atime   Fri Mar 23 16:34:52 2012
        mtime   Sat Oct 22 16:13:42 2011
        ctime   Sun Oct 23 21:09:02 2011
        crtime  Sat Oct 22 16:13:42 2011
        gen     2237174
        mode    100444
        size    1089
        parent  2242171
        links   1
        pflags  4080004
        xattr   0
        rdev    0x

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 2242171
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, 
rootbp DVA[0]=0:266a0efa00:200  DVA[1]=0:31b07fbc00:200  [L0 DMU objset] 
fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 
cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   2242171    3    16K   128K  25.4M  25.5M  100.00  ZFS plain file
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 203
        path    /jashank/Pictures/sch/pdm-a4-11/stereo-pair-2.png
        uid     1000
        gid     1000
        atime   Fri Mar 23 16:41:53 2012
        mtime   Mon Oct 24 21:15:56 2011
        ctime   Mon Oct 24 21:15:56 2011
        crtime  Mon Oct 24 21:15:37 2011
        gen     2286679
        mode    100644
        size    26625731
        parent  7001490
        links   1
        pflags  4080004
        xattr   0
        rdev    0x

root@FB10-64:~ # zdb -vvv tank/beckett/home@20120518 7001490
Dataset tank/beckett/home@20120518 [ZPL], ID 605, cr_txg 8379, 143G, 2026419 objects, 
rootbp DVA[0]=0:266a0efa00:200  DVA[1]=0:31b07fbc00:200  [L0 DMU objset] 
fletcher4 lzjb LE contiguous unique double size=800L/200P birth=8375L/8375P fill=2026419 
cksum=1acdb1fbd9:93bf9c61e94:1b35c72eb8adb:389743898e4f79

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
   7001490    1    16K    512     1K    512  100.00  ZFS directory
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    /jashank/Pictures/sch/pdm-a4-11
        uid     1000
        gid     1000
        atime   Thu May 17 03:38:32 2012
        mtime   Mon Oct 24 21:15:37 2011
        ctime   Mon Oct 24 21:15:37 2011
        crtime  Fri Oct 14 22:17:44 2011
        gen     2088407
        mode    40755
        size    6
        parent  6370559
        links   2
        pflags  4080144
        xattr   0
        rdev    0x
        microzap: 512 bytes, 4 entries

 

Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Jim Klimov

On 2012-11-19 20:28, Peter Jeremy wrote:

Yep - that's the fallback solution.  With 1874 snapshots spread over 54
filesystems (including a couple of clones), that's a major undertaking.
(And it loses timestamp information).


Well, as long as you have and know the base snapshots for the clones,
you can recreate them at the same branching point on the new copy too.

Remember to use something like 'rsync -cavPHK --delete-after --inplace
src/ dst/' to do the copy, so that files removed from the source
snapshot are also removed on the target, changes are detected via
file checksum verification (not only size and timestamp), and changes
take place within the target's copy of the file (rather than rsync's
default copy-and-rewrite), so that the retained snapshot history remains
sensible and space-saving.
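
Concretely, per snapshot, something like this (the target pool/filesystem
and the mountpoints are placeholders):

  rsync -cavPHK --delete-after --inplace \
      /tank/beckett/home/.zfs/snapshot/20120518/ /newpool/home/
  zfs snapshot newpool/home@20120518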

Also, while you are at it, you can use different settings on the new
pool, based on your achieved knowledge of your data - perhaps using
better compression (IMHO stale old data that became mostly read-only
is a good candidate for gzip-9), setting proper block sizes for files
of databases and disk images, maybe setting better checksums, and if
your RAM vastness and data similarity permit - perhaps employing dedup
(run zdb -S on source pool to simulate dedup and see if you get any
better than 3x savings - then it may become worthwhile).
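
The simulation is read-only, e.g.:

  zdb -S tank

It can take a long time and a fair amount of RAM on a large pool; the summary
printed at the end includes an estimated dedup ratio to compare against that
~3x figure.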

But, yes, this will take quite a while to effectively walk your pool
several thousand times, if you do the plain rsync from each snapdir.
Perhaps, if the zfs diff does perform reasonably for you, you can
feed its output as the list of objects to replicate in rsync's input
and save many cycles this way.
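
A rough sketch of that idea (dataset, snapshot and target names are
placeholders; it ignores renames and whitespace in file names, and removed
files would still need separate handling):

  zfs diff tank/fs@snap1 tank/fs@snap2 | awk '{print $2}' \
      | sed 's|^/tank/fs/||' \
      | rsync -aH --files-from=- /tank/fs/.zfs/snapshot/snap2/ /newpool/fs/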

Good luck,
//Jim Klimov
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Jim Klimov

On 2012-11-19 20:58, Mark Shellenbaum wrote:

There is probably nothing wrong with the snapshots.  This is a bug in
ZFS diff.  The ZPL parent pointer is only guaranteed to be correct for
directory objects.  What you probably have is a file that was hard
linked multiple times and the parent pointer (i.e. directory) was
recycled and is now a file


Interesting... do the ZPL files in ZFS keep pointers to parents?

How in the COW transactiveness could the parent directory be
removed, and not the pointer to it from the files inside it?
Is this possible in current ZFS, or could this be a leftover
in the pool from its history with older releases?

Thanks,
//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Jim Klimov

Oh, and one more thing: rsync is only good if your filesystems don't
really rely on ZFS/NFSv4-style ACLs. If you need those, you are stuck
with Solaris tar or Solaris cpio to carry the files over, or you have
to script up replication of ACLs after rsync somehow.

You should also replicate the local zfs attributes of your datasets,
zfs allow permissions, ACLs on .zfs/shares/* (if any, for CIFS) -
at least of their currently relevant live copies, which is also not a
fatally difficult scripting (I don't know if it is possible to fetch
the older attribute values from snapshots - which were in force at
that past moment of time; if somebody knows anything on this - plz
write).

On another note, to speed up the rsyncs, you can try to save on the
encryption (if you do this within a trusted LAN) - use rsh, or ssh
with arcfour or none enc. algos, or perhaps rsync over NFS as
if you are in the local filesystem.
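
For instance (host, paths and snapshot name are placeholders; arcfour has to
be allowed by the sshd on the receiving side):

  rsync -aH -e 'ssh -c arcfour' /tank/fs/.zfs/snapshot/snap1/ newhost:/newpool/fs/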

HTH,
//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Tomas Forsman
On 19 November, 2012 - Jim Klimov sent me these 1,1K bytes:

 Oh, and one more thing: rsync is only good if your filesystems don't
 really rely on ZFS/NFSv4-style ACLs. If you need those, you are stuck
 with Solaris tar or Solaris cpio to carry the files over, or you have
 to script up replication of ACLs after rsync somehow.

Ugly hack that seems to do the trick for us is to first rsync, then:

#!/usr/local/bin/perl -w

for my $oldfile (@ARGV) {
    my $newfile = $oldfile;
    $newfile =~ s{/export}{/newdir/export};

    next if -l $oldfile;                    # skip symlinks

    open(F, "-|", "/bin/ls", "-ladV", "--", $oldfile);
    my @a = <F>;
    close(F);
    my $crap = shift @a;                    # filename line
    chomp(@a);
    for (@a) {
        $_ =~ s/ //g;                       # strip the indentation ls -V adds
    }
    my $acl = join(",", @a);
    system("/bin/chmod", "A=" . $acl, $newfile);
}


/bin/find /export -acl -print0 | xargs -0 /blah/aclcopy.pl


/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of UmeƄ
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Mark Shellenbaum

On 11/19/12 1:14 PM, Jim Klimov wrote:

On 2012-11-19 20:58, Mark Shellenbaum wrote:

There is probably nothing wrong with the snapshots.  This is a bug in
ZFS diff.  The ZPL parent pointer is only guaranteed to be correct for
directory objects.  What you probably have is a file that was hard
linked multiple times and the parent pointer (i.e. directory) was
recycled and is now a file


Interesting... do the ZPL files in ZFS keep pointers to parents?



The parent pointer for hard linked files is always set to the last link 
to be created.


$ mkdir dir.1
$ mkdir dir.2
$ touch dir.1/a
$ ln dir.1/a dir.2/a.linked
$ rm -rf dir.2

Now the parent pointer for 'a' will reference a removed directory.

The parent pointer is a single 64 bit quantity that can't track all the 
possible parents a hard linked file could have.


Now when the original dir.2 object number is recycled, you could have a 
situation where the parent pointer for 'a' points to a non-directory.
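
A quick, if imperfect, way to see which files might be affected is to list
files that currently have more than one link (it won't catch files whose
extra links were already removed):

  find /tank/beckett/home -xdev -type f -links +1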


The ZPL never uses the parent pointer internally.  It is only used by 
zfs diff and other utility code to translate object numbers to full 
pathnames.  The ZPL has always set the parent pointer, but it is more 
for debugging purposes.



How in the COW transactiveness could the parent directory be
removed, and not the pointer to it from the files inside it?
Is this possible in current ZFS, or could this be a leftover
in the pool from its history with older releases?

Thanks,
//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Jim Klimov

On 2012-11-19 22:38, Mark Shellenbaum wrote:

The parent pointer is a single 64 bit quantity that can't track all the
possible parents a hard linked file could have.


I believe it is the inode number of the parent, or something similar - and
an available inode number can get recycled and used by newer objects?


Now when the original dir.2 object number is recycled you could have a
situation where the parent pointer for points to a non-directory.

The ZPL never uses the parent pointer internally.  It is only used by
zfs diff and other utility code to translate object numbers to full
pathnames.  The ZPL has always set the parent pointer, but it is more
for debugging purposes.


Thanks, very interesting!

Now that this value is used and somewhat exposed to users, isn't it
time to replace it with an nvlist or a different object type that
would hold all such parent pointers for hard-linked files (perhaps
only moving from a single integer to an nvlist when there is more than
one link to a file inode)? At least, it would make zfs diff more
consistent and reliable, though at the cost of some complexity...
Inodes already track their reference counts; if we keep track of
one referrer explicitly, why not track them all?

Thanks for info,
//Jim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Peter Jeremy
On 2012-Nov-19 21:10:56 +0100, Jim Klimov jimkli...@cos.ru wrote:
On 2012-11-19 20:28, Peter Jeremy wrote:
 Yep - that's the fallback solution.  With 1874 snapshots spread over 54
 filesystems (including a couple of clones), that's a major undertaking.
 (And it loses timestamp information).

Well, as long as you have and know the base snapshots for the clones,
you can recreate them at the same branching point on the new copy too.

Yes, it's just painful.

Also, while you are at it, you can use different settings on the new
pool, based on your achieved knowledge of your data

This pool has a rebuild in its future anyway so I have this planned.
 - perhaps using
better compression (IMHO stale old data that became mostly read-only
is a good candidate for gzip-9), setting proper block sizes for files
of databases and disk images, maybe setting better checksums, and if
your RAM vastness and data similarity permit - perhaps employing dedup

After reading the horror stories and reading up on how dedupe works,
this is definitely not on the list.

(run zdb -S on source pool to simulate dedup and see if you get any
better than 3x savings - then it may become worthwhile).

Not without lots more RAM - and that would mean a whole new box.

Perhaps, if the zfs diff does perform reasonably for you, you can
feed its output as the list of objects to replicate in rsync's input
and save many cycles this way.

The starting point of this saga was that zfs diff failed, so that
isn't an option.

On 2012-Nov-19 21:24:19 +0100, Jim Klimov jimkli...@cos.ru wrote:
fatally difficult scripting (I don't know if it is possible to fetch
the older attribute values from snapshots - which were in force at
that past moment of time; if somebody knows anything on this - plz
write).

The best way to identify past attributes is probably to parse
zpool history output, though that won't help for received attributes.
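
For the live values, something along these lines captures what needs
re-applying on the new pool (output path is just an example):

  # locally-set dataset properties; received ones won't show up under -s local
  zfs get -rHp -s local -o name,property,value all tank > /tmp/tank-local-props.txt
  # pool command history, to see when/how properties were set
  zpool history -il tank | grep 'zfs set'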

-- 
Peter Jeremy


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Peter Jeremy
On 2012-Nov-19 14:38:30 -0700, Mark Shellenbaum mark.shellenb...@oracle.com 
wrote:
On 11/19/12 1:14 PM, Jim Klimov wrote:
 On 2012-11-19 20:58, Mark Shellenbaum wrote:
 There is probably nothing wrong with the snapshots.  This is a bug in
 ZFS diff.  The ZPL parent pointer is only guaranteed to be correct for
 directory objects.  What you probably have is a file that was hard
 linked multiple times and the parent pointer (i.e. directory) was
 recycled and is now a file

Ah.  Thank you for that.  I knew about the parent pointer; I wasn't
aware that ZFS didn't manage it correctly.

The parent pointer for hard linked files is always set to the last link 
to be created.

$ mkdir dir.1
$ mkdir dir.2
$ touch dir.1/a
$ ln dir.1/a dir.2/a.linked
$ rm -rf dir.2

Now the parent pointer for 'a' will reference a removed directory.

I've done some experimenting and confirmed this behaviour.  I gather
zdb bypasses ARC, because the change of parent pointer after the ln(1)
only becomes visible after a sync.
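
A sketch of the reproduction, run inside the mountpoint of a test filesystem
(tank/fs here is a placeholder; on ZFS the inode number reported by ls -i is
the object number zdb wants):

$ mkdir dir.1 dir.2
$ touch dir.1/a
$ ln dir.1/a dir.2/a.linked
$ rm -rf dir.2
$ sync
$ zdb -vvv tank/fs $(ls -i dir.1/a | awk '{print $1}') | grep parent

The parent reported is still the object number of the removed dir.2.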

The ZPL never uses the parent pointer internally.  It is only used by 
zfs diff and other utility code to translate object numbers to full 
pathnames.  The ZPL has always set the parent pointer, but it is more 
for debugging purposes.

I didn't realise that.  I agree that the above scenario can't be
tracked with a single parent pointer but I assumed that ZFS reset the
parent to unknown rather than leaving it as a pointer to a random
no-longer-valid object.

This probably needs to be documented as a caveat on zfs diff -
especially since it can cause hangs and panics with older kernel code.

-- 
Peter Jeremy


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss