[zfs-discuss] ZFS Dataset lost structure

2010-09-17 Thread Valerio Piancastelli
After a crash, some datasets in my zpool tree report this when I do an ls -la:

brwxrwxrwx  2  777 root 0, 0 Oct 18  2009 mail-cts

The same thing happens even if I set

zfs set mountpoint=legacy dataset

and then mount the dataset at another location.
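
Roughly this sequence (the dataset name and mountpoint below are placeholders; /mnt/recover must already exist):

zfs set mountpoint=legacy sas/mail-cts
mount -F zfs sas/mail-cts /mnt/recover
ls -la /mnt/recover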

Before the crash, the directory tree was simply:

dataset
- vdisk.raw

The file was the backing device of a Xen VM, but I cannot access the directory
structure of this dataset. I can send a snapshot of this dataset to another
system, but the same behavior occurs there.
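
The send/receive I tried looks roughly like this (snapshot, host, and target names below are placeholders, not the real ones):

zfs snapshot sas/mail-cts@rescue
zfs send sas/mail-cts@rescue | ssh otherhost zfs receive tank/mail-cts-copy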

If I do
zdb - dataset
at the end of the output I can see the references to my file:

Object  lvl   iblk   dblk  dsize  lsize   %full  type
     7    5    16K   128K   149G   256G   58.26  ZFS plain file
                                    264   bonus  ZFS znode
    dnode flags: USED_BYTES USERUSED_ACCOUNTED
    dnode maxblkid: 2097152
    path    /vdisk.raw
    uid     777
    gid     60001
    atime   Sun Oct 18 00:49:05 2009
    mtime   Thu Sep  9 16:22:14 2010
    ctime   Thu Sep  9 16:22:14 2010
    crtime  Sun Oct 18 00:49:05 2009
    gen     53
    mode    100777
    size    274877906945
    parent  3
    links   1
    pflags  4080104
    xattr   0
    rdev    0x

If I investigate further:

zdb -d dataset 7

Dataset store/nfs/ICLOS/prod/mail-cts [ZPL], ID 4525, cr_txg 91826, 149G, 5 objects, rootbp DVA[0]=0:6654f24000:200 DVA[1]=1:1a1e3c3600:200 [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=182119L/182119P fill=5 cksum=177e7dd4cd:81ae6d143ee:1782c972431a0:2f927ca7a1de2c

Object  lvl   iblk   dblk  dsize  lsize   %full  type
     7    5    16K   128K   149G   256G   58.26  ZFS plain file
                                    264   bonus  ZFS znode
    dnode flags: USED_BYTES USERUSED_ACCOUNTED
    dnode maxblkid: 2097152
    path    /vdisk.raw
    uid     777
    gid     60001
    atime   Sun Oct 18 00:49:05 2009
    mtime   Thu Sep  9 16:22:14 2010
    ctime   Thu Sep  9 16:22:14 2010
    crtime  Sun Oct 18 00:49:05 2009
    gen     53
    mode    100777
    size    274877906945
    parent  3
    links   1
    pflags  4080104
    xattr   0
    rdev    0x
Indirect blocks:
   0 L4 1:6543e22800:400 4000L/400P F=1221767 B=177453/177453
   0  L3 1:65022f8a00:2000 4000L/2000P F=1221766 B=177453/177453
   0   L2 1:65325a0400:1c00 4000L/1c00P F=16229 B=177453/177453
   0    L1 1:6530718400:1600 4000L/1600P F=128 B=177453/177453
   0 L0 0:433c473a00:2 2L/2P F=1 B=177453/177453
   2 L0 1:205c471600:2 2L/2P F=1 B=91830/91830
   4 L0 0:3c418ac600:2 2L/2P F=1 B=91830/91830
   6 L0 0:3c418cc600:2 2L/2P F=1 B=91830/91830
   8 L0 0:3c418ec600:2 2L/2P F=1 B=91830/91830
   a L0 0:3c4190c600:2 2L/2P F=1 B=91830/91830
   c L0 0:3c4192c600:2 2L/2P F=1 B=91830/91830
   e L0 0:3c4194c600:2 2L/2P F=1 B=91830/91830
  10 L0 0:3c4198c600:2 2L/2P F=1 B=91830/91830
  12 L0 0:3c4196c600:2 2L/2P F=1 B=91830/91830
  14 L0 1:205c491600:2 2L/2P F=1 B=91830/91830
  16 L0 1:205c4b1600:2 2L/2P F=1 B=91830/91830
  18 L0 1:205c4d1600:2 2L/2P F=1 B=91830/91830
  1a L0 1:205c4f1600:2 2L/2P F=1 B=91830/91830
  1c L0 1:205c511600:2 2L/2P F=1 B=91830/91830
  1e L0 1:205c531600:2 2L/2P F=1 B=91830/91830
  20 L0 1:205c551600:2 2L/2P F=1 B=91830/91830
  22 L0 1:205c571600:2 2L/2P F=1 B=91830/91830
  24 L0 0:3c419ac600:2 2L/2P F=1 B=91830/91830
  26 L0 0:3c419cc600:2 2L/2P F=1 B=91830/91830
  28 L0 0:3c419ec600:2 2L/2P F=1 B=91830/91830
  2a L0 0:3c41a0c600:2 2L/2P F=1 B=91830/91830
 
 .. many more lines, up to 149G

It seems all data blocks are there.
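
One thing I can also check (a rough sketch; "dataset" is the same placeholder as above, and 3 is the parent object number reported for vdisk.raw):

# dump the parent directory object with extra verbosity and see whether
# it still shows up as a ZFS directory with a vdisk.raw entry
zdb -ddddd dataset 3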

Any ideas on how to recover from this situation?


Valerio Piancastelli
piancaste...@iclos.com




Re: [zfs-discuss] resilver = defrag?

2010-09-17 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of David Dyer-Bennet

  For example, if you start with an empty drive, and you write a large
  amount
  of data to it, you will have no fragmentation.  (At least, no
 significant
  fragmentation; you may get a little bit based on random factors.)  As
 life
  goes on, as long as you keep plenty of empty space on the drive,
 there's
  never any reason for anything to become significantly fragmented.
 
 Sure, if only a single thread is ever writing to the disk store at a
 time.

This has already been discussed in this thread.

The threading model doesn't affect whether files end up fragmented or
unfragmented on disk.  The OS is smart enough to know that these blocks written
by process A are all sequential, and that those blocks written by process B
are also sequential, but separate.



Re: [zfs-discuss] resilver = defrag?

2010-09-17 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Marty Scholes
  
 What appears to be missing from this discussion is any shred of
 scientific evidence that fragmentation is good or bad and by how much.
 We also lack any detail on how much fragmentation does take place.

Agreed.  I've been rather lazily asserting a few things here and there that I
expected to be challenged, so I've been thinking up tests to verify or refute
my claims, but then nobody challenged them.  Specifically: the blocks on disk
are not interleaved just because multiple threads were writing at the same
time.

So there's at least one thing which is testable, if anyone cares.
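
A rough way to test it, if anyone wants to (the pool/filesystem, file names, and sizes below are arbitrary):

# write two large files from two concurrent processes
dd if=/dev/urandom of=/tank/fs/a.bin bs=128k count=8192 &
dd if=/dev/urandom of=/tank/fs/b.bin bs=128k count=8192 &
wait
sync
# dump each file's block pointers (object number from ls -i) and compare the
# DVA offsets; heavy interleaving would show up as alternating offset ranges
zdb -ddddd tank/fs $(ls -i /tank/fs/a.bin | awk '{print $1}')
zdb -ddddd tank/fs $(ls -i /tank/fs/b.bin | awk '{print $1}')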

But there's also no way that I know of to measure fragmentation in a real
system that's been in production for a year.



Re: [zfs-discuss] Best practice for Sol10U9 ZIL -- mirrored or not?

2010-09-17 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Bryan Horstmann-Allen

 The ability to remove the slogs isn't really the win here, it's import
 -F. The

Disagree.

Although I agree the -F is important and good, I think log device removal is
the main win.  Prior to log device removal, if you lose your slog, you lose
your whole pool, and your system probably halts (or does something equally
bad, which isn't strictly halting).  Therefore you want your slog to be as
redundant as the rest of your pool.

With log device removal, if you lose a slog while the system is up, worst
case is performance degradation.

With log device removal, there's only one thing you have to worry about: your
slog goes bad, undetected.  The system keeps writing to it, unaware that it
will never be able to read it back, so when you get a system crash and the
system tries to read that device for the first time, you lose information.
Not your whole pool: you lose up to 30 seconds of writes that the system
thought it wrote but never did, and you require -F to import.

Historically, people have always recommended mirroring your log device, even
with log device removal, to protect against the above situation.  But in a
recent conversation including Neil, it appeared there might be a bug which
causes the log device mirror to be ignored during import, thus rendering the
mirror useless in the above situation.

Neil, or anyone, is there any confirmation or development on that bug?

Given all of this, I would say the recommendation for now is to forget about
mirroring log devices.  In the past, the recommendation was "yes, mirror."
Right now it's "no, don't mirror," and after the bug is fixed, the
recommendation will again become "yes, mirror."
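
For reference, the operations involved are roughly these (pool and device names are placeholders; log device removal requires a pool version that supports it):

zpool add tank log c1t5d0          # add a single, unmirrored slog
zpool attach tank c1t5d0 c1t6d0    # attach a second device to mirror the slog
zpool remove tank c1t5d0           # log device removal
zpool import -F tank               # recovery import after losing the slog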



[zfs-discuss] resilver that never finishes

2010-09-17 Thread Tom Bird

Morning,

c7t5000CCA221F4EC54d0 is a 2T disk, how can it resilver 5.63T of it?

This is actually an old capture of the status output; it got to nearly 10T
before deciding that there was an error and not completing.  I reseated the
disk and it's doing it all again.


It's happened on another pool as well; I'm looking at a load average of around
40 on the box currently, just sitting there churning disk I/O.


OS is snv_134 on x86.

# zpool status -x
  pool: content4
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: resilver in progress for 147h39m, 100.00% done, 0h0m to go
config:

        NAME                         STATE     READ WRITE CKSUM
        content4                     DEGRADED     0     0     0
          raidz2-0                   DEGRADED     0     0     0
            c7t5000CCA221DE1E1Dd0    ONLINE       0     0     0
            c7t5000CCA221DE17BFd0    ONLINE       0     0     0
            c7t5000CCA221DE2229d0    ONLINE       0     0     0
            replacing-3              DEGRADED     0     0     0
              c7t5000CCA221DE0CC7d0  UNAVAIL      0     0     0  cannot open
              c7t5000CCA221F4EC54d0  ONLINE       0     0     0  5.63T resilvered
            c7t5000CCA221DE200Ad0    ONLINE       0     0     0
            c7t5000CCA221DDFE6Ed0    ONLINE       0     0     0
            c7t5000CCA221DE0103d0    ONLINE       0     0     0
            c7t5000CCA221DE00C9d0    ONLINE       0     0     0
            c7t5000CCA221DE0D2Dd0    ONLINE       0     0     0
            c7t5000CCA221DE189Cd0    ONLINE       0     0     0
            c7t5000CCA221DE18A7d0    ONLINE       0     0     0
            c7t5000CCA221DE2A47d0    ONLINE       0     0     0
            c7t5000CCA221DE1E48d0    ONLINE       0     0     0
            c7t5000CCA221DE18A1d0    ONLINE       0     0     0
            c7t5000CCA221DE18A2d0    ONLINE       0     0     0
            c7t5000CCA221DE2A3Ed0    ONLINE       0     0     0
            c7t5000CCA221DE2A42d0    ONLINE       0     0     0
            c7t5000CCA221DE2225d0    UNAVAIL      0     0     0  cannot open
            c7t5000CCA221DE28A3d0    ONLINE       0     0     0
            c7t5000CCA221DE2A46d0    ONLINE       0     0     0
            c7t5000CCA221DE0789d0    ONLINE       0     0     0
            c7t5000CCA221DE221Dd0    ONLINE       0     0     0
            c7t5000CCA221DE054Fd0    ONLINE       0     0     0
            c7t5000CCA221DE2EBEd0    ONLINE       0     0     0

errors: No known data errors

--
Tom

// www.portfast.co.uk
// hosted services, domains, virtual machines, consultancy


Re: [zfs-discuss] ZFS Dataset lost structure

2010-09-17 Thread Victor Latushkin

What OpenSolaris build are you running?

victor

On 17.09.10 13:53, Valerio Piancastelli wrote:

[...]



--
--
Victor Latushkin   phone: x11467 / +74959370467
TSC-Kernel EMEA        mobile: +78957693012
Sun Services, Moscow   blog: http://blogs.sun.com/vlatushkin
Sun Microsystems

Re: [zfs-discuss] ZFS Dataset lost structure

2010-09-17 Thread Valerio Piancastelli
With uname -a:

SunOS disk-01 5.11 snv_111b i86pc i386 i86pc Solaris

It is OpenSolaris 2009.06.


Other useful info:

zfs list sas/mail-cts

NAME   USED  AVAIL  REFER  MOUNTPOINT
sas/mail-cts   149G   250G   149G  /sas/mail-cts



And with df:

Filesystem   1K-blocks  Used Available Use% Mounted on
sas/mail-cts 418174037 156501827 261672210  38% /sas/mail-cts

Do you need any other info?
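
I can also post output like the following if it helps (a rough sketch; adjust the pool and dataset names as needed):

zpool status -v sas
zfs get all sas/mail-cts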


Valerio Piancastelli
piancaste...@iclos.com

- Original Message -
From: Victor Latushkin victor.latush...@sun.com
To: Valerio Piancastelli piancaste...@iclos.com
Cc: zfs-discuss@opensolaris.org
Sent: Friday, 17 September 2010 16:46:31
Subject: Re: [zfs-discuss] ZFS Dataset lost structure

What OpenSolaris build are you running?

victor

On 17.09.10 13:53, Valerio Piancastelli wrote:
[...]

Re: [zfs-discuss] resilver that never finishes

2010-09-17 Thread Bob Friesenhahn

On Fri, 17 Sep 2010, Tom Bird wrote:


Morning,

c7t5000CCA221F4EC54d0 is a 2T disk, how can it resilver 5.63T of it?

This is actually an old capture of the status output, it got to nearly 10T 
before deciding that there was an error and not completing, reseat disk and 
it's doing it all again.


You have twice as many big, slow drives in one raidz2 as any sane person would
recommend.  It looks like you either have drives which are too weak to sustain
resilvering a failed disk, or a chassis which is not strong enough.


Your only option seems to be to also replace c7t5000CCA221DE2225d0 and 
hope for the best.  Expect the replacement to take a very long time.
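
Something roughly like this, assuming you have a spare disk handy (the new device name is a placeholder):

zpool replace content4 c7t5000CCA221DE2225d0 c7tNEWDISKd0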


It would be wise to rebuild the pool from scratch with multiple vdevs, each
made up of fewer devices.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


[zfs-discuss] can ufs zones and zfs zones coexist on a single global zone

2010-09-17 Thread Gary Dunn
Looking at migrating zones built on an M8000 and an M5000 to a new M9000.  On
the M9000 we started building new deployments using ZFS.  The environments on
the M8/M5 are UFS; these are whole-root zones, and they will use global zone
resources.
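
For concreteness, a rough sketch of the detach/attach style migration I am considering (zone name and paths are placeholders; I'm not certain attach -u is required here):

# on the M8000/M5000
zoneadm -z appzone halt
zoneadm -z appzone detach
# copy the zonepath to the M9000 (tar, cpio, etc.), then on the M9000:
zonecfg -z appzone create -a /zones/appzone
zoneadm -z appzone attach -u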

Can this be done? Or would a ZFS migration be needed? 

thank you,


Re: [zfs-discuss] resilver = defrag?

2010-09-17 Thread Richard Elling
On Sep 16, 2010, at 12:33 PM, Marty Scholes wrote:

 David Dyer-Bennet wote:
 Sure, if only a single thread is ever writing to the
 disk store at a time.
 
 This situation doesn't exist with any kind of
 enterprise disk appliance,
 though; there are always multiple users doing stuff.
 
 Ok, I'll bite.
 
 Your assertion seems to be that any kind of enterprise disk appliance will 
 always have enough simultaneous I/O requests queued that any sequential read 
 from any application will be sufficiently broken up by requests from other 
 applications, effectively rendering all read requests as random.  If I follow 
 your logic, since all requests are essentially random anyway, then where they 
 fall on the disk is irrelevant.

Allan and Neel did a study of this for MySQL.
http://www.youtube.com/watch?v=a31NhwzlAxs
 -- richard

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com

Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com







Re: [zfs-discuss] resilver that never finishes

2010-09-17 Thread Ian Collins

On 09/18/10 04:28 AM, Tom Bird wrote:

Bob Friesenhahn wrote:

On Fri, 17 Sep 2010, Tom Bird wrote:


Morning,

c7t5000CCA221F4EC54d0 is a 2T disk, how can it resilver 5.63T of it?

This is actually an old capture of the status output, it got to 
nearly 10T before deciding that there was an error and not 
completing, reseat disk and it's doing it all again.


You have twice as many big slow drives in a raidz2 that any sane 
person would recommend.  It looks like you either have drives which 
are too weak to sustain resilvering a failed disk, or a chassis which 
is not strong enough.


The drives and the chassis are fine; what I am questioning is how it can be
resilvering more data to a device than the capacity of the device.


Is the pool in use?  If so, data will be changing while the resilver is
running.  With such a ridiculously wide vdev and large drives, the resilver
will take a very, very long time to complete.  If the pool is sufficiently
busy, it may never complete.


Your only option seems to be to also replace c7t5000CCA221DE2225d0 
and hope for the best.  Expect the replacement to take a very long time.


It is wise to restart the pool from scratch with multiple vdevs 
comprised of fewer devices.


This stuff should just work; if it only rewrote the 2T that was meant to be on
the drive, the rebuild would take a day or so.


Bob's comments about the pool design are correct; you have a disaster waiting
to happen.


--
Ian.



Re: [zfs-discuss] resilver that never finishes

2010-09-17 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Tom Bird
 

We recently had a long discussion on this list about resilver times versus
raid types.  In the end, the conclusion was: resilver code is very inefficient
for raidzN.  Someday it may be better optimized, but until that day comes, you
really need to break your giant raidzN into smaller vdevs.

Three vdevs of 7-disk raidz are preferable to one 21-disk raidz3.
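
For example, roughly (pool and device names are placeholders):

zpool create newpool \
    raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 \
    raidz c0t7d0 c0t8d0 c0t9d0 c0t10d0 c0t11d0 c0t12d0 c0t13d0 \
    raidz c0t14d0 c0t15d0 c0t16d0 c0t17d0 c0t18d0 c0t19d0 c0t20d0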

If you want this resilver to complete, you should do anything you can to (a)
stop taking snapshots, (b) not scrub, and (c) stop as much I/O as possible.
And be patient.

Most people in your situation find it faster to zfs send to some other
storage, and then destroy & recreate the pool.  I know it stinks, but that's
what you're facing.



Re: [zfs-discuss] Best practice for Sol10U9 ZIL -- mirrored or not?

2010-09-17 Thread David Magda

On Sep 17, 2010, at 20:32, Edward Ned Harvey wrote:


When did that become default?  Should I *ever* say 30 sec anymore?


June 8, 2010, revision  12586:b118bbd65be9:

http://src.opensolaris.org/source/history/onnv/onnv-gate/usr/src/uts/common/fs/zfs/txg.c


Re: [zfs-discuss] Best practice for Sol10U9 ZIL -- mirrored or not?

2010-09-17 Thread Neil Perrin

On 09/17/10 18:32, Edward Ned Harvey wrote:

From: Neil Perrin [mailto:neil.per...@oracle.com]


you lose information.  Not your whole pool.  You lose up to 
30 sec of writes
  

The default is  now 5 seconds (zfs_txg_timeout).



When did that become default?


It was changed more recently than I remember, in snv_143, as part of a set of
bug fixes: 6494473, 6743992, 6936821, 6956464.  They were integrated on
6/8/10.



  Should I *ever* say 30 sec anymore?
  


Well, for versions before snv_143, 30 seconds is correct.  I was just
giving a heads-up that it has changed.
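
To check what a particular machine is actually using (a rough sketch; requires root, and the /etc/system line is only an example override):

# print the current value of zfs_txg_timeout (in seconds) from the running kernel
echo zfs_txg_timeout/D | mdb -k
# to override persistently, add to /etc/system and reboot:
#   set zfs:zfs_txg_timeout = 10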


In my world, the oldest machine is 10u6 (except one machine named "dinosaur"
that is Sol 8).


  

I believe George responded on that thread that we do handle log mirrors
correctly.  That is, if one side fails to checksum a block, we do indeed check
the other side.  I should have been more cautious with my concern: I think I
said I don't know whether we handle it correctly, and George confirmed we do.
Sorry for the false alarm.



Great.  ;-)  Thank you.

So the recommendation is still to mirror log devices, because the
recommendation will naturally be ultra-conservative.  ;-)  The risk is far
smaller now than it was before, so make up your own mind.  If you are willing
to risk 5 sec or 30 sec of data in the situation of (a) an undetected failed
log device *and* (b) an ungraceful system crash, then you are willing to run
with unmirrored log devices.  In no situation does the filesystem become
inconsistent or corrupt.  In the worst case, you have a filesystem which is
consistent with a valid filesystem state from a few seconds before the system
crash.  (Assuming your zpool is recent enough to support log device removal.)

  




Re: [zfs-discuss] Best practice for Sol10U9 ZIL -- mirrored or not?

2010-09-17 Thread Ian Collins

On 09/18/10 04:46 PM, Neil Perrin wrote:

[...]

It was changed more recently than I remember, in snv_143, as part of a set of
bug fixes: 6494473, 6743992, 6936821, 6956464.  They were integrated on 6/8/10.

[...]



In the context of this thread, was the change integrated in update 9?

--
Ian.
