[zfs-discuss] Backing up ZFS metadata

2012-08-24 Thread Scott Aitken
Hi all,

I know the easiest answer to this question is don't do it in the first
place, and if you do, you should have a backup; however, I'll ask it
regardless.

Is there a way to back up the ZFS metadata on each member device of a pool
to another device (possibly non-ZFS)?

I have recently read a discussion on this list regarding storing the working
metadata on off-data devices (mirrored, I assume).  Is there a way today to
walk the metadata of an entire pool and save it somewhere else?
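
To make the question concrete, the kind of thing I have in mind for the labels
alone is sketched below (the wider pool metadata is spread through the pool, so
this only covers the label areas; the device name, backup path and size are
placeholders):

# Rough sketch only: c0t0d0s0, /backup and the size are placeholders.
# ZFS keeps four 256 KiB labels per vdev: L0/L1 in the first 512 KiB
# and L2/L3 in the last 512 KiB of the device.
DEV=/dev/rdsk/c0t0d0s0
SIZE_BYTES=2000398934016                  # fill in the real device size
dd if=$DEV of=/backup/c0t0d0s0.labels-front bs=256k count=2
dd if=$DEV of=/backup/c0t0d0s0.labels-back bs=256k count=2 \
    skip=$((SIZE_BYTES / 262144 - 2))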

The main motivation for the question is that I recently ruined a large raidz
pool by overwriting the start and end of two member disks (and possibly some
data).  I assume that if I could have restored the lost metadata I could
have recovered most of the real data.

Thanks
Scott
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recovering lost labels on raidz member

2012-08-13 Thread Scott
Hi Saso,

thanks for your reply.

If all disks are the same, is the root pointer the same?

Also, is there a signature or something unique to the root block that I can
search for on the disk?  I'm going through the On-disk specification at the
moment.

Scott

 On Mon, Aug 13, 2012 at 10:02:58AM +0200, Sašo Kiselkov wrote:
  On 08/13/2012 10:00 AM, Sašo Kiselkov wrote:
  On 08/13/2012 03:02 AM, Scott wrote:
  Hi all,
 
  I have a 5 disk raidz array in a state of disrepair.  Suffice to say three
  disks are ok, while two are missing all their labels.  (Both ends of the
  disks were overwritten).  The data is still intact.
  
  There are 4 labels on a zfs-labeled disk, two at the start and two at
  the end. Have all been overwritten?
 
 Just re-read your post again, and I realized my question here is
 redundant. Without the labels your data is toast.
 
 --
 Saso
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recovering lost labels on raidz member

2012-08-13 Thread Scott
Thanks again Saso,

at least I have closure :)

Scott

 On Mon, Aug 13, 2012 at 11:24:55AM +0200, Sašo Kiselkov wrote:
 On 08/13/2012 10:45 AM, Scott wrote:
  Hi Saso,
  
  thanks for your reply.
  
  If all disks are the same, is the root pointer the same?
 
 No.
 
  Also, is there a signature or something unique to the root block that I 
  can
  search for on the disk?  I'm going through the On-disk specification at the
  moment.
 
 Nope. The checksums are part of the blockpointer, and the root
 blockpointer is in the uberblock, which itself resides in the label. By
 overwriting the label you've essentially erased all hope of practically
 finding the root of the filesystem tree - not even checksumming all
 possible block combinations (of which there are quite a few) will help
 you here, because you have no checksums to compare them against.
 
 I'd love to be wrong, and I might be (I don't have as intimate a
 knowledge of ZFS' on-disk structure as I'd like), but from where I'm
 standing, your raidz vdev is essentially lost.
 
 --
 Saso
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recovering lost labels on raidz member

2012-08-13 Thread Scott
On Mon, Aug 13, 2012 at 10:40:45AM -0700, Richard Elling wrote:
 
  On Aug 13, 2012, at 2:24 AM, Sašo Kiselkov wrote:
 
  On 08/13/2012 10:45 AM, Scott wrote:
  Hi Saso,
  
  thanks for your reply.
  
  If all disks are the same, is the root pointer the same?
  
  No.
  
  Also, is there a signature or something unique to the root block that I 
  can
  search for on the disk?  I'm going through the On-disk specification at the
  moment.
  
  Nope. The checksums are part of the blockpointer, and the root
  blockpointer is in the uberblock, which itself resides in the label. By
  overwriting the label you've essentially erased all hope of practically
  finding the root of the filesystem tree - not even checksumming all
  possible block combinations (of which there are quite a few) will help
  you here, because you have no checksums to compare them against.
  
  I'd love to be wrong, and I might be (I don't have as intimate a
  knowledge of ZFS' on-disk structure as I'd like), but from where I'm
  standing, your raidz vdev is essentially lost.
 
 The labels are not identical, because each contains the guid for the device.
 It is possible, though nontrivial, to recreate.
 
 That said, I've never seen a failure that just takes out only the ZFS labels.

You'd have to go out of your way to take out the labels.  Which is just what
I did: imagine moving drives over to USB external enclosures, then putting
them onto an HP RAID controller, which overwrites the end of the disk and
which also assumed that two of the disks should be automatically mirrored
(if you miss the 5-second prompt where you can tell it not to).

Then I tried to recover the labels without really knowing what I was doing
(my bad).

Suffice to say I have no confidence in the labels of two drives.  On OI I can
forcefully import the pool, but for any file that lives on multiple disks (i.e.,
over a certain size) all I get is an I/O error.  Some of the datasets also fail
to mount.

Thanks everyone for your input.

  -- richard
 
 --
 ZFS Performance and Training
 richard.ell...@richardelling.com
 +1-760-896-4422
 
 
 
 
 
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Recovering lost labels on raidz member

2012-08-12 Thread Scott
Hi all,

I have a 5 disk raidz array in a state of disrepair.  Suffice to say three
disks are ok, while two are missing all their labels.  (Both ends of the
disks were overwritten).  The data is still intact.

Unfortunately I don't have a zpool.cache either.

Is there a way to reconstruct the labels using the information from the 3
valid disks?
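
For what it's worth, this is roughly how I've been inspecting the surviving
labels to see what a reconstructed label would need to contain (the device path
is a placeholder for one of the good disks):

# Dumps the nvlist from each of the four labels; it includes the pool guid,
# the raidz vdev tree and the per-disk guids.
zdb -l /dev/rdsk/c0t3d0s0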

Thanks
Scott
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Corrupted pool: I/O error and Bad exchange descriptor

2012-07-16 Thread Scott Aitken
Hi all,

this is a follow-up to some help I was soliciting with my corrupted pool.

The short story is that, for various reasons, I can have no confidence in the
quality of the labels on 2 of the 5 drives in my RAIDZ array.

There is even a possibility that one drive has the label of another (a
mirroring accident).

Anyhoo, for some odd reason, the drives finally mounted (they are actually
drive images on another ZFS pool which I have snapshotted).

When I imported the pool, ZFS complained that two of the datasets would not
mount, but the remainder did.

It seems that small files read OK.  (Perhaps small enough to fit in a single
block, and hence probably mirrored rather than striped, assuming my
understanding of what happens to small files is correct.)

But on larger files I get:

root@openindiana-01:/ZP-8T-RZ1-01/incoming# cp httpd-error.log.zip /mnt2/
cp: reading `httpd-error.log.zip': I/O error

and on some directories:

root@openindiana-01:/ZP-8T-RZ1-01/usr# ls -al
ls: cannot access obj: Bad exchange descriptor
total 54
drwxr-xr-x  5 root root  5 2011-11-03 16:28 .
drwxr-xr-x 11 root root 11 2011-11-04 13:14 ..
??  ? ?? ?? obj
drwxr-xr-x 68 root root 83 2011-10-30 01:00 ports
drwxr-xr-x 22 root root 31 2011-09-25 02:00 src

Here is the zpool status output:

root@openindiana-01:/ZP-8T-RZ1-01# zpool status
 pool: ZP-8T-RZ1-01
state: DEGRADED
status: One or more devices has experienced an error resulting in data
   corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
   entire pool from backup.
  see: http://www.sun.com/msg/ZFS-8000-8A
 scan: scrub in progress since Sat Nov  5 23:57:46 2011
   112G scanned out of 6.93T at 6.24M/s, 318h17m to go
   305M repaired, 1.57% done
config:

   NAME  STATE READ WRITE CKSUM
   ZP-8T-RZ1-01  DEGRADED 0 0  356K
 raidz1-0DEGRADED 0 0  722K
   12339070507640025002  UNAVAIL  0 0 0  was /dev/lofi/2
    /dev/lofi/5   DEGRADED 0 0 0  too many errors  (repairing)
    /dev/lofi/4   DEGRADED 0 0 0  too many errors  (repairing)
    /dev/lofi/3   DEGRADED 0 0 74.4K  too many errors  (repairing)
    /dev/lofi/1   DEGRADED 0 0 0  too many errors  (repairing)

All those errors may be caused by one disk actually owning the wrong label.
I'm not entirely sure.

Also, while it's complaining that /dev/lofi/2 is UNAVAIL, the device is
certainly there.  It's probably just not labelled with '12339070507640025002'.

I'd love to get some of my data back.  Any recovery is a bonus.

If anyone is keen, I have enabled SSH into the Open Indiana box
which I'm using to try and recover the pool, so if you'd like to take a shot
please let me know.

Thanks in advance,
Scott
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)

2012-06-16 Thread Scott Aitken
On Sat, Jun 16, 2012 at 08:54:05AM +0200, Stefan Ring wrote:
  when you say remove the device, I assume you mean simply make it unavailable
  for import (I can't remove it from the vdev).
 
 Yes, that's what I meant.
 
  root@openindiana-01:/mnt# zpool import -d /dev/lofi
    pool: ZP-8T-RZ1-01
      id: 9952605666247778346
   state: FAULTED
  status: One or more devices are missing from the system.
  action: The pool cannot be imported. Attach the missing
          devices and try again.
     see: http://www.sun.com/msg/ZFS-8000-3C
  config:

          ZP-8T-RZ1-01              FAULTED  corrupted data
            raidz1-0                DEGRADED
              12339070507640025002  UNAVAIL  cannot open
              /dev/lofi/5           ONLINE
              /dev/lofi/4           ONLINE
              /dev/lofi/3           ONLINE
              /dev/lofi/1           ONLINE
 
  It's interesting that even though 4 of the 5 disks are available, it still
  can import it as DEGRADED.
 
 I agree that it's interesting. Now someone really knowledgable will
 need to have a look at this. I can only imagine that somehow the
 devices contain data from different points in time, and that it's too
 far apart for the aggressive txg rollback that was added in PSARC
 2009/479. Btw, did you try that? Try: zpool import -d /dev/lofi -FVX
 ZP-8T-RZ1-01.
 

Hi again,

that got slightly further, but still no dice:

root@openindiana-01:/mnt#  zpool import -d /dev/lofi -FVX ZP-8T-RZ1-01
root@openindiana-01:/mnt# zpool list
NAME   SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
ZP-8T-RZ1-01  -  -  -  -  -  FAULTED  -
rpool 15.9G  2.17G  13.7G13%  1.00x  ONLINE  -
root@openindiana-01:/mnt# zpool status
  pool: ZP-8T-RZ1-01
 state: FAULTED
status: One or more devices could not be used because the label is missing
or invalid.  There are insufficient replicas for the pool to continue
functioning.
action: Destroy and re-create the pool from
a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
  scan: none requested
config:

NAME  STATE READ WRITE CKSUM
ZP-8T-RZ1-01  FAULTED  0 0 1  corrupted data
  raidz1-0ONLINE   0 0 6
12339070507640025002  UNAVAIL  0 0 0  was /dev/lofi/2
/dev/lofi/5   ONLINE   0 0 0
/dev/lofi/4   ONLINE   0 0 0
/dev/lofi/3   ONLINE   0 0 0
/dev/lofi/1   ONLINE   0 0 0

root@openindiana-01:/mnt# zpool scrub ZP-8T-RZ1-01
cannot scrub 'ZP-8T-RZ1-01': pool is currently unavailable

Thanks for your tenacity Stefan.
Scott
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)

2012-06-16 Thread Scott Aitken
On Sat, Jun 16, 2012 at 09:09:53AM -0500, Gregg Wonderly wrote:
 Use 'dd' to replicate as much of lofi/2 as you can onto another device, and 
 then 
 cable that into place?
 
 It looks like you just need to put a functioning, working, but not correct 
 device, in that slot so that it will import and then you can 'zpool replace' 
 the 
 new disk into the pool perhaps?
 
 Gregg Wonderly
 
 On 6/16/2012 2:02 AM, Scott Aitken wrote:
  On Sat, Jun 16, 2012 at 08:54:05AM +0200, Stefan Ring wrote:
  when you say remove the device, I assume you mean simply make it 
  unavailable
  for import (I can't remove it from the vdev).
  Yes, that's what I meant.
 
  root@openindiana-01:/mnt# zpool import -d /dev/lofi
    pool: ZP-8T-RZ1-01
      id: 9952605666247778346
   state: FAULTED
  status: One or more devices are missing from the system.
  action: The pool cannot be imported. Attach the missing
          devices and try again.
     see: http://www.sun.com/msg/ZFS-8000-3C
  config:

          ZP-8T-RZ1-01              FAULTED  corrupted data
            raidz1-0                DEGRADED
              12339070507640025002  UNAVAIL  cannot open
              /dev/lofi/5           ONLINE
              /dev/lofi/4           ONLINE
              /dev/lofi/3           ONLINE
              /dev/lofi/1           ONLINE
 
  It's interesting that even though 4 of the 5 disks are available, it still
  can import it as DEGRADED.
  I agree that it's interesting. Now someone really knowledgable will
  need to have a look at this. I can only imagine that somehow the
  devices contain data from different points in time, and that it's too
  far apart for the aggressive txg rollback that was added in PSARC
  2009/479. Btw, did you try that? Try: zpool import -d /dev/lofi -FVX
  ZP-8T-RZ1-01.
 
  Hi again,
 
  that got slightly further, but still no dice:
 
  root@openindiana-01:/mnt#  zpool import -d /dev/lofi -FVX ZP-8T-RZ1-01
  root@openindiana-01:/mnt# zpool list
  NAME   SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
  ZP-8T-RZ1-01  -  -  -  -  -  FAULTED  -
  rpool 15.9G  2.17G  13.7G13%  1.00x  ONLINE  -
  root@openindiana-01:/mnt# zpool status
 pool: ZP-8T-RZ1-01
state: FAULTED
  status: One or more devices could not be used because the label is missing
   or invalid.  There are insufficient replicas for the pool to 
  continue
   functioning.
  action: Destroy and re-create the pool from
   a backup source.
  see: http://www.sun.com/msg/ZFS-8000-5E
 scan: none requested
  config:
 
   NAME  STATE READ WRITE CKSUM
   ZP-8T-RZ1-01  FAULTED  0 0 1  corrupted 
  data
 raidz1-0ONLINE   0 0 6
   12339070507640025002  UNAVAIL  0 0 0  was 
  /dev/lofi/2
   /dev/lofi/5   ONLINE   0 0 0
   /dev/lofi/4   ONLINE   0 0 0
   /dev/lofi/3   ONLINE   0 0 0
   /dev/lofi/1   ONLINE   0 0 0
 
  root@openindiana-01:/mnt# zpool scrub ZP-8T-RZ1-01
  cannot scrub 'ZP-8T-RZ1-01': pool is currently unavailable
 
  Thanks for your tenacity Stefan.
  Scott
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
 

Hi Greg,

lofi/2 is a dd of a real disk.  I am using disk images because I can roll
back, clone etc without using the original drives (which are long gone
anyway).

I have tried making /2 unavailable for import, and zfs just moans that it
can't be opened.  It fails to import even though I have only one disk missing
of a RAIDZ array.

Scott


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)

2012-06-16 Thread Scott Aitken
On Sat, Jun 16, 2012 at 09:58:40AM -0500, Gregg Wonderly wrote:
 
 On Jun 16, 2012, at 9:49 AM, Scott Aitken wrote:
 
  On Sat, Jun 16, 2012 at 09:09:53AM -0500, Gregg Wonderly wrote:
  Use 'dd' to replicate as much of lofi/2 as you can onto another device, 
  and then 
  cable that into place?
  
  It looks like you just need to put a functioning, working, but not correct 
  device, in that slot so that it will import and then you can 'zpool 
  replace' the 
  new disk into the pool perhaps?
  
  Gregg Wonderly
  
  On 6/16/2012 2:02 AM, Scott Aitken wrote:
  On Sat, Jun 16, 2012 at 08:54:05AM +0200, Stefan Ring wrote:
  when you say remove the device, I assume you mean simply make it 
  unavailable
  for import (I can't remove it from the vdev).
  Yes, that's what I meant.
  
   root@openindiana-01:/mnt# zpool import -d /dev/lofi
     pool: ZP-8T-RZ1-01
       id: 9952605666247778346
    state: FAULTED
   status: One or more devices are missing from the system.
   action: The pool cannot be imported. Attach the missing
           devices and try again.
      see: http://www.sun.com/msg/ZFS-8000-3C
   config:

           ZP-8T-RZ1-01              FAULTED  corrupted data
             raidz1-0                DEGRADED
               12339070507640025002  UNAVAIL  cannot open
               /dev/lofi/5           ONLINE
               /dev/lofi/4           ONLINE
               /dev/lofi/3           ONLINE
               /dev/lofi/1           ONLINE
  
  It's interesting that even though 4 of the 5 disks are available, it 
  still
  can import it as DEGRADED.
  I agree that it's interesting. Now someone really knowledgable will
  need to have a look at this. I can only imagine that somehow the
  devices contain data from different points in time, and that it's too
  far apart for the aggressive txg rollback that was added in PSARC
  2009/479. Btw, did you try that? Try: zpool import -d /dev/lofi -FVX
  ZP-8T-RZ1-01.
  
  Hi again,
  
  that got slightly further, but still no dice:
  
  root@openindiana-01:/mnt#  zpool import -d /dev/lofi -FVX ZP-8T-RZ1-01
  root@openindiana-01:/mnt# zpool list
  NAME   SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
  ZP-8T-RZ1-01  -  -  -  -  -  FAULTED  -
  rpool 15.9G  2.17G  13.7G13%  1.00x  ONLINE  -
  root@openindiana-01:/mnt# zpool status
pool: ZP-8T-RZ1-01
   state: FAULTED
  status: One or more devices could not be used because the label is missing
  or invalid.  There are insufficient replicas for the pool to 
  continue
  functioning.
  action: Destroy and re-create the pool from
  a backup source.
 see: http://www.sun.com/msg/ZFS-8000-5E
scan: none requested
  config:
  
  NAME  STATE READ WRITE CKSUM
  ZP-8T-RZ1-01  FAULTED  0 0 1  corrupted 
  data
raidz1-0ONLINE   0 0 6
  12339070507640025002  UNAVAIL  0 0 0  was 
  /dev/lofi/2
  /dev/lofi/5   ONLINE   0 0 0
  /dev/lofi/4   ONLINE   0 0 0
  /dev/lofi/3   ONLINE   0 0 0
  /dev/lofi/1   ONLINE   0 0 0
  
  root@openindiana-01:/mnt# zpool scrub ZP-8T-RZ1-01
  cannot scrub 'ZP-8T-RZ1-01': pool is currently unavailable
  
  Thanks for your tenacity Stefan.
  Scott
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  
  
  
  
  Hi Greg,
  
  lofi/2 is a dd of a real disk.  I am using disk images because I can roll
  back, clone etc without using the original drives (which are long gone
  anyway).
  
  I have tried making /2 unavailable for import, and zfs just moans that it
  can't be opened.  It fails to import even though I have only one disk 
  missing
  of a RAIDZ array.
 
 My experience is that ZFS will not import a pool with a missing disk.  There 
 has to be something in that slot before the import will occur.  Even if the 
 disk is corrupt, it needs to be there.  I think this is a failsafe 
 mechanism that tries to keep a pool from going live when you have mistakenly 
 not connected all the drives.  That keeps the disks from becoming 
 chronologically/txn misaligned which can result in data loss, in the right 
 combinations I believe.
 
 Gregg Wonderly
 

Hi again Gregg,

not sure if I should be top posting this...

Given I am working with images, it's hard to put just anything in place of
lofi/2.  ZFS scans all of the files in the directory for ZFS labels, so just
replacing lofi/2 with an empty file (for example) just means ZFS skips it,
which is the same result as deleting lofi/2 altogether.  I did this, but to
no avail.  ZFS complains about having insufficient replicas.

There is something more going

Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)

2012-06-15 Thread Scott Aitken
On Fri, Jun 15, 2012 at 10:54:34AM +0200, Stefan Ring wrote:
  Have you also mounted the broken image as /dev/lofi/2?
 
  Yep.
 
 Wouldn't it be better to just remove the corrupted device? This worked
 just fine in my case.

 
Hi Stefan,

when you say remove the device, I assume you mean simply make it unavailable
for import (I can't remove it from the vdev).

This is what happens (lofi/2 is the drive which ZFS thinks has corrupted
data):

root@openindiana-01:/mnt# zpool import -d /dev/lofi
  pool: ZP-8T-RZ1-01
id: 9952605666247778346
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

ZP-8T-RZ1-01  FAULTED  corrupted data
  raidz1-0ONLINE
12339070507640025002  UNAVAIL  corrupted data
/dev/lofi/5   ONLINE
/dev/lofi/4   ONLINE
/dev/lofi/3   ONLINE
/dev/lofi/1   ONLINE
root@openindiana-01:/mnt# lofiadm -d /dev/lofi/2
root@openindiana-01:/mnt# zpool import -d /dev/lofi
  pool: ZP-8T-RZ1-01
id: 9952605666247778346
 state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

ZP-8T-RZ1-01  FAULTED  corrupted data
  raidz1-0DEGRADED
12339070507640025002  UNAVAIL  cannot open
/dev/lofi/5   ONLINE
/dev/lofi/4   ONLINE
/dev/lofi/3   ONLINE
/dev/lofi/1   ONLINE

So in the second import, it complains that it can't open the device, rather
than saying it has corrupted data.

It's interesting that even though 4 of the 5 disks are available, it still
can import it as DEGRADED.

Thanks again.
Scott
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)

2012-06-14 Thread Scott Aitken
On Thu, Jun 14, 2012 at 09:56:43AM +1000, Daniel Carosone wrote:
 On Tue, Jun 12, 2012 at 03:46:00PM +1000, Scott Aitken wrote:
  Hi all,
 
 Hi Scott. :-)
 
  I have a 5 drive RAIDZ volume with data that I'd like to recover.
 
 Yeah, still..
 
  I tried using Jeff Bonwick's labelfix binary to create new labels but it
  carps because the txg is not zero.
 
 Can you provide details of invocation and error response?

# /root/labelfix /dev/lofi/1
assertion failed for thread 0xfecb2a40, thread-id 1: txg == 0, file label.c,
line 53
Abort (core dumped)

The reporting line of code is:
VERIFY(nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_TXG, &txg) == 0);

Here is the entire labelfix code:

#include <devid.h>
#include <dirent.h>
#include <errno.h>
#include <libintl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <stddef.h>

#include <sys/vdev_impl.h>

/*
 * Write a label block with a ZBT checksum.
 */
static void
label_write(int fd, uint64_t offset, uint64_t size, void *buf)
{
    zio_block_tail_t *zbt, zbt_orig;
    zio_cksum_t zc;

    zbt = (zio_block_tail_t *)((char *)buf + size) - 1;
    zbt_orig = *zbt;

    ZIO_SET_CHECKSUM(&zbt->zbt_cksum, offset, 0, 0, 0);

    zio_checksum(ZIO_CHECKSUM_LABEL, &zc, buf, size);

    VERIFY(pwrite64(fd, buf, size, offset) == size);

    *zbt = zbt_orig;
}

int
main(int argc, char **argv)
{
    int fd;
    vdev_label_t vl;
    nvlist_t *config;
    uberblock_t *ub = (uberblock_t *)vl.vl_uberblock;
    uint64_t txg;
    char *buf;
    size_t buflen;

    VERIFY(argc == 2);
    VERIFY((fd = open(argv[1], O_RDWR)) != -1);
    VERIFY(pread64(fd, &vl, sizeof (vdev_label_t), 0) ==
        sizeof (vdev_label_t));
    VERIFY(nvlist_unpack(vl.vl_vdev_phys.vp_nvlist,
        sizeof (vl.vl_vdev_phys.vp_nvlist), &config, 0) == 0);
    VERIFY(nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_TXG, &txg) == 0);
    VERIFY(txg == 0);
    VERIFY(ub->ub_txg == 0);
    VERIFY(ub->ub_rootbp.blk_birth != 0);

    txg = ub->ub_rootbp.blk_birth;
    ub->ub_txg = txg;

    VERIFY(nvlist_remove_all(config, ZPOOL_CONFIG_POOL_TXG) == 0);
    VERIFY(nvlist_add_uint64(config, ZPOOL_CONFIG_POOL_TXG, txg) == 0);
    buf = vl.vl_vdev_phys.vp_nvlist;
    buflen = sizeof (vl.vl_vdev_phys.vp_nvlist);
    VERIFY(nvlist_pack(config, &buf, &buflen, NV_ENCODE_XDR, 0) == 0);

    label_write(fd, offsetof(vdev_label_t, vl_uberblock),
        1ULL << UBERBLOCK_SHIFT, ub);

    label_write(fd, offsetof(vdev_label_t, vl_vdev_phys),
        VDEV_PHYS_SIZE, &vl.vl_vdev_phys);

    fsync(fd);

    return (0);
}

 
 For the benefit of others, this was at my suggestion; I've been
 discussing this problem with Scott for.. some time. 
 
  I can also make the solaris machine available via SSH if some wonderful
  person wants to poke around. 
 
 Will take a poke, as discussed.  May well raise more discussion here
 as a result.
 
 --
 Dan.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Recovery of RAIDZ with broken label(s)

2012-06-11 Thread Scott Aitken
Hi all,

I have a 5 drive RAIDZ volume with data that I'd like to recover.

The long story runs roughly:

1) The volume was running fine under FreeBSD on motherboard SATA controllers.
2) Two drives were moved to a HP P411 SAS/SATA controller
3) I *think* the HP controllers wrote some volume information to the end of
each disk (hence no more ZFS labels 2,3)
4) In its auto-configuration wisdom, the HP controller built a mirrored
volume using the two drives (and I think started the actual mirroring
process).  (Hence, on at least one of the drives, copied labels 0,1.)
5) From there everything went downhill.

This happened a while back, and so the exact order of things (including my
botched attempts at recovery) is hazy.

I tried using Jeff Bonwick's labelfix binary to create new labels but it
carps because the txg is not zero.

The situation now is I have dd'd the drives onto a NAS.  These images are
shared via NFS to a VM running Oracle Solaris 11 11/11 X86.

When I attempt to import the pool I get:

root@solaris-01:/mnt#  zpool import -d /dev/lofi
  pool: ZP-8T-RZ1-01
id: 9952605666247778346
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

ZP-8T-RZ1-01  FAULTED  corrupted data
  raidz1-0ONLINE
12339070507640025002  UNAVAIL  corrupted data
/dev/lofi/5   ONLINE
/dev/lofi/4   ONLINE
/dev/lofi/3   ONLINE
/dev/lofi/1   ONLINE

I'm not sure why I can't import although 4 of the 5 drives are ONLINE.

Can anyone please point me to a next step?

I can also make the solaris machine available via SSH if some wonderful
person wants to poke around.  If I lose the data that's ok, but it'd be nice
to know all avenues were tried before I delete the 9TB of images (I need the
space...)

Many thanks,
Scott
zfs-list at thismonkey dot com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sudden drop in disk performance - WD20EURS 4k sectors to blame?

2011-08-15 Thread chris scott
Did you 4k align your partition table and is ashift=12?
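
A quick way to check the ashift, assuming the pool imports (the pool name is a
placeholder):

# Prints one "ashift:" line per top-level vdev; 9 means 512-byte sectors,
# 12 means 4 KiB sectors.
zdb -C tank | grep ashift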
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Hard link space savings

2011-06-12 Thread Scott Lawson

Hi All,

I have an interesting question that may or may not be answerable from 
some internal

ZFS semantics.

I have a Sun Messaging Server which has 5 ZFS based email stores. The 
Sun Messaging server
uses hard links to link identical messages together. Messages are stored 
in standard SMTP
MIME format so the binary attachments are included in the message ASCII. 
Each individual

message is stored in a separate file.

So as an example, if a user sends an email with a 2MB attachment to the
staff mailing list and there are 3 staff stores with 500 users on each, it
will generate a space usage like:


/store1 = 1 x 2MB + 499 x 1KB
/store2 = 1 x 2MB + 499 x 1KB
/store3 = 1 x 2MB + 499 x 1KB

So total storage used is around ~7.5MB due to the hard linking taking 
place on each store.


If hard linking capability had been turned off, this same message would 
have used 1500 x 2MB =3GB

worth of storage.

My question is: are there any simple ways of determining the space savings on 
each of the stores from
the usage of hard links? The reason I ask is that our educational 
institute wishes to migrate these stores
 to M$ Exchange 2010 which doesn't do message single instancing. I need 
to try and project what the storage

requirement will be on the new target environment.

If anyone has any ideas be it ZFS based or any useful scripts that could 
help here, I am all ears.


I may post this to Sun Managers as well to see if anyone there might 
have any ideas on this as well.


Regards,

Scott.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Hard link space savings

2011-06-12 Thread Scott Lawson

On 13/06/11 10:28 AM, Nico Williams wrote:

On Sun, Jun 12, 2011 at 4:14 PM, Scott Lawson
scott.law...@manukau.ac.nz  wrote:
   

I have an interesting question that may or may not be answerable from some
internal
ZFS semantics.
 

This is really standard Unix filesystem semantics.
   
I understand this, just wanting to see if there is any easy way before I
trawl through 10 million little files.. ;)
   

[...]

So total storage used is around ~7.5MB due to the hard linking taking place
on each store.

If hard linking capability had been turned off, this same message would have
used 1500 x 2MB =3GB
worth of storage.

My question is there any simple ways of determining the space savings on
each of the stores from the usage of hard links?  [...]
 

But... you just did!  :)  It's: number of hard links * (file size +
sum(size of link names and/or directory slot size)).  For sufficiently
large files (say, larger than one disk block) you could approximate
that as: number of hard links * file size.  The key is the number of
hard links, which will typically vary, but for e-mails that go to all
users, well, you know the number of links then is the number of users.
   

Yes this number varies based on number of recipients, so could be as many a

You could write a script to do this -- just look at the size and
hard-link count of every file in the store, apply the above formula,
add up the inflated sizes, and you're done.
   
Looks like I will have to, just looking for a tried and tested method 
before I have to create my own
one if possible. Just was looking for an easy option before I have to 
sit down and
develop and test a script. I have resigned from my current job of 9 
years and finish in 15 days and have
a heck of a lot of documentation and knowledge transfer I need to do 
around other UNIX systems

and am running very short on time...

Nico

PS: Is it really the case that Exchange still doesn't deduplicate
e-mails?  Really?  It's much simpler to implement dedup in a mail
store than in a filesystem...
   
As a side note, Exchange 2002 + Exchange 2007 do in fact do this. But
apparently M$ decided in Exchange 2010 that they no longer wished to do this
and dropped the capability. Bizarre to say the least, but it may come down to
changes they have made in the underlying store technology.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Hard link space savings

2011-06-12 Thread Scott Lawson

On 13/06/11 11:36 AM, Jim Klimov wrote:

Some time ago I wrote a script to find any duplicate files and replace
them with hardlinks to one inode. Apparently this is only good for identical
files which won't change separately in the future, such as distro archives.

I can send it to you off-list, but it would be slow in your case because it
is not quite the tool for the job (it will start by calculating checksums
of all of your files ;) )

What you might want to do and script up yourself is a recursive listing,
find /var/opt/SUNWmsqsr/store/partition... -ls. This would print the
inode numbers, file sizes and link counts. Pipe it through
something like this:

find ... -ls | awk '{print $1, $4, $7}' | sort | uniq

And you'd get 3 columns - inode, count, size

My AWK math is a bit rusty today, so I present a monster-script like
this to multiply and sum up the values:

( find ... -ls | awk '{print $1, $4, $7}' | sort | uniq | awk '{ print $2*$3 "+\\" }'; echo 0 ) | bc
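
Expanded into something a bit more readable (still an untested sketch, and
the store path is a placeholder), that idea looks roughly like:

#!/bin/sh
# For each unique inode print "inode links size", then total the real size
# (each inode counted once) and the inflated size (size times link count).
# Note: link counts include links outside $STORE, so this overestimates a bit.
STORE=/var/opt/SUNWmsqsr/store/partition
find "$STORE" -type f -ls | awk '{ print $1, $4, $7 }' | sort -n | uniq | awk '
    { actual += $3; inflated += $2 * $3 }
    END {
        printf("actual bytes:    %.0f\n", actual)
        printf("inflated bytes:  %.0f\n", inflated)
        printf("saved by links:  %.0f\n", inflated - actual)
    }'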
This looks something like what I thought would have to be done; I was just
looking to see if there was something tried and tested before I had to invent
something. I was really hoping there might have been some magic information
in zdb I could have tapped into.. ;)


Can be done cleaner, i.e. in a PERL one-liner, and if you have
many values - that would probably complete faster too. But as
a prototype this would do.

HTH,
//Jim

PS: Why are you replacing the cool Sun Mail? Is it about Oracle
licensing and the now-required purchase and support cost?
Yes, it is mostly about cost. We had Sun Mail for our staff and students,
with 20,000+ students on it up until Christmas time as well. We have now
migrated them to M$ Live@EDU. This leaves us with 1500 staff, who all like to
use LookOut. The Sun connector for LookOut is a bit flaky at best, and the
Oracle licensing cost for Messaging and Calendar starts at 10,000 users plus,
so it is now rather expensive for what mailboxes we have left. M$ also heavily
discounts Exchange CALs to Edu, and Oracle is not as friendly as Sun was with
their JES licensing. So it is bye-bye Sun Messaging Server for us.



2011-06-13 1:14, Scott Lawson wrote:

Hi All,

I have an interesting question that may or may not be answerable from 
some internal

ZFS semantics.

I have a Sun Messaging Server which has 5 ZFS based email stores. The 
Sun Messaging server
uses hard links to link identical messages together. Messages are 
stored in standard SMTP
MIME format so the binary attachments are included in the message 
ASCII. Each individual

message is stored in a separate file.

So as an example if a user sends a email with a 2MB attachment to the 
staff mailing list and there
is 3 staff stores with 500 users on each, it will generate a space 
usage like :


/store1 = 1 x 2MB + 499 x 1KB
/store2 = 1 x 2MB + 499 x 1KB
/store3 = 1 x 2MB + 499 x 1KB

So total storage used is around ~7.5MB due to the hard linking taking 
place on each store.


If hard linking capability had been turned off, this same message 
would have used 1500 x 2MB =3GB

worth of storage.

My question is there any simple ways of determining the space savings 
on each of the stores from
the usage of hard links? The reason I ask is that our educational 
institute wishes to migrate these stores
to M$ Exchange 2010 which doesn't do message single instancing. I 
need to try and project what the storage

requirement will be on the new target environment.

If anyone has any ideas be it ZFS based or any useful scripts that 
could help here, I am all ears.


I may post this to Sun Managers as well to see if anyone there might 
have any ideas on this as well.


Regards,

Scott.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best choice - file system for system

2011-01-27 Thread Tristram Scott
I don't disagree that zfs is the better choice, but...

 Seriously though.  UFS is dead.  It has no advantage
 over ZFS that I'm aware
 of.
 

When it comes to dumping and restoring filesystems, there is still no official 
replacement for the ufsdump and ufsrestore.  The discussion has been had 
before, but to my knowledge, there is no consensus on the best method for 
backing up zfs filesystems.

Personally, I like to use variations on zfs send and zfs receive, but others 
will tell a different story.
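
For instance, a minimal sketch of the sort of thing I mean (pool and snapshot
names are placeholders):

# Recursive snapshot, then a full replication stream to a second pool
# (redirect to a file or tape instead if that suits your setup better).
zfs snapshot -r tank@backup1
zfs send -R tank@backup1 | zfs receive -Fdu backuppool
# Later, send only what has changed since backup1.
zfs snapshot -r tank@backup2
zfs send -R -i tank@backup1 tank@backup2 | zfs receive -Fdu backuppool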

Still, don't let this put you off using zfs as the root filesystem.  Just be 
aware that you need to do some work and decide what method of backup is best 
for you.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OT: anyone aware how to obtain 1.8.0 for X2100M2?

2010-12-19 Thread Scott Lawson

Hi,

Took me a couple of minutes to find the download for this in my Oracle 
support. Search

for the patch like this .

Patches and Updates Panel - Patch Search - Patch Name or Number is : 
10275731


Pretty easy really.

Scott.

PS. I found that patch by using product or family equals x2100  and it 
found it for me easily.




On 20/12/2010 1:04 p.m., Jerry Kemp wrote:

Eugen,

I would 2nd your observation.

I *do* have several support contracts, and as I review my Oracle 
profile, it does show that I am authorized to download patches, among 
other items.


I really haven't downloaded a lot since SunSolve was killed off.

Do others on the list have access to download stuff like this?

Or is there some other place with in Oracle's site that makes Eugen's 
link obsolete?


Jerry


On 12/19/10 12:28, Eugen Leitl wrote:


I realize this is off-topic, but Oracle has completely
screwed up the support site from Sun. I figured someone
here would know how to obtain

Sun Fire X2100 M2 Server Software 1.8.0 Image contents:

 * BIOS is version 3A21
 * SP is updated to version 3.24 (ELOM)
 * Chipset driver is updated to 9.27

from

http://www.sun.com/servers/entry/x2100/downloads.jsp

I've been trying for an hour, and I'm at the end of
my rope.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
___


Scott Lawson
Systems Architect
Manukau Institute of Technology
Information Communication Technology Services Private Bag 94006 Manukau
City Auckland New Zealand

Phone  : +64 09 968 7611
Fax: +64 09 968 7641
Mobile : +64 27 568 7611

mailto:sc...@manukau.ac.nz

http://www.manukau.ac.nz




perl -e 'print
$i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to replace failed vdev on non redundant pool?

2010-10-15 Thread Scott Meilicke
If the pool is non-redundant and your vdev has failed, you have lost your data. 
Just rebuild the pool, but consider a redundant configuration. 
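
A sketch of what that looks like (pool and device names are placeholders, and
the destroy throws away whatever is left on the pool):

zpool destroy tank
# Recreate with redundancy this time, e.g. striped mirrors:
zpool create tank mirror c2t0d0 c2t1d0 mirror c2t2d0 c2t3d0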

On Oct 15, 2010, at 3:26 PM, Cassandra Pugh wrote:

 Hello, 
 
 I would like to know how to replace a failed vdev in a non redundant pool?
 
 I am using fiber attached disks, and cannot simply place the disk back into 
 the machine, since it is virtual.  
 
 I have the latest kernel from sept 2010 that includes all of the new ZFS 
 upgrades.
 
 Please, can you help me?
 -
 Cassandra
 (609) 243-2413
 Unix Administrator
 
 
 From a little spark may burst a mighty flame.
 -Dante Alighieri 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Scott Meilicke



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Optimal raidz3 configuration

2010-10-13 Thread Scott Meilicke
Hello Peter, 

Read the ZFS Best Practices Guide to start. If you still have questions, post 
back to the list.

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pool_Performance_Considerations
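
As a quick illustration of the trade-off (device names are placeholders):

# One wide 20-disk raidz3 vdev: most usable space, least random IOPS.
zpool create tank raidz3 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
    c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0 c1t12d0 c1t13d0 c1t14d0 c1t15d0 \
    c1t16d0 c1t17d0 c1t18d0 c1t19d0

# Two 10-disk raidz3 vdevs: six parity disks instead of three, but ZFS
# stripes across top-level vdevs, so roughly twice the random IOPS and
# faster resilvers.
zpool create tank \
    raidz3 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0 c1t9d0 \
    raidz3 c1t10d0 c1t11d0 c1t12d0 c1t13d0 c1t14d0 c1t15d0 c1t16d0 c1t17d0 c1t18d0 c1t19d0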

-Scott

On Oct 13, 2010, at 3:21 PM, Peter Taps wrote:

 Folks,
 
 If I have 20 disks to build a raidz3 pool, do I create one big raidz vdev or 
 do I create multiple raidz3 vdevs? Is there any advantage of having multiple 
 raidz3 vdevs in a single pool?
 
 Thank you in advance for your help.
 
 Regards,
 Peter
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Scott Meilicke



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Bursty writes - why?

2010-10-12 Thread Scott Meilicke
On Oct 12, 2010, at 3:31 PM, Bob Friesenhahn wrote:
 
 For obvious reasons, the SLOG is designed to write sequentially. Otherwise it 
 would offer much less benefit.  Maybe this random-write issue with Sandforce 
 would not be a problem?


Isn't writing from cache to disk designed to be sequential, while writes to the 
ZIL/SLOG will be more random (in order to commit quickly)?

Scott Meilicke



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [RFC] Backup solution

2010-10-08 Thread Scott Meilicke

On Oct 8, 2010, at 8:25 AM, Bob Friesenhahn wrote:
 
 It also does not include the human factor which is still the most 
 significant contributor to data loss.  This is the most difficult factor to 
 diminish.  If the humans have difficulty understanding the system or the 
 hardware, then they are more likely to do something wrong which damages the 
 data.

This is often overlooked during system design. It is very easy to lose your
head during a high-stress moment and pull the wrong drive (I, of course, have
never done that... ahem). Having raidz2(3) / triple mirrors, graphical pictures
of which disk has failed, working LED failure lights, and letting a hot spare
finish resilvering before replacing a disk are all good countermeasures.

 It also does not account for an OS kernel which caches quite a lot of data in 
 memory (relying on ECC for reliability), and which may have bugs.

At some point you have to rely on your backups for the unexpected and 
unforeseen. Make sure they are good!

Michael, nice reliability write up!

--

Scott Meilicke



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [RFC] Backup solution

2010-10-07 Thread Scott Meilicke
Those must be pretty busy drives. I had a recent failure of a 1.5T disk in a
7-disk raidz2 vdev that took about 16 hours to resilver. There was very little
IO on the array, and it had maybe 3.5T of data to resilver.

On Oct 7, 2010, at 3:17 PM, Ian Collins wrote:  
 I would seriously consider raidz3, given I typically see 80-100 hour resilver 
 times for 500G drives in raidz2 vdevs.  If you haven't already, read Adam 
 Leventhal's paper:
 
 http://queue.acm.org/detail.cfm?id=1670144
 
 -- 
 Ian.
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Scott Meilicke



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-06 Thread Scott Meilicke
Scrub?
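
That is, something along these lines (the pool name is a placeholder):

# Re-reads and verifies every allocated block in the pool...
zpool scrub tank
# ...then lists any files with unrecoverable errors found so far.
zpool status -v tank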

On Oct 6, 2010, at 6:48 AM, Stephan Budach wrote:

 No - not a trick question., but maybe I didn't make myself clear.
 Is there a way to discover such bad files other than trying to actually read 
 from them one by one, say using cp or by sending a snapshot elsewhere?
 
 I am well aware that the file shown in  zpool status -v is damaged and I have 
 already restored it, but I wanted to know, if there're more of them.
 
 Regards,
 budy
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Scott Meilicke



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When is it okay to turn off the verify option.

2010-10-04 Thread Scott Meilicke
Why do you want to turn verify off? If performance is the reason, is the
difference significant with it on versus off?
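
For reference, the knob in question is just the dedup property (the pool name
is a placeholder), so it is easy to benchmark both ways:

# Byte-for-byte verification on every hash match...
zfs set dedup=sha256,verify tank
# ...versus trusting the sha256 hash alone.
zfs set dedup=sha256 tank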

On Oct 4, 2010, at 2:28 PM, Edward Ned Harvey wrote:

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Peter Taps
 
 As I understand, the hash generated by sha256 is almost guaranteed
 not to collide. I am thinking it is okay to turn off verify property
 on the zpool. However, if there is indeed a collision, we lose data.
 Scrub cannot recover such lost data.
 
 I am wondering in real life when is it okay to turn off verify
 option? I guess for storing business critical data (HR, finance, etc.),
 you cannot afford to turn this option off.
 
 Right on all points.  It's a calculated risk.  If you have a hash collision,
 you will lose data undetected, and backups won't save you unless *you* are
 the backup.  That is, if the good data, before it got corrupted by your
 system, happens to be saved somewhere else before it reached your system.
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Scott Meilicke



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is there any way to stop a resilver?

2010-09-29 Thread Scott Meilicke
Has it been running long? Initially the numbers are way off. After a while
it settles down into something reasonable.

How many disks, and what size, are in your raidz2?

-Scott

On 9/29/10 8:36 AM, LIC mesh licm...@gmail.com wrote:

 Is there any way to stop a resilver?
 
 We gotta stop this thing - at minimum, completion time is 300,000 hours, and
 maximum is in the millions.
 
 Raidz2 array, so it has the redundancy, we just need to get data off.




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is there any way to stop a resilver?

2010-09-29 Thread Scott Meilicke
What version of OS?
Are snapshots running (turn them off).

So are there eight disks?


On 9/29/10 8:46 AM, LIC mesh licm...@gmail.com wrote:

 It's always running less than an hour.
 
 It usually starts at around 300,000h estimate(at 1m in), goes up to an
 estimate in the millions(about 30mins in) and restarts.
 
 Never gets past 0.00% completion, and K resilvered on any LUN.
 
 64 LUNs, 32x5.44T, 32x10.88T in 8 vdevs.
 
 
 
 
 On Wed, Sep 29, 2010 at 11:40 AM, Scott Meilicke
 scott.meili...@craneaerospace.com wrote:
 Has it been running long? Initially the numbers are way off. After a while it
 settles down into something reasonable.
 
 How many disks, and what size, are in your raidz2?  
 
 -Scott
 
 
 On 9/29/10 8:36 AM, LIC mesh licm...@gmail.com wrote:
 
 Is there any way to stop a resilver?
 
 We gotta stop this thing - at minimum, completion time is 300,000 hours, and
 maximum is in the millions.
 
 Raidz2 array, so it has the redundancy, we just need to get data off.




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fwd: Is there any way to stop a resilver?

2010-09-29 Thread Scott Meilicke
(I left the list off last time - sorry)

No, the resilver should only be happening if there was a spare available. Is
the whole thing scrubbing? It looks like it. Can you stop it with a

zpool scrub -s pool

So... Word of warning, I am no expert at this stuff. Think about what I am
suggesting before you do it :). Although stopping a scrub is pretty
innocuous.

-Scott

On 9/29/10 9:22 AM, LIC mesh licm...@gmail.com wrote:

 You almost have it - each iSCSI target is made up of 4 of the raidz vdevs - 4
 * 6 = 24 disks.
 
 16 targets total.
 
 We have one LUN with status of UNAVAIL but didn't know if removing it
 outright would help - it's actually available and well as far as the target is
 concerned, so we thought it went UNAVAIL as a result of iSCSI timeouts - we've
 since fixed the switches buffers, etc.
 
 See:
 http://pastebin.com/pan9DBBS
 
 
 
 On Wed, Sep 29, 2010 at 12:17 PM, Scott Meilicke
 scott.meili...@craneaerospace.com wrote:
 OK, let me see if I have this right:
 
 8 shelves, 1T disks, 24 disks per shelf = 192 disks
 8 shelves, 2T disks, 24 disks per shelf = 192 disks
 Each raidz is six disks.
 64 raidz vdevs
 Each iSCSI target is made up of 8 of these raidz vdevs (8 x 6 disks = 48
 disks)
 Then the head takes these eight targets, and makes a raidz2. So the raidz2
 depends upon all 384 disks. So when a failure occurs, the resilver is
 accessing all 384 disks.
 
 If I have this right, which I seriously doubt :), then that will either
 take an enormous amount of time to complete, or never finish. It looks like never.
 
 Recovery:
 
 From the head, can you see which vdev has failed? If so, can you remove it to
 stop the resilver?
 
 
 
 On 9/29/10 8:57 AM, LIC mesh licm...@gmail.com wrote:
 
 This is an iSCSI/COMSTAR array.
 
 The head was running 2009.06 stable with version 14 ZFS, but we updated that
 to build 134 (kept the old OS drives) - did not, however, update the zpool -
 it's still version 14.
 
 The targets are all running 2009.06 stable, exporting 4 raidz1 LUNs each of
 6 drives - 8 shelves have 1TB drives, the other 8 have 2TB drives.
 
 The head sees the filesystem as comprised of 8 vdevs of 8 iSCSI LUNs each,
 with SSD ZIL and SSD L2ARC.
 
 
 
 On Wed, Sep 29, 2010 at 11:49 AM, Scott Meilicke
 scott.meili...@craneaerospace.com wrote:
 What version of OS?
 Are snapshots running (turn them off).
 
 So are there eight disks?
 
 
 
 On 9/29/10 8:46 AM, LIC mesh licm...@gmail.com wrote:
 
 It's always running less than an hour.
 
 It usually starts at around 300,000h estimate(at 1m in), goes up to an
 estimate in the millions(about 30mins in) and restarts.
 
 Never gets past 0.00% completion, and K resilvered on any LUN.
 
 64 LUNs, 32x5.44T, 32x10.88T in 8 vdevs.
 
 
 
 
 On Wed, Sep 29, 2010 at 11:40 AM, Scott Meilicke
 scott.meili...@craneaerospace.com wrote:
 Has it been running long? Initially the numbers are way off. After a
 while it settles down into something reasonable.
 
 How many disks, and what size, are in your raidz2?  
 
 -Scott
 
 
 On 9/29/10 8:36 AM, LIC mesh licm...@gmail.com wrote:
 
 Is there any way to stop a resilver?
 
 We gotta stop this thing - at minimum, completion time is 300,000 hours,
 and maximum is in the millions.
 
 Raidz2 array, so it has the redundancy, we just need to get data off.
 
 
 
 


--
Scott Meilicke | Enterprise Systems Administrator | Crane Aerospace &
Electronics | +1 425-743-8153 | M: +1 206-406-2670




[zfs-discuss] Resilver making the system unresponsive

2010-09-29 Thread Scott Meilicke
This must be resilver day :)

I just had a drive failure. The hot spare kicked in, and access to the pool
over NFS was effectively zero for about 45 minutes. Currently the pool is still
resilvering, but for some reason I can access the file system now.

Resilver speed has been beaten to death, I know, but is there a way to avoid
this? For example, is more enterprisey hardware less susceptible to resilvers?
This box is used for development VMs, but there is no way I would consider this
for production with this kind of performance hit during a resilver.

My hardware:
Dell 2950
16G ram
16 disk SAS chassis
LSI 3801 (I think) SAS card (1068e chip)
Intel x25-e SLOG off of the internal PERC 5/i RAID controller
Seagate 750G disks (7200.11)

I am running Nexenta CE 3.0.3 (SunOS rawhide 5.11 NexentaOS_134f i86pc i386 
i86pc Solaris)

  pool: data01
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Wed Sep 29 14:03:52 2010
1.12T scanned out of 5.00T at 311M/s, 3h37m to go
82.0G resilvered, 22.42% done
config:

NAME   STATE READ WRITE CKSUM
data01 DEGRADED 0 0 0
  raidz2-0 ONLINE   0 0 0
c1t8d0 ONLINE   0 0 0
c1t9d0 ONLINE   0 0 0
c1t10d0ONLINE   0 0 0
c1t11d0ONLINE   0 0 0
c1t12d0ONLINE   0 0 0
c1t13d0ONLINE   0 0 0
c1t14d0ONLINE   0 0 0
  raidz2-1 DEGRADED 0 0 0
c1t22d0ONLINE   0 0 0
c1t15d0ONLINE   0 0 0
c1t16d0ONLINE   0 0 0
c1t17d0ONLINE   0 0 0
c1t23d0ONLINE   0 0 0
spare-5REMOVED  0 0 0
  c1t20d0  REMOVED  0 0 0
  c8t18d0  ONLINE   0 0 0  (resilvering)
c1t21d0ONLINE   0 0 0
logs
  c0t1d0   ONLINE   0 0 0
spares
  c8t18d0  INUSE currently in use

errors: No known data errors

Thanks for any insights.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Resilver making the system unresponsive

2010-09-29 Thread Scott Meilicke
I should add I have 477 snapshots across all file systems. Most of them are
hourly snaps (225 of them anyway).

On Sep 29, 2010, at 3:16 PM, Scott Meilicke wrote:

 This must be resilver day :)
 
 I just had a drive failure. The hot spare kicked in, and access to the pool
 over NFS was effectively zero for about 45 minutes. Currently the pool is
 still resilvering, but for some reason I can access the file system now.
 
 Resilver speed has been beaten to death, I know, but is there a way to avoid
 this? For example, is more enterprisey hardware less susceptible to resilvers?
 This box is used for development VMs, but there is no way I would consider
 this for production with this kind of performance hit during a resilver.
 
 My hardware:
 Dell 2950
 16G ram
 16 disk SAS chassis
 LSI 3801 (I think) SAS card (1068e chip)
 Intel x25-e SLOG off of the internal PERC 5/i RAID controller
 Seagate 750G disks (7200.11)
 
 I am running Nexenta CE 3.0.3 (SunOS rawhide 5.11 NexentaOS_134f i86pc i386 
 i86pc Solaris)
 
  pool: data01
 state: DEGRADED
 status: One or more devices is currently being resilvered.  The pool will
   continue to function, possibly in a degraded state.
 action: Wait for the resilver to complete.
 scan: resilver in progress since Wed Sep 29 14:03:52 2010
1.12T scanned out of 5.00T at 311M/s, 3h37m to go
82.0G resilvered, 22.42% done
 config:
 
    NAME           STATE     READ WRITE CKSUM
    data01         DEGRADED     0     0     0
      raidz2-0     ONLINE       0     0     0
        c1t8d0     ONLINE       0     0     0
        c1t9d0     ONLINE       0     0     0
        c1t10d0    ONLINE       0     0     0
        c1t11d0    ONLINE       0     0     0
        c1t12d0    ONLINE       0     0     0
        c1t13d0    ONLINE       0     0     0
        c1t14d0    ONLINE       0     0     0
      raidz2-1     DEGRADED     0     0     0
        c1t22d0    ONLINE       0     0     0
        c1t15d0    ONLINE       0     0     0
        c1t16d0    ONLINE       0     0     0
        c1t17d0    ONLINE       0     0     0
        c1t23d0    ONLINE       0     0     0
        spare-5    REMOVED      0     0     0
          c1t20d0  REMOVED      0     0     0
          c8t18d0  ONLINE       0     0     0  (resilvering)
        c1t21d0    ONLINE       0     0     0
    logs
      c0t1d0       ONLINE       0     0     0
    spares
      c8t18d0      INUSE     currently in use
 
 errors: No known data errors
 
 Thanks for any insights.
 
 -Scott
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Scott Meilicke



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When Zpool has no space left and no snapshots

2010-09-28 Thread Scott Meilicke
Preemptively use quotas?
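One way to do that (pool and dataset names below are only placeholders): cap the busy filesystems with quotas, and keep a small reserved dataset whose reservation can be released to free space in an emergency:

  zfs set quota=500g tank/users            # cap the busy filesystems
  zfs create -o reservation=2g tank/slack  # headroom you can give back later
  zfs set reservation=none tank/slack      # ...when the pool fills up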


On 9/22/10 7:25 PM, Aleksandr Levchuk alevc...@gmail.com wrote:

 Dear ZFS Discussion,
 
 I ran out of space, consequently could not rm or truncate files. (It
 makes sense because it's copy-on-write and any transaction needs to
 be written to disk. It worked out really well - all I had to do was
 destroy some snapshots.)
 
 If there are no snapshots to destroy, how do you prepare for a situation
 where a ZFS pool loses its last free byte?
 
 Alex
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
Scott Meilicke | Enterprise Systems Administrator | Crane Aerospace 
Electronics | +1 425-743-8153 | M: +1 206-406-2670




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on ZFS import - how do I recover?

2010-09-28 Thread Meilicke, Scott
Brilliant. I set those parameters via /etc/system, rebooted, and the pool
imported with just the -f switch. I had seen this as an option earlier,
although not in that thread, but was not sure it applied to my case.
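For reference, the /etc/system entries being referred to are typically written like this (a sketch; treat them as one-off recovery settings and check the linked thread for your release):

  * temporary recovery tunables - remove after the pool is repaired
  set aok=1
  set zfs:zfs_recover=1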

Scrub is running now. Thank you very much!

-Scott


On 9/23/10 7:07 PM, David Blasingame Oracle david.blasing...@oracle.com
wrote:

 Have you tried setting zfs_recover  aok in /etc/system or setting it with the
 mdb?
 
 Read how to set via /etc/system
 http://opensolaris.org/jive/thread.jspa?threadID=114906
 
 mdb debugger
 http://www.listware.net/201009/opensolaris-zfs/46706-re-zfs-discuss-how-to-set
 -zfszfsrecover1-and-aok1-in-grub-at-startup.html
 
 After you get the variables set and system booted, try importing, then running
 a scrub. 
 
 Dave
 
 On 09/23/10 19:48, Scott Meilicke wrote:
  
 I posted this on the www.nexentastor.org http://www.nexentastor.org
 forums, but no answer so far, so I apologize if you are seeing this twice. I
 am also engaged with nexenta support, but was hoping to get some additional
 insights here. 
 
 I am running nexenta 3.0.3 community edition, based on 134. The box crashed
 yesterday, and goes into a reboot loop (kernel panic) when trying to import
 my data pool, screenshot attached. What I have tried thus far:
 
 Boot off of DVD, both 3.0.3 and 3.0.4 beta 8. 'zpool import -f data01' causes
 the panic in both cases.
 Boot off of 3.0.4 beta 8, ran zpool import -fF data01
 That gives me a message like Pool data01 returned to its stat as of ...,
 and then panics.
 
 The import -fF does seem to import the pool, but then immediately panic. So
 after booting off of DVD, I can boot from my hard disks, and the system will
 not import the pool because it was last imported from another system.
 
 I have moved /etc/zfs/zfs.cache out of the way, but no luck after a reboot
 and import.
 
 zpool import shows all of my disks are OK, and the pool itself is online.
 
 Is it time to start working with zdb? Any suggestions?
 
 This box is hosting development VMs, so I have some people idling their
 thumbs at the moment.
 
 Thanks everyone,
 
 -Scott
   
  
 
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
 




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Kernel panic on ZFS import - how do I recover?

2010-09-27 Thread Scott Meilicke
I just realized that the email I sent to David and the list did not make the 
list (at least as jive can see it), so here is what I sent on the 23rd:

Brilliant. I set those parameters via /etc/system, rebooted, and the pool 
imported with just the -f switch. I had seen this as an option earlier, 
although not in that thread, but was not sure it applied to my case.

Scrub is running now. Thank you very much! 

-Scott

Update: The scrub finished with zero errors.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] My filesystem turned from a directory into a special character device

2010-09-27 Thread Scott Meilicke
I am running nexenta CE 3.0.3. 

I have a file system that at some point in the last week went from a directory 
per 'ls -l' to a  special character device. This results in not being able to 
get into the file system. Here is my file system, scott2, along with a new file 
system I  just created, as seen by ls -l:

drwxr-xr-x  4 root root     4 Sep 27 09:14 scott
crwxr-xr-x  9 root root  0, 0 Sep 20 11:51 scott2

Notice the 'c' vs. 'd' at the beginning of the permissions list. I had been 
fiddling with permissions last week, then had problems with a kernel panic. 
Perhaps this is related?

Any ideas how to get access to my file system? 

Thanks,
-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] My filesystem turned from a directory into a special character device

2010-09-27 Thread Scott Meilicke

On 9/27/10 9:56 AM, Victor Latushkin victor.latush...@oracle.com wrote:

 
 On Sep 27, 2010, at 8:30 PM, Scott Meilicke wrote:
 
 I am running nexenta CE 3.0.3.
 
 I have a file system that at some point in the last week went from a
 directory per 'ls -l' to a  special character device. This results in not
 being able to get into the file system. Here is my file system, scott2, along
 with a new file system I  just created, as seen by ls -l:
 
 drwxr-xr-x  4 root root     4 Sep 27 09:14 scott
 crwxr-xr-x  9 root root  0, 0 Sep 20 11:51 scott2
 
 Notice the 'c' vs. 'd' at the beginning of the permissions list. I had been
 fiddling with permissions last week, then had problems with a kernel panic.
 
 Are you still running with aok/zfs_recover being set? Have you seen this issue
 before panic? 

Yes. Well, I have removed those entries in /etc/system, but have not yet
rebooted the box.

 
 Perhaps this is related?
 
 May be.
 
 Any ideas how to get access to my file system?
 
 This can be fixed, but it is a bit more complicated and error prone that
 setting couple of variables.

OK. Sounds like restoring from my backup would be best?

What causes this? I saw this exact same behavior on my home box, and had to
restore about two weeks ago. Not very encouraging. :(

Is there anything I can provide to help people who know more than me solve
this problem?

 
 Regards
 Victor

Thanks Victor.

-Scott




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-25 Thread Scott Meilicke
When I do the calculations, assuming 300bytes per block to be conservative, 
with 128K blocks, I get 2.34G of cache (RAM, L2ARC) per Terabyte of deduped 
data. But block size is dynamic, so you will need more than this.
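The arithmetic behind that figure, as a quick sanity check (assuming a flat 128K recordsize and ~300 bytes of DDT overhead per unique block):

  $ echo $(( (1024 * 1024 * 1024 * 1024 / (128 * 1024)) * 300 ))
  2516582400        # bytes, i.e. about 2.34 GiB per TiB of unique data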

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data transfer taking a longer time than expected (Possibly dedup related)

2010-09-24 Thread Scott Meilicke
Can I disable dedup on the dataset while the transfer is going on?
Yes. Only the blocks copied after disabling dedupe will not be deduped. The 
stuff you have already copied will be deduped. 

Can I simply Ctrl-C the procress to stop it?
Yes, you can do that to a mv process. 

Maybe stop the process, delete the deduped file system (your copy target), and 
create a new file system without dedupe to see if that is any better?
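A rough sketch of those two options (dataset names are placeholders):

  zfs set dedup=off tank/target        # new writes stop being deduped
  # or start over without dedupe:
  zfs destroy -r tank/target
  zfs create -o dedup=off tank/target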

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-23 Thread Scott Meilicke
Hi Peter,

dedupe is pool wide. File systems can opt in or out of dedupe. So if multiple 
file systems are set to dedupe, then they all benefit from using the same pool 
of deduped blocks. In this way, if two files share some of the same blocks, 
even if they are in different file systems, they will dedupe.

I am not sure why reporting is not done at the file system level. It may be an 
accounting issue, i.e. which file system owns the dedupe blocks. But it seems 
some fair estimate could be made. Maybe the overhead to keep a file system 
updated with these stats is too high?
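For what it's worth, the per-filesystem setting and the pool-wide ratio can be inspected like this (pool and dataset names are placeholders):

  zfs get dedup tank/fs1 tank/fs2    # which filesystems have opted in
  zpool get dedupratio tank          # the ratio itself is reported per pool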

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Configuration questions for Home File Server (CPU cores, dedup, checksum)?

2010-09-07 Thread Scott Meilicke
Craig,

3. I do not think you will get much dedupe on video, music and photos. I would 
not bother. If you really wanted to know at some later stage, you could create 
a new file system, enable dedupe, and copy your data (or a subset) into it just 
to see. In my experience there is a significant CPU penalty as well. My four 
core (1.86GHz xeons, 4 yrs old) box nearly maxes out when putting a lot of data 
into a deduped file system.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS development moving behind closed doors

2010-08-16 Thread Scott Meilicke
I had already begun the process of migrating my 134 boxes over to Nexenta 
before Oracle's cunning plans became known. This just reaffirms my decision. 

Us too. :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snapshot space - miscalculation?

2010-08-04 Thread Scott Meilicke
Are there other file systems underneath daten/backups that have snapshots?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slog/L2ARC on a hard drive and not SSD?

2010-07-21 Thread Scott Meilicke
Another data point - I used three 15K disks striped using my RAID controller as 
a slog for the zil, and performance went down. I had three raidz sata vdevs 
holding the data, and my load was VMs, i.e. a fair amount of small, random IO 
(60% random, 50% write, ~16k in size). 

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Deleting large amounts of files

2010-07-19 Thread Scott Meilicke
If these files are deduped, and there is not a lot of RAM on the machine, it 
can take a long, long time to work through the dedupe portion. I don't know 
enough to know if that is what you are experiencing, but it could be the 
problem.

How much RAM do you have?

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Announce: zfsdump

2010-07-05 Thread Tristram Scott
 At this point, I will repeat my recommendation about using
 zpool-in-files as a backup (staging) target.  Depending where you
 host, and how you combine the files, you can achieve these scenarios
 without clunkery, and with all the benefits a zpool provides.
 

This is another good scheme.

I see a number of points to consider when choosing amongst the various 
suggestions for backing up zfs file systems.  In no particular order, I have 
these:

1. Does it work in place, or need an intermediate copy on disk?
2. Does it respect ACLs?
3. Does it respect zfs snapshots?
4. Does it allow random access to files, or only full file system restore?
5. Can it (mostly) survive partial data corruption?
6. Can it handle file systems larger than a single tape?
7. Can it stream to multiple tapes in parallel?
8. Does it understand the concept of incremental backups?

I still see this as a serious gap in the offering of zfs.  Clearly so do many 
other people, as there are a lot of methods offered to handle at least some of 
the above.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Announce: zfsdump

2010-06-29 Thread Tristram Scott
 
 would be nice if i could pipe the zfs send stream to a split and then
 send those split streams over the network to a remote system. it would
 help sending it over to the remote system quicker. can your tool do that?
 
 something like this
 
   zfs send | split into several streams | send in parallel | join | zfs recv
   (local)                                                          (remote)
 

 Asif Iqbal

I did look at doing this, with the intention of allowing simultaneous streams 
to multiple tape drives, but put the idea to one side.   

I thought of providing interleaved streams, but wasn't happy with the idea that 
the whole process would block when one of the pipes stalled.

I also contemplated dividing the stream into several large chunks, but for them 
to run simultaneously that seemed to require several reads of the original dump 
stream.  Besides the expense of this approach,  I am not certain that repeated 
zfs send streams have exactly the same byte content.

I think that probably the best approach would be the interleaved streams.

That said, I am not sure how this would necessarily help with the situation you 
describe.  Isn't the limiting factor going to be the network bandwidth between 
remote machines?  Won't you end up with four streams running at quarter speed?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Announce: zfsdump

2010-06-29 Thread Tristram Scott
 
 if, for example, the network pipe is bigger than one unsplit stream
 of zfs send | zfs recv, then splitting it into multiple streams should
 optimize the network bandwidth, shouldn't it?
 

Well, I guess so.  But I wonder what the bottleneck is here.  If it is the 
rate at which zfs send can stream data, there is a good chance that is limited 
by disk read.  If we split it into four pipes, I still think you are going to 
see four quarter-rate reads.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Announce: zfsdump

2010-06-29 Thread Tristram Scott

evik wrote:


Reading this list for a while made it clear that zfs send is not a
backup solution, it can be used for cloning the filesystem to a backup
array if you are consuming the stream with zfs receive so you get
notified immediately about errors. Even one bitflip will render the
stream unusable and you will lose all data, not just part of your
backup, because zfs receive will restore the whole filesystem or nothing
at all depending on the correctness of the stream.

You can use par2 or something similar to try to protect the stream
against bit flips but that would require a lot of free storage space
to recover from errors.

e


The all or nothing aspect does make me nervous, but there are things 
which can be done about it.  The first step, I think, is to calculate a 
checksum of the data stream(s).


 -k chkfile.
  Calculates MD5 checksums for  each  tape  and  for  the
  stream  as a whole. These are written to chkfile, or if
  specified as -, then to stdout.

Run the dump stream back through digest -a md5 and verify that it is intact.
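In other words, something along these lines, assuming the checksums were recorded with -k at dump time (the device name and block size are illustrative):

  dd if=/dev/rmt/1ln bs=1024k | digest -a md5
  # compare the result with the per-tape MD5 recorded in chkfile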

Certainly, using an error correcting code could help us out, but at 
additional expense, both computational and storage.


Personally, for disaster recovery purposes, I think that verifying the 
data after writing to tape is good enough.  What I am looking to guard 
against is the unlikely event that I have a hardware (or software) 
failure, or serious human error.  This is okay with the zfs send stream, 
unless, of course, we get a data corruption on the tape.  I think the 
correlation between hardware failure today and tape corruption since 
yesterday / last week when I last backed up must be pretty small.


In the event that I reach for the tape and find it corrupted, I go back 
a week to the previous full dump stream.


Clearly the strength of the backup solution needs to match the value of 
the data, and especially the cost of not having the data.  For our large 
database applications we mirror to a remote location, and use tape 
backup.  But still, I find the ability to restore the zfs filesystem 
with all its snapshots very useful, which is why I choose to work with 
zfs send.


Tristram



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Announce: zfsdump

2010-06-28 Thread Tristram Scott
For quite some time I have been using zfs send -R fsn...@snapname | dd 
of=/dev/rmt/1ln to make a tape backup of my zfs file system.  A few weeks back 
the size of the file system grew larger than would fit on a single DAT72 
tape, and I once again searched for a simple solution to allow dumping of a zfs 
file system to multiple tapes.  Once again I was disappointed...

I expect there are plenty of other ways this could have been handled, but none 
leapt out at me.  I didn't want to pay large sums of cash for a commercial 
backup product, and I didn't see that Amanda would be an easy thing to fit into 
my existing scripts.  In particular, (and I could well be reading this 
incorrectly) it seems that the commercial products, Amanda, star, all are 
dumping the zfs file system file by file (with or without ACLs).  I found none 
which would allow me to dump the file system and its snapshots, unless I used 
zfs send to a scratch disk, and dumped to tape from there.  But, of course, 
that assumes I have a scratch disk large enough.

So, I have implemented zfsdump as a ksh script.  The method is as follows:
1. Make a bunch of fifos.
2. Pipe the stream from zfs send to split, with split writing to the fifos (in 
sequence).
3. Use dd to copy from the fifos to tape(s).
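In shell terms the mechanism is roughly the following (a simplified sketch with made-up fifo names, a 36 GB chunk size and only two tapes; the packaged script handles the bookkeeping):

  mkfifo /tmp/zd.aa /tmp/zd.ab
  zfs send -R pool@snap | split -b 36864m - /tmp/zd. &
  dd if=/tmp/zd.aa of=/dev/rmt/1ln bs=1024k   # first tape
  # change media, then:
  dd if=/tmp/zd.ab of=/dev/rmt/1ln bs=1024k   # second tape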

When the first tape is complete, zfsdump returns.  One then calls it again, 
specifying that the second tape is to be used, and so on.

From the man page:

 Example 1.  Dump the @Tues snapshot of the  tank  filesystem
 to  the  non-rewinding,  non-compressing  tape,  with a 36GB
 capacity:

  zfsdump -z t...@tues -a -R -f /dev/rmt/1ln  -s  36864 -t 0

 For the second tape:

  zfsdump -z t...@tues -a -R -f /dev/rmt/1ln  -s  36864 -t 1

If you would like to try it out, download the package from:
http://www.quantmodels.co.uk/zfsdump/

I have packaged it up, so do the usual pkgadd stuff to install.

Please, though, [b]try this out with caution[/b].  Build a few test file 
systems, and see that it works for you. 
[b]It comes without warranty of any kind.[/b]


Tristram
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Announce: zfsdump

2010-06-28 Thread Tristram Scott
 I use Bacula which works very well (much better than
 Amanda did).
 You may be able to customize it to do direct zfs
 send/receive, however I find that although they are
 great for copying file systems to other machines,
 they are inadequate for backups unless you always
 intend to restore the whole file system.  Most people
 want to restore a file or directory tree of files,
 not a whole file system.  In the past 25 years of
 backups and restores, I've never had to restore a
 whole file system.  I get requests for a few files,
 or somebody's mailbox or somebody's http document
 root.
 You can directly install it from CSW (or blastwave).

Thanks for your comments, Brian.  I should look at Bacula in more detail.

As for full restore versus ad hoc requests for files I just deleted, my 
experience is mostly similar to yours, although I have had need for full system 
restore more than once.

For the restore of a few files here and there, I believe this is now well 
handled with zfs snapshots.  I have always found these requests to be down to 
human actions.  The need for full system restore has (almost) always been 
hardware failure. 

If the file was there an hour ago, or yesterday, or last week, or last month, 
then we have it in a snapshot.

If the disk died horribly during a power outage (grrr!) then it would be very 
nice to be able to restore not only the full file system, but also the 
snapshots too.  The only way I know of achieving that is by using zfs send etc. 
 

 
 On 6/28/2010 11:26 AM, Tristram Scott wrote:
[snip]

 
  Tristram
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] COMSTAR iSCSI and two Windows computers

2010-06-23 Thread Scott Meilicke
Look again at how XenServer does storage. I think you will find it already has 
a solution, both for iSCSI and NFS.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raid-z - not even iops distribution

2010-06-23 Thread Scott Meilicke
Reaching into the dusty regions of my brain, I seem to recall that since RAIDz 
does not work like a traditional RAID 5, particularly because of variably sized 
stripes, the data may not hit all of the disks, but it will always be 
redundant. 

I apologize for not having a reference for this assertion, so I may be 
completely wrong.

I assume your hardware is recent, the controllers are on PCIe x4 buses, etc.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool export / import discrepancy

2010-06-15 Thread Scott Squires
Hello All,

I've migrated a JBOD of 16 drives from one server to another.  I did a zpool 
export from the old system and a zpool import to the new system.  One thing I 
did notice is since the drives are on a different controller card, the naming 
is different (as expected) but the order is also different.  I setup the drives 
as passthrough on the controller card and went through each drive 
incrementally.  I assumed the zpool import would have listed the drives in the 
order of c10t2d0, d1, d2, ... c10t3d7.  As shown below the order the drives 
were imported is c10t2d0, d2, d3, d1, c10t3d0 through d7.  

__
|Original zpool setup on old server:
|
|zpool status backup
|  pool: backup
| state: ONLINE
|config:
|NAME STATE READ WRITE CKSUM
|backup   ONLINE   0 0 0
|  raidz2 ONLINE   0 0 0
|c7t1d0   ONLINE   0 0 0
|c7t2d0   ONLINE   0 0 0
|c7t3d0   ONLINE   0 0 0
|c7t4d0   ONLINE   0 0 0
|c7t5d0   ONLINE   0 0 0
|c7t6d0   ONLINE   0 0 0
|c7t7d0   ONLINE   0 0 0
|c7t8d0   ONLINE   0 0 0
|c7t9d0   ONLINE   0 0 0
|c7t10d0  ONLINE   0 0 0
|c7t11d0  ONLINE   0 0 0
|c7t12d0  ONLINE   0 0 0
|c7t13d0  ONLINE   0 0 0
|c7t14d0  ONLINE   0 0 0
|c7t15d0  ONLINE   0 0 0
|spares
|  c7t16d0AVAIL   
|_

__
|Imported zpool on new server:
|
|zpool status backup
|  pool: backup
| state: ONLINE
|config:
|NAME STATE READ WRITE CKSUM
|backup   ONLINE   0 0 0
|  raidz2 ONLINE   0 0 0
|c10t2d0  ONLINE   0 0 0
|c10t2d2  ONLINE   0 0 0
|c10t2d3  ONLINE   0 0 0
|c10t2d1  ONLINE   0 0 0
|c10t2d4  ONLINE   0 0 0
|c10t2d5  ONLINE   0 0 0
|c10t2d6  ONLINE   0 0 0
|c10t2d7  ONLINE   0 0 0
|c10t3d0  ONLINE   0 0 0
|c10t3d1  ONLINE   0 0 0
|c10t3d2  ONLINE   0 0 0
|c10t3d3  ONLINE   0 0 0
|c10t3d4  ONLINE   0 0 0
|c10t3d5  ONLINE   0 0 0
|c10t3d6  ONLINE   0 0 0
|spares
|  c10t3d7AVAIL   
|_


Is ZFS dependent on the order of the drives?  Will this cause any issue down 
the road?  Thank you all.

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OCZ Devena line of enterprise SSD

2010-06-15 Thread Scott Meilicke
Price? I cannot find it.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] combining series of snapshots

2010-06-08 Thread Scott Meilicke
You might bring over all of your old data and snaps, then clone that into a new 
volume. Bring your recent stuff into the clone. Since the clone only updates 
blocks that are different than the underlying snap, you may see a significant 
storage savings.

Two clones could even be made - one for your live data, another to access the 
historical data.
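A sketch of the idea (pool, filesystem and snapshot names are placeholders):

  zfs snapshot tank/archive@consolidated
  zfs clone tank/archive@consolidated tank/live      # ongoing work lands here
  zfs clone tank/archive@consolidated tank/history   # read-mostly historical view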

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-05-26 Thread Scott Meilicke
iSCSI writes require a sync to disk for every write, while SMB writes get cached 
in memory and are therefore much faster.

I am not sure why it is so slow for reads.

Have you tried comstar iSCSI? I have read in these forums that it is faster.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iSCSI confusion

2010-05-24 Thread Scott Meilicke
VMware will properly handle sharing a single iSCSI volume across multiple ESX 
hosts. We have six ESX hosts sharing the same iSCSI volumes - no problems.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Consolidating a huge stack of DVDs using ZFS dedup: automation?

2010-05-04 Thread Scott Steagall
On 05/04/2010 09:29 AM, Kyle McDonald wrote:
 On 3/2/2010 10:15 AM, Kjetil Torgrim Homme wrote:
 valrh...@gmail.com valrh...@gmail.com writes:

   
 I have been using DVDs for small backups here and there for a decade
 now, and have a huge pile of several hundred. They have a lot of
 overlapping content, so I was thinking of feeding the entire stack
 into some sort of DVD autoloader, which would just read each disk, and
 write its contents to a ZFS filesystem with dedup enabled. [...] That
 would allow me to consolidate a few hundred CDs and DVDs onto probably
 a terabyte or so, which could then be kept conveniently on a hard
 drive and archived to tape.
 
 it would be inconvenient to make a dedup copy on harddisk or tape, you
 could only do it as a ZFS filesystem or ZFS send stream.  it's better to
 use a generic tool like hardlink(1), and just delete files afterwards
 with

   
 There is a perl script floating around on the internet for years that
 will convert copies of files on the same FS to hardlinks (sorry I don't
 have the name handy). So you don't need ZFS. Once this is done you can
 even recreate an ISO and burn it back to DVD (possibly merging hundreds
 of CD's into one DVD (or BD!). The script can also delete the
 duplicates, but there isn't much control over which one it keeps - for
 backups you may really want to keep the earliest (or latest?) backup the
 file appeared in.

I've used Dirvish http://www.dirvish.org/ and rsync to do just
that...worked great!

Scott

 
 Using ZFS Dedup is an interesting way of doing this. However archiving
 the result may be hard. If you use different datasets (FS's) for each
 backup, can you only send 1 dataset at a time (since you can only
 snapshot on a dataset level? Won't that 'undo' the deduping?
  
 If you instead put all the backups on on data set, then the snapshot can
 theoretically contain the dedpued data. I'm not clear on whether
 'send'ing it will preserve the deduping or not - or if it's up to the
 receiving dataset to recognize matching blocks? If the dedup is in the
 stream, then you may be able to write the stream to a DVD or BD.
 
 Still if you save enough space so that you can add the required level of
 redundancy, you could just leave it on disk and chuck the DVD's. Not
 sure I'd do that, but it might let me put the media in the basement,
 instead of the closet, or on the desk next to me.
 
   -Kyle
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for ISCSI ntfs backing store.

2010-04-23 Thread Scott Meilicke
At the time we had it set up as 3 x 5-disk raidz, plus a hot spare. These 16 
disks were in a SAS cabinet, and the slog was on the server itself. We are 
now running 2 x 7-disk raidz2 plus a hot spare and slog, all inside the cabinet. 
Since the disks are 1.5T, I was concerned about resilver times for a failed 
disk.

About the only thing I would consider at this point is getting an SSD for the 
l2arc for dedupe performance.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Benchmarking Methodologies

2010-04-23 Thread Scott Meilicke
My use case for opensolaris is as a storage server for a VM environment (we 
also use EqualLogic, and soon an EMC CX4-120). To that end, I use iometer 
within a VM, simulating my VM IO activity, with some balance given to easy 
benchmarking. We have about 110 VMs across eight ESX hosts. Here is what I do:

* Attach a 100G vmdk to one Windows 2003 R2 VM
* Create a 32G test file (my opensolaris box has 16G of RAM)
* export/import the pool on the solaris box, and reboot my guest to clear 
caches all around
* Run a disk queue depth of 32 outstanding IOs
* 60% read, 65% random, 8k block size
* Run for five minutes spool up, then run the test for five minutes

My actual workload is closer to 50% read, 16k block size, so I adjust my 
interpretation of the results accordingly. 

Probably I should run a lot more iometer daemons.

Performance will increase as the benchmark runs due to the l2arc filling up, so 
I found that running the benchmark starting at 5 minutes into the work load was 
a happy medium. Things will get a bit faster the longer the benchmark runs, but 
this is good as far as benchmarking goes.

Only occasionally do I get wacko results, which I happily toss out the window.

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS for ISCSI ntfs backing store.

2010-04-16 Thread Scott Meilicke
I have used build 124 in this capacity, although I did zero tuning. I had about 
4T of data on a single 5T iSCSI volume over gigabit. The windows server was a 
VM, and the opensolaris box is on a Dell 2950, 16G of RAM, x25e for the zil, no 
l2arc cache device. I used comstar. 

It was being used as a target for Doubletake, so it only saw write IO, with 
very little read. My load testing using iometer was very positive, and I would 
not have hesitated to use it as the primary node serving about 1000 users, 
maybe 200-300 active at a time. 

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sharing a ssd between rpool and l2arc

2010-03-30 Thread Scott Duckworth
Just clarifying Darren's comment - we got bitten by this pretty badly so I
figure it's worth saying again here.  ZFS will *allow* you to use a ZVOL of
one pool as a vdev in another pool, but it results in race conditions and an
unstable system.  (At least on Solaris 10 update 8).

We tried to use a ZVOL from rpool (on fast 15k rpm drives) as a cache device
for another pool (on slower 7.2k rpm drives).  It worked great up until it
hit the race condition and hung the system.  It would have been nice if zfs
had issued a warning, or at least if this fact was better documented.

Scott Duckworth, Systems Programmer II
Clemson University School of Computing


On Tue, Mar 30, 2010 at 5:09 AM, Darren J Moffat darr...@opensolaris.orgwrote:

 On 30/03/2010 10:05, Erik Trimble wrote:

 F. Wessels wrote:

 Thanks for the reply.

 I didn't get very much further.

 Yes, ZFS loves raw devices. When I had two devices I wouldn't be in
 this mess.
 I would simply install opensolaris on the first disk and add the
 second ssd to the
 data pool with a zpool add mpool cache cxtydz Notice that no slices or
 partitions
 were used.
 But I don't have space for two devices. So I have to deal with slices
 and partitions.
 I did another clean install in 12Gb partition leaving 18Gb free.
 I tried parted to resize the partition, but it said that resizing
 (solaris2) partitions
 wasn't implemented.
 I tried fdisk but no luck either.
 I tried the send and receive, create new partition and slices, restore
 rpool in
 slice0, do installgrub but it wouldn't boot anymore.

 Can anybody give a summary of commands/steps howto accomplish a bootable
 rpool and l2arc on a ssd. Preferably for the x86 platform.


 Look up zvols, as this is what you want to use, NOT partitions (for the
 many reasons you've encountered).


 In this case partitions is the only way this will work.


  In essence, do a normal install, using the ENTIRE disk for your rpool.

 Then create a zvol in the rpool:

 # zfs create -V 8GB rpool/zvolname

 Add this zvol as the cache device (L2arc) for your other pool

 # zpool create tank mirror c1t0d0 c1t1d0s0 cache rpool/zvolname


 That won't work L2ARC devices can not be a ZVOL of another pool, they can't
 be a file either.  An L2ARC device must be a physical device.

 --
 Darren J Moffat

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Rethinking my zpool

2010-03-19 Thread Scott Meilicke
You will get much better random IO with mirrors, and better reliability when a 
disk fails with raidz2. Six sets of mirrors are fine for a pool. From what I 
have read, a hot spare can be shared across pools. I think the correct term 
would be load balanced mirrors, vs RAID 10.
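For illustration, a pool of six mirrors plus a spare might be created like this, and the same spare can then be added to a second pool as well (device and pool names are placeholders):

  zpool create tank \
      mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 mirror c1t4d0 c1t5d0 \
      mirror c1t6d0 c1t7d0 mirror c1t8d0 c1t9d0 mirror c1t10d0 c1t11d0 \
      spare c1t12d0
  zpool add otherpool spare c1t12d0   # shared hot spare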

What kind of performance do you need? Maybe raidz2 will give you the 
performance you need. Maybe not. Measure the performance of each configuration 
and decide for yourself. I am a big fan of iometer for this type of work.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?

2010-03-19 Thread Scott Meilicke
 One of the reasons I am investigating solaris for
 this is sparse volumes and dedupe could really help
 here.  Currently we use direct attached storage on
 the dom0s and allocate an LVM to the domU on
 creation.  Just like your example above, we have lots
 of those 80G to start with please volumes with 10's
 of GB unused.  I also think this data set would
 dedupe quite well since there are a great many
 identical OS files across the domUs.  Is that
 assumption correct?

This is one reason I like NFS - thin by default, and no wasted space within a 
zvol. zvols can be thin as well, but opensolaris will not know the inside 
format of the zvol, and you may still have a lot of wasted space after a while 
as files inside of the zvol come and go. In theory dedupe should work well, but 
I would be careful about a possible speed hit. 
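As a concrete comparison (pool and dataset names are placeholders), an NFS-shared filesystem only ever uses what the files occupy, while a zvol has to be created sparse to get similar behaviour:

  zfs create data01/nfs/vmstore
  zfs set sharenfs=rw data01/nfs/vmstore      # thin by nature
  zfs create -s -V 500g data01/san/vmstore    # sparse (thin) zvol for iSCSI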


 I've not seen an example of that before.  Do you mean
 having two 'head units' connected to an external JBOD
 enclosure or a proper HA cluster type configuration
 where the entire thing, disks and all, are
 duplicated?

I have not done any type of cluster work myself, but from what I have read on 
Sun's site, yes, you could connect the same jbod to two head units, 
active/passive, in an HA cluster, but no duplicate disks/jbod. When the active 
goes down, passive detects this and takes over the pool by doing an import. 
During the import, any outstanding transactions on the zil are replayed, 
whether they are on a slog or not. I believe this is how Sun does it on their 
open storage boxes (7000 series). Note - two jbods could be used, one for each 
head unit, making an active/active setup. Each jbod is active on one node, 
passive on the other.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?

2010-03-18 Thread Scott Meilicke
It is hard, as you note, to recommend a box without knowing the load. How many 
linux boxes are you talking about?

I think having a lot of space for your L2ARC is a great idea.

Will you mirror your SLOG, or load balance them? I ask because perhaps one will 
be enough, IO wise. My box has one SLOG (X25-E) and can support about 2600 IOPS 
using an iometer profile that closely approximates my work load. My ~100 VMs on 
8 ESX boxes average around 1000 IOPS, but can peak 2-3x that during backups.

Don't discount NFS. I absolutely love NFS for management and thin provisioning 
reasons. Much easier (to me) than managing iSCSI, and performance is similar. I 
highly recommend load testing both iSCSI and NFS before you go live. Crash 
consistent backups of your VMs are possible using NFS, and recovering a VM from 
a snapshot is a little easier using NFS, I find.

Why not larger capacity disks?

Hopefully your switches support NIC aggregation?

The only issue I have had on 2009.06 using iSCSI (I had a windows VM directly 
attaching to an iSCSI 4T volume) was solved and back ported to 2009.06 (bug 
6794994).

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?

2010-03-18 Thread Scott Meilicke
I was planning to mirror them - mainly in the hope that I could hot swap a new 
one in the event that an existing one started to degrade. I suppose I could 
start with one of each and convert to a mirror later although the prospect of 
losing either disk fills me with dread.

You do not need to mirror the L2ARC devices, as the system will just hit disk 
as necessary. Mirroring sounds like a good idea on the SLOG, but this has been 
much discussed on the forums.

 Why not larger capacity disks?

We will run out of iops before we run out of space.

Interesting. I find IOPS is more proportional to the number of VMs vs disk 
space. 

User: I need a VM that will consume up to 80G in two years, so give me an 80G 
disk.
Me: OK, but recall we can expand disks and filesystems on the fly, without 
downtime.
User: Well, that is cool, but 80G to start with please.
Me: sigh 

I also believe the SLOG and L2ARC will make using high RPM disks not as 
necessary. But, from what I have read, higher RPM disks will greatly help with 
scrubs and reslivers. Maybe two pools - one with fast mirrored SAS, another 
with big SATA. Or all SATA, but one pool with mirrors, another with raidz2. 
Many options. But measure to see what works for you. iometer is great for that, 
I find. 

Any opinions on the use of battery backed SAS adapters?

Surely these will help with performance in write back mode, but I have not done 
any hard measurements. Anecdotally my PERC5i in a Dell 2950 seemed to greatly 
help with IOPS on a five disk raidz. There are pros and cons. Search the 
forums, but off the top of my head 1) SLOGs are much larger than controller 
caches: 2) only synced write activity is cached in a ZIL, whereas a controller 
cache will cache everything, needed or not, thus running out of space sooner; 
3) SLOGS and L2ARC devices are specialized caches for read and write loads, vs. 
the all in one cache of a controller. 4) A controller *may* be faster, since it 
uses ram for the cache.

One of the benefits of a SLOG on the SAS/SATA bus is for a cluster. If one node 
goes down, the other can bring up the pool, check the ZIL for any necessary 
transactions, and apply them. To do this with battery backed cache, you would 
need fancy interconnects between the nodes, cache mirroring, etc. All of those 
things that SAN array products do. 

Sounds like you have a fun project.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS/OSOL/Firewire...

2010-03-18 Thread Scott Meilicke
Apple users have different expectations regarding data loss than Solaris and 
Linux users do.

Come on, no Apple user bashing. Not true, not fair.

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can we get some documentation on iSCSI sharing after comstar took over?

2010-03-16 Thread Scott Meilicke
This is what I used:
http://wikis.sun.com/display/OpenSolarisInfo200906/How+to+Configure+iSCSI+Target+Ports

I distilled that to:

disable the old, enable the new (comstar)

* sudo svcadm disable iscsitgt
* sudo svcadm enable stmf

Then four steps (using my zfs/zpool info - substitute for yours):

* sudo zfs create -s -V 5t data01/san/gallardo/g (the -s makes it thin, -V 
specifies a block volume)
* sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g
* sudo itadm create-target
* sudo stmfadm add-view 600144F0E24785004A80910A0001

This should allow any initiator to connect to your volume, no security.

Not quite a one liner. After you create the target once (step 3), you do not 
have to do that again for the next volume. So three lines.
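If you later want to lock the LU down to a single initiator, the rough shape of it is to drop the open view and re-add it scoped to a host group (the group name, initiator IQN and GUID below are placeholders based on the example above):

  sudo stmfadm create-hg HG-myhost
  sudo stmfadm add-hg-member -g HG-myhost iqn.1991-05.com.microsoft:myhost
  sudo stmfadm add-view -h HG-myhost -n 1 600144F0E24785004A80910A0001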

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] backup zpool to tape

2010-03-15 Thread Scott Meilicke
Greg, I am using NetBackup 6.5.3.1 (7.x is out) with fine results. Nice and 
fast.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [osol-discuss] Moving Storage to opensolaris+zfs. What a

2010-03-04 Thread Scott Meilicke
To be clear, you can do what you want with the following items (besides 
your server):

(1) OpenSolaris LiveCD
(1) 8GB USB Flash drive
As many tapes as you need to store your data pools on.

Make sure the USB drive has a saved stream from your rpool. It should 
also have a downloaded copy of whichever main backup software you use.

That's it. You backup data using Amanda/Bacula/et al onto tape. You 
backup your boot/root filesystem using 'zfs send' onto the USB key.

Erik, great! I never thought of the USB key to store an rpool copy. I will give 
it a go on my test box.
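A minimal sketch of the 'zfs send onto the USB key' part, assuming the key is mounted at /media/USBKEY (names are placeholders):

  zfs snapshot -r rpool@usbcopy
  zfs send -R rpool@usbcopy > /media/USBKEY/rpool-usbcopy.zfssend
  # later restored with zfs receive (plus installgrub on the new boot disk)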

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raidz2 array FAULTED with only 1 drive down

2010-02-25 Thread Scott Meilicke
You might have to force the import with -f.

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-12 Thread Scott Meilicke
I don't think adding an SSD mirror to an existing pool will do much for 
performance. Some of your data will surely go to those SSDs, but I don't think 
Solaris will know they are SSDs and move blocks in and out according to 
usage patterns to give you an all-around boost. They will just be used to store 
data, nothing more.

Perhaps it will be more useful to add the SSDs as either an L2ARC device or a 
SLOG for the ZIL, but that will depend upon your workload. If you do NFS or 
iSCSI access, then putting the ZIL onto the SSD drive(s) will speed up writes. 
Adding them to the L2ARC will speed up reads.
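For reference, that is a zpool add per role rather than a plain mirror vdev (pool and device names are placeholders):

  zpool add tank cache c2t0d0               # L2ARC; no redundancy needed
  zpool add tank log mirror c2t1d0 c2t2d0   # separate ZIL (slog), mirrored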

Here is the ZFS best practices guide, which should help with this decision:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Read that, then come back with more questions.

Best,
Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Scott Meilicke
Thanks Dan.

When I try the clone then import:

pfexec zfs clone 
data01/san/gallardo/g...@zfs-auto-snap:monthly-2009-12-01-00:00 
data01/san/gallardo/g-testandlab
pfexec sbdadm import-lu /dev/zvol/rdsk/data01/san/gallardo/g-testandlab

The sbdadm import-lu gives me:

sbdadm: guid in use

which makes sense, now that I see it. The man pages make it look like I cannot 
give it another GUID during the import. Any other thoughts? I *could* delete 
the current lu, import, get my data off and reverse the process, but that would 
take the current volume off line, which is not what I want to do.

Thanks,
Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Scott Meilicke
Sure, but that will put me back into the original situation.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Scott Meilicke
That is likely it. I created the volume using 2009.06, then later upgraded to 
build 124. I just now created a new zvol, connected it to my windows server, 
formatted, and added some data. Then I snapped the zvol, cloned the snap, and 
used 'pfexec sbdadm create-lu'. When presented to the windows server, it 
behaved as expected. I could see the data I created prior to the snapshot.

Thank you very much Dave (and everyone else).

Now,
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Scott Meilicke
I plan on filing a support request with Sun, and will try to post back with any 
results.

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-04 Thread Scott Meilicke
I have a single zfs volume, shared out using COMSTAR and connected to a Windows 
VM. I am taking snapshots of the volume regularly. I now want to mount a 
previous snapshot, but when I go through the process, Windows sees the new 
volume, but thinks it is blank and wants to initialize it. Any ideas how to get 
Windows to see that it has data on it?

Steps I took after the snap:

zfs clone snapshot data01/san/gallardo/g-recovery
sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g-recovery
stmfadm add-view -h HG-Gallardo -t TG-Gallardo -n 1 
600144F0EAE40A004B6B59090003

At this point, my server Gallardo can see the LUN, but like I said, it looks 
blank to the OS. I suspect the 'sbdadm create-lu' phase.

Any help to get Windows to see it as a LUN with NTFS data would be appreciated.

Thanks,
Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-29 Thread Scott Meilicke
Link aggregation can use different algorithms to load balance. Using L4 (IP 
plus originating port, I think), a single client computer using the same 
protocol (NFS) but different originating ports has allowed me to saturate both 
NICs in my LAG. So yes, you just need more than one 'conversation', but the LAG 
setup will determine how a conversation is defined.
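On recent builds the hashing policy is chosen when the aggregation is created, roughly like this (link and aggregation names are placeholders; older releases use the -d device form instead):

  dladm create-aggr -P L4 -l e1000g0 -l e1000g1 aggr1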

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-28 Thread Scott Meilicke
It looks like there is not a free slot for a hot spare? If that is the case, 
then it is one more factor to push towards raidz2, as you will need time to 
remove the failed disk and insert a new one. During that time you don't want to 
be left unprotected.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS/NFS/LDOM performance issues

2010-01-19 Thread Scott Duckworth
[Cross-posting to ldoms-discuss]

We are occasionally seeing massive time-to-completions for I/O requests on ZFS 
file systems on a Sun T5220 attached to a Sun StorageTek 2540 and a Sun J4200, 
and using a SSD drive as a ZIL device.  Primary access to this system is via 
NFS, and with NFS COMMITs blocking until the request has been sent to disk, 
performance has been deplorable.  The NFS server is a LDOM domain on the T5220.

To give an idea of how bad the situation is, iotop from the DTrace Toolkit 
occasionally reports single I/O requests to 15k RPM FC disks that take more 
than 60 seconds to complete, and even requests to a SSD drive that take over 10 
seconds to complete.  It's not uncommon to open a small text file using vim (or 
a similar editor) and have nothing appear for 10-30 seconds.  Browsing the web 
becomes a chore, as the browser locks up for a few seconds after doing anything.

I have a full write-up of the situation at 
http://www.cs.clemson.edu/~duckwos/zfs-performance/.  Any thoughts or comments 
are welcome.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS/NFS/LDOM performance issues

2010-01-19 Thread Scott Duckworth
No errors reported on any disks.

$ iostat -xe
                 extended device statistics               ---- errors ---- 
device       r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot 
vdc0  0.65.6   25.0   33.5  0.0  0.1   17.3   0   2   0   0   0   0 
vdc1 78.1   24.4 3199.2   68.0  0.0  4.4   43.3   0  20   0   0   0   0 
vdc2 78.0   24.6 3187.6   67.6  0.0  4.5   43.5   0  20   0   0   0   0 
vdc3 78.1   24.4 3196.0   67.9  0.0  4.5   43.5   0  21   0   0   0   0 
vdc4 78.2   24.5 3189.8   67.6  0.0  4.5   43.7   0  21   0   0   0   0 
vdc5 78.3   24.4 3200.3   67.9  0.0  4.5   43.5   0  21   0   0   0   0 
vdc6 78.4   24.6 3186.5   67.7  0.0  4.5   43.5   0  21   0   0   0   0 
vdc7 76.4   25.9 3233.0   67.4  0.0  4.2   40.7   0  20   0   0   0   0 
vdc8 76.7   26.0 3222.5   67.1  0.0  4.2   41.1   0  21   0   0   0   0 
vdc9 76.5   26.0 3233.9   67.7  0.0  4.2   40.8   0  20   0   0   0   0 
vdc1076.5   25.7 3221.6   67.2  0.0  4.2   41.5   0  21   0   0   0   0 
vdc1176.4   25.9 3228.2   67.4  0.0  4.2   41.1   0  20   0   0   0   0 
vdc1276.4   26.1 3216.2   67.4  0.0  4.3   41.6   0  21   0   0   0   0 
vdc13 0.08.70.3  248.4  0.0  0.01.8   0   0   0   0   0   0 
vdc1495.38.2 2919.3   28.2  0.0  2.5   24.3   0  21   0   0   0   0 
vdc1595.99.4 2917.6   26.2  0.0  2.1   19.7   0  19   0   0   0   0 
vdc1695.38.0 2924.3   28.2  0.0  2.6   25.5   0  22   0   0   0   0 
vdc1796.19.4 2920.5   26.2  0.0  2.0   19.3   0  19   0   0   0   0 
vdc1895.48.2 2923.3   28.2  0.0  2.4   23.4   0  21   0   0   0   0 
vdc1995.89.3 2903.2   26.2  0.0  2.5   24.3   0  21   0   0   0   0 
vdc2095.08.4 2877.6   28.1  0.0  2.5   23.9   0  21   0   0   0   0 
vdc2195.99.5 2848.2   26.2  0.0  2.6   24.3   0  21   0   0   0   0 
vdc2295.08.4 2874.3   28.1  0.0  2.5   23.7   0  21   0   0   0   0 
vdc2395.79.5 2854.0   26.2  0.0  2.5   23.4   0  21   0   0   0   0 
vdc2495.18.4 2883.9   28.1  0.0  2.4   23.5   0  21   0   0   0   0 
vdc2595.69.4 2839.3   26.2  0.0  2.8   26.5   0  22   0   0   0   0 
vdc26 0.06.90.2  319.8  0.0  0.02.6   0   0   0   0   0   0 

Nothing sticks out in /var/adm/messages on either the primary or cs0 domain.

The SSD is a recent addition (~3 months ago), and was added in an attempt to 
counteract the poor performance we were already seeing without the SSD.

I will check firmware versions tomorrow.  I do recall updating the firmware 
about 8 months ago when we upgraded CAM to support the new J4200 array.  At the 
time, it was the most recent CAM release available, not the outdated version 
that shipped on the CD in the array package.

My supervisor pointed me to http://forums.sun.com/thread.jspa?threadID=5416833 
which describes what seems to be an identical problem.  It references 
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6547651 which was 
reported to be fixed in Solaris 10 update 4.  No solution was posted, but it 
was pointed out that a similar configuration without LDOMs in the mix provided 
superb performance.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS/NFS/LDOM performance issues

2010-01-19 Thread Scott Duckworth
 Thus far there is no evidence that there is anything wrong with your
 storage arrays, or even with zfs. The problem seems likely to be
 somewhere else in the kernel.

Agreed.  And I tend to think that the problem lies somewhere in the LDOM 
software.  I mainly just wanted to get some experienced eyes on the problem to 
see if anything sticks out before I go through the trouble of reinstalling the 
system without LDOMs (the original need for VMs in this application no longer 
exists).
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL to disk

2010-01-15 Thread Scott Meilicke
I think Y is such a variable and complex number that it would be difficult to give 
a rule of thumb, other than to 'test with your workload'. 

My server, which has three five-disk raidz vdevs (striped) and an Intel X25-E as 
a ZIL, can fill my two GbE pipes over NFS (~200 MBps) during mostly sequential 
writes. That same server can only sustain about 22 MBps under an artificial 
load designed to simulate my VM activity (using iometer). So it varies greatly 
depending upon Y.
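
For reference, a pool laid out that way is just a zpool create with three raidz 
vdevs and a log device; the device names below are only placeholders:

zpool create tank raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 \
                  raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
                  raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
                  log c3t0d0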

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raidz data loss stories?

2009-12-21 Thread Scott Meilicke
Yes, a coworker lost a second disk during a rebuild of a raid5 and lost all 
data. I have not had a failure, however when migrating EqualLogic arrays in and 
out of pools, I lost a disk on an array. No data loss, but it concerns me 
because during the moves, you are essentially reading and writing all of the 
data on the disk. Did I have a latent problem on that particular disk that only 
exposed itself when doing such a large read/write? What if another disk had 
failed, and during the rebuild this latent problem was exposed? Trouble, 
trouble.

They say security is an onion. So is data protection.

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using iSCSI on ZFS with non-native FS - How to backup.

2009-12-07 Thread Scott Meilicke
It does 'just work'; however, you may have some file and/or file system 
corruption if the snapshot was taken at the moment your Mac was updating 
some files. So use the Time Slider function and take a lot of snaps. :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mirroring ZIL device

2009-11-23 Thread Scott Meilicke
# 1. It may help to use 15k disks as the zil. When I tested using three 15k 
disks striped as my zil, it made my workload go slower, even though it seems 
like it should have been faster. My suggestion is to test it out, and see if it 
helps.

#3. You may get good performance with an inexpensive SSD because the SSD should 
have fast random writes, but probably not fast sequential writes. But I would 
test it first against your anticipated workload. :) An Intel 32G X25-E runs 
just shy of $400, and they are pretty speedy. I don't know if that would fit 
your budget. There is also some concern about losing power and having the X25 
RAM cache disappear during a write. 
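
In either case (15k disks or an SSD), a quick way to test is on a scratch pool 
you can destroy afterwards, since a log device cannot be removed once added on 
these builds. A rough sketch, with placeholder device names:

zpool create testpool c2t0d0 c2t1d0
zpool add testpool log c3t0d0 c3t1d0 c3t2d0
zpool status testpool

Run your workload against testpool, then destroy it, recreate it without the 
log vdev, and run the workload again to compare.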

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] X45xx storage vs 7xxx Unified storage

2009-11-23 Thread Scott Meilicke
If the 7310s can meet your performance expectations, they sound much better 
than a pair of x4540s. Auto-fail over, SSD performance (although these can be 
added to the 4540s), ease of management, and a great front end. 

I haven't seen if you can use your backup software with the 7310s, but from 
what I have read in this thread, that may be the only downside (a big one). 
Everything else points to the 7310s.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL/log on SSD weirdness

2009-11-18 Thread Scott Meilicke
I second the use of zilstat - very useful, especially if you don't want to mess 
around with adding a log device and then having to destroy the pool if you 
don't want the log device any longer.
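
If you have not run it before, zilstat is just a DTrace script executed as root 
on the live system; assuming the usual interval/count invocation and that the 
script is saved locally as zilstat.ksh, something like:

./zilstat.ksh 1 10

prints one line per second for ten seconds showing how much data is going 
through the ZIL, without touching the pool configuration.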

On Nov 18, 2009, at 2:20 AM, Dushyanth wrote:

 Just to clarify : Does iSCSI traffic from a Solaris iSCSI initiator 
 to a third party target go through ZIL ?

It depends on whether the application requires a sync or not. dd does not, but 
databases (in general) do. As Richard said, ZFS treats the iSCSI volume just 
like any other vdev (pool of disks), so the fact that it is an iSCSI volume has 
nothing to do with ZFS' zil usage. 

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL/log on SSD weirdness

2009-11-17 Thread Scott Meilicke
I am sorry that I don't have any links, but here is what I observe on my 
system. dd does not do sync writes, so the ZIL is not used. iSCSI traffic does 
sync writes (as of 2009.06, but not 2008.05), so if you repeat your test using 
an iSCSI target from your system, you should see log activity. Same for NFS. I 
see no ZIL activity using rsync, for an example of a network file transfer that 
does not require sync.
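
An easy way to see this for yourself, assuming zilstat (or a similar probe) is 
running in another terminal and the paths below are placeholders:

dd if=/dev/zero of=/tank/test/ddfile bs=128k count=8192

should show little or no ZIL activity, while copying a file from an NFS client, 
e.g.

cp /var/tmp/bigfile /net/zfsserver/tank/test/

should light the ZIL up, since the NFS COMMITs are synchronous.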

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CIFS crashes when accessed with Adobe Photoshop Elements 6.0 via Vista

2009-11-10 Thread scott smallie
upgrade to the latest dev release fixed the problem for me.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] CIFS crashes when accessed with Adobe Photoshop Elements 6.0 via Vista

2009-11-09 Thread scott smallie
I have a repeatable test case for this incident. Every time I access my ZFS 
CIFS-shared file system with Adobe Photoshop Elements 6.0 via my Vista 
workstation, the OpenSolaris server stops serving CIFS.  The share functions as 
expected for all other CIFS operations.



-Begin Configuration Data-
-scotts:zelda# cat /etc/release
 OpenSolaris 2009.06 snv_111b X86
   Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
  Assembled 07 May 2009
-scotts:zelda# uname -a
SunOS zelda 5.11 snv_111b i86pc i386 i86pc
-scotts:zelda#


-scotts:zelda# prtdiag
System Configuration: IBM IBM eServer 325 -[8835W11]-
BIOS Configuration: IBM IBM BIOS Version 1.36 -[M1E136AUS-1.36]- 01/19/05
BMC Configuration: IPMI 1.5 (KCS: Keyboard Controller Style)

 Processor Sockets 

Version  Location Tag
 --
Opteron  CPU0-Socket 940
Opteron  CPU1-Socket 940

 Memory Device Sockets 

Type    Status  Set  Device Locator  Bank Locator
----    ------  ---  --------------  ------------
DRAM    in use  1    DDR1            Bank 0
DRAM    in use  1    DDR2            Bank 0
DRAM    in use  2    DDR3            Bank 1
DRAM    in use  2    DDR4            Bank 1
DRAM    in use  3    DDR5            Bank 2
DRAM    in use  3    DDR6            Bank 2

 On-Board Devices =

 Upgradeable Slots 

ID  Status     Type   Description
--  ---------  -----  ------------
1   in use     PCI-X  PCI-X Slot 1
2   available  PCI-X  PCI-X Slot 2



-scotts:zelda# zpool status
  pool: ary01
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
ary01   ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c5t8d0  ONLINE   0 0 0
c5t5d0  ONLINE   0 0 0
c5t4d0  ONLINE   0 0 0
c5t3d0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c5t1d0  ONLINE   0 0 0
c5t0d0  ONLINE   0 0 0
c6t8d0  ONLINE   0 0 0
c6t5d0  ONLINE   0 0 0
c6t4d0  ONLINE   0 0 0
c6t3d0  ONLINE   0 0 0
c6t2d0  ONLINE   0 0 0
spares
  c6t1d0AVAIL

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
rpool   ONLINE   0 0 0
  c3d0s0ONLINE   0 0 0

errors: No known data errors

-scotts:zelda#  zfs get all ary01/media
NAME PROPERTY   VALUE  SOURCE
ary01/media  type   filesystem -
ary01/media  creation   Fri Jul 11 23:24 2008  -
ary01/media  used   347G   -
ary01/media  available  1.09T  -
ary01/media  referenced 344G   -
ary01/media  compressratio  1.00x  -
ary01/media  mountedyes-
ary01/media  quota  none   default
ary01/media  reservationnone   default
ary01/media  recordsize 128K   default
ary01/media  mountpoint /shared_media  local
ary01/media  sharenfs   on local
ary01/media  checksum   on default
ary01/media  compressionoffdefault
ary01/media  atime  on default
ary01/media  deviceson default
ary01/media  exec   on default
ary01/media  setuid on default
ary01/media  readonly   offdefault
ary01/media  zoned  offlocal
ary01/media  snapdirvisiblelocal
ary01/media  aclmodegroupmask  default
ary01/media  aclinherit restricted default
ary01/media  canmount   on default
ary01/media  shareiscsi offdefault
ary01/media  xattr  on default
ary01/media  copies 1  default
ary01/media  version3  -
ary01/media  utf8only   

Re: [zfs-discuss] Difficulty testing an SSD as a ZIL

2009-10-30 Thread Scott Meilicke
Excellent! That worked just fine. Thank you Victor.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Difficulty testing an SSD as a ZIL

2009-10-29 Thread Scott Meilicke
Hi all,

I received my SSD, and wanted to test it out using fake zpools with files as 
backing stores before attaching it to my production pool. However, when I 
exported the test pool and imported it, I got an error. Here is what I did:

I created a file to use as a backing store for my new pool:
mkfile 1g /data01/test2/1gtest

Created a new pool:
zpool create ziltest2 /data01/test2/1gtest 

Added the SSD as a log:
zpool add -f ziltest2 log c7t1d0

(c7t1d0 is my SSD. I used the -f option since I had done this before with a 
pool called 'ziltest', same results)

A 'zpool status' returned no errors.

Exported:
zpool export ziltest2

Imported:
zpool import -d /data01/test2 ziltest2
cannot import 'ziltest2': one or more devices is currently unavailable

This happened twice with two different test pools using file-based backing 
stores.

I am nervous about adding the SSD to my production pool. Any ideas why I am 
getting the import error?
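
(The only thing I can think of to try, assuming -d limits the import scan to 
the directories given, is to point the import at /dev/dsk as well, in case the 
SSD log device is simply not being searched:

zpool import -d /data01/test2 -d /dev/dsk ziltest2

but I am not sure that is the cause.)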

Thanks,
Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] File level cloning

2009-10-28 Thread Scott Meilicke
I don't think so. But, you can clone at the ZFS level, and then just use the 
vmdk(s) that you need. As long as you don't muck about with the other stuff in 
the clone, the space usage should be the same.
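
At the ZFS level a clone is just a snapshot plus a writable clone of it; 
assuming a file system named tank/vmstore (a placeholder), roughly:

zfs snapshot tank/vmstore@clone1
zfs clone tank/vmstore@clone1 tank/vmstore-clone1

which gives you a copy-on-write copy of the whole file system, from which you 
use only the vmdk(s) you care about.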

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool getting in a stuck state?

2009-10-28 Thread Scott Meilicke
Hi Jeremy,

I had a loosely similar problem with my 2009.06 box. In my case (which may not 
be yours), working with support we found a bug that was causing my pool to 
hang. I also got spurious errors when I did a scrub (3 x 5-disk raidz). I am 
using the same LSI controller. A sure-fire way to kill the box was to set up a 
file system as an iSCSI target and write a lot of data to it, around 1-2 MB/s. 
It would usually die inside of a few hours. NFS writing was not as bad, but 
within a day it would panic there too.

The solution for me was to upgrade to 124. Since the upgrade three weeks ago, I 
have had no problems.

Again, I don't know if this would fix your problem, but it may be worth a try. 
Just don't upgrade your ZFS version, and you will be able to roll back to 
2009.06 at any time.
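
Roughly, assuming pkg image-update creates a new boot environment and you 
leave the pool version alone (the boot environment name is a placeholder):

pkg image-update        # builds a new boot environment at the newer build
beadm list              # shows the old and new boot environments
beadm activate old-be   # and reboot, if you ever need to fall back to 2009.06

and never run 'zpool upgrade' on the pool in the meantime.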

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-22 Thread Meilicke, Scott
Interesting. We must have different setups with our PERCs. Mine have  
always auto rebuilt.


--
Scott Meilicke

On Oct 22, 2009, at 6:14 AM, Edward Ned Harvey  
sola...@nedharvey.com wrote:


Replacing failed disks is easy when PERC is doing the RAID. Just remove
the failed drive and replace with a good one, and the PERC will rebuild
automatically.

Sorry, not correct.  When you replace a failed drive, the perc card doesn't
know for certain that the new drive you're adding is meant to be a
replacement.  For all it knows, you could coincidentally be adding new disks
for a new VirtualDevice which already contains data, during the failure
state of some other device.  So it will not automatically resilver (which
would be a permanently destructive process, applied to a disk which is not
*certainly* meant for destruction).

You have to open the perc config interface, tell it this disk is a
replacement for the old disk (probably you're just saying This disk is the
new global hotspare) or else the new disk will sit there like a bump on a
log.  Doing nothing.






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-21 Thread Meilicke, Scott
Thank you Bob and Richard. I will go with A, as it also keeps things simple.
One physical device per pool.

-Scott


On 10/20/09 6:46 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:

 On Tue, 20 Oct 2009, Richard Elling wrote:
 
 The ZIL device will never require more space than RAM.
 In other words, if you only have 16 GB of RAM, you won't need
 more than that for the separate log.
 
 Does the wasted storage space annoy you? :-)
 
 What happens if the machine is upgraded to 32GB of RAM later?
 
 The write performace of the X25-E is likely to be the bottleneck for a
 write-mostly storage server if the storage server has excellent
 network connectivity.
 
 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-21 Thread Meilicke, Scott
Thanks Ed. It sounds like you have run in this mode? No issues with  
the perc?


--
Scott Meilicke

On Oct 20, 2009, at 9:59 PM, Edward Ned Harvey  
sola...@nedharvey.com wrote:



System:
Dell 2950
16G RAM
16 1.5T SATA disks in a SAS chassis hanging off of an LSI 3801e, no
extra drive slots, a single zpool.
svn_124, but with my zpool still running at the 2009.06 version (14).

My plan is to put the SSD into an open disk slot on the 2950, but will
have to configure it as a RAID 0, since the onboard PERC5 controller
does not have a JBOD mode.

You can JBOD with the perc.  It might be technically a raid0 or raid1 with a
single disk in it, but that would be functionally equivalent to JBOD.







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-21 Thread Scott Meilicke
sigh

Thanks Frédéric, that is a very interesting read. 

So my options as I see them now:

1. Keep the x25-e, and disable the cache. Performance should still be improved, 
but not by a *whole* lot, right? I will google for an expectation, but if 
anyone knows off the top of their head, I would be appreciative.
2. Buy a ZEUS or similar SSD with a cap backed cache. Pricing is a little hard 
to come by, based on my quick google, but I am seeing $2-3k for an 8G model. Is 
that right? Yowch.
3. Wait for the x25-e g2, which is rumored to have cap backed cache, and may or 
may not work well (but probably will).
4. Put the x25-e with disabled cache behind my PERC with the PERC cache enabled.

My budget is tight. I want better performance now. #4 sounds good. Thoughts?
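
If I go with disabling the drive cache (options 1 or 4), my untested 
understanding is that it is done from format's expert mode, assuming the SSD is 
visible to format as an ordinary disk:

format -e
(select the SSD, then: cache, write_cache, disable)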

Regarding mirrored SSDs for the ZIL, it was my understanding that if the SSD 
backed ZIL failed, ZFS would fail back to using the regular pool for the ZIL, 
correct? Assuming this is correct, a mirror would be to preserve performance 
during a failure?
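
If that is right, a mirrored log (placeholder pool and device names) would just 
be:

zpool add tank log mirror c7t0d0 c7t1d0

so the pool keeps a working slog, and its performance, if one SSD dies.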

Thanks everyone, this has been really helpful.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-21 Thread Scott Meilicke
Ed, your comment:

If solaris is able to install at all, I would have to acknowledge, I
have to shutdown anytime I need to change the Perc configuration, including
replacing failed disks.

Replacing failed disks is easy when the PERC is doing the RAID. Just remove the 
failed drive and replace it with a good one, and the PERC will rebuild 
automatically. But are you talking about OpenSolaris-managed RAID? I am pretty 
sure, though I have not tested it, that in pseudo-JBOD mode (each disk a RAID 0 
or 1), the PERC would still present a replaced disk to the OS without 
reconfiguring the PERC BIOS.

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-20 Thread Scott Meilicke
I have an Intel X25-E 32G in the mail (actually the kingston version), and 
wanted to get a sanity check before I start.

System:
Dell 2950
16G RAM
16 1.5T SATA disks in a SAS chassis hanging off of an LSI 3801e, no extra drive 
slots, a single zpool.
svn_124, but with my zpool still running at the 2009.06 version (14).

I will likely get another chassis and 16 disks for another pool in the 3-18 
month time frame.

My plan is to put the SSD into an open disk slot on the 2950, but will have to 
configure it as a RAID 0, since the onboard PERC5 controller does not have a 
JBOD mode.

Options I am considering:

A. Use all 32G for the ZIL
B. Use 8G for the ZIL, 24G for an L2ARC. Any issues with slicing up an SSD like 
this?
C. Use 8G for the ZIL, 16G for an L2ARC, and reserve 8G to be used as a ZIL for 
the future zpool.

Since my future zpool would just be used as a backup to disk target, I am 
leaning towards option C. Any gotchas I should be aware of?  
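
For option C I assume I would carve the SSD into slices with format and hand 
them to the pool separately, something like this (pool and slice names are just 
placeholders):

zpool add data01 log c7t1d0s0
zpool add data01 cache c7t1d0s1

leaving a third slice untouched for the future pool's ZIL.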

Thanks,
Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


  1   2   3   >