Re: [zfs-discuss] Mac OS X clients with ZFS server

2010-04-26 Thread Chris Ridd

On 26 Apr 2010, at 06:02, Dave Pooser wrote:

 On 4/25/10 6:07 PM, Rich Teer rich.t...@rite-group.com wrote:
 
 Sounds fair enough!  Let's move this to email; meanwhile, what's the
 packet sniffing incantation I need to use?  On Solaris I'd use snoop,
but I don't think Mac OS comes with that!
 
 Use Wireshark (formerly Ethereal); works great for me. It does require X11
 on your machine.

Macs come with the command-line tcpdump tool. Wireshark (recommended anyway!) 
can read files saved by tcpdump and snoop.
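For example, something along these lines captures the NFS traffic into a file
Wireshark can open (the interface name, server name and port 2049 are
assumptions for a typical NFS setup -- adjust to taste):

  sudo tcpdump -i en0 -s 0 -w nfs.pcap host fileserver and port 2049

Then open nfs.pcap in Wireshark on whichever machine has it installed.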

Cheers,

Chris
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Identifying drives

2010-04-26 Thread Peter Tribble
On Mon, Apr 26, 2010 at 6:21 AM, Dave Pooser dave@alfordmedia.com wrote:
 I have one storage server with 24 drives, spread across three controllers
 and split into three RAIDz2 pools. Unfortunately, I have no idea which bay
 holds which drive. Fortunately, this server is used for secondary storage so
 I can take it offline for a bit. My plan is to use zpool export to take each
 pool offline and then dd to do a sustained read off each drive in turn and
 watch the blinking lights to see which drive is which. In a nutshell:
 zpool export uberdisk1
 zpool export uberdisk2
 zpool export uberdisk3
 dd if=/dev/rdsk/c9t0d0 of=/dev/null
 dd if=/dev/rdsk/c9t1d0 of=/dev/null
  [etc. 22 more times]
 zpool import uberdisk1
 zpool import uberdisk2
 zpool import uberdisk3

 Are there any glaring errors in my reasoning here? My thinking is I should
 probably identify these disks before any problems develop, in case of
 erratic read errors that are enough to make me replace the drive without
 being enough to make the hardware ID it as bad.

There should be no need to take pools offline or anything like that. If it's
just secondary storage then normal usage should be low enough to easily
spot which drive you're hammering. (Personally, I'd use format's analyze/read
rather than dd.) And there ought to be a consistent pattern rather than
locations being random.

If you can see the serial numbers on the drives then cross-referencing those
with the serial numbers from the OS (eg from iostat -En) would be a good idea.
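For example, a quick sketch for pulling the serial numbers out (device names
follow the original post; the exact iostat -En output format varies a bit
between releases):

  for d in c9t0d0 c9t1d0 c9t2d0; do
    echo "== $d =="
    iostat -En $d | grep -i serial
  done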

(You are, I presume, using regular scrubs to catch latent errors.)

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs ec2]

2010-04-26 Thread Phillip Oldham
 Then perhaps you should do zpool import -R / pool
 *after* you attach EBS.
 That way Solaris won't automatically try to import
 the pool and your 
 scripts will do it once disks are available.

zpool import doesn't work as there was no previous export. 

I'm trying to solve the case where the instance terminates unexpectedly; think 
of someone just pulling the plug. There's no way to do the export operation 
before it goes down, but I still need to bring it back up, attach the EBS 
drives, and continue as before.

The start/attach/reboot/available cycle is interesting, however. I may be able 
to init a reboot after attaching the drives, but it's not optimal - there's 
always a chance the instance might not come back up after the reboot. And it 
still doesn't answer *why* the drives aren't showing any data after they're 
initially attached.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-26 Thread Constantin Gonzalez

Hi Tim,

thanks for sharing your dedup experience. Especially for Virtualization, having
a good pool of experience will help a lot of people.

So you see a dedup ratio of 1.29 for two installations of Windows Server 2008 on
the same ZFS backing store, if I understand you correctly.

What dedup ratios do you see for the third, fourth and fifth server
installation?

Also, maybe dedup is not the only way to save space. What compression rate
do you get?

And: Have you tried setting up a Windows System, then setting up the next one
based on a ZFS clone of the first one?
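If you try it, a minimal sketch of the clone approach would be something like
this (dataset and snapshot names are made up):

  zfs snapshot tank/vhd/win2008-template@golden
  zfs clone tank/vhd/win2008-template@golden tank/vhd/win2008-server02

The clone shares all unmodified blocks with the template, so each additional
server only costs the blocks it changes afterwards.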


Hope this helps,
   Constantin

On 04/23/10 08:13 PM, tim Kries wrote:

Dedup is a key element for my purpose, because i am planning a central 
repository for like 150 Windows Server 2008 (R2) servers which would take a lot 
less storage if they dedup right.


--
Sent from OpenSolaris, http://www.opensolaris.org/

Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist   Blog: constantin.glez.de
Tel.: +49 89/4 60 08-25 91  Twitter: @zalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Jürgen Kunz
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs ec2]

2010-04-26 Thread Robert Milkowski

On 26/04/2010 09:27, Phillip Oldham wrote:

Then perhaps you should do zpool import -R / pool
*after* you attach EBS.
That way Solaris won't automatically try to import
the pool and your
scripts will do it once disks are available.
 

zpool import doesn't work as there was no previous export.

I'm trying to solve the case where the instance terminates unexpectedly; think 
of someone just pulling the plug. There's no way to do the export operation 
before it goes down, but I still need to bring it back up, attach the EBS 
drives and continue as previous.

The start/attach/reboot/available cycle is interesting, however. I may be able 
to init a reboot after attaching the drives, but it's not optimal - there's 
always a chance the instance might not come back up after the reboot. And it 
still doesn't answer *why* the drives aren't showing any data after they're 
initially attached.
   


You don't have to do exports, as I suggested using 'zpool import -R / pool' 
(notice -R).
If you do so the pool won't be added to zpool.cache, and therefore 
after a reboot (unexpected or not) you will be able to import it again 
(and do so with -R). That way you can easily script it so the import happens 
after your disks are available.
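In script form the suggestion is roughly this (pool name is a placeholder):

  # run from the startup script, after the EBS volumes have been attached:
  zpool import -R / tank

  # because -R was used, the pool is never recorded in /etc/zfs/zpool.cache,
  # so after any reboot -- clean or not -- the same import simply runs again.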


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 10 and ZFS dedupe status

2010-04-26 Thread Neil Simpson
 On Jan 5, 2010, at 4:38 PM, Bob Friesenhahn wrote:
 
  On Mon, 4 Jan 2010, Tony Russell wrote:
  
   I am under the impression that dedupe is still only in OpenSolaris and
   that support for dedupe is limited or non existent.  Is this true?  I
   would like to use ZFS and the dedupe capability to store multiple
   virtual machine images.  The problem is that this will be in a
   production environment and would probably call for Solaris 10 instead
   of OpenSolaris.  Are my statements on this valid or am I off track?
  
  If dedup gets scheduled for Solaris 10 (I don't know), it would surely
  not be available until at least a year from now.
  
  Dedup in OpenSolaris still seems risky to use other than for experimental
  purposes.  It has only recently become available.
 
 I've just wrote an entry about update 9, I think it will contain zpool
 version 19, so no dedup for this release if that's correct.
 
 Regards
 
 Henrik
 http://sparcv9.blogspot.com/

I'm pretty sure Solaris 10 update 9 will have zpool version 22 so WILL have 
dedup.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Roy Sigurd Karlsbakk
- Dave Pooser dave@alfordmedia.com skrev:

 I'm building another 24-bay rackmount storage server, and I'm
 considering
 what drives to put in the bays. My chassis is a Supermicro SC846A, so
 the
 backplane supports SAS or SATA; my controllers are LSI3081E, again
 supporting SAS or SATA.
 
 Looking at drives, Seagate offers an enterprise (Constellation) 2TB
 7200RPM
 drive in both SAS and SATA configurations; the SAS model offers one
 quarter
 the buffer (16MB vs 64MB on the SATA model), the same rotational
 speed, and
 costs 10% more than its enterprise SATA twin. (They also offer a
 Barracuda
 XT SATA drive; it's roughly 20% less expensive than the Constellation
 drive,
 but rated at 60% the MTBF of the others and a predicted rate of
 nonrecoverable errors an order of magnitude higher.)
 
 Assuming I'm going to be using three 8-drive RAIDz2 configurations,
 and
 further assuming this server will be used for backing up home
 directories
 (lots of small writes/reads), how much benefit will I see from the
 SAS
 interface?

We have a similar system, a SuperMicro 24-bay server with 22x2TB drives (and two 
SSDs for the root) configured as three RAIDz2 sets with seven drives each and a 
spare. We chose 'desktop' drives, since they offer (more or less) the same speed, 
and with that much redundancy the chance of pool failure is so low that I guess 
'enterprise' drives wouldn't help a lot more.

About SAS vs SATA, I'd guess you won't be able to see any change at all. The 
bottleneck is the drives, not the interface to them.

roy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How to delegate zfs snapshot destroy to users?

2010-04-26 Thread Vladimir Marek
Hi,

I'm trying to let zfs users to create and destroy snapshots in their zfs
filesystems.

So rpool/vm has the permissions:

osol137 19:07 ~: zfs allow rpool/vm
 Permissions on rpool/vm -
Permission sets:
@virtual 
clone,create,destroy,mount,promote,readonly,receive,rename,rollback,send,share,snapshot,userprop
Create time permissions:
@virtual
Local permissions:
group staff create,mount


now as regular user I do:

$ zfs create rpool/vm/vm156888
$ zfs create rpool/vm/vm156888/a
$ zfs snapshot rpool/vm/vm156888/a...@1
$ zfs destroy rpool/vm/vm156888/a...@1
cannot destroy 'rpool/vm/vm156888/a...@1': permission denied


The only way around I found is to add 'allow' right to the @virtual
group

sudo zfs allow -s @virtual allow rpool/vm

Now as regular user I can:

zfs allow vm156888 mount,destroy rpool/vm/vm156888/a
zfs destroy rpool/vm/vm156888/a...@1

I believe that I need to do this because the Create time permissions
are used only as Local permissions on new filesystem, while for
deleting snapshot I need them as Local+Descendent.


So if a user wants to use snapshots, he has to know to grant himself
mount+destroy permissions first. Is this the intended way to go?

Thank you
-- 
Vlad
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Making ZFS better: zfshistory

2010-04-26 Thread Edward Ned Harvey
 From: Richard Elling [mailto:richard.ell...@gmail.com]
 Sent: Sunday, April 25, 2010 2:12 PM
 
  E did exist.  Inode 12345 existed, but it had a different name at the
 time
 
 OK, I'll believe you.
 
 How about this?
 
   mv a/E/c a/c
   mv a/E a/c
   mv a/c a/E

The thing that's still confusing you is the idea that directory names or
locations matter.  They don't.

Remember that a directory is just an inode, with text and data inside it,
which stores an association of child names and child inode numbers.  Suppose
somedir is inode 12345.  Then if you ls somedir/.snapshot/somesnap then
the system is reading a version of inode 12345 in a time gone by.  At that
time, inode 12345 may have been referenced by its parent using the name
foo instead of somedir but that won't even matter in this case because
we've only instructed the system to read the contents of a past version of
inode 12345.  In this case, we haven't told the system to do anything even
slightly related to any parent of that inode.  We're not even going to know
what name was associated with inode 12345 at that time.

At the time of somesnap, inode 12345 had contents which indicate a.txt is
inode 1000 and b.txt is inode 1050 and so on.  So a.txt and b.txt will
appear in the directory listing, and if you cat a.txt or b.txt, the system
will fetch inode 1000 or 1050 as it appeared at the time of the snapshot.

Does that help?

There is no actual entity called .snapshot  It's a magical thing, just
like there is no actual entity called .zfs  If you ls somedir or ls
somezfsfilesystem you will see, that the parent inode does not contain any
reference to anything called .snapshot or .zfs   (Unless you turned it
on for some reason.)  

However, if you cd .snapshot or cd .zfs then there's some magic behind
the scenes that's able to handle that differently.  I don't know how they do
that.  But I do know it's not listed in the inode like any other normal
child subdirectory or file.
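For comparison, the ZFS side of this looks roughly like the following (paths
are hypothetical):

  ls /tank/home/.zfs/snapshot/           # list the snapshots of the dataset
  ls /tank/home/.zfs/snapshot/monday/    # browse the dataset as of that snapshot
  zfs set snapdir=visible tank/home      # optionally make .zfs show up in listings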

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Pool, what happen when disk failure

2010-04-26 Thread Edward Ned Harvey
 From: Ian Collins [mailto:i...@ianshome.com]
 Sent: Sunday, April 25, 2010 5:09 PM
 To: Edward Ned Harvey
 Cc: 'Robert Milkowski'; zfs-discuss@opensolaris.org
 Subject: Re: [zfs-discuss] ZFS Pool, what happen when disk failure
 
 On 04/26/10 12:08 AM, Edward Ned Harvey wrote:
 
 [why do you snip attributions?]

Nobody snipped attributions, and even if they did, get over it.  It's not
always needed for any reason.


   On 04/26/10 01:45 AM, Robert Milkowski wrote:
  The system should boot-up properly even if some pools are not
  accessible
  (except rpool of course).
  If it is not the case then there is a bug - last time I checked it
  worked perfectly fine.
 
  This may be different in the latest opensolaris, but in the latest
 solaris,
  this is what I know:
 
  If a pool fails, and forces an ungraceful shutdown, then during the
 next
  bootup, the pool is treated as currently in use by another system.
 The OS
  doesn't come up all the way; you have to power cycle again, and go
 into
  failsafe mode.  Then you can zpool import I think requiring the -f
 or -F,
  and reboot again normal.
 
 
 I think you are describing what happens if the root pool has problems.
 Other pools are just shown as unavailable.
 
 The system will come up, but failure to mount any filesystems in the
 absent pool will cause the filesystem/local service to be in
 maintenance
 state.

No.  

I don't know how to resolve this - I also have Solaris 10/09, and it's
a somewhat regular occurrence for the system to halt and refuse to come up,
because something went wrong with the external nonredundant zpool.  Namely
... the power got accidentally knocked off the external device.  Or the
device enclosure failed.  Or something like that.

So I have to do as I said, power cycle, go into failsafe mode, do a zpool
import and I'll see the external pool is in use by system blahblahblah
And then I zpool import it, with the -f or -F, and init 6.  And then the
system comes up clean.
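In command form, the recovery path I'm describing is roughly (pool name is just
an example):

  # from failsafe mode:
  zpool import             # lists the pool as "in use by another system"
  zpool import -f extpool  # force the import anyway
  init 6                   # reboot normally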

I don't know why my experience is different from Robert's.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Dave Pooser
 
 (lots of small writes/reads), how much benefit will I see from the SAS
 interface?

In some cases, SAS outperforms SATA.  I don't know what circumstances those
are.

I think the main reason anyone buys SAS disks is for reliability reasons.  I
maintain data centers for two companies, one of which uses all SAS, and the
other uses mostly SATA.  I have replaced many SATA disks in the last 3
years, and I have never replaced a single SAS disk.

I don't know if my experience would be reflected in the published MTBF of
the disks in question.  Sometimes those numbers are sort of fudged, so I
don't trust 'em or bother to look at them.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?

2010-04-26 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Travis Tabbal
 
 I have a few old drives here that I thought might help me a little,
 though not at much as a nice SSD, for those uses. I'd like to speed up
 NFS writes, and there have been some mentions that even a decent HDD
 can do this, though not to the same level a good SSD will.

If your clients are mounting async, don't bother.  If the clients are
mounting async, then all the writes are done asynchronously, fully
accelerated, and no data is ever written to the ZIL log.

If you'd like to measure whether or not you have anything to gain ...

Temporarily disable the ZIL on the server.  (And remount your filesystem.)
If performance doesn't improve, then you can't gain anything by using a
dedicated ZIL device.

If performance does improve ... then you could expect to gain about half of
the difference, by using a really good SSD.  Rough numbers.  Very rough.

It's not advisable, in most cases, to leave the ZIL disabled.  It's valuable
after an ungraceful shutdown.  So I'd advise only disabling the ZIL while
you're testing for performance.
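A rough sketch of the test, with the caveat that how you disable the ZIL
depends on your build (treat these as assumptions, not a recipe):

  # recent builds have a per-dataset property:
  zfs set sync=disabled tank/export    # disable synchronous semantics for the test
  # ... run the NFS write workload and measure ...
  zfs set sync=standard tank/export    # put it back when done

  # older builds only have the global tunable: add "set zfs:zil_disable = 1"
  # to /etc/system and reboot, then remove it again after testing.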


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?

2010-04-26 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Travis Tabbal

Oh, one more thing.  Your subject says ZIL/L2ARC and your message says I
want to speed up NFS writes.

ZIL (log) is used for writes.
L2ARC (cache) is used for reads.

I'd recommend looking at the ZFS Best Practices Guide.
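For reference, the two device types are added like this (device names are
placeholders):

  zpool add tank log c5t0d0      # dedicated log (slog) -- helps synchronous writes
  zpool add tank cache c5t1d0    # L2ARC -- extends the read cache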

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk
 
 About SAS vs SATA, I'd guess you won't be able to see any change at
 all. The bottleneck is the drives, not the interface to them.

That doesn't agree with my understanding.  My understanding, for a single
disk, you're right.  No disk can come near the bus speed for either SATA or
SAS.  But SCSI vs ATA, the SCSI is supposed to have a more efficient bus
utilization when many disks are all doing things simultaneously, such as they
might in a big old RAID, 48 disks, etc, like you have.

'Course none of that matters if you're serving it all over a 1Gb ether.  ;-)

I don't know under what circumstances SAS performance would exceed SATA's.  Nor do
I know by how much.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re-attaching zpools after machine termination [amazon ebs ec2]

2010-04-26 Thread Robert Milkowski

On 26/04/2010 11:14, Phillip Oldham wrote:

You don't have to do exports as I suggested to use
'zpool -R / pool'
(notice -R).
 

I tried this after your suggestion (including the -R switch) but it failed, 
saying the pool I was trying to import didn't exist.

   
Which means it couldn't discover it. Does 'zpool import' (no other 
options) list the pool?


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Expand zpool capacity

2010-04-26 Thread Vladimir L.
It's a little while ago, but I've found a pretty helpful video on YouTube 
(http://www.youtube.com/watch?v=tpzsSptzmyA) on how to completely migrate from 
one hard drive to another.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?

2010-04-26 Thread Travis Tabbal
 If your clients are mounting async, don't bother.  If the clients are
 mounting async, then all the writes are done asynchronously, fully
 accelerated, and no data is ever written to the ZIL log.


I've tried async, things run well until you get to the end of the job, then the 
process hangs until the write is complete. This was just with tar extracting to 
the NFS drive.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?

2010-04-26 Thread Travis Tabbal
  From: zfs-discuss-boun...@opensolaris.org
 [mailto:zfs-discuss-
  boun...@opensolaris.org] On Behalf Of Travis Tabbal
 
 Oh, one more thing.  Your subject says ZIL/L2ARC
 and your message says I
 want to speed up NFS writes.
 
 ZIL (log) is used for writes.
 L2ARC (cache) is used for reads.
 
 I'd recommend looking at the ZFS Best Practices
 Guide.

At the end of my OP I mentioned that I was interested in L2ARC for dedupe. It 
sounds like the DDT can get bigger than RAM and slow things to a crawl. Not 
that I expect a lot from using an HDD for that, but I thought it might help. 
I'd like to get a nice SSD or two for this stuff, but that's not in the budget 
right now.
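(If it helps, a rough way to see how large the dedup table actually is on a
pool is

  zdb -DD tank

which prints DDT statistics and a histogram; the output format varies by build.)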
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Making ZFS better: zfshistory

2010-04-26 Thread Richard Elling
On Apr 26, 2010, at 5:02 AM, Edward Ned Harvey wrote:

 From: Richard Elling [mailto:richard.ell...@gmail.com]
 Sent: Sunday, April 25, 2010 2:12 PM
 
 E did exist.  Inode 12345 existed, but it had a different name at the
 time
 
 OK, I'll believe you.
 
 How about this?
 
  mv a/E/c a/c
  mv a/E a/c
  mv a/c a/E
 
 The thing that's still confusing you is the idea that directory names or
 locations matter.  They don't.

Maybe directory consistency doesn't matter for MS-DOS 1.0, but I'm
pretty sure that directory consistency is useful in UNIX.

 Remember that a directory is just an inode, with text and data inside it,
 which stores an association of child names and child inode numbers.  Suppose
 somedir is inode 12345.  Then if you ls somedir/.snapshot/somesnap then
 the system is reading a version of inode 12345 in a time gone by.  At that
 time, inode 12345 may have been referenced by its parent using the name
 foo instead of somedir but that won't even matter in this case because
 we've only instructed the system to read the contents of a past version of
 inode 12345.  In this case, we haven't told the system to do anything even
 slightly related to any parent of that inode.  We're not even going to know
 what name was associated with inode 12345 at that time.
 
 At the time of somesnap, inode 12345 had contents which indicate a.txt is
 inode 1000 and b.txt is inode 1050 and so on.  So a.txt and b.txt will
 appear in the directory listing, and if you cat a.txt or b.txt, the system
 will fetch inode 1000 or 1050 as it appeared at the time of the snapshot.
 
 Does that help?

I completely understand this.  No magic here.

 There is no actual entity called .snapshot  It's a magical thing, just
 like there is no actual entity called .zfs  If you ls somedir or ls
 somezfsfilesystem you will see, that the parent inode does not contain any
 reference to anything called .snapshot or .zfs   (Unless you turned it
 on for some reason.)  

Yes. And you agree that the relationship to parent directories does
not matter, correct? In other words, a tool that looks at either the parent 
or child snapshot directories is useless. Put another way, you cannot
implement something like time machine using directory-level snapshot 
subdirectories.

 However, if you cd .snapshot or cd .zfs then there's some magic behind
 the scenes that's able to handle that differently.  I don't know how they do
 that.  But I do know it's not listed in the inode like any other normal
 child subdirectory or file.


'nuff said
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Richard Elling
On Apr 25, 2010, at 10:02 PM, Dave Pooser wrote:

 I'm building another 24-bay rackmount storage server, and I'm considering
 what drives to put in the bays. My chassis is a Supermicro SC846A, so the
 backplane supports SAS or SATA; my controllers are LSI3081E, again
 supporting SAS or SATA.
 
 Looking at drives, Seagate offers an enterprise (Constellation) 2TB 7200RPM
 drive in both SAS and SATA configurations; the SAS model offers one quarter
 the buffer (16MB vs 64MB on the SATA model), the same rotational speed, and
 costs 10% more than its enterprise SATA twin. (They also offer a Barracuda
 XT SATA drive; it's roughly 20% less expensive than the Constellation drive,
 but rated at 60% the MTBF of the others and a predicted rate of
 nonrecoverable errors an order of magnitude higher.)
 
 Assuming I'm going to be using three 8-drive RAIDz2 configurations, and
 further assuming this server will be used for backing up home directories
 (lots of small writes/reads), how much benefit will I see from the SAS
 interface?

For a single connection from a host to a disk, they are basically equivalent.
SAS shines with multiple connections to one or more hosts.  Hence, SAS 
is quite popular when implementing HA clusters.

Note: drive differentiation is market driven, not technology driven.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-26 Thread tim Kries
Hi,

The setting was this:

Fresh installation of 2008 R2 - server backup with the backup feature - move 
vhd to zfs - install active directory role - backup again - move vhd to same 
share


I am kinda confused over the change of dedup ratio from changing the record 
size, since it should dedup 256-bit blocks.

I have to set up the OpenSolaris install again since it died in my VirtualBox (not 
sure why), so I can't test more server installations at the moment.

Compression seemed to work pretty well (I used gzip-6) and I think the compression 
ratio was ~4, but I don't think that would work well for production systems since 
you would need some serious CPU power.

I will setup up another test in a few hours.

Personally I am not sure whether using clones is a good idea for Windows Server 
2008, with all these problems around SIDs...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to delegate zfs snapshot destroy to users?

2010-04-26 Thread Cindy Swearingen

Hi Vlad,

The create-time permissions do not provide the correct permissions for
destroying descendent datasets, such as clones.

See example 9-5 in this section that describes how to use zfs allow -d
option to grant permissions on descendent datasets:

http://docs.sun.com/app/docs/doc/819-5461/gebxb?l=ena=view

Example 9–5 Delegating Permissions at the Correct File System Level

Delegating or granting the appropriate permissions will take some
testing on the part of the administrator who is granting the
permissions. I hope the examples help.
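In your case that should boil down to something along the lines of (untested
here):

  zfs allow -d -g staff mount,destroy rpool/vm

so the mount and destroy permissions apply to descendent datasets and their
snapshots rather than only locally.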

Thanks,

Cindy



On 04/26/10 05:28, Vladimir Marek wrote:

Hi,

I'm trying to let zfs users to create and destroy snapshots in their zfs
filesystems.

So rpool/vm has the permissions:

osol137 19:07 ~: zfs allow rpool/vm
 Permissions on rpool/vm -
Permission sets:
@virtual 
clone,create,destroy,mount,promote,readonly,receive,rename,rollback,send,share,snapshot,userprop
Create time permissions:
@virtual
Local permissions:
group staff create,mount


now as regular user I do:

$ zfs create rpool/vm/vm156888
$ zfs create rpool/vm/vm156888/a
$ zfs snapshot rpool/vm/vm156888/a...@1
$ zfs destroy rpool/vm/vm156888/a...@1
cannot destroy 'rpool/vm/vm156888/a...@1': permission denied


The only way around I found is to add 'allow' right to the @virtual
group

sudo zfs allow -s @virtual allow rpool/vm

Now as regular user I can:

zfs allow vm156888 mount,destroy rpool/vm/vm156888/a
zfs destroy rpool/vm/vm156888/a...@1

I believe that I need to do this because the Create time permissions
are used only as Local permissions on new filesystem, while for
deleting snapshot I need them as Local+Descendent.


So user if he wants to use snapshots, he has to know to grant himself
mount+delete permissions first. Is this the intended way to go?

Thank you

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Identifying drives

2010-04-26 Thread Richard Elling
luxadm(1m) has a led_blink subcommand you might find useful.
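(Something like the following, though the exact argument form for your
enclosure is an assumption on my part:

  luxadm led_blink /dev/rdsk/c9t0d0s0

which should blink the locate LED for that drive, if the enclosure supports it.)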
 -- richard

On Apr 25, 2010, at 10:21 PM, Dave Pooser wrote:

 I have one storage server with 24 drives, spread across three controllers
 and split into three RAIDz2 pools. Unfortunately, I have no idea which bay
 holds which drive. Fortunately, this server is used for secondary storage so
 I can take it offline for a bit. My plan is to use zpool export to take each
 pool offline and then dd to do a sustained read off each drive in turn and
 watch the blinking lights to see which drive is which. In a nutshell:
 zpool export uberdisk1
 zpool export uberdisk2
 zpool export uberdisk3
 dd if=/dev/rdsk/c9t0d0 of=/dev/null
 dd if=/dev/rdsk/c9t1d0 of=/dev/null
 [etc. 22 more times]
 zpool import uberdisk1
 zpool import uberdisk2
 zpool import uberdisk3
 
 Are there any glaring errors in my reasoning here? My thinking is I should
 probably identify these disks before any problems develop, in case of
 erratic read errors that are enough to make me replace the drive without
 being enough to make the hardware ID it as bad.
 -- 
 Dave Pooser, ACSA
 Manager of Information Services
 Alford Media  http://www.alfordmedia.com
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Expand zpool capacity

2010-04-26 Thread Cindy Swearingen

Yes, it is helpful in that it reviews all the steps needed to get the
replacement disk labeled properly for a root pool and is identical
to what we provide in the ZFS docs.

The part that is not quite accurate is the reasons for having to relabel 
the replacement disk with the format utility.


If the replacement disk had an identical slice 0 (same size or greater)
with an SMI label then no need exists to relabel the disk. In this case,
he could have just attached the replacement disk, installed the boot
blocks, tested booting from the replacement disk, and detached the older 
disk.


If replacement disk had an EFI label or no slice 0, or a slice 0 that
is too small, then yes, you have to perform the format steps as
described in this video.

Thanks,

Cindy
On 04/26/10 08:24, Vladimir L. wrote:

It's a little while ago, but I've found a pretty helpful video on YouTube 
(http://www.youtube.com/watch?v=tpzsSptzmyA) on how to completely migrate from 
one hard drive to another.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Chris Du
SAS: full duplex
SATA: half duplex

SAS: dual port
SATA: single port (some enterprise SATA has dual port)

SAS: 2 active channel - 2 concurrent write, or 2 read, or 1 write and 1 read
SATA: 1 active channel - 1 read or 1 write

SAS: Full error detection and recovery on both read and write
SATA: error detection and recovery on write, only error detection on read

If you connect only one disk per port, it's not a big deal. If you connect multiple 
disks to a RAID card, or through a backplane or expander, SAS makes a big 
difference in reliability. 

If I had the money, I'd always go with SAS.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help:Is zfs-fuse's performance is not good

2010-04-26 Thread Roy Sigurd Karlsbakk
- Tonmaus sequoiamo...@gmx.net skrev:

 I wonder if this is the right place to ask, as the Filesystem in User
 Space implementation is a separate project. In Solaris ZFS runs in
 kernel. FUSE implementations are slow, no doubt. Same goes for other
 FUSE implementations, such as for NTFS.

The classic answers from (open)solaris folks would be 'Why not run 
(open)solaris?' and 'why don't you just try it out yourself?'

The zfs fuse project will give you most of the nice zfs stuff, but it probably 
won't give you the same performance. I don't think opensolaris has been 
compared to FUSE ZFS, but it might be interesting to see that.

roy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Brandon High
On Sun, Apr 25, 2010 at 10:02 PM, Dave Pooser dave@alfordmedia.com wrote:
 Assuming I'm going to be using three 8-drive RAIDz2 configurations, and
 further assuming this server will be used for backing up home directories
 (lots of small writes/reads), how much benefit will I see from the SAS
 interface?

SAS drives are generally intended to be used in a multi-drive / RAID
environment, and are delivered with TLER / CCTL / ERC enabled to
prevent them from falling out of arrays when they hit a read error.

SAS drives will generally have a longer warranty than desktop drives.

The SMART command set in ATA-7 and the ATA-8 spec should eliminate the
distinction, but until it's fully supported by manufacturers desktop
drives may not degrade as gracefully in an array when hitting an
error. From what I've read, both WD and Seagate desktop drives ignore
the ERC command. Samsung drives are reported to work, and I'm not sure
about Hitachi.

So far as backplanes are concerned - You can connect the backplane
with SAS and still use SATA drives.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Roy Sigurd Karlsbakk
- Brandon High bh...@freaks.com skrev:

 SAS drives are generally intended to be used in a multi-drive / RAID
 environment, and are delivered with TLER / CCTL / ERC enabled to
 prevent them from falling out of arrays when they hit a read error.
 
 SAS drives will generally have a longer warranty than desktop drives.

With 2TB drives priced at €150 or lower, I somehow think paying for drive 
lifetime is far more expensive than getting a few more drives and adding redundancy.

Just my 2c

roy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 10 and ZFS dedupe status

2010-04-26 Thread Roy Sigurd Karlsbakk
- Neil Simpson neil.simp...@sun.com skrev:
 I'm pretty sure Solaris 10 update 9 will have zpool version 22 so WILL
 have dedup.

Interesting - from where do you have this information?

roy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Dave Pooser
On 4/26/10 10:10 AM, Richard Elling richard.ell...@gmail.com wrote:

 SAS shines with multiple connections to one or more hosts.  Hence, SAS
 is quite popular when implementing HA clusters.

So that would be how one builds something like the active/active controller
failover in standalone RAID boxes. Is there a good resource on doing
something like that with an OpenSolaris storage server? I could see that as
a project I might want to attempt.
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Bob Friesenhahn

On Mon, 26 Apr 2010, Roy Sigurd Karlsbakk wrote:


SAS drives will generally have a longer warranty than desktop drives.


With 2TB drives priced at €150 or lower, I somehow think paying for 
drive lifetime is far more expensive than getting a few more drives 
and add redundancy


This really depends on if you are willing to pay in advance, or pay 
after the failure.  Even with redundancy, the cost of a failure may be 
high due to loss of array performance and system administration time. 
Array performance may go into the toilet during resilvers, depending 
on the redundancy configuration and the type of drives used.


All types of drives fail but typical SATA drives fail more often.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-26 Thread tim Kries
I found the VHD specification here:

http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc

I am not sure if I understand it right, but it seems like data on disk gets packed 
into the VHD with no empty space, so even a slight difference at the beginning of 
the file will shift everything that follows and ruin the block alignment for 
block-based dedup.

As I am not an expert on file systems, someone with more expertise would be 
appreciated to look at this.

Would be a real shame.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help:Is zfs-fuse's performance is not good

2010-04-26 Thread Brandon High
On Mon, Apr 26, 2010 at 9:43 AM, Roy Sigurd Karlsbakk r...@karlsbakk.net 
wrote:
 The zfs fuse project will give you most of the nice zfs stuff, but it 
 probably won't give you the same performance. I don't think opensolaris has 
 been compared to FUSE ZFS, but it might be interesting to see that.

AFAIK zfs-fuse hasn't been updated recently, so it's implementing a
version of zfs from over a year ago. There have been numerous
performance and stability improvements in that time.

If you really, really want to use zfs and linux, run OpenSolaris and
set up a linux xen or Virtualbox instance.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Making ZFS better: zfshistory

2010-04-26 Thread Ragnar Sundblad

On 25 apr 2010, at 20.12, Richard Elling wrote:

 On Apr 25, 2010, at 5:45 AM, Edward Ned Harvey wrote:
 
 From: Richard Elling [mailto:richard.ell...@gmail.com]
 Sent: Saturday, April 24, 2010 7:42 PM
 
 Next,
 mv /a/e /a/E
 ls -l a/e/.snapshot/snaptime
 
 ENOENT?
 
 ls -l a/E/.snapshot/snapname/d.txt
 
 this should be ENOENT because d.txt did not exist in a/E at snaptime.
 
 Incorrect.  
 
 E did exist.  Inode 12345 existed, but it had a different name at the time
 of snapshot.  Therefore, 
 a/e/.snapshot/snapname/c/d.txt  is the file at the time of snapshot.
 But these are also the same thing:
 a/E/.snapshot/snapname/c/d.txt
 a/E/c/.snapshot/snapname/d.txt
 
 OK, I'll believe you.
 
 How about this?
 
   mv a/E/c a/c
   mv a/E a/c
   mv a/c a/E
 
 now a/E/.snapshot/snapname/c/d.txt is ENOENT, correct?

Sadly I can't test it myself right now, maybe someone else can,
but I'd expect:
[start: we have a file: a/E/c/d.txt]
[snap1]
mv a/E/c a/c
[snap2]
mv a/E a/c
mv a/c a/E
would result in:
 a/.snapshot/snap1/E/c/d.txt
 a/.snapshot/snap2/E/ (empty)
 a/.snapshot/snap2/c/d.txt
 a/E/.snapshot/snap1/c/d.txt
 a/E/.snapshot/snap2/ (empty)
 a/E/ (empty)

Wouldn't that be logical, and what would be the problem?

 It would be very annoying if you could have a directory named foo which
 contains all the snapshots for its own history, and then mv foo bar and
 suddenly the snapshots all disappear.  This is not the behavior.
 
 The behavior is:  If you mv foo bar then the snapshots which were
 previously accessible under foo are now accessible under bar.  However, if
 you look in the snapshot of foo's parent, then you will see foo and not
 bar.   Just the way it would have looked, at the time of the snapshot.
 
 
 The only way I know to describe this is that the path is lost.
 In other words, you cannot say ../.snapshot/snapname/self is the same as
 self/.snapshot/snapname, thus the relationship previously described as:
 
   Snapshots are taken.  You can either file.txt via any of the following:
 /root/.snapshot/branch/leaf/file.txt
 /root/branch/.snapshot/leaf/file.txt
 /root/branch/leaf/.snapshot/file.txt
 
 is not guaranteed to be correct.

No, not if the hierarchy is changed between the snapshots, I think
it was just a way to illustrate how the .snapshot directories work.

It isn't in zfs either, if the example above would be a zfs, we would
have:
a/.zfs/snapshot/snap1/E/c/d.txt
a/.zfs/snapshot/snap2/c/d.txt
a/E/ (empty)

I still don't understand why the OnTap model is losing more paths
than zfs.

I'd be happy if you could take one more shot at explaining.

/ragge

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to delegate zfs snapshot destroy to users?

2010-04-26 Thread Vladimir Marek
Hi Cindy,

 The create-time permissions do not provide the correct permissions for
 destroying descendent datasets, such as clones.
 
 See example 9-5 in this section that describes how to use zfs allow -d
 option to grant permissions on descendent datasets:
 
 http://docs.sun.com/app/docs/doc/819-5461/gebxb?l=ena=view

Ah I was missing the fact, that subsequent snapshots inherit access
modes. So simple

zfs allow -d -g staff mount,destroy rpool/vm

fixed things for me.

Thank you !
-- 
Vlad
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Making an rpool smaller?

2010-04-26 Thread Brandon High
On Fri, Apr 16, 2010 at 4:41 PM, Brandon High bh...@freaks.com wrote:
 When I set up my opensolaris system at home, I just grabbed a 160 GB
 drive that I had sitting around to use for the rpool.

Just to follow up, after testing in Virtualbox, my initial plan is
very close to what worked. This is what I did:

1. Shutdown the system and attach the new drives.
2. Reboot from LiveCD or USB installer.
3. Run 'format' to set up the new drive(s).
4. zpool create -f -R /mnt/rpool_new rpool_new ${NEWDRIVE_DEV}s0
5. zpool import -o ro -R /mnt/rpool_old -f rpool
6. zfs send all datasets from rpool to rpool_new (see the sketch after this list)
7. installgrub /boot/grub/stage1 /boot/grub/stage2
/dev/rdsk/${NEWDRIVE_DEV}s0 on the ssd
8. zfs mount rpool_new/ROOT/snv_133 and delete
/mnt/rpool_new/etc/zfs/zpool.cache
9. zfs export the rpool and rpool_new
10. 'zpool import -R /mnt/rpool new_rpool rpool' to rename the pool.
(Not needed except to be OCD)
11. 'zpool export rpool'
12. Disconnect the original drive and boot from your new root.
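A sketch of what step 6 might look like (the exact command wasn't shown; the
snapshot name is arbitrary):

  zfs snapshot -r rpool@migrate
  zfs send -R rpool@migrate | zfs recv -Fdu rpool_new

-R sends the whole hierarchy with its properties, and -u on the receive keeps
the new datasets from mounting on top of the live ones.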

After that, it just worked. I tested it again with a physical box that
boots off of USB thumb drives as well. The only caveat with that is
you must use 'format -e' to partition the thumb drives. Oh, and wait a
LONG time, because most flash drives are really, really slow.

You could also do this from a non-LiveCD environment, but the name
rpool may already be in use.

If you move the new drive to the original's port, you don't need to
delete the zpool.cache. It would be nice if there was a boot flag you
could use to ignore the zpool.cache so you don't have to boot into
another environment when the device moves.

Another benefit of doing the above is that you can enable compression
and dedup on the rpool prior to the send, which gives you creamy
compressed dedup goodness on your entire rpool. No matter how
tempting, don't use gzip-9 compression. I learned the hard way that
grub doesn't support it.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Roy Sigurd Karlsbakk
 This really depends on if you are willing to pay in advance, or pay 
 after the failure.  Even with redundancy, the cost of a failure may be
 
 high due to loss of array performance and system administration time.
 
 Array performance may go into the toilet during resilvers, depending 
 on the redundancy configuration and the type of drives used.
 
 All types of drives fail but typical SATA drives fail more often.

Failure ratio does not depend on interface. Enterprise grade SATA drives have 
the same build quality as with their SAS brothers and sisters. With RAIDz2 or 
-3, you're quite sure things will work fine even after a disk failure, and the 
performance penalty isn't that bad. Choosing SAS over SATA for a single setup 
must be more of a religious approach

roy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Gary Mills
On Mon, Apr 26, 2010 at 01:32:33PM -0500, Dave Pooser wrote:
 On 4/26/10 10:10 AM, Richard Elling richard.ell...@gmail.com wrote:
 
  SAS shines with multiple connections to one or more hosts.  Hence, SAS
  is quite popular when implementing HA clusters.
 
 So that would be how one builds something like the active/active controller
 failover in standalone RAID boxes. Is there a good resource on doing
 something like that with an OpenSolaris storage server? I could see that as
 a project I might want to attempt.

This is interesting.  I have a two-node SPARC cluster that uses a
multi-initiator SCSI array for shared storage.  As an application
server, it needs only two disks in the array.  They are a ZFS mirror.
This all works quite nicely under Sun Cluster.

I'd like to duplicate this configuration with two small x86 servers
and a small SAS array, also with only two disks.  It should be easy to
find a pair of 1U servers, but what's the smallest SAS array that's
available?  Does it need an array controller?  What's needed on the
servers to connect to it?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Spare in use althought disk is healthy ?

2010-04-26 Thread Lutz Schumann
Hello list, 

a pool shows some strange status: 

volume: zfs01vol
 state: ONLINE
 scrub: scrub completed after 1h21m with 0 errors on Sat Apr 24 04:22:38
2010
config:

NAME   STATE READ WRITE CKSUM
zfs01vol   ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c2t4d0 ONLINE   0 0 0
c3t4d0 ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c2t5d0 ONLINE   0 0 0
c3t5d0 ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c2t8d0 ONLINE   0 0 0
c3t8d0 ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c2t9d0 ONLINE   0 0 0
c3t9d0 ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c2t12d0ONLINE   0 0 0
spare  ONLINE   0 0 0
  c3t12d0  ONLINE   0 0 0
  c3t21d0  ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c2t13d0ONLINE   0 0 0
c3t13d0ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c2t16d0ONLINE   0 0 0
c3t16d0ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c2t17d0ONLINE   0 0 0
c3t17d0ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c2t20d0ONLINE   0 0 0
c3t20d0ONLINE   0 0 0
logs   ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c2t0d0 ONLINE   0 0 0
c3t0d0 ONLINE   0 0 0
  mirror   ONLINE   0 0 0
c2t1d0 ONLINE   0 0 0
c3t1d0 ONLINE   0 0 0
cache
  c0t0d0   ONLINE   0 0 0
  c0t1d0   ONLINE   0 0 0
  c0t2d0   ONLINE   0 0 0
spares
  c2t21d0  AVAIL
  c3t21d0  INUSE currently in use

The spare is in use, although there is no failed disk in the pool.

Can anyone interpret this ? Is this a bug ? 

Thanks, 
Robert
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Spare in use althought disk is healthy ?

2010-04-26 Thread Ian Collins

On 04/27/10 09:41 AM, Lutz Schumann wrote:

Hello list,

a pool shows some strange status:

volume: zfs01vol
  state: ONLINE
  scrub: scrub completed after 1h21m with 0 errors on Sat Apr 24 04:22:38
   



   mirror   ONLINE   0 0 0
 c2t12d0ONLINE   0 0 0
 spare  ONLINE   0 0 0
   c3t12d0  ONLINE   0 0 0
   c3t21d0  ONLINE   0 0 0
   



 spares
   c2t21d0  AVAIL
   c3t21d0  INUSE currently in use

The spare is in use, altought there is no failed disk in the pool.

Can anyone interpret this ? Is this a bug ?

   

Was the drive c3t12d0 replaced or faulty at some point?

You should be able to detach the spare.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk
 
 With 2TB drives priced at €150 or lower, I somehow think paying for
 drive lifetime is far more expensive than getting a few more drives and
 add redundancy

If you have a 48-disk enclosure, and you've configured 6x 8disk raid-6 or 
raidz2 volumes, how do you add more disks to increase redundancy?

Point is:  Adding disks often means adding slots, and since adding slots ain't 
free, it generally translates not into adding slots but into decreasing the usable 
capacity...  And keeping an inventory of offline spares for the sake of immediate 
replacement upon failure.

Also, you'll only find the cheapest generic disks available at the stated 
price.  If you have one of those disks fail 6 months from now, you will not be 
able to purchase that model drive again.  (God forbid you should have to 
replace one 3 yrs from now, when the current implementation of SAS or SATA 
isn't even for sale anymore, and you can't even get a suitable equivalent 
replacement.)

I hate it whenever people over-simplify and say disk is cheap.

Also, if you've got all those disks in an array, and their MTBF is ... let's 
say 25,000 hours ... then 3 yrs later when they begin to fail, they have a 
tendency to all fail around the same time, which increases the probability of 
exceeding your designed level of redundancy.

I recently bought 2x 1Tb disks for my sun server, for $650 each.  This was 
enough to make me do the analysis, why am I buying sun branded overpriced 
disks?  Here is the abridged version:

We recently had an Apple XRAID system lose a disk.  It's 3 yrs old.  It uses 
500G ATA-133 disks, which are not available from anywhere at any price...  
Except Apple was willing to sell us one for $1018.  Naturally, we declined to 
make that purchase.  We did find some disks available from various sources, 
which should be equivalent, but not Apple branded or certified; functional 
equivalents but not identical.  Prices around $200 to $300.

I asked around among Apple admins who had used generic disks in their Xraid 
systems.  About 50% said they used generic disks with no problem.  The other 50% 
were mixed between "we used generic disks, seemed to work, but had strange 
problems like horrible performance or disks suddenly going offline and coming back 
online again spontaneously" and "we tried to use generic disks, but the system 
refused to even acknowledge the disk present in the system."

Also, take a look in the present mailing list, many people complaining of 
drives with firmwares that incorrectly acknowledge cache flushes before they're 
actually flushed.  Even then, we're talking about high end Intel SSD's.  And 
the consequence of incorrect firmware is data loss.  Maybe even pool loss.

The reason why we pay for overpriced disks is to get the manufacturer's seal of 
approval, the Apple or Sun or Dell branded firmware.  The availability of mfgr 
warranties, the long-term supportability.  It costs about 4x-5x more per disk 
to buy up front, but since you have to buy 2x as many generic disks (for the 
sake of spare inventory availability) you're only paying 2x overall, and you 
can rest much more assured in the stability.

Even at the higher hardware price, the value of the data is presumed to be much 
greater than the cost of the hardware.  So then it's easy to justify higher 
cost hardware, with the belief it'll be somehow lower data risk.

Sometimes people will opt for cheaper.  Sometimes people will opt for lower 
risk.

I just hate it when people oversimplify and say disk is cheap.  That is so 
over simplified, it doesn't benefit anyone.

end rant  begin breathe ...

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Spare in use althought disk is healthy ?

2010-04-26 Thread Cindy Swearingen

Hi Lutz,

You can try the following commands to see what happened:

1. Someone else replaced the disk with a spare, which would be
recorded in this command:

# zpool history -l zfs01vol

2. If the disk had some transient outage then maybe the spare kicked
in. Use the following command to see if something happened to this
disk:

# fmdump -eV

This command might produce a lot of output, but look for c3t12d0
occurrences.

3. If the c3t12d0 disk is okay, try detaching the spare back to the
spare pool like this:

# zpool detach zfs01vol c3t21d0

Thanks,

Cindy

On 04/26/10 15:41, Lutz Schumann wrote:
Hello list, 

a pool shows some strange status: 


volume: zfs01vol
 state: ONLINE
 scrub: scrub completed after 1h21m with 0 errors on Sat Apr 24 04:22:38 2010
config:

        NAME           STATE     READ WRITE CKSUM
        zfs01vol       ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t4d0     ONLINE       0     0     0
            c3t4d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t5d0     ONLINE       0     0     0
            c3t5d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t8d0     ONLINE       0     0     0
            c3t8d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t9d0     ONLINE       0     0     0
            c3t9d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t12d0    ONLINE       0     0     0
            spare      ONLINE       0     0     0
              c3t12d0  ONLINE       0     0     0
              c3t21d0  ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t13d0    ONLINE       0     0     0
            c3t13d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t16d0    ONLINE       0     0     0
            c3t16d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t17d0    ONLINE       0     0     0
            c3t17d0    ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t20d0    ONLINE       0     0     0
            c3t20d0    ONLINE       0     0     0
        logs           ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t0d0     ONLINE       0     0     0
            c3t0d0     ONLINE       0     0     0
          mirror       ONLINE       0     0     0
            c2t1d0     ONLINE       0     0     0
            c3t1d0     ONLINE       0     0     0
        cache
          c0t0d0       ONLINE       0     0     0
          c0t1d0       ONLINE       0     0     0
          c0t2d0       ONLINE       0     0     0
        spares
          c2t21d0      AVAIL
          c3t21d0      INUSE     currently in use

The spare is in use, although there is no failed disk in the pool.

Can anyone interpret this? Is it a bug?

Thanks, 
Robert

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disk/Partition replacement - do partition begin/end/offsets matter?

2010-04-26 Thread devsk
I went through with it and it worked fine. So, I could successfully move my ZFS 
device to the beginning of the new disk.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Data movement across filesystems within a pool

2010-04-26 Thread devsk
Could this be a future enhancement for ZFS? For example, provide a 'zfs move 
fs1/path1 fs2/path2' that would move the data without actually copying anything?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-26 Thread Brandon High
On Mon, Apr 26, 2010 at 8:51 AM, tim Kries tim.kr...@gmx.de wrote:
 I am kinda confused over the change of dedup ratio from changing the record 
 size, since it should dedup 256-bit blocks.

Dedup works on blocks of either recordsize or volblocksize. The
checksum is computed per block written, and those checksums are used to
dedup the data.

With a recordsize of 128k, two blocks with a one-byte difference would
not dedup at all. With an 8k recordsize, the same 128k of data spans 16
records, and only the record containing the changed byte differs, so 15
out of 16 blocks would still dedup. Repeat over the entire VHD.

Setting the record size equal to a multiple of the VHD's internal
block size and ensuring that the internal filesystem is block aligned
will probably help improve dedup ratios. So for an NTFS guest with
4k blocks, use a 4k, 8k or 16k record size, and ensure that when you
install into the VHD its partitions are block aligned to the
recordsize you're using.

VHD supports fixed-size and dynamic-size images. If you're using a
fixed image, the space is pre-allocated. This doesn't mean you'll
waste the unused space on ZFS with compression, since all those zeros
will take up almost no space. Your VHD file should remain
block-aligned, however. I'm not sure that a dynamic-size image will
stay block aligned where there is empty space. Using compress=zle will
compress only the zeros, with almost no CPU penalty.
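
A rough sketch of that tuning, using a hypothetical pool/dataset named
tank/vhd-store (adjust recordsize to match the guest filesystem's block size):

# zfs create -o recordsize=4k -o compression=zle tank/vhd-store
# zfs get recordsize,compression tank/vhd-store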

Using a COMSTAR iSCSI volume is probably an even better idea, since
you won't have the POSIX layer in the path, and you won't have the VHD
file header throwing off your block alignment.
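
A minimal sketch of that approach, assuming the COMSTAR services are already
enabled and using hypothetical names and sizes (use the LU GUID that sbdadm
prints in the add-view step):

# zfs create -V 100G -o volblocksize=4k tank/vhd-vol
# sbdadm create-lu /dev/zvol/rdsk/tank/vhd-vol
# stmfadm add-view <GUID-from-sbdadm>
# itadm create-target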

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on drives for ZIL/L2ARC?

2010-04-26 Thread Brandon High
On Mon, Apr 26, 2010 at 8:01 AM, Travis Tabbal tra...@tabbal.net wrote:
 At the end of my OP I mentioned that I was interested in L2ARC for dedupe. It 
 sounds like the DDT can get bigger than RAM and slow things to a crawl. Not 
 that I expect a lot from using an HDD for that, but I thought it might help. 
 I'd like to get a nice SSD or two for this stuff, but that's not in the 
 budget right now.

A large DDT will require a lot of random reads, which isn't an ideal
use case for a spinning disk. Plus, 10k disks are loud and hot.
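
If you want to see how big the DDT actually is before buying hardware for it,
zdb can report the table statistics; roughly, for a hypothetical pool named
tank (each entry costs on the order of a few hundred bytes of RAM):

# zdb -DD tank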

You can get a 30-40GB SSD for about $100 these days. It doesn't matter
whether a disk used for the L2ARC obeys cache flushes, etc. Regardless
of whether the host is shut down cleanly or not, the L2ARC starts cold.
It also doesn't matter if the data on it is corrupted, because a failed
checksum will cause ZFS to go back to the data disks.

As far as using 10k disks for a slog, it depends on what kind of
drives are in your pool and how it's laid out. If you have a wide
raidz stripe on slow disks, just about anything will help. If you've
got striped mirrors on fast disks, then it probably won't help much,
especially for what sounds like a server with a small number of
clients.

I've got an OCZ Vertex 30GB drive with a 1GB slice used for the slog
and the rest used for the L2ARC, which for ~$100 has been a nice
boost to NFS writes.
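
For reference, that split is just two slices and two zpool add commands; a
sketch along these lines, assuming the SSD shows up as c5t0d0 and has been
partitioned with format into s0 (1GB) and s1 (the remainder), on a
hypothetical pool named tank:

# zpool add tank log c5t0d0s0
# zpool add tank cache c5t0d0s1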

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread James C. McPherson

On 26/04/10 03:02 PM, Dave Pooser wrote:

I'm building another 24-bay rackmount storage server, and I'm considering
what drives to put in the bays. My chassis is a Supermicro SC846A, so the
backplane supports SAS or SATA; my controllers are LSI3081E, again
supporting SAS or SATA.

Looking at drives, Seagate offers an enterprise (Constellation) 2TB 7200RPM
drive in both SAS and SATA configurations; the SAS model offers one quarter
the buffer (16MB vs 64MB on the SATA model), the same rotational speed, and
costs 10% more than its enterprise SATA twin. (They also offer a Barracuda
XT SATA drive; it's roughly 20% less expensive than the Constellation drive,
but rated at 60% the MTBF of the others and a predicted rate of
nonrecoverable errors an order of magnitude higher.)

Assuming I'm going to be using three 8-drive RAIDz2 configurations, and
further assuming this server will be used for backing up home directories
(lots of small writes/reads), how much benefit will I see from the SAS
interface?


I would expect to see the SAS drives have built-in support
for multipathing, with no extra hardware required.
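
On Solaris that mostly comes down to enabling MPxIO for the HBA and seeing
what it reports; roughly, assuming an LSI mpt-based controller (stmsboot
requires a reboot to take effect):

# stmsboot -D mpt -e
# mpathadm list lu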

Also, hear yourself chanting "but SAS is more ENTERPRISEY"
over and over again :-)

I don't know of any other specific difference between Enterprise
SATA and SAS drives.


James C. McPherson
--
Senior Software Engineer, Solaris
Oracle
http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] rpool on ssd. endurance question.

2010-04-26 Thread Yuri Vorobyev

Hello.

If anybody has been using an SSD for rpool for more than half a year, can you 
post the SMART information for the Host Writes attribute?


I want to see how SSDs wear when used as system disks.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Daniel Carosone
On Mon, Apr 26, 2010 at 10:02:42AM -0700, Chris Du wrote:
 SAS: full duplex
 SATA: half duplex
 
 SAS: dual port
 SATA: single port (some enterprise SATA has dual port)
 
 SAS: 2 active channel - 2 concurrent write, or 2 read, or 1 write and 1 read
 SATA: 1 active channel - 1 read or 1 write
 
 SAS: Full error detection and recovery on both read and write
 SATA: error detection and recovery on write, only error detection on read

SAS:  Full SCSI TCQ
SATA: Lame ATA NCQ

--
Dan.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] rpool on ssd. endurance question.

2010-04-26 Thread Paul Gress

On 04/26/10 11:54 PM, Yuri Vorobyev wrote:

Hello.

If anybody has been using an SSD for rpool for more than half a year, can you 
post the SMART information for the Host Writes attribute?


I want to see how SSDs wear when used as system disks.



I'd be happy to; exactly what commands should I run?
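
For what it's worth, smartmontools is the usual way to pull that attribute;
something like the following should work, assuming smartctl is installed and
using a hypothetical device name (the -d sat,12 option may or may not be
needed, depending on the controller):

# smartctl -d sat,12 -A /dev/rdsk/c7t0d0s0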

Paul
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss