Re: [zfs-discuss] Server upgrade

2012-02-20 Thread David Dyer-Bennet

On Thu, February 16, 2012 11:18, Paul Kraus wrote:
 On Thu, Feb 16, 2012 at 11:42 AM, David Dyer-Bennet d...@dd-b.net wrote:

 I'm seriously thinking of going Nexenta, as I think it would let me be a
 little less of a sysadmin.  Solaris 11 express is tempting in its own
 way
 though, if I decide the price is tolerable.

 I looked at the Nexenta route, and while it is _very_ attractive,
 I need my home server to function as DHCP and DNS server as well (and
 a couple other services would be nice as well). Since Nexenta is a
 storage appliance, I could not go that route and get what I needed
 without hacking into it.

Ah, that might be a problem.  Not those specific services currently, but I
do now and then run things.  MRTG, maybe Nagios, are on the list to do
(though it's so much harder to get anything like that going on Solaris,
I'm tempted to run a linux virtual server; that would be on the same box
though, so still a problem).

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Server upgrade

2012-02-16 Thread David Dyer-Bennet

On Thu, February 16, 2012 08:54, Edward Ned Harvey wrote:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of David Dyer-Bennet

 While I'm not in need of upgrading my server at an emergency level, I'm
 starting to think about it -- to be prepared (and an upgrade could be
 triggered by a failure at this point; my server dates to 2006).

 There are only a few options for you to consider.  I don't know which ones
 support encryption, or which ones offer an upgrade path from your version
 of
 opensolaris, but I figure you can probably easily evaluate each of the
 options for your own purposes.

 No matter which you use, I assume you will be exporting the data pool, and
 later importing it.  But the OS will either need to be wiped and
 reinstalled
 from scratch, or obviously, follow your upgrade path (which has never
 worked
 for me; I invariably end up wiping the OS and reinstalling.  Good thing I
 keep documentation about how I configure my OS.)

This is already getting useful; "which has never worked for me", for
example, is the sort of observation I find informative, since I've been
seeing your name around here for some time and have the general impression
that you're not stupid or incompetent.

Yeah, I'll try to export and import the pool.  AND I'll have three current
backups on external drives, at least one out of the house and at least one
in the house :-).  I'm kind of fond of this data, and wouldn't like
anything to happen to it (I could recover some of the last decade of
photography from optical disks, with a lot of work, and the online copies
would remain but those aren't high-res).

 Nexenta, OpenIndiana, Solaris 11 Express (free version only permitted for
 certain uses, no regular updates available), or commercial Solaris.

 If you consider paying for solaris - at Oracle, you just pay them for An
 OS and they don't care which one you use.  Could be oracle linux,
 solaris,
 or solaris express.  I would recommend solaris 11 express based on
 personal
 experience.  It gets bugfixes and new features sooner than commercial
 solaris.

I was going to say the commercial version wasn't an option -- but on
consideration, I haven't done the research to determine that.  So that's a
task (how hard can it be to find out how much they want?).

Listing the options is extremely useful, in fact.  Even though I've heard
of all of them, seeing how you group things helps me too.

I'm seriously thinking of going Nexenta, as I think it would let me be a
little less of a sysadmin.  Solaris 11 express is tempting in its own way
though, if I decide the price is tolerable.


Re: [zfs-discuss] Server upgrade

2012-02-16 Thread David Dyer-Bennet

On Thu, February 16, 2012 13:31, Edward Ned Harvey wrote:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of David Dyer-Bennet

 This is already getting useful; "which has never worked for me", for
 example, is the sort of observation I find informative, since I've been
 seeing your name around here for some time and have the general
 impression that you're not stupid or incompetent.

 Just because I talk a lot doesn't mean I'm not stupid or incompetent.  ;-)

I resemble that remark!

But, slightly more seriously, I've read what you said, not just noticed
the volume :-).

 "Never worked for me", in this case, basically means I tried upgrading
 from one opensolaris to another... which went horribly wrong...  And even
 when applying system updates (paid commercial solaris 10 support, applying
 security patches etc.) those often cause problems too.  But I wouldn't
 call them "horribly wrong".

I've gotten at least that to work a few times.  But for me, keeping up
with OS upgrades is one of the most important sysadmin tasks.  Otherwise,
you're leaving unpatched vulnerabilities sitting around.

 I was going to say the commercial version wasn't an option -- but on
 consideration, I haven't done the research to determine that.  So that's
 a
 task (how hard can it be to find out how much they want?).

 You mean, how much it costs?  http://oracle.com  click on Store, and
 Solaris.  Looks like $1,000 per socket per year for 1-4 sockets.

You beat me to it.  And if that's the order of magnitude, then I was right
the first time: the commercial versions are completely out of the
question.  I might, if I felt really friendly towards Oracle, consider a
one-shot payment of 1/10 of that, or maybe a little more :-).


Re: [zfs-discuss] Server upgrade

2012-02-16 Thread David Dyer-Bennet

On Wed, February 15, 2012 18:06, Brandon High wrote:
 On Wed, Feb 15, 2012 at 9:16 AM, David Dyer-Bennet d...@dd-b.net wrote:
 Is there an upgrade path from (I think I'm running Solaris Express) to
 something modern?  (That could be an Oracle distribution, or the free

 There *was* an upgrade path from snv_134 to snv_151a (Solaris 11
 Express) but I don't know if Oracle still supports it. There was an
 intermediate step or two along the way (snv_134b I think?) to move
 from OpenSolaris to Oracle Solaris.

 As others mentioned, you could jump to OpenIndiana from your current
 version. You may not be able to move between OI and S11 in the future,
 so it's a somewhat important decision.

Thanks.  Given the pricing for commercial Solaris versions, I don't think
moving to them is likely to ever be important to me.  It looks like OI and
Nexenta are the viable choices I have to look at.


[zfs-discuss] Server upgrade

2012-02-15 Thread David Dyer-Bennet
While I'm not in need of upgrading my server at an emergency level, I'm
starting to think about it -- to be prepared (and an upgrade could be
triggered by a failure at this point; my server dates to 2006).

I'm actually more concerned with software than hardware.  My load is
small, the current hardware is handling it no problem.  I don't see myself
as a candidate for dedup, so I don't need to add huge quantities of RAM. 
I'm handling compression on backups just fine (the USB external disks are
the choke-point, so compression actually speeds up the backups).

I'd like to be on a current software stream that I can easily update with
bug-fixes and new features.  The way I used to do that got broken in the
Oracle takeover.

I'm interested in encryption for my backups, if that's functional (and
safe) in current software versions.  I take copies off-site, so that's a
useful precaution.

Whatever I do, I'll of course make sure my backups are ALL up-to-date and
at least one is back off-site before I do anything drastic.

Is there an upgrade path from (I think I'm running Solaris Express) to
something modern?  (That could be an Oracle distribution, or the free
software fork, or some Nexenta distribution; my current data pool is 1.8T,
and I don't expect it to grow terribly fast, so the fully-featured free
version fits my needs for example.)  Upgrading might perhaps save me from
changing all the user passwords (half a dozen, not a huge problem) and
software packages I've added.

(uname -a says SunOS fsfs 5.11 snv_134 i86pc i386 i86pc).

Or should I just export my pool and do a from-scratch install of
something?  (Then recreate the users and install any missing software. 
I've got some cron jobs, too.)

AND, what something should I upgrade to or install?  I've tried a couple
of times to figure out the alternatives and it's never really clear to me
what my good options are.



Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread David Dyer-Bennet

On Tue, November 15, 2011 20:08, Edward Ned Harvey wrote:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Anatoly

 The speed of send/recv is around 30-60 MBytes/s for initial send and
 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk

 I suggest watching zpool iostat before, during, and after the send to
 /dev/null.  Actually, I take that back - zpool iostat seems to measure
 virtual IOPS, as I just did this on my laptop a minute ago, I saw 1.2k
 ops,
 which is at least 5-6x higher than my hard drive can handle, which can
 only
 mean it's reading a lot of previously aggregated small blocks from disk,
 which are now sequentially organized on disk.  How do you measure physical
 iops?  Is it just regular iostat?  I have seriously put zero effort into
 answering this question (sorry.)

 I have certainly noticed a delay in the beginning, while the system thinks
 about stuff for a little while to kick off an incremental... And it's
 acknowledged and normal that incrementals are likely fragmented all over
 the
 place so you could be IOPS limited (hence watching the iostat).

 Also, whenever I sit and watch it for long times, I see that it varies
 enormously.  For 5 minutes it will be (some speed), and for 5 minutes it
 will be 5x higher...

 Whatever it is, it's something we likely are all seeing, but probably just
 ignoring.  If you can find it in your heart to just ignore it too, then
 great, no problem.  ;-)  Otherwise, it's a matter of digging in and
 characterizing to learn more about it.

I see rather variable io stats while sending incremental backups.  The
receiver is a USB disk, so fairly slow, but I get 30MB/s in a good
stretch.  I'm compressing the ZFS filesystem on the receiving end, but
much of my content is already-compressed photo files, so it doesn't make a
huge difference.  It helps some, though, and at 30MB/s there's no shortage
of CPU horsepower to handle the compression.
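For anyone setting up something similar, the receive-side compression is
just a property on the backup dataset; the dataset name below is made up,
so substitute your own:

```shell
# Turn on compression for the dataset the backups land in
# ("backup/data" is a hypothetical name).
zfs set compression=on backup/data

# After some data has arrived, see how much it actually saved:
zfs get compressratio backup/data
```

Note that only writes made after the property is set get compressed;
blocks already on disk stay as they were.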

The raw files are around 12MB each, probably not fragmented much (they're
just copied over from memory cards).  For a small number of the files,
there's a photoshop file that's much bigger (sometimes more than 1GB, if
it's a stitched panorama with layers of changes).  And then there are
sidecar XMP files, mostly two per image, and for most of them
web-resolution images, 100kB.



Re: [zfs-discuss] Remove corrupt files from snapshot

2011-11-16 Thread David Dyer-Bennet

On Tue, November 15, 2011 10:07, sbre...@hotmail.com wrote:


 Would it make sense to do zfs scrub regularly and have a report sent,
 i.e. once a day, so discrepancy would be noticed beforehand? Is there
 anything readily available in the Freebsd ZFS package for this?

If you're not scrubbing regularly, you're losing out on one of the key
benefits of ZFS.  In nearly all fileserver situations, a good amount of
the content is essentially archival, infrequently accessed but important
now and then.  (In my case it's my collection of digital and digitized
photos.)

A weekly scrub combined with a decent backup plan will detect bit-rot
before the backups with the correct data cycle into the trash (and, with
redundant storage like mirroring or RAID, the scrub will probably be able
to fix the error without resorting to restoring files from backup).
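To the original question about automating this: I don't know the FreeBSD
package specifics, but a minimal cron-based version would look something
like the following (pool name and schedule are assumptions, not a recipe):

```shell
# Entries for root's crontab (crontab -e); pool name "tank" is an
# assumption -- substitute your own.

# Sunday 03:00: kick off a scrub (it runs in the background).
0 3 * * 0  /sbin/zpool scrub tank

# Every morning 08:00: mail the status; "zpool status -x" prints a
# one-liner when all pools are healthy, the full report otherwise.
0 8 * * *  /sbin/zpool status -x | mail -s "zpool status" root
```

I believe FreeBSD's periodic(8) framework also has ZFS knobs (e.g.
daily_status_zfs_enable in /etc/periodic.conf), which may be the more
idiomatic route there.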


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread David Dyer-Bennet

On Tue, November 15, 2011 17:05, Anatoly wrote:
 Good day,

 The speed of send/recv is around 30-60 MBytes/s for initial send and
 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
 to 100+ disks in pool. But the speed doesn't vary in any degree. As I
 understand 'zfs send' is a limiting factor. I did tests by sending to
 /dev/null. It worked out too slow and absolutely not scalable.
 None of the CPU/memory/disk activity was at peak load, so there is room
 for improvement.

What you're probably seeing with incremental sends is that the disks being
read are hitting their IOPS limits.  Zfs send does random reads all over
the place -- every block that's changed since the last incremental send is
read, in TXG order.  So that's essentially random reads all over the disk.
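One rough way to see the effect on your own box is to watch both the
pool-level and per-device counters while a send runs to /dev/null (pool
and snapshot names below are made up):

```shell
# Kick off an incremental send, discarding the stream so only the
# read side is exercised:
zfs send -i tank/fs@mon tank/fs@tue > /dev/null &

# Pool-level ("virtual") operations, 5-second samples:
zpool iostat tank 5

# Per-device physical operations and service times (Solaris iostat);
# if %b sits near 100 while throughput stays low, you're IOPS-bound:
iostat -xn 5
```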



Re: [zfs-discuss] Adding mirrors to an existing zfs-pool]

2011-07-28 Thread David Dyer-Bennet

On Tue, July 26, 2011 09:55, Cindy Swearingen wrote:

 Subject: Re: [zfs-discuss] Adding mirrors to an existing zfs-pool
 Date: Tue, 26 Jul 2011 08:54:38 -0600
 From: Cindy Swearingen cindy.swearin...@oracle.com
 To: Bernd W. Hennig consult...@hennig-consulting.com
 References: 342994905.11311662049567.JavaMail.Twebapp@sf-app1

 Hi Bernd,

 If you are talking about attaching 4 new disks to a non redundant pool
 with 4 disks, and then you want to detach the previous disks then yes,
 this is possible and a good way to migrate to new disks.

 The new disks must be the equivalent size or larger than the original
 disks.

 See the hypothetical example below.

 If you mean something else, then please provide your zpool status
 output.

 Thanks,

 Cindy


 # zpool status tank
   pool: tank
   state: ONLINE
   scan: resilvered 1018K in 0h0m with 0 errors on Fri Jul 22 15:54:52 2011
 config:

  NAMESTATE READ WRITE CKSUM
  tankONLINE   0 0 0
  c4t1d0  ONLINE   0 0 0
  c4t2d0  ONLINE   0 0 0
  c4t3d0  ONLINE   0 0 0
  c4t4d0  ONLINE   0 0 0


 # zpool attach tank c4t1d0 c6t1d0
 # zpool attach tank c4t2d0 c6t2d0
 # zpool attach tank c4t3d0 c6t3d0
 # zpool attach tank c4t4d0 c6t4d0

 The above syntax will create 4 mirrored pairs of disks.

I was somewhat surprised when I first learned of this.  In my head, I now
remember it as "a single disk in ZFS is treated as a one-disk mirror."
Previously, in my head, single disks were very different objects from
mirrors!

I'm still impressed by the ability to attach and detach arbitrary numbers
of disks to mirrors.  It makes upgrading mirrored disks very very safe,
since I can perform the entire procedure without ever reducing redundancy
below my starting point (using the classic attach new, resilver, detach
old sequence, repeated for however many disks were in the original
mirror).
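For the archives, the "attach new, resilver, detach old" sequence for one
disk of a pair looks like this (pool and device names hypothetical):

```shell
# Attach the new disk as an extra side of the existing mirror; the
# vdev temporarily becomes a three-way mirror:
zpool attach tank c4t1d0 c6t1d0

# Watch the resilver; do NOT detach anything until it reports done:
zpool status tank

# Only after the resilver completes, drop the old disk:
zpool detach tank c4t1d0
```

At no point does redundancy fall below the original two-way mirror, which
is the whole attraction.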



Re: [zfs-discuss] SSD vs hybrid drive - any advice?

2011-07-26 Thread David Dyer-Bennet

On Mon, July 25, 2011 10:03, Orvar Korvar wrote:
 There is at least a common perception (misperception?) that devices
 cannot process TRIM requests while they are 100% busy processing other
 tasks.

 Just to confirm; SSD disks can do TRIM while processing other tasks?

Processing the request just means flagging the blocks, though, right? 
And the actual benefits only accrue if the garbage collection / block
reshuffling background tasks get a chance to run?



Re: [zfs-discuss] Quick zfs send -i performance questions

2011-05-04 Thread David Dyer-Bennet

On Tue, May 3, 2011 19:39, Rich Teer wrote:

 I'm playing around with nearline backups using zfs send | zfs recv.
 A full backup made this way takes quite a lot of time, so I was
 wondering: after the initial copy, would using an incremental send
 (zfs send -i) make the process much quick because only the stuff that
 had changed between the previous snapshot and the current one be
 copied?  Is my understanding of incremental zfs send correct?

Yes, that works.  In my setup, a full backup takes 6 hours (about 800GB of
data to an external USB 2 drive), the incremental maybe 20 minutes even if
I've added several gigabytes of images.
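Concretely, the incremental cycle is just (dataset and snapshot names
hypothetical):

```shell
# Take a new snapshot, then send only the delta since the previous one:
zfs snapshot tank/data@tue
zfs send -i tank/data@mon tank/data@tue | zfs recv -F backup/data

# Next time, @tue becomes the "from" snapshot, and so on.
```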

 Also related to this is a performance question.  My initial test involved
 copying a 50 MB zfs file system to a new disk, which took 2.5 minutes
 to complete.  The strikes me as being a bit high for a mere 50 MB;
 are my expectation realistic or is it just because of my very budget
 concious set up?  If so, where's the bottleneck?

In addition to issues others have mentioned, the way incremental send
works, it follows the order the blocks were written in rather than disk
order, so that can sometimes be bad.


Re: [zfs-discuss] Network video streaming [Was: Re: X4540 no next-gen product?]

2011-04-11 Thread David Dyer-Bennet
On 04/08/2011 07:22 PM, J.P. King wrote:

 No, I haven't tried a S7000, but I've tried other kinds of network
 storage and from a design perspective, for my applications, it doesn't
 even make a single bit of sense. I'm talking about high-volume
 real-time
 video streaming, where you stream 500-1000 (x 8Mbit/s) live streams
 from
 a machine over UDP. Having to go over the network to fetch the data
 from
 a different machine is kind of like building a proxy which doesn't
 really do anything - if the data is available from a different machine
 over the network, then why the heck should I just put another machine
 in
 the processing path? For my applications, I need a machine with as few
 processing components between the disks and network as possible, to
 maximize throughput, maximize IOPS and minimize latency and jitter.

Amusing history here -- the Thumper was developed at Kealia specifically
for their streaming video server.  Sun then bought them, and continued the
video server project until Oracle ate them (the Sun Streaming Video
Server).  That product supported 80,000 (not a typo) 4 megabit/sec video
streams if fully configured.  (Not off a single thumper, though, I don't
believe.)

However, there was a custom hardware board handling streaming, into
multiple line cards with multiple 10G optical ethernet interfaces.  And a
LOT of buffer memory; the card could support 2TB of RAM, though I believe
real installations were using 512GB.

Data got from the Thumpers to the streaming board over Ethernet, though. 
In big chunks -- 10MB maybe?  (Been a while; I worked on the user
interface level, but had little to do with the streaming hardware.)



Re: [zfs-discuss] ZFS send/receive to Solaris/FBSD/OpenIndiana/Nexenta VM guest?

2011-04-06 Thread David Dyer-Bennet

On Tue, April 5, 2011 14:38, Joe Auty wrote:

 Migrating to a new machine I understand is a simple matter of ZFS
 send/receive, but reformatting the existing drives to host my existing
 data is an area I'd like to learn a little more about. In the past I've
 asked about this and was told that it is possible to do a send/receive
 to accommodate this, and IIRC this doesn't have to be to a ZFS server
 with the same number of physical drives?

The internal structure of the pool (how many vdevs, and what kind) is
irrelevant to zfs send / receive.  So I routinely send from a pool of 3
mirrored pairs of disks to a pool of one large drive, for example (it's
how I do my backups).   I've also gone the other way once :-( (It's good
to have backups).

I'm not 100.00% sure I understand what you're asking; does that answer it?

Mind you, this can be slow.  On my little server (under 1TB filled) the
full backup takes about 7 hours (largely because the single large external
drive is a USB drive; the bottleneck is the USB).  Luckily an incremental
backup is rather faster.
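For anyone curious, my full-backup cycle is roughly the following sketch
(pool, snapshot, and device names are all made up):

```shell
# One-time: make a pool on the external USB disk:
zpool create backup c5t0d0

# Recursive snapshot of everything, then a replication stream of it;
# the source pool's vdev layout is irrelevant to the stream:
zfs snapshot -r tank@backup1
zfs send -R tank@backup1 | zfs recv -Fdu backup

# Export before unplugging, so the disk can go off-site cleanly:
zpool export backup
```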

 How about getting a little more crazy... What if this entire server
 temporarily hosting this data was a VM guest running ZFS? I don't
 foresee this being a problem either, but with so much at stake I thought
 I would double check :) When I say temporary I mean simply using this
 machine as a place to store the data long enough to wipe the original
 server, install the new OS to the original server, and restore the data
 using this VM as the data source.

I haven't run ZFS extensively in VMs (mostly just short-lived small test
setups).  From my limited experience, and what I've heard on the list,
it's solid and reliable, though, which is what you need for that
application.

 Also, more generally, is ZFS send/receive mature enough that when you do
 data migrations you don't stress about this? Piece of cake? The
 difficulty of this whole undertaking will influence my decision and the
 whole timing of all of this.

A full send / receive has been reliable for a long time.  With a real
(large) data set, it's often a long run.  It's often done over a network,
and any network outage can break the run, and at that point you start
over, which can be annoying.  If the servers themselves can't stay up for
10 or 20 hours you presumably aren't ready to put them into production
anyway :-).

 I'm also thinking that a ZFS VM guest might be a nice way to maintain a
 remote backup of this data, if I can install the VM image on a
 drive/partition large enough to house my data. This seems like it would
 be a little less taxing than rsync cronjobs?

I'm a big fan of rsync, in cronjobs or wherever.  What it won't do is
properly preserve ZFS ACLs, and ZFS snapshots, though.  I moved from using
rsync to using zfs send/receive for my backup scheme at home, and had
considerable trouble getting that all working (using incremental
send/receive when there are dozens of snapshots new since last time).  But
I did eventually get up to recent enough code that it's working reliably
now.

If you can provision big enough data stores for your VM to hold what you
need, that seems a reasonable approach to me, but I haven't tried anything
much like it, so my opinion is, if you're very lucky, maybe worth what you
paid for it.


Re: [zfs-discuss] [illumos-Developer] zfs incremental send?

2011-03-30 Thread David Dyer-Bennet

On Tue, March 29, 2011 07:39, Richard Elling wrote:
 On Mar 29, 2011, at 3:10 AM, Roy Sigurd Karlsbakk r...@karlsbakk.net
 wrote:

 - Original Message -
 On 2011-Mar-29 02:19:30 +0800, Roy Sigurd Karlsbakk
 r...@karlsbakk.net wrote:
 Is it (or will it) be possible to do a partial/resumable zfs
 send/receive? If having 30TB of data and only a gigabit link, such
 transfers takes a while, and if interrupted, will require a
 re-transmit of all the data.

 zfs send/receive works on snapshots: The smallest chunk of data that
 can be sent/received is the delta between two snapshots. There's no
 way to do a partial delta - defining the endpoint of a partial
 transfer or the starting point for resumption is effectively a
 snapshot.

 I know that's how it works, I'm merely pointing out that changing this
 to something resumable would be rather nice, since an initial transfer
 or 30 or 300 terabytes may easily be interrupted.

 In the UNIX tradition, the output and input are pipes. This allows you to
 add whatever transport you'd like for moving the bits. There are many that
 offer protection against network interruptions. Look for more, interesting
 developments in this area soon...

Name three :-).  I don't happen to have run into any that I can remember.

And in any case, that doesn't actually help my situation, where I'm
running both processes on the same box (the receive is talking to an
external USB disk that I disconnect and take off-site after the receive is
complete).  A system crash (or power shutdown, or whatever) during this
process seems to make the receiving pool unimportable.  Possibly I could
use recovery tricks to step back a TXG or two until I get something valid,
and then manually remove the snapshots added to get back to the initial
state, and then I could start the incremental again; in practice, I
haven't made that work, and just do another full send to start over (7
hours, not too bad really).

Anyway, the incremental send/receive seems to be the fragile point in my
backup scheme as well.


Re: [zfs-discuss] Good SLOG devices?

2011-03-02 Thread David Dyer-Bennet

On Tue, March 1, 2011 16:32, Rocky Shek wrote:
 David,

 STEC/DataON ZeusRAM(Z4RZF3D-8UC-DNS) SSD now available for users in
 channel.

 It is 8GB DDR3 RAM based SAS SSD protected by supercapacitor and NVRAM
 16GB.

 It is designed for ZFS ZIL with low latency

 http://dataonstorage.com/zeusram

Says "call for price."  I know what that means; it means "If you have to
ask, you can't afford it."



Re: [zfs-discuss] Good SLOG devices?

2011-03-02 Thread David Dyer-Bennet

On Tue, March 1, 2011 10:35, Garrett D'Amore wrote:

   a) do you need an SLOG at all?  Some workloads (asynchronous ones) will
 never benefit from an SLOG.

I've been fighting the urge to maybe do something about ZIL (which is what
we're talking about here, right?).  My load is CIFS, not NFS (so not
synchronous, right?), but there are a couple of areas that are significant
to me where I do decent-size (100MB to 1GB) sequential writes (to
newly-created files).  On the other hand, when those writes seem to me to
be going slowly, the disk access lights aren't mostly on, suggesting that
the disk may not be what's holding me up.  I can test that by saving to
local disk and comparing times, also maybe running zpool iostat.
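Concretely, the test I have in mind while a big save is in flight (pool
name assumed):

```shell
# Per-vdev operations and bandwidth in 5-second samples while saving:
zpool iostat -v tank 5

# If the numbers stay well below what the mirrors can sustain, the
# bottleneck is more likely the client, the network, or CIFS itself
# than the disks -- and a slog won't help an async workload anyway.
```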

This is a home system, lightly used; the performance issue is me sitting
waiting while big Photoshop files save.  So of some interest to me
personally, and not at ALL like what performance issues on NAS usually
look like.  It's on a UPS, so I'm not terribly worried about losses on
power failure; and I'd just lose my work since the last save, generally,
at worst.

I might not believe the disk access lights on the box (Chenbro chassis,
with two 4-drive hot-swap bays for the data disks; driven off the
motherboard SATA plus a Supermicro 8-port SAS controller with SAS-to-SATA
cables).  In doing a drive upgrade just recently, I got rather confusing
results with the lights; perhaps the controller or the drive model made a
difference in when the activity lights came on.

The VDEVs in the pool are mirror pairs.  It's been expanded twice by
adding VDEVs and once by replacing devices in one VDEV.  So the load is
probably fairly unevenly spread across them just now.  My desktop connects
to this server over gigabit ethernet (through one switch; the boxes sit
next to each other on a shelf over my desk).

I'll do more research before spending money.  But as a question of general
theory, should a decent separate intent log device help for a single-user
sequential write sequence in the 100MB to 1GB size range?



Re: [zfs-discuss] Drive i/o anomaly

2011-02-09 Thread David Dyer-Bennet

On Wed, February 9, 2011 04:51, Matt Connolly wrote:

 Nonetheless,  I still find it odd that the whole io system effectively
 hangs up when one drive's queue fills up. Since the purpose of a mirror is
 to continue operating in the case of one drive's failure, I find it
 frustrating that the system slows right down so much because one drive's
 i/o queue is full.

I see what you're saying.  But I don't think mirror systems really try to
handle asymmetric performance.  They either treat the drives equivalently,
or else they decide one of them is broken and don't use it at all.



Re: [zfs-discuss] ZFS/Drobo (Newbie) Question

2011-02-08 Thread David Dyer-Bennet

On Mon, February 7, 2011 14:59, David Dyer-Bennet wrote:

 On Sat, February 5, 2011 11:54, Gaikokujin Kyofusho wrote:
 Thank you kebabber. I will try out indiana and virtual box to play
 around
 with it a bit.

 Just to make sure I understand your example, if I say had a 4x2tb
 drives,
 2x750gb, 2x1.5tb drives etc then i could make 3 groups (perhaps 1 raidz1
 +
 1 mirrored + 1 mirrored), in terms of accessing them would they just be
 mounted like 3 partitions or could it all be accessed like one big
 partition?

 A ZFS pool can contain many vdevs; you could put the three groups you
 describe into one pool, and then assign one (or more) file-systems to that
 pool.  Putting them all in one pool seems to me the natural way to handle
 it; they're all similar levels of redundancy.  It's more flexible to have
 everything in one pool, generally.

 (You could also make separate pools; my experience, for what it's worth,
 argues for making pools based on redundancy and performance (and only
 worry about BIG differences), and assign file-systems to pools based on
 needs for redundancy and performance.  And for my home system I just have
 one big data pool, currently consisting of 1x1TB, 2x400GB, 2x400GB, plus
 1TB hot spare.)

Typo; I don't in fact have a non-redundant vdev in my main data pool! 
It's *2*x1TB at the start of that list.



Re: [zfs-discuss] ZFS/Drobo (Newbie) Question

2011-02-08 Thread David Dyer-Bennet

On Tue, February 8, 2011 13:03, Roy Sigurd Karlsbakk wrote:
 Or you could stick strictly to mirrors; 4 pools 2x2T, 2x2T, 2x750G,
 2x1.5T. Mirrors are more flexible, give you more redundancy, and are
 much easier to work with.

 Easier to work with, yes, but a RAIDz2 will statistically be safer than a
 set of mirrors, since in many cases, you lose a drive and during
 resilver, you find bad sectors on another drive in the same VDEV,
 resulting in data corruption. With RAIDz2 (or 3), the chance of these
 errors to be on the same place on all drives is quite minimal. With a
 (striped?) mirror, a single bitflip on the 'healthy' drive will involve
 data corruption.

Wait, are you saying that the handling of errors in RAIDZ and mirrors is
completely different?  That it dumps the mirror disk immediately, but
keeps trying to get what it can from the RAIDZ disk?  Because otherwise,
your assertion doesn't seem to hold up.



Re: [zfs-discuss] ZFS/Drobo (Newbie) Question

2011-02-08 Thread David Dyer-Bennet

On 2011-02-08 21:39, Brandon High wrote:

On Tue, Feb 8, 2011 at 12:53 PM, David Dyer-Bennet d...@dd-b.net wrote:

Wait, are you saying that the handling of errors in RAIDZ and mirrors is
completely different?  That it dumps the mirror disk immediately, but
keeps trying to get what it can from the RAIDZ disk?  Because otherwise,
your assertion doesn't seem to hold up.


I think he meant that if one drive in a mirror dies completely, then
any single read error on the remaining drive is not recoverable.

With raidz2 (or a 3-way mirror for that matter), if one drive dies
completely, you still have redundancy.


Sure, a 2-way mirror has only 100% redundancy; if one dies, no more 
redundancy.  Same for a RAIDZ -- if one dies, no more redundancy.  But a 
4-drive RAIDZ has roughly twice the odds that a 2-drive mirror has of 
losing a drive.  And sure, a RAIDZ2 has more redundancy -- as does a 3-way 
mirror.


Or a 48-way mirror (I read a report from somebody who mirrored all the 
drives in a Thumper box, just to see if he could).




Re: [zfs-discuss] Replace block devices to increase pool size

2011-02-07 Thread David Dyer-Bennet

On Sun, February 6, 2011 08:41, Achim Wolpers wrote:

 I have a zpool biult up from two vdrives (one mirror and one raidz). The
 raidz is built up from 4x1TB HDs. When I successively replace each 1TB
 drive with a 2TB drive will the capacity of the raidz double after the
 last block device is replaced?

You may have to manually set property autoexpand=on; I found yesterday
that I had to (in my case on a mirror that I was upgrading).  Probably
depends on what version you created things at and/or what version you're
running now.

I replaced the drives in one of the three mirror vdevs in my main pool
over this last weekend, and it all went quite smoothly, but I did have to
turn on autoexpand at the end of the process to see the new space.
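
For anyone following along, the knob in question (pool name from my own
setup):

```
# See whether automatic expansion is enabled for the pool:
zpool get autoexpand zp1

# Turn it on so replacing devices with larger ones grows the vdev automatically:
zpool set autoexpand=on zp1
```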


Re: [zfs-discuss] zfs-discuss Digest, Vol 64, Issue 13

2011-02-07 Thread David Dyer-Bennet

On Sun, February 6, 2011 13:01, Michael Armstrong wrote:
 Additionally, the way I do it is to draw a diagram of the drives in the
 system, labelled with the drive serial numbers. Then when a drive fails, I
 can find out from smartctl which drive it is and remove/replace without
 trial and error.

Having managed to muddle through this weekend without loss (though with a
certain amount of angst and duplication of efforts), I'm in the mood to
label things a bit more clearly on my system :-).

smartctl doesn't seem to be on my system, though.  I'm running
snv_134.  I'm still pretty badly lost in the whole repository /
package thing with Solaris, most of my brain cells were already
occupied with Red Hat, Debian, and Perl package information :-( .
Where do I look?
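
My best guess so far, untested, is something like the following (the
publisher URL, flag, and package name are guesses on my part, so treat
this as a sketch):

```
# Add the contrib publisher and install smartmontools (names may vary by build):
pkg set-publisher -O http://pkg.opensolaris.org/contrib contrib
pkg install smartmontools

# Then the drive's serial number should be visible with:
smartctl -i /dev/rdsk/c9t3d0s0
```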

Are the controller port IDs, the c9t3d0 things that ZFS likes,
reasonably stable?  They won't change just because I add or remove
drives, right; only maybe if I change controller cards?



Re: [zfs-discuss] ZFS Newbie question

2011-02-07 Thread David Dyer-Bennet

On Sat, February 5, 2011 03:54, Gaikokujin Kyofusho wrote:

 From what I understand using ZFS one could setup something like RAID 6
 (RAID-Z2?) but with the ability to use drives of varying
 sizes/speeds/brands and able to add additional drives later. Am I about
 right? If so I will continue studying up on this if not then I guess I
 need to continue exploring different options. Thanks!!

IMHO, your best bet for this kind of configuration is to use mirror pairs,
not RAIDZ*.  Because...

Things you can't do with RAIDZ*:

You cannot remove a vdev from a pool.

You cannot make a RAIDZ* vdev smaller (fewer disks).

You cannot make a RAIDZ* vdev larger (more disks).

To increase the storage capacity of a RAIDZ* vdev you need to replace all
the drives, one at a time, waiting for resilver between replacements
(resilver times can be VERY long with big modern drives).  And during each
resilver, your redundancy will be reduced by 1 -- meaning a RAIDZ array
would have NO redundancy during the resilver.  (And activity in the pool
is high during the resilver -- meaning the chances of any marginal drive
crapping out are higher than normal during the resilver.)

With mirrors, you can add new space by simply adding two drives (add a new
mirror vdev).

You can upgrade an existing mirror by replacing only two drives.

You can upgrade an existing mirror without reducing redundancy below your
starting point ever -- you attach a new drive, wait for the resilver to
complete (at this point you have a three-way mirror), then detach one of
the original drives; repeat for another new drive and the other original
drive.
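
In command form, one round of that upgrade looks roughly like this (pool
and device names are examples):

```
# Attach a new, larger drive alongside an existing mirror member:
zpool attach tank c0t0d0 c0t4d0
# ...wait for the resilver to finish (watch zpool status), then:
zpool detach tank c0t0d0

# Repeat for the other old member with the second new drive:
zpool attach tank c0t1d0 c0t5d0
zpool detach tank c0t1d0
```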

Obviously, using mirrors requires you to buy more drives for any given
amount of usable space.

I must admit that my 8-bay hot-swap ZFS server cost me a LOT more than a
Drobo (but then I bought in 2006, too).



Re: [zfs-discuss] ZFS/Drobo (Newbie) Question

2011-02-07 Thread David Dyer-Bennet

On Sat, February 5, 2011 11:54, Gaikokujin Kyofusho wrote:
 Thank you kebabber. I will try out indiana and virtual box to play around
 with it a bit.

 Just to make sure I understand your example, if I say had a 4x2tb drives,
 2x750gb, 2x1.5tb drives etc then i could make 3 groups (perhaps 1 raidz1 +
 1 mirrored + 1 mirrored), in terms of accessing them would they just be
 mounted like 3 partitions or could it all be accessed like one big
 partition?

A ZFS pool can contain many vdevs; you could put the three groups you
describe into one pool, and then assign one (or more) file-systems to that
pool.  Putting them all in one pool seems to me the natural way to handle
it; they're all similar levels of redundancy.  It's more flexible to have
everything in one pool, generally.
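
Creating a pool like that might look like this (device names hypothetical;
note that zpool warns about mixed replication levels and may want -f):

```
zpool create tank \
    raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
    mirror c2t0d0 c2t1d0 \
    mirror c2t2d0 c2t3d0

# File-systems then draw on the whole pool's space:
zfs create tank/photos
```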

(You could also make separate pools; my experience, for what it's worth,
argues for making pools based on redundancy and performance (and only
worry about BIG differences), and assign file-systems to pools based on
needs for redundancy and performance.  And for my home system I just have
one big data pool, currently consisting of 1x1TB, 2x400GB, 2x400GB, plus
1TB hot spare.)

Or you could stick strictly to mirrors; 4 pools 2x2T, 2x2T, 2x750G,
2x1.5T.  Mirrors are more flexible, give you more redundancy, and are much
easier to work with.


Re: [zfs-discuss] Understanding directio, O_DSYNC and zfs_nocacheflush on ZFS

2011-02-07 Thread David Dyer-Bennet

On Mon, February 7, 2011 14:49, Yi Zhang wrote:
 On Mon, Feb 7, 2011 at 3:14 PM, Bill Sommerfeld sommerf...@alum.mit.edu
 wrote:
 On 02/07/11 11:49, Yi Zhang wrote:

 The reason why I
 tried that is to get the side effect of no buffering, which is my
 ultimate goal.

 ultimate = final.  you must have a goal beyond the elimination of
 buffering in the filesystem.

 if the writes are made durable by zfs when you need them to be durable,
 why
 does it matter that it may buffer data while it is doing so?

                                                -
 Bill

 If buffering is on, the running time of my app doesn't reflect the
 actual I/O cost. My goal is to accurately measure the time of I/O.
 With buffering on, ZFS would batch up a bunch of writes and change
 both the original I/O activity and the time.

I'm not sure I understand what you're trying to measure (which seems to be
your top priority).  Achievable performance with ZFS would be better using
suitable caching; normally that's the benchmark statistic people would
care about.



Re: [zfs-discuss] Identifying drives (SATA)

2011-02-06 Thread David Dyer-Bennet

On 2011-02-06 05:58, Orvar Korvar wrote:

Will this not ruin the zpool?  If you overwrite one of the discs in the zpool, 
won't the zpool go broke, so you need to repair it?


Without quoting I can't tell what you think you're responding to, but 
from my memory of this thread, I THINK you're forgetting how dd works. 
The dd commands being proposed to create drive traffic are all read-only 
accesses, so they shouldn't damage anything.
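
For the record, the sort of command that was being proposed is along these
lines (device name hypothetical) -- the output goes to /dev/null and the
disk is only ever read:

```
dd if=/dev/rdsk/c5t0d0s0 of=/dev/null bs=1024k count=1000
```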




Re: [zfs-discuss] Identifying drives (SATA), question about hot spare allocation

2011-02-06 Thread David Dyer-Bennet

Following up to myself, I think I've got things sorted, mostly.

1.  The thing I was most sure of, I was wrong about.  Some years back, I 
must have split the mirrors so that they used different brand disks.  I 
probably did this, maybe even accidentally, when I had to restore from 
backups at one point.   I suppose I could have physically labeled the 
carriers...no, that's crazy talk!


2.  The dd trick doesn't produce reliable activity light activation in 
my system.  I think some of the drives and/or controllers only turn on 
the activity light for writes.


3.  However, in spite of all this, I have replaced the disks in mirror-0 
with the bigger disks (via attach-new-resilver-detach-old), and added 
the third drive I bought as a hot spare.  All without having to restore 
from backups.


4.  AND I know which physical drive the detached 400GB drive is.  It 
occurs to me I could make that a second hot spare -- there are 4 
remaining 400GB drives in the pool, so it's useful for 2/3 of the 
failures by drive count.


Leading to a new question -- is ZFS smart about hot spare sizes?  Will 
it skip over too-small drives?  Will it, even better, prefer smaller 
drives to larger so long as they are big enough (thus leaving the big 
drives for bigger failures)?
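
(The mechanics, at least, are simple; it's the allocation policy I'm unsure
of.  Device name made up here:)

```
# Add the detached 400GB drive back into the pool as a second hot spare:
zpool add zp1 spare c9t6d0
```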




[zfs-discuss] Identifying drives (SATA)

2011-02-05 Thread David Dyer-Bennet
I've got a small home fileserver, Chenbro case with 8 hot-swap bays. 
Of course, at this level, I don't have cute little lights next to each 
drive that the OS knows about and can control to indicate things to me.


The configuration I think I have is three mirror pairs.  I've got 
motherboard SATA connections, and an add-in SAS card with SAS-to-SATA 
cabling (all drives are SATA), and I've tried to wire it so each mirror 
is split across the two controllers.  However -- the old disks were 
already a pool before.  So if I put them in the wrong physical slots, 
when I imported the pool it would have still found them.  So I could 
have the disks in slots that aren't what I expected, without knowing it.


I'm planning to upgrade the first mirror by attaching new, larger, 
drives, letting the resilver finish, and eventually detaching the old 
drives.  I just installed the first new drive, located what controller 
it was on, and typed an attach command that did what I wanted:


bash-4.0$ zpool status zp1
  pool: zp1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h4m, 3.13% done, 2h5m to go
config:

NAMESTATE READ WRITE CKSUM
zp1 ONLINE   0 0 0
  mirror-0  ONLINE   0 0 0
c9t3d0  ONLINE   0 0 0
c5t1d0  ONLINE   0 0 0
c9t5d0  ONLINE   0 0 0  14.0G resilvered
  mirror-1  ONLINE   0 0 0
c9t4d0  ONLINE   0 0 0
c6t1d0  ONLINE   0 0 0
  mirror-2  ONLINE   0 0 0
c9t2d0  ONLINE   0 0 0
c5t0d0  ONLINE   0 0 0

errors: No known data errors

As you can see, the new drive being resilvered is in fact associated 
with the first mirror, as I had intended.  (The old drives in the first 
mirror are older than in the second two, and all three are the same 
size, so that's definitely the one to replace first.)


HOWEVER...the activity lights on the drives aren't doing what I expect. 
 The activity light on the new drive is on pretty solidly (that I 
expected), but the OTHER activity puzzles me.  (User activity is so 
close to nil that I'm quite confident that's not confusing me; 95% + of 
the access right now is the resilver.  Besides, usage could light up 
other drives, but it couldn't turn off the lights on the ones being 
resilvered.)


At first, I saw the second drive in the rack light up.  I believe that 
to be c5t1d0, the second disk in mirror-0, and it's the drive I 
specified for the old drive in the attach command.


However, soon I started seeing the fourth drive in the rack light up.  I 
believe that to be c6t1d0; part of mirror-1, and thus having no place in 
this resilver.  It remained active.  And after a while, the second drive 
activity light went off.  For some minutes now, I've been seeing 
activity ONLY on the new drive, and on drive 4 (the one I don't think is 
part of mirror 0).


The activity lights aren't connected by separate cables, so I don't see 
how I could have them hooked up differently from the disks.


It's clear from zpool status that I have attached the new drive to the 
right mirror.  So things are fine for now, I can let the resilver run to 
completion.   I can detach one of the old drives fine, because that's 
done with logical names, and those are shown in zpool status, so I have 
no doubt which logical names are the old drives in mirror 0.


However, eventually it will be time to physically remove the old drives. 
 If I remove only one at a time, I shouldn't cause a disaster even if 
I pull the wrong one, and I can tell by checking zpool status right away 
whether I pulled the right or wrong one.  But this gets me into what I 
regard as risky territory -- if I pull a live drive, I'm going to 
suddenly need to know the commands needed to reattach it.  Can somebody 
point me at clear examples of that (or post them)?
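
My guess from the docs, untested, is that it would go something like this:

```
# After re-seating the drive, tell ZFS it's back:
zpool online zp1 c5t1d0
# If it got marked faulted, clear the error counters and let it resilver:
zpool clear zp1 c5t1d0
zpool status zp1
```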


I just found zpool iostat -v; now that I'm seeing traffic on the 
individual drives in the pool, it's clearly reading from both the old 
drives, and writing to the new drive, exactly as expected.  But only one 
activity light is lit on any of the old drives.


Is there a clever way to figure out which drive is which?  And if I have 
to fall back on removing a drive I think is right, and seeing if that's 
true, what admin actions will I have to perform to get the pool back to 
safety?  (I've got backups, but it's a pain to restore of course.) 
(Hmmm; in single-user mode, use dd to read huge chunks of one disk, and 
see which lights come on?  Do I even need to be in single-user mode to 
do that?)


[zfs-discuss] /dev/dsk files missing

2011-02-05 Thread David Dyer-Bennet
And devfsadm doesn't create them.  Am I looking at the wrong program, or 
what?
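
For reference, these are the invocations I mean:

```
# Rebuild the /dev links verbosely, removing stale ones first:
devfsadm -Cv
# Restrict the rebuild to disk links:
devfsadm -Cv -c disk
```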



[zfs-discuss] Drive id confusion

2011-02-05 Thread David Dyer-Bennet
Solaris and/or ZFS are badly confused about drive IDs.  The c5t0d0 
names are very far removed from the real world, and possibly they've 
gotten screwed up somehow.  Is devfsadm supposed to fix those, or does 
it only delete excess?


Reason I believe it's confused:

zpool status shows mirror-0 on c9t3d0, c9t2d0, and c9t5d0.  But format 
shows the one remaining Seagate 400GB drive at c5t0d0 (my initial pool 
was two of those; I replaced one with a Samsung 1TB earlier today).  Now 
the mirror with three drives in is my very first mirror, which has to 
have the one remaining Seagate drive in it (given that I removed one 
Seagate drive; otherwise I could be confused about order of creation vs. 
mirror numbering).


I'm thinking either Solaris' appalling mess of device files is somehow 
scrod, or else ZFS is confused in its reporting (perhaps because of 
cache file contents?).  Is there anything I can do about either of 
these?  Does devfsadm really create the appropriate /dev/dsk etc. 
files based on what's present?  Would deleting the cache file while the 
pool is exported, and then searching for and importing the pool, help?


How worried should I be?  (I've got current backups).


Re: [zfs-discuss] BOOT, ZIL, L2ARC one one SSD?

2011-01-06 Thread David Dyer-Bennet

On Thu, December 23, 2010 22:45, Edward Ned Harvey wrote:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Bill Werner

 on a single 60GB SSD drive, use FDISK to create 3 physical partitions, a
 20GB
 for boot, a 30GB for L2ARC and a 10GB for ZIL?   Or is 3 physical
 Solaris
 partitions on a disk not considered the entire disk as far as ZFS is
 concerned?

 You can do that.  Other people have before.  But IMHO, it demonstrates a
 faulty way of thinking.

 SSD's are big and cheap now, so I can buy one of these high performance
 things, and slice it up!  In all honesty, GB availability is not your
 limiting factor.  Speed is your limiting factor.  That's the whole point
 of
 buying the thing in the first place.  If you have 3 SSD's, they're each
 able
 to talk 3Gbit/sec at the same time.  But if you buy one SSD which is 3x
 larger, you save money but you get 1/3 the speed.

Boot, at least, largely doesn't overlap with any significant traffic to
ZIL, for example.

And where I come from, even at work, money doesn't grow on trees.  Sure,
three separate SSDs will clearly perform better.  They will also cost 3x
as much.  (Or more, if you don't have three free bays and controller
ports.)

The question we often have to address is, what's the biggest performance
increase we can get for $500.  I considered multiple rotating disks vs.
one SSD for that reason, for example.

Yeah, anybody quibbling about $500 isn't building top-performance
enterprise-grade storage.  We do know this.  It's still where a whole lot
of us live -- especially those running a home NAS.

 That's not to say there's never a situation where it makes sense.  Other
 people have done it, and maybe it makes sense for you.  But probably not.

Yeah, okay, maybe we're not completely disagreeing.


[zfs-discuss] file-inherit and dir-inherit at toplevel of ZFS CIFS share

2010-10-25 Thread David Dyer-Bennet
It looks like permissions don't descend properly from the top-level share
in CIFS; I had to set them on the next level down to get the intended
results (including on lower levels; they seem to inherit properly from the
second level, just not from the top).  Is this a known behavior, or am I
confused and setting myself up for trouble later?

More broadly, is there anything good about best practices for using ACLS
with ZFS and CIFS shares?  For example, there are so many defined
attributes, some of them with the same short-form letter (I think one is
for directories and one is for files in that case, but that's not
documented that I can find), that I find myself wondering what standard
bundles of permissions would be useful.   Is it generally better to have
separate permissions to inherit for files and directories, or can most
things you want be accomplished with just one?
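
As a concrete example of the kind of bundle I mean (username hypothetical,
and I'm still working out whether full_set is the right starting point):

```
# Grant one user full control, inherited by new files (f) and directories (d):
/usr/bin/chmod A+user:fred:full_set:fd:allow /tank/share

# Inspect the resulting ACL in verbose form:
ls -V /tank/share
```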

Back to specifics again -- I was running into a problem where a user on
the Solaris box could rename a file or directory, but an XP box
authenticating as the same user could not.  This was the one that seemed
to be solved by setting the permissions again one level down (dunno what
happens with new top-level items yet).  Is this normal behavior or
something that makes sense?  It's terribly weird.  (In Windows, I could
right-click and create the new directory or whatever, but when I then
filled in the name I wanted and hit enter, I got a permission error.  I
could just leave it named "new directory", though.  And I could rename it
on the Linux side as the same user that failed to rename it from the
Windows side.)



Re: [zfs-discuss] Balancing LVOL fill?

2010-10-20 Thread David Dyer-Bennet

On Wed, October 20, 2010 04:24, Tuomas Leikola wrote:

 I wished for a more aggressive write balancer but that may be too much
 to ask for.

I don't think it can be too much to ask for.  Storage servers have long
enough lives that adding disks to them is a routine operation; to the
extent that that's a problem, that really needs to be fixed.

However, it's not the sort of thing one should hold one's breath waiting for!



Re: [zfs-discuss] Finding corrupted files

2010-10-11 Thread David Dyer-Bennet

On Fri, October 8, 2010 04:47, Stephan Budach wrote:
 So, I decided to give tar a whirl, after zfs send encountered the next
 corrupted file, resulting in an I/O error, even though scrub ran
 successfully w/o any erors.

I must say that this idea of scrub running w/o error, while files exist
that are corrupted enough for zfs send to detect, is very disturbing. 
Background scrubbing, and the block checksums to make it more meaningful
than just reading the disk blocks, was the key thing that drew me into
ZFS, and this seems to suggest that it doesn't work.

Does your sequence of tests happen to rule out the possibility that these
are new errors, appearing after a scrub and before the send?  For example,
have you done 1) scrub finds no error, 2) send finds error, 3) scrub finds
no error?  (With nothing in between that could have cleared or fixed the
error.)
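
Concretely, the sequence I have in mind (pool and dataset names
hypothetical):

```
zpool scrub tank
# ...wait for it to finish, then:
zpool status -v tank                # does it list any files with errors?
zfs send tank/fs@snap > /dev/null   # does this still hit the I/O error?
zpool scrub tank                    # does a second scrub still come up clean?
```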



Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread David Dyer-Bennet

On Tue, October 5, 2010 17:20, Richard Elling wrote:
 On Oct 5, 2010, at 2:06 PM, Michael DeMan wrote:

 On Oct 5, 2010, at 1:47 PM, Roy Sigurd Karlsbakk wrote:

 Well, here it's about 60% up and for 150 drives, that makes a wee
 difference...

 Understood on 1.6  times cost, especially for quantity 150 drives.

 One service outage will consume far more in person-hours and downtime than
 this little bit of money.  Penny-wise == Pound-foolish?

That looks to be true, yes (going back to the actual prices, 150 drives
would cost $6000 extra for the enterprise versions).

It's still quite annoying to be jerked around by people charging 60% extra
for changing a timeout in the firmware, and carefully making it NOT
user-alterable.

Also, the non-TLER versions are a constant threat to anybody running home
systems, who might quite reasonably think they could put those in a home
server.

(Yeah, I know the enterprise versions have other differences.  I'm not
nearly so sure I CARE about the other differences, in the size servers I'm
working with.)


Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread David Dyer-Bennet

On Tue, October 5, 2010 16:47, casper@sun.com wrote:


My immediate reaction to this is time to avoid WD drives for a while;
until things shake out and we know what's what reliably.

But, um, what do we know about say the Seagate Barracuda 7200.12 ($70),
the SAMSUNG Spinpoint F3 1TB ($75), or the HITACHI Deskstar 1TB 3.5
($70)?


 I've seen several important features when selecting a drive for
 a mirror:

   TLER (the ability of the drive to timeout a command)

I went and got what detailed documentation I could on a couple of the
Seagate drives last night, and I couldn't find anything on how they
behaved in that sort of error cases.  (I believe TLER is a WD-specific
term, but I didn't just search, I read them through.)

So that's inconvenient.  How do we find out about that sort of thing?

   sector size (native vs virtual)

Richard Elling said ZFS handles the 4k real 512byte fake drives okay now
in default setups; but somebody immediately asked for version info, so I'm
still watching this one.

   power use (specifically at home)

Hadn't thought about that.  But when I'm upgrading drives, I figure I'm
always going to come out better on power than when I started.

   performance (mostly for work)

I can't bring myself to buy below 7200RPM, but it's probably foolish
(except that other obnoxious features tend to come in the green drives).

   price

Yeah, well.  I'm cheap.

 I've heard scary stories about a mismatch of the native sector size and
 unaligned Solaris partitions (4K sectors, unaligned cylinder).

So have I.  Sounds like you get read-modify-write actions for non-aligned
accesses.

I hope the next generation of drives admit to being 4k sectors, and that
ZFS will be prepared to use them sensibly.  But I'm not sure I'm willing
to wait for that; the oldest drives in my box are now 4 years old, and I'm
about ready for the next capacity upgrade.

 I was pretty happy with the WD drives (except for the one with a
 seriously
 broken cache) but I see the reasons to not to pick WD drives over the 1TB
 range.

And the big ones are what pretty much everybody is using at home. 
Capacity and price are vastly more important than performance for most of
us.



Re: [zfs-discuss] Increase size of 2-way mirror

2010-10-06 Thread David Dyer-Bennet

On Wed, October 6, 2010 14:14, Tony MacDoodle wrote:
 Is it possible to add 2 disks to increase the size of the pool below?

 NAME         STATE   READ WRITE CKSUM
 testpool     ONLINE     0     0     0
   mirror-0   ONLINE     0     0     0
     c1t2d0   ONLINE     0     0     0
     c1t3d0   ONLINE     0     0     0
   mirror-1   ONLINE     0     0     0
     c1t4d0   ONLINE     0     0     0
     c1t5d0   ONLINE     0     0     0

You have two ways to increase the size of this pool (sanely).

First, you can add a third mirror vdev.  I think that's what you're
specifically asking about.  You do this with the zpool add ... command;
see the man page.

Second, you can add (zpool attach) two larger disks to one of the existing
mirror vdevs, wait until the resilvers have finished, and then detach the
two original (smaller) disks.  At that point (with recent versions; with
older versions you have to set a property) the vdev will expand to use the
full capacity of the new larger disks, and that space will become
available in the pool.
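The second route can be sketched as a command sequence; this is only an
illustration, with the new device names (c2t0d0, c2t1d0) assumed, and it
reuses the testpool/mirror-0 names from your status output:

```shell
# In-place capacity upgrade of mirror-0 (device names are examples).
zpool attach testpool c1t2d0 c2t0d0   # new large disk joins mirror-0
zpool attach testpool c1t3d0 c2t1d0   # second new disk, same vdev
# ...wait for "zpool status testpool" to show the resilvers complete...
zpool detach testpool c1t2d0          # drop the old, smaller disks
zpool detach testpool c1t3d0
# On older versions, expansion must be enabled explicitly:
zpool set autoexpand=on testpool
```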

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] TLER and ZFS

2010-10-05 Thread David Dyer-Bennet

On Tue, October 5, 2010 15:30, Roy Sigurd Karlsbakk wrote:


 I just discovered WD Black drives are rumored not to be set to allow TLER.
 Does anyone know how much performance impact the lack of TLER might have
 on a large pool? Choosing Enterprise drives will cost about 60% more, and
 on a large install, that means a lot of money...

My immediate reaction to this is time to avoid WD drives for a while;
until things shake out and we know what's what reliably.

But, um, what do we know about, say, the Seagate Barracuda 7200.12 ($70),
the SAMSUNG Spinpoint F3 1TB ($75), or the HITACHI Deskstar 1TB 3.5
($70)?

This is not a completely theoretical question to me; it's getting on
towards time to at least consider replacing my oldest mirrored pair; those
are 400GB Seagate, I think, dating from 2006.  I'd want something at least
twice as big (to make the space upgrade worthwhile), and I'm expecting to
buy three of them rather than just two because I think it's time to add a
hot spare to the system (currently 3 pair of data disks, and I've got two
more bays; I think a hot spare is a better use for them than a fourth
pair; safety of the data is very important, performance is adequate, and I
need a modest capacity upgrade, but the whole pool is currently 1.2TB
usable, not large).

On the third hand, there's the Barracuda 7200.11 1.5TB for only $75, which
is a really small price increment for a big space increment.

The WD RE3 1TB is $130 (all these prices are from Newegg just now). 
That's very close to TWICE the price of the competing 1TB drives.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When Zpool has no space left and no snapshots

2010-09-29 Thread David Dyer-Bennet

On Wed, September 22, 2010 21:25, Aleksandr Levchuk wrote:

 I ran out of space, and consequently could not rm or truncate files. (It
 makes sense because it's copy-on-write and any transaction needs to
 be written to disk. It worked out really well - all I had to do was
 destroy some snapshots.)

 If there are no snapshots to destroy, how do you prepare for a situation
 when a ZFS pool loses its last free byte?

Add some more space when you get to somewhere around 90% full, or earlier :-).

If you do get stuck, you can add another vdev when full, too.  Just
remember that you're stuck with whatever you add forever, since there's
no way to remove a vdev from a pool.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] When Zpool has no space left and no snapshots

2010-09-29 Thread David Dyer-Bennet

On Wed, September 29, 2010 15:17, Matt Cowger wrote:
 You can truncate a file:

 echo > bigfile

 That will free up space without the 'rm'

Copy-on-write: the new version gets written to the disk before the old
version is released; it doesn't just overwrite.  AND, if it's in any
snapshots, the old version doesn't get released at all.
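The truncation trick itself is plain shell, nothing ZFS-specific; a small
demonstration (the file name "bigfile" is just an example):

```shell
# ": > file" truncates a file in place to zero length, without an rm.
# On ZFS, the freed blocks are still held as long as a snapshot
# references the old contents, which is the point being made above.
cd "$(mktemp -d)"
dd if=/dev/zero of=bigfile bs=1024 count=64 2>/dev/null
wc -c < bigfile   # 65536
: > bigfile
wc -c < bigfile   # 0
```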

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] non-ECC Systems and ZFS for home users (was: Please warn a home user against OpenSolaris under VirtualBox under WinXP ; ))

2010-09-23 Thread David Dyer-Bennet

On Thu, September 23, 2010 01:33, Alexander Skwar wrote:
 Hi.

 2010/9/19 R.G. Keen k...@geofex.com

 and last-generation hardware is very, very cheap.

 Yes, of course, it is. But, actually, is that a true statement? I've read
 that it's *NOT* advisable to run ZFS on systems which do NOT have ECC
 RAM. And those cheapo last-gen hardware boxes quite often don't have
 ECC, do they?

Last-generation server hardware supports ECC, and was usually populated
with ECC.  Last-generation desktop hardware rarely supports ECC, and was
even more rarely populated with ECC.

The thing is, last-generation server hardware is, um, marvelously adequate
for most home setups (the problem *I* see with it, for many home setups,
is that it's *noisy*).  So, if you can get it cheap in a sound-level that
fits your needs, that's not at all a bad choice.

I'm running a box I bought new as a home server, but it's NOW at least
last-generation hardware (2006), and it's still running fine; in
particular the CPU load remains trivial compared to what the box supports
(not doing compression or dedup on the main data pool, though I do
compress the backup pools on external USB disks).  (It does have ECC; even
before some of the cases leading to that recommendation were explained on
that list, I just didn't see the percentage in not protecting the memory.)

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver = defrag?

2010-09-16 Thread David Dyer-Bennet

On Wed, September 15, 2010 16:18, Edward Ned Harvey wrote:

 For example, if you start with an empty drive, and you write a large
 amount
 of data to it, you will have no fragmentation.  (At least, no significant
 fragmentation; you may get a little bit based on random factors.)  As life
 goes on, as long as you keep plenty of empty space on the drive, there's
 never any reason for anything to become significantly fragmented.

Sure, if only a single thread is ever writing to the disk store at a time.

This situation doesn't exist with any kind of enterprise disk appliance,
though; there are always multiple users doing stuff.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver = defrag?

2010-09-14 Thread David Dyer-Bennet
The difference between multi-user thinking and single-user thinking is
really quite dramatic in this area.  I came up the time-sharing side
(PDP-8, PDP-11, DECSYSTEM-20); TOPS-20 didn't have any sort of disk
defragmenter, and nobody thought one was particularly desirable, because
the normal access pattern of a busy system was spread all across the disk
packs anyway.

On a desktop workstation, it makes some sense to think about loading big
executable files fast -- that's something the user is sitting there
waiting for, and there's often nothing else going on at that exact moment.
 (There *could* be significant things happening in the background, but
quite often there aren't.)  Similarly, loading a big document
(single-file book manuscript, bitmap image, or whatever) happens at a
point where the user has requested it and is waiting for it right then,
and there's mostly nothing else going on.

But on really shared disk space (either on a timesharing system, or a
network file server serving a good-sized user base), the user is competing
for disk activity (either bandwidth or IOPs, depending on the access
pattern of the users).  Generally you don't get to load your big DLL in
one read -- and to the extent that you don't, it doesn't matter much how
it's spread around the disk, because the head won't be in the same spot
when you get your turn again.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver = defrag?

2010-09-13 Thread David Dyer-Bennet

On Mon, September 13, 2010 07:14, Edward Ned Harvey wrote:
 From: Richard Elling [mailto:rich...@nexenta.com]

 This operational definition of fragmentation comes from the single-
 user,
 single-tasking world (PeeCees). In that world, only one thread writes
 files
 from one application at one time. In those cases, there is a reasonable
 expectation that a single file's blocks might be contiguous on a single
 disk.
 That isn't the world we live in, where have RAID, multi-user, or multi-
 threaded
 environments.

 I don't know what you're saying, but I'm quite sure I disagree with it.

 Regardless of multithreading, multiprocessing, it's absolutely possible to
 have contiguous files, and/or file fragmentation.  That's not a
 characteristic which depends on the threading model.

 Also regardless of raid, it's possible to have contiguous or fragmented
 files.  The same concept applies to multiple disks.

The attitude that it *matters* seems to me to have developed, and be
relevant only to, single-user computers.

Regardless of whether a file is contiguous or not, by the time you read
the next chunk of it, in the multi-user world some other user is going to
have moved the access arm of that drive.  Hence, it doesn't matter if the
file is contiguous or not.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Configuration questions for Home File Server (CPU cores, dedup, checksum)?

2010-09-13 Thread David Dyer-Bennet

On Tue, September 7, 2010 15:58, Craig Stevenson wrote:

 3.  Should I consider using dedup if my server has only 8Gb of RAM?  Or,
 will that not be enough to hold the DDT?  In which case, should I add
 L2ARC / ZIL or am I better to just skip using dedup on a home file server?

I would not consider using dedup in the current state of the code.  I hear
too many horror stories.

Also, why do you think you'd get much benefit?  It takes pretty big blocks
of exact bit-for-bit duplication to actually trigger the code, and you're
not going to find them in compressed image (including motion picture /
video) or audio files, for example (the main things that take up much
space on most home servers).
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Storage server hardwae

2010-08-26 Thread David Dyer-Bennet

On Thu, August 26, 2010 13:58, Tom Buskey wrote:

 I usually see 17 MB/s max on an external USB 2.0 drive.

Interesting; I routinely see 27 MB/s, peaking to 30 MB/s, on the cheap WD
1TB external drives I use for backups.  (Backup is probably the best case;
the only user of that drive is a zfs receive process.)

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-16 Thread David Dyer-Bennet

On Mon, August 16, 2010 10:48, Joerg Schilling wrote:
 Ray Van Dolson rvandol...@esri.com wrote:

  I absolutely guarantee Oracle can and likely already has
  dual-licensed BTRFS.

 Well, Oracle obviously would want btrfs to stay as part of the Linux
 kernel rather than die a death of anonymity outside of it...

 As such, they'll need to continue to comply with GPLv2 requirements.

 No, there is definitely no need for Oracle to comply with the GPL as they
 own the code.

Ray's point is, how long would BTRFS remain in the Linux kernel in that case?
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-16 Thread David Dyer-Bennet

On Mon, August 16, 2010 10:43, Joerg Schilling wrote:
 David Dyer-Bennet d...@dd-b.net wrote:


 On Sun, August 15, 2010 20:44, Peter Jeremy wrote:

  Irrespective of the above, there is nothing requiring Oracle to
 release
  any future btrfs or ZFS improvements (or even bugfixes).  They can't
  retrospectively change the license on already released code but they
  can put a different (non-OSS) license on any new code.

 That's true.

 However, if Oracle makes a binary release of BTRFS-derived code, they
 must
 release the source as well; BTRFS is under the GPL.

 This claim would only be true in case Oracle does not own the copyright
 on its code...

Oops, yeah, you're right there; the copyright holder can grant additional
licenses and do things itself.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-16 Thread David Dyer-Bennet

On Mon, August 16, 2010 11:01, Joerg Schilling wrote:
 David Dyer-Bennet d...@dd-b.net wrote:

  As such, they'll need to continue to comply with GPLv2 requirements.
 
  No, there is definitely no need for Oracle to comply with the GPL as
 they
  own the code.

 Ray's point is, how long would BTRFS remain in the Linux kernel in that
 case?

 Such a license change can happen at any time. The Linux folks have no
 grant
 that it would not happen.

And they have every right to stop including BTRFS in the kernel whenever
they wish.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-16 Thread David Dyer-Bennet

On Sun, August 15, 2010 09:19, David Magda wrote:
 On Aug 14, 2010, at 14:54, Edward Ned Harvey wrote:

 From:  Russ Price


 For me, Solaris had zero mindshare since its beginning, on account of
 being prohibitively expensive.

 I hear that a lot, and I don't get it.  $400/yr does move it out of
 peoples'
 basements generally, and keeps sol10 out of enormous clustering
 facilities
 that don't have special purposes or free alternatives.  But I
 wouldn't call
 it prohibitively expensive, for a whole lot of purposes.

 But that US$ 400 was only if you wanted support. For the last little
 while you could run Solaris 10 legally without a support contract
 without issues.

Looks like there are prices for support for things that could
legitimately be called Red Hat Enterprise Linux from $80/year up into at
least the mid thousands; this may account for the range of impressions
people have.

The 24/7 Premium subscription for a two-socket server is $1299/year.  The
business-hours plan is $799.

https://www.redhat.com/wapps/store/catalog.html

Your point that free has been important is very true.  I'm not sure that
what Oracle says they're doing with Solaris 11 Express won't cover that at
least for business customers, though.  (I do think that they'll lose out
on the extensive testing we've been providing.)

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-16 Thread David Dyer-Bennet

On Mon, August 16, 2010 15:35, Joerg Schilling wrote:

 I know of ext* performance checks where people did run gtar to unpack a
 linux
 kernel archive and these people did nothing but metering the wall clock
 time
 for gtar.

 I repeated this test and it turned out, that Linux did not even start to
 write
 to the disk when gtar finished.

As a test of ext? performance, that does seem to be lacking something!

I guess it's a consequence of the low sound levels of modern disk drives;
you go back enough years, that error couldn't have passed unnoticed :-) .

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New Supermicro SAS/SATA controller: AOC-USAS2-L8e in SOHO NAS and HD HT

2010-08-13 Thread David Dyer-Bennet

On Aug 12, 2010, at 7:03 PM, valrh...@gmail.com wrote:

 Has anyone bought one of these cards recently? It seems to list for
 around $170 at various places, which seems like quite a decent deal. But
 no well-known reputable vendor I know seems to sell these, and I want to
 be able to have someone backing the sale if something isn't perfect.
 Where do you all recommend buying this card from?

I put something very similar in -- same number with an 'i' suffix instead
of the 'e'.  I remember seeing both existed at the time, and that the i
was what I needed.  I'm using SATA cables, and no expanders (each cable
goes directly to a drive), maybe the 'e' has more advanced features (that
I knew I didn't need).

I can't imagine the retailer would be of any value for support on such a
card; perhaps, in the worst case, they might possibly take it back.
Selling it on eBay is often more profitable, since the buyer pays shipping
:-).
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-11 Thread David Dyer-Bennet

On Tue, August 10, 2010 23:13, Ian Collins wrote:
 On 08/11/10 03:45 PM, David Dyer-Bennet wrote:

 cannot receive incremental stream: most recent snapshot of
 bup-wrack/fsfs/zp1/ddb does not
 match incremental source

 That last error occurs if the snapshot exists, but has changed, it has
 been deleted and a new one with the same name created.

So for testing purposes at least, I need to shut down everything I have
that creates or deletes snapshots.  (I don't, though, have anything that
would delete one and create one with the same name.  I create snapshots
with various names (2hr, daily, weekly, monthly, yearly) and a current
timestamp, and I delete old ones (many days old at a minimum).)

And I think I'll abstract the commands from my backup script into a
simpler dedicated test script, so I'm sure I'm doing exactly the same
thing each time (that should cause me to hit on a combination that works
right away :-) ).

Is there anything stock in b134 that messes with snapshots that I should
shut down to keep things stable, or am I only worried about my own stuff?

Are other people out there not using send/receive for backups?  Or not
trying to preserve snapshots while doing it?  Or, are you doing what I'm
doing, and not having the problems I'm having?
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-11 Thread David Dyer-Bennet

On Tue, August 10, 2010 16:41, Dave Pacheco wrote:
 David Dyer-Bennet wrote:

 If that turns out to be the problem, that'll be annoying to work around
 (I'm making snapshots every two hours and deleting them after a couple
 of
 weeks).  Locks between admin scripts rarely end well, in my experience.
 But at least I'd know what I had to work around.

 Am I looking for too much here?  I *thought* I was doing something that
 should be simple and basic and frequently used nearly everywhere, and
 hence certain to work.  What could go wrong?, I thought :-).  If I'm
 doing something inherently dicey I can try to find a way to back off; as
 my primary backup process, this needs to be rock-solid.


 It's certainly a reasonable thing to do and it should work.  There have
 been a few problems around deleting and renaming snapshots as they're
 being sent, but the delete issues were fixed in build 123 by having
 zfs_send hold snapshots being sent (as long as you've upgraded your pool
 past version 18), and it sounds like you're not doing renames, so your
 problem may be unrelated.

AHA!  You may have nailed the issue -- I've upgraded from 111b to 134, but
have not yet upgraded my pool.  Checking...yes, the pool I'm sending from
is V14.  (I don't instantly upgrade pools; I need to preserve the option
of falling back to older software for a while after an upgrade.)

So, I should try either turning off my snapshot creator/deleter during the
backup, or upgrade the pool.  Will do!  (I will eventually upgrade the
pool of course, but I think I'll try the more reversible option first.  I
can have the deleter check for the pid file the backup already creates to
avoid two backups running at once.)
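That pid-file check can be sketched in a few lines of shell; the pid-file
path and the surrounding script are assumptions for illustration, not my
actual setup:

```shell
# The snapshot deleter skips its run while the backup's pid file
# points at a live process.
PIDFILE=/tmp/zfs-backup.pid   # hypothetical path written by the backup script
if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    echo "backup in progress, skipping snapshot deletion"
else
    echo "no backup running, safe to delete old snapshots"
    # ...snapshot deletion commands would go here...
fi
```

The kill -0 probe matters: a pid file left behind by a crashed backup
should not block snapshot maintenance forever.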

Thank you very much!  This is extremely encouraging.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-10 Thread David Dyer-Bennet
My full backup still doesn't complete.  However, instead of hanging the
entire disk subsystem as it did on 111b, it now issues error messages. 
Errors at the end.

sending from @bup-daily-20100726-10CDT to
zp1/d...@bup-daily-20100727-10cdt
received 3.80GB stream in 136 seconds (28.6MB/sec)
receiving incremental stream of zp1/d...@bup-daily-20100727-10cdt into
bup-wrack/fsfs/z
p1/d...@bup-daily-20100727-10cdt
sending from @bup-daily-20100727-10CDT to
zp1/d...@bup-daily-20100728-11cdt
received 192MB stream in 10 seconds (19.2MB/sec)
receiving incremental stream of zp1/d...@bup-daily-20100728-11cdt into
bup-wrack/fsfs/z
p1/d...@bup-daily-20100728-11cdt
sending from @bup-daily-20100728-11CDT to
zp1/d...@bup-daily-20100729-10cdt
received 170MB stream in 9 seconds (18.9MB/sec)
receiving incremental stream of zp1/d...@bup-daily-20100729-10cdt into
bup-wrack/fsfs/z
p1/d...@bup-daily-20100729-10cdt
sending from @bup-daily-20100729-10CDT to
zp1/d...@bup-2hr-20100729-22cdt
warning: cannot send 'zp1/d...@bup-2hr-20100729-22cdt': no such pool or
dataset
sending from @bup-2hr-20100729-22CDT to
zp1/d...@bup-2hr-20100730-00cdt
warning: cannot send 'zp1/d...@bup-2hr-20100730-00cdt': no such pool or
dataset
sending from @bup-2hr-20100730-00CDT to
zp1/d...@bup-2hr-20100730-02cdt
warning: cannot send 'zp1/d...@bup-2hr-20100730-02cdt': no such pool or
dataset
sending from @bup-2hr-20100730-02CDT to
zp1/d...@bup-2hr-20100730-04cdt
warning: cannot send 'zp1/d...@bup-2hr-20100730-04cdt': incremental
source (@bup-2hr-20
100730-02CDT) does not exist
sending from @bup-2hr-20100730-04CDT to
zp1/d...@bup-2hr-20100730-06cdt
sending from @bup-2hr-20100730-06CDT to
zp1/d...@bup-2hr-20100730-08cdt
sending from @bup-2hr-20100730-08CDT to
zp1/d...@bup-daily-20100730-10cdt
sending from @bup-daily-20100730-10CDT to
zp1/d...@bup-2hr-20100730-10cdt
sending from @bup-2hr-20100730-10CDT to
zp1/d...@bup-2hr-20100730-12cdt
sending from @bup-2hr-20100730-12CDT to
zp1/d...@bup-2hr-20100730-14cdt
sending from @bup-2hr-20100730-14CDT to
zp1/d...@bup-2hr-20100730-16cdt
sending from @bup-2hr-20100730-16CDT to
zp1/d...@bup-2hr-20100730-18cdt
sending from @bup-2hr-20100730-18CDT to
zp1/d...@bup-2hr-20100730-20cdt
sending from @bup-2hr-20100730-20CDT to
zp1/d...@bup-2hr-20100730-22cdt
received 162MB stream in 9 seconds (18.0MB/sec)
receiving incremental stream of zp1/d...@bup-2hr-20100730-06cdt into
bup-wrack/fsfs/zp1
/d...@bup-2hr-20100730-06cdt
cannot receive incremental stream: most recent snapshot of
bup-wrack/fsfs/zp1/ddb does not
match incremental source
bash-4.0$

The bup-wrack pool was newly-created, empty, before this backup started.

The backup commands were:

zfs send -Rv $srcsnap | zfs recv -Fudv $BUPPOOL/$HOSTNAME/$FS

I don't see how anything could be creating snapshots on bup-wrack while
this was running.  That pool is not normally mounted (it's on a single
external USB drive, I plug it in for backups).  My script for doing
regular snapshots of zp1 and rpool doesn't reference any of the bup-*
pools.

I don't see how this snapshot mismatch can be coming from anything but the
send/receive process.

There are quite a lot of snapshots; dailies for some months, 2-hour ones
for a couple of weeks.  Most of them are empty or tiny.

Next time I will try WITHOUT -v on both ends, and arrange to capture the
expanded version of the command with all the variables filled in, but I
don't expect any different outcome.

Any other ideas?







-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-10 Thread David Dyer-Bennet
Additional information.  I started another run, and captured the exact
expanded commands.  These SHOULD BE the exact commands used in the last
run except for the snapshot name (this script makes a recursive snapshot
just before it starts a backup).  In any case they ARE the exact commands
used in this new run, and we'll see what happens at the end of this run.

(These are from a bash trace as produced by set -x)

+ zfs create -p bup-wrack/fsfs/zp1
+ zfs send -Rp z...@bup-20100810-154542gmt
+ zfs recv -Fud bup-wrack/fsfs/zp1

(The send and the receive are source and sink in a pipeline).  As you can
see, the destination filesystem is new in the bup-wrack pool.  The -R on
the send should, as I understand it, create a replication stream which
will replicate  the specified filesystem, and all descendent file
systems, up to the  named  snapshot.  When received, all properties,
snapshots, descendent file systems, and clones are preserved.  This
should send the full state of zp1 up to the snapshot.  And the receive
should receive it into bup-wrack/fsfs/zp1.)

Isn't this how a full backup should be made using zfs send/receive?
(Once this is working, I intend to use -I to send incremental
streams to update it regularly.)
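Under that plan, a later incremental run might look something like this;
the snapshot names are illustrative (patterned on the full-backup trace
above), not taken from the actual script:

```shell
# Hypothetical incremental follow-up to the full backup shown above.
zfs snapshot -r zp1@bup-20100811-160000gmt
zfs send -R -I @bup-20100810-154542gmt zp1@bup-20100811-160000gmt \
    | zfs recv -Fud bup-wrack/fsfs/zp1
```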

bash-4.0$ zpool list
NAMESIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
bup-wrack   928G  4.62G   923G 0%  1.00x  ONLINE  /backups/bup-wrack
rpool   149G  10.0G   139G 6%  1.00x  ONLINE  -
zp11.09T   743G   373G66%  1.00x  ONLINE  -

zp1 is my primary data pool.  It's not very big (physically it's 3 2-way
mirrors of 400GB drives).  It has 743G of data in it.  bup-wrack is the
backup pool, it's a single 1TB external USB drive.  This was taken shortly
after starting the second try at a full backup (since the b134 upgrade),
so bup-wrack is still mostly empty.

None of the pools have shown any errors of any sort in months.  zp1 and
rpool are scrubbed weekly.





-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-10 Thread David Dyer-Bennet

On 10-Aug-10 13:46, David Dyer-Bennet wrote:


On Tue, August 10, 2010 13:23, Dave Pacheco wrote:

David Dyer-Bennet wrote:

My full backup still doesn't complete.  However, instead of hanging the
entire disk subsystem as it did on 111b, it now issues error messages.
Errors at the end.

[...]

cannot receive incremental stream: most recent snapshot of
bup-wrack/fsfs/zp1/ddb does not
match incremental source
bash-4.0$

The bup-wrack pool was newly-created, empty, before this backup started.

The backup commands were:

zfs send -Rv $srcsnap | zfs recv -Fudv $BUPPOOL/$HOSTNAME/$FS

I don't see how anything could be creating snapshots on bup-wrack while
this was running.  That pool is not normally mounted (it's on a single
external USB drive, I plug it in for backups).  My script for doing
regular snapshots of zp1 and rpool doesn't reference any of the bup-*
pools.

I don't see how this snapshot mismatch can be coming from anything but
the
send/receive process.

There are quite a lot of snapshots; dailys for some months, 2-hour ones
for a couple of weeks.  Most of them are empty or tiny.

Next time I will try WITHOUT -v on both ends, and arrange to capture the
expanded version of the command with all the variables filled in, but I
don't expect any different outcome.

Any other ideas?



Is it possible that snapshots were renamed on the sending pool during
the send operation?


I don't have any scripts that rename a snapshot (in fact I didn't know it
was possible until just now), and I don't have other users with permission
to make snapshots (either delegated or by root access).  I'm not using the
Sun auto-snapshot thing, I've got a much-simpler script of my own (hence I
know what it does).  So I don't at the moment see how one would be getting
renamed.

It's possible that a snapshot was *deleted* on the sending pool during the
send operation, however.  Also that snapshots were created (however, a
newly created one would be after the one specified in the zfs send -R, and
hence should be irrelevant).  (In fact it's certain that snapshots were
created, and I'm nearly certain some were deleted.)


More information.  The test I started this morning errored out somewhat 
similarly, and one set of errors clearly comes from deleted snapshots 
(they're 2-hour snapshots, some of which get deleted every 2 hours).  
There are also errors relating to incremental streams, which is strange 
since I'm not using -I or -i at all.


Here are the commands again, and all the output.

+ zfs create -p bup-wrack/fsfs/zp1
+ zfs send -Rp z...@bup-20100810-154542gmt
+ zfs recv -Fud bup-wrack/fsfs/zp1
warning: cannot send 'zp1/d...@bup-2hr-20100731-12cdt': no such pool 
or dataset
warning: cannot send 'zp1/d...@bup-2hr-20100731-14cdt': no such pool 
or dataset
warning: cannot send 'zp1/d...@bup-2hr-20100731-16cdt': no such pool 
or dataset
warning: cannot send 'zp1/d...@bup-20100731-213303gmt': incremental 
source (@bup-2hr-20100731-16CDT) does not exist
warning: cannot send 'zp1/d...@bup-2hr-20100731-18cdt': no such pool 
or dataset
warning: cannot send 'zp1/d...@bup-2hr-20100731-20cdt': incremental 
source (@bup-2hr-20100731-18CDT) does not exist
cannot receive incremental stream: most recent snapshot of 
bup-wrack/fsfs/zp1/ddb does not match incremental source

Afterward,

bash-4.0$ zpool list
NAME        SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
bup-wrack   928G   687G   241G    73%  1.00x  ONLINE  /backups/bup-wrack
rpool       149G  10.0G   139G     6%  1.00x  ONLINE  -
zp1        1.09T   743G   373G    66%  1.00x  ONLINE  -

So quite a lot did get transferred; but not all.

So, it appears clear that snapshots being deleted during the zfs send -R 
causes a warning.  A warning is fine: since they're not there it can't 
send them, and they were there when the command was given, so it makes 
sense for it to try.


That last message, which is not tagged as either warning or error, 
worries me though.  And I'm wondering how complete the transfer is; I 
believe the backup copy is compressed whereas the zp1 copy isn't, so the 
ALLOC being that different isn't clear-cut evidence of anything.


I'll try to guess a few things that should be recent and see if they in 
fact got into the backup.
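One low-tech way to do that spot check is to compare relative file listings between the source and backup mountpoints. A sketch follows; the mountpoint paths in the comment are hypothetical, and this compares names only, not contents, sizes, or ACLs:

```shell
#!/bin/sh
# Sketch: check whether a backup tree contains the same file names
# as the source tree. Name-level only; no content comparison.
list_files() {
    # Sorted, relative list of regular files under a directory.
    (cd "$1" && find . -type f | sort)
}

compare_trees() {
    a=$(mktemp); b=$(mktemp)
    list_files "$1" >"$a"
    list_files "$2" >"$b"
    if cmp -s "$a" "$b"; then
        echo "file lists match"
    else
        echo "file lists differ"
    fi
    rm -f "$a" "$b"
}

# Hypothetical mountpoints for the live pool and the backup pool:
# compare_trees /zp1/ddb /backups/bup-wrack/fsfs/zp1/ddb
```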


--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info


[zfs-discuss] Directory tree renaming -- disk usage

2010-08-09 Thread David Dyer-Bennet
If I have a directory with a bazillion files in it (or, let's say, a
directory subtree full of raw camera images, about 15MB each, totalling
say 50GB) on a ZFS filesystem, and take daily snapshots of it (without
altering it), the snapshots use almost no extra space, I know.

If I now rename that directory, and take another snapshot, what happens? 
Do I get two copies of the unchanged data now, or does everything still
reference the same original data (file content)?  Seems like the new
directory tree contains the same old files, same inodes and so forth, so
it shouldn't be duplicating the data as I understand it; is that correct?

This would, obviously, be fairly easy to test; and, if I removed the
snapshots afterward, wouldn't take space permanently (have to make sure
that the scheduler doesn't do one of my permanent snapshots during the
test).  But I'm interested in the theoretical answer in any case.
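The proposed experiment can be scripted as a dry run, so the exact steps are visible before touching a real pool. Everything here (the dataset name tank/photos, the paths) is hypothetical; set RUN=1 only against a scratch dataset:

```shell
#!/bin/sh
# Dry-run sketch of the rename-then-snapshot space test described
# above. All names are hypothetical; commands are only echoed unless
# RUN=1 is set in the environment.
RUN=${RUN:-0}

step() {
    echo "# $*"
    if [ "$RUN" = 1 ]; then "$@"; fi
}

step zfs snapshot tank/photos@before
step mv /tank/photos/raw /tank/photos/raw-renamed
step zfs snapshot tank/photos@after
# If the rename duplicated file data, the USED column of @before would
# grow by roughly the size of the tree; if blocks are shared (the
# expected outcome), it should grow only by directory metadata.
step zfs list -t snapshot -o name,used,refer -r tank/photos
```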



[zfs-discuss] Errors upgrading 2009.06 to dev build 134

2010-08-05 Thread David Dyer-Bennet

Last night I upgraded from 2009.6 to b134 from the dev branch. I
haven't tried to boot the resulting BE yet, because I got the
following errors:

PHASE                                   ACTIONS
Removal Phase                       16199/21806
Warning - directory etc/sma/snmp/mibs not empty - contents preserved in
/tmp/tmp3udGdj/var/pkg/lost+found/etc/sma/snmp/mibs-20100805T075635Z
Removal Phase                       21806/21806
Install Phase                       79042/80017
The 'pcieb' driver shares the alias 'pciexclass,060400' with the
'pcie_pci' driver, but the system cannot determine how the latter was
delivered.  Its entry on line 2 in /etc/driver_aliases has been commented
out.  If this driver is no longer needed, it may be removed by booting
into the 'opensolaris-3' boot environment and invoking 'rem_drv pcie_pci'
as well as removing line 2 from /etc/driver_aliases or, before
rebooting, mounting the 'opensolaris-3' boot environment and running
'rem_drv -b mountpoint pcie_pci' and removing line 2 from
mountpoint/etc/driver_aliases.
The 'pcieb' driver shares the alias 'pciexclass,060401' with the
'pcie_pci' driver, but the system cannot determine how the latter was
delivered.  Its entry on line 3 in /etc/driver_aliases has been commented
out.  If this driver is no longer needed, it may be removed by booting
into the 'opensolaris-3' boot environment and invoking 'rem_drv pcie_pci'
as well as removing line 3 from /etc/driver_aliases or, before
rebooting, mounting the 'opensolaris-3' boot environment and running
'rem_drv -b mountpoint pcie_pci' and removing line 3 from
mountpoint/etc/driver_aliases.
Install Phase                       80017/80017
Update Phase                        27721/27760
driver (aggr) upgrade (removal of policy 'read_priv_set=net_rawaccess
write_priv_set=net_rawaccess') failed: minor node spec required.
Update Phase                        27725/27760
driver (softmac) upgrade (removal of policy 'read_priv_set=net_rawaccess
write_priv_set=net_rawaccess') failed: minor node spec required.
Update Phase                        27726/27760
driver (vnic) upgrade (removal of policy 'read_priv_set=net_rawaccess
write_priv_set=net_rawaccess') failed: minor node spec required.
Update Phase                        27736/27760
driver (ibd) upgrade (removal of policy 'read_priv_set=net_rawaccess
write_priv_set=net_rawaccess') failed: minor node spec required.
Update Phase                        27743/27760
driver (dnet) upgrade (removal of policy 'read_priv_set=net_rawaccess
write_priv_set=net_rawaccess') failed: minor node spec required.
Update Phase                        27744/27760
driver (elxl) upgrade (removal of policy 'read_priv_set=net_rawaccess
write_priv_set=net_rawaccess') failed: minor node spec required.
Update Phase                        27745/27760
driver (iprb) upgrade (removal of policy 'read_priv_set=net_rawaccess
write_priv_set=net_rawaccess') failed: minor node spec required.
U

Do these look familiar to anybody?  Can they, I hope, be ignored?  Or
does anybody have any ideas what needs to be fixed?

I didn't install any drivers beyond what the earlier installers
figured out for themselves I needed, and I didn't mess with driver
config that I recall.

I know I can probably fall back to what I'm running now if this new
install fails to run, and I'll eventually just try it.  I've got a
couple of bootable CDs with recovery consoles that at least get me
single user, and one with full LiveCD capability, so I should be able
to unwind the mess if necessary.

I guess technically this has no business on zfs-discuss; apologies for
that, but all the prior discussion of this upgrade, and the motivation
for it, is that I need a more current ZFS, and everybody I know is in
this list, not over in the install-discuss list.






[zfs-discuss] Upgrading 2009.06 to something current

2010-08-01 Thread David Dyer-Bennet
What's a good choice for a decently stable upgrade?  I'm unable to run 
backups because ZFS send/receive won't do full-pool replication 
reliably, it hangs better than 2/3 of the time, and people here have 
told me later versions (later than 111b) fix this.  I was originally 
waiting for the spring release, but okay, I've kind of given up on 
that.  This is a home production server; it's got all my photos on it. 
 And the backup isn't as current as I'd like, and I'm having trouble 
getting a better backup.  (I'll do *something* before I risk the 
upgrade; maybe brute force, rsync to an external drive, to at least give 
me a clean copy of the current state; I can live without ACLs.)


I find various blogs with instructions for how to do such an upgrade, 
and they don't agree, and each one has posts from people for whom it 
didn't work, too.  Is there any kind of consensus on what the best way 
to do this is?




Re: [zfs-discuss] Legality and the future of zfs...

2010-07-16 Thread David Dyer-Bennet

On Fri, July 16, 2010 08:39, Richard L. Hamilton wrote:
  It'd be handy to have a mechanism where applications could register
  for snapshot notifications. When one is about to happen, they could
  be told about it and do what they need to do. Once all the
  applications have acknowledged the snapshot alert--and/or after a
  pre-set timeout--the file system would create the snapshot, and then
  notify the applications that it's done.

 Why would an application need to be notified? I think you're under the
 misconception that something happens when a ZFS snapshot is taken.
 NOTHING happens when a snapshot is taken (OK, well, there is the
 snapshot reference name created). Blocks aren't moved around, we don't
 copy anything, etc. Applications have no need to do anything before a
 snapshot is taken.

 It would be nice to have applications request to be notified
 before a snapshot is taken, and when those that have requested
 notification have acknowledged that they're ready, the snapshot
 would be taken; and then another notification sent that it was
 taken.  Prior to indicating they were ready, the apps could
 have achieved a logically consistent on disk state.  That
 would eliminate the need for (for example) separate database
 backups, if you could have a snapshot with the database on it
 in a consistent state.

Any software dependent on cooperating with the filesystem to ensure that
the files are consistent in a snapshot fails the cord-yank test (which is
equivalent to the processor explodes test and the power supply bursts
into flames test and the disk drive shatters test and so forth).  It
can't survive unavoidable physical-world events.

Conversely, any scheme for a program writing to its files that PASSES
those tests will be fine with arbitrary snapshots, too.

For that matter, remember that the snapshot may be taken on a zfs server
on another continent which is making the storage available via iScsi;
there's currently no notification channel to tell the software the
snapshot is happening.



Re: [zfs-discuss] Legality and the future of zfs...

2010-07-16 Thread David Dyer-Bennet

On Fri, July 16, 2010 14:07, Frank Cusack wrote:
 On 7/16/10 12:02 PM -0500 David Dyer-Bennet wrote:
 It would be nice to have applications request to be notified
 before a snapshot is taken, and when those that have requested
 notification have acknowledged that they're ready, the snapshot
 would be taken; and then another notification sent that it was
 taken.  Prior to indicating they were ready, the apps could
 have achieved a logically consistent on disk state.  That
 would eliminate the need for (for example) separate database
 backups, if you could have a snapshot with the database on it
 in a consistent state.

 Any software dependent on cooperating with the filesystem to ensure
 that the files are consistent in a snapshot fails the cord-yank test
 (which is equivalent to the processor explodes test and the power
 supply bursts into flames test and the disk drive shatters test and
 so forth).  It can't survive unavoidable physical-world events.

 It can, if said software can roll back to the last consistent state.
 That may or may not be recent wrt a snapshot.  If an application is
 very active, it's possible that many snapshots may be taken, none of
 which are actually in a state the application can use to recover from.
 Rendering snapshots much less effective.

Wait, if the application can in fact survive the cord pull test then by
definition of survive, all the snapshots are useful.  They'll be
everything consistent that was committed to disk by the time of the yank
(or snapshot); which, it seems to me, is the very best that anybody could
hope for.

 Also, just administratively, and perhaps legally, it's highly desirable
 to know that the time of a snapshot is the actual time that application
 state can be recovered to or referenced to.

Maybe, but since that's not achievable for your core corporate asset (the
database), I think of it as a pipe dream rather than a goal.

 Also, if an application cannot survive a cord-yank test, it might be
 even more highly desirable that snapshots be a stable that from which
 the application can be restarted.

If it cannot survive a cord-yank test, it should not be run, ever, by
anybody, for any purpose more important than playing a game.



Re: [zfs-discuss] Legality and the future of zfs...

2010-07-15 Thread David Dyer-Bennet

On Wed, July 14, 2010 23:51, Tim Cook wrote:
 On Wed, Jul 14, 2010 at 9:27 PM, BM bogdan.maryn...@gmail.com wrote:

 On Thu, Jul 15, 2010 at 12:49 AM, Edward Ned Harvey
 solar...@nedharvey.com wrote:
  I'll second that.  And I think this is how you can tell the
 difference:
  With supermicro, do you have a single support number to call and a
 4hour
  onsite service response time?

 Yes.

 BTW, just for the record, people potentially have a bunch of other
 supermicros in a stock, that they've bought for the rest of the money
 that left from a budget that was initially estimated to get shiny
 Sun/Oracle hardware. :) So normally you put them online in a cluster
 and don't really worry that one of them gone — just power that thing
 down and disconnect from the whole grid.

  When you pay for the higher prices for OEM hardware, you're paying for
 the
  knowledge of parts availability and compatibility. And a single point
  vendor who supports the system as a whole, not just one component.

 What exactly kind of compatibility you're talking about? For example,
 if I remove my broken mylar air shroud for X8 DP with a
 MCP-310-18008-0N number because I step on it accidentally :-D, pretty
 much I think I am gonna ask them to replace exactly THAT thing back.
 Or you want to let me tell you real stories how OEM hardware is
 supported and how many emails/phonecalls it involves? One of the very
 latest (just a week ago): Apple Support reported me that their
 engineers in US has no green idea why Darwin kernel panics on their
 XServe, so they suggested me replace mother board TWICE and keep OLDER
 firmware and never upgrade, since it will cause crash again (although
 identical server works just fine with newest firmware)! I told them
 NNN times that traceback of Darwin kernel was yelling about ACPI
 problem and gave them logs/tracebacks/transcripts etc, but they still
 have no idea where is the problem. Do I need such support? No. Not
 at all.

 --
 Kind regards, BM

 Things, that are stupid at the beginning, rarely ends up wisely.



 You're clearly talking about something completely different than everyone
 else.  Whitebox works GREAT if you've got 20 servers.  Try scaling it to
 10,000.  A couple extras ends up being an entire climate controlled
 warehouse full of parts that may or may not be in the right city.  Not to
 mention you've then got full-time staff on-hand to constantly be replacing
 parts.  Your model doesn't scale for 99% of businesses out there.  Unless
 they're google, and they can leave a dead server in a rack for years, it's
 an unsustainable plan.  Out of the fortune 500, I'd be willing to bet
 there's exactly zero companies that use whitebox systems, and for a
 reason.

You might want to talk to Google about that; as I understand it they
decided that buying expensive servers was a waste of money precisely
because of the high numbers they needed.  Even with the good ones, some
will fail, so they had to plan to work very well through server failures,
so they can save huge amounts of money on hardware by buying cheap servers
rather than expensive ones.

And your juxtaposition of the Fortune 500 and 99% of businesses is
significant; possibly the Fortune 500, other than Google, use expensive
proprietary hardware; but 99% of businesses out there are NOT in the
Fortune 500, and mostly use whitebox systems (and not rackmount at all;
they'll have one or at most two tower servers).


Re: [zfs-discuss] Legality and the future of zfs...

2010-07-15 Thread David Dyer-Bennet

On Thu, July 15, 2010 09:29, Tim Cook wrote:
 On Thu, Jul 15, 2010 at 9:09 AM, David Dyer-Bennet d...@dd-b.net wrote:


 On Wed, July 14, 2010 23:51, Tim Cook wrote:

  You're clearly talking about something completely different than
 everyone
  else.  Whitebox works GREAT if you've got 20 servers.  Try scaling it
 to
  10,000.  A couple extras ends up being an entire climate controlled
  warehouse full of parts that may or may not be in the right city.  Not
 to
  mention you've then got full-time staff on-hand to constantly be
 replacing
  parts.  Your model doesn't scale for 99% of businesses out there.
 Unless
  they're google, and they can leave a dead server in a rack for years,
 it's
  an unsustainable plan.  Out of the fortune 500, I'd be willing to bet
  there's exactly zero companies that use whitebox systems, and for a
  reason.

 You might want to talk to Google about that; as I understand it they
 decided that buying expensive servers was a waste of money precisely
 because of the high numbers they needed.  Even with the good ones, some
 will fail, so they had to plan to work very well through server
 failures,
 so they can save huge amounts of money on hardware by buying cheap
 servers rather than expensive ones.

 Obviously someone was going to bring up google, whose business model is
 unique, and doesn't really apply to anyone else.  Google makes it work
 because they order so many thousands of servers at a time, they can demand
 custom made parts for the servers, that are built to their specifications.

Certainly they're one of the most unusual setups out there, in several
ways (size, plus details of what they do with their computers).

  Furthermore, the clustering and filesystem they use wouldn't function
 at all for 99% of the workloads out there.  Their core application,
 search, is what makes the hardware they use possible.  If they were
 serving up a highly transactional database that required millisecond
 latency it would be a different story.

Again, I'm not at all convinced of that 99% bit.

Obviously low-latency transactional database applications are about the
polar opposite of what Google does.  However, transactional database
applications are nearer 1% than 99% of the workloads out there, at every
shop I've worked at or seen detailed descriptions of.

Big email farms, for example, don't generally have that kind of database
at all.  Big web farms probably do have some databases used that way --
but not for that high a percentage of their traffic, and generally running
on one big server while the web is spread across hundreds of servers.
Akamai is more like Google in a bunch of ways than most places.  Wikipedia
and ebay and amazon have huge web front-ends, while also needing
transactional database support.

Um, maybe I'm getting really too far afield from ZFS.  I'll shut up now :-) .


Re: [zfs-discuss] preparing for future drive additions

2010-07-14 Thread David Dyer-Bennet

On Wed, July 14, 2010 14:58, Daniel Taylor wrote:

 I'm about the build a opensolaris NAS system, currently we have two drives
 and are planning on adding two more at a later date (2TB enterprise level
 HDD are a bit expensive!).

Do you really need them?  Now?  Maybe 1TB drives are good now, and then
add a pair of 2TB in a year?

 Whats the best configuration for setting up these drives bearing in mind I
 want to expand in the future?

Mirror now (pool consisting of one two-way mirror vdev).  Add second
mirror vdev to the pool when you need to expand.

 I was thinking of mirroring the drives and then converting to raidz some
 how?

No way to convert to raidz.  (That is, no magic simple way; you can of
course put in new drives for the raidz and copy the data across.)

 It will only be a max of 4 drives, the second two of which will be bought
 later.

5 drives would be a lot better.  You could keep a hot spare -- and you
could expand mirror vdevs safely (never dropping below your normal
redundancy level), too.

You can add new vdevs to a pool.  This is very useful for a growing system
(until you run out of drive slots).

You can expand an existing vdev by replacing all the drives (one at a
time).  It's a lot cleaner and safer with mirror vdevs than with
raidz2/raidz3 vdevs.

In a raidz vdev, you can replace drives individually and wait for them to
resilver.  When each drive is done, replace the next.  When you have
replaced all of the drives, the vdev will then make the new space
available.  HOWEVER, doing this takes away a level of redundancy -- you
take away a live drive.  For a RAIDZ, that means no redundancy during the
resilver (which takes a while on a 2TB drive, if you haven't noticed). 
And the resilver is stressing the drives, so if there's any incipient
failure, it's more likely to show up during the resilver.  Scary!  (RAIDZ2
is better in that you still have one layer of redundancy when you take one
drive out; but in a 4-drive chassis forget it!).

In a mirror vdev,  you can be much cleverer, IF you can connect the new
drive while the old drives are all still present.  Attach the new bigger
drive as a THIRD drive to the mirror vdev, and wait for the resilver.  You
now have a three-way mirror, and you never dropped below a two-way mirror
at any time during the process.  Detach one small drive and attach a new
big drive, and wait again.  And detach the last small drive, and you have
now expanded your mirror vdev without ever dropping below your normal
redundancy.  (There are variants on this; the key point is that a mirror
vdev can be an n-way mirror for any value of n your hardware can support.)
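As a concrete (hypothetical) command sequence, the attach-first upgrade described above might look like the plan below; the pool and device names are invented, and each detach must wait until zpool status shows the preceding resilver has finished. The sketch only prints the plan rather than running anything:

```shell
#!/bin/sh
# Print the attach-then-detach upgrade plan for a two-way mirror,
# as described above. Pool/device names are hypothetical; nothing
# here talks to zpool directly.
POOL=tank

plan_mirror_upgrade() {
    # $1,$2: old (small) drives; $3,$4: new (bigger) drives
    cat <<EOF
zpool attach $POOL $1 $3
# wait: zpool status $POOL until resilver completes (3-way mirror now)
zpool detach $POOL $1
zpool attach $POOL $2 $4
# wait for the second resilver to complete
zpool detach $POOL $2
EOF
}

plan_mirror_upgrade c0t2d0 c0t3d0 c0t4d0 c0t5d0
```

The ordering is the whole point: every attach happens before the corresponding detach, so the vdev never drops below two-way redundancy.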

If your backups are good and your uptime requirements aren't really
strict, of course the risks can be tolerated better.


Re: [zfs-discuss] zfs send/recv hanging in 2009.06

2010-07-13 Thread David Dyer-Bennet

On Fri, July 9, 2010 16:49, BJ Quinn wrote:
 I have a couple of systems running 2009.06 that hang on relatively large
 zfs send/recv jobs.  With the -v option, I see the snapshots coming
 across, and at some point the process just pauses, IO and CPU usage go to
 zero, and it takes a hard reboot to get back to normal.  The same script
 running against the same data doesn't hang on 2008.05.

 There are maybe 100 snapshots, 200GB of data total.  Just trying to send
 to a blank external USB drive in one case, and in the other, I'm restoring
 from a USB drive to a local drive, but the behavior is the same.

 I see that others have had a similar problem, but there doesn't seem to be
 any answers -

 https://opensolaris.org/jive/thread.jspa?messageID=384540
 http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg34493.html
 http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg37158.html

 I'd like to stick with a released version of OpenSolaris, so I'm hoping
 that the answer isn't to switch to the dev repository and pull down b134.

I still have this problem (I was msg34493 there).

My original plan was to wait for the Spring release, to get me to a stable
release on more recent code.  I'm still following that plan, i.e. haven't
done anything else yet.  At the time the March release was expected to
actually appear by April.

Other than trying more recent code, I don't recall any useful ideas coming
through the list.

It seems like the thing people recommend as the backup scheme for ZFS
simply doesn't work yet.


Re: [zfs-discuss] zfs send/recv hanging in 2009.06

2010-07-13 Thread David Dyer-Bennet

On Fri, July 9, 2010 18:42, Giovanni Tirloni wrote:
 On Fri, Jul 9, 2010 at 6:49 PM, BJ Quinn bjqu...@seidal.com wrote:
 I have a couple of systems running 2009.06 that hang on relatively large
 zfs send/recv jobs.  With the -v option, I see the snapshots coming
 across, and at some point the process just pauses, IO and CPU usage go
 to zero, and it takes a hard reboot to get back to normal.  The same
 script running against the same data doesn't hang on 2008.05.

 There are issues running concurrent zfs receive in 2009.6. Try to run
 just one at a time.

He's doing the same thing I'm doing -- one send, one receive.  (But
incremental replication.)

 Switching to a development build (b134) is probably the answer until
 we've a new release.

Given that the spring stable release was my planned solution, I'm
starting to think about doing something else myself.

Does anybody have any idea what's up with the stable release, though?  Has
anything been said about the plans that I've maybe missed?



Re: [zfs-discuss] Crucial RealSSD C300 and cache flush?

2010-06-24 Thread David Dyer-Bennet

On Thu, June 24, 2010 08:58, Arne Jansen wrote:

 Cross check: we pulled also while writing with cache enabled, and it lost
 8 writes.

I'm SO pleased to see somebody paranoid enough to do that kind of
cross-check doing this benchmarking!

Benchmarking is hard!

 So I'd say, yes, it flushes its cache on request.

Starting to sound pretty convincing,  yes.


Re: [zfs-discuss] Complete Linux Noob

2010-06-15 Thread David Dyer-Bennet

On Tue, June 15, 2010 14:13, CarlPalmer wrote:
 I have been researching different types of raids, and I happened across
 raidz, and I am blown away.  I have been trying to find resources to
 answer some of my questions, but many of them are either over my head in
 terms of details, or foreign to me as I am a linux noob, and I have to
 admit I have never even looked at Solaris.

Heh; caught another one :-) .

 Are the Parity drives just that, a drive assigned to parity, or is the
 parity shared over several drives?

No drives are formally designated for parity; all n drives in the RAIDZ
vdev are used together in such a way that you can lose one drive without
loss of data, but exactly which bits are data and which bits are
parity and where they are stored is not something the admin has to think
about or know (and in fact cannot know).

 I understand that you can build a raidz2 that will have 2 parity disks.
 So in theory I could lose 2 disks and still rebuild my array so long as
 they are not both the parity disks correct?

Any two disks out of a raidz2 vdev can be lost.  Lose a third before the
recover completes and your data is toast.

 I understand that you can have Spares assigned to the raid, so that if a
 drive fails, it will immediately grab the spare and rebuild the damaged
 drive.  Is this correct?

Yes, RAIDZ (including z2 and z3) and mirror vdevs will grab a hot spare
if one is assigned and needed, and start the resilvering operation
immediately.

 Now I can not find anything on how much space is taken up in the raidz1 or
 raidz2.  If all the drives are the same size, does a raidz2 take up the
 space of 2 of the drives for parity, or is the space calculation
 different?

That's the right calculation.
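In other words, usable capacity is roughly (number of drives minus parity drives) times per-drive capacity, before metadata and allocation overhead. A trivial sketch of the arithmetic:

```shell
#!/bin/sh
# Rough usable-space rule for raidz1/2/3, as confirmed above:
# (number of drives - parity drives) * per-drive capacity.
# Ignores metadata, padding, and allocation overhead.
raidz_usable_gb() {
    n=$1
    parity=$2
    drive_gb=$3
    echo $(( (n - parity) * drive_gb ))
}

# e.g. four 2000 GB drives in raidz2:
raidz_usable_gb 4 2 2000   # prints 4000
```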

 I get that you can not expand a raidz as you would a normal raid, by
 simply slapping on a drive.  Instead it seems that the preferred method is
 to create a new raidz.  Now Lets say that I want to add another raidz1 to
 my system, can I get the OS to present this as one big drive with the
 space from both raid pools?

You can't expand a normal RAID, either, anywhere I've ever seen.

A pool can contain multiple vdevs.  You can add additional vdevs to a
pool and the new space become immediately available to the pool, and hence
to anything (like a filesystem) drawing from that pool.

(The zpool command will attempt to stop you from mixing vdevs of different
redundancy in the same pool, but you can force it to let you.  Mixing a
RAIDZ vdev and a RAIDZ3 vdev in the same pool is a silly thing to do,
since you don't control where in the pool any new data goes, and it's
likely to be striped across the vdevs in the pool.)

You can also replace all the drives in a vdev, serially (and waiting for
the resilver to complete at each step before continuing to the next
drive), and if the new drives are larger than the old drives, when  you've
replaced all of them the new space will be usable in that vdev.  This is
particularly useful with mirrors, where there are only two drives to
replace.

(Well, actually, ZFS mirrors can have any number of drives.  To avoid the
risk of loss when upgrading the drives in a mirror, attach the new bigger
drive FIRST, wait for the resilver, and THEN detach one of the smaller
original drives, repeat for the second drive, and you will never go to a
redundancy lower than 2.  You can even attach BOTH new disks at once, if
you have the slots and controller space, and have a 4-way mirror for a
while.  Somebody reported configuring ALL the drives in a 'Thumper' as a
mirror, a 48-way mirror, just to see if it worked.  It did.)
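The attach-first upgrade described above, sketched as commands (device names hypothetical; wait for each resilver to finish before detaching anything):

```shell
zpool attach tank c1t0d0 c3t0d0   # attach 1st big disk; mirror is now 3-way
zpool status tank                 # wait until the resilver completes
zpool detach tank c1t0d0          # drop one small disk; back to 2-way
zpool attach tank c1t1d0 c3t1d0   # attach 2nd big disk; 3-way again
zpool status tank                 # wait for the resilver
zpool detach tank c1t1d0          # both small disks gone
# With the autoexpand pool property on (or after zpool online -e),
# the vdev grows to the new disks' size.
```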

 How do I share these types of raid pools across the network.  Or more
 specifically, how do I access them from Windows based systems?  Is there
 any special trick?

Nothing special.  In-kernel CIFS is better than Samba, and supports full
NTFS ACLs.  I hear it also attaches to AD cleanly, but I haven't done
that myself; I don't run AD at home.
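A minimal sketch of the in-kernel CIFS route in workgroup mode (dataset and share names are hypothetical; the AD join is skipped here):

```shell
svcadm enable -r smb/server               # start the in-kernel CIFS service
zfs set sharesmb=name=photos tank/photos  # share the dataset as "photos"
sharemgr show -vp                         # verify the share is exported
# Windows clients can then map \\server\photos
```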

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Please trim posts

2010-06-10 Thread David Dyer-Bennet

On Thu, June 10, 2010 12:26, patto...@yahoo.com wrote:
 It's getting downright ridiculous. The digest people will kiss you.

But those reading via individual message email quite possibly will not. 
Quoting at least what you're actually responding to is crucial to making
sense out here.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Depth of Scrub

2010-06-04 Thread David Dyer-Bennet

On Fri, June 4, 2010 03:29, sensille wrote:
 Hi,

 I have a small question about the depth of scrub in a raidz/2/3
 configuration.
 I'm quite sure scrub does not check spares or unused areas of the disks
 (it
 could check whether the disks detect any errors there).
 But what about the parity? Obviously it has to be checked, but I can't
 find
 any indications for it in the literature. The man page only states that
 the
 data is being checksummed and only if that fails the redundancy is being
 used.
 Please tell me I'm wrong ;)

I believe you're wrong.  Scrub checks all the blocks used by ZFS,
regardless of what's in them.  (It doesn't check free blocks.)

 But what I'm really targeting with my question: How much coverage can be
 reached with a find | xargs wc in contrast to scrub? It misses the
 snapshots, but anything beyond that?

Your find script misses the redundant data; scrub checks it all.

It may well miss some of the metadata as well, and probably misses the
redundant copies of metadata.
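For reference, a scrub is kicked off and monitored like this (pool name hypothetical):

```shell
zpool scrub tank       # verify checksums of every allocated block,
                       # including parity and redundant metadata copies
zpool status -v tank   # scrub progress, plus any files with errors
```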

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread David Dyer-Bennet

On Wed, June 2, 2010 17:54, Roman Naumenko wrote:
 Recently I talked to a co-worker who manages NetApp storages. We discussed
 size changes for pools in zfs and aggregates in NetApp.

 And some time before I had suggested to a my buddy zfs for his new home
 storage server, but he turned it down since there is no expansion
 available for a pool.

I set up my home fileserver with ZFS (in 2006) BECAUSE zfs could expand
the pool for me, and nothing else I had access to could do that (home
fileserver, little budget).

My server is currently running with one data pool, three vdevs.  Each of
the data vdevs is a two-way mirror.  I started with one, expanded to two,
then expanded to three.  Rather than expanding to four when this fills up,
I'm going to attach a larger drive to the first mirror vdev, and then a
second one, and then remove the two current drives, thus expanding the
vdev without ever compromising the redundancy.

My choice of mirrors rather than RAIDZ is based on the fact that I have
only 8 hot-swap bays (I still think of this as LARGE for a home server;
the competition, things like the Drobo, tends to have 4 or 5), that I
don't need really large amounts of storage (after my latest upgrade I'm
running with 1.2TB of available data space), and that I expected to need
to expand storage over the life of the system.  With mirror vdevs, I can
expand them without compromising redundancy even temporarily, by attaching
the new drives before I detach the old drives; I couldn't do that with
RAIDZ.  Also, the fact that disk is now so cheap means that 100%
redundancy is affordable, I don't have to compromise on RAIDZ.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread David Dyer-Bennet

On Thu, June 3, 2010 10:15, Garrett D'Amore wrote:
 Using a stripe of mirrors (RAID0) you can get the benefits of multiple
 spindle performance, easy expansion support (just add new mirrors to the
 end of the raid0 stripe), and 100% data redundancy.   If you can afford
 to pay double for your storage (the cost of mirroring), this is IMO the
 best solution.

Referencing RAID0 here in the context of ZFS is confusing, though.  Are
you suggesting using underlying RAID hardware to create virtual volumes to
then present to ZFS, or what?

 Note that this solution is not quite as resilient against hardware
 failure as raidz2 or raidz3.  While the RAID1+0 solution can tolerate
  multiple drive failures, if both drives in a mirror fail, you lose
 data.

In a RAIDZ solution, two or more drive failures lose your data.  In a
mirrored solution, losing the WRONG two drives will still lose your data,
but you have some chance of surviving losing a random two drives.  So I
would describe the mirror solution as more resilient.
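That "some chance" can be made concrete. In a pool of three 2-way mirrors (six disks), a sketch of the odds that two random simultaneous disk losses are survivable:

```shell
awk 'BEGIN {
  disks = 6; mirrors = 3
  pairs = disks * (disks - 1) / 2   # 15 equally likely two-disk losses
  fatal = mirrors                   # only losing both halves of one mirror is fatal
  printf "mirrors survive a random 2-disk loss: %.0f%%\n", 100 * (1 - fatal / pairs)
}'
```

The same two-disk loss is always fatal on raidz1, and never fatal on raidz2.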

So going to RAIDZ2 or even RAIDZ3 would be better, I agree.

In an 8-bay chassis, there are other concerns, too.  Do I keep space open
for a hot spare?  There's no real point in a hot spare if you have only
one vdev; that is, 8-drive RAIDZ3 is clearly better than 7-drive RAIDZ2
plus a hot spare.  And putting everything into one vdev means that for any
upgrade I have to replace all 8 drives at once, a financial problem for a
home server.

 If you're clever, you'll also try to make sure each side of the mirror
 is on a different controller, and if you have enough controllers
 available, you'll also try to balance the controllers across stripes.

I did manage to split the mirrors across controllers (I have 6 SATA on
the motherboard and I added an 8-port SAS card with SAS-SATA cabling).

 One way to help with that is to leave a drive or two available as a hot
 spare.

 Btw, the above recommendation mirrors what Jeff Bonwick himself (the
 creator of ZFS) has advised on his blog.

I believe that article directly influenced my choice, in fact.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread David Dyer-Bennet

On Thu, June 3, 2010 10:50, Marty Scholes wrote:
 David Dyer-Bennet wrote:
 My choice of mirrors rather than RAIDZ is based on
 the fact that I have
 only 8 hot-swap bays (I still think of this as LARGE
 for a home server;
 the competition, things like the Drobo, tends to have
 4 or 5), that I
 don't need really large amounts of storage (after my
 latest upgrade I'm
 running with 1.2TB of available data space), and that
 I expected to need
 to expand storage over the life of the system.  With
 mirror vdevs, I can
 expand them without compromising redundancy even
 temporarily, by attaching
 the new drives before I detach the old drives; I
 couldn't do that with
 RAIDZ.  Also, the fact that disk is now so cheap
 means that 100%
 redundancy is affordable, I don't have to compromise
 on RAIDZ.

 Maybe I have been unlucky too many times doing storage admin in the 90s,
 but simple mirroring still scares me.  Even with a hot spare (you do have
 one, right?) the rebuild window leaves the entire pool exposed to a single
 failure.

No hot spare currently.  And now running on 4-year-old disks, too.

For me, mirroring is a big step UP from bare single drives.  That's my
default state.

Of course, I'm a big fan of multiple levels of backup.

 One of the nice things about zfs is that allows, to each his own.  My
 home server's main pool is 22x 73GB disks in a Sun A5000 configured as
 RAIDZ3.  Even without a hot spare, it takes several failures to get the
 pool into trouble.

Yes, it's very flexible, and while there are no doubt useless degenerate
cases here and there, lots of the cases are useful for some environment or
other.

That does seem like rather an extreme configuration.

 At the same time, there are several downsides to a wide stripe like that,
 including relatively poor iops and longer rebuild windows.  As noted
 above, until bp_rewrite arrives, I cannot change the geometry of a vdev,
 which kind of limits the flexibility.

There are a LOT of reasons to want bp_rewrite, certainly.

 As a side rant, I still find myself baffled that Oracle/Sun correctly
 touts the benefits of zfs in the enterprise, including tremendous
 flexibility and simplicity of filesystem provisioning and nondisruptive
 changes to filesystems via properties.

 These forums are filled with people stating that the enterprise demands
 simple, flexible and nondisruptive filesystem changes, but no enterprise
 cares about simple, flexible and nondisruptive pool/vdev changes, e.g.
 changing a vdev geometry or evacuating a vdev.  I can't accept that zfs
 flexibility is critical and zpool flexibility is unwanted.

We could certainly use that level of pool-equivalent flexibility at work;
we don't currently have it (not ZFS, not high-end enterprise storage
units).

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread David Dyer-Bennet

On Thu, June 3, 2010 10:50, Garrett D'Amore wrote:
 On Thu, 2010-06-03 at 10:35 -0500, David Dyer-Bennet wrote:
 On Thu, June 3, 2010 10:15, Garrett D'Amore wrote:
  Using a stripe of mirrors (RAID0) you can get the benefits of multiple
  spindle performance, easy expansion support (just add new mirrors to
 the
  end of the raid0 stripe), and 100% data redundancy.   If you can
 afford
  to pay double for your storage (the cost of mirroring), this is IMO
 the
  best solution.

 Referencing RAID0 here in the context of ZFS is confusing, though.
 Are
 you suggesting using underlying RAID hardware to create virtual volumes
 to
 then present to ZFS, or what?

 RAID0 is basically the default configuration of a ZFS pool -- it's a
 concatenation of the underlying vdevs.  In this case the vdevs should
 themselves be two-drive mirrors.

 This of course has to be done in the ZFS layer, and ZFS doesn't call it
 RAID0, any more than it calls a mirror RAID1, but effectively that's
 what they are.

Kinda mostly, anyway.  I thought we recently had this discussion, and
people were pointing out things like the striping wasn't physically the
same on each drive and such.

  Note that this solution is not quite as resilient against hardware
  failure as raidz2 or raidz3.  While the RAID1+0 solution can tolerate
  multiple drive failures, if both drives in a mirror fail, you
 lose
  data.

 In a RAIDZ solution, two or more drive failures lose your data.  In a
 mirrored solution, losing the WRONG two drives will still lose your
 data,
 but you have some chance of surviving losing a random two drives.  So I
 would describe the mirror solution as more resilient.

 So going to RAIDZ2 or even RAIDZ3 would be better, I agree.

From a data resiliency point, yes, raidz2 or raidz3 offers better
 protection.  At a significant performance cost.

The place I care about performance is almost entirely sequential
read/write -- loading programs, and loading and saving large image files. 
I don't know a lot of home users that actually need high IOPS.

 Given enough drives, one could probably imagine using raidz3 underlying
 vdevs, with RAID0 striping to spread I/O across multiple spindles.  I'm
 not sure how well this would perform, but I suspect it would perform
 better than straight raidz2/raidz3, but at a significant expense (you'd
 need a lot of drives).

Might well work that way; it does sound about right.

 In an 8-bay chassis, there are other concerns, too.  Do I keep space
 open
 for a hot spare?  There's no real point in a hot spare if you have only
 one vdev; that is, 8-drive RAIDZ3 is clearly better than 7-drive RAIDZ2
 plus a hot spare.  And putting everything into one vdev means that for
 any
 upgrade I have to replace all 8 drives at once, a financial problem for
 a
 home server.

 This is one of the reasons I don't advocate using raidz (any version)
 for home use, unless you can't afford the cost in space represented by
 mirroring and a hot spare or two.  (The other reason ... for my use at
 least... is the performance cost.  I want to use my array to host
 compilation workspaces, and for that I would prefer to get the most
 performance out of my solution.  I suppose I could add some SSDs... but
 I still think multiple spindles are a good option when you can do it.)

 In an 8-drive chassis, without any SSDs involved, I'd configure 6 of the
 drives as a 3 vdev stripe consisting of mirrors of 2 drives, and I'd
 leave the remaining two bays as hot spares.  Btw, using the hot spares
 in this way potentially means you can use those bays later to upgrade to
 larger drives in the future, without offlining anything and without
 taking too much of a performance penalty when you do so.

And the three 2-way mirrors is exactly where I am right now.  I don't have
hot spares in place, but I have the bays reserved for that use.

In the latest upgrade, I added 4 2.5" hot-swap bays (which got the system
disks out of the 3.5" hot-swap bays).  I have two free, and that's the
form-factor SSDs come in these days, so if I thought it would help I could
add an SSD there.  Have to do quite a bit of research to see which uses
would actually benefit me, and how much.  It's not obvious that either
l2arc or zil on SSD would help my program loading, image file loading, or
image file saving cases that much.  There may be more other stuff than I
really think of though.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread David Dyer-Bennet

On Thu, June 3, 2010 13:04, Garrett D'Amore wrote:
 On Thu, 2010-06-03 at 11:49 -0500, David Dyer-Bennet wrote:
 hot spares in place, but I have the bays reserved for that use.

 In the latest upgrade, I added 4 2.5" hot-swap bays (which got the
 system
 disks out of the 3.5" hot-swap bays).  I have two free, and that's the
 form-factor SSDs come in these days, so if I thought it would help I
 could
 add an SSD there.  Have to do quite a bit of research to see which uses
 would actually benefit me, and how much.  It's not obvious that either
 l2arc or zil on SSD would help my program loading, image file loading,
 or
 image file saving cases that much.  There may be more other stuff than I
 really think of though.

 It really depends on the working sets these programs deal with.

 zil is useful primarily when doing lots of writes, especially lots of
 writes to small files or to data scattered throughout a file.  I view it
 as a great solution for database acceleration, and for accelerating the
 filesystems I use for hosting compilation workspaces.  (In retrospect,
 since by definition the results of compilation are reproducible, maybe I
 should just turn off synchronous writes for build workspaces... provided
 that they do not contain any modifications to the sources themselves.
 I'm going to have to play with this.)

I suspect there are more cases here than I immediately think of.  For
example, sitting here thinking, I wonder if the web cache would benefit a
lot?  And all those email files?

RAW files from my camera are 12-15MB, and the resulting Photoshop files
are around 50MB (depending on compression, and they get bigger fast if I
add layers).  Those aren't small, and I don't read the same thing over and
over lots.

For build spaces, definitely should be reproducible from source.  A
classic production build starts with checking out a tagged version from
source control, and builds from there.

 l2arc is useful for data that is read back frequently but is too large
 to fit in buffer cache.  I can imagine that it would be useful for
 hosting storage associated with lots of  programs that are called
 frequently. You can think of it as a logical extension of the buffer
 cache in this regard... if your working set doesn't fit in RAM, then
 l2arc can prevent going back to rotating media.

I don't think I'm going to benefit much from this.

 All other things being equal, I'd increase RAM before I'd worry too much
 about l2arc.  The exception to that would be if I knew I had working
 sets that couldn't possibly fit in RAM... 160GB of SSD is a *lot*
 cheaper than 160GB of RAM. :-)

I just did increase RAM, same upgrade as the 2.5 bays and the additional
controller and the third mirrored vdev.  I increased it all the way to
4GB!  And I can't increase it further feasibly (4GB sticks of ECC RAM
being hard to find and extremely pricey; plus I'd have to displace some of
my existing memory).

Since this is a 2006 system, in another couple of years it'll be time to
replace MB and processor and memory, and I'm sure it'll have a lot more
memory next time.

I'm desperately waiting for OpenSolaris 2010.$Q2 (Q2 since it was pointed
out last time that Spring was wrong on half the Earth), since I hope it
will resolve my backup problems so I can get incremental backups happening
nightly (intention is to use zfs send/receive with incremental replication
streams, to keep external drives up-to-date with data and all snapshots). 
The oldness of the system and especially the drives makes this more
urgent, though of course it's important in general.  I do manage a full
backup that completes now and then, anyway, and they'll complete overnight
if they don't hang.  The problem is, if they hang, I have to reboot the
Solaris box and every Windows box using it.
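The intended nightly scheme, sketched as commands (pool, snapshot, and backup-pool names are hypothetical):

```shell
# Snapshot everything, then send only the delta since last night,
# preserving all intermediate snapshots on the backup pool.
zfs snapshot -r tank@nightly-20100603
zfs send -R -I tank@nightly-20100602 tank@nightly-20100603 | \
    zfs receive -Fd backup
```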

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] one more time: pool size changes

2010-06-03 Thread David Dyer-Bennet

On Thu, June 3, 2010 12:03, Bob Friesenhahn wrote:
 On Thu, 3 Jun 2010, David Dyer-Bennet wrote:

 In an 8-bay chassis, there are other concerns, too.  Do I keep space
 open
 for a hot spare?  There's no real point in a hot spare if you have only
 one vdev; that is, 8-drive RAIDZ3 is clearly better than 7-drive RAIDZ2
 plus a hot spare.  And putting everything into one vdev means that for
 any
 upgrade I have to replace all 8 drives at once, a financial problem for
 a
 home server.

 It is not so clear to me that an 8-drive raidz3 is clearly better than
 7-drive raidz2 plus a hot spare.  From a maintenance standpoint, I
 think that it is useful to have a spare drive or even an empty spare
 slot so that it is easy to replace a drive without needing to
 physically remove it from the system.  A true hot spare allows
 replacement to start automatically right away if a failure is
 detected.

But is having a RAIDZ2 drop to single redundancy, with replacement
starting instantly, actually as good or better than having a RAIDZ3 drop
to double redundancy, with actual replacement happening later?  The
degraded state of the RAIDZ3 has the same redundancy as the healthy
state of the RAIDZ2.

Certainly having a spare drive bay to play with is often helpful; though
the scenarios that most immediately spring to mind are all mirror-related
and hence don't apply here.

 With only 8-drives, the reliability improvement from raidz3 is
 unlikely to be borne out in practice.  Other potential failures modes
 will completely drown out the on-paper reliability improvement
 provided by raidz3.

I wouldn't give up much of anything to add Z3 on 8 drives, no.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Can I change copies to 2 *after* I have copied a bunch of files?

2010-06-01 Thread David Dyer-Bennet

On Fri, May 28, 2010 11:04, Thanassis Tsiodras wrote:
 I've read on the web that copies=2 affects only the files copied *after* I
 have changed the setting

That is correct.

Rewriting existing datasets is a feature desired for future versions (it
would make a LOT of things work, including shrinking pools and adding
compression or extra redundancy after the fact).  Nobody has promised a
date for it that I recall.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] send/recv over ssh

2010-05-21 Thread David Dyer-Bennet

On Thu, May 20, 2010 19:44, Freddie Cash wrote:
 And you can always patch OpenSSH with HPN, thus enabling the NONE
 cipher,
  which disables encryption for the data transfer (authentication is always
 encrypted).  And twiddle the internal buffers that OpenSSH uses to improve
 transfer rates, especially on 100 Mbps or faster links.

Ah!  I've been wanting that for YEARS.  Very glad to hear somebody has
done it.

With the common use of SSH for moving bulk data (under rsync as well),
this is a really useful idea.  Of course one should think about where one
is moving one's data unencrypted; but the precise cases where the
performance hit of encryption will show are the safe ones, such as between
my desktop and server which are plugged into the same switch; no data
would leave that small LAN segment.
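With the HPN patch applied on both ends, the bulk transfer can look like this (host and dataset names are hypothetical; NoneEnabled/NoneSwitch are HPN-specific options, not stock OpenSSH):

```shell
# Unencrypted payload, encrypted authentication -- only sensible
# between machines on the same trusted switch.
zfs send tank/photos@snap | \
    ssh -oNoneEnabled=yes -oNoneSwitch=yes server \
    zfs receive -F backup/photos
```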
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Interesting experience with Nexenta - anyone seen it?

2010-05-21 Thread David Dyer-Bennet

On Fri, May 21, 2010 10:19, Bob Friesenhahn wrote:
 On Fri, 21 May 2010, Miika Vesti wrote:

 AFAIK OCZ Vertex 2 does not use volatile DRAM cache but non-volatile
 NAND
 grid. Whether it respects or ignores the cache flush seems irrelevant.

 There has been previous discussion about this:
 http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/35702

 I'm pretty sure that all SandForce-based SSDs don't use DRAM as their
 cache, but take a hunk of flash to use as scratch space instead. Which
 means that they'll be OK for ZIL use.

 So, OCZ Vertex 2 seems to be a good choice for ZIL.

 There seem to be quite a lot of blind assumptions in the above.  The
 only good choice for ZIL is when you know for a certainty and not
 assumptions based on 3rd party articles and blog postings.  Otherwise
 it is like assuming that if you jump through an open window that there
 will be firemen down below to catch you.

Just how DOES one know something for a certainty, anyway?  I've seen LOTS
of people mess up performance testing in ways that gave them very wrong
answers; relying solely on your own testing is as foolish as relying on a
couple of random blog posts.

To be comfortable (I don't ask to "know for a certainty"; I'm not sure
that exists outside of faith), I want a claim by the manufacturer and
multiple outside tests in significant journals -- which could be the
blog of somebody I trusted, as well as actual magazines and such.
Ideally, certainly if it's important, I'd then verify the tests myself.

There aren't enough hours in the day, so I often get by with less.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] send/recv over ssh

2010-05-21 Thread David Dyer-Bennet

On Fri, May 21, 2010 12:59, Brandon High wrote:
 On Fri, May 21, 2010 at 7:12 AM, David Dyer-Bennet d...@dd-b.net wrote:

 On Thu, May 20, 2010 19:44, Freddie Cash wrote:
 And you can always patch OpenSSH with HPN, thus enabling the NONE
 cipher,
  which disables encryption for the data transfer (authentication is
 always
 encrypted).  And twiddle the internal buffers that OpenSSH uses to
 improve
 transfer rates, especially on 100 Mbps or faster links.

 Ah!  I've been wanting that for YEARS.  Very glad to hear somebody has
 done it.

 ssh-1 has had the 'none' cipher from day one, though it looks like
 openssh has removed it at some point. Fixing the buffers seems to be a
 nice tweak though.

I thought I remembered a none cipher, but couldn't find it the other
year and decided I must have been wrong.  I did use ssh-1, so maybe I
really WAS remembering after all.

 With the common use of SSH for for moving bulk data (under rsync as
 well),
 this is a really useful idea.  Of course one should think about where
 one

 I think there's a certain assumption that using ssh = safe, and by
 enabling a none cipher you break that assumption. All of us know
 better, but less experienced admins may not.

Seems a high price to pay to try to protect idiots from being idiots. 
Anybody who doesn't understand that encryption = none means it's not
encrypted and hence not safe isn't safe as an admin anyway.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Performance drop during scrub?

2010-05-03 Thread David Dyer-Bennet

On Sun, May 2, 2010 14:12, Richard Elling wrote:
 On May 1, 2010, at 1:56 PM, Bob Friesenhahn wrote:
 On Fri, 30 Apr 2010, Freddie Cash wrote:
 Without a periodic scrub that touches every single bit of data in the
 pool, how can you be sure
 that 10-year files that haven't been opened in 5 years are still
 intact?

 You don't.  But it seems that having two or three extra copies of the
 data on different disks should instill considerable confidence.  With
 sufficient redundancy, chances are that the computer will explode before
 it loses data due to media corruption.  The calculated time before data
 loss becomes longer than even the pyramids in Egypt could withstand.

 These calculations are based on fixed MTBF.  But disk MTBF decreases with
 age. Most disks are only rated at 3-5 years of expected lifetime. Hence,
 archivists
 use solutions with longer lifetimes (high quality tape = 30 years) and
 plans for
 migrating the data to newer media before the expected media lifetime is
 reached.
 In short, if you don't expect to read your 5-year lifetime rated disk for
 another 5 years,
 then your solution is uhmm... shall we say... in need of improvement.

Are they giving tape that long an estimated life these days?  They
certainly weren't last time I looked.

And I basically don't trust tape; too many bad experiences (ever since I
moved off of DECTape, I've been having bad experiences with tape).  The
drives are terribly expensive and I can't afford redundancy, and in thirty
years I very probably could not buy a new drive for my old tapes.

I started out a big fan of tape, but the economics have been very much
against it in the range I'm working (small; 1.2 terabytes usable on my
server currently).

I don't expect I'll keep my hard disks for 30 years; I expect I'll upgrade
them periodically, probably even within their MTBF.  (Although note that,
though tests haven't been run, the MTBF of a 5-year disk after 4 years is
almost certainly greater than 1 year.)

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Performance drop during scrub?

2010-05-03 Thread David Dyer-Bennet

On Mon, May 3, 2010 17:02, Richard Elling wrote:
 On May 3, 2010, at 2:38 PM, David Dyer-Bennet wrote:
 On Sun, May 2, 2010 14:12, Richard Elling wrote:
 On May 1, 2010, at 1:56 PM, Bob Friesenhahn wrote:
 On Fri, 30 Apr 2010, Freddie Cash wrote:
 Without a periodic scrub that touches every single bit of data in the
 pool, how can you be sure
 that 10-year files that haven't been opened in 5 years are still
 intact?

 You don't.  But it seems that having two or three extra copies of the
 data on different disks should instill considerable confidence.  With
 sufficient redundancy, chances are that the computer will explode
 before
 it loses data due to media corruption.  The calculated time before
 data
 loss becomes longer than even the pyramids in Egypt could withstand.

 These calculations are based on fixed MTBF.  But disk MTBF decreases
 with
 age. Most disks are only rated at 3-5 years of expected lifetime.
 Hence,
 archivists
 use solutions with longer lifetimes (high quality tape = 30 years) and
 plans for
 migrating the data to newer media before the expected media lifetime is
 reached.
 In short, if you don't expect to read your 5-year lifetime rated disk
 for
 another 5 years,
 then your solution is uhmm... shall we say... in need of improvement.

 Are they giving tape that long an estimated life these days?  They
 certainly weren't last time I looked.

 Yes.
 http://www.oracle.com/us/products/servers-storage/storage/tape-storage/036556.pdf
 http://www.sunstarco.com/PDF%20Files/Quantum%20LTO3.pdf

Yep, they say 30 years.  That's probably measured in the same years in
which the MAM gold archival DVDs are good for 200, I imagine (i.e. based
on accelerated testing, with the lab knowing what answer the client
wants).  Although we may know more about tape aging, so the accelerated
tests may be more valid for tapes?

But LTO-3 is a 400GB tape that costs, hmmm, maybe $40 each (maybe less
with better shopping, that's a quick Amazon price rounded down).  (I don't
factor in compression in my own analysis because my data is overwhelmingly
image files and MP3 files, which don't compress further very well.)

Plus a $1000 drive, or $2000 for a 3-tape changer (and that's barely big
enough to back up my small server without manual intervention, might not
be by the end of the year).
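For scale, the media-only arithmetic from those numbers works out like this (the 25-tape amortization is a made-up assumption, just to put the drive cost on the same axis):

```shell
# $/GB for LTO-3 using the numbers above: $40 per 400 GB tape,
# plus a $1000 drive amortized over a hypothetical 25 tapes
awk 'BEGIN { media = 40 / 400; drive = 1000 / (25 * 400);
             printf "%.2f + %.2f = %.2f $/GB\n", media, drive, media + drive }'
# prints 0.10 + 0.10 = 0.20 $/GB
```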

Tape is a LOT more expensive than my current hard-drive based backup
scheme, even if I use the backup drives only three years (and since they
spin less than 10% of the time, they should last pretty well).

Also, I lose my snapshots in a tape backup, whereas I keep them on my hard
drive backups.  (Or else I'm storing a ZFS send stream on tape and hoping
it will actually restore.)

 And I basically don't trust tape; too many bad experiences (ever since I
 moved off of DECtape, I've been having bad experiences with tape).  The
 drives are terribly expensive and I can't afford redundancy, and in thirty
 years I very probably could not buy a new drive for my old tapes.

 I started out a big fan of tape, but the economics have been very much
 against it in the range I'm working (small; 1.2 terabytes usable on my
 server currently).

 I don't expect I'll keep my hard disks for 30 years; I expect I'll upgrade
 them periodically, probably even within their MTBF.  (Although note that,
 though tests haven't been run, the MTBF of a 5-year disk after 4 years is
 nearly certainly greater than 1 year.)

 Yes, but MTBF != expected lifetime.  MTBF is defined as Mean Time Between
 Failures (a rate), not Time Until Death (a lifetime).  If your MTBF was 1
 year, then the probability of failing within 1 year would be approximately
 63%, assuming an exponential distribution.

Yeah, sorry, I stumbled into using the same wrong figures lots of people
were.
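The exponential arithmetic behind that 63% figure is easy to check: P(failure within t) = 1 - exp(-t/MTBF), so at t = MTBF the probability is 1 - 1/e, about 0.632.  A one-liner (the values are purely illustrative, not from the thread):

```shell
# P(failure within t) = 1 - exp(-t/MTBF), assuming exponential lifetimes;
# mtbf and t are illustrative values, both 1 year here
awk 'BEGIN { mtbf = 1.0; t = 1.0; printf "%.3f\n", 1 - exp(-t / mtbf) }'
# prints 0.632
```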
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Performance drop during scrub?

2010-04-30 Thread David Dyer-Bennet

On Thu, April 29, 2010 17:35, Bob Friesenhahn wrote:

 In my opinion periodic scrubs are most useful for pools based on
 mirrors, or raidz1, and much less useful for pools based on raidz2 or
 raidz3.  It is useful to run a scrub at least once on a well-populated
 new pool in order to validate the hardware and OS, but otherwise, the
 scrub is most useful for discovering bit-rot in singly-redundant
 pools.

I've got 10 years of photos on my disk now, and it's growing at faster
than one year per year (since I'm scanning backwards slowly through the
negatives).  Many of them don't get accessed very often; they're archival,
not current use.  Scrub was one of the primary reasons I chose ZFS for the
fileserver they live on -- I want some assurance, 20 years from now, that
they're still valid.  I needed something to check them periodically, and
something to check *against*, and block checksums and scrub seemed to fill
the bill.

So, yes, I want to catch bit rot -- on a pool of mirrored VDEVs.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Performance drop during scrub?

2010-04-30 Thread David Dyer-Bennet

On Fri, April 30, 2010 13:44, Freddie Cash wrote:
 On Fri, Apr 30, 2010 at 11:35 AM, Bob Friesenhahn 
 bfrie...@simple.dallas.tx.us wrote:

 On Thu, 29 Apr 2010, Tonmaus wrote:

  Recommending not to use scrub doesn't even qualify as a workaround, in
  my regard.


 As a devoted believer in the power of scrub, I believe that after the OS,
 power supplies, and controller have been verified to function with a good
 scrubbing, if there is more than one level of redundancy, scrubs are not
 really warranted.  With just one level of redundancy it becomes much more
 important to verify that both copies were written to disk correctly.

 Without a periodic scrub that touches every single bit of data in the
 pool, how can you be sure that 10-year files that haven't been opened in
 5 years are still intact?

 Self-healing only comes into play when the file is read.  If you don't
 read a file for years, how can you be sure that all copies of that file
 haven't succumbed to bit-rot?

Yes, that's precisely my point.  That's why it's especially relevant to
archival data -- it's important (to me), but not frequently accessed.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Performance drop during scrub?

2010-04-28 Thread David Dyer-Bennet

On Wed, April 28, 2010 10:16, Eric D. Mudama wrote:
 On Wed, Apr 28 at  1:34, Tonmaus wrote:
  Zfs scrub needs to access all written data on all disks and is usually
  disk-seek or disk I/O bound so it is difficult to keep it from hogging
  the disk resources.  A pool based on mirror devices will behave much
  more nicely while being scrubbed than one based on RAIDz2.

 Experience seconded entirely. I'd like to repeat that I think we
 need more efficient load balancing functions in order to keep
 housekeeping payload manageable. Detrimental side effects of scrub
 should not be a decision point for choosing certain hardware or
 redundancy concepts in my opinion.

 While there may be some possible optimizations, i'm sure everyone
 would love the random performance of mirror vdevs, combined with the
 redundancy of raidz3 and the space of a raidz1.  However, as in all
 systems, there are tradeoffs.

The situations being mentioned are much worse than what seem reasonable
tradeoffs to me.  Maybe that's because my intuition is misleading me about
what's available.  But if the normal workload of a system uses 25% of its
sustained IOPS, and a scrub is run at low priority, I'd like to think
that during a scrub I'd see a little degradation in performance, and that
the scrub would take 25% or so longer than it would on an idle system. 
There's presumably some inefficiency, so the two loads don't just add
perfectly; so maybe another 5% lost to that?  That's the big uncertainty. 
I have a hard time believing in 20% lost to that.

Do you think that's a reasonable outcome to hope for?  Do you think ZFS is
close to meeting it?

People with systems that live at 75% all day are obviously going to have
more problems than people who live at 25%!

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-27 Thread David Dyer-Bennet

On Mon, April 26, 2010 17:21, Edward Ned Harvey wrote:

 Also, if you've got all those disks in an array, and their MTBF is ...
 let's say 25,000 hours ... then 3 yrs later when they begin to fail, they
 have a tendency to all fail around the same time, which increases the
 probability of exceeding your designed level of redundancy.

It's useful to consider this when doing mid-life upgrades.  Unfortunately
there's not too much useful to be done right now with RAID setups.

With mirrors, when adding some disks mid-life (seems like a common though
by no means universal scenario to not fully populate the chassis at first,
and add more 1/3 to 1/2 way through the projected life), with some extra
trouble one can attach a new disk as a n+1st disk in an existing mirror,
wait for the resilver, and detach an old disk.  That mirror is now one new
disk and one old disk, rather than two disks of the same age.  Then build
a new mirror out of the freed disk plus another new disk.  Now you've got
both mirrors consisting of disks of different ages, less prone to failing
at the same time.  (Of course this doesn't work when you're using bigger
drives for the mid-life kicker, and most of the time it would make sense
to do so.)
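A sketch of that rotation with zpool commands (device names are hypothetical, and this needs a live pool, so treat it as an outline rather than something to paste):

```shell
# Age-stagger an existing mirror (c1t0d0 + c1t1d0) using two new disks
# (all device names hypothetical)
zpool attach tank c1t0d0 c2t0d0      # temporarily a 3-way mirror: old, old, new
# ...wait until 'zpool status tank' shows the resilver is complete...
zpool detach tank c1t1d0             # mirror is now one old disk + one new disk
zpool add tank mirror c1t1d0 c2t1d0  # new mirror: freed old disk + second new disk
```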

Even buying different (mixed) brands initially doesn't help against aging;
only against batch or design problems.

Hey, you know what might be helpful?  Being able to add redundancy to a
raid vdev.  Being able to go from RAIDZ2 to RAIDZ3 by adding another drive
of suitable size.  Also being able to go the other way.  This lets you do
the trick of temporarily adding redundancy to a vdev while swapping out
devices one at a time to eventually upgrade the size (since you're
deliberately creating a fault situation, increasing redundancy before you
do it makes loads of sense!).

 I recently bought 2x 1Tb disks for my sun server, for $650 each.  This was
 enough to make me do the analysis, why am I buying sun branded overpriced
 disks?  Here is the abridged version:

No argument that, in the existing market, with various levels of need,
this is often the right choice.

I find it deeply frustrating and annoying that this dilemma exists
entirely due to bad behavior by the disk companies, though.  First they
sell deliberately-defective drives (lie about cache flush, for example)
and then they (in conspiracy with an accomplice company) charge us many
times the cost of the physical hardware for fixed versions.  This MUST be
stopped.  This is EXACTLY what standards exist for -- so we can buy
known-quantity products in a competitive market.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-27 Thread David Dyer-Bennet

On Tue, April 27, 2010 10:38, Bob Friesenhahn wrote:
 On Tue, 27 Apr 2010, David Dyer-Bennet wrote:

 Hey, you know what might be helpful?  Being able to add redundancy to a
 raid vdev.  Being able to go from RAIDZ2 to RAIDZ3 by adding another drive
 of suitable size.  Also being able to go the other way.  This lets you do
 the trick of temporarily adding redundancy to a vdev while swapping out
 devices one at a time to eventually upgrade the size (since you're
 deliberately creating a fault situation, increasing redundancy before you
 do it makes loads of sense!).

 You can already replace one drive with another (zpool replace) so as
 long as there is space for the new drive, it is not necessary to
 degrade the array and lose redundancy while replacing a device.  As
 long as you can physically add a drive to the system (even
 temporarily) it is not necessary to deliberately create a fault
 situation.

I don't think I understand your scenario here.  The docs online at
http://docs.sun.com/app/docs/doc/819-5461/gazgd?a=view describe uses of
zpool replace that DO run the array degraded for a while, and don't seem
to mention any other.

Could you be more detailed?
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-27 Thread David Dyer-Bennet

On Tue, April 27, 2010 11:17, Bob Friesenhahn wrote:
 On Tue, 27 Apr 2010, David Dyer-Bennet wrote:

 I don't think I understand your scenario here.  The docs online at
 http://docs.sun.com/app/docs/doc/819-5461/gazgd?a=view describe uses of
 zpool replace that DO run the array degraded for a while, and don't seem
 to mention any other.

 Could you be more detailed?

 If a disk has failed, then it makes sense to physically remove the old
 disk, insert a new one, and do 'zpool replace tank c1t1d0'.  However
 if the disk has not failed, then you can install a new disk in another
 location and use the two argument form of replace like 'zpool replace
 tank c1t1d0 c1t1d7'.  If I understand things correctly, this allows
 you to replace one good disk with another without risking the data in
 your pool.

I don't see any reason to think the old device remains in use until the
new device is resilvered, and if it doesn't, then you're down one level of
redundancy the instant the old device goes out of service.

I don't have a RAIDZ group, but trying this while there's significant load
on the group, it should be easy to see if there's traffic on the old drive
after the resilver starts.  If there is, that would seem to be evidence
that it's continuing to use the old drive while resilvering to the new
one, which would be good.
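For reference, the two forms under discussion side by side (device names hypothetical):

```shell
# One-argument form: the old disk has already failed or been pulled,
# so the vdev runs degraded while the replacement resilvers
zpool replace tank c1t1d0

# Two-argument form: the new disk is installed alongside the old one;
# the question is whether c1t1d0 stays in service until the resilver
# onto c1t1d7 completes
zpool replace tank c1t1d0 c1t1d7
```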

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Which build is the most stable, mainly for NAS (zfs)?

2010-04-14 Thread David Dyer-Bennet

On Wed, April 14, 2010 08:52, Tonmaus wrote:
 safe to say: 2009.06 (b111) is unusable for the purpose, and CIFS is dead
 in this build.

That's strange; I run it every day (my home Windows My Documents folder
and all my photos are on 2009.06).


-bash-3.2$ cat /etc/release
 OpenSolaris 2009.06 snv_111b X86
   Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
  Assembled 07 May 2009


 I am using B133, but I am not sure if this is best choice. I'd like to
 hear from others as well.

Well, it's technically not a stable build.

I'm holding off to see what 2010.$Spring ends up being; I'll convert to
that unless it turns into a disaster.

Is it possible to switch to b132 now, for example?  I don't think the old
builds are available after the next one comes out; I haven't been able to
find them.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Suggestions about current ZFS setup

2010-04-14 Thread David Dyer-Bennet

On Tue, April 13, 2010 09:48, Christian Molson wrote:


 Now I would like to add my 4 x 2TB drives, I get a warning message saying
 that: Pool uses 5-way raidz and new vdev uses 4-way raidz  Do you think
 it would be safe to use the -f switch here?

Yes.  4-way on the bigger drive is *more* redundancy (25%, rather than
20%) (though not necessarily safer, since the bigger drive increases
recovery time) than 5-way on the smaller drive.  I'd describe these as
vaguely the same level of redundancy, and hence not especially
inappropriate to put in the same pool.  Putting a single disk into a pool
that's otherwise RAIDZ would be a bad idea, obviously, and I believe that's
what that message is particularly intended to warn against.

However, I have some doubts about using 2TB drives with single redundancy
in general.  It takes a LONG time to resilver a drive that big, and during
the resilver you have no redundancy and are hence subject to data loss if
one of the remaining drives also fails.  And resilvering puts extra stress
on the IO system and drives, so probably the risk of failure is increased.
(If your backups are good enough, you may plan to cover the possibility of
that second failure by restoring from backups.  That works, if they're
really good enough; it just takes more work and time.)

24 hot-swap bays in your home chassis?  Now that does sound pretty
extreme.  I felt like my 8-bay chassis was a bit excessive for home, and
it only has 6 bays populated with data disks, and they're just 400GB.  And
since I store a lot of RAW files from DSLRs on it, I feel like I use quite
a bit of space (until I see somebody come along casually talking about
roughly 10 times more space).  How DO you deal with backup at that data
size?  I can back up to a single external USB disk (I have 3 I rotate),
and a full backup completes overnight.
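The kind of full-backup pipeline that makes the external-disk scheme workable looks roughly like this (pool and snapshot names hypothetical; requires a live system, so this is a sketch):

```shell
# Replicate a whole pool to an external backup pool (names hypothetical)
zfs snapshot -r tank@backup-2010-04-14      # recursive snapshot of everything
zpool import backup                         # the external USB pool
zfs send -R tank@backup-2010-04-14 | zfs receive -Fd backup
zpool export backup                         # flushed and safe to unplug
```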

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Suggestions about current ZFS setup

2010-04-14 Thread David Dyer-Bennet

On Tue, April 13, 2010 10:38, Bob Friesenhahn wrote:
 On Tue, 13 Apr 2010, Christian Molson wrote:

 Now I would like to add my 4 x 2TB drives, I get a warning message
 saying that: Pool uses 5-way raidz and new vdev uses 4-way raidz
 Do you think it would be safe to use the -f switch here?

 It should be safe but chances are that your new 2TB disks are
 considerably slower than the 1TB disks you already have.  This should
 be as much cause for concern (or more so) than the difference in raidz
 topology.

Not necessarily for a home server.  While mine so far is all mirrored
pairs of 400GB disks, I don't even think about performance issues, I
never come anywhere near the limits of the hardware.

Your suggestion (snipped) that he test performance on the new drives to
see how they differ is certainly good if he needs to worry about
performance.  Testing actual performance in your own exact hardware is
always smart.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Suggestions about current ZFS setup

2010-04-14 Thread David Dyer-Bennet

On Wed, April 14, 2010 12:06, Bob Friesenhahn wrote:
 On Wed, 14 Apr 2010, David Dyer-Bennet wrote:
 It should be safe but chances are that your new 2TB disks are
 considerably slower than the 1TB disks you already have.  This should
 be as much cause for concern (or more so) than the difference in raidz
 topology.

 Not necessarily for a home server.  While mine so far is all mirrored
 pairs of 400GB disks, I don't even think about performance issues, I
 never come anywhere near the limits of the hardware.

 I don't see how the location of the server has any bearing on required
 performance.  If these 2TB drives are the new 4K sector variety, even
 you might notice.

The location does not, directly, of course; but the amount and type of
work being supported does, and most home servers see request streams very
different from commercial servers.

The last server software I worked on was able to support 80,000
simultaneous HD video streams.  Coming off Thumpers, in fact (well, coming
out of a truly obscene amount of DRAM buffer on the streaming board, which
was in turn loaded from Thumpers); this was the thing that Thumper was
originally designed for, known when I worked there as the Sun Streaming
System I believe.  You don't see loads like that on home servers :-).  And
a big database server would have an equally extreme but totally different
access pattern.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Which build is the most stable, mainly for NAS (zfs)?

2010-04-14 Thread David Dyer-Bennet

On Wed, April 14, 2010 11:51, Tonmaus wrote:

 On Wed, April 14, 2010 08:52, Tonmaus wrote:
  safe to say: 2009.06 (b111) is unusable for the purpose, and CIFS is
  dead in this build.

 That's strange; I run it every day (my home Windows My Documents folder
 and all my photos are on 2009.06).


 -bash-3.2$ cat /etc/release
        OpenSolaris 2009.06 snv_111b X86
        Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
        Use is subject to license terms.
        Assembled 07 May 2009


 I would be really interested how you got past this
 http://defect.opensolaris.org/bz/show_bug.cgi?id=11371
 which I was so badly bitten by that I considered giving up on OpenSolaris.


I don't get random hangs in normal use; so I haven't done anything to get
past this.

I DO get hangs when funny stuff goes on, which may well be related to that
problem (at least they require a reboot).  Hmmm; I get hangs sometimes
when trying to send a full replication stream to an external backup drive,
and I have to reboot to recover from them.  I can live with this, in the
short term.  But now I'm feeling hopeful that they're fixed in what I'm
likely to be upgrading to next.

  not sure if this is best choice. I'd like to hear from others as well.
 Well, it's technically not a stable build.

 I'm holding off to see what 2010.$Spring ends up being; I'll convert to
 that unless it turns into a disaster.

 Is it possible to switch to b132 now, for example?  I don't think the old
 builds are available after the next one comes out; I haven't been able to
 find them.

 There are methods to upgrade to any dev build by pkg. Can't tell you from
 the top of my head, but I have done it with success.

 I wouldn't know why to go to 132 instead of 133, though. 129 seems to be
 an option.

Because 132 was the most current last time I paid much attention :-).  As
I say, I'm currently holding out for 2010.$Spring, but knowing how to get
to a particular build via package would be potentially interesting for the
future still.  Having been told it's possible helps, makes it worth
looking harder.
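For the record, the recipe usually passed around went something like this (the repository URL and the exact `entire` version string are assumptions here; both changed over the life of the dev repository, so check before relying on them):

```shell
# Point the image at the dev repository, then pin a specific build
# via the 'entire' incorporation (version string hypothetical)
pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org
pkg install entire@0.5.11-0.132
```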

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Suggestions about current ZFS setup

2010-04-14 Thread David Dyer-Bennet

On Wed, April 14, 2010 12:29, Bob Friesenhahn wrote:
 On Wed, 14 Apr 2010, David Dyer-Bennet wrote:

 Not necessarily for a home server.  While mine so far is all mirrored
 pairs of 400GB disks, I don't even think about performance issues, I
 never come anywhere near the limits of the hardware.

 I don't see how the location of the server has any bearing on required
 performance.  If these 2TB drives are the new 4K sector variety, even
 you might notice.

 The location does not, directly, of course; but the amount and type of
 work being supported does, and most home servers see request streams
 very
 different from commercial servers.

 If it was not clear, the performance concern is primarily for writes
 since zfs will load-share the writes across the available vdevs using
 an algorithm which also considers the write queue/backlog for each
 vdev.  If a vdev is slow, then it may be filled more slowly than the
 other vdevs.  This is also the reason why zfs encourages that all
 vdevs use the same organization.

As I said, I don't think of performance issues on mine.  So I wasn't
thinking of that particular detail, and it's good to call it out
explicitly.  If the performance of the new drives isn't adequate, then the
performance of the entire pool will become inadequate, it looks like.

I expect it's routine to have disks of different generations in the same
pool at this point (and if it isn't now, it will be in 5 years), just due
to what's available, replacing bad drives, and so forth.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Which build is the most stable, mainly for NAS (zfs)?

2010-04-14 Thread David Dyer-Bennet

On Wed, April 14, 2010 15:28, Miles Nordin wrote:
 dd == David Dyer-Bennet d...@dd-b.net writes:

 dd Is it possible to switch to b132 now, for example?

 yeah, this is not so bad.  I know of two approaches:

Thanks, I've filed and flagged this for reference.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Which build is the most stable, mainly for NAS (zfs)?

2010-04-14 Thread David Dyer-Bennet

On 14-Apr-10 22:44, Ian Collins wrote:

On 04/15/10 06:16 AM, David Dyer-Bennet wrote:

Because 132 was the most current last time I paid much attention :-). As
I say, I'm currently holding out for 2010.$Spring, but knowing how to get
to a particular build via package would be potentially interesting for
the
future still.


I hope it's 2010.$Autumn, I don't fancy waiting until October.

Hint: the southern hemisphere does exist!


I've even been there.

But the month/season relationship is too deeply built into too many 
things I follow (like the Christmas books come out of the publisher's 
fall list; for that matter, like that Christmas is in the winter) to go 
away at all easily.


California doesn't have seasons anyway.

--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info


Re: [zfs-discuss] zfs send hangs

2010-04-09 Thread David Dyer-Bennet

On Fri, April 9, 2010 13:20, Daniel Bakken wrote:
 My zfs filesystem hangs when transferring large filesystems (500GB)
 with a couple dozen snapshots between servers using zfs send/receive
 with netcat. The transfer hangs about halfway through and is
 unkillable, freezing all IO to the filesystem, requiring a hard
 reboot. I have attempted this three times and failed every time.

 On the destination server I use:
 nc -l -p 8023 | zfs receive -vd sas

 On the source server I use:
 zfs send -vR promise1/rbac...@daily.1 | nc mothra 8023

I have problems using incremental replication streams that sound similar
(hangs, IO system disruption).  I'm on build 111b, that is, 2009.06.  I'm
hoping things will clear up when 2010.$Spring comes out, which should be
soon.  Your data point is not helping my confidence there, though!

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] about backup and mirrored pools

2010-04-09 Thread David Dyer-Bennet

On Fri, April 9, 2010 14:38, Harry Putnam wrote:

 I happened to notice someones' config posted here recently where a
 single zpool was made up of several mirror sets.

From: Andreas Höschler ahoe...@smartsoft.de
Subject: Replacing disk in zfs pool
Newsgroups: gmane.os.solaris.opensolaris.zfs
To: zfs-discuss@opensolaris.org
Date: Fri, 9 Apr 2010 10:58:16 +0200
Message-ID: 099e714d-43b6-11df-83fb-000393ca0...@smartsoft.de

 I hadn't even thought of such a setup, but wonder now if that would
 have been a better way to go.

Probably; unless you need different performance out of the two, or something.

 My needs are small, and the zfs server acts mostly as NAS for home
 lan.

That's the job mine does; keeping all those photos, and a little music.

 I've been thinking the mirrors on the zfs server were the final
 stopping place for my backups.  I'm thinking the mirrors are reliable
 enough that I don't do even more backups of the backup zpools.

 I mean other than auto snapshots.

 I'm thinking a crippled mirror can be recovered rather than needing a
 backup of it. And that short of 2 mirrored disks dieing at the same
 time.  I'm in pretty good shape.

 Am I way wrong on this, and further I'm curious if it would make more
 versatile use of the space if I were to put the mirrored pairs into
 one big pool containing 3 mirrored pairs (6 discs)

Well, my own thinking doesn't consider that adequate for my own data;
which is not identical to thinking you're actually wrong, of course.

Issues I see include:  Flood, fire, foes, bugs, user error.  rm -rf /
will destroy your data just as well on the mirror as on a single disk, as
will hacker breakins.  OS and driver bugs can corrupt both sides of the
mirror.  And burning your house down, or flooding it perhaps (depending on
where your server is; mine's in the basement, so if we flood, it gets
wet), will destroy your data.

I make and keep off-site backups, formerly on optical media, moving
towards external disk drives.

 So where they had been separate pools, where one might fill up while
 another stayed fairly empty, if they were all in a single pool none
 would fill up until they all filled up.

Yes, that's the advantage.  I'm running three mirror vdevs in one data pool.
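Concretely, the one-big-pool layout being discussed (device names hypothetical):

```shell
# One pool built from three mirrored pairs; space fills across all
# pairs instead of one pool filling while another sits empty
zpool create tank mirror c1t0d0 c1t1d0 \
                  mirror c1t2d0 c1t3d0 \
                  mirror c1t4d0 c1t5d0
# Later pairs can be added without rebuilding the pool:
zpool add tank mirror c1t6d0 c1t7d0
```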

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] zfs send and ARC

2010-03-26 Thread David Dyer-Bennet

On Fri, March 26, 2010 07:06, Edward Ned Harvey wrote:
 In the Thoughts on ZFS Pool Backup Strategies thread it was stated
 that zfs send sends uncompressed data and uses the ARC.

 If zfs send sends uncompressed data which has already been compressed,
 this is not very efficient, and it would be *nice* to see it send the
 original compressed data (or an option to do so).

 You've got 2 questions in your post.  The one above first ...

 It's true that zfs send sends uncompressed data.  So I've heard.  I
 haven't tested it personally.

 I seem to remember there's some work to improve this, but not available
 yet.  Because it was easier to implement the uncompressed send, and that
 already is super-fast compared to all the alternatives.

I don't know that it makes sense to.  There are lots of existing filter
packages that do compression; so if you want compression, just put them in
your pipeline.  That way you're not limited by what zfs send has
implemented, either.  When they implement bzip98 with a new compression
technology breakthrough, you can just use it :-) .
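For instance, external compression drops straight into the send pipeline (host and dataset names hypothetical, following the netcat examples elsewhere in the archive):

```shell
# Compress the stream with an ordinary filter instead of waiting for
# zfs send to grow the feature (names hypothetical); any stream
# compressor can be substituted for gzip
zfs send tank/fs@snap | gzip -c | ssh mothra 'gunzip -c | zfs receive -d sas'
```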

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info


