Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-16 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Bill Sommerfeld
 
  But simply creating the snapshot on the sending side should be no
 problem.
 
 By default, zvols have reservations equal to their size (so that writes
 don't fail due to the pool being out of space).
 
 Creating a snapshot in the presence of a reservation requires reserving
 enough space to overwrite every block on the device.

This is surprising, because it's not like normal ZFS behavior.  Normal 
ZFS does not reserve snapshot space to guarantee you can always completely 
overwrite every single used block of every single file in the system.  It just 
starts consuming space for changed blocks, and if you fill up the zpool, 
further writes are denied until you delete some snapshots.

But you're saying zvols are handled differently - that when you create a zvol, 
ZFS reserves enough space for it, and when you snapshot it, it reserves enough 
space to completely overwrite it, keeping both the snapshot and the current 
live version without running out of storage space.  I had never heard that 
before - and I can see some good reasons to do it this way - but it's surprising.

Based on what I'm hearing now, it also seems that upon zvol creation you get a 
reservation; upon the first snapshot, the reserved space effectively doubles; 
but subsequent snapshots don't need to increase the reservation each time.  
Because snapshots are read-only, the system can account for all the space 
already used by the snapshots, plus a reservation for the live current version.  
Total space set aside will be 2x the size of the zvol, plus the actual 
copy-on-write space consumed by all the snapshots.
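
If you want to watch that accounting yourself, here's a minimal sketch
(hypothetical pool/zvol names; exact numbers will vary by pool layout and ZFS
version):

zfs create -V 10G tank/testvol
zfs get volsize,reservation,refreservation tank/testvol
zfs snapshot tank/testvol@snap1
# as blocks are overwritten, the snapshot's 'used' grows while the live copy
# stays fully reserved:
zfs get used,usedbysnapshots,usedbyrefreservation tank/testvol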

The point is to guarantee that writes to a zvol will never be denied, 
presumably on the assumption that zvols are used by things like VMs and iSCSI 
shares, which behave very poorly when writes are denied.  With normal files, a 
denied write is generally an annoyance but doesn't cause deeper harm, such as 
virtual servers crashing.

There's another lesson to be learned here.

As mentioned by Matthew, you can tweak your reservation (or refreservation) on 
the zvol, but you do so at your own risk, possibly putting yourself into a 
situation where writes to the zvol might get denied.

But the important implication is the converse - if you have guest VMs in a 
filesystem (for example, if you're sharing NFS to ESX, or if you're running 
VirtualBox), then you might want to set the reservation (or refreservation) for 
those filesystems, modeled on the zvol behavior.  In other words, you might 
want to guarantee that ESX or VirtualBox can always write.  It's probably a 
smart thing to do in a lot of situations.
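
For example, a minimal sketch with hypothetical dataset names and sizes (pick a
number at least as large as the guests' disk images):

zfs set refreservation=500G tank/nfs-for-esx
# or, to also hold space for snapshots of that data:
zfs set reservation=750G tank/nfs-for-esx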



Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-16 Thread Fajar A. Nugraha
On Sun, Sep 16, 2012 at 7:43 PM, Edward Ned Harvey
(opensolarisisdeadlongliveopensolaris)
opensolarisisdeadlongliveopensola...@nedharvey.com wrote:
 There's another lesson to be learned here.

 As mentioned by Matthew, you can tweak your reservation (or refreservation) 
 on the zvol, but you do so at your own risk, possibly putting yourself into a 
 situation where writes to the zvol might get denied.

 But the important implied meaning is the converse - If you have guest VM's in 
 the filesystem (for example, if you're sharing NFS to ESX, or if you're 
 running VirtualBox) then you might want to set the reservation (or 
 refreservation) for those filesystems modeled after the zvol  behavior.  In 
 other words, you might want to guarantee that ESX or VirtualBox can always 
 write.  It's probably a smart thing to do, in a lot of situations.

I'd say just do what you normally do.

In my case, I use sparse files or dynamic disk images anyway, so when
I use zvols I use zfs create -s.  That single switch sets reservation
and refreservation to none.
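
For example (a minimal sketch, with a hypothetical pool name and size):

zfs create -s -V 100G tank/thinvol
zfs get volsize,reservation,refreservation tank/thinvol
# with -s nothing is reserved, so writes to the zvol can fail if the pool fills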

-- 
Fajar


Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-16 Thread Richard Elling
On Sep 15, 2012, at 6:03 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us 
wrote:

 On Sat, 15 Sep 2012, Dave Pooser wrote:
 
  The problem: so far the send/recv appears to have copied 6.25TB of 
 5.34TB.
 That... doesn't look right. (Comparing zfs list -t snapshot and looking at
 the 5.34 ref for the snapshot vs zfs list on the new system and looking at
 space used.)
 Is this a problem? Should I be panicking yet?
 
 Does the old pool use 512 byte sectors while the new pool uses 4K sectors?  
 Is there any change to compression settings?
 
 With volblocksize of 8k on disks with 4K sectors one might expect very poor 
 space utilization because metadata chunks will use/waste a minimum of 4k.  
 There might be more space consumed by the metadata than the actual data.

With a zvol of 8K blocksize, 4K sector disks, and raidz you will get 12K (data
plus parity) written for every block, regardless of how many disks are in the 
set.
There will also be some metadata overhead, but I don't know of a metadata
sizing formula for the general case.

So the bad news is, 4K sector disks with small blocksize zvols tend to
have space utilization more like mirroring. The good news is that performance
is also more like mirroring.
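
For what it's worth, a back-of-envelope version of that arithmetic, assuming
ashift=12 (4K allocation units) and ignoring metadata and raidz padding:

  8K volblock on raidz1:  2 x 4K data + 1 x 4K parity = 12K on disk  (1.5x)
  8K volblock on raidz2:  2 x 4K data + 2 x 4K parity = 16K on disk  (2.0x)

The raidz2 line is roughly consistent with the 5.08T volume in this thread
showing up as ~9.5T used.
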
 -- richard

--
illumos Day & ZFS Day, Oct 1-2, 2012, San Francisco
www.zfsday.com
richard.ell...@richardelling.com
+1-760-896-4422


Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-16 Thread Dave Pooser
On 9/16/12 10:40 AM, Richard Elling richard.ell...@gmail.com wrote:

With a zvol of 8K blocksize, 4K sector disks, and raidz you will get 12K
(data
plus parity) written for every block, regardless of how many disks are in
the set.
There will also be some metadata overhead, but I don't know of a metadata
sizing formula for the general case.

So the bad news is, 4K sector disks with small blocksize zvols tend to
have space utilization more like mirroring. The good news is that
performance
is also more like mirroring.
 -- richard

OK, that makes sense.  And since there's no way to change the blocksize of
a zvol after creation (AFAIK), I can either live with the size, find 3TB
drives with 512-byte sectors (I think Seagate Constellations would work)
and do yet another send/receive, or create a new zvol with a larger
blocksize and copy the files from one zvol to the other.  (Leaning toward
option 3 because the files are mostly largish graphics files and the like.)
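
For option 3, something along these lines should work (a rough sketch; the new
volume name, size, and 128K blocksize are only examples - volblocksize can only
be set at creation time):

zfs create -V 5.5T -o volblocksize=128K archive1/RichRAID2
# then export the new zvol via COMSTAR and let the Mac copy from one to the other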

Thanks for the help!
-- 
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com




Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-15 Thread Bill Sommerfeld
On 09/14/12 22:39, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Dave Pooser

Unfortunately I did not realize that zvols require disk space sufficient
to duplicate the zvol, and my zpool wasn't big enough. After a false start
(zpool add is dangerous when low on sleep) I added a 250GB mirror and a
pair of 3GB mirrors to miniraid and was able to successfully snapshot the
zvol: miniraid/RichRAID@exportable


This doesn't make any sense to me.  The snapshot should not take up any 
(significant) space on the sending side.  It's only on the receiving side, 
trying to receive a snapshot, that you require space.  Because it won't clobber 
the existing zvol on the receiving side until the complete new zvol was 
received to clobber it with.

But simply creating the snapshot on the sending side should be no problem.


By default, zvols have reservations equal to their size (so that writes 
don't fail due to the pool being out of space).


Creating a snapshot in the presence of a reservation requires reserving 
enough space to overwrite every block on the device.


You can remove or shrink the reservation if you know that the entire 
device won't be overwritten.
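
For instance, a minimal sketch using the dataset name from earlier in the
thread (depending on the ZFS version, the property set by default on a zvol may
be the refreservation rather than the reservation):

zfs get reservation,refreservation miniraid/RichRAID
# at your own risk, if the whole device won't be rewritten while the snapshot exists:
zfs set refreservation=none miniraid/RichRAID
zfs snapshot miniraid/RichRAID@exportable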




Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-15 Thread Dave Pooser
 The problem: so far the send/recv appears to have copied 6.25TB of 5.34TB.
 That... doesn't look right. (Comparing zfs list -t snapshot and looking at
 the 5.34 ref for the snapshot vs zfs list on the new system and looking at
 space used.)
 
 Is this a problem? Should I be panicking yet?

Well, the zfs send/receive finally finished, at a size of 9.56TB (apologies
for the HTML, it was the only way I could make the columns readable):

root@archive:/home/admin# zfs get all archive1/RichRAID
NAME               PROPERTY              VALUE                  SOURCE
archive1/RichRAID  type                  volume                 -
archive1/RichRAID  creation              Fri Sep 14  4:17 2012  -
archive1/RichRAID  used                  9.56T                  -
archive1/RichRAID  available             1.10T                  -
archive1/RichRAID  referenced            9.56T                  -
archive1/RichRAID  compressratio         1.00x                  -
archive1/RichRAID  reservation           none                   default
archive1/RichRAID  volsize               5.08T                  local
archive1/RichRAID  volblocksize          8K                     -
archive1/RichRAID  checksum              on                     default
archive1/RichRAID  compression           off                    default
archive1/RichRAID  readonly              off                    default
archive1/RichRAID  copies                1                      default
archive1/RichRAID  refreservation        none                   default
archive1/RichRAID  primarycache          all                    default
archive1/RichRAID  secondarycache        all                    default
archive1/RichRAID  usedbysnapshots       0                      -
archive1/RichRAID  usedbydataset         9.56T                  -
archive1/RichRAID  usedbychildren        0                      -
archive1/RichRAID  usedbyrefreservation  0                      -
archive1/RichRAID  logbias               latency                default
archive1/RichRAID  dedup                 off                    default
archive1/RichRAID  mlslabel              none                   default
archive1/RichRAID  sync                  standard               default
archive1/RichRAID  refcompressratio      1.00x                  -
archive1/RichRAID  written               9.56T                  -

So used is 9.56TB, volsize is 5.08TB (which is the amount of data used on
the volume). The Mac connected to the FC target sees a 5.6TB volume with
5.1TB used, so that makes sense-- but where did the other 4TB go?

(I'm about at the point where I'm just going to create and export another
volume on a second zpool and then let the Mac copy from one zvol to the
other-- this is starting to feel like voodoo here.)
-- 
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com


Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-15 Thread Bob Friesenhahn

On Sat, 15 Sep 2012, Dave Pooser wrote:


  The problem: so far the send/recv appears to have copied 6.25TB of 5.34TB.
That... doesn't look right. (Comparing zfs list -t snapshot and looking at
the 5.34 ref for the snapshot vs zfs list on the new system and looking at
space used.)

Is this a problem? Should I be panicking yet?


Does the old pool use 512 byte sectors while the new pool uses 4K 
sectors?  Is there any change to compression settings?


With volblocksize of 8k on disks with 4K sectors one might expect very 
poor space utilization because metadata chunks will use/waste a 
minimum of 4k.  There might be more space consumed by the metadata 
than the actual data.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-15 Thread Matthew Ahrens
On Fri, Sep 14, 2012 at 11:07 PM, Bill Sommerfeld sommerf...@hamachi.org wrote:

 On 09/14/12 22:39, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 wrote:

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org]
 On Behalf Of Dave Pooser

 Unfortunately I did not realize that zvols require disk space sufficient
 to duplicate the zvol, and my zpool wasn't big enough. After a false
 start
 (zpool add is dangerous when low on sleep) I added a 250GB mirror and a
 pair of 3GB mirrors to miniraid and was able to successfully snapshot the
 zvol: miniraid/RichRAID@exportable


 This doesn't make any sense to me.  The snapshot should not take up any
 (significant) space on the sending side.  It's only on the receiving side,
 trying to receive a snapshot, that you require space.  Because it won't
 clobber the existing zvol on the receiving side until the complete new zvol
 was received to clobber it with.

 But simply creating the snapshot on the sending side should be no problem.


 By default, zvols have reservations equal to their size (so that writes
 don't fail due to the pool being out of space).

 Creating a snapshot in the presence of a reservation requires reserving
 enough space to overwrite every block on the device.

 You can remove or shrink the reservation if you know that the entire
 device won't be overwritten.


This is the right idea, but it's actually the refreservation (reservation
on referenced space) that has this behavior, and is set by default on
zvols.  The reservation (on used space) covers the space consumed by
snapshots, so taking a snapshot doesn't affect it (at first, but the
reservation will be consumed as you overwrite space and the snapshot
grows).
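
An easy way to see which property is doing the work (hypothetical names):

zfs create -V 10G tank/vol
zfs get reservation,refreservation tank/vol
# expect refreservation=10G and reservation=none; taking a snapshot then needs
# enough free pool space to keep honoring that refreservation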

--matt


Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-15 Thread Matthew Ahrens
On Sat, Sep 15, 2012 at 2:07 PM, Dave Pooser dave@alfordmedia.com wrote:

 The problem: so far the send/recv appears to have copied 6.25TB of 5.34TB.
 That... doesn't look right. (Comparing zfs list -t snapshot and looking at
 the 5.34 ref for the snapshot vs zfs list on the new system and looking at
 space used.)

 Is this a problem? Should I be panicking yet?


 Well, the zfs send/receive finally finished, at a size of 9.56TB
 (apologies for the HTML, it was the only way I could make the columns
 readable):

 root@archive:/home/admin# zfs get all archive1/RichRAID
 NAME               PROPERTY              VALUE                  SOURCE
 archive1/RichRAID  type                  volume                 -
 archive1/RichRAID  creation              Fri Sep 14  4:17 2012  -
 archive1/RichRAID  used                  9.56T                  -
 archive1/RichRAID  available             1.10T                  -
 archive1/RichRAID  referenced            9.56T                  -
 archive1/RichRAID  compressratio         1.00x                  -
 archive1/RichRAID  reservation           none                   default
 archive1/RichRAID  volsize               5.08T                  local
 archive1/RichRAID  volblocksize          8K                     -
 archive1/RichRAID  checksum              on                     default
 archive1/RichRAID  compression           off                    default
 archive1/RichRAID  readonly              off                    default
 archive1/RichRAID  copies                1                      default
 archive1/RichRAID  refreservation        none                   default
 archive1/RichRAID  primarycache          all                    default
 archive1/RichRAID  secondarycache        all                    default
 archive1/RichRAID  usedbysnapshots       0                      -
 archive1/RichRAID  usedbydataset         9.56T                  -
 archive1/RichRAID  usedbychildren        0                      -
 archive1/RichRAID  usedbyrefreservation  0                      -
 archive1/RichRAID  logbias               latency                default
 archive1/RichRAID  dedup                 off                    default
 archive1/RichRAID  mlslabel              none                   default
 archive1/RichRAID  sync                  standard               default
 archive1/RichRAID  refcompressratio      1.00x                  -
 archive1/RichRAID  written               9.56T                  -

 So used is 9.56TB, volsize is 5.08TB (which is the amount of data used on
 the volume). The Mac connected to the FC target sees a 5.6TB volume with
 5.1TB used, so that makes sense-- but where did the other 4TB go?


I'm not sure.  The output of zdb -bbb archive1 might help diagnose it.

--matt


Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-14 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Dave Pooser
 
 Unfortunately I did not realize that zvols require disk space sufficient
 to duplicate the zvol, and my zpool wasn't big enough. After a false start
 (zpool add is dangerous when low on sleep) I added a 250GB mirror and a
 pair of 3GB mirrors to miniraid and was able to successfully snapshot the
 zvol: miniraid/RichRAID@exportable 

This doesn't make any sense to me.  The snapshot should not take up any 
(significant) space on the sending side.  It's only on the receiving side, when 
receiving a snapshot, that you need the space, because it won't clobber the 
existing zvol on the receiving side until the complete new zvol has been 
received to clobber it with.

But simply creating the snapshot on the sending side should be no problem.


 The problem: so far the send/recv appears to have copied 6.25TB of 5.34TB.
 That... doesn't look right. 

I don't know why that happens, but sometimes it happens.  So far, I've always 
waited it out, and so far it's always succeeded for me.



Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-14 Thread Ian Collins

On 09/15/12 04:46 PM, Dave Pooser wrote:

I need a bit of a sanity check here.

1) I have a RAIDZ2 of eight 1TB drives, so 6TB usable, running on an ancient
version of OpenSolaris (snv_134 I think). On that zpool (miniraid) I have
a zvol (RichRAID) that's using almost the whole FS. It's shared out via
COMSTAR Fibre Channel target mode. I'd like to move that zvol to a newer
server with a larger zpool. Sounds like a job for ZFS send/receive, right?

2) Since ZFS send/receive is snapshot-based I need to create a snapshot.
Unfortunately I did not realize that zvols require disk space sufficient
to duplicate the zvol, and my zpool wasn't big enough.


To do what?

A snapshot only starts to consume space when data in the 
filesystem/volume changes.
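
For instance, a quick way to see that (hypothetical pool/zvol names):

zfs snapshot tank/somevol@now
zfs list -t snapshot -o name,used,referenced -r tank
# 'used' is ~0 right after creation and only grows as live blocks change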



After a false start
(zpool add is dangerous when low on sleep) I added a 250GB mirror and a
pair of 3TB mirrors to miniraid and was able to successfully snapshot the
zvol: miniraid/RichRAID@exportable (I ended up booting off an OI 151a5 USB
stick to make that work, since I don't believe snv_134 could handle a 3TB
disk).

3) Now it's easy, right? I enabled root login via SSH on the new host,
which is running a zpool archive1 consisting of a single RAIDZ2 of 3TB
drives using ashift=12, and did a ZFS send:
zfs send miniraid/RichRAID@exportable | ssh root@newhost zfs receive archive1/RichRAID

It asked for the root password, I gave it that password, and it was off
and running. GigE ain't super fast, but I've got time.

The problem: so far the send/recv appears to have copied 6.25TB of 5.34TB.
That... doesn't look right. (Comparing zfs list -t snapshot and looking at
the 5.34 ref for the snapshot vs zfs list on the new system and looking at
space used.)

Is this a problem? Should I be panicking yet?


No.

Do you have compression enabled on one side but not the other?  Either way,
let things run to completion.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss