Re: [zfs-macos] pros/cons of multiple zfs filesystems

2014-03-17 Thread Dave Cottlehuber
On 17 March 2014 at 05:00:25, roemer (uwe.ro...@gmail.com) wrote:
 Thanks for the detailed example!
  
 On Monday, 17 March 2014 07:34:45 UTC+11, dch wrote:
 
  I've been a happy maczfs and also zfsosx user for several years now.
  [...]
  zfs send is a very easy way to do a very trustable
  backup, once you get past the first potentially large transfers.
 
  Can this happen bi-directionally? Or is it only applicable for creating
 'read-only' replicas of a master filesystem onto some clients?
 I mean, what happens once you cloned one file system, sent it to your
 laptop, then edit on both the laptop and your ZFS server?

Then you’re screwed :-). It’s not duplicity or some other low-level sync
tool. I find it works best when you have a known master that you’re working
off.
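To make the one-way nature concrete, here is a minimal Python sketch that only builds the shell pipeline for an incremental replication from the master to a replica; it does not run anything. The dataset names, snapshot names and the `laptop` host are hypothetical. `zfs send -i` streams only the blocks changed between two snapshots, and `zfs receive -F` first rolls the replica back to its most recent snapshot, which is exactly why edits made on the replica in the meantime get thrown away (without `-F`, the receive refuses because the destination has been modified).

```python
import shlex


def replication_pipeline(dataset: str, prev_snap: str, new_snap: str,
                         host: str, target: str) -> str:
    """Build (but do not run) a one-way incremental zfs send/receive pipeline."""
    # Send only the blocks that changed between the two snapshots.
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # -F rolls the target back to its latest snapshot before receiving,
    # discarding any local edits made on the replica since then.
    receive = ["ssh", host, "zfs", "receive", "-F", target]
    return f"{shlex.join(send)} | {shlex.join(receive)}"


print(replication_pipeline("tank/src", "mon", "tue", "laptop", "tank/src"))
# zfs send -i tank/src@mon tank/src@tue | ssh laptop zfs receive -F tank/src
```

The master always sends and the replica always receives; there is no merge step, so "bi-directional" really means picking one side as the master.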

Slightly OT, but in FreeBSD with HAST you can do some gonzo crazy stuff:
 http://www.aisecure.net/2012/02/07/hast-freebsd-zfs-with-carp-failover/

  All my source code work lives in a zfs case-sensitive, noatime,
  copies=2 filesystem, and I replicate that regularly to my other boxes
  as required.
 
  How does a 'copies=2' filesystem play together with a 'RAIDZ1' (or even
 RAIDZ2) pool? RAIDZ would have all data stored redundantly already, so
 wouldn't 'copies=2' end up quadrupling the storage requirement on a RAIDZ pool?

Yes, but in this case, the laptop isn’t redundant, and my data is precious.
IIRC the whole repos dataset, even with history, is 40 GB, so that’s
reasonable IMO.
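For reference, a dataset like this could be created with the standard `zfs create -o property=value` syntax. The sketch below only assembles the argument list; `tank/src` is a hypothetical pool/dataset name. Two details worth knowing: `casesensitivity` can only be set when the file system is created, and `copies` only applies to data written after it is set, so setting both up front is simplest. With `copies=2` and no compression, the 40 GB repository would occupy roughly 80 GB on the laptop's disk.

```python
def create_command(dataset: str, **props: str) -> list[str]:
    """Assemble a `zfs create -o name=value ... dataset` argument list."""
    cmd = ["zfs", "create"]
    for name, value in props.items():  # keyword order is preserved
        cmd += ["-o", f"{name}={value}"]
    cmd.append(dataset)
    return cmd


# Source-code dataset: case-sensitive, no atime updates, two copies of every block.
src = create_command("tank/src", casesensitivity="sensitive", atime="off", copies="2")
print(" ".join(src))
```

The per-VM datasets described below would follow the same pattern, e.g. `create_command("tank/vm/project-a", compression="lz4", atime="off", casesensitivity="sensitive")` (again a hypothetical name).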

  For most customer projects I will have 3 or more VMs running different
  configs or operating systems under VMWare Fusion. These each live in
  their own zfs filesystem (lz4-compressed, noatime, case-sensitive). I
  snapshot these after creation using vagrant install, again after
  config, and the changes are replicated using zfs snapshots again to
  the other OSX system, and also to the remote FreeBSD box.
 
  I can see that zfs is really good for handling multiple virtual machines.

Yup, zfs rollback for testing deployments or upgrades is simply bliss.
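A minimal sketch of the snapshot-then-rollback pattern, assuming a hypothetical VM dataset `tank/vm/web` and an `upgrade` callable standing in for whatever the upgrade step is. The `runner` parameter is injectable so the command sequence can be checked without a real pool. Note that plain `zfs rollback` only targets the most recent snapshot (going further back needs `-r`, which destroys the newer snapshots), and a VM should be shut down before its disk image is rolled back.

```python
import subprocess
from typing import Callable, Sequence


def run(cmd: Sequence[str]) -> None:
    """Default runner: execute a command, raising if it fails."""
    subprocess.run(cmd, check=True)


def guarded_upgrade(dataset: str, snap: str, upgrade: Callable[[], None],
                    runner: Callable[[list[str]], None] = run) -> bool:
    """Snapshot `dataset`, run `upgrade`, and roll back if it raises.

    Returns True if the upgrade succeeded, False if it was rolled back.
    """
    target = f"{dataset}@{snap}"
    runner(["zfs", "snapshot", target])
    try:
        upgrade()
    except Exception:
        # Plain rollback works because `target` is the newest snapshot
        # (assuming the upgrade step takes no snapshots of its own).
        runner(["zfs", "rollback", target])
        return False
    return True
```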

 In summary, I'm more than happy with the performance once I used
  ashift=12 and moved past 8GB RAM. Datasets, once you get used to them,
  are extraordinarily useful -- snapshot your config just before a
  critical upgrade.
 
  I'm starting to see the potential in snapshots. In fact, I just realised that I
 have been doing manual 'snapshots' on some of my recurring projects for quite some
 time, with annual clones of the previous directory structure. So ZFS snapshots
 would be a natural fit here.

 But regarding the memory consumption:
 What makes ZFS so memory hungry in your case?

I don’t think it’s very hungry actually. 4GB (under the old MacZFS 74.1)
simply wasn’t enough and I’d get crashes. With 8GB that went away. Bearing
in mind that with 16GB RAM I can run a web browser (oink, at least 1GB), a 20GB VM
that’s been compressed into a 10GB RAMdisk, and +1GB RAM for the VM, that seems
pretty reasonable. That leaves roughly 4GB for ZFS and the normal OS X baseline
stuff.

I’m happy to report back with RAM usage if somebody tells me what z* 
incantation is needed.

 Do you use deduplication?

Never. But I do use cloned datasets a fair bit, which probably helps the
situation.

The 2nd law of ZFS is not to use deduplication, even if you think you need it.
IIRC the rough numbers are 1GB RAM / TB storage, and I’d want ECC RAM for that.

BTW pretty sure the 1st law of ZFS is not to trust USB devices with your data.

--  
Dave Cottlehuber
Sent from my PDP11



-- 

--- 
You received this message because you are subscribed to the Google Groups 
zfs-macos group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to zfs-macos+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [zfs-macos] pros/cons of multiple zfs filesystems

2014-03-17 Thread Philip Robar
On Mon, Mar 17, 2014 at 3:35 AM, Dave Cottlehuber d...@jsonified.com wrote:

 On 17 March 2014 at 05:00:25, roemer (uwe.ro...@gmail.com) wrote:

   How does a 'copies=2' filesystem play together with a 'RAIDZ1' (or even
   RAIDZ2) pool? RAIDZ would have all data stored redundantly already, so
   would 'copies=2' not end up in quadrupling the storage requirement if
   used on a raidz pool?


 Yes


No, RAIDZ does not store duplicate copies of your data. It splits your data across
multiple drives and uses space equivalent to one drive to store parity
information about the data so that it can be mathematically made whole if
one drive goes missing. RAIDZ2 or RAIDZ3 just raise the level of parity,
i.e. the number of disk failures that can happen before data is lost, to
two or three respectively.

So the amount of space lost to parity is a constant per vdev: disk size x
parity level (one disk's worth for RAIDZ1, two for RAIDZ2, and so on). With
copies, a dataset with copies=N takes N times its size, so the space
available to it is divided by N. One of the nice things about using copies
as opposed to mirroring is that you can set it per file system (i.e.
dataset), whereas mirroring affects the entire vdev.

On the other hand, if you're using mirroring, then yes, turning on copies=2
does cut your usable space to raw pool size / 4 (assuming all datasets in the
pool have it set).
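The arithmetic above can be sketched as an idealised capacity estimate. This is a simplified model, not what `zpool list` or `zfs list` will report: it ignores metadata, RAID-Z allocation padding and reserved space, and assumes every dataset uses the same `copies` value. The disk counts and sizes are hypothetical.

```python
def usable_tb(disks: int, disk_tb: float, layout: str, copies: int = 1) -> float:
    """Idealised usable capacity of `disks` identical disks.

    `layout` is either one RAID-Z vdev ("raidz", "raidz1", "raidz2", "raidz3")
    or "mirror", meaning striped 2-way mirror pairs. Ignores metadata,
    RAID-Z padding and reserved space.
    """
    if layout == "mirror":
        data_disks = disks / 2                     # every block stored on two disks
    elif layout.startswith("raidz"):
        parity = int(layout[len("raidz"):] or 1)   # raidz/raidz1 -> 1, raidz2 -> 2, ...
        data_disks = disks - parity                # parity costs whole disks' worth
    else:
        raise ValueError(f"unknown layout: {layout}")
    return data_disks * disk_tb / copies           # copies=N stores each block N times


raw = 4 * 4.0  # four hypothetical 4 TB disks
for layout in ("raidz1", "mirror"):
    for copies in (1, 2):
        print(layout, f"copies={copies}", usable_tb(4, 4.0, layout, copies), "of", raw, "TB")
```

With four 4 TB disks, RAIDZ1 plus copies=2 leaves about 6 TB of the 16 TB raw, while mirroring plus copies=2 leaves 4 TB, i.e. the raw pool size / 4 case; copies on RAIDZ halves the space but does not quarter it.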

RAIDZ vs mirroring vs copies all comes down to trading off performance vs
Reliability, Availability and Serviceability (RAS) vs space. There are formulas
for figuring all of this out. Start at Serve the Home's RAID Reliability
calculator (http://www.servethehome.com/raid-calculator/raid-reliability-calculator-simple-mttdl-model/)*,
which takes everything into account except increased file redundancy. For that,
there's this article: ZFS, Copies, and Data Protection
(https://blogs.oracle.com/relling/entry/zfs_copies_and_data_protection).
And for RAIDZ vs mirroring performance, see When To (And Not To) Use RAID-Z
(https://blogs.oracle.com/roch/entry/when_to_and_not_to).


Phil

* Note that the Mean Time to Data Loss (MTTDL) calculated at this site, while
being an industry standard, is essentially useless other than for getting a
relative comparison of different configurations. For details see: Mean time
to meaningless: MTTDL, Markov models, and storage system reliability
(https://www.usenix.org/legacy/event/hotstorage10/tech/full_papers/Greenan.pdf).



Re: [zfs-macos] pros/cons of multiple zfs filesystems

2014-03-16 Thread roemer
Thanks for the response, Björn.
The hint regarding dataset-specific snapshots is good, though I have to 
first think about how I would best make use of them.

However another point that you raised is interesting:

On Sunday, 16 March 2014 10:34:52 UTC+11, Bjoern Kahl wrote:

 [...]
  Under Mac OSX, a mounted file system comes at higher costs than on 
  other Unix like operating systems, due to the Finder and MDS services, 
  so I would not suggest to really try to have hundreds of file systems 
  mounted at the same time.  But any reasonable number (about 10) goes 
  without noticeable performance impact. 


I would need about 10 separate mount points / datasets, so I guess this
would be fine.
MDS, however, means Spotlight, but the MacZFS wiki as well as several
other posts on the web advise switching off Spotlight for ZFS with

    mdutil -i off mountPoint

Why is Spotlight thought to be evil for ZFS?
Or does your comment imply that this advice is outdated, and
mds indexing of ZFS mount points is OK nowadays?
Note that I am mainly aiming to store static 'archival' data and documents 
on ZFS, not my main user directory.
 

 [...] Snapshots can also easily be used for real 
  off-site backups by the zfs send / receive mechanism. 

Haven't looked at send/receive yet, but if they require network
connections, I am afraid classical ADSL speeds with max. 1 Mbit/s upload will
not be much fun...
And for periodic backup to an external HDD I was thinking about ChronoSync
or simply rsync.
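For a sense of scale, a back-of-the-envelope estimate of how long a stream takes over a slow uplink. It assumes decimal gigabytes and ignores protocol overhead and compression, so real transfers will be somewhat slower. The 40 GB initial size is borrowed from the repository size mentioned elsewhere in this thread as an example; the 0.2 GB incremental is a hypothetical daily change. Also note that `zfs send` just writes a stream to standard output, so it does not need a network at all: it can be piped straight into `zfs receive` on a pool living on a locally attached external disk.

```python
def transfer_hours(size_gb: float, uplink_mbit_s: float) -> float:
    """Best-case transfer time in hours.

    Assumes decimal gigabytes (1 GB = 10**9 bytes), a fully used uplink,
    no protocol overhead and no compression.
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (uplink_mbit_s * 1e6)
    return seconds / 3600


print(f"initial 40 GB: {transfer_hours(40, 1):.1f} h")  # first full send
print(f"incremental 0.2 GB: {transfer_hours(0.2, 1) * 60:.0f} min")
```

At 1 Mbit/s the 40 GB initial copy works out to about 89 hours (roughly 3.7 days), while the 0.2 GB incremental takes about 27 minutes, which is why the first full transfer is the painful part.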

roemer



Re: [zfs-macos] pros/cons of multiple zfs filesystems

2014-03-16 Thread Simon Casady
An advantage of snapshots is with active filesystems such as those used by
a database.  For a consist at database backup you of course need to stop
the program then backup then restart ( or use some database tool if
available) .  The time to create a snapshot is essentially zero so the
above start - stop is actually practical.  Then you use your backup
software of choice on the snapshot not the active file system.
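The stop / snapshot / restart sequence can be sketched as below. The stop and start commands are hypothetical placeholders for whatever controls your database, `tank/db` is a hypothetical dataset, and the sketch assumes all of the database's files live in that single dataset. The restart sits in a `finally` block so the database comes back up even if the snapshot fails. The backup tool then reads from the snapshot, e.g. via `zfs send`, or, where the platform exposes it, the hidden `.zfs/snapshot/<name>` directory under the dataset's mountpoint.

```python
import subprocess
from typing import Callable, Sequence


def run(cmd: Sequence[str]) -> None:
    """Default runner: execute a command, raising if it fails."""
    subprocess.run(cmd, check=True)


def consistent_snapshot(dataset: str, snap: str,
                        stop_cmd: Sequence[str], start_cmd: Sequence[str],
                        runner: Callable[[list[str]], None] = run) -> str:
    """Stop the database, snapshot its dataset, restart it; return the snapshot name."""
    target = f"{dataset}@{snap}"
    runner(list(stop_cmd))                   # quiesce: no writes in flight
    try:
        runner(["zfs", "snapshot", target])  # effectively instantaneous
    finally:
        runner(list(start_cmd))              # restart even if the snapshot failed
    return target
```

Usage, with a hypothetical `db-ctl` control command: `consistent_snapshot("tank/db", "nightly", ["db-ctl", "stop"], ["db-ctl", "start"])`, then point the backup software at `tank/db@nightly` while the database is already running again.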





Re: [zfs-macos] pros/cons of multiple zfs filesystems

2014-03-16 Thread roemer
Thanks for the detailed example!

On Monday, 17 March 2014 07:34:45 UTC+11, dch wrote:

 I've been a happy maczfs and also zfsosx user for several years now. 
 [...]
 zfs send is a very easy way to do a very trustable 
 backup, once you get past the first potentially large transfers. 

 Can this happen bi-directionally? Or is it only applicable for creating 
'read-only' replicas of a master filesystem onto some clients?
I mean, what happens once you cloned one file system, sent it to your 
laptop, then edit on both the laptop and your ZFS server?
 

 All my source code work lives in a zfs case-sensitive, noatime, 
 copies=2 filesystem, and I replicate that regularly to my other boxes 
 as required. 

 How does a 'copies=2' filesystem play together with a 'RAIDZ1' (or even 
RAIDZ2) pool? RAIDZ would have all data stored redundantly already, so
wouldn't 'copies=2' end up quadrupling the storage requirement on a RAIDZ pool?
 

 For most customer projects I will have 3 or more VMs running different 
 configs or operating systems under VMWare Fusion. These each live in 
 their own zfs filesystem, compressed lz4 noatime case sensitive. I 
 snapshot these after creation using vagrant install, again after 
 config, and the changes are replicated using zfs snapshots again to 
 the other OSX system, and also to the remote FreeBSD box. 

 I can see that zfs is really good for handling multiple virtual machines.
 

 [...]

In summary, I'm more than happy with the performance once I used 
 ashift=12 and moved past 8GB RAM. Datasets, once you get used to them, 
 are extraordinarily useful -- snapshot your config just before a 
 critical upgrade. 

 I'm starting to see the potential in snapshots. In fact, I just realised that I
have been doing manual 'snapshots' on some of my recurring projects for quite some
time, with annual clones of the previous directory structure. So ZFS snapshots
would be a natural fit here.

But regarding the memory consumption:
What makes ZFS so memory hungry in your case?
Do you use deduplication?
