Re: [zfs-discuss] VDI iops with caching
On 13-01-04 02:08 PM, Richard Elling wrote:
>>> All of these IOPS-per-VDI-user guidelines are wrong. The problem is that
>>> the variability of response time is too great for a HDD. The only hope we
>>> have of getting the back-of-the-napkin calculations to work is to reduce
>>> the variability by using a device that is more consistent in its response
>>> (e.g. SSDs).
>>
>> For sure there is going to be a lot of variability, but it seems we aren't
>> even close. Have you seen any back-of-the-napkin calculations which take
>> SSD caching into consideration?
>
> Yes. I've written a white paper on the subject, somewhere on the
> nexenta.com website (if it is still available). But more current info is
> the presentation at ZFS Day.
> http://www.youtube.com/watch?v=A4yrSfaskwI
> http://www.slideshare.net/relling

Great presentation, Richard.

Our system is designed to provide hands-on labs for education. We use a saved-state file for our VMs, which eliminates the need for a cold boot/login and shutdown. This reduces the need for random IO. In this scenario we also don't need to worry about software updates or AV scans, because the labs are completely sandboxed. We need to use HDDs because we have a large number of labs which must be stored for an extended period.

I have been asked to adapt the platform to deliver a VDI solution, so I need to make a few more tweaks.

thanks, Geoff
Re: [zfs-discuss] VDI iops with caching
Thanks Richard, Happy New Year.

On 13-01-03 09:45 AM, Richard Elling wrote:
> On Jan 2, 2013, at 8:45 PM, Geoff Nordli <geo...@gnaa.net> wrote:
>> I am looking at the performance numbers in the Oracle VDI admin guide.
>> http://docs.oracle.com/html/E26214_02/performance-storage.html
>>
>> From my calculations, 200 desktops running a Windows 7 knowledge user
>> workload (15 IOPS) with a 30/70 read/write split comes to 5100 IOPS.
>> Using 7200 rpm disks, the requirement would be 68 disks. This doesn't
>> seem right, because if you are using clones with caching, you should be
>> able to easily satisfy your reads from ARC and L2ARC. As well, Oracle VDI
>> caches writes by default, so the writes will be coalesced and there will
>> be no ZIL activity.
>
> All of these IOPS-per-VDI-user guidelines are wrong. The problem is that
> the variability of response time is too great for a HDD. The only hope we
> have of getting the back-of-the-napkin calculations to work is to reduce
> the variability by using a device that is more consistent in its response
> (e.g. SSDs).

For sure there is going to be a lot of variability, but it seems we aren't even close. Have you seen any back-of-the-napkin calculations which take SSD caching into consideration?

>> Anyone have other guidelines on what they are seeing for IOPS with VDI?
>
> The successful VDI implementations I've seen have relatively small space
> requirements for the performance-critical work, so there are a bunch of
> companies offering SSD-based arrays for that market. If you're stuck with
> HDDs, then effective use of snapshots+clones with a few GB of RAM and a
> slog can support quite a few desktops.
> -- richard

Yes, I would like to stick with HDDs. I am just not quite sure what "quite a few desktops" means. I thought for sure there would be lots of people around who have done small deployments using a standard ZFS setup.

thanks, Geoff
[zfs-discuss] VDI iops with caching
I am looking at the performance numbers in the Oracle VDI admin guide:
http://docs.oracle.com/html/E26214_02/performance-storage.html

From my calculations, 200 desktops running a Windows 7 knowledge user workload (15 IOPS) with a 30/70 read/write split comes to 5100 IOPS. Using 7200 rpm disks, the requirement would be 68 disks.

This doesn't seem right, because if you are using clones with caching, you should be able to easily satisfy your reads from ARC and L2ARC. As well, Oracle VDI caches writes by default, so the writes will be coalesced and there will be no ZIL activity.

Anyone have other guidelines on what they are seeing for IOPS with VDI?

Happy New Year!

Geoff
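P.S. For anyone checking my arithmetic, here is the back-of-the-napkin breakdown. The mirror write penalty of 2 and the ~75 IOPS per 7200 rpm spindle are the usual rules of thumb, not figures from the Oracle guide, so treat them as assumptions:

   reads  = 200 desktops * 15 IOPS * 0.30                =  900 IOPS
   writes = 200 desktops * 15 IOPS * 0.70 * 2 (mirror)   = 4200 IOPS
   total  = 900 + 4200                                   = 5100 IOPS
   disks  = 5100 / ~75 IOPS per 7200 rpm spindle         =~ 68 disks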
Re: [zfs-discuss] zvol access rights - chown zvol on reboot / startup / boot
On 12-11-16 03:02 AM, Jim Klimov wrote:
> On 2012-11-15 21:43, Geoff Nordli wrote:
>> Instead of using vdi, I use comstar targets and then use VBox's built-in
>> iscsi initiator.
>
> Out of curiosity: in this case are there any devices whose ownership might
> get similarly botched, or have you tested that this approach also works
> well for non-root VMs? Did you measure any overheads of initiator-target
> vs. zvol, both being on the local system? Is there any significant
> performance difference worth thinking and talking about?

Hi Jim.

This works for non-root VMs. I haven't measured the difference between them, but it has been working fine. These aren't high-performance VMs. The design was to replicate the entire infrastructure for a small office every night to an off-site location. I have two of these in production right now and it has been working really well. I still need to work on some scripts to rebuild the VMs on the fly.

One thing I have done in the past is store the LUN and LU GUID in ZFS user-defined properties to keep track of them. I love ZFS user-defined properties; they are one of the killer features of ZFS. Really, there is no reason you couldn't store the entire VM configuration as ZFS properties. That could be interesting with your vboxsvc SMF project.

I work for another company that uses VBox for a lab management solution for education. We use the same architecture (VBox iscsi initiator to comstar target) but separate out the virtual machines from the storage. It is a very slick system.

have a great day!

Geoff
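P.S. In case it helps anyone scripting the same thing, the user-property bookkeeping looks roughly like this (the property name, GUID value and dataset are made up for the example; ZFS only requires that a user property name contain a colon):

   # remember which LU backs this zvol
   zfs set org.mycompany:lu-guid=600144f07551c2004d619d170002 tank/vms/office-dc1

   # and read it back later from a script
   zfs get -H -o value org.mycompany:lu-guid tank/vms/office-dc1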
Re: [zfs-discuss] zvol access rights - chown zvol on reboot / startup / boot
On 12-11-15 11:57 AM, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:
> When I google around for anyone else who cares and may have already solved
> the problem before I came along, it seems we're all doing the same thing
> for the same reason. If by any chance you are running VirtualBox on a
> solaris / opensolaris / openindiana / whatever ZFS host, you could of
> course use .vdi files for the VM virtual disks, but a lot of us are using
> zvols instead, for various reasons.
>
> To do the zvol, you first create the zvol (sudo zfs create -V), then chown
> it to the user who runs VBox (sudo chown someuser /dev/zvol/rdsk/...), and
> then create a rawvmdk that references it (VBoxManage internalcommands
> createrawvmdk -filename /home/someuser/somedisk.vmdk -rawdisk
> /dev/zvol/rdsk/...).
>
> The problem is: during boot / reboot, or any time the zpool or zfs
> filesystem is mounted or remounted, exported, imported... the zvol
> ownership reverts back to root:root. So you have to repeat your sudo chown
> before the guest VM can start.
>
> And the question is... Obviously I can make an SMF service which will
> chown those devices automatically, but that's kind of a crappy solution.
> Is there any good way to assign the access rights, or persistently assign
> ownership of zvols?

Instead of using a rawvmdk on the zvol device, I export the zvol as a comstar target and then use VBox's built-in iscsi initiator. Since VBox never touches the device node, the ownership reverting doesn't matter.

Geoff
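P.S. For anyone who wants to try the comstar route, the VBox side looks roughly like this (VM name, controller name, IQN and address are placeholders; this assumes the zvol has already been exported as a LU with sbdadm create-lu / stmfadm add-view, and VirtualBox 4.x syntax):

   VBoxManage storageattach "winxp-lab" --storagectl "SATA" --port 0 \
       --device 0 --type hdd --medium iscsi \
       --server 192.168.1.10 --target iqn.2010-08.org.illumos:02:winxp-lab --lun 0

Since the VM talks to the LU over iscsi, nothing under /dev/zvol ever needs to be chowned.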
Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?
Dan,

If you are going to do the all-in-one with VBox, you probably want to look at:
http://sourceforge.net/projects/vboxsvc/

It manages the starting/stopping of VBox VMs via SMF. Kudos to Jim Klimov for creating and maintaining it.

Geoff

On Thu, Nov 8, 2012 at 7:32 PM, Dan Swartzendruber <dswa...@druber.com> wrote:
> I have to admit Ned's (what do I call you?) idea is interesting. I may
> give it a try...
[zfs-discuss] defer_destroy property set on snapshot creation
I am running NexentaOS_134f.

This is really weird, but for some reason the defer_destroy property is being set on new snapshots and I can't turn it off. Normally it should only be enabled after using the zfs destroy -d command. The property doesn't seem to be inherited from anywhere. It seems to have just started happening.

Here are the steps showing how it works. Really, it is working as expected, except that the property shouldn't be set on creation.

Create a snapshot:

   root@grok-zfs1:~# zfs snapshot groklab/ws08r2-U2037@5
   root@grok-zfs1:~# zfs get defer_destroy | grep U2037\@5
   groklab/ws08r2-U2037@5   defer_destroy  on   -

Create a clone:

   root@grok-zfs1:~# zfs clone groklab/ws08r2-U2037@5 groklab/test2
   root@grok-zfs1:~# zfs list -t all | grep test2
   groklab/test2            0     886G   11.6G  -

The snapshot is still there:

   root@grok-zfs1:~# zfs list -t all | grep U2037\@5
   groklab/ws08r2-U2037@5   0     -      11.6G  -

Destroy the clone:

   root@grok-zfs1:~# zfs destroy groklab/test2

The snapshot is gone:

   root@grok-zfs1:~# zfs list -t all | grep U2037\@5
   root@grok-zfs1:~#

thanks, Geoff
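P.S. For comparison, this is how defer_destroy is supposed to end up "on" (a sketch only; the snapshot name reuses the one above):

   # a deferred-destroy request on a snapshot that still has a clone or hold...
   zfs destroy -d groklab/ws08r2-U2037@5
   # ...is what normally flips defer_destroy to "on"
   zfs get defer_destroy groklab/ws08r2-U2037@5
   # user holds, which also keep a snapshot alive, can be listed with:
   zfs holds groklab/ws08r2-U2037@5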
[zfs-discuss] SAS world-wide name
I am trying to figure out a reliable way to identify drives, to make sure I pull the right drive when there is a failure. These will be smaller installations (16 drives).

I am pretty sure the WWN on a SAS device is preassigned, like a MAC address, but I just want to make sure. Is there any scenario where the WWN changes?

So ideally, as long as I label the disk with the correct WWN, I should be able to identify it as failed and be able to pull it?

   NAME                         STATE   READ WRITE CKSUM
   tank                         ONLINE     0     0     0
     mirror-0                   ONLINE     0     0     0
       c2t5000C50033F5BD7Fd0    ONLINE     0     0     0
       c2t5000C50033F5BE3Bd0    ONLINE     0     0     0
     mirror-1                   ONLINE     0     0     0
       c2t5000C50033F5BF9Fd0    ONLINE     0     0     0
       c2t5000C50033F5BFBBd0    ONLINE     0     0     0
   spares
     c2t5000C50033F5D607d0      AVAIL

thanks, Geoff
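P.S. The cross-check I'm planning to use when labelling the bays (a sketch; the device name is from the pool above, and I'm assuming the WWN-based cXt<WWN>d0 naming stays stable across reboots and controllers):

   # vendor/model and serial number for the disk behind a WWN-based name
   iostat -En c2t5000C50033F5BD7Fd0

The same WWN is normally printed on the drive label itself, so matching label to bay only has to be done once.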
Re: [zfs-discuss] Using Solaris iSCSI target in VirtualBox iSCSI Initiator
-----Original Message-----
From: Thierry Delaitre
Sent: Wednesday, February 23, 2011 4:42 AM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Using Solaris iSCSI target in VirtualBox iSCSI Initiator

> Hello,
>
> I'm using ZFS to export some iscsi targets for the VirtualBox iscsi
> initiator. It works OK if I install the guest OS manually. However, I'd
> like to be able to import my already-prepared guest OS vdi images into the
> iscsi devices, but I can't figure out how to do it. Each time I tried, I
> cannot boot. It only works if I save the manually installed guest OS and
> re-instate the same, as follows:
>
>   dd if=/dev/dsk/c3t600144F07551C2004D619D170002d0p0 of=debian.vdi
>   dd if=debian.vdi of=/dev/dsk/c3t600144F07551C2004D619D170002d0p0
>
>   fdisk /dev/dsk/c3t600144F07551C2004D619D170002d0p0
>   Total disk size is 512 cylinders
>   Cylinder size is 4096 (512 byte) blocks
>
>                                    Cylinders
>   Partition  Status  Type          Start  End  Length  %
>   =========  ======  ============  =====  ===  ======  ===
>       1      Active  Linux native      0  463     464   91
>       2              EXT-DOS         464  511      48    9
>
> I'm wondering whether there is an issue with the disk geometry hardcoded
> in the vdi file container? Does the VDI disk geometry need to match the
> LUN size?
>
> Thanks, Thierry.

Hi Thierry.

You need to convert the VDI image into a raw image before you import it into a zvol. Something like:

   vboxmanage internalcommands converttoraw debian.vdi debian.raw

Then I run dd directly into the zvol device (not the iscsi LUN), e.g. /dev/zvol/rdsk/zpool_name/debian.

Geoff
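P.S. The full round trip in one place (a sketch; the pool/zvol names are placeholders, and it assumes the zvol is at least as large as the raw image):

   # flatten the VDI container to a raw disk image
   VBoxManage internalcommands converttoraw debian.vdi debian.raw
   # write it into the zvol that backs the iscsi LU
   dd if=debian.raw of=/dev/zvol/rdsk/tank/iscsi/debian bs=1M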
Re: [zfs-discuss] a single nfs file system shared out twice with different permissions
From: Edward Ned Harvey
Sent: Monday, December 20, 2010 9:25 AM
Subject: RE: [zfs-discuss] a single nfs file system shared out twice with different permissions

>> From: Richard Elling
>>
>>   zfs create tank/snapshots
>>   zfs set sharenfs=on tank/snapshots
>>
>> "on" by default sets the NFS share parameters to: rw
>> You can set specific NFS share parameters by using a string that contains
>> the parameters. For example,
>>
>>   zfs set sharenfs=rw=192.168.12.13,ro=192.168.12.14 my/file/system
>>
>> sets read-only access for host 192.168.12.14 and read/write access for
>> 192.168.12.13.
>
> Yeah, but for some reason, the OP didn't want to make it readonly for
> different clients ... He wanted a single client to have it mounted twice
> on two different directories, one read-only and the other read-write. I
> guess he has some application he can imprison in a specific read-only
> subdirectory, while some other application should be able to read/write,
> or something like that, using the same username, on the same machine.

It is the same application, but for some functions it needs to use read-only access or it will modify the files when I don't want it to.

Have a great day!

Geoff
Re: [zfs-discuss] a single nfs file system shared out twice with different permissions
From: Richard Elling
Sent: Monday, December 20, 2010 8:14 PM
Subject: Re: [zfs-discuss] a single nfs file system shared out twice with different permissions

> On Dec 20, 2010, at 11:26 AM, Geoff Nordli <geo...@gnaa.net> wrote:
>>> From: Edward Ned Harvey
>>> Sent: Monday, December 20, 2010 9:25 AM
>>>
>>>> From: Richard Elling
>>>>
>>>>   zfs create tank/snapshots
>>>>   zfs set sharenfs=on tank/snapshots
>>>>
>>>> "on" by default sets the NFS share parameters to: rw
>>>> You can set specific NFS share parameters by using a string that
>>>> contains the parameters. For example,
>>>>
>>>>   zfs set sharenfs=rw=192.168.12.13,ro=192.168.12.14 my/file/system
>>>>
>>>> sets read-only access for host 192.168.12.14 and read/write access for
>>>> 192.168.12.13.
>>>
>>> Yeah, but for some reason, the OP didn't want to make it readonly for
>>> different clients ... He wanted a single client to have it mounted twice
>>> on two different directories, one read-only and the other read-write.
>
> Is someone suggesting my solution won't work? Or are they just not up to
> the challenge? :-)

It won't work :)

The challenge is exporting two shares from the same folder. Linux has a bind mount which will make this work, but from what I can see there isn't an equivalent on OpenSolaris. This isn't a big deal though; I can make it work using CIFS. It isn't something that has to be NFS, but I thought I would ask to see if there was a simple solution I was missing.

>>> I guess he has some application he can imprison in a specific read-only
>>> subdirectory, while some other application should be able to read/write,
>>> or something like that, using the same username, on the same machine.
>>
>> It is the same application, but for some functions it needs to use
>> read-only access or it will modify the files when I don't want it to.
>
> Sounds like a simple dtrace script should do the trick, too.

Unfortunately, there isn't anything I can do about the application, and it really isn't a big deal. There is a pretty straightforward workaround.

Have a great day!

Geoff
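P.S. For the archives, the Linux-side trick I was referring to looks roughly like this (paths are made up; on older kernels the read-only flag only takes effect on the remount step):

   mount --bind /export/labfiles /export/labfiles-ro
   mount -o remount,ro,bind /export/labfiles-ro
   # /export/labfiles-ro can then be exported read-only while
   # /export/labfiles stays read-write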
Re: [zfs-discuss] a single nfs file system shared out twice with different permissions
From: Darren J Moffat
Sent: Monday, December 20, 2010 4:15 AM
Subject: Re: [zfs-discuss] a single nfs file system shared out twice with different permissions

> On 18/12/2010 07:09, Geoff Nordli wrote:
>> I am trying to configure a system where I have two different NFS shares
>> which point to the same directory. The idea is if you come in via one
>> path, you will have read-only access and can't delete any files; if you
>> come in via the 2nd path, then you will have read/write access.
>
> That sounds very similar to what you would do with Trusted Extensions. The
> read/write label would be a higher classification than the read-only one,
> since you can read down, can't see higher, and need to be equal to modify.
>
> For more information on Trusted Extensions start with these resources:
>
> Oracle Solaris 11 Express Trusted Extensions Collection
> http://docs.sun.com/app/docs/coll/2580.1?l=en
>
> OpenSolaris Security Community pages on TX:
> http://hub.opensolaris.org/bin/view/Community+Group+security/tx

Darren, thanks for the suggestion. I think I am going to go back to using CIFS. It seems to be quite a bit simpler than what I am looking at with NFS.

Have a great day!

Geoff
Re: [zfs-discuss] a single nfs file system shared out twice with different permissions
-----Original Message-----
From: Edward Ned Harvey [mailto:opensolarisisdeadlongliveopensola...@nedharvey.com]
Sent: Saturday, December 18, 2010 6:13 AM
To: 'Geoff Nordli'; zfs-discuss@opensolaris.org
Subject: RE: [zfs-discuss] a single nfs file system shared out twice with different permissions

>> From: zfs-discuss-boun...@opensolaris.org
>> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Geoff Nordli
>>
>> I am trying to configure a system where I have two different NFS shares
>> which point to the same directory. The idea is if you come in via one
>> path, you will have read-only access and can't delete any files; if you
>> come in via the 2nd path, then you will have read/write access.
>
> I think you can do this client-side.
>
>   mkdir /foo1
>   mkdir /foo2
>   mount nfsserver:/exports/bar /foo1
>   mount -o ro nfsserver:/exports/bar /foo2

Thanks Edward. The client-side solution works great.

Happy holidays!!

Geoff
[zfs-discuss] a single nfs file system shared out twice with different permissions
I am trying to configure a system where I have two different NFS shares which point to the same directory. The idea is if you come in via one path, you will have read-only access and can't delete any files; if you come in via the 2nd path, then you will have read/write access.

For example, create the read/write NFS share:

   zfs create tank/snapshots
   zfs set sharenfs=on tank/snapshots

   root@grok-zfs1:/# sharemgr show -vp
   default nfs=()
   zfs
       zfs/tank/snapshots nfs=()
             /tank/snapshots

I have had some luck doing it with Samba. Any pointers to making it work with NFS?

Thanks, Geoff
[zfs-discuss] zfs list takes a long time to return
I am running the latest version of Nexenta Core 3.0 (b134 + extra backports).

The time to run "zfs list" is starting to increase as the number of datasets grows; it now takes almost 30 seconds to return ~1500 datasets.

   root@zfs1:/etc# time zfs list -t all | wc -l
   1491

   real    0m29.614s
   user    0m0.382s
   sys     0m6.329s

This machine has plenty of room on the memory side of things:

   ARC Size:
        Current Size:             1090 MB (arcsize)
        Target Size (Adaptive):   7159 MB (c)
        Min Size (Hard Limit):     894 MB (zfs_arc_min)
        Max Size (Hard Limit):    7159 MB (zfs_arc_max)

Are there other things I can look at which may improve the performance of the zfs list command, or is this as good as it is going to get?

Thanks, Geoff
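P.S. The workarounds I'm experimenting with, in case they help someone else (a sketch; "tank" is a placeholder and the timings will obviously vary):

   # skip snapshots if you only need filesystems and volumes
   time zfs list -t filesystem,volume | wc -l
   # or limit the recursion depth to just the level you care about
   time zfs list -d 1 -o name tank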
[zfs-discuss] way to find out of a dataset has children
Is there a way to find out if a dataset has children or not, using zfs properties or another scriptable method?

I am looking for a more efficient way to delete datasets after they are finished being used. Right now I use a custom property to set delete=1 on a dataset, and then I have a script that runs asynchronously to clean them up. If there are children then the delete will fail. This method works, but I would rather filter the list again so it only tries to delete datasets which can actually be deleted.

I have looked at usedbychildren and usedbysnapshots, but they are just showing 0 in the columns.

Thanks, Geoff
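P.S. The best scriptable test I've come up with so far (a sketch; the dataset name is a placeholder): list just the first level below the dataset and see whether anything comes back.

   # prints nothing if the dataset has no child filesystems, volumes or
   # snapshots; the tail skips the dataset itself
   zfs list -H -r -d 1 -t all -o name tank/labs/vm42 | tail -n +2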
Re: [zfs-discuss] way to find out of a dataset has children
From: Darren J Moffat
Sent: Monday, September 27, 2010 11:03 AM

> On 27/09/2010 18:14, Geoff Nordli wrote:
>> Is there a way to find out if a dataset has children or not, using zfs
>> properties or another scriptable method?
>>
>> I am looking for a more efficient way to delete datasets after they are
>> finished being used. Right now I use a custom property to set delete=1 on
>> a dataset, and then I have a script that runs asynchronously to clean
>> them up. If there are children then the delete will fail. This method
>> works, but I would rather filter the list again so it only tries to
>> delete datasets which can actually be deleted.
>
> This sounds very like what 'zfs hold' and 'zfs destroy -d' were designed
> for. When using 'zfs send', holds will automatically be taken out for pool
> versions 18 and higher.

Darren, thanks for this tip. It looks like it will work well for snapshots, but I can't apply the property to a clone. Are there any properties I can set on the clone side?

I could definitely do a zfs list and look for the same name as the clone which I am trying to delete, but I am looking for a better way.

Geoff
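P.S. For anyone finding this later, the hold/deferred-destroy combination Darren mentions looks roughly like this (the snapshot and tag names are placeholders):

   zfs hold lab-in-use tank/labs/vm42@base      # take a named user hold
   zfs destroy -d tank/labs/vm42@base           # destruction is deferred...
   zfs release lab-in-use tank/labs/vm42@base   # ...and happens once the last
                                                # hold is released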
Re: [zfs-discuss] way to find out of a dataset has children
From: Richard Elling
Sent: Monday, September 27, 2010 1:01 PM

> On Sep 27, 2010, at 11:54 AM, Geoff Nordli wrote:
>> Are there any properties I can set on the clone side?
>
> Each clone records its origin snapshot in the origin property.
>
>   $ zfs get origin syspool/rootfs-nmu-001
>   NAME                    PROPERTY  VALUE                           SOURCE
>   syspool/rootfs-nmu-001  origin    syspool/rootfs-nmu-...@nmu-001  -
>
> Enjoy
> -- richard

Hi Richard.

Yes, we can use origin, but that tells me where the clone came from, not how many snapshots have been taken of it. Before I can delete the clone, it can't have any snapshots under it. This isn't hard to solve; I can just do a regex on the clone name looking for snapshot names, but I was hoping there was a simple zfs property I could query.

Thanks, Geoff
[zfs-discuss] stmf corruption and dealing with dynamic lun mapping
I am running Nexenta NCP 3.0 (134f). My stmf configuration was corrupted. I was getting errors like this in /var/adm/messages:

   Sep  1 10:32:04 llift-zfs1 svc-stmf[378]: [ID 130283 user.error] get property view_entry-0/all_hosts failed - entity not found
   Sep  1 10:32:04 llift-zfs1 svc.startd[9]: [ID 652011 daemon.warning] svc:/system/stmf:default: Method "/lib/svc/method/svc-stmf start" failed with exit status 1

And in /var/adm/system-stmf:default.log:

   [ Sep  1 10:32:05 Executing start method ("/lib/svc/method/svc-stmf start"). ]
   svc-stmf: Unable to load the configuration. See /var/adm/messages for details
   svc-stmf: For information on reverting the stmf:default instance to a previously running configuration see the man page for svccfg(1M)
   svc-stmf: After reverting the instance you must clear the service maintenance state. See the man page for svcadm(1M)

I fixed it by going into svccfg and reverting to the previous running snapshot.

We have a lab management system which continuously creates and deletes LUNs as virtual machines are built and destroyed. When I reverted to the previous running state, we had a mismatch between what the LUNs should be and what they actually were.

Is there a backup configuration somewhere, or a way to re-read the LUN configuration? If not, I do set the LUN GUID for each volume in custom zfs properties, so I may just need to build a sanitizer script to rebuild the LUN mappings in the event of a catastrophic failure.

BTW, I am running this system inside a VMware Server VM, which has caused some instability, but I guess it is good to be prepared.

Thanks, Geoff
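P.S. The sanitizer script I have in mind would be roughly this shape (very much a sketch: the property name and pool are ones I made up, and I have not yet verified that recreating an LU with "-p guid=" keeps existing initiator mappings happy):

   for vol in $(zfs list -H -t volume -o name -r tank/labs); do
       guid=$(zfs get -H -o value com.grokworx:lu-guid $vol)
       [ "$guid" = "-" ] && continue
       stmfadm create-lu -p guid=$guid /dev/zvol/rdsk/$vol
       stmfadm add-view $guid
   done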
Re: [zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS
From: Edward Ned Harvey [mailto:sh...@nedharvey.com]
Sent: Sunday, August 08, 2010 8:34 PM

>> boun...@opensolaris.org] On Behalf Of Geoff Nordli
>>
>> Anyone have any experience with a R510 with the PERC H200/H700 controller
>> with ZFS? My perception is that Dell doesn't play well with OpenSolaris.
>
> I have an R710... Not quite the same, but similar.
>
> When you say "doesn't play nice with opensolaris," you're hitting a grain
> of truth but blowing it out of proportion. It's true that some stuff might
> not brainlessly work the way you'd like, and it's true that you'll be
> lacking some functionality that you'd probably like. But it's easy to
> quantify and describe, and easy to describe the options for dealing with
> it, too:
>
> #1 Optical drive, by default, in AHCI mode. For some reason, apparently
> using a chipset that osol can't handle. So put the optical drive into ATA
> or Legacy mode before booting, or use an external optical drive for
> installation.
>
> #2 On my R710, the perc driver wasn't included in the sol10 DVD by
> default, so I had to download a special one from support.dell.com, and hit
> "5 - insert driver disk" or whatever during sol10 installation. This was
> not an issue for osol installation.
>
> #3 OMSA provides the web-based gui to manage your raid card. Not available
> in sol10/osol. Instead, find the appropriate MegaCLI for your device ...
> which is really tough to do. This is needed if you want the ability to
> replace a failed drive without rebooting.
>
> Once you overcome these things, it works great.

Thanks Edward.

What did you end up using for the L2ARC? The SSDs shown in the online configurator are SLC based.

Did you use the Broadcom or the optional Intel NIC?

Geoff
[zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS
Anyone have any experience with a R510 with the PERC H200/H700 controller with ZFS?

My perception is that Dell doesn't play well with OpenSolaris.

Thanks, Geoff
Re: [zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS
-----Original Message-----
From: Brian Hechinger [mailto:wo...@4amlunch.net]
Sent: Saturday, August 07, 2010 8:10 AM
To: Geoff Nordli
Subject: Re: [zfs-discuss] PowerEdge R510 with PERC H200/H700 with ZFS

> On Sat, Aug 07, 2010 at 08:00:11AM -0700, Geoff Nordli wrote:
>> Anyone have any experience with a R510 with the PERC H200/H700 controller
>> with ZFS?
>
> Not that particular setup, but I do run Solaris on a Precision 690 with
> PERC 6i controllers.
>
>> My perception is that Dell doesn't play well with OpenSolaris.
>
> What makes you say that? I've run Solaris on quite a few Dell boxes and
> have never had any issues.
>
> -brian

Hi Brian.

I am glad to hear that, because I would prefer to use a Dell box. Is there a JBOD mode with the PERC 6i?

It is funny how sometimes one forms these views as you gather information.

Geoff
[zfs-discuss] block align SSD for use as a l2arc cache
I have an Intel X25-M 80GB SSD. For optimum performance, I need to block align the SSD device, but I am not sure exactly how I should do it.

If I run format and then fdisk, it allows me to partition based on a cylinder, but I don't think that is sufficient.

Can someone tell me how they block-aligned an SSD device for use as an L2ARC?

Thanks, Geoff
Re: [zfs-discuss] block align SSD for use as a l2arc cache
-----Original Message-----
From: Erik Trimble
Sent: Friday, July 09, 2010 6:45 PM
Subject: Re: [zfs-discuss] block align SSD for use as a l2arc cache

> On 7/9/2010 5:55 PM, Geoff Nordli wrote:
>> I have an Intel X25-M 80GB SSD. For optimum performance, I need to block
>> align the SSD device, but I am not sure exactly how I should do it.
>>
>> If I run format and then fdisk, it allows me to partition based on a
>> cylinder, but I don't think that is sufficient.
>>
>> Can someone tell me how they block-aligned an SSD device for use as an
>> L2ARC?
>
> (a) what makes you think you need to do block alignment for an L2ARC usage
> (particularly if you give the entire device to ZFS)?
>
> (b) what makes you think that even if (a) is needed, that ZFS will respect
> 4k block boundaries? That is, why do you think that ZFS would put any
> effort into doing block alignment with its L2ARC writes?

Thanks Erik.

So what you are saying is that you don't need to worry about block alignment on an L2ARC cache device, because ZFS will read and write it at the device block level rather than doing larger, aligned writes the way a file system would.

Have a great weekend!

Geoff
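P.S. For anyone searching later: the "give the entire device to ZFS" part is just this (pool and device names are placeholders):

   zpool add tank cache c3t2d0

With the whole disk handed to ZFS, you don't lay out any partitions yourself, so there is nothing of your own to misalign in the first place.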
Re: [zfs-discuss] Dedup RAM requirements, vs. L2ARC?
> Actually, I think the rule-of-thumb is 270 bytes/DDT entry. It's 200 bytes
> of ARC for every L2ARC entry. DDT doesn't count for this ARC space usage.
>
> E.g.: I have 1TB of 4k files that are to be deduped, and it turns out that
> I have about a 5:1 dedup ratio. I'd also like to see how much ARC usage I
> eat up with a 160GB L2ARC.
>
> (1) How many entries are there in the DDT: 1TB of 4k files means there are
> 2^30 files (about 1 billion). However, at a 5:1 dedup ratio, I'm only
> actually storing 20% of that, so I have about 214 million blocks. Thus, I
> need a DDT of about 270 * 214 million =~ 58GB in size.
>
> (2) My L2ARC is 160GB in size, but I'm using 58GB for the DDT. Thus, I
> have 102GB free for use as a data cache. 102GB / 4k =~ 27 million blocks
> can be stored in the remaining L2ARC space. However, 27 million blocks
> take up: 200 * 27 million =~ 5.4GB of space in ARC.
>
> Thus, I'd better have at least 5.5GB of RAM allocated solely for L2ARC
> reference pointers, and no other use.

Hi Erik.

Are you saying the DDT will automatically be stored on an L2ARC device if one exists in the pool, instead of using ARC? Or is there some sort of memory-pressure point where the DDT gets moved from ARC to L2ARC?

Thanks, Geoff
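P.S. If anyone wants to check those estimates against a real pool, zdb can report actual or simulated DDT sizes; a sketch, with "tank" as a placeholder:

   zdb -DD tank   # DDT histogram and entry sizes for a pool already deduping
   zdb -S tank    # simulate dedup on an existing pool to estimate the ratio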
Re: [zfs-discuss] OCZ Vertex 2 Pro performance numbers
From: Arne Jansen
Sent: Friday, June 25, 2010 3:21 AM

> Now the test for the Vertex 2 Pro. This was fun. For more explanation
> please see the thread "Crucial RealSSD C300 and cache flush?"
> This time I made sure the device is attached via 3GBit SATA. This is also
> only a short test. I'll retest after some weeks of usage.
>
> cache enabled, 32 buffers, 64k blocks
>   linear write, random data:  96 MB/s
>   linear read, random data:  206 MB/s
>   linear write, zero data:   234 MB/s
>   linear read, zero data:    255 MB/s
>   random write, random data:  84 MB/s
>   random read, random data:  180 MB/s
>   random write, zero data:   224 MB/s
>   random read, zero data:    190 MB/s
>
> cache enabled, 32 buffers, 4k blocks
>   linear write, random data:  93 MB/s
>   linear read, random data:  138 MB/s
>   linear write, zero data:   113 MB/s
>   linear read, zero data:    141 MB/s
>   random write, random data:  41 MB/s (10300 ops/s)
>   random read, random data:   76 MB/s (19000 ops/s)
>   random write, zero data:    54 MB/s (13800 ops/s)
>   random read, zero data:     91 MB/s (22800 ops/s)
>
> cache enabled, 1 buffer, 4k blocks
>   linear write, random data:  62 MB/s (15700 ops/s)
>   linear read, random data:   32 MB/s (8000 ops/s)
>   linear write, zero data:    64 MB/s (16100 ops/s)
>   linear read, zero data:     45 MB/s (11300 ops/s)
>   random write, random data:  14 MB/s (3400 ops/s)
>   random read, random data:   22 MB/s (5600 ops/s)
>   random write, zero data:    19 MB/s (4500 ops/s)
>   random read, zero data:     21 MB/s (5100 ops/s)
>
> cache enabled, 1 buffer, 4k blocks, with cache flushes:
>   linear write, random data, flush after every write:     5700 ops/s
>   linear write, zero data, flush after every write:       5700 ops/s
>   linear write, random data, flush after every 4th write: 8500 ops/s
>   linear write, zero data, flush after every 4th write:   8500 ops/s
>
> Some remarks: the random op numbers have to be read with care:
>  - reading occurs in the same order as the writing before
>  - the ops are not aligned to any specific boundary
>
> The device also passed the write-loss test: after 5 repeats no data has
> been lost. It doesn't make any difference if the cache is enabled or
> disabled, so it might be worth tuning zfs to not issue cache flushes.
>
> Conclusion: this device will make an excellent slog device. I'll order
> them today ;)
>
> --Arne

Arne, thanks for doing these tests, they are great to see.

Is this the one with the built-in supercap?
http://www.ocztechnology.com/products/solid-state-drives/2-5--sata-ii/maximum-performance-enterprise-solid-state-drives/ocz-vertex-2-pro-series-sata-ii-2-5--ssd-.html

Geoff
Re: [zfs-discuss] Please trim posts
-----Original Message-----
From: Linder, Doug
Sent: Friday, June 18, 2010 12:53 PM

> Try doing inline quoting/response with Outlook, where you quote one
> section, reply, quote again, etc. It's impossible. You can't split up the
> quoted section to add new text - no way, no how. Very infuriating. It's
> like Outlook was *designed* to force people to top post.

Hi Doug.

I use Outlook too, and you are right, it is a major PITA. I was hoping that OL2010 was going to solve the problem, but it doesn't :(

The only way I can get it to sort of work is by editing the HTML message and saving it as plain text, then replying to that. If you try to reply to an HTML-formatted message, it is awful. I also manually clean up some of the header information below "Original Message".

Have a great weekend!

Geoff
Re: [zfs-discuss] Dedup... still in beta status
From: Fco Javier Garcia
Sent: Tuesday, June 15, 2010 11:21 AM

>> Realistically, I think people are overly enamored with dedup as a feature
>> - I would generally only consider it worthwhile in cases where you get
>> significant savings. And by significant, I'm talking an order of
>> magnitude space savings. A 2x savings isn't really enough to counteract
>> the downsides, especially when even enterprise disk space is (relatively)
>> cheap.
>
> I think dedup may have its greatest appeal in VDI environments (think
> about an environment with 85% of the data that the virtual machines need
> sitting in ARC or L2ARC... it's like a dream... almost instantaneous
> response... and you can boot a new machine in a few seconds)...

Does dedup benefit the ARC/L2ARC space? For some reason, I have it in my head that each time a block is requested from storage it gets copied into the cache; therefore if I had 10 VMs requesting the same dedup'd block, there would be 10 copies of the same block in ARC/L2ARC.

Geoff
Re: [zfs-discuss] General help with understanding ZFS performance bottlenecks
On Behalf Of Joe Auty
Sent: Tuesday, June 08, 2010 11:27 AM

> I'd love to use Virtualbox, but right now it (3.2.2 commercial which I'm
> evaluating; I haven't been able to compile OSE on the CentOS 5.5 host yet)
> is giving me kernel panics on the host while starting up VMs, which are
> obviously bothersome, so I'm exploring continuing to use VMWare Server and
> seeing what I can do on the Solaris/ZFS side of things.
>
> I've also read this on a VMWare forum, although I don't know if it is
> correct? This is in the context of my questioning why I don't seem to have
> these same load average problems running Virtualbox:

Hi Joe.

One thing about VBox is that they are rapidly adding new features, which causes some instability and regressions. Unless there is a real need for one of the new features in the 3.2 branch, I would recommend working with the 3.0 branch in a production environment. They will announce when they feel that 3.2 has become production ready.

VirtualBox is a great type 2 hypervisor, and I can't believe how much it has improved over the last year.

Have a great day!

Geoff
Re: [zfs-discuss] General help with understanding ZFS performance bottlenecks
Brandon High wrote:
> On Tue, Jun 8, 2010 at 10:33 AM, besson3c <j...@netmusician.org> wrote:
>
> What VM software are you using? There are a few knobs you can turn in VBox
> which will help with slow storage. See
> http://www.virtualbox.org/manual/ch12.html#id2662300 for instructions on
> reducing the flush interval.
>
> -B

Hi Brandon.

Have you played with the flush interval? I am using iscsi-based zvols, and I am thinking about not using the caching in VBox and instead relying on the comstar/zfs side. What do you think?

Geoff
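P.S. For the archives, the knob Brandon is pointing at is set per-VM with setextradata; if I am reading that manual section right it looks roughly like this (the VM name and interval are placeholders, and the device string changes for SATA/AHCI controllers, so check the manual before copying):

   VBoxManage setextradata "lab-vm" \
       "VBoxInternal/Devices/piix3ide/0/LUN#0/Config/FlushInterval" 1000000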
Re: [zfs-discuss] iScsi slow
-----Original Message-----
From: Matt Connolly
Sent: Wednesday, May 26, 2010 5:08 AM

> I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:
>
>   sh-4.0# zfs create rpool/iscsi
>   sh-4.0# zfs set shareiscsi=on rpool/iscsi
>   sh-4.0# zfs create -s -V 10g rpool/iscsi/test
>
> The underlying zpool is a mirror of two SATA drives. I'm connecting from a
> Mac client with globalSAN initiator software, connected via gigabit LAN.
> It connects fine, and I've initialised a Mac-format volume on that iScsi
> volume.
>
> Performance, however, is terribly slow, about 10 times slower than an SMB
> share on the same pool. I expected it would be very similar, if not faster
> than SMB. Here's my test results copying 3GB data:
>
>   iScsi:      44m01s   1.185 MB/s
>   SMB share:   4m27   11.73 MB/s
>
> Reading (the same 3GB) is also worse than SMB, but only by a factor of
> about 3:
>
>   iScsi:      4m36    11.34 MB/s
>   SMB share:  1m45    29.81 MB/s
>
> Is there something obvious I've missed here?

Hi Matt, here is a decent post on how to set up COMSTAR and disable the old iscsitgt service:

http://toic.org/2009/11/08/opensolaris-server-with-comstar-and-zfs/

Have a great day!

Geoff
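P.S. The short version of what that post walks through, from memory (a rough sketch only; package names and service FMRIs can differ between builds, and the zvol path is just the one from your example):

   svcadm disable iscsitgt                            # old shareiscsi target
   svcadm enable stmf
   svcadm enable -r svc:/network/iscsi/target:default
   sbdadm create-lu /dev/zvol/rdsk/rpool/iscsi/test   # note the GUID it prints
   stmfadm add-view <guid>                            # default: visible to all hosts
   itadm create-target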
Re: [zfs-discuss] can you recover a pool if you lose the zil (b134+)
-----Original Message-----
From: Edward Ned Harvey [mailto:solar...@nedharvey.com]
Sent: Monday, May 17, 2010 6:29 AM

>> I was messing around with a ramdisk on a pool and I forgot to remove it
>> before I shut down the server. Now I am not able to mount the pool. I am
>> not concerned with the data in this pool, but I would like to try to
>> figure out how to recover it. I am running Nexenta 3.0 NCP (b134+).
>
> Try this: zpool upgrade
>
> By default, it will just tell you the current versions of zpools, without
> actually doing any upgrades. If your zpool is version 19 or greater, then
> the loss of a ZIL is not fatal to the pool. You should be able to "zpool
> import" and then you'll see a message about "zpool import -F".
>
> If your zpool is below version 19, then it's lost.
>
> BTW, just to make sure you know ... Having a ZIL in RAM makes no sense
> whatsoever, except for academic purposes. For a system in actual usage,
> you should either implement a nonvolatile ZIL device, or disable the ZIL
> (to be used with caution.)

Thanks Edward.

The syspool is sitting at version 18, so I assume the old pool is toast. I was more curious why nothing was working, because there are reports that you can do it, but it wasn't working for me.

This system isn't in production; I was just testing to see if the ZIL was being used or not.

Have a great day!

Geoff
[zfs-discuss] can you recover a pool if you lose the zil (b134+)
I was messing around with a ramdisk on a pool and I forgot to remove it before I shut down the server. Now I am not able to mount the pool. I am not concerned with the data in this pool, but I would like to try to figure out how to recover it. I am running Nexenta 3.0 NCP (b134+).

I have tried a couple of the commands (zpool import -f and zpool import -FX llift):

   root@zfs1:/export/home/gnordli# zpool import -f
     pool: llift
       id: 15946357767934802606
    state: UNAVAIL
   status: One or more devices are missing from the system.
   action: The pool cannot be imported. Attach the missing devices and try again.
      see: http://www.sun.com/msg/ZFS-8000-6X
   config:

           llift        UNAVAIL  missing device
             mirror-0   ONLINE
               c4t8d0   ONLINE
               c4t9d0   ONLINE
             mirror-1   ONLINE
               c4t10d0  ONLINE
               c4t11d0  ONLINE

           Additional devices are known to be part of this pool, though their
           exact configuration cannot be determined.

   root@zfs1:/export/home/gnordli# zpool import -FX llift
   cannot import 'llift': no such pool or dataset
           Destroy and re-create the pool from a backup source.

I do not have a copy of the zpool.cache file.

Any other commands I could try to recover it, or is it just unrecoverable?

Thanks, Geoff
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
From: James C. McPherson [mailto:james.mcpher...@oracle.com]
Sent: Wednesday, May 12, 2010 2:28 AM

> On 12/05/10 03:18 PM, Geoff Nordli wrote:
>> I have been wondering what the compatibility is like on OpenSolaris. My
>> perception is basic network driver support is decent, but storage
>> controllers are more difficult for driver support.
>
> Now wait just a minute. You're casting aspersions on stuff here without
> saying what you're talking about, still less where you're getting your
> info from. Be specific - put up, or shut up.
>
>> My perception is if you are using external cards which you know work for
>> networking and storage, then you should be alright. Am I out in
>> left-field on this?
>
> I believe you are talking through your hat.

James, it is not my intention to cast aspersions in this thread. I should have worded my reply differently instead of posting my perception, but I really didn't think I would get piled on for it.

This subject interests me because we are going to have customers deploy OpenSolaris on their own equipment, and I have been concerned about compatibility. Is the mobo/chipset/CPU actually something to be worried about with OpenSolaris compatibility?

I know this is not an all-encompassing list, but I got my hardware info from the Nexenta site (http://www.nexenta.com/corp/supported-hardware/hardware-supported-list) because that is the distro I started with.

Have a great day!

Geoff
Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size
-----Original Message-----
From: Brandon High [mailto:bh...@freaks.com]
Sent: Monday, May 10, 2010 5:56 PM

> On Mon, May 10, 2010 at 3:53 PM, Geoff Nordli <geo...@gnaa.net> wrote:
>> Doesn't this alignment have more to do with aligning writes to the
>> stripe/segment size of a traditional storage array? The articles I am
>
> It is a lot like a stripe / segment size. If you want to think of it in
> those terms, you've got a segment of 512b (the iscsi block size) and a
> width of 16, giving you an 8k stripe size. Any write that is less than 8k
> will require a RMW cycle, and any write in multiples of 8k will do full
> stripe writes. If the write doesn't start on an 8k boundary, you risk
> having writes span multiple underlying zvol blocks.
>
> When using a zvol, you've essentially got $volblocksize sized physical
> sectors, but the initiator sees the 512b block size that the LUN is
> reporting. If you don't block align, you risk having a write straddle two
> zfs blocks.
>
> There may be some benefit to using a 4k volblocksize, but you'll use more
> time and space on block checksums, etc. in your zpool. I think 8k is a
> reasonable trade off.
>
> If you're using the whole disk with zfs, you don't need to worry about it.
> If you're using fdisk partitions or slices, you need to be a little more
> careful.

So... as long as I use whole disks and set the volblocksize to a multiple of the virtual machine's file system allocation size, I don't have to worry about alignment/optimization with ZFS.

Thanks again!!

Geoff
Re: [zfs-discuss] Opteron 6100? Does it work with opensolaris?
On Behalf Of James C. McPherson
Sent: Tuesday, May 11, 2010 5:41 PM

> On 12/05/10 10:32 AM, Michael DeMan wrote:
>> I agree on the motherboard and peripheral chipset issue. This, and the
>> last generation AMD quad/six core motherboards all seem to use the AMD
>> SP56x0/SP5100 chipset, which I can't find much information about support
>> for on either OpenSolaris or FreeBSD.
>
> If you can get the device driver detection utility to run on it, that will
> give you a reasonable idea.
>
>> Another issue is the LSI SAS2008 chipset for the SAS controller, which is
>> frequently offered as an onboard option for many motherboards as well,
>> and still seems to be somewhat of a work in progress in regards to being
>> 'production ready'.
>
> What metric are you using for "production ready"? Are there features
> missing which you expect to see in the driver, or is it just "oh noes, I
> haven't seen enough big customers with it"?

I have been wondering what the compatibility is like on OpenSolaris. My perception is basic network driver support is decent, but storage controllers are more difficult for driver support.

My perception is if you are using external cards which you know work for networking and storage, then you should be alright. Am I out in left-field on this?

Thanks, Geoff
Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size
-----Original Message-----
From: Brandon High [mailto:bh...@freaks.com]
Sent: Monday, May 10, 2010 9:55 AM

> On Sun, May 9, 2010 at 9:42 PM, Geoff Nordli <geo...@gnaa.net> wrote:
>> I am looking at using 8K block size on the zfs volume.
>
> 8k is the default for zvols.

You are right; I didn't look at that property, and instead I was focused on the recordsize property.

>> I was looking at the comstar iscsi settings and there is also a blk size
>> configuration, which defaults to 512 bytes. That would make me believe
>> that all of the IO will be broken down into 512 bytes, which seems very
>> inefficient.
>
> I haven't done any tuning on my comstar volumes, and they're using 8k
> blocks. The setting is in the dataset's volblocksize parameter.

When I look at "stmfadm list-lu -v" it shows me a block size of 512. I am running NexentaCore 3.0 (b134+). I wonder if the default size has changed between versions.

>> It seems this value should match the file system allocation/cluster size
>> in the VM, maybe 4K if you are using an ntfs file system.
>
> You'll have more overhead using smaller volblocksize values, and get worse
> compression (since compression is done on the block). If you have dedup
> enabled, you'll create more entries in the DDT, which can have pretty
> disastrous consequences on write performance.
>
> Ensuring that your VM is block-aligned to 4k (or the guest OS's block
> size) boundaries will help performance and dedup as well.

This is where I am probably the most confused and need to get things straightened out in my mind. I thought dedup and compression were done at the record level.

As long as you are using a multiple of the file system block size, then alignment shouldn't be a problem with iscsi-based zvols. When using a zvol, comstar stores the metadata in the zvol object instead of in the first part of the volume.

As Roy pointed out, you have to be careful with the record size because the DDT and L2ARC lists can consume lots of RAM.

But it seems there are four things to look at: file system block size, iscsi blk size, zvol block size and zvol record size.

What is the relationship between iscsi blk size and zvol block size? What is the relationship between zvol block size and zvol record size?

Thanks, Geoff
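P.S. For anyone following along, this is roughly how the two knobs get set on the target side (a sketch; the dataset name is a placeholder, and I haven't verified which blk values every build accepts, so treat the 4096 as illustrative):

   # the zvol block size is fixed at creation time
   zfs create -V 20g -o volblocksize=8k tank/vms/win7-c
   # the block size the LU reports to initiators is a create-time property too
   stmfadm create-lu -p blk=4096 /dev/zvol/rdsk/tank/vms/win7-c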
[zfs-discuss] ZFS and Comstar iSCSI BLK size
I am using ZFS as the backing store for an iscsi target running a virtual machine. I am looking at using an 8K block size on the zfs volume.

I was looking at the comstar iscsi settings and there is also a blk size configuration, which defaults to 512 bytes. That would make me believe that all of the IO will be broken down into 512 bytes, which seems very inefficient.

It seems this value should match the file system allocation/cluster size in the VM, maybe 4K if you are using an ntfs file system.

Does anyone have any input on this?

Thanks, Geoff
Re: [zfs-discuss] Snapshots and Data Loss
-----Original Message-----
From: Ross Walker [mailto:rswwal...@gmail.com]
Sent: Friday, April 23, 2010 7:08 AM

>> We are currently porting over our existing Learning Lab Infrastructure
>> platform from MS Virtual Server to VBox + ZFS. When students connect into
>> their lab environment it dynamically creates their VMs and load balances
>> them across physical servers.
>
> You can also check out OpenSolaris' Xen implementation, which, if you use
> Linux VMs, will allow PV VMs as well as hardware-assisted fully
> virtualized Windows VMs. There are public domain Windows Xen drivers out
> there.
>
> The advantage of using Xen is its VM live migration and XMLRPC management
> API. As it runs as a bare metal hypervisor it also allows fine granularity
> of CPU schedules between guests and the host VM, but unfortunately its
> remote display technology leaves something to be desired.
>
> For Windows VMs I use the built-in remote desktop, and for Linux VMs I use
> XDM and something like 'thinstation' on the client side.
>
> -Ross

Hi Ross.

We decided to use a hosted hypervisor like VirtualBox because our customers use a variety of different platforms and they don't run high-end workloads. We want a lot of flexibility in configuration and OS support (both host and guest).

Remote control is a challenge. In our scenario students are going to spin up exact copies of a lab environment, and we need to isolate their machines in separate networks, so you can't directly connect to the VMs. We don't know what guest OS they are going to run, so we can't rely on the guest OS remote control tools. We want students to be able to have console access, and they need to be able to share it with an instructor. We want students to be able to connect from any type of device, and we don't want to rely on other connection broker software to coordinate access.

VirtualBox is great because it provides console-level access via RDP. RDP performs well enough and is pretty much on everything.

This is probably getting a bit off topic now :)

Geoff
Re: [zfs-discuss] Snapshots and Data Loss
From: Ross Walker [mailto:rswwal...@gmail.com]
Sent: Thursday, April 22, 2010 6:34 AM

> On Apr 20, 2010, at 4:44 PM, Geoff Nordli <geo...@grokworx.com> wrote:
>
> If you combine the hypervisor and storage server and have students connect
> to the VMs via RDP or VNC or XDM, then you will have the performance of
> local storage and can even script VirtualBox to take a snapshot right
> after a save state.
>
> It's a lot less difficult to configure on the client side, and it allows
> you to deploy thin clients instead of full desktops where you can get away
> with it. It also allows you to abstract the hypervisor from the client.
>
> You need a bigger storage server with lots of memory, CPU and storage
> though. Later, if need be, you can break out the disks to a storage
> appliance with an 8Gb FC or 10GbE iSCSI interconnect.

Right. I am in the process now of trying to figure out what the load looks like with a central storage box and how ZFS needs to be configured to support that load. So far what I am seeing is very exciting :)

We are currently porting over our existing Learning Lab Infrastructure platform from MS Virtual Server to VBox + ZFS. When students connect into their lab environment it dynamically creates their VMs and load balances them across physical servers.

Geoff
Re: [zfs-discuss] Snapshots and Data Loss
From: matthew patton [mailto:patto...@yahoo.com]
Sent: Tuesday, April 20, 2010 12:54 PM

> Geoff Nordli <geo...@grokworx.com> wrote:
>> With our particular use case we are going to do a save state on their
>> virtual machines, which is going to write 100-400 MB per VM via CIFS or
>> NFS, then we take a snapshot of the volume, which guarantees we get a
>> consistent copy of their VM.
>
> Maybe you left out a detail or two, but I can't see how your ZFS snapshot
> is going to be consistent UNLESS every VM on that ZFS volume is prevented
> from doing any and all I/O from the time it finishes "save state" until
> you take your ZFS snapshot. If by "save state" you mean something akin to
> VMWare's disk snapshot, why would you even bother with a ZFS snapshot in
> addition?

We are using VirtualBox as our hypervisor. When it does a save state it generates a memory file. The memory file plus the volume snapshot creates a consistent state.

In our platform each student's VM points to a unique backend volume via iscsi using VBox's built-in iscsi initiator, so there is a one-to-one relationship between VM and volume. Just for clarity, a single VM could have multiple disks attached to it; in that scenario a VM would have multiple volumes.

>> When a class comes to an end we could have maybe 20-30 VMs getting saved
>> at the same time, which could mean several GB of data would need to get
>> written in a short time frame and committed to disk. So it seems the best
>> case would be to get those save-state writes as sync and get them into a
>> ZIL.
>
> That I/O pattern is vastly larger than 32kb and so will hit the "rust" ZIL
> (which ALWAYS exists), and if you were thinking an SSD would help you, I
> don't see any/much evidence it will buy you anything.

If I set logbias (b122) to latency, then it will direct all sync IO to the log device, even if it exceeds the zfs_immediate_write_sz threshold.

Have a great day!

Geoff
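P.S. The save-state plus snapshot sequence itself is just two commands per VM, roughly (a sketch; the VM and dataset names are placeholders):

   VBoxManage controlvm "student-vm-042" savestate
   zfs snapshot tank/labs/student-vm-042@class-end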
Re: [zfs-discuss] Snapshots and Data Loss
From: Richard Elling [mailto:richard.ell...@gmail.com]
Sent: Monday, April 19, 2010 10:17 PM

> Hi Geoff,
> The Canucks have already won their last game of the season :-) more
> below...

Hi Richard, I didn't watch the game last night, but obviously Vancouver had better pick up their socks or they will be joining San Jose on the sidelines. With Ottawa and Montreal on the way out too, it could be a tough spring for Canadian hockey fans.

> On Apr 18, 2010, at 11:21 PM, Geoff Nordli wrote:
>> Hi Richard. Can you explain in a little bit more detail how this process
>> works?
>>
>> Let's say you are writing from a remote virtual machine via an iscsi
>> target set for async writes and I take a snapshot of that volume. Are you
>> saying any outstanding writes for that volume will need to be written to
>> disk before the snapshot happens?
>
> Yes.

That is interesting; so if your system is under write load and you are doing snapshots, it could lead to problems. I was thinking writes wouldn't be an issue because they would be lazily written.

With our particular use case we are going to do a save state on their virtual machines, which is going to write 100-400 MB per VM via CIFS or NFS, then we take a snapshot of the volume, which guarantees we get a consistent copy of their VM.

When a class comes to an end we could have maybe 20-30 VMs getting saved at the same time, which could mean several GB of data would need to get written in a short time frame and committed to disk. So it seems the best case would be to get those save-state writes as sync and get them into a ZIL. Would you agree with that?

> I'm glad you enjoyed it. I'm looking forward to Vegas next week and there
> are some seats still open.
> -- richard

I would love to go to Vegas, but I need to work on getting our new product out the door. Enjoy yourself in Vegas next week!

Geoff
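P.S. The tunable I was referring to at the end is a per-dataset property, roughly like this (the dataset name is a placeholder; logbias needs b122 or later):

   zfs set logbias=latency tank/labs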
Re: [zfs-discuss] Snapshots and Data Loss
> On Apr 13, 2010, at 5:22 AM, Tony MacDoodle wrote:
>> I was wondering if any data was lost while doing a snapshot on a running
>> system?
>
> ZFS will not lose data during a snapshot.
>
>> Does it flush everything to disk or would some stuff be lost?
>
> Yes, all ZFS data will be committed to disk and then the snapshot is
> taken.

Hi Richard. Can you explain in a little bit more detail how this process works?

Let's say you are writing from a remote virtual machine via an iscsi target set for async writes and I take a snapshot of that volume. Are you saying any outstanding writes for that volume will need to be written to disk before the snapshot happens?

Setting the target to sync writes and using a ZIL might give better performance if you were doing a lot of snapshots. I know there is a potential to lose data with the target set to async, but the virtual machines running on the system are just lab machines using non-production data.

BTW, great Nexenta / ZFS class in Atlanta. Thanks for getting me on the right track!!

Geoff