Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
On 19 nov. 2010, at 03:53, Edward Ned Harvey wrote:

> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org]
>> SAS controller and all ZFS disks/pools are passed through to Nexenta, to have full ZFS disk control like on real hardware.
> This is precisely the thing I'm interested in. How do you do that? On my ESXi (test) server, I have a Solaris ZFS VM. When I configure it and add a disk, my options are (a) create a new virtual disk, (b) use an existing virtual disk, or (c) (grayed out) raw device mapping. There is a comment "Give your virtual machine direct access to a SAN," so I guess it is only available if you have some iSCSI target available...
> But you seem to be saying: don't add the disks individually to the ZFS VM. Ensure the bulk storage is on a separate SAS/SCSI/SATA controller from the ESXi OS, and then add that SAS/SCSI/SATA PCI device to the guest, which will implicitly get all of the disks. Right? Or maybe the disks have to be SCSI (SAS), and then you can add the SCSI device directly as pass-through?

As mentioned by Will, you'll need to use VMDirectPath, which allows you to map a hardware device (the disk controller) directly to the VM without passing through the VMware-managed storage stack. Note that you are presenting the hardware directly, so it needs to be a compatible controller.

You'll need two controllers in the server, since ESXi needs at least one disk that it controls to be formatted as VMFS to hold some of its files as well as the .vmx configuration files for the VM that will host the storage (and the swap file, so it's got to be at least as large as the memory you plan to assign to the VM). Caveat: while you can install ESXi onto a USB drive, you can't manually format a USB drive as VMFS, so for best performance you'll want at least one SATA or SAS controller that you can leave controlled by ESXi, and a second controller, with the bulk of the storage attached, for the ZFS VM.

As far as the eggs-in-one-basket issue goes, you can either use a clustering solution like Nexenta HA between two servers, which gives you a highly available storage solution based on two servers that can also run your VMs, or, for a more manual failover, just use zfs send|recv to replicate the data.

You can also accomplish something similar with only one controller by manually creating local Raw Device Maps of the local disks and presenting them individually to the ZFS VM. You don't have direct access to the controller that way, so I don't think things like blinking a drive will work in this configuration, since you're not talking directly to the hardware. There's no UI for creating RDMs for local drives, but there's a good procedure over at http://www.vm-help.com/esx40i/SATA_RDMs.php which explains the technique.

From a performance standpoint it works really well: I have NFS-hosted VMs in this configuration getting 396 MB/s throughput on simple dd tests, backed by 10 ZFS mirrored disks, all protected with hourly send|recv to a second box.

Cheers,
Erik
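(A minimal sketch of the hourly send|recv replication described above, assuming a pool named tank, a dataset named vmstore and a second box reachable over ssh as backuphost; all names are hypothetical.)

    #!/bin/sh
    # Hypothetical hourly replication job, e.g. run from cron.
    POOL=tank
    DS=vmstore
    DEST=backuphost

    # Most recent existing snapshot (the very first run needs a full send
    # instead, i.e. zfs send without -i).
    PREV=$(zfs list -H -t snapshot -o name -s creation -r $POOL/$DS | tail -1)

    # Take the new hourly snapshot and send only the changes since PREV.
    NOW=$POOL/$DS@$(date +%Y-%m-%d-%Hh)
    zfs snapshot "$NOW"
    zfs send -i "$PREV" "$NOW" | ssh $DEST zfs receive -F $POOL/$DS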
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
hmmm

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

"Disabling the ZIL (Don't)
Caution: Disabling the ZIL on an NFS server can lead to client side corruption. The ZFS pool integrity itself is not compromised by this tuning."

So especially with NFS I won't disable it.

It's better to add SSD read/write caches or use SSD-only pools. We use spindles for backups or test servers; our main VMs are all on SSD pools (striped RAID-1 built of 120 GB SandForce-based MLC drives, about 190 euro each).

We do not use SLC; I suppose MLC are good enough for the next three years (the warranty time). We will change them after that.

About integrated storage in VMware: I have some info on my homepage about our solution:
http://www.napp-it.org/napp-it/all-in-one/index_en.html

gea
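(For readers wanting to follow this advice, a hedged sketch of adding SSDs as dedicated log and cache devices to an existing pool; the pool and device names are hypothetical.)

    # Mirrored SSD log devices (slog) to absorb synchronous writes.
    zpool add tank log mirror c4t0d0 c4t1d0

    # One or more SSDs as L2ARC read cache.
    zpool add tank cache c4t2d0

    # Verify the new layout.
    zpool status tank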
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
On 19/11/2010 00:39, David Magda wrote:
> On Nov 16, 2010, at 05:09, Darren J Moffat wrote:
>> Both CCM [1] and GCM [2] are provided so that if one turns out to have flaws, hopefully the other will still be available for use safely, even though they are roughly similar styles of modes. On systems without hardware/CPU support for Galois multiplication (Intel Westmere and later, and SPARC T3 and later), GCM will be slower because the Galois field multiplication has to happen in software without any hardware/CPU assist. However, depending on your workload you might not even notice the difference. Both modes of operation are authenticating.
> At one point the design of ZFS crypto had the checksum automatically go to SHA-256 when it was enabled. [1] Is SHA activation still the case, or are the two modes of operation simply used in themselves to verify data integrity?

That is still the case: the block pointer contains the IV, the (truncated) SHA-256 checksum, and the MAC from CCM or GCM.

> Also, are slog and cache devices encrypted at this time? Given a pool, and the fact that only particular datasets on it could be encrypted, would these special devices be entirely encrypted, or only data from the particular encrypted dataset(s)? I would also assume the in-memory ARC would be clear text.

The ZIL, whether it is in-pool or on a slog, is always encrypted for an encrypted dataset; it is encrypted in exactly the same way. Data from encrypted datasets does not currently go to the L2ARC cache devices. The in-memory ARC is in the clear, and it has to be, because those buffers can be shared via zero-copy means to other parts of the system, including other filesystems like NFS and CIFS.

--
Darren J Moffat
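(As an aside for readers new to the feature: on Solaris 11 Express an encrypted dataset is created roughly like this; the pool and dataset names are hypothetical and the passphrase prompt is the default keysource.)

    # Encryption can only be chosen at dataset creation time; "on" picks the
    # default cipher, while e.g. encryption=aes-256-gcm selects a mode explicitly.
    zfs create -o encryption=on tank/secret
    # Enter passphrase for 'tank/secret':

    # Inspect the resulting properties.
    zfs get encryption,checksum,keysource tank/secret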
Re: [zfs-discuss] ZFS Crypto in Oracle Solaris 11 Express
The design for ZFS crypto was done in the open via opensolaris.org, and versions of the source (though not the final version at this time) are available on opensolaris.org. It was reviewed by people both internal and external to Sun/Oracle who have considerable crypto experience. Important parts of the cryptography design were also discussed on other archived public forums as well as on zfs-crypto-discuss. The design was also presented at IEEE 1619 SISWG and at SNIA.

--
Darren J Moffat
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
> From: Saxon, Will [mailto:will.sa...@sage.com]
>
> In order to do this, you need to configure passthrough for the device at the host level (host - configuration - hardware - advanced settings). This

Awesome. :-)

The only problem is that once a device is configured to pass through to the guest VM, that device isn't available for the host anymore. So you have to have your boot disks on a separate controller from the primary storage disks that are passed through to the guest ZFS server.

For a typical server, let's say a Dell, that could be a problem. The boot disks would need to hold ESXi plus a ZFS server, and then you can pass through the primary hot-swappable storage HBA to the ZFS guest. Then the ZFS guest can export its storage back to the ESXi host via NFS or iSCSI, so all the remaining VMs can be backed by ZFS. Of course you have to configure ESXi to boot the ZFS guest before any of the other guests.

The problem is just the boot device. One option is to boot from a USB dongle, but that's unattractive for a lot of reasons. Another option would be a PCIe storage device, which isn't too bad an idea. Anyone using PXE to boot ESXi? Got any other suggestions? In a typical Dell server, there is no place to put a disk which isn't attached via the primary hot-swappable storage HBA. I suppose you could use a 1U rackmount server with only 2 internal disks, and add a 2nd HBA with an external storage tray to use as pass-through to the ZFS guest.
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of VO
>
> How to accomplish ESXi 4 raw device mapping with SATA at least:
> http://www.vm-help.com/forum/viewtopic.php?f=14&t=1025

It says: you can pass through individual disks if you have SCSI, but you can't pass through individual SATA disks. I don't have any way to verify this, but it seems unlikely, since SAS and SATA are interchangeable (sort of). I know I have a Dell server with a few SAS disks plugged in and a few SATA disks plugged in. Maybe the backplane is doing some kind of magic? But they're all presented to the OS by the HBA, and the OS has no way of knowing whether the disks are actually SAS or SATA, as far as I know.

It also says: you can pass through a PCI SATA controller, but the entire controller must be given to the guest. This I have confirmed. I have an ESXi server with an eSATA controller and an external disk attached. One reboot was required in order to configure the pass-through.
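(For reference, the vm-help procedure referred to above boils down to creating the mapping file by hand from the ESXi console; a hedged sketch with hypothetical device and datastore names.)

    # List the local disks ESXi can see and note the vml.* identifier of the
    # disk to be mapped (the identifier below is a placeholder).
    ls -l /vmfs/devices/disks/

    # Create a physical-compatibility RDM file on an existing VMFS datastore.
    mkdir -p /vmfs/volumes/datastore1/rdm
    vmkfstools -z /vmfs/devices/disks/vml.0100000000XXXXXXXX /vmfs/volumes/datastore1/rdm/sata1-rdm.vmdk

    # The resulting .vmdk can then be attached to the ZFS VM as an existing disk.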
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of VO
>
> This sounds interesting as I have been thinking something similar but never implemented it because all the eggs would be in the same basket. If you don't mind me asking for more information: since you use Mapped Raw LUNs, don't you lose HA/fault tolerance on the storage servers as they cannot be moved to another host?

There is at least one situation I can imagine where you wouldn't care. At present, I have a bunch of Linux servers with locally attached disk. I often wish I could run ZFS on Linux. You could install ESXi, Linux, and a ZFS server all into the same machine, and export the ZFS filesystem to the Linux system via NFS. Since the network interfaces are all virtual, you should be able to achieve near-disk speed from the Linux client, and you should have no problem doing snapshots, zfs send, and all the other features of ZFS.

I'd love to do a proof of concept... or hear that somebody has. ;-)
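(A hedged sketch of the loopback arrangement described above; the pool, dataset, mount point and address are all made-up names.)

    # On the ZFS storage VM: create and share the dataset over NFS.
    zfs create tank/linuxdata
    zfs set sharenfs=on tank/linuxdata

    # On the Linux VM: mount it across the virtual network
    # (192.168.10.5 stands in for the storage VM's address).
    mkdir -p /mnt/zfsdata
    mount -t nfs 192.168.10.5:/tank/linuxdata /mnt/zfsdata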
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
> From: Gil Vidals [mailto:gvid...@gmail.com]
>
> ... connected to my ESXi hosts using 1 gigabit switches and network cards. The speed is very good, as can be seen by IOZONE tests:
>
>       KB  reclen   write  rewrite    read   reread
>   512000      32   71789    76155   94382   101022
>   512000    1024   75104    69860   64282    58181
>  1024000    1024   66226    60451   65974    61884
>
> These speeds were achieved by:
> 1) Turning OFF the ZIL (write cache)
> 2) Using SSD drives for L2ARC (read cache)
> 3) Using NFSv3, as NFSv4 isn't supported by ESXi version 4.0.

I have the following results using local disk, ZIL enabled, no SSD, HBA writeback enabled:

      KB  reclen   write  rewrite     read    reread
  524288      64  189783   200303  2827021   2847086
  524288    1024  201472   201837  3094348   3100793
 1048576    1024  201883   201154  3076932   3087206

So I think your results were good relative to a 1 Gb interface, but I think you're severely limited by the 1 Gb as compared to local disk.
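(For reference, a hedged sketch of an iozone invocation that produces write/rewrite/read/reread numbers like the tables above; the file path and sizes are assumptions, not the posters' exact commands.)

    # Sequential write/rewrite (-i 0) and read/reread (-i 1) of a ~512 MB file
    # with a 1024 KB record size; results are reported in KB/s.
    iozone -i 0 -i 1 -r 1024k -s 512000k -f /tank/nfstest/iozone.tmp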
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Günther
>
> Disabling the ZIL (Don't)

This is relative. There are indeed situations where it's acceptable to disable the ZIL. To make your choice, you need to understand a few things...

#1 In the event of an ungraceful reboot, with your ZIL disabled, after reboot your filesystem will be in a valid state which is not the latest point in time before the crash. Your filesystem will be valid, but you will lose up to 30 seconds of the latest writes leading up to the crash.

#2 Even if you have the ZIL enabled, all of the above statements still apply to async writes. The ZIL only provides nonvolatile storage for sync writes.

Given these facts, it quickly becomes much less scary to disable the ZIL, depending on what you use your server for.
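(For completeness, a hedged sketch of how the ZIL is commonly disabled on the releases discussed in this thread; the dataset name is hypothetical, and the per-dataset property only exists on newer builds.)

    # Solaris 10 / older builds: system-wide tunable, added as a line in
    # /etc/system and taking effect after a reboot (the Evil Tuning Guide method).
    set zfs:zil_disable = 1

    # Newer builds (post-b134 / Solaris 11 Express) expose a per-dataset
    # property instead, avoiding the system-wide switch:
    zfs set sync=disabled tank/vmstore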
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
On 19 nov. 2010, at 15:04, Edward Ned Harvey wrote:

> This is relative. There are indeed situations where it's acceptable to disable the ZIL. To make your choice, you need to understand a few things... [...]
> Given these facts, it quickly becomes much less scary to disable the ZIL, depending on what you use your server for.

Not to mention that in this particular scenario (local storage, local VM, loopback to ESXi), where the NFS server is only publishing to the local host, if the local host crashes there are no other NFS clients involved that have local caches that will be out of sync with the storage.

Cheers,
Erik
[zfs-discuss] Replacing log devices takes ages
Disclaimer: Solaris 10 U8.

I had an SSD die this morning and am in the process of replacing the 1GB partition which was part of a log mirror. The SSDs do nothing else. The resilver has been running for ~30m, and suggests it will finish sometime before Elvis returns from Andromeda, though perhaps only just barely (we'll probably have to run to the airport to meet him at security).

 scrub: resilver in progress for 0h25m, 3.15% done, 13h8m to go
 scrub: resilver in progress for 0h26m, 3.17% done, 13h36m to go
 scrub: resilver in progress for 0h27m, 3.18% done, 14h4m to go
 scrub: resilver in progress for 0h28m, 3.19% done, 14h32m to go
 scrub: resilver in progress for 0h29m, 3.20% done, 15h0m to go
 scrub: resilver in progress for 0h30m, 3.23% done, 15h25m to go
 scrub: resilver in progress for 0h31m, 3.25% done, 15h50m to go
 scrub: resilver in progress for 0h32m, 3.30% done, 16h7m to go
 scrub: resilver in progress for 0h33m, 3.34% done, 16h24m to go
 scrub: resilver in progress for 0h35m, 3.37% done, 16h43m to go
 scrub: resilver in progress for 0h36m, 3.39% done, 17h5m to go

According to zpool iostat -v, the log contains ~900k of data on it. The disks are not particularly busy (c0t3d0 is the replacing disk):

# iostat -xne c0t3d0 c0t5d0 5
                 extended device statistics                       ---- errors ----
  r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
  0.2    0.1    0.6     5.0  0.0  0.0    0.0    5.7   0   0   0   0   0   0 c0t3d0
  5.3   52.3   68.2  1694.1  0.0  0.2    0.0    4.2   0   2   2   0   0   2 c0t5d0

  3.0  112.6    0.9  6064.0  0.0  0.1    0.0    0.8   0   9   0   0   0   0 c0t3d0
  6.4  118.8   39.5  6519.7  0.0  0.0    0.0    0.3   0   3   2   0   0   2 c0t5d0

  1.0   50.2    0.3  5068.8  0.0  1.4    0.0   27.5   0   6   0   0   0   0 c0t3d0
 36.0   61.8  534.1  5921.6  0.0  0.5    0.0    5.5   0   6   2   0   0   2 c0t5d0

  0.0   58.0    0.0  1590.4  0.0  0.0    0.0    0.8   0   3   0   0   0   0 c0t3d0
 39.2   67.0  651.3  1884.9  0.0  0.0    0.0    0.5   0   3   2   0   0   2 c0t5d0

  0.0   23.4    0.0   678.3  0.0  0.0    0.0    0.4   0   1   0   0   0   0 c0t3d0
 11.8   30.6  135.0  1025.4  0.0  0.0    0.0    0.3   0   1   2   0   0   2 c0t5d0

  0.0   20.2    0.0  1045.0  0.0  0.0    0.0    1.2   0   1   0   0   0   0 c0t3d0
 14.8   25.8  131.9  1335.7  0.0  0.0    0.0    0.4   0   1   2   0   0   2 c0t5d0

  0.0   33.0    0.0  2029.6  0.0  0.1    0.0    1.9   0   2   0   0   0   0 c0t3d0
  1.8   37.6   37.9  2107.0  0.0  0.0    0.0    0.6   0   1   2   0   0   2 c0t5d0

  0.0   21.2    0.0   797.6  0.0  0.0    0.0    0.7   0   1   0   0   0   0 c0t3d0
 12.2   22.8  111.9   823.2  0.0  0.0    0.0    0.4   0   1   2   0   0   2 c0t5d0

My question is twofold:

Why do log mirrors need to resilver at all?
Why does this seem like it's going to take a full day, if I'm lucky?

(If the answer is: shut up and upgrade, that's fine.)

Cheers.
--
bdha
cyberpunk is dead. long live cyberpunk.
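(For readers in the same situation, a hedged sketch of the replacement step itself; the pool name and device names below are hypothetical, not the poster's.)

    # Replace the failed log-mirror member with a slice on the new SSD,
    # then watch the resilver progress.
    zpool replace tank c0t8d0s0 c0t9d0s0
    zpool status -v tank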
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
I have the same problem with my 2U Supermicro server (24x 2.5", connected via 6x mini-SAS 8087) and no additional mounting possibilities for 2.5" or 3.5" drives.

On those machines I use one SAS port (4 drives) of an old Adaptec 3805 (I used them in my pre-ZFS times) to build a RAID-1 plus hot spare for ESXi to boot from. The other 20 slots are connected to 3 LSI SAS controllers for pass-through, so I have 4 SAS controllers in these machines.

Maybe the new SSDs mounted on a PCIe card (e.g. the OCZ RevoDrive) may be an alternative. Has anyone used them already with ESXi?

gea
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
On Fri, 19 Nov 2010 07:16:20 PST, Günther wrote:

> i have the same problem with my 2U supermicro server (24x 2.5", connected via 6x mini-SAS 8087) and no additional mounting possibilities for 2.5" or 3.5" drives. [...]

Hey - just as a side note: depending on what motherboard you use, you may be able to use this:

MCP-220-82603-0N - Dual 2.5" fixed HDD tray kit for SC826 (for E-ATX X8 DP MB)

I haven't used one yet myself, but am currently planning an SMC build and contacted their support, as I really did not want to have my system drives hanging off the controller. As far as I can tell from a picture they sent, it mounts on top of the motherboard itself, somewhere where there is normally open space, and it can hold two 2.5" drives.

So maybe get in touch with their support and see if you can use something similar.

Cheers,
Mark
Re: [zfs-discuss] Replacing log devices takes ages
I'm not sure that leaving the ZIL enabled whilst replacing the log devices is a good idea? Also - I had no idea Elvis was coming back tomorrow! Sweet. ;-)

---
W. A. Khushil Dep - khushil@gmail.com - 07905374843
Visit my blog at http://www.khushil.com/

On 19 November 2010 14:57, Bryan Horstmann-Allen b...@mirrorshades.net wrote:

> Disclaimer: Solaris 10 U8.
>
> I had an SSD die this morning and am in the process of replacing the 1GB partition which was part of a log mirror. The SSDs do nothing else. The resilver has been running for ~30m, and suggests it will finish sometime before Elvis returns from Andromeda, though perhaps only just barely (we'll probably have to run to the airport to meet him at security). [...]
>
> My question is twofold:
> Why do log mirrors need to resilver at all?
> Why does this seem like it's going to take a full day, if I'm lucky?
>
> (If the answer is: shut up and upgrade, that's fine.)
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
> -----Original Message-----
> From: Edward Ned Harvey [mailto:sh...@nedharvey.com]
> Sent: Friday, November 19, 2010 8:03 AM
> To: Saxon, Will; 'Günther'; zfs-discuss@opensolaris.org
> Subject: RE: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
>
> The problem is just the boot device. One option is to boot from a USB dongle, but that's unattractive for a lot of reasons. Another option would be a PCIe storage device, which isn't too bad an idea. Anyone using PXE to boot ESXi? Got any other suggestions? [...]

Well, with 4.1, ESXi does support boot from SAN. I guess that still presents a chicken-and-egg problem in this scenario, but maybe you have another SAN somewhere you can boot from.

Also, most of the big-name vendors have a USB or SD option for booting ESXi. I believe this is the 'ESXi Embedded' flavor vs. the typical 'ESXi Installable' that we're used to. I don't think it's a bad idea at all. I've got a not-quite-production system I'm booting off USB right now, and while it takes a really long time to boot, it does work. I think I like the SD card option better, though.

What I am wondering is whether this is really worth it. Are you planning to share the storage out to other VM hosts, or are all the VMs running on the host using the 'local' storage? I know we like ZFS vs. traditional RAID and volume management, and I get that being able to boot any ZFS-capable OS is good for disaster recovery, but what I don't get is how this ends up working better than a larger dedicated ZFS system and a storage network. Is it cheaper over several hosts? Are you getting better performance through e.g. the vmxnet3 adapter and NFS than you would just using the disks directly?

-Will
[zfs-discuss] Deduped zfs streams broken in post b134 ?
Hi,

Here is a small script to test a deduped zfs send stream:

=
#!/bin/bash

ZFSPOOL=rpool
ZFSDATASET=zfs-send-dedup-test

dd if=/dev/random of=/var/tmp/testfile1 bs=512 count=10

zfs create $ZFSPOOL/$ZFSDATASET
cp /var/tmp/testfile1 /$ZFSPOOL/$ZFSDATASET/testfile1
zfs snapshot $ZFSPOOL/$ZFSDATASET@snap1
cp /var/tmp/testfile1 /$ZFSPOOL/$ZFSDATASET/testfile2
zfs snapshot $ZFSPOOL/$ZFSDATASET@snap2

# Write the deduplicated, replicated stream to a file...
zfs send -D -R $ZFSPOOL/$ZFSDATASET@snap2 > /var/tmp/ddtest-snap2.zfs

# ...then destroy the dataset and try to restore it from the stream.
zfs destroy -r $ZFSPOOL/$ZFSDATASET
zfs receive -Fv $ZFSPOOL/$ZFSDATASET < /var/tmp/ddtest-snap2.zfs
=

It works in OpenSolaris b134, but not in OpenIndiana b147 nor Solaris 11 Express, where zfs receive exits on the second incremental snapshot with the error message:

cannot receive incremental stream: invalid backup stream

Does it look like a bug?
Re: [zfs-discuss] Deduped zfs streams broken in post b134 ?
Sorry, the script was cut off; the ending part is:

mp/ddtest-snap2.zfs
=

It works in OpenSolaris b134, but not in OpenIndiana b147 nor Solaris 11 Express, where zfs receive exits on the second incremental snapshot with the error message:

cannot receive incremental stream: invalid backup stream

Does it look like a bug?
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
> Also, most of the big-name vendors have a USB or SD option for booting ESXi. I believe this is the 'ESXi Embedded' flavor vs. the typical 'ESXi Installable' that we're used to. I don't think it's a bad idea at all. I've got a not-quite-production system I'm booting off USB right now, and while it takes a really long time to boot it does work. I think I like the SD card option better though.

I need 4 GB of extra space for the Nexenta ZFS storage server, and it should not be as slow as a USB stick, or management via the web GUI is painfully slow.

> What I am wondering is whether this is really worth it. Are you planning to share the storage out to other VM hosts, or are all the VMs running on the host using the 'local' storage? I know we like ZFS vs. traditional RAID and volume management, and I get that being able to boot any ZFS-capable OS is good for disaster recovery, but what I don't get is how this ends up working better than a larger dedicated ZFS system and a storage network. Is it cheaper over several hosts? Are you getting better performance through e.g. the vmxnet3 adapter and NFS than you would just using the disks directly?

Mainly the storage is used via NFS for local VMs, but we also share the NFS datastores via CIFS to have a simple move/clone/copy or backup. We also replicate datastores at least once per day to a second machine via incremental zfs send.

We have, or plan, the same setup on all of our ESXi machines: each ESXi host has its own local SAN-like storage server. (I do not like having one big SAN storage box as a single point of failure, plus the high-speed SAN cabling, so we have 4 ESXi servers, each with its own virtualized ZFS storage server, plus three commonly used backup systems, connected via a 10 GbE VLAN.)

We formerly had separate storage and ESXi servers, but with pass-through we could integrate the two and reduce the hardware that could fail, and the cabling, by 50%.

gea