Re: [pve-devel] ZFS Storage Patches
applied, thanks!

> My submission was rejected previously because I was not a member of the pve-devel mailing list. I added myself to this list and I'm now re-submitting my patches.
Re: [pve-devel] ZFS Storage Patches
> Your patch still needs to handle the case where a VM has been converted to a template, since image handling after conversion complains of an unknown setting: sparse.

Michael, I can't seem to recreate this. I converted a VM with the sparse option to a template, created a linked clone, and then backed up the linked clone without getting this error message. This is a log of my linked-clone backup:

    INFO: starting new backup job: vzdump 103 --remove 0 --mode snapshot --compress lzo --storage chatsworth_backup --node pvetest
    INFO: Starting Backup of VM 103 (qemu)
    INFO: status = stopped
    INFO: update VM 103: -lock backup
    INFO: backup mode: stop
    INFO: ionice priority: 7
    INFO: creating archive '/mnt/pve/chatsworth_backup/dump/vzdump-qemu-103-2014_03_17-11_51_46.vma.lzo'
    INFO: starting kvm to execute backup task
    INFO: started backup task 'b44451e7-57d6-49cd-bcc3-61d7a7679417'
    INFO: status: 0% (155713536/34359738368), sparse 0% (155713536), duration 3, 51/0 MB/s
    INFO: status: 1% (363397120/34359738368), sparse 1% (363397120), duration 7, 51/0 MB/s
    INFO: status: 2% (725680128/34359738368), sparse 2% (725680128), duration 14, 51/0 MB/s
    INFO: status: 3% (1037828096/34359738368), sparse 3% (1037828096), duration 20, 52/0 MB/s
    ...
    INFO: status: 100% (34359738368/34359738368), sparse 100% (34359738368), duration 663, 50/0 MB/s
    INFO: transferred 34359 MB in 663 seconds (51 MB/s)
    INFO: stopping kvm after backup task
    INFO: archive file size: 2MB
    INFO: Finished Backup of VM 103 (00:11:07)
    INFO: Backup job finished successfully
    TASK OK

Are there any pointers you could give me so I can generate the error? Thank you.
Re: [pve-devel] ZFS Storage Patches
On Tue, 18 Mar 2014 10:00:48 -0700 Chris Allen ca.al...@gmail.com wrote:

> Michael, I can't seem to recreate this. I converted a VM with the sparse option to a template, created a linked clone, and then backed up the linked clone without getting this error message. This is a log of my linked-clone backup: [...]

I will try again later today and see if it was just a blunder.

--
Hilsen/Regards
Michael Rasmussen
Re: [pve-devel] ZFS Storage Patches
Michael,

Thanks for testing these. If I supply you with future patches I'll try to be a little bit more rigorous with testing before I send them in.

Connecting to the target with a host group defined fails unless the initiator has been added to the host group, as it should be by design. If I manually add the initiator name (iqn.2008-11.org.linux-kvm:vm-name) to the host group on the server then I can connect to and use the volume. Since, like you pointed out, Michael, the initiator name is based on the VM's name, maintaining the host-group association, manually or automatically, is a pain. Too bad we can't force an initiator name.

Did you try using only a target group, with no host group definition? This seems to work fine for me. It might be worth it to keep just the target-group part of the patch and scrap the host group.

On Sat, Mar 15, 2014 at 5:31 PM, Michael Rasmussen m...@datanom.net wrote:

> Hi all,
> I have now tested the patch set. See attached PDF.
> Conclusion: Apart from host and target group, and problems parsing the sparse option when a VM has been converted to a template, the patch works. However, due to limitations in current backup/restore, cloning and storage migration, sparse images only remain sparse as long as you don't restore, clone or migrate the storage.
Re: [pve-devel] ZFS Storage Patches
On Mon, 17 Mar 2014 11:08:41 -0700 Chris Allen ca.al...@gmail.com wrote:

> Connecting to the target with a host group defined fails unless the initiator has been added to the host group, as it should be by design. If I manually add the initiator name (iqn.2008-11.org.linux-kvm:vm-name) to the host group on the server then I can connect to and use the volume. Since, like you pointed out, Michael, the initiator name is based on the VM's name, maintaining the host-group association, manually or automatically, is a pain. Too bad we can't force an initiator name.

It works with both a host group and a target group when starting the VM, provided that you add -iscsi 'initiator-name=<initiator-name>', since this option is passed through to libiscsi. The problem is qemu-img, which is used extensively in the Proxmox storage scripts. It might be worth trying to make a feature request to the qemu-img team to add this option?

> Did you try using only a target group, with no host group definition? This seems to work fine for me. It might be worth it to keep just the target-group part of the patch and scrap the host group.

No, I haven't tried this, but I think we should keep both, hoping for the qemu-img team to add the feature. Until then we simply ignore the option.

Your patch still needs to handle the case where a VM has been converted to a template, since image handling after conversion complains of an unknown setting: sparse.

--
Hilsen/Regards
Michael Rasmussen
Re: [pve-devel] ZFS Storage Patches
> It might be worth trying to make a feature request to the qemu-img team to add this option?

I'll pursue getting this feature added to qemu-img.

> Your patch still needs to handle the case where a VM has been converted to a template, since image handling after conversion complains of an unknown setting: sparse.

Got it, thanks. I'll look into fixing that.
Re: [pve-devel] ZFS Storage Patches
Hi Chris,

How have you been able to use your patch adding host and target groups? As far as I have been able to test, these options are never sent with the request for access to the LUN. So a request like this:

    qemu-img info iscsi://192.168.3.110/iqn.2010-09.org.napp-it:omnios/0

will give this response:

    qemu-img: iSCSI: Failed to connect to LUN : SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_OPERATION_CODE(0x2000)
    qemu-img: Could not open 'iscsi://192.168.3.110/iqn.2010-09.org.napp-it:omnios/0': Could not open 'iscsi://192.168.3.110/iqn.2010-09.org.napp-it:omnios/0': Invalid argument

The only way I get a correct response is when I define the target and host group on the view to "all"?

--
Hilsen/Regards
Michael Rasmussen
Re: [pve-devel] ZFS Storage Patches
On Sat, 15 Mar 2014 21:22:20 +0100 Michael Rasmussen m...@datanom.net wrote:

> So a request like this:
>     qemu-img info iscsi://192.168.3.110/iqn.2010-09.org.napp-it:omnios/0
> will give this response:
>     qemu-img: iSCSI: Failed to connect to LUN : SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:INVALID_OPERATION_CODE(0x2000)
>     qemu-img: Could not open 'iscsi://192.168.3.110/iqn.2010-09.org.napp-it:omnios/0': Could not open 'iscsi://192.168.3.110/iqn.2010-09.org.napp-it:omnios/0': Invalid argument
> The only way I get a correct response is when I define the target and host group on the view to "all"?

Found out that for kvm the following option to the start command will allow you to start a VM when LUNs are protected by host and target groups:

    -iscsi 'initiator-name=iqn.1993-08.org.debian:01:7f36313fb7bd'

Sadly this option is not available with qemu-img.

--
Hilsen/Regards
Michael Rasmussen
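To make the asymmetry concrete, here is a rough sketch only; the IQNs and portal address are the example values already quoted in this thread, and the -drive syntax is illustrative rather than the exact command PVE generates:

    # Starting the guest works, because the initiator name can be forced:
    kvm -iscsi 'initiator-name=iqn.1993-08.org.debian:01:7f36313fb7bd' \
        -drive file=iscsi://192.168.3.110/iqn.2010-09.org.napp-it:omnios/0,if=virtio

    # qemu-img has no such switch, so it logs in with an auto-generated
    # initiator name and is rejected when the view is restricted to a host group:
    qemu-img info iscsi://192.168.3.110/iqn.2010-09.org.napp-it:omnios/0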
Re: [pve-devel] ZFS Storage Patches
Michael,

Bummer, sorry it's not working. When I'm back in the office on Monday I'll play with it. I've been using it with a defined target group and no host group so far without any problems; I have not tested a host group. I mainly wanted to be able to define a target group, because the storage server sits on multiple networks, so I have target portal groups defined to control initiator access.
Re: [pve-devel] ZFS Storage Patches
On Sat, 15 Mar 2014 16:26:46 -0700 Chris Allen ca.al...@gmail.com wrote:

> Bummer, sorry it's not working. When I'm back in the office on Monday I'll play with it. I've been using it with a defined target group and no host group so far without any problems; I have not tested a host group. I mainly wanted to be able to define a target group, because the storage server sits on multiple networks, so I have target portal groups defined to control initiator access.

This is the reason why it is failing:

"This patch updates the iscsi layer to automatically pick a 'unique' initiator-name based on the name of the vm in case the user has not set an explicit iqn-name to use."
https://kernel.googlesource.com/pub/scm/virt/kvm/nab/qemu-kvm/+/31459f463a32dc6c1818fa1aaa3d1f56c367b718

Since qemu-img does not parse options to the iSCSI block device, it will always pick a unique initiator name based on the name of the VM.

--
Hilsen/Regards
Michael Rasmussen
Re: [pve-devel] ZFS Storage Patches
> It was also part of the latest 3.1. Double-click the mouse over your storage specification in Datacenter -> Storage and the panel pops up. Patched panel attached.

Thank you, I was unaware of this.

> BTW. Have you made any performance measurements, sparse vs non-sparse and write cache vs no write cache?

No I haven't. As far as I understand it, sparse should not affect performance whatsoever; it only changes whether or not a reservation is created on the ZVOL. Turning off write caching on the LU should decrease performance, dramatically so if you do not have a separate and very fast ZIL device (e.g. ZeusRAM). Every block write to the ZVOL will be done synchronously when write caching is turned off.

I've done some testing with regards to block size, compression, and dedup. I wanted sparse support for myself, and I figured while I was there I might as well add a flag for turning off write caching. For people with the right (and expensive!) hardware, the added safety of no write caching might be worth it.

Have you tested the ZFS storage plugin on Solaris 11.1? I first tried using it with 11.1, but they changed how the LUN assignment for the views works. In 11.0 and OmniOS the first available LUN gets used when a new view is created if no LUN is given, but in 11.1 it gets populated with a string that says "AUTO". This of course means PVE can't connect to the volume, because it can't resolve the LUN. Unfortunately I couldn't find anything in the 11.1 documentation that described how to get the LUN. I'm assuming there's some kind of mechanism in 11.1 where you can get the number on the fly, as it must handle them dynamically now. But after a lot of Googling and fiddling around I gave up and switched to OmniOS; I don't have a support contract with Oracle, so that was a no-go. Anyway, just thought I'd mention that in case you knew about it.

In addition to that problem, 11.1 also has a bug in the handling of the iSCSI feature Immediate Data. It doesn't implement it properly according to the iSCSI RFC, so you need to turn off Immediate Data on the client in order to connect. The fix is available to paying Oracle support customers only.
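For anyone wanting to see what the sparse flag actually changes on the storage side, a minimal illustration (pool and volume names are made up for the example; the plugin normally creates the zvol itself):

    # Regular (thick) zvol: the full size is reserved in the pool
    zfs create -V 32G tank/vm-100-disk-1
    zfs get refreservation tank/vm-100-disk-1    # -> roughly 32G

    # Sparse zvol: same virtual size, but no reservation is set, so
    # compression/dedup savings stay available to the pool
    zfs create -s -V 32G tank/vm-100-disk-2
    zfs get refreservation tank/vm-100-disk-2    # -> none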
Re: [pve-devel] ZFS Storage Patches
On Fri, 14 Mar 2014 10:11:17 -0700 Chris Allen ca.al...@gmail.com wrote:

> > It was also part of the latest 3.1. Double-click the mouse over your storage specification in Datacenter -> Storage and the panel pops up. Patched panel attached.

I forgot to mention that at the moment the code for creating ZFS storage is commented out in /usr/share/pve-manager/ext4/pvemanagerlib.js, lines 20465-20473.

> No I haven't. As far as I understand it, sparse should not affect performance whatsoever; it only changes whether or not a reservation is created on the ZVOL. Turning off write caching on the LU should decrease performance, dramatically so if you do not have a separate and very fast ZIL device (e.g. ZeusRAM). Every block write to the ZVOL will be done synchronously when write caching is turned off.

I have already made some tests and I have not been able to make any conclusive tests proving that performance is hurt by using sparse. Is sparse a way to provision more than 100% then?

> I've done some testing with regards to block size, compression, and dedup. I wanted sparse support for myself, and I figured while I was there I might as well add a flag for turning off write caching. For people with the right (and expensive!) hardware, the added safety of no write caching might be worth it.

I have done the same. For me an 8k block size for volumes seems to give more write speed. Regarding write caching: why not simply use sync directly on the volume?

> Have you tested the ZFS storage plugin on Solaris 11.1? [...]

I have made no tests on Solaris - license costs are out of my league. I regularly test FreeBSD, Linux and OmniOS. In production I only use OmniOS (r151008, but I will migrate everything to r151014 when it is released and then only use LTS in the future).

--
Hilsen/Regards
Michael Rasmussen
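The "use sync directly on the volume" idea presumably refers to the ZFS sync property on the zvol; as a hedged sketch (volume name made up):

    # Force every write to the zvol to be synchronous, regardless of the
    # LU write-cache setting presented to the initiator:
    zfs set sync=always tank/vm-100-disk-1

    # Back to honouring the application's own sync requests (the default):
    zfs set sync=standard tank/vm-100-disk-1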
Re: [pve-devel] ZFS Storage Patches
> I have already made some tests and I have not been able to make any conclusive tests proving that performance is hurt by using sparse.

Yeah, it shouldn't affect the ZFS mechanics at all; the ZVOL will just lack a reservation.

> Is sparse a way to provision more than 100% then?

Yes. That, and it enables you to take advantage of compression on the volume. Without sparse the volume is always going to take away the same amount of space from the pool (due to the hard reservation), regardless of whether or not compression and/or dedup is on. You just have to be careful to monitor pool capacity. Bad things will happen if your SAN server runs out of space... I attached a quick and dirty script I wrote to monitor pool capacity and status, and send an e-mail alert if the pool degrades or a capacity threshold is hit. I run it from cron every 30 minutes.

> For me an 8k block size for volumes seems to give more write speed.

8k for me too is much better than 4k. With 4k I tend to hit my IOPS limit easily, with not much throughput, and I get a lot of IO delay on VMs when the SAN is fairly busy. Currently I'm leaning towards 16k, sparse, with lz4 compression. If you go the sparse route then compression is a no-brainer, as it accelerates performance on the underlying storage considerably. Compression will lower your IOPS and data usage; both are good things for performance. ZFS performance drops as usage rises and gets really ugly at around 90% capacity. Some people say it starts to drop with as little as 10% used, but I have not tested this. With 16k block sizes I'm getting good compression ratios - my best volume is 2.21x, my worst 1.33x, and the average is 1.63x. So as you can see, a lot of the time my real block size on disk is going to be effectively smaller than 16k. The tradeoff here is that compression ratios will go up with a larger block size, but you'll have to do larger operations, and thus more waste will occur when the VM is doing small I/O. With a large block size on a busy SAN your I/O is going to get fragmented before it hits the disk anyway, so I think 16k is a good balance. I only have 7200 RPM drives in my array, but a ton of RAM and a big ZFS cache device, which is another reason I went with 16k: to maximize what I get when I can get it. I think with 15k RPM drives an 8k block size might be better, as your IOPS limit will be roughly double that of 7200 RPM.

Dedup did not work out well for me. Aside from the huge memory consumption, it didn't save all that much space, and to save the maximum space you need to match the VM's filesystem cluster size to the ZVOL block size - which means 4k for ext4 and NTFS (unless you change it during a Windows install). Also, dedup really slows down zpool scrubbing and possibly rebuild. This is one of the main reasons I avoid it: I don't want scrubs to take forever when I'm paranoid of something potentially being wrong.

> Regarding write caching: why not simply use sync directly on the volume?

Good question. I don't know.

> I have made no tests on Solaris - license costs are out of my league. I regularly test FreeBSD, Linux and OmniOS. In production I only use OmniOS.

I'm in the process of trying to run away from all things Oracle at my company. We keep getting burned by them. It's so freakin' expensive, and they hold you over a barrel with patches for both hardware and software. We bought some very expensive hardware from them, and a management controller for a blade chassis had major bugs to the point it was practically unusable out of the box. Oracle would not under any circumstance supply us with the new firmware unless we spent boatloads of cash for a maintenance contract. We ended up doing this because we needed the controller to work as advertised. This is what annoys me the most with them - you buy a product and it doesn't do what is written on the box, and then you have to pay tons extra for it to do what they said it would do when you bought it. I miss Sun...
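As a concrete illustration of the 16k/sparse/lz4 combination described above (the pool and volume names are placeholders, and in practice the PVE plugin creates the zvol for you):

    # Sparse 32G zvol with a 16k block size and lz4 compression
    zfs create -s -V 32G -o volblocksize=16k -o compression=lz4 tank/vm-101-disk-1

    # Later, check how well the data on it actually compresses
    zfs get compressratio,volblocksize,refreservation tank/vm-101-disk-1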
Re: [pve-devel] ZFS Storage Patches
Oops, forgot to attach the script. Here's the script I mentioned.
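The attachment itself is not preserved in this archive; purely as an illustration of the idea (it is not the attached script, which, judging by the snippet quoted later in the thread, is written in Python), a cron-able check along these lines would do something similar:

    #!/bin/sh
    # Alert if any pool is not healthy or exceeds a capacity threshold.
    THRESHOLD=80
    MAILTO=root

    zpool list -H -o name,capacity,health | while read name cap health; do
        cap=${cap%\%}
        if [ "$health" != "ONLINE" ] || [ "$cap" -ge "$THRESHOLD" ]; then
            zpool status "$name" | mail -s "zpool alert: $name $health ${cap}%" "$MAILTO"
        fi
    done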
Re: [pve-devel] ZFS Storage Patches
On Fri, 14 Mar 2014 12:21:43 -0700 Chris Allen ca.al...@gmail.com wrote:

> 8k for me too is much better than 4k. With 4k I tend to hit my IOPS limit easily, with not much throughput, and I get a lot of IO delay on VMs when the SAN is fairly busy. Currently I'm leaning towards 16k, sparse, with lz4 compression. [...]

Nice catch, I hadn't thought of this. I will experiment some more with sparse volumes. I already have lz4 activated on all my pools and can prove that performance actually increases when using compression.

Hint: if you haven't already tested this, you should try some performance tests using RAID10. Twelve disks (4 vdevs, each a 3-disk mirror) give excellent speed/IO with reasonable security - you can lose 2 disks in any vdev and still not lose any data from the pool (see the sketch below).

> Dedup did not work out well for me. [...]

I don't recall anybody having provided proof of anything good coming out of using dedup.

> > Regarding write caching: why not simply use sync directly on the volume?
>
> Good question. I don't know.

If you use OmniOS you should install napp-it. With napp-it, administration of the OmniOS storage box is a breeze; changing the caching policy is two clicks with a mouse.

> I'm in the process of trying to run away from all things Oracle at my company. We keep getting burned by them. [...] I miss Sun...

Hehe, more or less the same story here. We stick to Oracle database and application servers, though, for mission-critical data, since having a comparable setup from MS puts huge demands on hardware.

--
Hilsen/Regards
Michael Rasmussen
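A rough sketch of the suggested 12-disk layout - 4 vdevs of 3-way mirrors - with placeholder device names:

    zpool create tank \
        mirror c1t0d0 c1t1d0 c1t2d0 \
        mirror c1t3d0 c1t4d0 c1t5d0 \
        mirror c2t0d0 c2t1d0 c2t2d0 \
        mirror c2t3d0 c2t4d0 c2t5d0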
Re: [pve-devel] ZFS Storage Patches
On Fri, 14 Mar 2014 12:24:10 -0700 Chris Allen ca.al...@gmail.com wrote:

> Oops, forgot to attach the script. Here's the script I mentioned.

Thanks.

--
Hilsen/Regards
Michael Rasmussen
Re: [pve-devel] ZFS Storage Patches
On Fri, 14 Mar 2014 12:24:10 -0700 Chris Allen ca.al...@gmail.com wrote:

> Oops, forgot to attach the script. Here's the script I mentioned.

I have added this line so that my MUA can sort the mail correctly:

    msg += 'Date: %s\n' % time.strftime("%a, %d %b %Y %X %z")

--
Hilsen/Regards
Michael Rasmussen
Re: [pve-devel] ZFS Storage Patches
Hi Michael,

would you mind reviewing and testing those patches? You are the author of the ZFS plugin, so I guess it is best if you do the review.

> From: pve-devel [mailto:pve-devel-boun...@pve.proxmox.com] On Behalf Of Chris Allen
> Sent: Wednesday, 12 March 2014 18:14
> To: pve-devel@pve.proxmox.com
> Subject: [pve-devel] ZFS Storage Patches
>
> My submission was rejected previously because I was not a member of the pve-devel mailing list. I added myself to this list and I'm now re-submitting my patches.
Re: [pve-devel] ZFS Storage Patches
Hi Dietmar,

On Thu, 13 Mar 2014 07:00:10 + Dietmar Maurer diet...@proxmox.com wrote:

> would you mind reviewing and testing those patches? You are the author of the ZFS plugin, so I guess it is best if you do the review.

I will do the testing tomorrow or Saturday.

--
Hilsen/Regards
Michael Rasmussen
Re: [pve-devel] ZFS Storage Patches
Thanks, I don't have time myself to test them. (I reviewed them quickly; they seem to be good.)

----- Original message -----
From: Michael Rasmussen m...@datanom.net
To: Dietmar Maurer diet...@proxmox.com
Cc: pve-devel@pve.proxmox.com
Sent: Thursday, 13 March 2014 08:09:52
Subject: Re: [pve-devel] ZFS Storage Patches

> Hi Dietmar,
> I will do the testing tomorrow or Saturday.
Re: [pve-devel] ZFS Storage Patches
On Wed, 12 Mar 2014 10:14:02 -0700 Chris Allen ca.al...@gmail.com wrote:

> My submission was rejected previously because I was not a member of the pve-devel mailing list. I added myself to this list and I'm now re-submitting my patches.

I have just briefly looked over the patch and can see that some patching is also required in pve-manager (ZFSEdit.js). Do you want me to do this?

--
Hilsen/Regards
Michael Rasmussen
Re: [pve-devel] ZFS Storage Patches
Yes. Thanks for including the patches. I was unaware of ZFSEdit.js, as I wasn't testing this on the new version of pve-manager (I was using the 3.1 release).
Re: [pve-devel] ZFS Storage Patches
On Thu, 13 Mar 2014 11:45:52 -0700 Chris Allen ca.al...@gmail.com wrote:

> Yes. Thanks for including the patches. I was unaware of ZFSEdit.js, as I wasn't testing this on the new version of pve-manager (I was using the 3.1 release).

It was also part of the latest 3.1. Double-click the mouse over your storage specification in Datacenter -> Storage and the panel pops up. Patched panel attached.

BTW, have you made any performance measurements, sparse vs non-sparse and write cache vs no write cache?

--
Hilsen/Regards
Michael Rasmussen
Re: [pve-devel] ZFS Storage Patches
Hi Chris,

I am working on a refactor of the ZFS plugin which will decouple specific LUN implementations from the driver, by providing a single interface for LUN handling by means of invoking an external script/program/binary. As such, I will try to review your patches and incorporate what is not specific to any single LUN implementation.

Regards

On Wed, Mar 12, 2014 at 6:14 PM, Chris Allen ca.al...@gmail.com wrote:

> My submission was rejected previously because I was not a member of the pve-devel mailing list. I added myself to this list and I'm now re-submitting my patches.