Re: [ovirt-users] Options not being passed fence_ipmilan, Ovirt3.5 on Centos 7.1 hosts
Hi Eli,

All nodes are the same in the cluster (fresh install):

[mpc-ovirt-node03 ~]# yum list installed *OVIRT* *VDSM* *FENCE*
Loaded plugins: fastestmirror, langpacks, priorities
Loading mirror speeds from cached hostfile
 * base: centos.mirror.rafal.ca
 * epel: fedora-epel.mirror.iweb.com
 * extras: less.cogeco.net
 * ovirt-3.5: resources.ovirt.org
 * ovirt-3.5-epel: fedora-epel.mirror.iweb.com
 * rpmforge: repoforge.mirror.constant.com
 * updates: centos.mirror.rafal.ca
183 packages excluded due to repository priority protections

Installed Packages
fence-agents-all.x86_64            4.0.11-11.el7_1           @updates
fence-agents-apc.x86_64            4.0.11-11.el7_1           @updates
fence-agents-apc-snmp.x86_64       4.0.11-11.el7_1           @updates
fence-agents-bladecenter.x86_64    4.0.11-11.el7_1           @updates
fence-agents-brocade.x86_64        4.0.11-11.el7_1           @updates
fence-agents-cisco-mds.x86_64      4.0.11-11.el7_1           @updates
fence-agents-cisco-ucs.x86_64      4.0.11-11.el7_1           @updates
fence-agents-common.x86_64         4.0.11-11.el7_1           @updates
fence-agents-drac5.x86_64          4.0.11-11.el7_1           @updates
fence-agents-eaton-snmp.x86_64     4.0.11-11.el7_1           @updates
fence-agents-eps.x86_64            4.0.11-11.el7_1           @updates
fence-agents-hpblade.x86_64        4.0.11-11.el7_1           @updates
fence-agents-ibmblade.x86_64       4.0.11-11.el7_1           @updates
fence-agents-ifmib.x86_64          4.0.11-11.el7_1           @updates
fence-agents-ilo-mp.x86_64         4.0.11-11.el7_1           @updates
fence-agents-ilo-ssh.x86_64        4.0.11-11.el7_1           @updates
fence-agents-ilo2.x86_64           4.0.11-11.el7_1           @updates
fence-agents-intelmodular.x86_64   4.0.11-11.el7_1           @updates
fence-agents-ipdu.x86_64           4.0.11-11.el7_1           @updates
fence-agents-ipmilan.x86_64        4.0.11-11.el7_1           @updates
fence-agents-kdump.x86_64          4.0.11-11.el7_1           @updates
fence-agents-rhevm.x86_64          4.0.11-11.el7_1           @updates
fence-agents-rsb.x86_64            4.0.11-11.el7_1           @updates
fence-agents-scsi.x86_64           4.0.11-11.el7_1           @updates
fence-agents-vmware-soap.x86_64    4.0.11-11.el7_1           @updates
fence-agents-wti.x86_64            4.0.11-11.el7_1           @updates
fence-virt.x86_64                  0.3.2-1.el7               @base
libgovirt.x86_64                   0.3.1-3.el7               @base
ovirt-engine-sdk-python.noarch     3.5.1.0-1.el7.centos      @ovirt-3.5
ovirt-host-deploy.noarch           1.3.1-1.el7               @ovirt-3.5
ovirt-hosted-engine-ha.noarch      1.2.5-1.el7.centos        @ovirt-3.5
ovirt-hosted-engine-setup.noarch   1.2.2-1.el7.centos        @ovirt-3.5
ovirt-release35.noarch             002-1                     @/ovirt-release35
vdsm.x86_64                        4.16.10-8.gitc937927.el7  @ovirt-3.5
vdsm-cli.noarch                    4.16.10-8.gitc937927.el7  @ovirt-3.5
vdsm-gluster.noarch                4.16.10-8.gitc937927.el7  @ovirt-3.5
vdsm-jsonrpc.noarch                4.16.10-8.gitc937927.el7  @ovirt-3.5
vdsm-python.noarch                 4.16.10-8.gitc937927.el7  @ovirt-3.5
vdsm-python-zombiereaper.noarch    4.16.10-8.gitc937927.el7  @ovirt-3.5
vdsm-xmlrpc.noarch                 4.16.10-8.gitc937927.el7  @ovirt-3.5
vdsm-yajsonrpc.noarch              4.16.10-8.gitc937927.el7  @ovirt-3.5

Cheers,
Mike
Re: [ovirt-users] oVirt 3.5 problem with gluster, can't set the group and can't find ovirt packages
On 22/04/15 14:49, Jorick Astrego wrote:

On 04/22/2015 02:45 PM, Lars Nielsen wrote:

Hey again

I keep getting this still: Unable to open file '/var/lib/glusterd/groups/virt'. Error: No such file or directory

Still no luck after reinstall

- Lars

On 19/04/15 18:05, Lars Nielsen wrote:

Hey
Will try it. Thank you very much
- Lars

On 19 Apr 2015, at 12:35, Bernd Broermann be...@broermann.com wrote:

Hi,

I saw the same behavior in the past. I solved it by reinstalling the glusterfs-server package:

rpm -qf /var/lib/glusterd/groups/virt
glusterfs-server-3.6.2-1.el7.x86_64

yum reinstall glusterfs-server

It might fix it for you also.

Bernd

On 17.04.2015 at 19:52, Lars Nielsen wrote:

Hey again :)

I have a problem with oVirt 3.5. I am trying to set up oVirt 3.5 with gluster, following this guide: http://community.redhat.com/blog/2014/10/up-and-running-with-ovirt-3-5/

However, when I run the install command:

yum install -y ovirt-hosted-engine-setup

I get the output: No package overt-hosted-engine-setup available. Error: nothing to do.

But I have run:

yum localinstall -y http://resources.ovirt.org/pub/yum-repo/ovirt-release35.rpm

I get no error, and I have checked the log files. And when I run this command:

gluster volume set engine group virt

I get this output: Unable to open file '/var/lib/glusterd/groups/virt'. Error: No such file or directory

Hope someone can help. Thanks in advance

- Lars Nielsen

--
Bernd Broermann
Wellingsbütteler Landstr. 241
22337 Hamburg
0172/2982498 040/5370
http://www.broermann.com
RHCE - Redhat Certified Engineer

Hi,

I think this is this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1129592

Netbulae i...@netbulae.com 2014-08-14 05:50:15 EDT: I just read something about this in the following article: http://blog.gluster.org/2014/05/ovirt-3-4-glusterized/

Create a file named /var/lib/glusterd/groups/virt and paste in the lines below. This provides a virt group with settings optimized for VM storage. I've left off two quorum-related options present in the original group definition. These quorum settings help prevent split-brain, but will cause VMs hosted on Gluster volumes with the settings applied to pause when one of our two machines goes offline.

quick-read=off
read-ahead=off
io-cache=off
stat-prefetch=off
eager-lock=enable
remote-dio=enable

Next, we'll add our new engine volume to this virt group:

gluster volume set engine group virt

I never created the virt file and it's not in this directory. Can it be auto-created on setup?

Forgot to add these for 2-node clusters:

quorum-type=auto
server-quorum-type=server

As per https://github.com/gluster/glusterfs/blob/master/extras/group-virt.example

With kind regards,

Jorick Astrego
Netbulae Virtualization Experts

Worked, thank you very much :)

--
Med venlig hilsen / Best Regards
Lars Nielsen
Student developer at Steinwurf
l...@steinwurf.com
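For anyone hitting the same missing-file error, the fix above boils down to writing the group definition by hand and then re-applying it to the volume. A minimal sketch in Python (run as root on a gluster node; the quorum options are left commented out per the two-node caveat quoted above):

import subprocess

# Recreate /var/lib/glusterd/groups/virt with the settings quoted
# above, then apply the group to the "engine" volume.
settings = [
    "quick-read=off",
    "read-ahead=off",
    "io-cache=off",
    "stat-prefetch=off",
    "eager-lock=enable",
    "remote-dio=enable",
    # "quorum-type=auto",           # see the two-node caveat above
    # "server-quorum-type=server",
]
with open("/var/lib/glusterd/groups/virt", "w") as f:
    f.write("\n".join(settings) + "\n")

subprocess.check_call(["gluster", "volume", "set", "engine", "group", "virt"])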
Re: [ovirt-users] Is it possible to limit the number and speed of paralel STORAGE migrations?
On 22.04.2015 at 13:09, Juan Hernández wrote:

# Iterate the disks of the VM and for each of them move it
# to the target storage domain:
for disk in vm.disks.list():
    # Start moving the disk. Note that this is an asynchronous
    # operation, so once the move method returns you will
    # have to wait for the actual movement to finish.
    print("Moving disk \"%s\" ..." % disk.get_alias())
    disk.move(
        params.Action(
            storage_domain=sd
        )
    )

This works nicely, thank you.

--
Ernest Beinrohr, AXON PRO
+421-2-62410360 +421-903-482603
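Since move() only starts the operation, a script that migrates many disks usually needs to wait for each disk to unlock before starting the next one, which is also how you limit the number of parallel migrations. A rough sketch of such a wait loop, assuming the ovirt-engine-sdk-python 3.x accessors shown in the snippet above (verify get_status()/get_state() against your SDK version):

import time

def wait_for_disk(vm, disk, poll_seconds=10):
    # Poll the disk until it leaves the "locked" state that the
    # asynchronous move puts it in.
    while True:
        current = vm.disks.get(id=disk.get_id())
        state = current.get_status().get_state()
        if state != "locked":
            return state
        time.sleep(poll_seconds)

Calling wait_for_disk(vm, disk) after each disk.move(...) serializes the migrations, so only one disk is ever moving at a time.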
Re: [ovirt-users] [ovirt-devel] oVirt 3.6 new feature: Affinity Group Enforcement Service
On 22/04/2015 14:22, Tomer Saban wrote:

Greetings users and developers,

I'm developing a new feature, the Affinity Rules Enforcement Service. In summary:
===
If an anti-affinity policy is applied to VMs that are currently running on the same hypervisor, the engine will not automatically migrate one of them off once the policy is applied. The Affinity Rules Enforcement Service will periodically check that affinity rules are being enforced and will migrate VMs if necessary in order to comply with those rules.

You're welcome to review the feature page: http://www.ovirt.org/Affinity_Group_Enforcement_Service

The feature page is responding to RFE https://bugzilla.redhat.com/show_bug.cgi?id=1112332

Your comments are appreciated.

Thanks,
Tomer

The bug is assigned to nobody, please take it.

The feature page is not based on the feature template http://www.ovirt.org/Feature_template, please fix it.

Please drop [[Category:Template]] since it's not a template.

Please fix the categories to be:
[[Category:Feature|Affinity Group Enforcement Service]]
[[Category:oVirt 3.6 Proposed Feature|Affinity Group Enforcement Service]]

--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
[ovirt-users] Minutes: oVirt Weekly Sync (Open Format)
Minutes: http://ovirt.org/meetings/ovirt/2015/ovirt.2015-04-22-14.02.html
Minutes (text): http://ovirt.org/meetings/ovirt/2015/ovirt.2015-04-22-14.02.txt
Log: http://ovirt.org/meetings/ovirt/2015/ovirt.2015-04-22-14.02.log.html

=== #ovirt: oVirt Weekly Sync (Open Format) ===

Meeting started by bkp at 14:02:19 UTC. The full logs are available at http://ovirt.org/meetings/ovirt/2015/ovirt.2015-04-22-14.02.log.html .

Meeting summary
---
* Agenda (bkp, 14:04:08)
  * 3.5 news (bkp, 14:04:33)
  * 3.6 news/discussion (bkp, 14:04:45)
  * Open Discussion (bkp, 14:04:55)

* 3.5 news/discussion (bkp, 14:06:35)
  * 3.5 general status email: http://lists.ovirt.org/pipermail/devel/2015-April/010315.html (bkp, 14:12:58)
  * 3.5.2 looks good for April 28 (bkp, 14:13:01)
  * 3.5.2, only one bug in QA (bkp, 14:13:03)
  * sbonazzo needs maintainers to review the package list he sent on the RC4 release, to be sure that the listed packages correspond to the ones maintainers want to release (bkp, 14:13:07)

* 3.6 news/discussion (bkp, 14:13:29)
  * Brief general status for 3.6 here: http://lists.ovirt.org/pipermail/devel/2015-April/010310.html (bkp, 14:28:29)
  * el6 repo broken by the removal of vdsm; infra and node (fabiand) must address this in the next couple of days (bkp, 14:28:32)
  * Node needs to be dropped on el6, yet according to sbonazzo some related job is still building RPMs for el6 (bkp, 14:28:35)
  * Today (22.4.15) is the feature submission deadline for 3.6 (bkp, 14:28:38)
  * Submitted features need to be reviewed. (bkp, 14:28:40)
  * An alpha release of 3.6 is scheduled for May 6 (bkp, 14:28:43)

* Open Discussion (bkp, 14:28:44)
  * Inquiry about jdk 1.8 / wildfly support for dev infra, no initial response. Using jdk 1.7 as a fallback is not optimal, sbonazzo reports (bkp, 14:28:46)
  * Node is building nicely and the team is cleaning up and preparing for 3.6, fabiand reports (bkp, 14:28:49)
  * On the issue of server hardening, dcaro reports that we are waiting for the sec team to finish the checks and give more feedback on best practices (bkp, 14:53:45)
  * bkp suggests that office hours be held for set half-hour periods, moderated by a rotation of community members. Format should be open, answering all questions and discussing any cross-team/community-wide issues as needed. Twice a week, Tuesdays and Thursdays. (bkp, 14:53:48)
  * ACTION: bkp will ask on community mailing lists to finalize discussion on office hour format (bkp, 14:53:52)

Meeting ended at 14:54:10 UTC.

Action Items
---
* bkp will ask on community mailing lists to finalize discussion on office hour format

Action Items, by person
---
* bkp
  * bkp will ask on community mailing lists to finalize discussion on office hour format
* **UNASSIGNED**
  * (none)

People Present (lines said)
---
* bkp (73)
* sbonazzo (56)
* dcaro (14)
* fabiand (7)
* mr_chris (4)
* ovirtbot (2)
* awels (1)
* misc (1)

Generated by MeetBot 0.1.4 (http://wiki.debian.org/MeetBot)

--
Brian Proffitt
Community Liaison
oVirt Open Source and Standards, Red Hat - http://community.redhat.com
Phone: +1 574 383 9BKP
IRC: bkp @ OFTC
[ovirt-users] metadata not found
Hi,

Any idea what this means?

vdsm.log:
Thread-65::DEBUG::2015-04-22 18:57:32,138::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 80 edom: 20 level: 2 message: metadata not found: Requested metadata element is not present

I don't know if it's related to these messages from syslog:

Apr 22 18:36:23 v2 vdsm vm.Vm WARNING vmId=`045680b9-06fd-40d9-b98a-92ce527b734f`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel0', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '1'}}' found
Apr 22 18:36:23 v2 vdsm vm.Vm WARNING vmId=`045680b9-06fd-40d9-b98a-92ce527b734f`::Unknown type found, device: '{'device': 'unix', 'alias': 'channel1', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '2'}}' found
Apr 22 18:36:23 v2 vdsm vm.Vm WARNING vmId=`045680b9-06fd-40d9-b98a-92ce527b734f`::Unknown type found, device: '{'device': 'spicevmc', 'alias': 'channel2', 'type': 'channel', 'address': {'bus': '0', 'controller': '0', 'type': 'virtio-serial', 'port': '3'}}' found
Apr 22 18:36:24 v2 vdsm vm.Vm ERROR vmId=`045680b9-06fd-40d9-b98a-92ce527b734f`::Alias not found for device type graphics during migration at destination host

regards,
Giannis
[ovirt-users] soft negative affinity
Hi,

Can someone comment on this? https://bugzilla.redhat.com/show_bug.cgi?id=1207255

I've defined a soft negative affinity group for two VMs. To my understanding, if there are at least 2 nodes available in the cluster, then the VMs SHOULD start on different nodes. This does not happen; they start on the same node. If I make the affinity hard, then it works. However, I don't want to make it hard, since if there is only one node available in the cluster, one VM will stay down.

regards,
Giannis
[ovirt-users] ManageIQ : Attach an already attached storage domain
Hello,

According to what I understood, the ManageIQ VM appliance needs to attach to every storage domain it has to scan.

1: I'm new to ManageIQ and the above is what I think I have understood from the answer of the ManageIQ team. If anyone here knows more about it, please comment.

2: When trying to do the above, oVirt shows a frightening red warning, so I have not yet dared to approve. I intended to attach these LUNs in read-only mode, hoping this checkbox will be respected. Once again, may the wisest users comment?

Have a nice day.

--
Nicolas Ecarnot
[ovirt-users] Problem with 3.5.1 hosted engine setup
Hi all,

I'm trying to get a hosted engine set up on a CentOS 7.1 box, per the instructions found at http://community.redhat.com/blog/2014/10/up-and-running-with-ovirt-3-5/

When I run the 'hosted-engine --deploy' step, I'm getting the following output: https://gist.github.com/wdennis/f9783b04d77676881d5b

Looking at the log file, I see the following traceback: https://gist.github.com/wdennis/4e4de8413cf2dd1a3c6b

Can anyone tell me what exactly is failing, and how I might fix it?

Thanks,
Will
[ovirt-users] GlusterFS native client use with oVirt
Hi all,

Can someone tell me if it's possible or not to utilize GlusterFS mounted as native (i.e. FUSE) for a storage domain with oVirt 3.5.x?

I have two nodes (with a third I'm thinking of using as well) that are running Gluster, and I've created the two volumes needed for hosted engine setup (engine, data) on them, and mounted them natively (not via NFS). Can this be used with oVirt 3.5.x? Or is this (from what I now understand) a new feature coming in oVirt 3.6?

Thanks,
Will
[ovirt-users] Hosted-Engine Setup: Failed to setup networks
Hi Everyone,

I tried to install oVirt 3.5 hosted engine and it fails with a VDSM error while creating the ovirtmgmt bridge. The host is running CentOS 7, and the interface I want to use is em1; it's the parent interface of a VLAN.

[ ERROR ] Failed to execute stage 'Misc configuration': Failed to setup networks {'ovirtmgmt': {'nic': 'em1', 'netmask': '255.255.255.128', 'bootproto': 'none', 'ipaddr': '172.16.1.13', 'gateway': '172.16.1.1'}}. Error code: 16 message: Unexpected exception

2015-04-22 16:33:55 INFO otopi.plugins.ovirt_hosted_engine_setup.network.bridge bridge._misc:198 Configuring the management bridge
2015-04-22 16:33:55 DEBUG otopi.context context._executeMethod:152 method exception
Traceback (most recent call last):
  File /usr/lib/python2.7/site-packages/otopi/context.py, line 142, in _executeMethod
    method['method']()
  File /usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/network/bridge.py, line 207, in _misc
    _setupNetworks(conn, networks, {}, {'connectivityCheck': False})
  File /usr/share/ovirt-hosted-engine-setup/scripts/../plugins/ovirt-hosted-engine-setup/network/bridge.py, line 225, in _setupNetworks
    'message: %s' % (networks, code, message))
RuntimeError: Failed to setup networks {'ovirtmgmt': {'nic': 'em1', 'netmask': '255.255.255.128', 'bootproto': 'none', 'ipaddr': '172.16.1.13', 'gateway': '172.16.1.1'}}. Error code: 16 message: Unexpected exception
2015-04-22 16:33:55 ERROR otopi.context context._executeMethod:161 Failed to execute stage 'Misc configuration': Failed to setup networks {'ovirtmgmt': {'nic': 'em1', 'netmask': '255.255.255.128', 'bootproto': 'none', 'ipaddr': '172.16.1.13', 'gateway': '172.16.1.1'}}. Error code: 16 message: Unexpected exception

Is there anything I can do, like creating the bridge manually or using an older version of the packages that doesn't have this issue?

Thank you,
Sven
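The traceback shows the installer failing inside VDSM's setupNetworks verb, so one way to narrow this down is to replay the same call directly against the local vdsmd and then read the real exception out of /var/log/vdsm/vdsm.log and /var/log/vdsm/supervdsm.log. A rough sketch, assuming the vdscli module that ships with VDSM 4.16 is importable on the host (the module path may differ between versions):

from vdsm import vdscli

# Replay the exact setupNetworks call that bridge.py makes, to
# reproduce the "Unexpected exception" outside hosted-engine-setup.
conn = vdscli.connect()
networks = {'ovirtmgmt': {'nic': 'em1',
                          'ipaddr': '172.16.1.13',
                          'netmask': '255.255.255.128',
                          'gateway': '172.16.1.1',
                          'bootproto': 'none'}}
result = conn.setupNetworks(networks, {}, {'connectivityCheck': False})
print result['status']

The underlying traceback in supervdsm.log is usually far more specific than the "Error code: 16" the installer reports.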
Re: [ovirt-users] Options not being passed fence_ipmilan, Ovirt3.5 on Centos 7.1 hosts
- Original Message -
From: Mike Lindsay mike.lind...@cbc.ca
To: users@ovirt.org
Sent: Tuesday, April 21, 2015 8:16:18 PM
Subject: [ovirt-users] Options not being passed fence_ipmilan, Ovirt3.5 on Centos 7.1 hosts

Hi All,

I have a bit of an issue with a new install of Ovirt 3.5 (our 3.4 cluster is working fine) in a 4-node cluster. When I test fencing (or cause a kernel panic triggering a fence), the fencing fails. On investigation it appears that the fencing options are not being passed to the fencing script (fence_ipmilan in this case):

Fence options under GUI (as entered in the GUI): lanplus, ipport=623, power_wait=4, privlvl=operator

From vdsm.log on the fence proxy node:

Thread-818296::DEBUG::2015-04-21 12:39:39,136::API::1209::vds::(fenceNode) fenceNode(addr=x.x.x.x,port=,agent=ipmilan,user=stonith,passwd=,action=status,secure=False,options= power_wait=4
Thread-818296::DEBUG::2015-04-21 12:39:39,137::utils::739::root::(execCmd) /usr/sbin/fence_ipmilan (cwd None)
Thread-818296::DEBUG::2015-04-21 12:39:39,295::utils::759::root::(execCmd) FAILED: err = 'Failed: Unable to obtain correct plug status or plug is not available\n\n\n'; rc = 1
Thread-818296::DEBUG::2015-04-21 12:39:39,296::API::1164::vds::(fence) rc 1 inp agent=fence_ipmilan
Thread-818296::DEBUG::2015-04-21 12:39:39,296::API::1235::vds::(fenceNode) rc 1 in agent=fence_ipmilan
Thread-818296::DEBUG::2015-04-21 12:39:39,297::stompReactor::163::yajsonrpc.StompServer::(send) Sending response

From engine.log on the engine:

2015-04-21 12:39:38,843 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-4) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host mpc-ovirt-node03 from cluster Default was chosen as a proxy to execute Status command on Host mpc-ovirt-node04.
2015-04-21 12:39:38,845 INFO [org.ovirt.engine.core.bll.FenceExecutor] (ajp--127.0.0.1-8702-4) Using Host mpc-ovirt-node03 from cluster Default as proxy to execute Status command on Host
2015-04-21 12:39:38,885 INFO [org.ovirt.engine.core.bll.FenceExecutor] (ajp--127.0.0.1-8702-4) Executing Status Power Management command, Proxy Host:mpc-ovirt-node03, Agent:ipmilan, Target Host:, Management IP:x.x.x.x, User:stonith, Options: power_wait=4, ipport=623, privlvl=operator,lanplus, Fencing policy:null
2015-04-21 12:39:38,921 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (ajp--127.0.0.1-8702-4) START, FenceVdsVDSCommand(HostName = mpc-ovirt-node03, HostId = 5613a489-589d-4e89-ab01-3642795eedb8, targetVdsId = dbfa4e85-3e97-4324-b222-bf40a491db08, action = Status, ip = x.x.x.x, port = , type = ipmilan, user = stonith, password = **, options = ' power_wait=4, ipport=623, privlvl=operator,lanplus', policy = 'null'), log id: 774f328
2015-04-21 12:39:39,338 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-4) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Power Management test failed for Host mpc-ovirt-node04.Done
2015-04-21 12:39:39,339 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (ajp--127.0.0.1-8702-4) FINISH, FenceVdsVDSCommand, return: Test Succeeded, unknown, log id: 774f328
2015-04-21 12:39:39,340 WARN [org.ovirt.engine.core.bll.FenceExecutor] (ajp--127.0.0.1-8702-4) Fencing operation failed with proxy host 5613a489-589d-4e89-ab01-3642795eedb8, trying another proxy...
2015-04-21 12:39:39,594 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-4) Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: Host mpc-ovirt-node01 from cluster Default was chosen as a proxy to execute Status command on Host mpc-ovirt-node04.
2015-04-21 12:39:39,595 INFO [org.ovirt.engine.core.bll.FenceExecutor] (ajp--127.0.0.1-8702-4) Using Host mpc-ovirt-node01 from cluster Default as proxy to execute Status command on Host
2015-04-21 12:39:39,598 INFO [org.ovirt.engine.core.bll.FenceExecutor] (ajp--127.0.0.1-8702-4) Executing Status Power Management command, Proxy Host:mpc-ovirt-node01, Agent:ipmilan, Target Host:, Management IP:x.x.x.x, User:stonith, Options: power_wait=4, ipport=623, privlvl=operator,lanplus, Fencing policy:null
2015-04-21 12:39:39,634 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.FenceVdsVDSCommand] (ajp--127.0.0.1-8702-4) START, FenceVdsVDSCommand(HostName = mpc-ovirt-node01, HostId = c3e8be6e-ac54-4861-b774-17ba5cc66dc6, targetVdsId = dbfa4e85-3e97-4324-b222-bf40a491db08, action = Status, ip = x.x.x.x, port = , type = ipmilan, user = stonith, password = **, options = ' power_wait=4, ipport=623, privlvl=operator,lanplus', policy = 'null'), log id: 6369eb1
2015-04-21 12:39:40,056 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp--127.0.0.1-8702-4) Correlation ID: null, Call Stack: null, Custom Event ID:
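The vdsm log above shows the agent being launched with an essentially empty option block on standard input (inp agent=fence_ipmilan), so a useful isolation step is to replay the call by hand: fence agents accept their options as key=value pairs on stdin, which is exactly how VDSM drives them. A minimal sketch (Python; the BMC address and password are placeholders, and the option names follow the standard fence_ipmilan set):

import subprocess

# Replay VDSM's stdin-style invocation of fence_ipmilan to check
# whether the agent itself honours the options the engine seems
# to be dropping. ipaddr and passwd are placeholders.
options = "\n".join([
    "action=status",
    "ipaddr=x.x.x.x",
    "ipport=623",
    "login=stonith",
    "passwd=secret",
    "lanplus=1",
    "power_wait=4",
    "privlvl=operator",
])
proc = subprocess.Popen(["/usr/sbin/fence_ipmilan"],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
out, err = proc.communicate(options + "\n")
print out, err
print "rc=%d" % proc.returncode

If this succeeds while the engine-driven call fails, the options really are being lost between the engine and the proxy host rather than being rejected by the agent.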
Re: [ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS
pool: z2pool
state: ONLINE
scan: scrub canceled on Sun Apr 12 16:33:38 2015
config:

NAME                        STATE   READ WRITE CKSUM
z2pool                      ONLINE     0     0     0
  raidz1-0                  ONLINE     0     0     0
    c0t5000C5004172A87Bd0   ONLINE     0     0     0
    c0t5000C50041A59027d0   ONLINE     0     0     0
    c0t5000C50041A592AFd0   ONLINE     0     0     0
    c0t5000C50041A660D7d0   ONLINE     0     0     0
    c0t5000C50041A69223d0   ONLINE     0     0     0
    c0t5000C50041A6ADF3d0   ONLINE     0     0     0
logs
  c0t5001517BB2845595d0     ONLINE     0     0     0
cache
  c0t5001517BB2847892d0     ONLINE     0     0     0
spares
  c0t5000C50041A6B737d0     AVAIL
  c0t5000C50041AC3F07d0     AVAIL
  c0t5000C50041AD48DBd0     AVAIL
  c0t5000C50041ADD727d0     AVAIL

errors: No known data errors

On 04/22/2015 11:17 AM, Karli Sjöberg wrote:

On Wed, 2015-04-22 at 11:12 +0200, Maikel vd Mosselaar wrote:

Our pool is configured as Z1 with ZIL (normal SSD); the sync parameter is on the default setting (standard), so sync is on.

# zpool status ?

/K

When the issue happens, the oVirt event viewer does indeed show latency warnings. Not always, but most of the time this is followed by an i/o storage error linked to random VMs, and they get paused when that happens.

All the nodes use mode 4 bonding. The interfaces on the nodes don't show any drops or errors; I checked 2 of the VMs that got paused the last time it happened, and they have dropped packets on their interfaces.

We don't have a subscription with Nexenta (anymore).

On 04/21/2015 04:41 PM, InterNetX - Juergen Gotteswinter wrote:

On 21.04.2015 at 16:19, Maikel vd Mosselaar wrote:

Hi Juergen,

The load on the nodes rises far over 200 during the event. Load on the Nexenta stays normal, and there is nothing strange in the logging.

ZFS + NFS could still be the root of this. Is your pool configuration RaidzX or mirror, with or without ZIL? Is the sync parameter of the exported ZFS subvolume kept at its default (standard)?

http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-performance-with-zil.html

Since oVirt is very sensitive about storage latency (it throws VMs into an unresponsive or unknown state), it might be worth a try to do

zfs set sync=disabled pool/volume

to see if this changes things. But be aware that this makes the NFS export vulnerable to data loss in case of power loss etc., comparable to async NFS in Linux. If disabling the sync setting helps, and you don't use a separate ZIL flash drive yet, one would very likely help to get rid of this. Also, if you run a subscribed version of Nexenta, it might be helpful to involve them.

Do you see any messages about high latency in the oVirt events panel?

For our storage interfaces on our nodes we use bonding in mode 4 (802.3ad), 2x 1Gb. The Nexenta has a 4x 1Gb bond in mode 4 also.

This should be fine, as long as no node uses mode 0 / round robin, which would lead to out-of-order TCP packets. The interfaces themselves don't show any drops or errors, on the VM hosts as well as on the switch itself? Jumbo frames?

Kind regards,
Maikel

On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:

Hi,

How about load, latency, or strange dmesg messages on the Nexenta? You are using bonded Gbit networking? If yes, which mode?

Cheers,
Juergen

On 20.04.2015 at 14:25, Maikel vd Mosselaar wrote:

Hi,

We are running oVirt 3.5.1 with 3 nodes and a separate engine. All on CentOS 6.6:

3 x nodes
1 x engine
1 x storage (Nexenta with NFS)

For multiple weeks we have been experiencing issues where our nodes cannot access the storage at random moments (at least that's what the nodes think). When the nodes are complaining about unavailable storage, the load rises up to +200 on all three nodes, which makes all running VMs inaccessible.

During this process the oVirt event viewer shows some i/o storage error messages; when this happens, random VMs get paused and will not be resumed anymore (this happens almost every time, but not all the VMs get paused).

During the event we tested the accessibility from the nodes to the storage, and it looks like it is working normally; at least we can do a normal ls on the storage without any delay in showing the contents.

We tried multiple things that we thought might cause this issue, but nothing worked so far:
* rebooting storage / nodes / engine.
* disabling offsite rsync backups.
* moved the biggest VMs with the highest load to a different platform outside of oVirt.
* checked the wsize and rsize on the NFS mounts; storage and nodes are correct according to the NFS troubleshooting page on ovirt.org.

The environment is running in production, so we are not free to test everything. I can provide log files if needed.

Kind Regards,
Maikel
Re: [ovirt-users] Is it possible to limit the number and speed of paralel STORAGE migrations?
On 21.04.2015 at 17:33, Dan Yasny wrote:

Why not just script them to migrate one after the other? The CLI is nice and simple, and the SDK is even nicer.

Well, I gave it a try, but I'm quite new to Python and this does not work as expected:

for vm in vms:
    print vm.name
    for disk in vm.disks.list():
        print "disk: " + disk.name
        sd = api.storagedomains.get('newstorage')
        disk.move(params.Disk(storage_domains=params.StorageDomains(storage_domain=[sd])))

status: 500
reason: Internal Server Error
detail: HTTP Status 500 - Bad arguments passed to public abstract javax.ws.rs.core.Response org.ovirt.engine.api.resource.MovableResource.move(org.ovirt.engine.api.model.Action) ( org.ovirt.engine.api.model.Disk org.ovirt.engine.api.model.Disk@6a0db58b )

--
Ernest Beinrohr, AXON PRO
+421-2-62410360 +421-903-482603
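The 500 error itself points at the fix: the move endpoint expects an Action, but the script passes a Disk (MovableResource.move(...model.Action) received a ...model.Disk). Juan Hernández's reply earlier in this digest shows the working form; adapted to this script, it would look roughly like:

from ovirtsdk.xml import params

sd = api.storagedomains.get('newstorage')
for vm in vms:
    for disk in vm.disks.list():
        # move() takes an Action wrapping the target storage domain,
        # not a Disk object -- that mismatch is what produced the 500.
        disk.move(params.Action(storage_domain=sd))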
Re: [ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS
I expect that you are aware of the fact that you only get the write performance of a single disk in that configuration? I would drop that pool configuration, drop the spare drives and go for a mirror pool.
Re: [ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS
Our current nfs settings:

listen_backlog=64
protocol=ALL
servers=1024
lockd_listen_backlog=64
lockd_servers=1024
lockd_retransmit_timeout=5
grace_period=90
server_versmin=2
server_versmax=4
client_versmin=2
client_versmax=4
server_delegation=on
nfsmapid_domain=
max_connections=-1

On 04/22/2015 11:32 AM, InterNetX - Juergen Gotteswinter wrote:

On 22.04.2015 at 11:12, Maikel vd Mosselaar wrote:

Our pool is configured as Z1 with ZIL (normal SSD); the sync parameter is on the default setting (standard), so sync is on.

For testing, I would give

zfs set sync=disabled pool/vol

a shot. But as I already said, that's nothing you should keep for production.

Something I have also seen in the past: the filer saturated the maximum lockd/nfs processes (which are quite low in their default settings; don't worry about pushing the NFS threads up to 512+, and the same goes for lockd).

To get your current values:

sharectl get nfs

For example, one of my filers, which is hammered pretty heavily through NFS most of the time, uses these settings:

servers=1024
lockd_listen_backlog=32
lockd_servers=1024
lockd_retransmit_timeout=5
grace_period=90
server_versmin=2
server_versmax=3
client_versmin=2
client_versmax=4
server_delegation=on
nfsmapid_domain=
max_connections=-1
protocol=ALL
listen_backlog=32
device=
mountd_listen_backlog=64
mountd_max_threads=16

To change them, use sharectl, or put the following into /etc/system and reboot:

set rpcmod:clnt_max_conns = 8
set rpcmod:maxdupreqs=8192
set rpcmod:cotsmaxdupreqs=8192
set nfs:nfs3_max_threads=1024
set nfs:nfs3_nra=128
set nfs:nfs3_bsize=1048576
set nfs:nfs3_max_transfer_size=1048576
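For reference, sharectl changes each property with "set -p". A small sketch applying the thread-count bumps discussed above (property names as printed by "sharectl get nfs"; run on the Nexenta/illumos filer, and treat the values as starting points rather than definitive tuning):

import subprocess

# Bump the NFS and lockd server thread counts suggested above.
# Property names are the ones "sharectl get nfs" prints.
for prop, value in [("servers", "1024"),
                    ("lockd_servers", "1024"),
                    ("lockd_listen_backlog", "64")]:
    subprocess.check_call(
        ["sharectl", "set", "-p", "%s=%s" % (prop, value), "nfs"])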
Re: [ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS
Yes, we are aware of that. The problem is that it's running production, so it is not very easy to change the pool.

On 04/22/2015 11:48 AM, InterNetX - Juergen Gotteswinter wrote:

I expect that you are aware of the fact that you only get the write performance of a single disk in that configuration? I would drop that pool configuration, drop the spare drives and go for a mirror pool.
Re: [ovirt-users] Move/Migrate Storage Domain to new devices
Ok, I copied the templates and the migration almost worked. However, there are some disks that fail migration; here is what I found in vdsm.log on the SPM:

c061a252-0611-4e25-b9eb-8540e01dcfec::ERROR::2015-04-22 09:36:42,063::volume::409::Storage.Volume::(create) Unexpected error
Traceback (most recent call last):
  File /usr/share/vdsm/storage/volume.py, line 399, in create
  File /usr/share/vdsm/storage/volume.py, line 302, in share
CannotShareVolume: Cannot share volume: 'src=/rhev/data-center/134985e2-4885-4b24-85b0-3b51365d66c7/2f4e7ec2-2865-4956-b4ee-735a9d46eb67/images/98cecc63-0816-4e64-8dfb-cadad6af8aae/c10aebff-20b3-4ccc-b166-c2be47dc5af0, dst=/rhev/data-center/134985e2-4885-4b24-85b0-3b51365d66c7/2f4e7ec2-2865-4956-b4ee-735a9d46eb67/images/530d6820-98b1-4e95-8470-4cefa5ab351c/c10aebff-20b3-4ccc-b166-c2be47dc5af0: [Errno 17] File exists'

c061a252-0611-4e25-b9eb-8540e01dcfec::ERROR::2015-04-22 09:36:42,071::image::401::Storage.Image::(_createTargetImage) Unexpected error
Traceback (most recent call last):
  File /usr/share/vdsm/storage/image.py, line 384, in _createTargetImage
  File /usr/share/vdsm/storage/sd.py, line 430, in createVolume
  File /usr/share/vdsm/storage/volume.py, line 412, in create
VolumeCannotGetParent: Cannot get parent volume: (Couldn't get parent c10aebff-20b3-4ccc-b166-c2be47dc5af0 for volume d694d919-a0b0-4b89-b6cd-d5c02748f9de: Cannot share volume: 'src=/rhev/data-center/134985e2-4885-4b24-85b0-3b51365d66c7/2f4e7ec2-2865-4956-b4ee-735a9d46eb67/images/98cecc63-0816-4e64-8dfb-cadad6af8aae/c10aebff-20b3-4ccc-b166-c2be47dc5af0, dst=/rhev/data-center/134985e2-4885-4b24-85b0-3b51365d66c7/2f4e7ec2-2865-4956-b4ee-735a9d46eb67/images/530d6820-98b1-4e95-8470-4cefa5ab351c/c10aebff-20b3-4ccc-b166-c2be47dc5af0: [Errno 17] File exists',)

c061a252-0611-4e25-b9eb-8540e01dcfec::ERROR::2015-04-22 09:36:44,174::task::866::Storage.TaskManager.Task::(_setError) Task=`c061a252-0611-4e25-b9eb-8540e01dcfec`::Unexpected error
Traceback (most recent call last):
  File /usr/share/vdsm/storage/task.py, line 873, in _run
  File /usr/share/vdsm/storage/task.py, line 334, in run
  File /usr/share/vdsm/storage/securable.py, line 77, in wrapper
  File /usr/share/vdsm/storage/sp.py, line 1553, in moveImage
  File /usr/share/vdsm/storage/image.py, line 499, in move
  File /usr/share/vdsm/storage/image.py, line 384, in _createTargetImage
  File /usr/share/vdsm/storage/sd.py, line 430, in createVolume
  File /usr/share/vdsm/storage/volume.py, line 412, in create
VolumeCannotGetParent: Cannot get parent volume: (Couldn't get parent c10aebff-20b3-4ccc-b166-c2be47dc5af0 for volume d694d919-a0b0-4b89-b6cd-d5c02748f9de: Cannot share volume: 'src=/rhev/data-center/134985e2-4885-4b24-85b0-3b51365d66c7/2f4e7ec2-2865-4956-b4ee-735a9d46eb67/images/98cecc63-0816-4e64-8dfb-cadad6af8aae/c10aebff-20b3-4ccc-b166-c2be47dc5af0, dst=/rhev/data-center/134985e2-4885-4b24-85b0-3b51365d66c7/2f4e7ec2-2865-4956-b4ee-735a9d46eb67/images/530d6820-98b1-4e95-8470-4cefa5ab351c/c10aebff-20b3-4ccc-b166-c2be47dc5af0: [Errno 17] File exists',)

Thank you.

Regards,
Dael Maselli.

On 21/04/15 09:42, Aharon Canan wrote:

Hi,

Did you try to copy the template to the new storage domain? Under the Template tab - Disks sub-tab - Copy.

Regards,
Aharon Canan

From: Dael Maselli dael.mase...@lnf.infn.it
To: users@ovirt.org
Sent: Monday, April 20, 2015 5:48:03 PM
Subject: [ovirt-users] Move/Migrate Storage Domain to new devices

Hi,

I have a data storage domain that uses one FC LUN. I need to move all data to a new storage server. I tried moving single disks to a new storage domain, but some cannot be moved, I think because they are thin-cloned from a template.

When I worked with LVM I used to do a simple pvmove, leaving the VG intact; is there something similar (online or in maintenance) in oVirt? Can I just do a pvmove from the SPM host, or is that going to destroy everything?

Thank you very much.

Regards,
Dael Maselli.

--
Dael Maselli --- INFN-LNF Computing Service -- +39.06.9403.2214
http://www.lnf.infn.it/~dmaselli/
Democracy is two wolves and a lamb voting on what to have for lunch
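Since the remaining failures are thin-provisioned disks whose template base image is involved on the target domain (the CannotShareVolume / VolumeCannotGetParent errors above, failing on an already-existing destination image), copying the templates first, as Aharon suggests, can also be scripted. A rough sketch with the Python SDK 3.x, assuming the disk-level copy action is exposed there the same way as in the REST API (verify against your SDK version before relying on it):

from ovirtsdk.xml import params

target = api.storagedomains.get('newstorage')
for template in api.templates.list():
    for disk in template.disks.list():
        # Copy the template's base image to the target domain so
        # that dependent thin-cloned disks can be moved afterwards.
        disk.copy(params.Action(storage_domain=target))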
Re: [ovirt-users] storage issue's with oVirt 3.5.1 + Nexenta NFS
You have got 4 spare disks in your raidz pool that you can take out to create a temporary pool parallel to the existing one, then use zfs send/receive to migrate the data. This shouldn't take much time if you are not using huge drives.

On 22.04.2015 at 11:54, Maikel vd Mosselaar wrote:

Yes, we are aware of that. The problem is that it is running in production, so it is not very easy to change the pool.

On 04/22/2015 11:48 AM, InterNetX - Juergen Gotteswinter wrote:

I expect you are aware of the fact that you only get the write performance of a single disk in that configuration? I would drop that pool configuration, drop the spare drives, and go for a mirror pool.

On 22.04.2015 at 11:39, Maikel vd Mosselaar wrote:

  pool: z2pool
 state: ONLINE
  scan: scrub canceled on Sun Apr 12 16:33:38 2015
config:

        NAME                       STATE     READ WRITE CKSUM
        z2pool                     ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c0t5000C5004172A87Bd0  ONLINE       0     0     0
            c0t5000C50041A59027d0  ONLINE       0     0     0
            c0t5000C50041A592AFd0  ONLINE       0     0     0
            c0t5000C50041A660D7d0  ONLINE       0     0     0
            c0t5000C50041A69223d0  ONLINE       0     0     0
            c0t5000C50041A6ADF3d0  ONLINE       0     0     0
        logs
          c0t5001517BB2845595d0    ONLINE       0     0     0
        cache
          c0t5001517BB2847892d0    ONLINE       0     0     0
        spares
          c0t5000C50041A6B737d0    AVAIL
          c0t5000C50041AC3F07d0    AVAIL
          c0t5000C50041AD48DBd0    AVAIL
          c0t5000C50041ADD727d0    AVAIL

errors: No known data errors

On 04/22/2015 11:17 AM, Karli Sjöberg wrote:

On Wed, 2015-04-22 at 11:12 +0200, Maikel vd Mosselaar wrote:

Our pool is configured as Z1 with a ZIL (normal SSD); the sync parameter is on the default setting (standard), so sync is on.

# zpool status ?

/K

When the issue happens, the oVirt event viewer does indeed show latency warnings. Not always, but most of the time this is followed by an I/O storage error linked to random VMs, which get paused when that happens.

All the nodes use mode 4 bonding. The interfaces on the nodes don't show any drops or errors. I checked two of the VMs that got paused the last time it happened; they have dropped packets on their interfaces.

We don't have a subscription with Nexenta (anymore).

On 04/21/2015 04:41 PM, InterNetX - Juergen Gotteswinter wrote:

On 21.04.2015 at 16:19, Maikel vd Mosselaar wrote:

Hi Juergen,

The load on the nodes rises far over 200 during the event. Load on the Nexenta stays normal, and there is nothing strange in the logging.

ZFS + NFS could still be the root of this. Is your pool configuration raidzX or mirror, with or without a ZIL? Is the sync parameter of the exported ZFS subvolume kept at the default (standard)?

http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-performance-with-zil.html

Since oVirt is very sensitive to storage latency (it throws VMs into an unresponsive or unknown state), it might be worth a try to do "zfs set sync=disabled pool/volume" to see if this changes things. But be aware that this makes the NFS export vulnerable to data loss in case of power loss etc., comparable to async NFS on Linux. If disabling the sync setting helps and you don't use a separate ZIL flash drive yet, adding one would very likely help to get rid of this. Also, if you run a subscribed version of Nexenta, it might be helpful to involve them.

Do you see any messages about high latency in the oVirt events panel?

For our storage interfaces on our nodes we use bonding in mode 4 (802.3ad), 2x 1Gb. The Nexenta also has a 4x 1Gb bond in mode 4.

This should be fine, as long as no node uses mode 0 / round robin, which would lead to out-of-order TCP packets. The interfaces themselves don't show any drops or errors - on the VM hosts as well as on the switch itself? Jumbo frames?

Kind regards,
Maikel

On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:

Hi,

How about load, latency, or strange dmesg messages on the Nexenta? Are you using bonded Gbit networking? If yes, which mode?

Cheers,
Juergen

On 20.04.2015 at 14:25, Maikel vd Mosselaar wrote:

Hi,

We are running oVirt 3.5.1 with 3 nodes and a separate engine, all on CentOS 6.6:

3 x nodes
1 x engine
1 x storage (Nexenta with NFS)

For multiple weeks we have been experiencing issues where our nodes cannot access the storage at random moments (at least that's what the nodes think). When the nodes complain about unavailable storage, the load rises to over 200 on all three nodes, which makes all running VMs inaccessible. During this process the oVirt event viewer shows some I/O storage error messages; when this happens, random VMs get paused and will not be resumed anymore (this happens almost every time, but not all the VMs get paused).

During the event we tested accessibility from the nodes to the storage, and it looks like it is working normally; at least we can do a normal ls on the storage without any delay in showing the contents.

We have tried multiple things that we thought might be causing this issue, but nothing has worked so far:
* rebooting storage / nodes / engine
* disabling offsite rsync backups
* moving the biggest VMs with the highest load to a different platform outside of oVirt
* checking the wsize and rsize on the NFS mounts; storage and nodes are correct according to the NFS troubleshooting page on ovirt.org

The environment is running in production, so we are not free to test everything. I can provide log files if needed.

Kind regards,
Maikel
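[For illustration, the migration Juergen suggests could look roughly like this. The spare disk names are taken from the zpool status above; the pool name "temppool", the mirror layout, and the dataset "z2pool/vmstore" are assumptions, not details confirmed in the thread:

# Release the four hot spares from the existing pool
zpool remove z2pool c0t5000C50041A6B737d0 c0t5000C50041AC3F07d0 \
    c0t5000C50041AD48DBd0 c0t5000C50041ADD727d0

# Build a temporary pool out of them (two mirrored pairs shown here)
zpool create temppool \
    mirror c0t5000C50041A6B737d0 c0t5000C50041AC3F07d0 \
    mirror c0t5000C50041AD48DBd0 c0t5000C50041ADD727d0

# Snapshot the exported dataset and copy it over
zfs snapshot -r z2pool/vmstore@migrate
zfs send -R z2pool/vmstore@migrate | zfs receive -F temppool/vmstore

A final incremental send after pausing the VMs would keep the cut-over downtime short.]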
Re: [ovirt-users] storage issues with oVirt 3.5.1 + Nexenta NFS
On Wed, 2015-04-22 at 11:54 +0200, Maikel vd Mosselaar wrote:

Yes, we are aware of that. The problem is that it is running in production, so it is not very easy to change the pool.

On 04/22/2015 11:48 AM, InterNetX - Juergen Gotteswinter wrote:

I expect you are aware of the fact that you only get the write performance of a single disk in that configuration? I would drop that pool configuration, drop the spare drives, and go for a mirror pool.

^ What he said :)

That, or if you have more space, add another 2 disks and use them plus the spare drives to add a second raidz(1|2|3) vdev. What drives do you use for data, log, and cache?

/K
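[Karli's alternative, sketched under the same caveats: the four spare names come from the earlier zpool status, while NEWDISK1 and NEWDISK2 are placeholders for the two additional drives:

# Free the spares, then combine them with two new disks into a second
# raidz1 vdev; a second top-level vdev roughly doubles write IOPS.
zpool remove z2pool c0t5000C50041A6B737d0 c0t5000C50041AC3F07d0 \
    c0t5000C50041AD48DBd0 c0t5000C50041ADD727d0
zpool add z2pool raidz1 NEWDISK1 NEWDISK2 \
    c0t5000C50041A6B737d0 c0t5000C50041AC3F07d0 \
    c0t5000C50041AD48DBd0 c0t5000C50041ADD727d0]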
Re: [ovirt-users] storage issues with oVirt 3.5.1 + Nexenta NFS
On 22.04.2015 at 11:12, Maikel vd Mosselaar wrote:

Our pool is configured as Z1 with a ZIL (normal SSD); the sync parameter is on the default setting (standard), so sync is on.

For testing, I would give "zfs set sync=disabled pool/vol" a shot, but as I already said, that's nothing you should keep in production.

What I have also had in the past: the filer saturated the maximum lockd/nfs processes, whose defaults are quite low. Don't worry about pushing the NFS threads up to 512+; the same goes for lockd.

To get your current values:

sharectl get nfs

For example, one of my filers, which is hammered pretty heavily through NFS most of the time, uses these settings:

servers=1024
lockd_listen_backlog=32
lockd_servers=1024
lockd_retransmit_timeout=5
grace_period=90
server_versmin=2
server_versmax=3
client_versmin=2
client_versmax=4
server_delegation=on
nfsmapid_domain=
max_connections=-1
protocol=ALL
listen_backlog=32
device=
mountd_listen_backlog=64
mountd_max_threads=16

To change them, use sharectl, or put the following into /etc/system and reboot:

set rpcmod:clnt_max_conns = 8
set rpcmod:maxdupreqs=8192
set rpcmod:cotsmaxdupreqs=8192
set nfs:nfs3_max_threads=1024
set nfs:nfs3_nra=128
set nfs:nfs3_bsize=1048576
set nfs:nfs3_max_transfer_size=1048576
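[As a hedged illustration of applying both suggestions on the filer. "z2pool/vmstore" stands in for whichever dataset is actually exported, and the thread counts simply reuse Juergen's numbers:

# TEST ONLY: trades data safety for latency; revert when done.
zfs set sync=disabled z2pool/vmstore
# ...watch whether the stalls disappear, then restore the default:
zfs set sync=standard z2pool/vmstore

# Inspect the current NFS settings, then raise the thread limits:
sharectl get nfs
sharectl set -p servers=1024 nfs
sharectl set -p lockd_servers=1024 nfs]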
Re: [ovirt-users] storage issues with oVirt 3.5.1 + Nexenta NFS
Our pool is configured as Z1 with a ZIL (normal SSD); the sync parameter is on the default setting (standard), so sync is on.

When the issue happens, the oVirt event viewer does indeed show latency warnings. Not always, but most of the time this is followed by an I/O storage error linked to random VMs, which get paused when that happens.

All the nodes use mode 4 bonding. The interfaces on the nodes don't show any drops or errors. I checked two of the VMs that got paused the last time it happened; they have dropped packets on their interfaces.

We don't have a subscription with Nexenta (anymore).
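[On the network side, a quick sketch of how the bond mode and the error/drop counters discussed in this thread can be double-checked on a CentOS 6 node; bond0 and eth0 are assumed interface names:

# On each node: confirm the bond really negotiated 802.3ad (mode 4)
grep -i "bonding mode" /proc/net/bonding/bond0

# Per-interface RX/TX error and drop counters on the host
ip -s link show bond0

# Inside an affected VM, check its own interface the same way
ip -s link show eth0]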
Re: [ovirt-users] storage issues with oVirt 3.5.1 + Nexenta NFS
On Wed, 2015-04-22 at 11:12 +0200, Maikel vd Mosselaar wrote:

Our pool is configured as Z1 with a ZIL (normal SSD); the sync parameter is on the default setting (standard), so sync is on.

# zpool status ?

/K
[ovirt-users] oVirt 3.6 new feature: Affinity Group Enforcement Service
Greetings users and developers,

I'm developing a new feature, the Affinity Rules Enforcement Service. In summary: if an anti-affinity policy is applied to VMs that are currently running on the same hypervisor, the engine will not automatically migrate one of them off once the policy is applied. The Affinity Rules Enforcement Service will periodically check that affinity rules are being enforced and will migrate VMs if necessary in order to comply with those rules.

You're welcome to review the feature page: http://www.ovirt.org/Affinity_Group_Enforcement_Service

The feature page responds to RFE https://bugzilla.redhat.com/show_bug.cgi?id=1112332

Your comments are appreciated.

Thanks,
Tomer