Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
On 03/09/2015 07:12 AM, Simone Tiraboschi wrote:
> ----- Original Message -----
>> From: Bob Doolittle b...@doolittle.us.com
>> To: Simone Tiraboschi stira...@redhat.com
>> Sent: Monday, March 9, 2015 12:02:49 PM
>> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
>>
>> On Mar 9, 2015 5:23 AM, Simone Tiraboschi stira...@redhat.com wrote:
>>> ----- Original Message -----
>>>> From: Bob Doolittle b...@doolittle.us.com
>>>> To: users-ovirt users@ovirt.org
>>>> Sent: Friday, March 6, 2015 9:21:20 PM
>>>> Subject: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
>>>>
>>>> Hi,
>>>>
>>>> I'm following the instructions here: http://www.ovirt.org/Hosted_Engine_Howto
>>>> My self-hosted install failed near the end:
>>>>
>>>> To continue make a selection from the options below:
>>>> (1) Continue setup - engine installation is complete
>>>> (2) Power off and restart the VM
>>>> (3) Abort setup
>>>> (4) Destroy VM and abort setup
>>>> (1, 2, 3, 4)[1]: 1
>>>> [ INFO ] Engine replied: DB Up!Welcome to Health Status!
>>>> Enter the name of the cluster to which you want to add the host (Default) [Default]:
>>>> [ ERROR ] Cannot automatically add the host to cluster Default: Cannot add Host. Connecting to host via SSH has failed, verify that the host is reachable (IP address, routable address etc.) You may refer to the engine.log file for further details.
>>>> [ ERROR ] Failed to execute stage 'Closing up': Cannot add the host to cluster Default
>>>> [ INFO ] Stage: Clean up
>>>> [ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150306135624.conf'
>>>> [ INFO ] Stage: Pre-termination
>>>> [ INFO ] Stage: Termination
>>>>
>>>> I can ssh into the engine VM both locally and remotely. There is no /root/.ssh directory, however. Did I need to set that up somehow?
>>>
>>> It's the engine that needs to open an SSH connection to the host, calling it by its hostname.
>>> So please be sure that you can SSH to the host from the engine using its hostname and not its IP address.
>>
>> I'm assuming this should be a password-less login (key-based authentication?).
>
> Yes, it is.
>
>> As what user?
>
> root

OK, I see a couple of problems.

First off, I didn't have my deploying-host hostname in the hosts map for my engine.

After adding it to /etc/hosts (both hostname and FQDN), when I try to ssh from root@engine to root@host it prompts me for a password.

On my engine, ~root/.ssh does not contain any keys.
On my host, ~root/.ssh has authorized_keys, and in it there is a key with the comment ovirt-engine.

It's possible that I inadvertently removed ~root/.ssh on the engine while I was preparing it (I started to set up my own no-password logins and then thought better of it and cleaned up, not realizing that some prior setup affecting that directory had occurred). That would explain the second issue.

How/when does the key for root@engine get populated to the host's ~root/.ssh/authorized_keys during setup?

-Bob

>> -Bob
>>
>>> Previously, hosted-engine hosts were simply identified by their IP address, but then we had some bug reports on side effects of that.
>>> So now we generate and sign certs using host hostnames, and so the engine should be able to correctly resolve them.
>>>
>>>> When I log into the Administration portal, the engine VM does not appear under the Virtual machine view (it's empty).
>>>
>>> It's because the setup didn't complete.
>>>
>>>> I've attached what I think are the relevant logs.
>>>>
>>>> Also, when my host reboots, the ovirt-ha-broker and ovirt-ha-agent services do not come up automatically. I have to use systemctl to start them manually.
>>>
>>> It's because the setup didn't complete.
>>>
>>>> This is a fresh Fedora 20 machine installing a fresh copy of Ovirt 3.5.1. What's the cleanest approach to restore/complete sanity of my setup please?
>>>
>>> The first step is to clarify what went wrong in order to avoid it in the future.
>>> Then, if you want a really sane environment for production use, I'd suggest redeploying.
>>> So: run hosted-engine --vm-poweroff, empty the storage domain share, and deploy again.
>>>
>>>> Thanks,
>>>> Bob
>>>>
>>>> I've linked 3 files to this email:
>>>> server.log (12.4 MB) Dropbox https://db.tt/g5p09AaD
>>>> vdsm.log (3.2 MB) Dropbox https://db.tt/P4572SUm
>>>> ovirt-hosted-engine-setup-20150306123622-tad1fy.log (413 KB) Dropbox https://db.tt/XAM9ffhi

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] VDSM memory consumption
On Mar 9, 2015, at 4:51 AM, Dan Kenigsberg dan...@redhat.com wrote:
> On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote:
>> I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd still leak slowly, ~300k/hr, yes.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>>
>> On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote:
>>> Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said:
>>>> I am experiencing troubles with VDSM memory consumption.
>>>>
>>>> I am running
>>>> Engine: ovirt 3.5.1
>>>> Nodes: Centos 6.6
>>>> VDSM 4.16.10-8
>>>> Libvirt: libvirt-0.10.2-46
>>>> Kernel: 2.6.32
>>>>
>>>> When the host boots, memory consumption is normal, but after 2 or 3 days running, VDSM memory consumption grows and it consumes more memory than all the VMs running on the host. If I restart the vdsm service, memory consumption normalizes, but then it starts growing again.
>>>>
>>>> I have seen some BZs about memory leaks in vdsm and supervdsm, but I don't know if VDSM 4.16.10-8 is still affected by a related bug.
>>>
>>> Can't help, but I see the same thing with CentOS 7 nodes and the same version of vdsm.
>>> --
>>> Chris Adams c...@cmadams.net
>
> I'm afraid that we are yet to find a solution for this issue, which is completely different from the horrible leak of supervdsm 4.16.7.
>
> Could you corroborate the claim of Bug 1147148 - M2Crypto usage in vdsm leaks memory? Does the leak disappear once you start using plaintext transport?
>
> Regards,
> Dan.

I don’t think this is crypto related, but I could try that if you still need some confirmation (and point me at a quick doc on switching to plaintext?).

This is from #ovirt around November 18th I think, Saggi thought he’d found something related:

9:58:43 AM saggi: YamakasY: Found the leak
9:58:48 AM saggi: YamakasY: Or at least the flow
9:58:57 AM saggi: YamakasY: The good news is that I can reproduce
9:59:20 AM YamakasY: saggi: that's kewl!
9:59:25 AM YamakasY: saggi: what happens ?
9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he sees it going faster on gluster usage
10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS graph. The flatlines are when I stopped calling it and called other verbs. http://i.imgur.com/CLm0Q75.png
10:02:46 AM saggi: YamakasY: horizontal is time since epoch and vertical is RSS in bytes
10:03:52 AM YamakasY: saggi: I have seen that line so much!
10:04:11 AM YamakasY: I think I even made a mailing about it
10:04:18 AM YamakasY: at least asked here
10:04:32 AM YamakasY: no-one knew, but those lines are almost blowing you away
10:04:35 AM YamakasY: can we patch it ?
10:04:59 AM YamakasY: wow, nice one to catch
10:05:28 AM saggi: YamakasY: I now have a smaller part of the code to scan through and a way to reproduce so hopefully I'll have a patch soon

Was that ever followed up on?
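To put a number like the "~300k/hr" above on a given host, one option is to sample the resident set size of the vdsmd process over time and fit a line through the samples. The following is only a minimal sketch under stated assumptions: `read_rss_bytes` and `growth_per_hour` are hypothetical helper names (not part of VDSM), the RSS is read from the Linux `/proc/<pid>/status` file, and the samples are `(seconds, bytes)` pairs collected however is convenient.

```python
def read_rss_bytes(pid):
    """Read the resident set size of a process from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The kernel reports this field in kB.
                return int(line.split()[1]) * 1024
    raise ValueError(f"no VmRSS field found for pid {pid}")


def growth_per_hour(samples):
    """Least-squares slope of (timestamp_seconds, rss_bytes) samples, in bytes/hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return num / den * 3600


if __name__ == "__main__":
    # Made-up samples, one per hour, purely for illustration.
    demo = [(0, 100_000_000), (3600, 100_300_000), (7200, 100_600_000)]
    print(f"{growth_per_hour(demo):.0f} bytes/hour")
```

A steady positive slope over many hours, surviving idle periods, would be consistent with a leak; a sawtooth or a plateau would point elsewhere.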
Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
----- Original Message -----
> From: Bob Doolittle b...@doolittle.us.com
> To: Simone Tiraboschi stira...@redhat.com
> Cc: users-ovirt users@ovirt.org
> Sent: Monday, March 9, 2015 6:26:30 PM
> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
>
> On 03/09/2015 12:53 PM, Simone Tiraboschi wrote:
>>> OK, I see a couple of problems.
>>>
>>> First off, I didn't have my deploying-host hostname in the hosts map for my engine.
>>
>> This by itself is enough to make the deploy procedure fail. If possible, we recommend relying on a DNS infrastructure, especially if you are deploying more than one host.
>
> OK, I've started over. Simply removing the storage domain was insufficient: the hosted-engine deploy failed when it found the HA and Broker services already configured. I decided to just start over fresh, beginning with re-installing the OS on my host.
>
> I can't deploy DNS at the moment, so I have to simply replicate /etc/hosts files on my host/engine. I did that this time, but have run into a new problem:
>
> [ INFO ] Engine replied: DB Up!Welcome to Health Status!
> Enter the name of the cluster to which you want to add the host (Default) [Default]:
> [ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
> [ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
> [ ERROR ] Unable to add ovirt-vm to the manager
> Please shutdown the VM allowing the system to launch it as a monitored service.
> The system will wait until the VM is down.
> [ ERROR ] Failed to execute stage 'Closing up': [Errno 111] Connection refused
> [ INFO ] Stage: Clean up
> [ ERROR ] Failed to execute stage 'Clean up': [Errno 111] Connection refused
>
> I've attached my engine log and the ovirt-hosted-engine-setup log. I think I had an issue with resolving external hostnames, or else a connectivity issue during the install.

For some reason your engine wasn't able to deploy your hosts, but this time the SSH session was established.

2015-03-09 13:05:58,514 ERROR [org.ovirt.engine.core.bll.InstallVdsInternalCommand] (org.ovirt.thread.pool-8-thread-3) [3cf91626] Host installation failed for host
Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
----- Original Message -----
> From: Bob Doolittle b...@doolittle.us.com
> To: Simone Tiraboschi stira...@redhat.com
> Cc: users-ovirt users@ovirt.org
> Sent: Monday, March 9, 2015 12:48:37 PM
> Subject: Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
>
> OK, I see a couple of problems.
>
> First off, I didn't have my deploying-host hostname in the hosts map for my engine.

This by itself is enough to make the deploy procedure fail. If possible, we recommend relying on a DNS infrastructure, especially if you are deploying more than one host.

> After adding it to /etc/hosts (both hostname and FQDN), when I try to ssh from root@engine to root@host it prompts me for a password.
>
> On my engine, ~root/.ssh does not contain any keys.
> On my host, ~root/.ssh has authorized_keys, and in it there is a key with the comment ovirt-engine.
>
> It's possible that I inadvertently removed ~root/.ssh on the engine while I was preparing it (I started to set up my own no-password logins and then thought better of it and cleaned up, not realizing that some prior setup affecting that directory had occurred). That would explain the second issue.

No, it's OK: the private key is contained in /etc/pki/ovirt-engine/keys/engine.p12

> How/when does the key for root@engine get populated to the host's ~root/.ssh/authorized_keys during setup?

It's part of the hosted-engine deploy procedure: when the engine setup on the VM is completed, the deploy procedure gathers the engine SSH public key from http://{enginefqdn}/engine.ssh.key.txt and stores it under ~root/.ssh/authorized_keys, so that the engine can add the host without knowing the host root password.
Then hosted-engine setup contacts the engine via REST APIs to trigger the host setup procedure.
If the engine wasn't able to contact the host due to bad hostname resolution, as we pointed out, you missed some steps needed for a safe deployment.

> -Bob
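The key-installation step described above (a public key ending up in root's authorized_keys on the host) can be illustrated with a small idempotent helper. This is a sketch only, not the code hosted-engine setup actually runs: `add_authorized_key` is a hypothetical name, the download of the key from the engine is left out, and the permission bits are simply the ones OpenSSH conventionally expects.

```python
import os


def add_authorized_key(path, key_line):
    """Append key_line to the file at path unless it is already listed.

    Returns True if the file was changed, False if the key was present.
    """
    key_line = key_line.strip()
    text = ""
    if os.path.exists(path):
        with open(path) as existing:
            text = existing.read()
    if key_line in (line.strip() for line in text.splitlines()):
        return False

    parent = os.path.dirname(path)
    if parent:
        # The .ssh directory is conventionally private to its owner.
        os.makedirs(parent, mode=0o700, exist_ok=True)
    with open(path, "a") as target:
        # Avoid gluing the new key onto an unterminated last line.
        if text and not text.endswith("\n"):
            target.write("\n")
        target.write(key_line + "\n")
    os.chmod(path, 0o600)
    return True
```

Checking for the key before appending keeps repeated runs from piling up duplicate entries, which matters if a deploy is retried on the same host.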
Re: [ovirt-users] VDSM memory consumption
Once upon a time, Dan Kenigsberg dan...@redhat.com said:
> I'm afraid that we are yet to find a solution for this issue, which is completely different from the horrible leak of supervdsm 4.16.7.
>
> Could you corroborate the claim of Bug 1147148 - M2Crypto usage in vdsm leaks memory? Does the leak disappear once you start using plaintext transport?

So, to confirm, it looks like the steps to do that would be:

- In the [vars] section of /etc/vdsm/vdsm.conf, set ssl = false.
- Restart the vdsmd service.

Is that all that is needed?

Is it safe to restart vdsmd on a node with active VMs?

--
Chris Adams c...@cmadams.net
[ovirt-users] hosted-engine --vm-status output
Hello guys,

I installed oVirt using the hosted-engine procedure with six physical hosts and more than 60 VMs, and until now everything has been OK and my environment works fine.

I decided to use some of my hosts for other tasks, so I removed four of my six hosts and took them out of my environment.

After a few days, my second host (hosted_engine_2) started to fail. It's a hardware issue: my 10GbE interface stopped working. I decided to use my host 4 as the new hosted_engine_2. It works fine, but when I run hosted-engine --vm-status, it still returns all of the old hosted-engine members (1 to 6).

How can I fix it so that only the active nodes are listed?

See below the output of hosted-engine --vm-status:

[root@bmh0001 ~]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date                  : True
Hostname                           : bmh0001.place.brazil
Host ID                            : 1
Engine status                      : {reason: vm not running on this host, health: bad, vm: down, detail: unknown}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 68830
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=68830 (Sun Mar 8 17:38:05 2015)
    host-id=1
    score=2400
    maintenance=False
    state=EngineDown

--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : bmh0004.place.brazil
Host ID                            : 2
Engine status                      : {health: good, vm: up, detail: up}
Score                              : 2400
Local maintenance                  : False
Host timestamp                     : 2427
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=2427 (Sun Mar 8 17:38:09 2015)
    host-id=2
    score=2400
    maintenance=False
    state=EngineUp

--== Host 3 status ==--

Status up-to-date                  : False
Hostname                           : bmh0003.place.brazil
Host ID                            : 3
Engine status                      : unknown stale-data
Score                              : 0
Local maintenance                  : True
Host timestamp                     : 331389
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=331389 (Tue Mar 3 14:48:25 2015)
    host-id=3
    score=0
    maintenance=True
    state=LocalMaintenance

--== Host 4 status ==--

Status up-to-date                  : False
Hostname                           : bmh0004.place.brazil
Host ID                            : 4
Engine status                      : unknown stale-data
Score                              : 0
Local maintenance                  : True
Host timestamp                     : 364358
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=364358 (Tue Mar 3 16:10:36 2015)
    host-id=4
    score=0
    maintenance=True
    state=LocalMaintenance

--== Host 5 status ==--

Status up-to-date                  : False
Hostname                           : bmh0005.place.brazil
Host ID                            : 5
Engine status                      : unknown stale-data
Score                              : 0
Local maintenance                  : True
Host timestamp                     : 241930
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=241930 (Fri Mar 6 09:40:31 2015)
    host-id=5
    score=0
    maintenance=True
    state=LocalMaintenance

--== Host 6 status ==--

Status up-to-date                  : False
Hostname                           : bmh0006.place.brazil
Host ID                            : 6
Engine status                      : unknown stale-data
Score                              : 0
Local maintenance                  : True
Host timestamp                     : 77376
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=77376 (Wed Mar 4 09:11:17 2015)
    host-id=6
    score=0
    maintenance=True
    state=LocalMaintenance

[root@bmh0001 ~]#

Thank you very much.

--
Regards
Filipe Guarino
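When the status listing grows to this size, it can help to pick out the stale entries mechanically before deciding what to clean up. The sketch below only reads text in the layout shown above (one "Host N status" header per block, then "name : value" fields) and reports which host IDs are not up to date. It does not remove anything from the shared metadata, and `parse_vm_status` and `stale_hosts` are hypothetical helper names, not part of the hosted-engine tooling.

```python
import re

HEADER = re.compile(r"--== Host (\d+) status ==--")
# Matches "Name : value" lines whose name holds only letters, spaces and hyphens,
# which skips the key=value metadata lines and the "Extra metadata (...)" header.
FIELD = re.compile(r"^\s*([A-Za-z][A-Za-z -]*?)\s*:\s*(.*)$")


def parse_vm_status(text):
    """Return {host_id: {field_name: value}} parsed from --vm-status style text."""
    hosts = {}
    current = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            current = hosts.setdefault(int(header.group(1)), {})
            continue
        field = FIELD.match(line)
        if current is not None and field:
            current[field.group(1)] = field.group(2).strip()
    return hosts


def stale_hosts(hosts):
    """Host IDs whose 'Status up-to-date' field is anything other than True."""
    return sorted(
        host_id
        for host_id, fields in hosts.items()
        if fields.get("Status up-to-date") != "True"
    )


if __name__ == "__main__":
    import sys

    # Example: hosted-engine --vm-status | python this_script.py
    print(stale_hosts(parse_vm_status(sys.stdin.read())))
```

Applied to the output in this message, the stale set would be the four hosts reporting "unknown stale-data".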
Re: [ovirt-users] VDSM memory consumption
On Mon, Mar 09, 2015 at 10:40:51AM -0500, Darrell Budic wrote:
> I don’t think this is crypto related, but I could try that if you still need some confirmation (and point me at a quick doc on switching to plaintext?).
>
> This is from #ovirt around November 18th I think, Saggi thought he’d found something related:
>
> 9:58:43 AM saggi: YamakasY: Found the leak
> 9:58:48 AM saggi: YamakasY: Or at least the flow
> 9:58:57 AM saggi: YamakasY: The good news is that I can reproduce
> 9:59:20 AM YamakasY: saggi: that's kewl!
> 9:59:25 AM YamakasY: saggi: what happens ?
> 9:59:41 AM YamakasY: I know from Telsin (ping ping!) that he sees it going faster on gluster usage
> 10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS graph. The flatlines are when I stopped calling it and called other verbs. http://i.imgur.com/CLm0Q75.png

I do not recall what issue Saggi and YamakasY were discussing (CCing the pair), or whether it reached fruition as a patch. It is certainly something other than Bug 1158108, as the latter speaks about a leak in a normal working state, with no getCapabilities calls.
Re: [ovirt-users] VDSM memory consumption
On Mon, Mar 09, 2015 at 12:17:00PM -0500, Chris Adams wrote:
> Once upon a time, Dan Kenigsberg dan...@redhat.com said:
>> I'm afraid that we are yet to find a solution for this issue, which is completely different from the horrible leak of supervdsm 4.16.7.
>>
>> Could you corroborate the claim of Bug 1147148 - M2Crypto usage in vdsm leaks memory? Does the leak disappear once you start using plaintext transport?
>
> So, to confirm, it looks like the steps to do that would be:
>
> - In the [vars] section of /etc/vdsm/vdsm.conf, set ssl = false.
> - Restart the vdsmd service.
>
> Is that all that is needed?

No. You'd have to reconfigure libvirtd to work in plaintext:

    vdsm-tool configure --force

and also set your Engine to work in plaintext (unfortunately, I don't recall how that's done; surely Yaniv does).

> Is it safe to restart vdsmd on a node with active VMs?

It's safe in the sense that I have not heard of a single failure to reconnect to already-running VMs in years. However, this is still not recommended for a production environment, and particularly not if one of the VMs is defined as highly available. This can end up with your host being fenced and all your VMs dead.

Dan.
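For the vdsm.conf part of the change discussed in this thread, the edit itself is a one-key update in an INI-style file. The sketch below shows that single step with Python's standard configparser, operating on text rather than on /etc/vdsm/vdsm.conf directly. `set_vdsm_ssl` is a hypothetical helper name; the libvirtd reconfiguration and the Engine-side setting are not covered here, and configparser drops comments when it rewrites a file, so a hand edit may be preferable on a real host.

```python
import configparser
import io


def set_vdsm_ssl(conf_text, enabled):
    """Return conf_text with the ssl key in the [vars] section set to true/false."""
    # interpolation=None keeps any literal % characters in other values intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("vars"):
        parser.add_section("vars")
    parser.set("vars", "ssl", "true" if enabled else "false")

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    sample = "[vars]\nssl = true\n"
    print(set_vdsm_ssl(sample, enabled=False))
```

Working on a copy of the file first, and diffing it against the original, makes it easy to revert once the test is over.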
Re: [ovirt-users] VDSM memory consumption
Hi,

I also see this on the latest 3.5 version. I'm thinking about setting up a cron job to restart vdsm every night.

I cannot believe that people say they don't have this issue. Could one of the devs dive in, maybe?

Thanks!

Matt

2015-03-09 23:29 GMT+01:00 Dan Kenigsberg dan...@redhat.com:
> On Mon, Mar 09, 2015 at 10:40:51AM -0500, Darrell Budic wrote:
>> 10:01:54 AM saggi: YamakasY: it's in getCapabilities(). Here is the RSS graph. The flatlines are when I stopped calling it and called other verbs. http://i.imgur.com/CLm0Q75.png
>
> I do not recall what issue Saggi and YamakasY were discussing (CCing the pair), or whether it reached fruition as a patch. It is certainly something other than Bug 1158108, as the latter speaks about a leak in a normal working state, with no getCapabilities calls.
[ovirt-users] Error during host deploy for 3.5.1, package installation
Hello,

When deploying a new host from the admin portal to an FC20 target, the package dependency check fails (host-deploy log):

ERROR otopi.plugins.otopi.packagers.yumpackager yumpackager.error:97 Yum [
    u'vdsm-4.14.8.1-0.fc20.i686 requires vdsm-xmlrpc = 4.14.8.1-0.fc20',
    u'vdsm-4.14.8.1-0.fc20.i686 requires vdsm-python = 4.14.8.1-0.fc20',
    u'vdsm-4.14.8.1-0.fc20.i686 requires vdsm-python-zombiereaper = 4.14.8.1-0.fc20']

I've tried the release 3.5 and 3.5-snapshot repos. Installing the packages manually does not satisfy host deploy. It appears vdsm 4.16 packages are available in the repository.

Engine was previously running 3.5.0, updated to 3.5.1, no change. I was able to deploy hosts in January with 3.5.0.

Any assistance greatly appreciated!

Best
- Erik
Re: [ovirt-users] Troubles starting hosted engine
----- Original Message -----
> From: John Florian jflor...@doubledog.org
> To: users@ovirt.org
> Sent: Sunday, March 8, 2015 9:37:39 PM
> Subject: [ovirt-users] Troubles starting hosted engine
>
> I have lots of extra fun bringing up my hosted engine right now due to two issues.
>
> First, either during the hosted-engine --deploy or engine-setup (I can't remember) I was prompted for the IP address of my gateway. Since then that address has changed. I'm unable to start the engine VM if that address isn't reachable, so my temporary workaround is to add this old address onto the current gateway. How/where do I change things so that this old address can be truly retired?

It's written in /etc/ovirt-hosted-engine/hosted-engine.conf
If you deployed more than one host, you need to explicitly fix it on each of them.

> My second issue might be harder. Again during the setup I was prompted for a location of an ISO file for installing the engine's OS. That location is served by NFS and is auto-mounted by /etc/fstab (and systemd). Here's the hitch: my NFS server is now a VM in my cluster. :-) Since I only have a single hypervisor host right now, that ISO isn't reachable when I'm trying to start my engine VM so that I can also start the VM that provides the NFS share. I'm getting away with evil right now by touching an empty file at the same path, which gets obscured once the NFS share is mounted, but it's enough.

You need that ISO file just to install the OS when you create the engine VM on the first host: you don't need a shared domain for that.
So my suggestion is just to copy that ISO image to the first host and use it locally. You can delete it when the setup is done.

> It's not at all clear to me how I'm supposed to edit things for my hosted engine setup.
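For the gateway change discussed above, the edit amounts to replacing one value in a key=value file on every deployed host. The sketch below shows that replacement on text, so it can be tried safely before touching /etc/ovirt-hosted-engine/hosted-engine.conf. Two things here are assumptions rather than facts from this thread: that the file uses plain key=value lines, and that the relevant key is named gateway. `set_conf_value` is a hypothetical helper name and the addresses are documentation placeholders.

```python
def set_conf_value(conf_text, key, value):
    """Return conf_text with key=value replaced, or appended if key is absent."""
    out = []
    found = False
    for line in conf_text.splitlines():
        name, sep, _ = line.partition("=")
        is_comment = line.lstrip().startswith("#")
        if sep and not is_comment and name.strip() == key:
            out.append(f"{key}={value}")
            found = True
        else:
            # Everything else, including comments, is passed through untouched.
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # The key name "gateway" and both addresses are placeholders for illustration.
    sample = "fqdn=engine.example\ngateway=192.0.2.1\n"
    print(set_conf_value(sample, "gateway", "192.0.2.254"), end="")
```

Keeping a backup of the original file on each host before writing the result back is a cheap safeguard, since the HA services read this file at startup.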
Re: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
----- Original Message -----
> From: Bob Doolittle b...@doolittle.us.com
> To: users-ovirt users@ovirt.org
> Sent: Friday, March 6, 2015 9:21:20 PM
> Subject: [ovirt-users] Error during hosted-engine-setup for 3.5.1 on F20 (Cannot add the host to cluster ... SSH has failed)
>
> Hi,
>
> I'm following the instructions here: http://www.ovirt.org/Hosted_Engine_Howto
>
> My self-hosted install failed near the end:
>
> To continue make a selection from the options below:
> (1) Continue setup - engine installation is complete
> (2) Power off and restart the VM
> (3) Abort setup
> (4) Destroy VM and abort setup
>
> (1, 2, 3, 4)[1]: 1
> [ INFO ] Engine replied: DB Up!Welcome to Health Status!
> Enter the name of the cluster to which you want to add the host (Default) [Default]:
> [ ERROR ] Cannot automatically add the host to cluster Default: Cannot add Host. Connecting to host via SSH has failed, verify that the host is reachable (IP address, routable address etc.) You may refer to the engine.log file for further details.
> [ ERROR ] Failed to execute stage 'Closing up': Cannot add the host to cluster Default
> [ INFO ] Stage: Clean up
> [ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20150306135624.conf'
> [ INFO ] Stage: Pre-termination
> [ INFO ] Stage: Termination
>
> I can ssh into the engine VM both locally and remotely. There is no /root/.ssh directory, however. Did I need to set that up somehow?

It's the engine that needs to open an SSH connection to the host, calling it by its hostname. So please be sure that you can SSH to the host from the engine using its hostname and not its IP address.

Previously, hosted-engine hosts were simply identified by their IP address, but then we had some bug reports about side effects of that. So now we generate and sign certs using the hosts' hostnames, and so the engine should be able to correctly resolve them.

> When I log into the Administration portal, the engine VM does not appear under the Virtual machine view (it's empty).

That's because the setup didn't complete.

> I've attached what I think are the relevant logs.
>
> Also, when my host reboots, the ovirt-ha-broker and ovirt-ha-agent services do not come up automatically. I have to use systemctl to start them manually.

That's because the setup didn't complete.

> This is a fresh Fedora 20 machine installing a fresh copy of oVirt 3.5.1. What's the cleanest approach to restore/complete sanity of my setup please?

The first step is to clarify what went wrong in order to avoid it in the future. Then, if you want a really sane environment for production use, I'd suggest redeploying. So: run hosted-engine --vm-poweroff, empty the storage domain share, and deploy again.

> Thanks,
> Bob
>
> I've linked 3 files to this email:
> server.log (12.4 MB) Dropbox https://db.tt/g5p09AaD
> vdsm.log (3.2 MB) Dropbox https://db.tt/P4572SUm
> ovirt-hosted-engine-setup-20150306123622-tad1fy.log (413 KB) Dropbox https://db.tt/XAM9ffhi

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
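The check described in the reply (the engine must reach the host over SSH by hostname, not by IP) can be sketched in Python and run from the engine VM. This is only a rough illustration, not part of hosted-engine-setup: the helper name `check_ssh_by_hostname` is hypothetical, and it verifies only that the name resolves and that the SSH port accepts a TCP connection, not that authentication succeeds.

```python
import socket


def check_ssh_by_hostname(hostname, port=22, timeout=5.0):
    """Return (resolved_addresses, reachable) for hostname:port.

    Hypothetical helper: resolves the name with the system resolver,
    then tries a plain TCP connection to each resolved address.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name does not resolve: check DNS or /etc/hosts on this machine.
        return [], False

    addresses = sorted({info[4][0] for info in infos})
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return addresses, True
        except OSError:
            # Refused, timed out, or unreachable: try the next address.
            continue
    return addresses, False
```

Calling it with the host's fully qualified name (whatever name was given to the setup; none is stated in this thread) returns the addresses the name resolved to and whether port 22 answered. An empty address list points at name resolution rather than at SSH itself.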
Re: [ovirt-users] engine-image-uploader failing to update OVF
----- Original Message -----
> From: Stephen Repetski srepe...@srepetsk.net
> To: users users@ovirt.org
> Sent: Friday, March 6, 2015 8:46:38 PM
> Subject: [ovirt-users] engine-image-uploader failing to update OVF
>
> Hi all,
>
> I'm trying to import an OVA (containing .ovf, disk, and disk.meta) into my ovirt environment, but it's failing during the import where it seems that it shouldn't. The command I'm running is:
>
> export TMPDIR=/data/backup/test; engine-image-uploader upload /data/convert/srepetsk-vm.ova --insecure --name=srepetsk-testvmimport -n $nfs_server:/backup/2fff9385-10b8-41e5-93c6-c0ef18b9840f -v
>
> The command mounts the nfs server, extracts the OVA into /data/backup/test/tmpxEpuMc/, parses the OVF file (/data/backup/test/tmpxEpuMc/srepetsk-vm.ovf), and creates the new .meta file and whatnot for the disk image. It then proceeds to fail saying:
>
> ERROR: Unable to update the OVF XML file. Message: [Errno 2] No such file or directory: '/data/backup/test/tmpxEpuMc/srepetsk-vm.ovf'
>
> However, this is the same file that it extracted and read earlier. What might I be doing wrong?

At that point it has to update the OVF file, so it has to be able to write it. Is /data/backup/test/ a local directory? Can you please check the SELinux logs?

> Full(er) log:
>
> DEBUG: local extract directory for OVF is /data/backup/test/tmpxEpuMc
> DEBUG: Size of /data/convert/srepetsk-vm.ova: 17179876069 bytes 16777222.7 1K-blocks 16384.0 MB
> DEBUG: Available space in /data/backup/test/tmpxEpuMc: 5206184878080 bytes 5084164920.0 1K-blocks 4965004.8 MB
> DEBUG: File is /data/backup/test/tmpxEpuMc/srepetsk-vm.ovf
> DEBUG: tag(Section) text(None) attr({'{http://www.w3.org/2001/XMLSchema-instance}type': 'ovf:VirtualHardwareSection_Type'}) class(Element Section at 1e752b8)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e75368)
> DEBUG: tag({http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData}Caption) value(1 virtual cpu)
> [snip]
> DEBUG: old meta file(/data/backup/test/tmpxEpuMc/5f63da48-3ced-42ad-b684-72b626aec727.meta) new meta file(/data/backup/test/tmpxEpuMc/fffc878e-8df9-4b81-b04a-614a0af437a3.meta)
> DEBUG: old dir(/data/backup/test/tmpxEpuMc) new dir(/data/backup/test/4cd70a4f-f979-4b42-a416-2c8b4d028a88)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e75470)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e754c8)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e75520)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e75578)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e755d0)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e75628)
> DEBUG: tag(Section) text(None) attr({'{http://www.w3.org/2001/XMLSchema-instance}type': 'ovf:VirtualHardwareSection_Type'}) class(Element Section at 1e752b8)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e75368)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e753c0)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e75418)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e75470)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e754c8)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e75520)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e75578)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e755d0)
> DEBUG: item tag(Item) item text(None) item attr({}) class(Element Item at 1e75628)
> ERROR: Unable to update the OVF XML file. Message: [Errno 2] No such file or directory: '/data/backup/test/tmpxEpuMc/srepetsk-vm.ovf'
> DEBUG: Cleaning up OVF extract directory /data/backup/test/tmpxEpuMc
> DEBUG: [Errno 2] No such file or directory: '/data/backup/test/tmpxEpuMc'
> DEBUG: /bin/umount -t nfs -f /data/backup/test/tmpd8kH1X
> DEBUG: /bin/umount -t nfs -f /data/backup/test/tmpd8kH1X
> DEBUG: _cmds(['/bin/umount', '-t', 'nfs', '-f', '/data/backup/test/tmpd8kH1X'])
> DEBUG: returncode(0)
> DEBUG: STDOUT()
> DEBUG: STDERR()
>
> Thanks,
> Stephen
>
> Stephen Repetski
> Rochester Institute of Technology '13 | http://srepetsk.net
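The questions in the reply (does the file still exist at that path, and can it be written?) can be checked with a small Python sketch. This is an illustration only and not code from engine-image-uploader; the helper name `describe_path` is hypothetical. It does not replace looking at the SELinux audit log. Note also that the debug log prints an `old dir(...) new dir(...)` pair before the error; whether that is related to the missing file is not established in this thread.

```python
import os


def describe_path(path):
    """Report whether `path` exists and could be rewritten in place.

    Hypothetical diagnostic helper; not part of engine-image-uploader.
    """
    parent = os.path.dirname(os.path.abspath(path))
    return {
        "exists": os.path.exists(path),
        "parent_exists": os.path.isdir(parent),
        # Rewriting a file in place needs write access to the file itself.
        "file_writable": os.access(path, os.W_OK),
        # Creating or replacing it needs write and search access to the parent.
        "parent_writable": os.access(parent, os.W_OK | os.X_OK),
    }
```

Running `describe_path("/data/backup/test/tmpxEpuMc/srepetsk-vm.ovf")` while the upload is paused or right after the failure would show whether the extract directory is still present under its original name. An `[Errno 2]` error is consistent with `exists` or `parent_exists` being false rather than with a pure permission problem, which would normally surface as `[Errno 13]`.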
Re: [ovirt-users] VDSM memory consumption
On Fri, Mar 06, 2015 at 10:58:53AM -0600, Darrell Budic wrote:
> I believe the supervdsm leak was fixed, but 3.5.1 versions of vdsmd still leak slowly, ~300k/hr, yes.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1158108
>
> On Mar 6, 2015, at 10:23 AM, Chris Adams c...@cmadams.net wrote:
> > Once upon a time, Federico Alberto Sayd fs...@uncu.edu.ar said:
> > > I am experiencing trouble with VDSM memory consumption.
> > >
> > > I am running
> > > Engine: ovirt 3.5.1
> > > Nodes: Centos 6.6
> > > VDSM 4.16.10-8
> > > Libvirt: libvirt-0.10.2-46
> > > Kernel: 2.6.32
> > >
> > > When the host boots, memory consumption is normal, but after 2 or 3 days of running, VDSM memory consumption grows until it consumes more memory than all the VMs running on the host. If I restart the vdsm service, memory consumption returns to normal, but then it starts growing again.
> > >
> > > I have seen some BZs about memory leaks in vdsm and supervdsm, but I don't know if VDSM 4.16.10-8 is still affected by a related bug.
> >
> > Can't help, but I see the same thing with CentOS 7 nodes and the same version of vdsm.
> > --
> > Chris Adams c...@cmadams.net

I'm afraid that we have yet to find a solution for this issue, which is completely different from the horrible leak of supervdsm 4.16.7.

Could you corroborate the claim of Bug 1147148 - M2Crypto usage in vdsm leaks memory? Does the leak disappear once you start using plaintext transport?

Regards,
Dan.
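To put a number on the growth discussed in this thread (for example, to compare a host before and after switching transport), the resident memory of the vdsm process can be sampled from `/proc`. The following is a minimal sketch under the assumption of a Linux host; the function names are hypothetical and nothing here is part of VDSM.

```python
import re


def parse_vm_rss_kb(status_text):
    """Extract VmRSS in kB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vm_rss_kb(pid):
    """Read the current resident set size of a process, in kB."""
    with open("/proc/%d/status" % pid) as handle:
        return parse_vm_rss_kb(handle.read())


def growth_kb_per_hour(first, last):
    """Average RSS growth between two (unix_seconds, rss_kb) samples."""
    (t0, rss0), (t1, rss1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (rss1 - rss0) * 3600.0 / (t1 - t0)
```

Two samples taken a few hours apart, each a `(time.time(), read_vm_rss_kb(pid))` pair, are enough for a rough rate. Resident size fluctuates with load, so a longer interval gives a more trustworthy figure than a short one.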