Re: [Users] hosted engine help
On 03/13/2014 05:25 PM, Jason Brooks wrote:

- Original Message -
From: Greg Padgett gpadg...@redhat.com
To: Jason Brooks jbro...@redhat.com
Cc: Sandro Bonazzola sbona...@redhat.com, users@ovirt.org, Martin Sivak msi...@redhat.com
Sent: Tuesday, March 11, 2014 7:52:42 AM
Subject: Re: [Users] hosted engine help

On 03/11/2014 04:09 PM, Sandro Bonazzola wrote:
On 07/03/2014 01:10, Jason Brooks wrote:

Hey everyone,

I've been testing out oVirt 3.4 w/ hosted engine, and while I've managed to bring the engine up, I've only been able to do it manually, using hosted-engine --vm-start. The ovirt-ha-agent service fails reliably for me, erroring out with RequestError: Request failed: success. I've pasted error passages from the ha agent and vdsm logs below. Any pointers?

Regards, Jason

*** ovirt-ha-agent.log

MainThread::CRITICAL::2014-03-06 18:48:30,622::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
    self._run_agent()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
    hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 303, in start_monitoring
    for old_state, state, delay in self.fsm:
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 125, in next
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 623, in collect_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 198, in _checked_communicate
    raise RequestError("Request failed: {0}".format(msg))
RequestError: Request failed: success

vdsm.log

Thread-29::ERROR::2014-03-06 18:48:11,101::API::1607::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1598, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 198, in _checked_communicate
    raise RequestError("Request failed: {0}".format(msg))
RequestError: Request failed: success

Greg, Martin, Request failed: success?

Hi Jason,

I talked to Martin about this and opened a bug [1]/submitted a patch [2]. Based on your mail, I'm not sure if you experienced a race condition or some other issue. This patch should help the former case, but if you're still experiencing problems then we would need to investigate further.

I made these changes to my install and now I get a different error:

MainThread::CRITICAL::2014-03-13 12:05:47,749::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
    self._run_agent()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
    hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 303, in start_monitoring
    for old_state, state, delay in self.fsm:
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 125, in next
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 623, in collect_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 191, in _checked_communicate
    return parts[1]
IndexError: list index out of range

I'm attaching my vdsm.log, agent.log
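The two tracebacks in this message are consistent with a broker reply that carries the status word "success" but no payload after it. That reading is an assumption drawn only from the error texts quoted here ("Request failed: success" before the patch, and an IndexError on parts[1] after it). A minimal sketch of defensive parsing under that assumption follows; parse_reply and the "<status> <payload>" reply format are hypothetical and are not the ovirt_hosted_engine_ha implementation.

```python
class RequestError(Exception):
    """Hypothetical stand-in for the error type named in the tracebacks."""


def parse_reply(reply):
    """Split a reply of the assumed form '<status> <payload>'.

    Illustrative only: the real wire format is not shown in this thread.
    """
    parts = reply.split(" ", 1)
    status = parts[0]
    if status != "success":
        # Any other status is reported together with the full reply text.
        raise RequestError("Request failed: {0}".format(reply))
    if len(parts) < 2:
        # A bare 'success' has no second element, so indexing parts[1]
        # would raise IndexError; report the empty payload explicitly.
        raise RequestError("Request succeeded but returned no data")
    return parts[1]
```

Checking the length before indexing turns the opaque IndexError into a message that says what was actually missing, which is the kind of hint that makes a race on not-yet-written storage data easier to spot.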
Re: [Users] hosted engine help
- Original Message -
From: Jiri Moskovcak jmosk...@redhat.com
To: Jason Brooks jbro...@redhat.com, Greg Padgett gpadg...@redhat.com
Cc: users@ovirt.org
Sent: Tuesday, March 25, 2014 10:34:16 AM
Subject: Re: [Users] hosted engine help
Re: [Users] hosted engine help
On 03/25/2014 06:34 PM, Jiri Moskovcak wrote:
Re: [Users] hosted engine help
On 10/03/2014 22:32, Giuseppe Ragusa wrote:

Hi all,

Date: Mon, 10 Mar 2014 12:56:19 -0400
From: jbro...@redhat.com
To: msi...@redhat.com
CC: users@ovirt.org
Subject: Re: [Users] hosted engine help

- Original Message -
From: Martin Sivak msi...@redhat.com
To: Dan Kenigsberg dan...@redhat.com
Cc: users@ovirt.org
Sent: Saturday, March 8, 2014 11:52:59 PM
Subject: Re: [Users] hosted engine help

Hi Jason, can you please attach the full logs? We had a very similar issue before, and we need to see whether it is the same or not.

I may have to recreate it -- I switched back to an all-in-one engine after my setup started refusing to run the engine at all. It's no fun losing your engine! This was a migrated-from-standalone setup, maybe that caused additional wrinkles...

Jason

Thanks

I experienced the exact same symptoms as Jason on a from-scratch installation on two physical nodes with CentOS 6.5 (fully up-to-date) using oVirt 3.4.0_pre (latest test-day release) and GlusterFS 3.5.0beta3 (with Gluster-provided NFS as storage for the self-hosted engine VM only).

Using GlusterFS with hosted-engine storage is not supported and not recommended. The HA daemon may not work properly there.

I roughly followed the guide from Andrew Lau:

http://www.andrewklau.com/ovirt-hosted-engine-with-3-4-0-nightly/

with some variations due to newer packages (resolved bugs) and a different hardware setup (no VLANs in my setup: physically separated networks; custom second NIC added to the Engine VM template before deploying, etc.)
The self-hosted installation on the first node + Engine VM (configured for managing both oVirt and the storage; Datacenter default set to NFS because no GlusterFS was offered) apparently went smoothly, but the HA agent failed to start at the very end (same errors in the logs as Jason: the storage domain seems to be missing) and I was only able to start it all manually with:

hosted-engine --connect-storage
hosted-engine --start-pool

The above commands are used for development and shouldn't be used for starting the engine.

hosted-engine --vm-start

Then the Engine came up and I could use it. I even registered the second node (same final error in the HA agent) and tried to add GlusterFS storage domains for further VMs and ISOs (by the way: the original NFS-GlusterFS domain for the Engine VM only is not present inside the Engine web UI), but activating the domains always failed (they remain Inactive).

Furthermore, the engine gets killed some time after starting (from 3 up to 11 hours later) and the only way to get it back is repeating the above commands.

Need logs for this.

I always managed GlusterFS natively (not through oVirt) from the command line and verified that the NFS-exported Engine-VM-only volume gets replicated, but I obviously could not try migration because the HA part is inactive and oVirt refuses to migrate the Engine.

Since I tried many times, with variations and further manual actions in between (like trying to manually mount the NFS Engine domain, restarting the HA agent only, etc.), my logs are cluttered, so I should start from scratch again and pack up all the logs in one go.

+1

Tell me what I should capture and at which points in the whole process and I will try to follow up as soon as possible.

What: hosted-engine-setup, hosted-engine-ha, vdsm, libvirt, sanlock from the physical hosts, and engine and server logs from the hosted engine VM.

When: As soon as you see an error.
Many thanks,
Giuseppe

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

- Original Message -
On Fri, Mar 07, 2014 at 10:17:43AM +0100, Sandro Bonazzola wrote:
On 07/03/2014 01:10, Jason Brooks wrote:

Hey everyone,

I've been testing out oVirt 3.4 w/ hosted engine, and while I've managed to bring the engine up, I've only been able to do it manually, using hosted-engine --vm-start. The ovirt-ha-agent service fails reliably for me, erroring out with RequestError: Request failed: success. I've pasted error passages from the ha agent and vdsm logs below. Any pointers?

looks like a VDSM bug, Dan?

Why? The exception is raised from deep inside the ovirt_hosted_engine_ha code.

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration
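The list of logs requested above (hosted-engine-setup, hosted-engine-ha, vdsm, libvirt, sanlock) can be bundled into one archive as soon as an error shows up. The sketch below is only an illustration: the default paths are assumptions about where these components usually log on a host and should be checked against the actual installation, and collect_logs is a hypothetical helper, not an oVirt tool.

```python
import os
import tarfile

# Assumed default locations; verify them on the host before relying on them.
DEFAULT_LOG_PATHS = (
    "/var/log/ovirt-hosted-engine-setup",
    "/var/log/ovirt-hosted-engine-ha",
    "/var/log/vdsm",
    "/var/log/libvirt",
    "/var/log/sanlock.log",
)


def collect_logs(archive_path, paths=DEFAULT_LOG_PATHS):
    """Pack every existing path into a gzip-compressed tar archive.

    Returns the list of paths that were actually added, so that missing
    logs are easy to notice.
    """
    added = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in paths:
            if os.path.exists(path):
                # Store paths relative to / so the archive unpacks cleanly.
                archive.add(path, arcname=path.lstrip("/"))
                added.append(path)
    return added
```

Running it right after the failure, for example collect_logs("/tmp/hosted-engine-logs.tar.gz"), keeps the logs from being cluttered by later manual attempts. Reading these directories normally requires root.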
Re: [Users] hosted engine help
I experienced the exact same symptoms as Jason on a from-scratch installation on two physical nodes with CentOS 6.5 (fully up-to-date) using oVirt 3.4.0_pre (latest test-day release) and GlusterFS 3.5.0beta3 (with Gluster-provided NFS as storage for the self-hosted engine VM only).

Using GlusterFS with hosted-engine storage is not supported and not recommended. HA daemon may not work properly there.

I used Gluster NFS, too. That's not supported or recommended? I've assumed that I could use Gluster NFS / NFS interchangeably...

Jason
Re: [Users] hosted engine help
On 03/11/2014 04:41 PM, Jason Brooks wrote:

I experienced the exact same symptoms as Jason on a from-scratch installation on two physical nodes with CentOS 6.5 (fully up-to-date) using oVirt 3.4.0_pre (latest test-day release) and GlusterFS 3.5.0beta3 (with Gluster-provided NFS as storage for the self-hosted engine VM only).

Using GlusterFS with hosted-engine storage is not supported and not recommended. HA daemon may not work properly there.

I used Gluster NFS, too. That's not supported or recommended? I've assumed that I could use Gluster NFS / NFS interchangeably...

That should be OK. Make sure your Gluster environment is robust, with quorum, etc.
Re: [Users] hosted engine help
On 03/11/2014 04:09 PM, Sandro Bonazzola wrote:
On 07/03/2014 01:10, Jason Brooks wrote:

Hey everyone,

I've been testing out oVirt 3.4 w/ hosted engine, and while I've managed to bring the engine up, I've only been able to do it manually, using hosted-engine --vm-start. The ovirt-ha-agent service fails reliably for me, erroring out with RequestError: Request failed: success. I've pasted error passages from the ha agent and vdsm logs below. Any pointers?

Regards, Jason

*** ovirt-ha-agent.log

MainThread::CRITICAL::2014-03-06 18:48:30,622::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 97, in run
    self._run_agent()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py", line 154, in _run_agent
    hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 303, in start_monitoring
    for old_state, state, delay in self.fsm:
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py", line 125, in next
    new_data = self.refresh(self._state.data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py", line 77, in refresh
    stats.update(self.hosted_engine.collect_stats())
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 623, in collect_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 198, in _checked_communicate
    raise RequestError("Request failed: {0}".format(msg))
RequestError: Request failed: success

vdsm.log

Thread-29::ERROR::2014-03-06 18:48:11,101::API::1607::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1598, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 86, in get_all_stats
    constants.SERVICE_TYPE)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 171, in get_stats_from_storage
    result = self._checked_communicate(request)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 198, in _checked_communicate
    raise RequestError("Request failed: {0}".format(msg))
RequestError: Request failed: success

Greg, Martin, Request failed: success?

Hi Jason,

I talked to Martin about this and opened a bug [1]/submitted a patch [2]. Based on your mail, I'm not sure if you experienced a race condition or some other issue. This patch should help the former case, but if you're still experiencing problems then we would need to investigate further.

Thanks,
Greg

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1075126
[2] http://gerrit.ovirt.org/#/c/25621/
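The agent and vdsm log lines quoted in this thread appear to use "::" as a field separator (thread, level, timestamp, module, line number, logger, then the function name in parentheses followed by the message). Assuming that layout, which is inferred only from the two sample lines above, a short sketch for pulling ERROR and CRITICAL records out of a long log might look like this; LogRecord, parse_line, and serious are hypothetical names.

```python
import re
from collections import namedtuple

LogRecord = namedtuple(
    "LogRecord",
    "thread level timestamp module line logger function message",
)

# The last field looks like "(run) Could not start ha-agent".
_TAIL = re.compile(r"^\((?P<function>[^)]*)\)\s*(?P<message>.*)$")


def parse_line(text):
    """Return a LogRecord, or None for lines that do not match the layout
    (traceback frames, blank lines, and so on)."""
    fields = text.rstrip("\n").split("::", 6)
    if len(fields) != 7:
        return None
    thread, level, timestamp, module, lineno, logger, tail = fields
    match = _TAIL.match(tail)
    if match is None or not lineno.isdigit():
        return None
    return LogRecord(thread, level, timestamp, module, int(lineno),
                     logger, match.group("function"), match.group("message"))


def serious(lines, levels=("ERROR", "CRITICAL")):
    """Keep only the records whose level is in `levels`."""
    records = (parse_line(text) for text in lines)
    return [rec for rec in records if rec is not None and rec.level in levels]
```

Calling serious(open(path)) on an agent or vdsm log gives a compact list of the failures and their timestamps, which makes it easier to line up events across the two logs.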
Re: [Users] hosted engine help
Date: Tue, 11 Mar 2014 15:16:36 +0100
From: sbona...@redhat.com
To: giuseppe.rag...@hotmail.com; jbro...@redhat.com; msi...@redhat.com
CC: users@ovirt.org; fsimo...@redhat.com; gpadg...@redhat.com
Subject: Re: [Users] hosted engine help

On 10/03/2014 22:32, Giuseppe Ragusa wrote:

Hi all,

Date: Mon, 10 Mar 2014 12:56:19 -0400
From: jbro...@redhat.com
To: msi...@redhat.com
CC: users@ovirt.org
Subject: Re: [Users] hosted engine help

- Original Message -
From: Martin Sivak msi...@redhat.com
To: Dan Kenigsberg dan...@redhat.com
Cc: users@ovirt.org
Sent: Saturday, March 8, 2014 11:52:59 PM
Subject: Re: [Users] hosted engine help

Hi Jason, can you please attach the full logs? We had a very similar issue before, and we need to see whether it is the same or not.

I may have to recreate it -- I switched back to an all-in-one engine after my setup started refusing to run the engine at all. It's no fun losing your engine! This was a migrated-from-standalone setup, maybe that caused additional wrinkles...

Jason

Thanks

I experienced the exact same symptoms as Jason on a from-scratch installation on two physical nodes with CentOS 6.5 (fully up-to-date) using oVirt 3.4.0_pre (latest test-day release) and GlusterFS 3.5.0beta3 (with Gluster-provided NFS as storage for the self-hosted engine VM only).

Using GlusterFS with hosted-engine storage is not supported and not recommended. The HA daemon may not work properly there.

If it is unsupported (and particularly not recommended) even with the interposed NFS (the native Gluster-provided NFSv3 export of a volume), then what is the recommended way to set up a fault-tolerant, load-balanced 2-node oVirt cluster (without an external dedicated SAN/NAS)?
I roughly followed the guide from Andrew Lau:

http://www.andrewklau.com/ovirt-hosted-engine-with-3-4-0-nightly/

with some variations due to newer packages (resolved bugs) and a different hardware setup (no VLANs in my setup: physically separated networks; custom second NIC added to the Engine VM template before deploying, etc.)

The self-hosted installation on the first node + Engine VM (configured for managing both oVirt and the storage; Datacenter default set to NFS because no GlusterFS was offered) apparently went smoothly, but the HA agent failed to start at the very end (same errors in the logs as Jason: the storage domain seems to be missing) and I was only able to start it all manually with:

hosted-engine --connect-storage
hosted-engine --start-pool

The above commands are used for development and shouldn't be used for starting the engine.

Directly starting the engine (with the command below) failed because of storage unavailability, so I used the above trick as a last resort to at least prove that the engine was able to start and had not been somehow destroyed or lost in the process (but I do understand that it is an extreme debug-only action).

hosted-engine --vm-start

Then the Engine came up and I could use it. I even registered the second node (same final error in the HA agent) and tried to add GlusterFS storage domains for further VMs and ISOs (by the way: the original NFS-GlusterFS domain for the Engine VM only is not present inside the Engine web UI), but activating the domains always failed (they remain Inactive).

Furthermore, the engine gets killed some time after starting (from 3 up to 11 hours later) and the only way to get it back is repeating the above commands.

Need logs for this.

I will try to reproduce it all, but I can recall that in the libvirt logs (HostedEngine.log) there was always a clear indication of the PID that killed the Engine VM, and each time it belonged to an instance of sanlock.
I always managed GlusterFS natively (not through oVirt) from the command line and verified that the NFS-exported Engine-VM-only volume gets replicated, but I obviously could not try migration because the HA part is inactive and oVirt refuses to migrate the Engine.

Since I tried many times, with variations and further manual actions in between (like trying to manually mount the NFS Engine domain, restarting the HA agent only, etc.), my logs are cluttered, so I should start from scratch again and pack up all the logs in one go.

+1

Tell me what I should capture and at which points in the whole process and I will try to follow up as soon as possible.

What: hosted-engine-setup, hosted-engine-ha, vdsm, libvirt, sanlock from the physical hosts, and engine and server logs from the hosted engine VM.

When: As soon as you see an error.

If the setup design (wholly GlusterFS based) is somehow flawed, please point me to some hint/docs/guide for the right way of setting it up on 2 standalone physical nodes, so as not to waste your time chasing defects in something that is not supposed to work anyway. I will follow your
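The observation above, that the libvirt log for the Engine VM names the PID that killed it, can be checked mechanically. The sketch below assumes the qemu log contains lines of the form "terminating on signal 15 from pid 1234"; that wording is an assumption about the qemu version in use, and find_terminations is a hypothetical helper.

```python
import re

# Assumed wording of the qemu shutdown line; adjust if the log differs.
_KILL = re.compile(r"terminating on signal (\d+) from pid (\d+)")


def find_terminations(lines):
    """Return (signal, pid) pairs for every matching line, in log order."""
    hits = []
    for text in lines:
        match = _KILL.search(text)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits
```

Each PID found this way can then be compared with the PID of the running sanlock daemon (for example from ps output captured at the same time) to confirm which process sent the signal.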
Re: [Users] hosted engine help
- Original Message -
From: Martin Sivak msi...@redhat.com
To: Dan Kenigsberg dan...@redhat.com
Cc: users@ovirt.org
Sent: Saturday, March 8, 2014 11:52:59 PM
Subject: Re: [Users] hosted engine help

Hi Jason, can you please attach the full logs? We had a very similar issue before, and we need to see whether it is the same or not.

I may have to recreate it -- I switched back to an all-in-one engine after my setup started refusing to run the engine at all. It's no fun losing your engine! This was a migrated-from-standalone setup, maybe that caused additional wrinkles...

Jason

Thanks

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

- Original Message -
On Fri, Mar 07, 2014 at 10:17:43AM +0100, Sandro Bonazzola wrote:
On 07/03/2014 01:10, Jason Brooks wrote:

Hey everyone,

I've been testing out oVirt 3.4 w/ hosted engine, and while I've managed to bring the engine up, I've only been able to do it manually, using hosted-engine --vm-start. The ovirt-ha-agent service fails reliably for me, erroring out with RequestError: Request failed: success. I've pasted error passages from the ha agent and vdsm logs below. Any pointers?

looks like a VDSM bug, Dan?

Why? The exception is raised from deep inside the ovirt_hosted_engine_ha code.
Re: [Users] hosted engine help
Hi all,

Date: Mon, 10 Mar 2014 12:56:19 -0400
From: jbro...@redhat.com
To: msi...@redhat.com
CC: users@ovirt.org
Subject: Re: [Users] hosted engine help

- Original Message -
From: Martin Sivak msi...@redhat.com
To: Dan Kenigsberg dan...@redhat.com
Cc: users@ovirt.org
Sent: Saturday, March 8, 2014 11:52:59 PM
Subject: Re: [Users] hosted engine help

Hi Jason, can you please attach the full logs? We had a very similar issue before, and we need to see whether it is the same or not.

I may have to recreate it -- I switched back to an all-in-one engine after my setup started refusing to run the engine at all. It's no fun losing your engine! This was a migrated-from-standalone setup, maybe that caused additional wrinkles...

Jason

Thanks

I experienced the exact same symptoms as Jason on a from-scratch installation on two physical nodes with CentOS 6.5 (fully up-to-date) using oVirt 3.4.0_pre (latest test-day release) and GlusterFS 3.5.0beta3 (with Gluster-provided NFS as storage for the self-hosted engine VM only).

I roughly followed the guide from Andrew Lau:

http://www.andrewklau.com/ovirt-hosted-engine-with-3-4-0-nightly/

with some variations due to newer packages (resolved bugs) and a different hardware setup (no VLANs in my setup: physically separated networks; custom second NIC added to the Engine VM template before deploying, etc.)
The self-hosted installation on the first node + Engine VM (configured for managing both oVirt and the storage; Datacenter default set to NFS because no GlusterFS was offered) apparently went smoothly, but the HA agent failed to start at the very end (same errors in the logs as Jason: the storage domain seems to be missing) and I was only able to start it all manually with:

hosted-engine --connect-storage
hosted-engine --start-pool
hosted-engine --vm-start

Then the Engine came up and I could use it. I even registered the second node (same final error in the HA agent) and tried to add GlusterFS storage domains for further VMs and ISOs (by the way: the original NFS-GlusterFS domain for the Engine VM only is not present inside the Engine web UI), but activating the domains always failed (they remain Inactive).

Furthermore, the engine gets killed some time after starting (from 3 up to 11 hours later) and the only way to get it back is repeating the above commands.

I always managed GlusterFS natively (not through oVirt) from the command line and verified that the NFS-exported Engine-VM-only volume gets replicated, but I obviously could not try migration because the HA part is inactive and oVirt refuses to migrate the Engine.

Since I tried many times, with variations and further manual actions in between (like trying to manually mount the NFS Engine domain, restarting the HA agent only, etc.), my logs are cluttered, so I should start from scratch again and pack up all the logs in one go.

Tell me what I should capture and at which points in the whole process and I will try to follow up as soon as possible.
Many thanks,
Giuseppe

> > --
> > Martin Sivák
> > msi...@redhat.com
> > Red Hat Czech
> > RHEV-M SLA / Brno, CZ
> >
> > ----- Original Message -----
> > > On Fri, Mar 07, 2014 at 10:17:43AM +0100, Sandro Bonazzola wrote:
> > > > On 07/03/2014 01:10, Jason Brooks wrote:
> > > > > Hey everyone,
> > > > >
> > > > > I've been testing out oVirt 3.4 w/ hosted engine, and while I've managed to bring the engine up, I've only been able to do it manually, using hosted-engine --vm-start.
> > > > >
> > > > > The ovirt-ha-agent service fails reliably for me, erroring out with RequestError: Request failed: success.
> > > > >
> > > > > I've pasted error passages from the ha agent and vdsm logs below.
> > > > >
> > > > > Any pointers?
> > > >
> > > > looks like a VDSM bug, Dan?
> > >
> > > Why? The exception is raised from deep inside the ovirt_hosted_engine_ha code.

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
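[Editor's sketch] The manual recovery sequence described in the message above (hosted-engine --connect-storage, then --start-pool, then --vm-start) could be wrapped in a small script so that it stops at the first failing step. This is only a minimal sketch: the three commands are taken from the message, while the names MANUAL_START_STEPS and start_engine_manually, and the injectable `run` parameter, are made up for illustration and are not part of oVirt.

```python
import subprocess

# The three steps quoted in the message, in the order given there.
MANUAL_START_STEPS = [
    ["hosted-engine", "--connect-storage"],
    ["hosted-engine", "--start-pool"],
    ["hosted-engine", "--vm-start"],
]


def start_engine_manually(run=subprocess.check_call):
    """Run each step in order and stop at the first failure.

    `run` is a hypothetical hook: by default it executes the command
    (check_call raises CalledProcessError on a non-zero exit status),
    but any callable taking the argument list can be passed instead,
    for example to do a dry run.
    Returns the list of commands that completed.
    """
    completed = []
    for cmd in MANUAL_START_STEPS:
        run(cmd)
        completed.append(cmd)
    return completed
```

Calling start_engine_manually(run=lambda cmd: print(" ".join(cmd))) only prints the commands, which may be handy for checking the order before running them on a host.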
Re: [Users] hosted engine help
Hi Jason,

can you please attach the full logs? We had a very similar issue before and we need to see whether it is the same or not.

Thanks

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

----- Original Message -----
> On Fri, Mar 07, 2014 at 10:17:43AM +0100, Sandro Bonazzola wrote:
> > On 07/03/2014 01:10, Jason Brooks wrote:
> > > Hey everyone,
> > >
> > > I've been testing out oVirt 3.4 w/ hosted engine, and while I've managed to bring the engine up, I've only been able to do it manually, using hosted-engine --vm-start.
> > >
> > > The ovirt-ha-agent service fails reliably for me, erroring out with RequestError: Request failed: success.
> > >
> > > I've pasted error passages from the ha agent and vdsm logs below.
> > >
> > > Any pointers?
> >
> > looks like a VDSM bug, Dan?
>
> Why? The exception is raised from deep inside the ovirt_hosted_engine_ha code.
Re: [Users] hosted engine help
On 07/03/2014 01:10, Jason Brooks wrote:
> Hey everyone,
>
> I've been testing out oVirt 3.4 w/ hosted engine, and while I've managed to bring the engine up, I've only been able to do it manually, using hosted-engine --vm-start.
>
> The ovirt-ha-agent service fails reliably for me, erroring out with RequestError: Request failed: success.
>
> I've pasted error passages from the ha agent and vdsm logs below.
>
> Any pointers?

looks like a VDSM bug, Dan?

> Regards, Jason
>
> *** ovirt-ha-agent.log
>
> MainThread::CRITICAL::2014-03-06 18:48:30,622::agent::103::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Could not start ha-agent
> Traceback (most recent call last):
>   File /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py, line 97, in run
>     self._run_agent()
>   File /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/agent.py, line 154, in _run_agent
>     hosted_engine.HostedEngine(self.shutdown_requested).start_monitoring()
>   File /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py, line 303, in start_monitoring
>     for old_state, state, delay in self.fsm:
>   File /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/fsm/machine.py, line 125, in next
>     new_data = self.refresh(self._state.data)
>   File /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/state_machine.py, line 77, in refresh
>     stats.update(self.hosted_engine.collect_stats())
>   File /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py, line 623, in collect_stats
>     constants.SERVICE_TYPE)
>   File /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py, line 171, in get_stats_from_storage
>     result = self._checked_communicate(request)
>   File /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py, line 198, in _checked_communicate
>     raise RequestError(Request failed: {0}.format(msg))
> RequestError: Request failed: success
>
> vdsm.log
>
> Thread-29::ERROR::2014-03-06 18:48:11,101::API::1607::vds::(_getHaInfo) failed to retrieve Hosted Engine HA info
> Traceback (most recent call last):
>   File /usr/share/vdsm/API.py, line 1598, in _getHaInfo
>     stats = instance.get_all_stats()
>   File /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py, line 86, in get_all_stats
>     constants.SERVICE_TYPE)
>   File /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py, line 171, in get_stats_from_storage
>     result = self._checked_communicate(request)
>   File /usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py, line 198, in _checked_communicate
>     raise RequestError(Request failed: {0}.format(msg))
> RequestError: Request failed: success

--
Sandro Bonazzola
Better technology. Faster innovation. Powered by community collaboration.
See how it works at redhat.com
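[Editor's sketch] When collecting the full logs requested in this thread, it may help to pull out only the ERROR and CRITICAL records first. The two log lines quoted above share a "::"-separated layout, so a minimal parser can be sketched as follows. The field names (thread, level, timestamp, module, lineno, logger, message) are labels inferred from those two sample lines only, not an official description of the agent or vdsm log format, and parse_line and errors_only are hypothetical helper names.

```python
from collections import namedtuple

# Field names are guesses based on the two sample lines in this thread.
LogRecord = namedtuple(
    "LogRecord", "thread level timestamp module lineno logger message"
)


def parse_line(line):
    """Split one '::'-separated log line; return None if it does not fit."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7 or not parts[4].isdigit():
        return None  # e.g. a traceback continuation line
    thread, level, timestamp, module, lineno, logger, message = parts
    return LogRecord(thread, level, timestamp, module, int(lineno), logger, message)


def errors_only(lines, levels=("ERROR", "CRITICAL")):
    """Yield only the parsed records whose level is in `levels`."""
    for line in lines:
        record = parse_line(line)
        if record is not None and record.level in levels:
            yield record
```

Lines that do not match the layout, such as the traceback frames, are skipped by this sketch, so it only locates the failing records; the surrounding lines of the log would still need to be attached in full.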
Re: [Users] hosted engine help
On Fri, Mar 07, 2014 at 10:17:43AM +0100, Sandro Bonazzola wrote:
> On 07/03/2014 01:10, Jason Brooks wrote:
> > Hey everyone,
> >
> > I've been testing out oVirt 3.4 w/ hosted engine, and while I've managed to bring the engine up, I've only been able to do it manually, using hosted-engine --vm-start.
> >
> > The ovirt-ha-agent service fails reliably for me, erroring out with RequestError: Request failed: success.
> >
> > I've pasted error passages from the ha agent and vdsm logs below.
> >
> > Any pointers?
>
> looks like a VDSM bug, Dan?

Why? The exception is raised from deep inside the ovirt_hosted_engine_ha code.
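[Editor's sketch] The odd-looking text "Request failed: success" comes from the raise RequestError(...) line in _checked_communicate shown in the traceback, which formats some msg into the error. The thread does not show how msg is built, so the code below is NOT the brokerlink.py implementation. It is a hypothetical illustration of one way such a message could arise: assuming a reply of the form "<status> <payload>", a reply consisting of the bare word "success" with no payload would fail the check and be echoed back as the error text. The function name checked_communicate and the reply format are assumptions made for this example.

```python
class RequestError(Exception):
    """Stand-in for the exception class named in the traceback."""


def checked_communicate(response):
    """Hypothetical check of a '<status> <payload>' reply string.

    Returns the payload when the status is 'success' and a payload is
    present; otherwise raises RequestError carrying whatever text is left.
    """
    parts = response.split(" ", 1)
    if len(parts) > 1 and parts[0] == "success":
        return parts[1]
    # No payload: the only text available for the message is the reply itself.
    msg = parts[1] if len(parts) > 1 else response
    raise RequestError("Request failed: {0}".format(msg))
```

Under these assumptions, checked_communicate("success") raises RequestError with the text "Request failed: success". Whether this matches what happened on the hosts in this thread cannot be told from the quoted logs alone, which is why the full logs were requested.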