Re: [ovirt-users] Can HA Agent control NFS Mount?
Hi,

>> One by directly using rem_lockspace (since it's the hosted-engine one)

The hosted engine lockspace protects the ID of the node. Global maintenance only disables the state machine, but it still reports data to the shared storage. The hosted engine lock can only be released when the agent is down. If the lock is still there even after you call "service ovirt-ha-agent stop", then it is a bug somewhere.

>> the other one by stopMonitoringDomain
>
> Shouldn't this be done by VDSM or sanlock when the VM disappears?

The lock has to stay acquired any time the VM is running (independently of the hosted engine services or vdsm) to protect the VM's data. We can't release a lock for a running VM, because some other host might try to start it and corrupt data by doing so.

Martin

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ

- Original Message -
From: Bob Doolittle b...@doolittle.us.com
To: Doron Fediuck dfedi...@redhat.com, Andrew Lau and...@andrewklau.com
Cc: users users@ovirt.org, Federico Simoncelli fsimo...@redhat.com
Sent: Saturday, June 14, 2014 1:29:54 AM
Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

> But there may be more going on. Even if I stop vdsmd, the HA services, and libvirtd, and sleep 60 seconds, I still see a lock held on the Engine VM storage:
>
> daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
> p -1 helper
> p -1 listener
> p -1 status
> s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
> s hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0

This output shows that the lockspaces are still acquired. When you put hosted-engine in maintenance they must be released: one by directly using rem_lockspace (since it's the hosted-engine one) and the other one by stopMonitoringDomain. I quickly looked at the ovirt-hosted-engine* projects and I haven't found anything related to that.

--
Federico
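A minimal way to verify Martin's point on a host would be a sequence like the following (a sketch, assuming the stock sanlock CLI and the service names used in this thread):

    # Stop the HA agent, then confirm the hosted-engine lockspace goes away.
    service ovirt-ha-agent stop
    sleep 30    # give sanlock a moment to drop the lockspace
    sanlock client status | grep hosted-engine \
        && echo "lockspace still held - likely the bug Martin mentions" \
        || echo "lockspace released as expected"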
Re: [ovirt-users] Can HA Agent control NFS Mount?
Hi,

Did anyone find much luck tracking this down? I rebooted one of our servers and hit this issue again. Conveniently, the Dell remote access card has borked as well, so a 50 minute trip to the DC.

On Thu, Jun 19, 2014 at 10:10 AM, Bob Doolittle bobddr...@gmail.com wrote:

> Specifically, if I do the following:
>
> - Enter global maintenance (hosted-engine --set-maintenance --mode=global)
> - init 0 the engine
> - systemctl stop ovirt-ha-agent ovirt-ha-broker libvirtd vdsmd
>
> and then run "sanlock client status" I see:
>
> # sanlock client status
> daemon c715b5de-fd98-4146-a0b1-e9801179c768.xion2.smar
> p -1 helper
> p -1 listener
> p -1 status
> s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
> s 18eeab54-e482-497f-b096-11f8a43f94f4:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/dom_md/ids:0
> s hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0
>
> Waiting a few minutes does not change this state. The earlier data I shared which showed HostedEngine was with a different test scenario.
>
> -Bob
>
> [...]
Re: [ovirt-users] Can HA Agent control NFS Mount?
btw, this happened on an aborted hosted-engine install. So, the ha-agents hadn't even started up - just the VM running.

On Sat, Jul 19, 2014 at 11:24 PM, Andrew Lau and...@andrewklau.com wrote:

> Hi,
>
> Did anyone find much luck tracking this down? I rebooted one of our servers and hit this issue again. Conveniently, the Dell remote access card has borked as well, so a 50 minute trip to the DC.
>
> [...]
Re: [ovirt-users] Can HA Agent control NFS Mount?
- Original Message -
From: Bob Doolittle b...@doolittle.us.com
To: Doron Fediuck dfedi...@redhat.com, Andrew Lau and...@andrewklau.com
Cc: users users@ovirt.org, Federico Simoncelli fsimo...@redhat.com
Sent: Saturday, June 14, 2014 1:29:54 AM
Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

> But there may be more going on. Even if I stop vdsmd, the HA services, and libvirtd, and sleep 60 seconds, I still see a lock held on the Engine VM storage:
>
> daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
> p -1 helper
> p -1 listener
> p -1 status
> s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
> s hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0

This output shows that the lockspaces are still acquired. When you put hosted-engine in maintenance they must be released: one by directly using rem_lockspace (since it's the hosted-engine one) and the other one by stopMonitoringDomain. I quickly looked at the ovirt-hosted-engine* projects and I haven't found anything related to that.

--
Federico
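For reference, the two release paths Federico names can be exercised by hand. A sketch of what that might look like, using the lockspace spec from the output above (illustrative only: this is not the missing ovirt-hosted-engine code, and the vdsClient verb's availability may vary by vdsm version):

    # Drop the hosted-engine lockspace directly via sanlock.
    sanlock client rem_lockspace -s hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0
    # Ask vdsm to stop monitoring the hosted-engine storage domain.
    vdsClient -s 0 stopMonitoringDomain 18eeab54-e482-497f-b096-11f8a43f94f4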
Re: [ovirt-users] Can HA Agent control NFS Mount?
I see I have a very unfortunate typo in my previous mail. As supported by the vm-status output I attached, I had set --mode=global (not none) in step 1.

I am not the only one experiencing this. I can reproduce it easily. It appears that shutting down vdsm causes the HA services to incorrectly think the system has come out of Global Maintenance and restart the engine.

-Bob

On Jun 18, 2014 5:06 AM, Federico Simoncelli fsimo...@redhat.com wrote:

> This output shows that the lockspaces are still acquired. When you put hosted-engine in maintenance they must be released: one by directly using rem_lockspace (since it's the hosted-engine one) and the other one by stopMonitoringDomain. I quickly looked at the ovirt-hosted-engine* projects and I haven't found anything related to that.
>
> [...]
Re: [ovirt-users] Can HA Agent control NFS Mount?
Specifically, if I do the following:

* Enter global maintenance (hosted-engine --set-maintenance --mode=global)
* init 0 the engine
* systemctl stop ovirt-ha-agent ovirt-ha-broker libvirtd vdsmd

and then run "sanlock client status" I see:

# sanlock client status
daemon c715b5de-fd98-4146-a0b1-e9801179c768.xion2.smar
p -1 helper
p -1 listener
p -1 status
s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
s 18eeab54-e482-497f-b096-11f8a43f94f4:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/dom_md/ids:0
s hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0

Waiting a few minutes does not change this state. The earlier data I shared which showed HostedEngine was with a different test scenario.

-Bob

On 06/18/2014 07:53 AM, Bob Doolittle wrote:

> I see I have a very unfortunate typo in my previous mail. As supported by the vm-status output I attached, I had set --mode=global (not none) in step 1.
>
> I am not the only one experiencing this. I can reproduce it easily. It appears that shutting down vdsm causes the HA services to incorrectly think the system has come out of Global Maintenance and restart the engine.
>
> -Bob
>
> On Jun 18, 2014 5:06 AM, Federico Simoncelli fsimo...@redhat.com wrote:
>
>> [...]
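A quick way to list just the lockspaces sanlock still holds, rather than eyeballing the full status dump (a sketch; the awk field split matches the "s name:host_id:path:offset" lines shown above):

    sanlock client status | awk '/^s /{split($2, f, ":"); print f[1]}'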
Re: [ovirt-users] Can HA Agent control NFS Mount?
I suppose a hosted-engine solution without HA would suffice for the use case of just having one system to host and manage VMs, with the ability to extend this system to many more.

On 03.06.2014 13:52, Itamar Heim wrote:

> what would look different for hosted-engine on a single host? just not have the ha feature?

--
Mit freundlichen Grüßen / Regards

Sven Kieske
Systemadministrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen
Re: [ovirt-users] Can HA Agent control NFS Mount?
Would that help the issue being reported in this thread at all?

This thread was about issues with clean shutdown of a single-node hosted environment, which result in hangs/timeouts and the inability to issue poweroff without it resulting in a reboot. There have been no suggestions about how to resolve those issues, which seem related to sanlock leases not being cleanly released.

-Bob

On Jun 13, 2014 5:14 AM, Sven Kieske s.kie...@mittwald.de wrote:

> I suppose a hosted-engine solution without HA would suffice for the use case of just having one system to host and manage VMs, with the ability to extend this system to many more.
>
> On 03.06.2014 13:52, Itamar Heim wrote:
>
>> what would look different for hosted-engine on a single host? just not have the ha feature?
Re: [ovirt-users] Can HA Agent control NFS Mount?
Bob,

The way to handle it is to switch to global maintenance, and then ssh into the VM and shut it down. After rebooting you should switch maintenance mode to off.

- Original Message -
From: Bob Doolittle b...@doolittle.us.com
To: Sven Kieske s.kie...@mittwald.de
Cc: users users@ovirt.org, Doron Fediuck dfedi...@redhat.com, Itamar Heim ih...@redhat.com
Sent: Friday, June 13, 2014 3:16:12 PM
Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

> Would that help the issue being reported in this thread at all?
>
> This thread was about issues with clean shutdown of a single-node hosted environment, which result in hangs/timeouts and the inability to issue poweroff without it resulting in a reboot. There have been no suggestions about how to resolve those issues, which seem related to sanlock leases not being cleanly released.
>
> -Bob
>
> [...]
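Doron's procedure, written out as one sequence (a sketch; it assumes the engine VM answers at the hostname "engine", that it is the only VM on the host, and it uses the vdsClient status check that appears later in this thread):

    # 1. Freeze the HA state machine so the agent will not restart the VM.
    hosted-engine --set-maintenance --mode=global
    # 2. Shut the engine VM down cleanly from inside.
    ssh root@engine init 0
    # 3. Wait until vdsm no longer reports the VM as up.
    while vdsClient -s 0 list | grep -q 'Status: Up'; do sleep 5; done
    # 4. Power off the host; after the next boot, leave maintenance:
    #    hosted-engine --set-maintenance --mode=none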
Re: [ovirt-users] Can HA Agent control NFS Mount?
- Original Message -
From: Andrew Lau and...@andrewklau.com
To: Bob Doolittle b...@doolittle.us.com
Cc: users users@ovirt.org
Sent: Friday, June 6, 2014 6:14:18 AM
Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

> On Fri, Jun 6, 2014 at 1:09 PM, Bob Doolittle b...@doolittle.us.com wrote:
>
>> Thanks Andrew, I'll try this workaround tomorrow for sure. But reading through that bug report (closed not-a-bug), it states that the problem should only arise if something is not releasing a sanlock lease. So if we've entered Global Maintenance and shut down Engine, the question is what's holding the lease? How can that be debugged?
>
> For me it's wdmd and sanlock itself failing to shut down properly. I also noticed even when in global maintenance and the engine VM powered off there is still a sanlock lease for the /rhev/mnt/hosted-engine/? lease file or something along those lines. So the global maintenance may not actually be releasing that lock.
>
> I'm not too familiar with sanlock etc., so it's like stabbing in the dark :(

Sounds like a bug, since once the VM is off there should not be a lease taken. Please check whether after a minute you still have a lease taken, according to: http://www.ovirt.org/SANLock#sanlock_timeouts

In this case try to stop vdsm and libvirt, just so we'll know who still keeps the lease.

> [...]
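One way to act on that suggestion step by step (a sketch; the service names match those used elsewhere in this thread, and the 60-second wait follows the sanlock timeouts page Doron links to):

    # Stop suspects one at a time and watch for the lease to disappear.
    for svc in ovirt-ha-agent ovirt-ha-broker vdsmd libvirtd; do
        service "$svc" stop
        sleep 60    # give sanlock time to drop any lease held on behalf of $svc
        echo "--- after stopping $svc ---"
        sanlock client status | grep '^s ' || echo "no lockspaces held"
    done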
Re: [ovirt-users] Can HA Agent control NFS Mount?
Doron,

This is my normal process but it does not resolve the issue. A few of us who have experienced this have tried a number of things.

I see two hangs/wdmd timeouts during shutdown, so I think there are two remaining lease holders. I find if I stop the vdsmd, ovirt-ha-agent, and ovirt-ha-broker services I only experience the last hang (almost at the end, as it's shutting down filesystems). Any hang results in a wdmd timeout and consequently a reboot instead of poweroff. If I never bring engine up, things shut down cleanly.

I will try your suggestions in the other mail a bit later this morning.

-Bob

On 06/13/2014 08:28 AM, Doron Fediuck wrote:

> Bob,
>
> The way to handle it is to switch to global maintenance, and then ssh into the VM and shut it down. After rebooting you should switch maintenance mode to off.
>
> [...]
Re: [ovirt-users] Can HA Agent control NFS Mount?
It turns out I was wrong before. I don't have to start up Engine to get into this situation. I did the following:

* Turn on Global Maintenance
* Engine init 0
* Reboot node
* Wait a few minutes
* poweroff

I'll get the timeouts and hangs during shutdown again, and a reset instead of poweroff.

It's possible that somehow the system is coming out of Global Maintenance mode during shutdown, and the Engine VM is starting up and causing this issue. I did the following:

1. hosted-engine --set-maintenance --mode=global

   You can see the attached output from 'hosted-engine --vm-status' (hosted-engine.out) at this point, indicating that the system is in Global Maintenance.

2. Waited 60 seconds, and checked sanlock.

   You can see the attached output of 'sanlock client status' (sanlock-status.out) at this point, showing the Engine VM locks being held.

3. I stopped the vdsmd service (note that the first time I tried I got "Job for vdsmd.service cancelled", and re-issued the stop). You can see the attached output of 'sanlock client status' and the following commands.

What's interesting, and I didn't notice right away, is that after I stopped vdsmd the sanlock status started changing, as if the locks were being manipulated.

After I stopped vdsmd, the HA services, and libvirtd, and waited 60 seconds, I noticed the locks seemed to be changing state and that HostedEngine was listed. At that point I got suspicious and started vdsmd again so that I could recheck Global Maintenance mode, and I found that the system was no longer *in* maintenance, and that the Engine VM was running.

So I think this partly explains the situation. Somehow the act of stopping vdsmd is making the system look like it is *out* of Global Maintenance mode, and the Engine VM starts up while the system is shutting down. This creates new sanlock leases on the Engine VM storage, which prevents the system from shutting down cleanly. Oddly, after a reboot Global Maintenance is preserved.

But there may be more going on. Even if I stop vdsmd, the HA services, and libvirtd, and sleep 60 seconds, I still see a lock held on the Engine VM storage:

daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
p -1 helper
p -1 listener
p -1 status
s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
s hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0

It stays in this state, however, and HostedEngine doesn't grab a lock again. In any case, no matter what I do, it's impossible to shut the system down cleanly.

-Bob

On 06/13/2014 08:33 AM, Doron Fediuck wrote:

> [...]
>
> Sounds like a bug, since once the VM is off there should not be a lease taken. Please check whether after a minute you still have a lease taken, according to: http://www.ovirt.org/SANLock#sanlock_timeouts
>
> In this case try to stop vdsm and libvirt, just so we'll know who still keeps the lease.
>
> [...]
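A way to catch the behavior Bob describes, i.e. the engine VM restarting while services are being stopped (a sketch; "HostedEngine" is the VM name from his output, and virsh's read-only mode avoids interfering with vdsm):

    # Watch for the HostedEngine VM reappearing during a shutdown test.
    while sleep 10; do
        if virsh -r list --name 2>/dev/null | grep -qx 'HostedEngine'; then
            echo "$(date): HostedEngine is running again - maintenance was not honored"
        fi
    done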
Re: [ovirt-users] Can HA Agent control NFS Mount?
Hi Doron,

On Mon, May 26, 2014 at 4:38 PM, Doron Fediuck dfedi...@redhat.com wrote:

> [...]
>
> Great to have your feedback guys! So just to clarify some of the issues you mentioned; hosted engine wasn't designed for a 'single node' use case, as we do want it to be highly available. This is why it's being restarted elsewhere, or even on the same server if no better alternative.
>
> Having said that, it is possible to set global maintenance mode as a first step (in the UI: right click engine vm and choose ha-maintenance). Then you can ssh into the engine vm and init 0. After a short while, the qemu process should gracefully end and release its sanlock lease as well as any other resource, which means you can reboot your hypervisor peacefully.

Sadly no, I've only been able to reboot my hypervisors if one of these two conditions is met:

- Lazy unmount of /rhev/mnt/hosted-engine etc.
- killall -9 sanlock wdmd

I notice sanlock and wdmd are not able to be stopped with "service wdmd stop; service sanlock stop". These seem to fail during the shutdown/reboot process, which prevents the unmount and the graceful reboot.

Are there any logs I can look into on how to debug those failed shutdowns?
Re: [ovirt-users] Can HA Agent control NFS Mount?
On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com wrote:

> On 05/25/2014 02:51 PM, Joop wrote:
>
>> On 25-5-2014 19:38, Bob Doolittle wrote:
>>
>>> Also curious is that when I say poweroff it actually reboots and comes up again. Could that be due to the timeouts on the way down?
>>
>> Ah, that's something my F19 host does too. Some more info: if engine hasn't been started on the host then I can shut it down and it will poweroff. If engine has been run on it then it will reboot. It's not vdsm (I think) because my shutdown sequence is (on my f19 host):
>>
>> service ovirt-ha-agent stop
>> service ovirt-ha-broker stop
>> service vdsmd stop
>> ssh root@engine01 init 0
>> init 0
>>
>> I don't use maintenance mode because when I poweron my host (= my desktop) I want engine to power on automatically, which it does most of the time within 10 min.
>
> For comparison, I see this issue and I *do* use maintenance mode (because presumably that's the 'blessed' way to shut things down and I'm scared to mess this complex system up by straying off the beaten path ;). My process is:
>
> ssh root@engine init 0
> (wait for "vdsClient -s 0 list | grep Status:" to show the vm as down)
> hosted-engine --set-maintenance --mode=global
> poweroff
>
> And then on startup:
>
> hosted-engine --set-maintenance --mode=none
> hosted-engine --vm-start
>
> There are two issues here. I am not sure if they are related or not.
>
> 1. The NFS timeout during shutdown (Joop do you see this also? Or just #2?)
> 2. The system reboot instead of poweroff (which messes up remote machine management)
>
> Thanks,
> Bob
>
>> I think wdmd or sanlock are causing the reboot instead of poweroff
>>
>> Joop

While searching for my issue of wdmd/sanlock not shutting down, I found this which may interest you both: https://bugzilla.redhat.com/show_bug.cgi?id=888197

Specifically: "To shut down sanlock without causing a wdmd reboot, you can run the following command: sanlock client shutdown -f 1. This will cause sanlock to kill any pid's that are holding leases, release those leases, and then exit."
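Folding that into the shutdown procedure discussed earlier might look like this (a sketch built around the command quoted from the bug report; the force level 1 is taken verbatim from it, and "engine" as the VM's hostname is an assumption):

    # Tear down sanlock cleanly before powering off, to avoid a wdmd reset.
    hosted-engine --set-maintenance --mode=global
    ssh root@engine init 0          # shut the engine VM down from inside
    sanlock client shutdown -f 1    # kills lease holders, releases leases, exits
    poweroff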
Re: [ovirt-users] Can HA Agent control NFS Mount?
Thanks Andrew, I'll try this workaround tomorrow for sure.

But reading through that bug report (closed not-a-bug), it states that the problem should only arise if something is not releasing a sanlock lease. So if we've entered Global Maintenance and shut down Engine, the question is what's holding the lease? How can that be debugged?

-Bob

On Jun 5, 2014 10:56 PM, Andrew Lau and...@andrewklau.com wrote:

> [...]
>
> While searching for my issue of wdmd/sanlock not shutting down, I found this which may interest you both: https://bugzilla.redhat.com/show_bug.cgi?id=888197
>
> Specifically: "To shut down sanlock without causing a wdmd reboot, you can run the following command: sanlock client shutdown -f 1. This will cause sanlock to kill any pid's that are holding leases, release those leases, and then exit."
Re: [ovirt-users] Can HA Agent control NFS Mount?
On Fri, Jun 6, 2014 at 1:09 PM, Bob Doolittle b...@doolittle.us.com wrote:

> Thanks Andrew, I'll try this workaround tomorrow for sure.
>
> But reading through that bug report (closed not-a-bug), it states that the problem should only arise if something is not releasing a sanlock lease. So if we've entered Global Maintenance and shut down Engine, the question is what's holding the lease? How can that be debugged?
>
> -Bob

For me it's wdmd and sanlock itself failing to shut down properly. I also noticed even when in global maintenance and the engine VM powered off, there is still a sanlock lease for the /rhev/mnt/hosted-engine/? lease file or something along those lines. So the global maintenance may not actually be releasing that lock.

I'm not too familiar with sanlock etc., so it's like stabbing in the dark :(

> [...]
Re: [ovirt-users] Can HA Agent control NFS Mount?
On 05/27/2014 03:47 PM, Doron Fediuck wrote:

> - Original Message -
> From: Sven Kieske s.kie...@mittwald.de
> To: Doron Fediuck dfedi...@redhat.com
> Cc: users@ovirt.org
> Sent: Tuesday, May 27, 2014 12:44:23 PM
> Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
>
>> So now I have to ask: what is the purpose of ovirt-all-in-one? Just for tech demos at events?
>>
>> I can live with any solution, but I feel it will limit oVirt's adoption, and the use-cases should get better documented. There are still many beginners asking very basic questions on this ML, and I think this shows the lack of better documentation for the first steps to take for a working oVirt environment. Or what do you think?
>
> Hi Sven,
>
> All-in-one is a classic-style installation (i.e. not running in a VM), and indeed was created for demos. However, it may grow to support additional hosts in a different DC, as the one it starts with is for local storage. This is why I was saying that the issue is not technical, but requires some understanding from the admin as to where he wants to go.
>
> As for documentation, that's always an issue. Keeping docs updated is an ongoing task and we'd appreciate any assistance we get for it.

Sven - so assuming we want to reduce the number of configurations we want to support, instead of all-in-one, which was our previous solution for POCs/demos: what would look different for hosted-engine on a single host? Just not have the HA feature?
Re: [ovirt-users] Can HA Agent control NFS Mount?
On Sat, May 31, 2014 at 6:06 AM, Joop jvdw...@xs4all.nl wrote:

> Bob Doolittle wrote:
>
>> Joop,
>>
>> On 05/26/2014 02:43 AM, Joop wrote:
>>
>>> Yesterday evening I found the service responsible for the reboot instead of the powerdown. If I do "service wdmd stop" the server will reboot. It seems the watchdog is hung up and eventually this will lead to a crash, and thus a reboot instead of the shutdown. Anyone know how to debug this?
>>
>> Did you get anywhere with this? Pretty nasty. Is there a bug open?
>>
>> We're getting a timeout on an NFS mount during the powerdown (single-node hosted, after global maintenance enabled and engine powered off), and that makes the machine reboot and try to come back up again instead of powering off. So two issues:
>>
>> - What is the mount that is hanging (probably an oVirt issue)?
>
> Don't know what that problem is. I have a local NFS mount but don't experience that problem.

I got to the console when mine went for a reboot. I see sanlock and wdmd failing to shut down properly, which would explain why it doesn't unmount properly.

>> - Why does the system reboot instead of powering down as instructed (?)?
>
> The reboot is caused by wdmd. Docs say that if the watchdogs aren't responding, a reset will follow. So our init 0 is overruled because of the hanging watchdog. Why it is hanging I don't know. Could be my chipset, could be the version of wdmd-kernel. The only thing I know is that it didn't always happen in the past, but that is as much as I remember, sorry.
>
> Joop
Re: [ovirt-users] Can HA Agent control NFS Mount?
Joop,

On 05/26/2014 02:43 AM, Joop wrote:

> Yesterday evening I found the service responsible for the reboot instead of the powerdown. If I do "service wdmd stop" the server will reboot. It seems the watchdog is hung up and eventually this will lead to a crash, and thus a reboot instead of the shutdown. Anyone know how to debug this?

Did you get anywhere with this? Pretty nasty. Is there a bug open?

We're getting a timeout on an NFS mount during the powerdown (single-node hosted, after global maintenance enabled and engine powered off), and that makes the machine reboot and try to come back up again instead of powering off. So two issues:

- What is the mount that is hanging (probably an oVirt issue)?
- Why does the system reboot instead of powering down as instructed (?)?

-Bob
Re: [ovirt-users] Can HA Agent control NFS Mount?
Bob Doolittle wrote:

> Joop,
>
> On 05/26/2014 02:43 AM, Joop wrote:
>
>> Yesterday evening I found the service responsible for the reboot instead of the powerdown. If I do "service wdmd stop" the server will reboot. It seems the watchdog is hung up and eventually this will lead to a crash, and thus a reboot instead of the shutdown. Anyone know how to debug this?
>
> Did you get anywhere with this? Pretty nasty. Is there a bug open?
>
> We're getting a timeout on an NFS mount during the powerdown (single-node hosted, after global maintenance enabled and engine powered off), and that makes the machine reboot and try to come back up again instead of powering off. So two issues:
>
> - What is the mount that is hanging (probably an oVirt issue)?

Don't know what that problem is. I have a local NFS mount but don't experience that problem.

> - Why does the system reboot instead of powering down as instructed (?)?

The reboot is caused by wdmd. Docs say that if the watchdogs aren't responding, a reset will follow. So our init 0 is overruled because of the hanging watchdog. Why it is hanging I don't know. Could be my chipset, could be the version of wdmd-kernel. The only thing I know is that it didn't always happen in the past, but that is as much as I remember, sorry.

Joop
Re: [ovirt-users] Can HA Agent control NFS Mount?
Hi Doron,

Before the initial thread sways a little more:

On Mon, May 26, 2014 at 4:38 PM, Doron Fediuck dfedi...@redhat.com wrote:

> [...]
>
>> For 1. I was wondering if perhaps we could have an option to specify the mount options. If I understand correctly, applying a soft mount instead of a hard mount would prevent this from happening. I'm however not sure of the implications this would have on the data integrity.
>>
>> I would really like to see it happen in the ha-agent; as it's the one which connects/mounts the storage, it should also unmount it on shutdown. However the stability of it is flaky at best. I've noticed if `df` hangs because of another NFS mount having timed out, the agent will die. That's not a good sign; this was what actually caused my hosted-engine to run twice in one case.
>
> Great to have your feedback guys! So just to clarify some of the issues you mentioned; hosted engine wasn't designed for a 'single node' use case, as we do want it to be highly available. This is why it's being restarted elsewhere, or even on the same server if no better alternative.
>
> Having said that, it is possible to set global maintenance mode as a first step (in the UI: right click engine vm and choose ha-maintenance). Then you can ssh into the engine vm and init 0. After a short while, the qemu process should gracefully end and release its sanlock lease as well as any other resource, which means you can reboot your hypervisor peacefully.
>
> Doron

What about in a 2-host cluster? Let's say we want to take down 1 host for maintenance, so there's a 50% chance it could be running the engine. Would setting maintenance mode "local" do the same thing and allow a clean shutdown/reboot?
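On the mount-option idea quoted above: an NFS soft mount returns I/O errors instead of hanging forever, at the cost of data-integrity risk on a storage domain, which is why hard mounts are the usual default. A sketch of what such a mount could look like (the options and the export path, reconstructed from the lockspace paths in this thread, are illustrative only; oVirt normally controls these mounts itself):

    # Soft mount: fail I/O after retrans retries of timeo/10 seconds each,
    # instead of blocking shutdown indefinitely.
    mount -t nfs -o soft,timeo=600,retrans=3 xion2:/export/vm_he1 /mnt/he-test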
Re: [ovirt-users] Can HA Agent control NFS Mount?
- Original Message -
From: Andrew Lau and...@andrewklau.com
To: Doron Fediuck dfedi...@redhat.com
Cc: Bob Doolittle b...@doolittle.us.com, users users@ovirt.org, Jiri Moskovcak jmosk...@redhat.com, Sandro Bonazzola sbona...@redhat.com
Sent: Wednesday, May 28, 2014 11:03:38 AM
Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

> Hi Doron,
>
> [...]
>
>> Having said that, it is possible to set global maintenance mode as a first step (in the UI: right click engine vm and choose ha-maintenance). Then you can ssh into the engine vm and init 0. After a short while, the qemu process should gracefully end and release its sanlock lease as well as any other resource, which means you can reboot your hypervisor peacefully.
>
> What about in a 2-host cluster? Let's say we want to take down 1 host for maintenance, so there's a 50% chance it could be running the engine. Would setting maintenance mode "local" do the same thing and allow a clean shutdown/reboot?

Yes. That's the idea behind local (aka host) maintenance [1]. Starting with 3.4, all you need to do is move the host to maintenance in the UI, and this will also set the local maintenance mode for this host. So you should be able to do everything with it, and use 'activate' in the UI to get it back into production.

[1] http://www.ovirt.org/Features/Self_Hosted_Engine#Maintenance_Flows
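The CLI equivalent of that UI flow, for completeness (a sketch using the hosted-engine maintenance modes already shown in this thread):

    # Put just this host into hosted-engine maintenance; the engine VM
    # should be moved away if it is running here.
    hosted-engine --set-maintenance --mode=local
    # ...do the host maintenance, reboot, etc....
    # Return the host to the pool.
    hosted-engine --set-maintenance --mode=none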
Re: [ovirt-users] Can HA Agent control NFS Mount?
- Original Message -
From: Bob Doolittle b...@doolittle.us.com
To: Doron Fediuck dfedi...@redhat.com, Andrew Lau and...@andrewklau.com
Cc: users users@ovirt.org, Jiri Moskovcak jmosk...@redhat.com, Sandro Bonazzola sbona...@redhat.com
Sent: Monday, May 26, 2014 6:12:04 PM
Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

> On 05/26/2014 02:38 AM, Doron Fediuck wrote:
>
>> Great to have your feedback guys! So just to clarify some of the issues you mentioned; hosted engine wasn't designed for a 'single node' use case, as we do want it to be highly available. This is why it's being restarted elsewhere, or even on the same server if no better alternative.
>
> I hope you will keep this configuration in your design space as you move forward. As you can see, it is likely to be a popular one. At a previous employer, everybody in our development group had a single (powerful) machine running VMware vSphere, with a hosted vCenter, to use as a development VM testbed. It's likely to be popular for people with few resources who want to run in a fully supported configuration.
>
> HA is great, but not every environment has the resources/requirement to support it. oVirt has a lot to offer even a leaner environment, and I hope those environments continue to get attention as the product matures. For example, I hope there is a reasonable story around upgrades to future releases for single-node hosted configurations.
>
>> Having said that, it is possible to set global maintenance mode as a first step (in the UI: right click engine vm and choose ha-maintenance). Then you can ssh into the engine vm and init 0.
>
> What is the recipe for a clean startup after shutdown? Can we do 'hosted-engine --vm-start' while the system is in Global Maintenance mode?
>
> Thanks,
> Bob

Hi Bob,

If all you need is a way to run VMs, then Kimchi [1] from the oVirt eco-system can be a better solution, as it's designed for a single-node use case. If you wish to run oVirt in a VM, you can run a VM with Kimchi and install the engine inside. Also, coming in 3.5 is the oVirt appliance [2], which will simplify it. If I'm missing something please share your thoughts.

As for a clean start after maintenance, I'd expect "hosted-engine --set-maintenance --mode=none" will allow the agent to start the VM automatically. If this is not the case please let us know.

Thanks,
Doron

[1] https://github.com/kimchi-project/kimchi/wiki
[2] http://www.ovirt.org/Feature/oVirtAppliance
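Bob's startup question in concrete terms (a sketch; per Doron, the agent should start the VM by itself once maintenance is off, so the explicit start is only a fallback):

    # Leave maintenance; the HA agent should then start the engine VM itself.
    hosted-engine --set-maintenance --mode=none
    # If it does not come up on its own, start it explicitly and check:
    hosted-engine --vm-start
    hosted-engine --vm-status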
Re: [ovirt-users] Can HA Agent control NFS Mount?
Well, I don't think Kimchi is a solution to the problem Bob mentioned. What you want (and what you get with VMware, btw) is _one_ solution that starts on a single node (laptop or server, whatever) and is able to scale to manage 100 hosts or a complete data center. I doubt Kimchi can do this. oVirt could be tweaked to do this. Other thoughts?

On 27.05.2014 09:37, Doron Fediuck wrote:

If I'm missing something, please share your thoughts.

--
Mit freundlichen Grüßen / Regards

Sven Kieske
Systemadministrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Can HA Agent control NFS Mount?
- Original Message -
From: Sven Kieske s.kie...@mittwald.de
To: users@ovirt.org
Sent: Tuesday, May 27, 2014 10:44:39 AM
Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

Well, I don't think Kimchi is a solution to the problem Bob mentioned. What you want (and what you get with VMware, btw) is _one_ solution that starts on a single node (laptop or server, whatever) and is able to scale to manage 100 hosts or a complete data center. I doubt Kimchi can do this. oVirt could be tweaked to do this. Other thoughts?

On 27.05.2014 09:37, Doron Fediuck wrote:

If I'm missing something, please share your thoughts.

--
Mit freundlichen Grüßen / Regards

Sven Kieske
Systemadministrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

Hi Sven,

Referring to Bob's mail, he mentioned people with little resources, and also a development use case: running VMware vSphere, with a hosted vCenter, to use as a development VM testbed. These are valid cases for Kimchi, where all you want to do is simply run a VM.

If you want to be able to scale up to 100 hosts or more, you can definitely start with 2 hosts, which is the hosted-engine use case. The reason I mention it is that additional hosts will require proper network and storage configuration, which basically means you want to plan ahead even when you begin with a modest deployment of 2 hosts.

I do not think a single-host use case is technically impossible, but we need to understand that a VM may die, and in a single-host setup this may have additional implications. So why get into trouble instead of planning your way, or using the right tool for the task? Just imagine using a 5.4 kg hammer to hang a picture frame on the wall.

Doron

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Can HA Agent control NFS Mount?
So now I have to ask: what is the purpose of ovirt-all-in-one? Just for tech demos at events? I can live with any solution, but I feel it will limit oVirt's adoption, and the use cases should be better documented. There are still many beginners asking very basic questions on this ML, and I think this shows the lack of better documentation of the first steps to take towards a working oVirt environment. Or what do you think?

On 27.05.2014 11:02, Doron Fediuck wrote:

I do not think a single-host use case is technically impossible, but we need to understand that a VM may die, and in a single-host setup this may have additional implications. So why get into trouble instead of planning your way, or using the right tool for the task? Just imagine using a 5.4 kg hammer to hang a picture frame on the wall.

--
Mit freundlichen Grüßen / Regards

Sven Kieske
Systemadministrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Can HA Agent control NFS Mount?
- Original Message -
From: Sven Kieske s.kie...@mittwald.de
To: Doron Fediuck dfedi...@redhat.com
Cc: users@ovirt.org
Sent: Tuesday, May 27, 2014 12:44:23 PM
Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

So now I have to ask: what is the purpose of ovirt-all-in-one? Just for tech demos at events? I can live with any solution, but I feel it will limit oVirt's adoption, and the use cases should be better documented. There are still many beginners asking very basic questions on this ML, and I think this shows the lack of better documentation of the first steps to take towards a working oVirt environment. Or what do you think?

On 27.05.2014 11:02, Doron Fediuck wrote:

I do not think a single-host use case is technically impossible, but we need to understand that a VM may die, and in a single-host setup this may have additional implications. So why get into trouble instead of planning your way, or using the right tool for the task? Just imagine using a 5.4 kg hammer to hang a picture frame on the wall.

--
Mit freundlichen Grüßen / Regards

Sven Kieske
Systemadministrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

Hi Sven,

All-in-one is a classic-style installation (i.e., not running in a VM), and indeed it was created for demos. However, it may grow to support additional hosts in a different DC, as the one it starts with uses local storage. This is why I was saying that the issue is not technical, but requires some understanding on the admin's part of where he wants to go.

As for documentation, that's always an issue. Keeping docs updated is an ongoing task, and we'd appreciate any assistance we can get with it.

Doron

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Can HA Agent control NFS Mount?
- Original Message -
From: Andrew Lau and...@andrewklau.com
To: Bob Doolittle b...@doolittle.us.com
Cc: users users@ovirt.org
Sent: Monday, May 26, 2014 7:30:41 AM
Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com wrote:

On 05/25/2014 02:51 PM, Joop wrote:

On 25-5-2014 19:38, Bob Doolittle wrote:

Also curious is that when I say poweroff it actually reboots and comes up again. Could that be due to the timeouts on the way down?

Ah, that's something my F19 host does too. Some more info: if the engine hasn't been started on the host, then I can shut the host down and it will power off. If the engine has been run on it, then it will reboot. It's not vdsm (I think), because my shutdown sequence is (on my F19 host):

service ovirt-agent-ha stop
service ovirt-agent-broker stop
service vdsmd stop
ssh root@engine01 init 0
init 0

I don't use maintenance mode, because when I power on my host (= my desktop) I want the engine to power on automatically, which it does most of the time within 10 min.

For comparison, I see this issue and I *do* use maintenance mode (because presumably that's the 'blessed' way to shut things down, and I'm scared to mess this complex system up by straying off the beaten path ;). My process is:

ssh root@engine init 0
(wait for vdsClient -s 0 list | grep Status: to show the VM as down)
hosted-engine --set-maintenance --mode=global
poweroff

And then on startup:

hosted-engine --set-maintenance --mode=none
hosted-engine --vm-start

There are two issues here. I am not sure if they are related or not.
1. The NFS timeout during shutdown (Joop, do you see this also? Or just #2?)
2. The system reboot instead of poweroff (which messes up remote machine management)

For 1, I was wondering if perhaps we could have an option to specify the mount options. If I understand correctly, applying a soft mount instead of a hard mount would prevent this from happening. However, I'm not sure of the implications this would have for data integrity.

I would really like to see it happen in the ha-agent: as it's the one that connects/mounts the storage, it should also unmount it on shutdown. However, its stability is flaky at best. I've noticed that if `df` hangs because another NFS mount has timed out, the agent will die. That's not a good sign; this is what actually caused my hosted-engine to run twice in one case.

Thanks,
Bob

I think wdmd or sanlock is causing the reboot instead of poweroff.

Joop

Great to have your feedback guys! So just to clarify some of the issues you mentioned: hosted engine wasn't designed for a 'single node' use case, as we do want it to be highly available. This is why it's restarted elsewhere, or even on the same server if there is no better alternative. Having said that, it is possible to set global maintenance mode as a first step (in the UI: right-click the engine VM and choose ha-maintenance). Then you can ssh into the engine VM and init 0. After a short while, the qemu process should gracefully end and release its sanlock lease as well as any other resources, which means you can reboot your hypervisor peacefully.

Doron

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Can HA Agent control NFS Mount?
Yesterday evening I found the service responsible for the reboot instead of the powerdown. If I do:

service wdmd stop

the server will reboot. It seems the watchdog is hung, and eventually this leads to a crash, and thus a reboot instead of a shutdown. Does anyone know how to debug this?

Joop

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
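This matches how wdmd is designed to behave: if sanlock still holds lockspaces when wdmd goes away, the hardware watchdog stops being refreshed and fires, resetting the machine. A few commands that may help narrow it down (a sketch; assumes the stock sanlock/wdmd packages and the usual log locations, which may differ per distribution):

# are any lockspaces still held? they must be gone before wdmd can stop safely
sanlock client status
# which process holds the watchdog device open?
lsof /dev/watchdog
# kernel and daemon messages around the hang
dmesg | grep -i -e wdmd -e watchdog
grep -i -e wdmd -e sanlock /var/log/messages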
Re: [ovirt-users] Can HA Agent control NFS Mount?
On 05/26/2014 02:38 AM, Doron Fediuck wrote:

Great to have your feedback guys! So just to clarify some of the issues you mentioned: hosted engine wasn't designed for a 'single node' use case, as we do want it to be highly available. This is why it's restarted elsewhere, or even on the same server if there is no better alternative.

I hope you will keep this configuration in your design space as you move forward. As you can see, it is likely to be a popular one. At a previous employer, everybody in our development group had a single (powerful) machine running VMware vSphere, with a hosted vCenter, to use as a development VM testbed. It's likely to be popular with people with few resources who want to run a fully supported configuration. HA is great, but not every environment has the resources/requirements to support it. oVirt has a lot to offer even a leaner environment, and I hope those environments continue to get attention as the product matures. For example, I hope there is a reasonable story around upgrades to future releases for single-node hosted configurations.

Having said that, it is possible to set global maintenance mode as a first step (in the UI: right-click the engine VM and choose ha-maintenance). Then you can ssh into the engine VM and init 0.

What is the recipe for a clean startup after shutdown? Can we do 'hosted-engine --vm-start' while the system is in Global Maintenance mode?

Thanks,
Bob

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Can HA Agent control NFS Mount?
- Original Message -
From: Andrew Lau and...@andrewklau.com
To: users users@ovirt.org
Sent: Saturday, May 24, 2014 9:59:26 AM
Subject: [ovirt-users] Can HA Agent control NFS Mount?

Hi,

I was just wondering, within the whole complexity of hosted-engine, would it be possible for the hosted-engine ha-agent to control the mount point? I'm basing this off a few people I've been talking to who have their NFS server running on the same host that the hosted-engine services are running on. Most are also running that on top of Gluster.

The main motive for this: currently, if the NFS server is running on localhost and the server goes for a clean shutdown, it will hang, because the NFS mount is hard-mounted, and as the NFS server has gone away we're stuck in an indefinite wait for it to cleanly unmount (which it never will). If instead one of the HA components could unmount this NFS mount when it shuts down, this could potentially prevent the hang.

There are other alternatives, and I know this is not the supported scenario, but I'm just hoping to bounce a few ideas.

Thanks,
Andrew

Hi Andrew,

Indeed, we're not looking into the Gluster flow now, as it has some known issues. Additionally (just to make it clear), local NFS will not provide any tolerance if the hosting server dies. So we should be looking at shared storage independent of the hypervisors.

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
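Two workarounds for the hard-mount hang come up repeatedly in this thread. Both are sketches under stated assumptions, not recommendations; the server name, export path, and mount point below are illustrative:

# 1. soft-mount the hosted-engine storage so a dead server returns I/O errors
#    instead of blocking forever - note soft mounts can corrupt VM images when
#    the server is merely slow, so test carefully before relying on this
mount -t nfs -o soft,timeo=600,retrans=6 nfs-server:/export/vm_he1 /mnt/he-storage

# 2. lazily detach the mount just before poweroff, once the engine VM and the
#    HA services are down, so shutdown does not wait on the dead hard mount
umount -l /mnt/he-storage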
Re: [ovirt-users] Can HA Agent control NFS Mount?
Just for the record, what Andrew reports is not specific to GlusterFS. I have not yet found a way to shut down my single-node hosted deployment cleanly without experiencing NFS hangs/timeouts on the way down. My NFS storage is local to my host.

Also curious is that when I say poweroff it actually reboots and comes up again. Could that be due to the timeouts on the way down?

-Bob

On 05/25/2014 08:13 AM, Doron Fediuck wrote:

- Original Message -
From: Andrew Lau and...@andrewklau.com
To: users users@ovirt.org
Sent: Saturday, May 24, 2014 9:59:26 AM
Subject: [ovirt-users] Can HA Agent control NFS Mount?

Hi,

I was just wondering, within the whole complexity of hosted-engine, would it be possible for the hosted-engine ha-agent to control the mount point? I'm basing this off a few people I've been talking to who have their NFS server running on the same host that the hosted-engine services are running on. Most are also running that on top of Gluster.

The main motive for this: currently, if the NFS server is running on localhost and the server goes for a clean shutdown, it will hang, because the NFS mount is hard-mounted, and as the NFS server has gone away we're stuck in an indefinite wait for it to cleanly unmount (which it never will). If instead one of the HA components could unmount this NFS mount when it shuts down, this could potentially prevent the hang.

There are other alternatives, and I know this is not the supported scenario, but I'm just hoping to bounce a few ideas.

Thanks,
Andrew

Hi Andrew,

Indeed, we're not looking into the Gluster flow now, as it has some known issues. Additionally (just to make it clear), local NFS will not provide any tolerance if the hosting server dies. So we should be looking at shared storage independent of the hypervisors.

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Can HA Agent control NFS Mount?
On 25-5-2014 19:38, Bob Doolittle wrote:

Also curious is that when I say poweroff it actually reboots and comes up again. Could that be due to the timeouts on the way down?

Ah, that's something my F19 host does too. Some more info: if the engine hasn't been started on the host, then I can shut the host down and it will power off. If the engine has been run on it, then it will reboot. It's not vdsm (I think), because my shutdown sequence is (on my F19 host):

service ovirt-agent-ha stop
service ovirt-agent-broker stop
service vdsmd stop
ssh root@engine01 init 0
init 0

I don't use maintenance mode, because when I power on my host (= my desktop) I want the engine to power on automatically, which it does most of the time within 10 min.

I think wdmd or sanlock is causing the reboot instead of poweroff.

Joop

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Can HA Agent control NFS Mount?
On 05/25/2014 02:51 PM, Joop wrote:

On 25-5-2014 19:38, Bob Doolittle wrote:

Also curious is that when I say poweroff it actually reboots and comes up again. Could that be due to the timeouts on the way down?

Ah, that's something my F19 host does too. Some more info: if the engine hasn't been started on the host, then I can shut the host down and it will power off. If the engine has been run on it, then it will reboot. It's not vdsm (I think), because my shutdown sequence is (on my F19 host):

service ovirt-agent-ha stop
service ovirt-agent-broker stop
service vdsmd stop
ssh root@engine01 init 0
init 0

I don't use maintenance mode, because when I power on my host (= my desktop) I want the engine to power on automatically, which it does most of the time within 10 min.

For comparison, I see this issue and I *do* use maintenance mode (because presumably that's the 'blessed' way to shut things down, and I'm scared to mess this complex system up by straying off the beaten path ;). My process is:

ssh root@engine init 0
(wait for vdsClient -s 0 list | grep Status: to show the VM as down)
hosted-engine --set-maintenance --mode=global
poweroff

And then on startup:

hosted-engine --set-maintenance --mode=none
hosted-engine --vm-start

There are two issues here. I am not sure if they are related or not.
1. The NFS timeout during shutdown (Joop, do you see this also? Or just #2?)
2. The system reboot instead of poweroff (which messes up remote machine management)

Thanks,
Bob

I think wdmd or sanlock is causing the reboot instead of poweroff.

Joop

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
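The "wait for the VM to be down" step above can be scripted rather than eyeballed. A sketch built on the same vdsClient invocation Bob uses (it assumes the engine VM is the only VM defined on the host, as in a fresh hosted-engine deployment):

# block until vdsm reports the engine VM as down, then enter maintenance
until vdsClient -s 0 list | grep 'Status:' | grep -qi down; do
    sleep 5
done
hosted-engine --set-maintenance --mode=global
poweroff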
Re: [ovirt-users] Can HA Agent control NFS Mount?
On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com wrote:

On 05/25/2014 02:51 PM, Joop wrote:

On 25-5-2014 19:38, Bob Doolittle wrote:

Also curious is that when I say poweroff it actually reboots and comes up again. Could that be due to the timeouts on the way down?

Ah, that's something my F19 host does too. Some more info: if the engine hasn't been started on the host, then I can shut the host down and it will power off. If the engine has been run on it, then it will reboot. It's not vdsm (I think), because my shutdown sequence is (on my F19 host):

service ovirt-agent-ha stop
service ovirt-agent-broker stop
service vdsmd stop
ssh root@engine01 init 0
init 0

I don't use maintenance mode, because when I power on my host (= my desktop) I want the engine to power on automatically, which it does most of the time within 10 min.

For comparison, I see this issue and I *do* use maintenance mode (because presumably that's the 'blessed' way to shut things down, and I'm scared to mess this complex system up by straying off the beaten path ;). My process is:

ssh root@engine init 0
(wait for vdsClient -s 0 list | grep Status: to show the VM as down)
hosted-engine --set-maintenance --mode=global
poweroff

And then on startup:

hosted-engine --set-maintenance --mode=none
hosted-engine --vm-start

There are two issues here. I am not sure if they are related or not.
1. The NFS timeout during shutdown (Joop, do you see this also? Or just #2?)
2. The system reboot instead of poweroff (which messes up remote machine management)

For 1, I was wondering if perhaps we could have an option to specify the mount options. If I understand correctly, applying a soft mount instead of a hard mount would prevent this from happening. However, I'm not sure of the implications this would have for data integrity.

I would really like to see it happen in the ha-agent: as it's the one that connects/mounts the storage, it should also unmount it on shutdown. However, its stability is flaky at best. I've noticed that if `df` hangs because another NFS mount has timed out, the agent will die. That's not a good sign; this is what actually caused my hosted-engine to run twice in one case.

Thanks,
Bob

I think wdmd or sanlock is causing the reboot instead of poweroff.

Joop

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
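Regarding the `df` hang that kills the agent: any tool that stats a dead hard NFS mount will block the same way. A sketch of a non-blocking probe, using only stock coreutils (the 5-second timeout is arbitrary):

# list NFS mount points from /proc/mounts and probe each with a timeout,
# instead of a bare df that blocks indefinitely on a dead hard mount
for m in $(awk '$3 ~ /^nfs/ {print $2}' /proc/mounts); do
    timeout 5 stat -f "$m" >/dev/null 2>&1 || echo "stale: $m"
done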