Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-07-22 Thread Martin Sivak
Hi,

 One by directly using rem_lockspace (since it's the hosted-engine one)

The hosted engine lockspace protects the ID of the node. Global maintenance 
only disables the state machine, but the agent still reports data to the shared 
storage. The hosted engine lock can only be released when the agent is down. If 
the lock is still there even after you run 'service ovirt-ha-agent stop', then 
it is a bug somewhere.
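
A minimal check along those lines (a sketch; it assumes the 'hosted-engine'
lockspace name and the service names used elsewhere in this thread):

  # stop the HA services so the agent can release its host ID in the lockspace
  service ovirt-ha-agent stop
  service ovirt-ha-broker stop

  # the hosted-engine lockspace should disappear from the 's ...' lines;
  # if it is still listed a minute later, that is the bug described above
  sanlock client status | grep hosted-engine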

 the other one by stopMonitoringDomain

Shouldn't this be done by VDSM or sanlock when the VM disappears? The lock has 
to stay acquired any time the VM is running (independently of the hosted engine 
services or vdsm) to protect the VM's data. We can't release the lock for a 
running VM, because some other host might try to start it and corrupt its data 
by doing so.

Martin

--
Martin Sivák
msi...@redhat.com
Red Hat Czech
RHEV-M SLA / Brno, CZ


- Original Message -
 - Original Message -
  From: Bob Doolittle b...@doolittle.us.com
  To: Doron Fediuck dfedi...@redhat.com, Andrew Lau
  and...@andrewklau.com
  Cc: users users@ovirt.org, Federico Simoncelli fsimo...@redhat.com
  Sent: Saturday, June 14, 2014 1:29:54 AM
  Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
  
  But there may be more going on. Even if I stop vdsmd, the HA services,
  and libvirtd, and sleep 60 seconds, I still see a lock held on the
  Engine VM storage:
  
  daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
  p -1 helper
  p -1 listener
  p -1 status
  s
  003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
  s
  hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0
 
 This output shows that the lockspaces are still acquired. When you put
 hosted-engine
 in maintenance they must be released.
 One by directly using rem_lockspace (since it's the hosted-engine one) and
 the other
 one by stopMonitoringDomain.
 
 I quickly looked at the ovirt-hosted-engine* projects and I haven't found
 anything
 related to that.
 
 --
 Federico
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-07-19 Thread Andrew Lau
Hi,

Did anyone find much luck tracking this down? I rebooted one of our servers
and hit this issue again, conveniently, the dell remote access card has
borked as well.. so a 50 minute trip to the DC..


On Thu, Jun 19, 2014 at 10:10 AM, Bob Doolittle bobddr...@gmail.com wrote:

  Specifically, if I do the following:

- Enter global maintenance (hosted-engine --set-maintenance-mode
--mode=global)
- init 0 the engine
- systemctl stop ovirt-ha-agent ovirt-ha-broker libvirtd vdsmd


 and then run sanlock client status I see:

 # sanlock client status
 daemon c715b5de-fd98-4146-a0b1-e9801179c768.xion2.smar
 p -1 helper
 p -1 listener
 p -1 status
 s 
 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
 s 
 18eeab54-e482-497f-b096-11f8a43f94f4:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/dom_md/ids:0
 s 
 hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0


 Waiting a few minutes does not change this state.

 The earlier data I shared which showed HostedEngine was with a different
 test scenario.

 -Bob


 On 06/18/2014 07:53 AM, Bob Doolittle wrote:

 I see I have a very unfortunate typo in my previous mail. As supported by
 the vm-status output I attached, I had set --mode=global (not none) in step
 1.

 I am not the only one experiencing this. I can reproduce it easily. It
 appears that shutting down vdsm causes the HA services to incorrectly think
 the system has come out of Global Maintenance and restart the engine.

 -Bob
 On Jun 18, 2014 5:06 AM, Federico Simoncelli fsimo...@redhat.com
 wrote:

 - Original Message -
  From: Bob Doolittle b...@doolittle.us.com
  To: Doron Fediuck dfedi...@redhat.com, Andrew Lau 
 and...@andrewklau.com
  Cc: users users@ovirt.org, Federico Simoncelli 
 fsimo...@redhat.com
  Sent: Saturday, June 14, 2014 1:29:54 AM
  Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
 
  But there may be more going on. Even if I stop vdsmd, the HA services,
  and libvirtd, and sleep 60 seconds, I still see a lock held on the
  Engine VM storage:
 
  daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
  p -1 helper
  p -1 listener
  p -1 status
  s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/
 xion2.smartcity.net
 \:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
  s
 hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0

 This output shows that the lockspaces are still acquired. When you put
 hosted-engine
 in maintenance they must be released.
 One by directly using rem_lockspace (since it's the hosted-engine one)
 and the other
 one by stopMonitoringDomain.

 I quickly looked at the ovirt-hosted-engine* projects and I haven't found
 anything
 related to that.

 --
 Federico



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-07-19 Thread Andrew Lau
By the way, this happened on an aborted hosted-engine install, so the HA agents
hadn't even started up; just the VM was running.


On Sat, Jul 19, 2014 at 11:24 PM, Andrew Lau and...@andrewklau.com wrote:

 Hi,

 Did anyone find much luck tracking this down? I rebooted one of our
 servers and hit this issue again, conveniently, the dell remote access card
 has borked as well.. so a 50 minute trip to the DC..


 On Thu, Jun 19, 2014 at 10:10 AM, Bob Doolittle bobddr...@gmail.com
 wrote:

  Specifically, if I do the following:

- Enter global maintenance (hosted-engine --set-maintenance-mode
--mode=global)
- init 0 the engine
- systemctl stop ovirt-ha-agent ovirt-ha-broker libvirtd vdsmd


 and then run sanlock client status I see:

 # sanlock client status
 daemon c715b5de-fd98-4146-a0b1-e9801179c768.xion2.smar
 p -1 helper
 p -1 listener
 p -1 status
 s 
 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
 s 
 18eeab54-e482-497f-b096-11f8a43f94f4:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/dom_md/ids:0
 s 
 hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0


 Waiting a few minutes does not change this state.

 The earlier data I shared which showed HostedEngine was with a different
 test scenario.

 -Bob


 On 06/18/2014 07:53 AM, Bob Doolittle wrote:

 I see I have a very unfortunate typo in my previous mail. As supported by
 the vm-status output I attached, I had set --mode=global (not none) in step
 1.

 I am not the only one experiencing this. I can reproduce it easily. It
 appears that shutting down vdsm causes the HA services to incorrectly think
 the system has come out of Global Maintenance and restart the engine.

 -Bob
 On Jun 18, 2014 5:06 AM, Federico Simoncelli fsimo...@redhat.com
 wrote:

 - Original Message -
  From: Bob Doolittle b...@doolittle.us.com
  To: Doron Fediuck dfedi...@redhat.com, Andrew Lau 
 and...@andrewklau.com
  Cc: users users@ovirt.org, Federico Simoncelli 
 fsimo...@redhat.com
  Sent: Saturday, June 14, 2014 1:29:54 AM
  Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
 
  But there may be more going on. Even if I stop vdsmd, the HA services,
  and libvirtd, and sleep 60 seconds, I still see a lock held on the
  Engine VM storage:
 
  daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
  p -1 helper
  p -1 listener
  p -1 status
  s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/
 xion2.smartcity.net
 \:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
  s
 hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0

 This output shows that the lockspaces are still acquired. When you put
 hosted-engine
 in maintenance they must be released.
 One by directly using rem_lockspace (since it's the hosted-engine one)
 and the other
 one by stopMonitoringDomain.

 I quickly looked at the ovirt-hosted-engine* projects and I haven't
 found anything
 related to that.

 --
 Federico




___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-18 Thread Federico Simoncelli
- Original Message -
 From: Bob Doolittle b...@doolittle.us.com
 To: Doron Fediuck dfedi...@redhat.com, Andrew Lau 
 and...@andrewklau.com
 Cc: users users@ovirt.org, Federico Simoncelli fsimo...@redhat.com
 Sent: Saturday, June 14, 2014 1:29:54 AM
 Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

 
 But there may be more going on. Even if I stop vdsmd, the HA services,
 and libvirtd, and sleep 60 seconds, I still see a lock held on the
 Engine VM storage:
 
 daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
 p -1 helper
 p -1 listener
 p -1 status
 s 
 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
 s 
 hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0

This output shows that the lockspaces are still acquired. When you put
hosted-engine in maintenance they must be released: one by directly using
rem_lockspace (since it's the hosted-engine one) and the other one by
stopMonitoringDomain.

I quickly looked at the ovirt-hosted-engine* projects and I haven't found
anything related to that.
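
For reference, a very rough sketch of what releasing those two resources by
hand might look like. This assumes the sanlock command-line client exposes
rem_lockspace and that vdsClient exposes the stopMonitoringDomain verb; the
lockspace string and the storage domain UUID below are simply copied from the
'sanlock client status' output earlier in this thread:

  # drop the hosted-engine lockspace (string taken from 'sanlock client status')
  sanlock client rem_lockspace -s 'hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0'

  # ask vdsm to stop monitoring the hosted-engine storage domain
  vdsClient -s 0 stopMonitoringDomain 18eeab54-e482-497f-b096-11f8a43f94f4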

-- 
Federico
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-18 Thread Bob Doolittle
I see I have a very unfortunate typo in my previous mail. As supported by
the vm-status output I attached, I had set --mode=global (not none) in step
1.

I am not the only one experiencing this. I can reproduce it easily. It
appears that shutting down vdsm causes the HA services to incorrectly think
the system has come out of Global Maintenance and restart the engine.

-Bob
On Jun 18, 2014 5:06 AM, Federico Simoncelli fsimo...@redhat.com wrote:

 - Original Message -
  From: Bob Doolittle b...@doolittle.us.com
  To: Doron Fediuck dfedi...@redhat.com, Andrew Lau 
 and...@andrewklau.com
  Cc: users users@ovirt.org, Federico Simoncelli 
 fsimo...@redhat.com
  Sent: Saturday, June 14, 2014 1:29:54 AM
  Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
 
  But there may be more going on. Even if I stop vdsmd, the HA services,
  and libvirtd, and sleep 60 seconds, I still see a lock held on the
  Engine VM storage:
 
  daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
  p -1 helper
  p -1 listener
  p -1 status
  s 003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/
 xion2.smartcity.net
 \:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
  s
 hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0

 This output shows that the lockspaces are still acquired. When you put
 hosted-engine
 in maintenance they must be released.
 One by directly using rem_lockspace (since it's the hosted-engine one) and
 the other
 one by stopMonitoringDomain.

 I quickly looked at the ovirt-hosted-engine* projects and I haven't found
 anything
 related to that.

 --
 Federico

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-18 Thread Bob Doolittle

Specifically, if I do the following:

 * Enter global maintenance (hosted-engine --set-maintenance-mode
   --mode=global)
 * init 0 the engine
 * systemctl stop ovirt-ha-agent ovirt-ha-broker libvirtd vdsmd


and then run sanlock client status I see:

# sanlock client status
daemon c715b5de-fd98-4146-a0b1-e9801179c768.xion2.smar
p -1 helper
p -1 listener
p -1 status
s 
003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
s 
18eeab54-e482-497f-b096-11f8a43f94f4:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/dom_md/ids:0
s 
hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0


Waiting a few minutes does not change this state.

The earlier data I shared which showed HostedEngine was with a different 
test scenario.


-Bob

On 06/18/2014 07:53 AM, Bob Doolittle wrote:


I see I have a very unfortunate typo in my previous mail. As supported 
by the vm-status output I attached, I had set --mode=global (not none) 
in step 1.


I am not the only one experiencing this. I can reproduce it easily. It 
appears that shutting down vdsm causes the HA services to incorrectly 
think the system has come out of Global Maintenance and restart the 
engine.


-Bob

On Jun 18, 2014 5:06 AM, Federico Simoncelli fsimo...@redhat.com 
mailto:fsimo...@redhat.com wrote:


- Original Message -
 From: Bob Doolittle b...@doolittle.us.com
mailto:b...@doolittle.us.com
 To: Doron Fediuck dfedi...@redhat.com
mailto:dfedi...@redhat.com, Andrew Lau and...@andrewklau.com
mailto:and...@andrewklau.com
 Cc: users users@ovirt.org mailto:users@ovirt.org,
Federico Simoncelli fsimo...@redhat.com
mailto:fsimo...@redhat.com
 Sent: Saturday, June 14, 2014 1:29:54 AM
 Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?


 But there may be more going on. Even if I stop vdsmd, the HA
services,
 and libvirtd, and sleep 60 seconds, I still see a lock held on the
 Engine VM storage:

 daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
 p -1 helper
 p -1 listener
 p -1 status
 s

003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net

http://xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
 s

hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0

This output shows that the lockspaces are still acquired. When you
put hosted-engine
in maintenance they must be released.
One by directly using rem_lockspace (since it's the hosted-engine
one) and the other
one by stopMonitoringDomain.

I quickly looked at the ovirt-hosted-engine* projects and I
haven't found anything
related to that.

--
Federico



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-13 Thread Sven Kieske
I suppose a hosted-engine solution without HA
would suffice for the use case of just having one system to host and manage
VMs, with the ability to extend this system to many more.

Am 03.06.2014 13:52, schrieb Itamar Heim:
 what would look different for hosted-engine on a single host? just not
 have the ha feature?

-- 
Mit freundlichen Grüßen / Regards

Sven Kieske

Systemadministrator
Mittwald CM Service GmbH  Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-13 Thread Bob Doolittle
Would that help the issue being reported in this thread at all? This thread
was about issues with clean shutdown of a single node hosted environment,
which result in hangs/timeouts and the inability to issue poweroff without
it resulting in a reboot.

There have been no suggestions about how to resolve those issues, which
seem related to sanlock leases not being cleanly released.

-Bob
On Jun 13, 2014 5:14 AM, Sven Kieske s.kie...@mittwald.de wrote:

 I suppose a hosted-engine solution without HA
 would suffice the use case of just having one system to host and manage
 vms, with the ability to extend this system to many more.

 Am 03.06.2014 13:52, schrieb Itamar Heim:
  what would look different for hosted-engine on a single host? just not
  have the ha feature?

 --
 Mit freundlichen Grüßen / Regards

 Sven Kieske

 Systemadministrator
 Mittwald CM Service GmbH  Co. KG
 Königsberger Straße 6
 32339 Espelkamp
 T: +49-5772-293-100
 F: +49-5772-293-333
 https://www.mittwald.de
 Geschäftsführer: Robert Meyer
 St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
 Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen
 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-13 Thread Doron Fediuck
Bob,
the way to handle it is to switch to global maintenance,
and then ssh into the VM and shut it down.

After rebooting you should switch maintenance mode to off.
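
In command form, the flow above would look roughly like this (a sketch;
'engine' stands for whatever hostname your engine VM resolves to):

  # on the host: stop the HA state machine from reacting
  hosted-engine --set-maintenance --mode=global

  # on the engine VM: shut the engine down cleanly
  ssh root@engine 'init 0'

  # ... reboot or service the host ...

  # afterwards: leave maintenance and start the engine VM again
  hosted-engine --set-maintenance --mode=none
  hosted-engine --vm-start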

- Original Message -
 From: Bob Doolittle b...@doolittle.us.com
 To: Sven Kieske s.kie...@mittwald.de
 Cc: users users@ovirt.org, Doron Fediuck dfedi...@redhat.com, Itamar 
 Heim ih...@redhat.com
 Sent: Friday, June 13, 2014 3:16:12 PM
 Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
 Would that help the issue being reported in this thread at all? This thread
 was about issues with clean shutdown of a single node hosted environment,
 which result in hangs/timeouts and the inability to issue poweroff without
 it resulting in a reboot.
 
 There have been no suggestions about how to resolve those issues, which
 seem related to sanlock leases not being cleanly released.
 
 -Bob
 On Jun 13, 2014 5:14 AM, Sven Kieske s.kie...@mittwald.de wrote:
 
  I suppose a hosted-engine solution without HA
  would suffice the use case of just having one system to host and manage
  vms, with the ability to extend this system to many more.
 
  Am 03.06.2014 13:52, schrieb Itamar Heim:
   what would look different for hosted-engine on a single host? just not
   have the ha feature?
 
  --
  Mit freundlichen Grüßen / Regards
 
  Sven Kieske
 
  Systemadministrator
  Mittwald CM Service GmbH  Co. KG
  Königsberger Straße 6
  32339 Espelkamp
  T: +49-5772-293-100
  F: +49-5772-293-333
  https://www.mittwald.de
  Geschäftsführer: Robert Meyer
  St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
  Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
 
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-13 Thread Doron Fediuck


- Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Bob Doolittle b...@doolittle.us.com
 Cc: users users@ovirt.org
 Sent: Friday, June 6, 2014 6:14:18 AM
 Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
 On Fri, Jun 6, 2014 at 1:09 PM, Bob Doolittle b...@doolittle.us.com wrote:
  Thanks Andrew, I'll try this workaround tomorrow for sure. But reading
  though that bug report (closed not a bug) it states that the problem should
  only arise if something is not releasing a sanlock lease. So if we've
  entered Global Maintenance and shut down Engine, the question is what's
  holding the lease?
 
  How can that be debugged?
 
 For me it's wdmd and sanlock itself failing to shutdown properly. I
 also noticed even when in global maintenance and the engine VM powered
 off there is still a sanlock lease for the
 /rhev/mnt/hosted-engine/? lease file or something along those
 lines. So the global maintenance may not actually be releasing that
 lock.
 
 I'm not too familiar with sanlock etc. So it's like stabbing in the dark :(
 

Sounds like a bug since once the VM is off there should not
be a lease taken.

Please check if after a minute you still have a lease taken
according to: http://www.ovirt.org/SANLock#sanlock_timeouts

In this case try to stop vdsm and libvirt just so we'll know
who still keeps the lease.
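
Something like the following should show whether the lease survives once the
services are out of the picture (a sketch, using the service names from
earlier in this thread):

  # give sanlock time to expire the lease per the timeouts page above
  sleep 60
  sanlock client status

  # if a lease is still listed, stop the likely consumers and re-check
  service vdsmd stop
  service libvirtd stop
  sanlock client status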

 
  -Bob
 
  On Jun 5, 2014 10:56 PM, Andrew Lau and...@andrewklau.com wrote:
 
  On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com
  wrote:
  
   On 05/25/2014 02:51 PM, Joop wrote:
  
   On 25-5-2014 19:38, Bob Doolittle wrote:
  
  
   Also curious is that when I say poweroff it actually reboots and
   comes
   up again. Could that be due to the timeouts on the way down?
  
   Ah, that's something my F19 host does too. Some more info: if engine
   hasn't been started on the host then I can shutdown it and it will
   poweroff.
   IF engine has been run on it then it will reboot.
   Its not vdsm (I think) because my shutdown sequence is (on my f19
   host):
service ovirt-agent-ha stop
service ovirt-agent-broker stop
service vdsmd stop
ssh root@engine01 init 0
   init 0
  
   I don't use maintenance mode because when I poweron my host (= my
   desktop)
   I want engine to power on automatically which it does most of the time
   within 10 min.
  
  
   For comparison, I see this issue and I *do* use maintenance mode
   (because
   presumably that's the 'blessed' way to shut things down and I'm scared
   to
   mess this complex system up by straying off the beaten path ;). My
   process
   is:
  
   ssh root@engine init 0
   (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
   hosted-engine --set-maintenance --mode=global
   poweroff
  
   And then on startup:
   hosted-engine --set-maintenance --mode=none
   hosted-engine --vm-start
  
   There are two issues here. I am not sure if they are related or not.
   1. The NFS timeout during shutdown (Joop do you see this also? Or just
   #2?)
   2. The system reboot instead of poweroff (which messes up remote machine
   management)
  
   Thanks,
Bob
  
  
   I think wdmd or sanlock are causing the reboot instead of poweroff
 
  While searching for my issue of wdmd/sanlock not shutting down, I
  found this which may interest you both:
  https://bugzilla.redhat.com/show_bug.cgi?id=888197
 
  Specifically:
  To shut down sanlock without causing a wdmd reboot, you can run the
  following command: sanlock client shutdown -f 1
 
  This will cause sanlock to kill any pid's that are holding leases,
  release those leases, and then exit.
  
 
  
   Joop
  
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-13 Thread Bob Doolittle

Doron,

This is my normal process but it does not resolve the issue.

A few of us who have experienced this have tried a number of things.

I see two hangs/wdmd timeouts during shutdown, so I think there are two 
remaining lease holders.


I find if I stop vdsmd, ovirt-ha-agent, and ovirt-ha-broker services I 
only experience the last hang (almost at the end, as it's shutting down 
filesystems).
Any hang results in a wdmd timeout and consequently a reboot instead of 
poweroff.


If I never bring engine up, things shut down cleanly.

I will try your suggestions in the other mail a bit later this morning.

-Bob

On 06/13/2014 08:28 AM, Doron Fediuck wrote:

Bob,
the way to handle it is to switch to global maintenance,
and then ssh into the VM and shut it down.

After rebooting you should switch maintenance mode to off.

- Original Message -

From: Bob Doolittle b...@doolittle.us.com
To: Sven Kieske s.kie...@mittwald.de
Cc: users users@ovirt.org, Doron Fediuck dfedi...@redhat.com, Itamar Heim 
ih...@redhat.com
Sent: Friday, June 13, 2014 3:16:12 PM
Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

Would that help the issue being reported in this thread at all? This thread
was about issues with clean shutdown of a single node hosted environment,
which result in hangs/timeouts and the inability to issue poweroff without
it resulting in a reboot.

There have been no suggestions about how to resolve those issues, which
seem related to sanlock leases not being cleanly released.

-Bob
On Jun 13, 2014 5:14 AM, Sven Kieske s.kie...@mittwald.de wrote:


I suppose a hosted-engine solution without HA
would suffice the use case of just having one system to host and manage
vms, with the ability to extend this system to many more.

Am 03.06.2014 13:52, schrieb Itamar Heim:

what would look different for hosted-engine on a single host? just not
have the ha feature?

--
Mit freundlichen Grüßen / Regards

Sven Kieske

Systemadministrator
Mittwald CM Service GmbH  Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users



___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-13 Thread Bob Doolittle
It turns out I was wrong before. I don't have to start up Engine to get 
into this situation.


I did the following:

 * Turn on Global Maintenance
 * Engine init 0
 * Reboot node
 * Wait a few minutes
 * poweroff


I'll get the timeouts and hangs during shutdown again, and a reset 
instead of poweroff.


It's possible that somehow the system is coming out of Global 
Maintenance mode during shutdown, and the Engine VM is starting up and 
causing this issue.


I did the following.
1. hosted-engine --set-maintenance --mode=none
You can see the attached output from 'hosted-engine --vm-status' 
(hosted-engine.out) at this point, indicating that the system is in 
Global Maintenance


2. Waited 60 seconds, and checked sanlock
You can see the attached output of 'sanlock client status' 
(sanlock-status.out) at this point, showing the Engine VM locks being held


3. I stopped the vdsmd service (note that the first time I tried I got 
"Job for vdsmd.service cancelled", so I re-issued the stop).
You can see the attached output of 'sanlock client status', and the 
following commands (output)


What's interesting and I didn't notice right away, is that after I 
stopped vdsmd the sanlock status started changing as if the locks were 
being manipulated.
After I stopped vdsmd, the HA services, and libvirtd, and waited 60 
seconds, I noticed the locks seemed to be changing state and that 
HostedEngine was listed. At that point I got suspicious and started 
vdsmd again so that I could recheck Global Maintenance mode, and I found 
that the system was no longer *in* maintenance, and that the Engine VM 
was running.


So I think this partly explains the situation. Somehow the act of 
stopping vdsmd is making the system look like it is *out* of Global 
Maintenance mode, and the Engine VM starts up while the system is 
shutting down. This creates new sanlock leases on the Engine VM storage, 
which prevents the system from shutting down cleanly. Oddly after a 
reboot Global Maintenance is preserved.
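
A quick way to watch for that transition (a sketch; it assumes vdsmd is running
again so the tools can talk to it, and that --vm-status reports the maintenance
state as it did in the attached hosted-engine.out):

  # check the maintenance flag before touching vdsmd
  hosted-engine --vm-status | grep -i maintenance

  # after stopping and restarting vdsmd, re-check the flag and whether
  # the HostedEngine VM has been started behind your back
  hosted-engine --vm-status | grep -i maintenance
  vdsClient -s 0 list | grep Status: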


But there may be more going on. Even if I stop vdsmd, the HA services, 
and libvirtd, and sleep 60 seconds, I still see a lock held on the 
Engine VM storage:


daemon 6f3af037-d05e-4ad8-a53c-61627e0c2464.xion2.smar
p -1 helper
p -1 listener
p -1 status
s 
003510e8-966a-47e6-a5eb-3b5c8a6070a9:1:/rhev/data-center/mnt/xion2.smartcity.net\:_export_VM__NewDataDomain/003510e8-966a-47e6-a5eb-3b5c8a6070a9/dom_md/ids:0
s 
hosted-engine:1:/rhev/data-center/mnt/xion2\:_export_vm_he1/18eeab54-e482-497f-b096-11f8a43f94f4/ha_agent/hosted-engine.lockspace:0


It stays in this state however and HostedEngine doesn't grab a lock again.
In any case no matter what I do, it's impossible to shut the system down 
cleanly.


-Bob

On 06/13/2014 08:33 AM, Doron Fediuck wrote:

- Original Message -

From: Andrew Lauand...@andrewklau.com
To: Bob Doolittleb...@doolittle.us.com
Cc: usersusers@ovirt.org
Sent: Friday, June 6, 2014 6:14:18 AM
Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

On Fri, Jun 6, 2014 at 1:09 PM, Bob Doolittleb...@doolittle.us.com  wrote:

Thanks Andrew, I'll try this workaround tomorrow for sure. But reading
though that bug report (closed not a bug) it states that the problem should
only arise if something is not releasing a sanlock lease. So if we've
entered Global Maintenance and shut down Engine, the question is what's
holding the lease?

How can that be debugged?

For me it's wdmd and sanlock itself failing to shutdown properly. I
also noticed even when in global maintenance and the engine VM powered
off there is still a sanlock lease for the
/rhev/mnt/hosted-engine/? lease file or something along those
lines. So the global maintenance may not actually be releasing that
lock.

I'm not too familiar with sanlock etc. So it's like stabbing in the dark :(


Sounds like a bug since once the VM is off there should not
be a lease taken.

Please check if after a minute you still have a lease taken
according to:http://www.ovirt.org/SANLock#sanlock_timeouts

In this case try to stop vdsm and libvirt just so we'll know
who still keeps the lease.


-Bob

On Jun 5, 2014 10:56 PM, Andrew Lauand...@andrewklau.com  wrote:

On Mon, May 26, 2014 at 5:10 AM, Bob Doolittleb...@doolittle.us.com
wrote:

On 05/25/2014 02:51 PM, Joop wrote:

On 25-5-2014 19:38, Bob Doolittle wrote:

Also curious is that when I say poweroff it actually reboots and
comes
up again. Could that be due to the timeouts on the way down?


Ah, that's something my F19 host does too. Some more info: if engine
hasn't been started on the host then I can shutdown it and it will
poweroff.
IF engine has been run on it then it will reboot.
Its not vdsm (I think) because my shutdown sequence is (on my f19
host):
  service ovirt-agent-ha stop
  service ovirt-agent-broker stop
  service vdsmd stop
  ssh root@engine01 init 0
init 0

I don't use maintenance mode because when I poweron my host (= my
desktop)
I want engine to power on automatically which it does most of the time within 10 min.

Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-05 Thread Andrew Lau
Hi Doron,

On Mon, May 26, 2014 at 4:38 PM, Doron Fediuck dfedi...@redhat.com wrote:


 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Bob Doolittle b...@doolittle.us.com
 Cc: users users@ovirt.org
 Sent: Monday, May 26, 2014 7:30:41 AM
 Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

 On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com wrote:
 
  On 05/25/2014 02:51 PM, Joop wrote:
 
  On 25-5-2014 19:38, Bob Doolittle wrote:
 
 
  Also curious is that when I say poweroff it actually reboots and comes
  up again. Could that be due to the timeouts on the way down?
 
  Ah, that's something my F19 host does too. Some more info: if engine
  hasn't been started on the host then I can shutdown it and it will
  poweroff.
  IF engine has been run on it then it will reboot.
  Its not vdsm (I think) because my shutdown sequence is (on my f19 host):
   service ovirt-agent-ha stop
   service ovirt-agent-broker stop
   service vdsmd stop
   ssh root@engine01 init 0
  init 0
 
  I don't use maintenance mode because when I poweron my host (= my desktop)
  I want engine to power on automatically which it does most of the time
  within 10 min.
 
 
  For comparison, I see this issue and I *do* use maintenance mode (because
  presumably that's the 'blessed' way to shut things down and I'm scared to
  mess this complex system up by straying off the beaten path ;). My process
  is:
 
  ssh root@engine init 0
  (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
  hosted-engine --set-maintenance --mode=global
  poweroff
 
  And then on startup:
  hosted-engine --set-maintenance --mode=none
  hosted-engine --vm-start
 
  There are two issues here. I am not sure if they are related or not.
  1. The NFS timeout during shutdown (Joop do you see this also? Or just #2?)
  2. The system reboot instead of poweroff (which messes up remote machine
  management)
 

 For 1, I was wondering if perhaps we could have an option to specify
 the mount options. If I understand correctly, applying a soft mount
 instead of a hard mount would prevent this from happening. I'm however
 not sure of the implications this would have on data integrity.
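
 An illustration of the difference, with made-up paths (oVirt/vdsm creates
 these mounts itself, so this only shows what the option difference looks
 like, not how to change it; a soft mount can return I/O errors to the VM,
 which is exactly the data integrity concern above):

   # hard mount (the default): processes block until the NFS server answers
   mount -t nfs xion2:/export/vm/he1 /mnt/he1 -o hard

   # soft mount: I/O fails after retrans*timeo instead of hanging shutdown,
   # at the cost of possible write errors underneath the VM
   mount -t nfs xion2:/export/vm/he1 /mnt/he1 -o soft,timeo=600,retrans=6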

 I would really like to see it happen in the ha-agent; as it's the one
 which connects/mounts the storage, it should also unmount it on shutdown.
 However the stability of it is flaky at best. I've noticed that if `df`
 hangs because of another NFS mount having timed out, the agent will
 die. That's not a good sign.. this was what actually caused my
 hosted-engine to run twice in one case.

  Thanks,
   Bob
 
 
  I think wdmd or sanlock are causing the reboot instead of poweroff
 
  Joop
 

 Great to have your feedback guys!

 So just to clarify some of the issues you mentioned;

 Hosted engine wasn't designed for a 'single node' use case, as we do
 want it to be highly available. This is why it's being restarted
 elsewhere or even on the same server if no better alternative.

 Having said that, it is possible to set global maintenance mode
 as a first step (in the UI: right click engine vm and choose ha-maintenance).
 Then you can ssh into the engine vm and init 0.

 After a short while, the qemu process should gracefully end and release
 its sanlock lease as well as any other resource, which means you can
 reboot your hypervisor peacefully.

Sadly no, I've only been able to reboot my hypervisors if one of the
two conditions is met:

- Lazy unmount of /rhev/mnt/hosted-engine etc.
- killall -9 sanlock wdmd

I notice sanlock and wdmd are not able to be stopped with 'service wdmd
stop; service sanlock stop'.
These seem to fail during the shutdown/reboot process, which prevents
the unmount and the graceful reboot.

Are there any logs I can look at to debug those failed shutdowns?


 Doron
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-05 Thread Andrew Lau
On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com wrote:

 On 05/25/2014 02:51 PM, Joop wrote:

 On 25-5-2014 19:38, Bob Doolittle wrote:


 Also curious is that when I say poweroff it actually reboots and comes
 up again. Could that be due to the timeouts on the way down?

 Ah, that's something my F19 host does too. Some more info: if engine
 hasn't been started on the host then I can shutdown it and it will poweroff.
 IF engine has been run on it then it will reboot.
 Its not vdsm (I think) because my shutdown sequence is (on my f19 host):
  service ovirt-agent-ha stop
  service ovirt-agent-broker stop
  service vdsmd stop
  ssh root@engine01 init 0
 init 0

 I don't use maintenance mode because when I poweron my host (= my desktop)
 I want engine to power on automatically which it does most of the time
 within 10 min.


 For comparison, I see this issue and I *do* use maintenance mode (because
 presumably that's the 'blessed' way to shut things down and I'm scared to
 mess this complex system up by straying off the beaten path ;). My process
 is:

 ssh root@engine init 0
 (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
 hosted-engine --set-maintenance --mode=global
 poweroff

 And then on startup:
 hosted-engine --set-maintenance --mode=none
 hosted-engine --vm-start

 There are two issues here. I am not sure if they are related or not.
 1. The NFS timeout during shutdown (Joop do you see this also? Or just #2?)
 2. The system reboot instead of poweroff (which messes up remote machine
 management)

 Thanks,
  Bob


 I think wdmd or sanlock are causing the reboot instead of poweroff

While searching for my issue of wdmd/sanlock not shutting down, I
found this which may interest you both:
https://bugzilla.redhat.com/show_bug.cgi?id=888197

Specifically:
To shut down sanlock without causing a wdmd reboot, you can run the
following command: sanlock client shutdown -f 1

This will cause sanlock to kill any pid's that are holding leases,
release those leases, and then exit.
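
Roughly, for anyone who wants to try it at poweroff time (untested here; the
-f 1 force flag comes straight from that bugzilla comment):

  # make sanlock kill any lease holders, drop the leases and exit,
  # so wdmd has nothing left to fence
  sanlock client shutdown -f 1

  # wdmd should then stop cleanly and the host can be powered off
  service wdmd stop
  poweroff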



 Joop

 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-05 Thread Bob Doolittle
Thanks Andrew, I'll try this workaround tomorrow for sure. But reading
though that bug report (closed not a bug) it states that the problem should
only arise if something is not releasing a sanlock lease. So if we've
entered Global Maintenance and shut down Engine, the question is what's
holding the lease?

How can that be debugged?

-Bob
On Jun 5, 2014 10:56 PM, Andrew Lau and...@andrewklau.com wrote:

 On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com
 wrote:
 
  On 05/25/2014 02:51 PM, Joop wrote:
 
  On 25-5-2014 19:38, Bob Doolittle wrote:
 
 
  Also curious is that when I say poweroff it actually reboots and
 comes
  up again. Could that be due to the timeouts on the way down?
 
  Ah, that's something my F19 host does too. Some more info: if engine
  hasn't been started on the host then I can shutdown it and it will
 poweroff.
  IF engine has been run on it then it will reboot.
  Its not vdsm (I think) because my shutdown sequence is (on my f19 host):
   service ovirt-agent-ha stop
   service ovirt-agent-broker stop
   service vdsmd stop
   ssh root@engine01 init 0
  init 0
 
  I don't use maintenance mode because when I poweron my host (= my
 desktop)
  I want engine to power on automatically which it does most of the time
  within 10 min.
 
 
  For comparison, I see this issue and I *do* use maintenance mode (because
  presumably that's the 'blessed' way to shut things down and I'm scared to
  mess this complex system up by straying off the beaten path ;). My
 process
  is:
 
  ssh root@engine init 0
  (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
  hosted-engine --set-maintenance --mode=global
  poweroff
 
  And then on startup:
  hosted-engine --set-maintenance --mode=none
  hosted-engine --vm-start
 
  There are two issues here. I am not sure if they are related or not.
  1. The NFS timeout during shutdown (Joop do you see this also? Or just
 #2?)
  2. The system reboot instead of poweroff (which messes up remote machine
  management)
 
  Thanks,
   Bob
 
 
  I think wdmd or sanlock are causing the reboot instead of poweroff

 While searching for my issue of wdmd/sanlock not shutting down, I
 found this which may interest you both:
 https://bugzilla.redhat.com/show_bug.cgi?id=888197

 Specifically:
 To shut down sanlock without causing a wdmd reboot, you can run the
 following command: sanlock client shutdown -f 1

 This will cause sanlock to kill any pid's that are holding leases,
 release those leases, and then exit.
 

 
  Joop
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-05 Thread Andrew Lau
On Fri, Jun 6, 2014 at 1:09 PM, Bob Doolittle b...@doolittle.us.com wrote:
 Thanks Andrew, I'll try this workaround tomorrow for sure. But reading
 though that bug report (closed not a bug) it states that the problem should
 only arise if something is not releasing a sanlock lease. So if we've
 entered Global Maintenance and shut down Engine, the question is what's
 holding the lease?

 How can that be debugged?

For me it's wdmd and sanlock itself failing to shut down properly. I
also noticed even when in global maintenance and the engine VM powered
off there is still a sanlock lease for the
/rhev/mnt/hosted-engine/? lease file or something along those
lines. So the global maintenance may not actually be releasing that
lock.

I'm not too familiar with sanlock etc. So it's like stabbing in the dark :(


 -Bob

 On Jun 5, 2014 10:56 PM, Andrew Lau and...@andrewklau.com wrote:

 On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com
 wrote:
 
  On 05/25/2014 02:51 PM, Joop wrote:
 
  On 25-5-2014 19:38, Bob Doolittle wrote:
 
 
  Also curious is that when I say poweroff it actually reboots and
  comes
  up again. Could that be due to the timeouts on the way down?
 
  Ah, that's something my F19 host does too. Some more info: if engine
  hasn't been started on the host then I can shutdown it and it will
  poweroff.
  IF engine has been run on it then it will reboot.
  Its not vdsm (I think) because my shutdown sequence is (on my f19
  host):
   service ovirt-agent-ha stop
   service ovirt-agent-broker stop
   service vdsmd stop
   ssh root@engine01 init 0
  init 0
 
  I don't use maintenance mode because when I poweron my host (= my
  desktop)
  I want engine to power on automatically which it does most of the time
  within 10 min.
 
 
  For comparison, I see this issue and I *do* use maintenance mode
  (because
  presumably that's the 'blessed' way to shut things down and I'm scared
  to
  mess this complex system up by straying off the beaten path ;). My
  process
  is:
 
  ssh root@engine init 0
  (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
  hosted-engine --set-maintenance --mode=global
  poweroff
 
  And then on startup:
  hosted-engine --set-maintenance --mode=none
  hosted-engine --vm-start
 
  There are two issues here. I am not sure if they are related or not.
  1. The NFS timeout during shutdown (Joop do you see this also? Or just
  #2?)
  2. The system reboot instead of poweroff (which messes up remote machine
  management)
 
  Thanks,
   Bob
 
 
  I think wdmd or sanlock are causing the reboot instead of poweroff

 While searching for my issue of wdmd/sanlock not shutting down, I
 found this which may interest you both:
 https://bugzilla.redhat.com/show_bug.cgi?id=888197

 Specifically:
 To shut down sanlock without causing a wdmd reboot, you can run the
 following command: sanlock client shutdown -f 1

 This will cause sanlock to kill any pid's that are holding leases,
 release those leases, and then exit.
 

 
  Joop
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
 
 
  ___
  Users mailing list
  Users@ovirt.org
  http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-06-03 Thread Itamar Heim

On 05/27/2014 03:47 PM, Doron Fediuck wrote:



- Original Message -

From: Sven Kieske s.kie...@mittwald.de
To: Doron Fediuck dfedi...@redhat.com
Cc: users@ovirt.org
Sent: Tuesday, May 27, 2014 12:44:23 PM
Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

So now I have to ask, what is the purpose of ovirt-all-in-one?

Just for tech demos at events?

I can live with any solution, but I feel it will limit oVirt's adoption
and the use cases should be better documented.

There are still many beginners asking very basic questions on this ML
and I think this shows the lack of a better documentation for the first
steps to take for a working ovirt environment.

Or what do you think?

Am 27.05.2014 11:02, schrieb Doron Fediuck:

I do not think a single-host use case is technically impossible, but we need
to understand that a VM may die, and in a single-host setup this may have
additional implications. So why get into trouble instead of planning your way
or using the right tool for the task? Just imagine using a 5.4 kg hammer to
hang a picture frame on the wall.


--
Mit freundlichen Grüßen / Regards

Sven Kieske

Systemadministrator
Mittwald CM Service GmbH  Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen


Hi Sven,
All-in-one is a classic-style installation (i.e. not running in a VM), and indeed
was created for demos. However, it may grow to support additional hosts
in a different DC, as the one it starts with uses local storage. This is why
I was saying that the issue is not technical but requires some understanding
from the admin of where he wants to go.

As for documentation, that's always an issue. Keeping docs updated is an
ongoing task and we'd appreciate any assistance we get with it.


Sven - so assuming we want to reduce the number of configurations we 
want to support, instead of all-in-one which was our previous solution 
for POCs/Demos, what would look different for hosted-engine on a single 
host? just not have the ha feature?


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-31 Thread Andrew Lau
On Sat, May 31, 2014 at 6:06 AM, Joop jvdw...@xs4all.nl wrote:
 Bob Doolittle wrote:

 Joop,

 On 05/26/2014 02:43 AM, Joop wrote:

 Yesterday evening I have found the service responsible for the reboot
 instead of the powerdown. If I do: service wdmd stop the server will reboot.
 It seems the watchdog is hung up and eventually this will lead to a crash
 and thus a reboot instead of the shutdown.

 Anyone knows how to debug this?


 Did you get anywhere with this?
 Pretty nasty. Is there a bug open?

 We're getting a timeout on an NFS mount during the powerdown (single-node
 hosted, after global maintenance enabled and engine powered off), and that
 makes the machine reboot and try to come back up again instead of powering
 off.

 So two issues:
 - What is the mount that is hanging (probably an oVirt issue)?

 Don't know what that problem is. I have a local nfs mount but don't
 experience that problem

I got to the console when mine went for a reboot. I see sanlock and
wdmd failing to shut down properly, which would explain why it doesn't
unmount properly.



 - Why does the system reboot instead of powering down as instructed (?)?


 the reboot is caused by wdmd. The docs say that if the watchdogs aren't
 responding, a reset will follow. So our init 0 is overruled because of
 the hanging watchdog. Why it is hanging I don't know. Could be my chipset,
 could be the version of wdmd-kernel. The only thing I know is that it didn't
 always happen in the past, but that is as much as I remember, sorry.


 Joop


 ___
 Users mailing list
 Users@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-30 Thread Bob Doolittle

Joop,

On 05/26/2014 02:43 AM, Joop wrote:
Yesterday evening I have found the service responsible for the reboot 
instead of the powerdown. If I do: service wdmd stop the server will 
reboot. It seems the watchdog is hung up and eventually this will lead 
to a crash and thus a reboot instead of the shutdown.


Anyone knows how to debug this?


Did you get anywhere with this?
Pretty nasty. Is there a bug open?

We're getting a timeout on an NFS mount during the powerdown 
(single-node hosted, after global maintenance enabled and engine powered 
off), and that makes the machine reboot and try to come back up again 
instead of powering off.


So two issues:
- What is the mount that is hanging (probably an oVirt issue)?
- Why does the system reboot instead of powering down as instructed (?)?

-Bob



Joop

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-30 Thread Joop

Bob Doolittle wrote:

Joop,

On 05/26/2014 02:43 AM, Joop wrote:
Yesterday evening I have found the service responsible for the reboot 
instead of the powerdown. If I do: service wdmd stop the server will 
reboot. It seems the watchdog is hung up and eventually this will 
lead to a crash and thus a reboot instead of the shutdown.


Anyone knows how to debug this?


Did you get anywhere with this?
Pretty nasty. Is there a bug open?

We're getting a timeout on an NFS mount during the powerdown 
(single-node hosted, after global maintenance enabled and engine 
powered off), and that makes the machine reboot and try to come back 
up again instead of powering off.


So two issues:
- What is the mount that is hanging (probably an oVirt issue)?
Don't know what that problem is. I have a local nfs mount but don't 
experience that problem



- Why does the system reboot instead of powering down as instructed (?)?



the reboot is caused by wdmd. The docs say that if the watchdogs aren't 
responding, a reset will follow. So our init 0 is overruled because 
of the hanging watchdog. Why it is hanging I don't know. Could be my 
chipset, could be the version of wdmd-kernel. The only thing I know is that it 
didn't always happen in the past, but that is as much as I remember, sorry.


Joop


___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-28 Thread Andrew Lau
Hi Doron,

Before the initial thread sways a little more..

On Mon, May 26, 2014 at 4:38 PM, Doron Fediuck dfedi...@redhat.com wrote:


 - Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Bob Doolittle b...@doolittle.us.com
 Cc: users users@ovirt.org
 Sent: Monday, May 26, 2014 7:30:41 AM
 Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?

 On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com wrote:
 
  On 05/25/2014 02:51 PM, Joop wrote:
 
  On 25-5-2014 19:38, Bob Doolittle wrote:
 
 
  Also curious is that when I say poweroff it actually reboots and comes
  up again. Could that be due to the timeouts on the way down?
 
  Ah, that's something my F19 host does too. Some more info: if engine
  hasn't been started on the host then I can shutdown it and it will
  poweroff.
  IF engine has been run on it then it will reboot.
  Its not vdsm (I think) because my shutdown sequence is (on my f19 host):
   service ovirt-agent-ha stop
   service ovirt-agent-broker stop
   service vdsmd stop
   ssh root@engine01 init 0
  init 0
 
  I don't use maintenance mode because when I poweron my host (= my desktop)
  I want engine to power on automatically which it does most of the time
  within 10 min.
 
 
  For comparison, I see this issue and I *do* use maintenance mode (because
  presumably that's the 'blessed' way to shut things down and I'm scared to
  mess this complex system up by straying off the beaten path ;). My process
  is:
 
  ssh root@engine init 0
  (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
  hosted-engine --set-maintenance --mode=global
  poweroff
 
  And then on startup:
  hosted-engine --set-maintenance --mode=none
  hosted-engine --vm-start
 
  There are two issues here. I am not sure if they are related or not.
  1. The NFS timeout during shutdown (Joop do you see this also? Or just #2?)
  2. The system reboot instead of poweroff (which messes up remote machine
  management)
 

 For 1. I was wondering if perhaps, we could have an option to specify
 the mount options. If I understand correctly, applying a soft mount
 instead of a hard mount would prevent this from happening. I'm however
 not sure of the implications this would have on the data integrity..

 I would really like to see it happen in the ha-agent, as it's the one
 which connects/mounts the storage it should also unmount it on boot.
 However the stability on it, is flaky at best. I've noticed if `df`
 hangs because of another NFS mount having timed-out the agent will
 die. That's not a good sign.. this was what actually caused my
 hosted-engine to run twice in one case.

  Thanks,
   Bob
 
 
  I think wdmd or sanlock are causing the reboot instead of poweroff
 
  Joop
 

 Great to have your feedback guys!

 So just to clarify some of the issues you mentioned;

 Hosted engine wasn't designed for a 'single node' use case, as we do
 want it to be highly available. This is why it's being restarted
 elsewhere or even on the same server if no better alternative.

 Having said that, it is possible to set global maintenance mode
 as a first step (in the UI: right click engine vm and choose ha-maintenance).
 Then you can ssh into the engine vm and init 0.

 After a short while, the qemu process should gracefully end and release
 its sanlock lease as well as any other resource, which means you can
 reboot your hypervisor peacefully.


What about in a 2-host cluster? Let's say we want to take down 1 host
for maintenance, so there's a 50% chance it could be running the engine. Would
setting maintenance-mode local do the same thing and allow a clean
shutdown/reboot?

 Doron
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-28 Thread Doron Fediuck


- Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Doron Fediuck dfedi...@redhat.com
 Cc: Bob Doolittle b...@doolittle.us.com, users users@ovirt.org, Jiri 
 Moskovcak jmosk...@redhat.com,
 Sandro Bonazzola sbona...@redhat.com
 Sent: Wednesday, May 28, 2014 11:03:38 AM
 Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
 Hi Doron,
 
 Before the initial thread sways a little more..
 
 On Mon, May 26, 2014 at 4:38 PM, Doron Fediuck dfedi...@redhat.com wrote:
 
 
  - Original Message -
  From: Andrew Lau and...@andrewklau.com
  To: Bob Doolittle b...@doolittle.us.com
  Cc: users users@ovirt.org
  Sent: Monday, May 26, 2014 7:30:41 AM
  Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
  On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com
  wrote:
  
   On 05/25/2014 02:51 PM, Joop wrote:
  
   On 25-5-2014 19:38, Bob Doolittle wrote:
  
  
   Also curious is that when I say poweroff it actually reboots and
   comes
   up again. Could that be due to the timeouts on the way down?
  
   Ah, that's something my F19 host does too. Some more info: if engine
   hasn't been started on the host then I can shutdown it and it will
   poweroff.
   IF engine has been run on it then it will reboot.
   Its not vdsm (I think) because my shutdown sequence is (on my f19
   host):
service ovirt-agent-ha stop
service ovirt-agent-broker stop
service vdsmd stop
ssh root@engine01 init 0
   init 0
  
   I don't use maintenance mode because when I poweron my host (= my
   desktop)
   I want engine to power on automatically which it does most of the time
   within 10 min.
  
  
   For comparison, I see this issue and I *do* use maintenance mode
   (because
   presumably that's the 'blessed' way to shut things down and I'm scared
   to
   mess this complex system up by straying off the beaten path ;). My
   process
   is:
  
   ssh root@engine init 0
   (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
   hosted-engine --set-maintenance --mode=global
   poweroff
  
   And then on startup:
   hosted-engine --set-maintenance --mode=none
   hosted-engine --vm-start
  
   There are two issues here. I am not sure if they are related or not.
   1. The NFS timeout during shutdown (Joop do you see this also? Or just
   #2?)
   2. The system reboot instead of poweroff (which messes up remote machine
   management)
  
 
  For 1. I was wondering if perhaps, we could have an option to specify
  the mount options. If I understand correctly, applying a soft mount
  instead of a hard mount would prevent this from happening. I'm however
  not sure of the implications this would have on the data integrity..
 
  I would really like to see it happen in the ha-agent, as it's the one
  which connects/mounts the storage it should also unmount it on boot.
  However the stability on it, is flaky at best. I've noticed if `df`
  hangs because of another NFS mount having timed-out the agent will
  die. That's not a good sign.. this was what actually caused my
  hosted-engine to run twice in one case.
 
   Thanks,
Bob
  
  
   I think wdmd or sanlock are causing the reboot instead of poweroff
  
   Joop
  
 
  Great to have your feedback guys!
 
  So just to clarify some of the issues you mentioned;
 
  Hosted engine wasn't designed for a 'single node' use case, as we do
  want it to be highly available. This is why it's being restarted
  elsewhere or even on the same server if no better alternative.
 
  Having said that, it is possible to set global maintenance mode
  as a first step (in the UI: right click engine vm and choose
  ha-maintenance).
  Then you can ssh into the engine vm and init 0.
 
  After a short while, the qemu process should gracefully end and release
  its sanlock lease as well as any other resource, which means you can
  reboot your hypervisor peacefully.
 
 
 What about a two-host cluster? Let's say we want to take down one host
 for maintenance, so there is a 50% chance it is running the engine. Would
 setting the local maintenance mode do the same thing and allow a clean
 shutdown/reboot?
 

Yes. That's the idea behind local (aka host) maintenance[1].
Starting with 3.4, all you need to do is move the host to maintenance in the
UI and this will also set the local maintenance mode for this host. So you
should be able to do everything with it, and use 'Activate' in the UI to get
it back into production.

[1] http://www.ovirt.org/Features/Self_Hosted_Engine#Maintenance_Flows
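
For reference, a rough command-line sketch of that flow (hedged: the exact
sequence depends on your version, the UI 'Maintenance'/'Activate' actions in
3.4+ are the supported path, and --mode=local is assumed to be available in
your hosted-engine CLI):

   # on the host you want to take down
   hosted-engine --vm-status                      # see which host currently runs the engine VM
   hosted-engine --set-maintenance --mode=local   # tell the HA agent not to (re)start the engine here
   # put the host into maintenance in the UI, wait for VMs to migrate, then reboot

   # once the host is back up
   hosted-engine --set-maintenance --mode=none    # re-enable HA on this host
   # then 'Activate' the host in the UI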


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-27 Thread Doron Fediuck


- Original Message -
 From: Bob Doolittle b...@doolittle.us.com
 To: Doron Fediuck dfedi...@redhat.com, Andrew Lau 
 and...@andrewklau.com
 Cc: users users@ovirt.org, Jiri Moskovcak jmosk...@redhat.com, 
 Sandro Bonazzola sbona...@redhat.com
 Sent: Monday, May 26, 2014 6:12:04 PM
 Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
 
 On 05/26/2014 02:38 AM, Doron Fediuck wrote:
  Great to have your feedback guys!
 
  So just to clarify some of the issues you mentioned;
 
  Hosted engine wasn't designed for a 'single node' use case, as we do
  want it to be highly available. This is why it's being restarted
  elsewhere or even on the same server if no better alternative.
 
 I hope you will keep this configuration in your design space as you move
 forward.
 As you can see it is likely to be a popular one. At a previous employer,
 everybody in our development group had a single (powerful) machine
 running VMware vSphere, with a hosted vCenter, to use as a development
 VM testbed.
 
 It's likely to be popular for people with few resources who want to run
 in a fully supported configuration.
 
 HA is great, but not every environment has the resources/requirement to
 support it. oVirt has a lot to offer even a leaner environment and I
 hope those environments continue to get attention as the product
 matures. For example, I hope there is a reasonable story around upgrades
 to future releases for single-node hosted configurations.
 
  Having said that, it is possible to set global maintenance mode
  as a first step (in the UI: right click engine vm and choose
  ha-maintenance).
  Then you can ssh into the engine vm and init 0.
 
 What is the recipe for a clean startup after shutdown?
 Can we do 'hosted-engine --vm-start' while the system is in Global
 Maintenance mode?
 
 Thanks,
  Bob
 
 

Hi Bob,
If all you need is a way to run VMs, then Kimchi[1] from the oVirt
ecosystem can be a better solution, as it's designed for a single-node
use case. If you wish to run oVirt in a VM, you can run a VM with Kimchi
and install the engine inside it. Also, coming in 3.5 is the oVirt
appliance[2], which will simplify this.

If I'm missing something please share your thoughts.

As for a clean start after maintenance, I'd expect
hosted-engine --set-maintenance --mode=none
to allow the agent to start the VM automatically. If this is not the
case, please let us know.
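
A minimal sketch of that startup recipe, using the syntax quoted elsewhere
in this thread (if the agent does not start the VM by itself after a few
minutes, --vm-start can be used as in Bob's process):

   hosted-engine --set-maintenance --mode=none   # leave global maintenance; the agent may start the VM on its own
   hosted-engine --vm-status                     # watch for the engine VM to come up on one of the hosts
   hosted-engine --vm-start                      # optional: start it immediately instead of waiting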

Thanks,
Doron

[1] https://github.com/kimchi-project/kimchi/wiki
[2] http://www.ovirt.org/Feature/oVirtAppliance


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-27 Thread Sven Kieske
Well I don't think Kimchi is a solution
to the problem Bob mentioned.

What you want (and what you get with VMware, btw)
is _one_ solution that starts on a single node (laptop or server, whatever)
and is able to scale to manage 100 hosts or a complete data
center.

I doubt Kimchi can do this.
oVirt could be tweaked to do this.

Other thoughts?

Am 27.05.2014 09:37, schrieb Doron Fediuck:
 If I'm missing something please share your thoughts.

-- 
Mit freundlichen Grüßen / Regards

Sven Kieske

Systemadministrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-27 Thread Doron Fediuck


- Original Message -
 From: Sven Kieske s.kie...@mittwald.de
 To: users@ovirt.org
 Sent: Tuesday, May 27, 2014 10:44:39 AM
 Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
 Well I don't think Kimchi is a solution
 to the problem Bob mentioned.
 
 What you want (and what you get at Vmware btw)
 is _one_ solution to start on a single node (laptop or server, whatever)
 and which is able to scale to manage 100 Hosts or a complete data
 center.
 
 I doubt kimchi can do this.
 oVirt could be tweaked to do this.
 
 Other thoughts?
 
 Am 27.05.2014 09:37, schrieb Doron Fediuck:
  If I'm missing something please share your thoughts.
 
 --
 Mit freundlichen Grüßen / Regards
 
 Sven Kieske
 
 Systemadministrator
 Mittwald CM Service GmbH & Co. KG
 Königsberger Straße 6
 32339 Espelkamp
 T: +49-5772-293-100
 F: +49-5772-293-333
 https://www.mittwald.de
 Geschäftsführer: Robert Meyer
 St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
 Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

Hi Sven,
Referring to Bob's mail, he mentioned people with little resources, and also
a development use case: running VMware vSphere, with a hosted vCenter, to use
as a development VM testbed.
These are valid cases for Kimchi, where all you want to do is simply run a VM.

If you want to be able to scale up to 100 hosts or more, you can definitely
start with 2 hosts, which is the hosted-engine use case. The reason I mention
it is that additional hosts will require proper network and storage
configuration, which basically means you want to plan ahead even when you
begin with a modest deployment of 2 hosts.

I do not think a single-host use case is technically impossible, but we need
to understand that a VM may die, and in a single-host setup this may have
additional implications. So why get into trouble instead of planning your way
or using the right tool for the task? Just imagine using a 5.4 kg hammer to
hang a picture frame on the wall.

Doron


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-27 Thread Sven Kieske
So now I have to ask, what is the purpose of ovirt-all-in-one?

Just for tech demos at events?

I can live with any solution, but I feel it will limit oVirt's adoption,
and the use cases should be better documented.

There are still many beginners asking very basic questions on this ML,
and I think this shows the lack of better documentation for the first
steps to take towards a working oVirt environment.

Or what do you think?

Am 27.05.2014 11:02, schrieb Doron Fediuck:
 I do not think a single-host use case is technically impossible, but we need
 to understand a VM may die, and in a single host setup this may have 
 additional
 implications. So why get into troubles instead of planning your way or using
 the right tool for the task. Just imagine using a 5.4 Kg hammer to hang a 
 picture
 frame on the wall.

-- 
Mit freundlichen Grüßen / Regards

Sven Kieske

Systemadministrator
Mittwald CM Service GmbH & Co. KG
Königsberger Straße 6
32339 Espelkamp
T: +49-5772-293-100
F: +49-5772-293-333
https://www.mittwald.de
Geschäftsführer: Robert Meyer
St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-27 Thread Doron Fediuck


- Original Message -
 From: Sven Kieske s.kie...@mittwald.de
 To: Doron Fediuck dfedi...@redhat.com
 Cc: users@ovirt.org
 Sent: Tuesday, May 27, 2014 12:44:23 PM
 Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
 So now I have to ask, what is the purpose of ovirt-all-in-one?
 
 Just for tech demos at events?
 
 I can live with any solution, but I feel it will limit ovirts adoption
 and the use-cases should get better documented.
 
 There are still many beginners asking very basic questions on this ML
 and I think this shows the lack of a better documentation for the first
 steps to take for a working ovirt environment.
 
 Or what do you think?
 
 Am 27.05.2014 11:02, schrieb Doron Fediuck:
  I do not think a single-host use case is technically impossible, but we
  need
  to understand a VM may die, and in a single host setup this may have
  additional
  implications. So why get into troubles instead of planning your way or
  using
  the right tool for the task. Just imagine using a 5.4 Kg hammer to hang a
  picture
  frame on the wall.
 
 --
 Mit freundlichen Grüßen / Regards
 
 Sven Kieske
 
 Systemadministrator
 Mittwald CM Service GmbH & Co. KG
 Königsberger Straße 6
 32339 Espelkamp
 T: +49-5772-293-100
 F: +49-5772-293-333
 https://www.mittwald.de
 Geschäftsführer: Robert Meyer
 St.Nr.: 331/5721/1033, USt-IdNr.: DE814773217, HRA 6640, AG Bad Oeynhausen
 Komplementärin: Robert Meyer Verwaltungs GmbH, HRB 13260, AG Bad Oeynhausen

Hi Sven,
All-in-one is a classic-style installation (i.e., not running in a VM), and it
was created for demos. However, it may grow to support additional hosts in a
different DC, as the one it starts with uses local storage. This is why I was
saying that the issue is not technical, but requires the admin to understand
where he wants to go.

As for documentation, that's always an issue. Keeping docs updated is an
ongoing task and we'd appreciate any assistance we can get with it.

Doron


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-26 Thread Doron Fediuck


- Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: Bob Doolittle b...@doolittle.us.com
 Cc: users users@ovirt.org
 Sent: Monday, May 26, 2014 7:30:41 AM
 Subject: Re: [ovirt-users] Can HA Agent control NFS Mount?
 
 On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com wrote:
 
  On 05/25/2014 02:51 PM, Joop wrote:
 
  On 25-5-2014 19:38, Bob Doolittle wrote:
 
 
  Also curious is that when I say poweroff it actually reboots and comes
  up again. Could that be due to the timeouts on the way down?
 
  Ah, that's something my F19 host does too. Some more info: if engine
  hasn't been started on the host then I can shutdown it and it will
  poweroff.
  IF engine has been run on it then it will reboot.
  Its not vdsm (I think) because my shutdown sequence is (on my f19 host):
   service ovirt-agent-ha stop
   service ovirt-agent-broker stop
   service vdsmd stop
   ssh root@engine01 init 0
  init 0
 
  I don't use maintenance mode because when I poweron my host (= my desktop)
  I want engine to power on automatically which it does most of the time
  within 10 min.
 
 
  For comparison, I see this issue and I *do* use maintenance mode (because
  presumably that's the 'blessed' way to shut things down and I'm scared to
  mess this complex system up by straying off the beaten path ;). My process
  is:
 
  ssh root@engine init 0
  (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
  hosted-engine --set-maintenance --mode=global
  poweroff
 
  And then on startup:
  hosted-engine --set-maintenance --mode=none
  hosted-engine --vm-start
 
  There are two issues here. I am not sure if they are related or not.
  1. The NFS timeout during shutdown (Joop do you see this also? Or just #2?)
  2. The system reboot instead of poweroff (which messes up remote machine
  management)
 
 
 For 1. I was wondering if perhaps, we could have an option to specify
 the mount options. If I understand correctly, applying a soft mount
 instead of a hard mount would prevent this from happening. I'm however
 not sure of the implications this would have on the data integrity..
 
 I would really like to see it happen in the ha-agent, as it's the one
 which connects/mounts the storage it should also unmount it on boot.
 However the stability on it, is flaky at best. I've noticed if `df`
 hangs because of another NFS mount having timed-out the agent will
 die. That's not a good sign.. this was what actually caused my
 hosted-engine to run twice in one case.
 
  Thanks,
   Bob
 
 
  I think wdmd or sanlock are causing the reboot instead of poweroff
 
  Joop
 

Great to have your feedback guys!

So just to clarify some of the issues you mentioned;

Hosted engine wasn't designed for a 'single node' use case, as we do
want it to be highly available. This is why it's being restarted
elsewhere, or even on the same server if there is no better alternative.

Having said that, it is possible to set global maintenance mode
as a first step (in the UI: right click engine vm and choose ha-maintenance).
Then you can ssh into the engine vm and init 0.

After a short while, the qemu process should gracefully end and release
its sanlock lease as well as any other resource, which means you can
reboot your hypervisor peacefully.
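
A rough way to verify that before powering off, using commands already
mentioned in this thread (a sketch; output details vary by version):

   vdsClient -s 0 list | grep Status:   # wait until the engine VM is reported as Down
   pgrep -lf qemu                       # confirm the qemu process has actually exited
   sanlock client status                # confirm the hosted-engine lockspace is no longer held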

Doron


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-26 Thread Joop
Yesterday evening I found the service responsible for the reboot instead of
the power-down. If I do 'service wdmd stop', the server will reboot. It seems
the watchdog is hung up, and eventually this leads to a crash and thus a
reboot instead of a shutdown.

Does anyone know how to debug this?
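
A guess at the mechanism (hedged): if sanlock still holds a lease when wdmd
is stopped, wdmd cannot close /dev/watchdog cleanly, so the watchdog timer
expires and resets the machine. A few generic things to check:

   sanlock client status            # any lockspaces still registered before stopping wdmd?
   lsof /dev/watchdog               # which process is holding the watchdog device open?
   journalctl -u wdmd -u sanlock    # look for lease/watchdog timeout messages around the reboot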

Joop



Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-26 Thread Bob Doolittle


On 05/26/2014 02:38 AM, Doron Fediuck wrote:

Great to have your feedback guys!

So just to clarify some of the issues you mentioned;

Hosted engine wasn't designed for a 'single node' use case, as we do
want it to be highly available. This is why it's being restarted
elsewhere or even on the same server if no better alternative.


I hope you will keep this configuration in your design space as you move 
forward.
As you can see it is likely to be a popular one. At a previous employer, 
everybody in our development group had a single (powerful) machine 
running VMware vSphere, with a hosted vCenter, to use as a development 
VM testbed.


It's likely to be popular for people with few resources who want to run 
in a fully supported configuration.


HA is great, but not every environment has the resources/requirement to 
support it. oVirt has a lot to offer even a leaner environment and I 
hope those environments continue to get attention as the product 
matures. For example, I hope there is a reasonable story around upgrades 
to future releases for single-node hosted configurations.



Having said that, it is possible to set global maintenance mode
as a first step (in the UI: right click engine vm and choose ha-maintenance).
Then you can ssh into the engine vm and init 0.


What is the recipe for a clean startup after shutdown?
Can we do 'hosted-engine --vm-start' while the system is in Global 
Maintenance mode?


Thanks,
Bob



Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-25 Thread Doron Fediuck


- Original Message -
 From: Andrew Lau and...@andrewklau.com
 To: users users@ovirt.org
 Sent: Saturday, May 24, 2014 9:59:26 AM
 Subject: [ovirt-users] Can HA Agent control NFS Mount?
 
 Hi,
 
 I was just wondering, within the whole complexity of hosted-engine.
 Would it be possible for the hosted-engine ha-agent control the mount
 point?
 
 I'm basing this off a few people I've been talking to who have their
 NFS server running on the same host that the hosted-engine servers are
 running. Most normally also running that on top of gluster.
 
 The main motive for this is that if the NFS server is running on the
 localhost and the host goes for a clean shutdown, it will hang: the NFS
 mount is hard-mounted, and since the NFS server has gone away, we're stuck
 waiting indefinitely for it to unmount cleanly (which it never will).
 
 If one of the HA components could instead unmount this NFS mount when it
 shuts down, that could potentially prevent this. There are other
 alternatives, and I know this is not the supported scenario, but I'm just
 hoping to bounce a few ideas around.
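
Not something the HA components do today, but a sketch of the kind of
workaround meant here (hedged: the service names and the mount path under
/rhev/data-center/mnt/ are deployment- and version-specific placeholders):

   # stop the HA services and vdsm first, then release the mount before shutdown
   service ovirt-ha-agent stop
   service ovirt-ha-broker stop
   service vdsmd stop
   # lazy-unmount the hosted-engine NFS export so the hard mount cannot hang the shutdown
   umount -l /rhev/data-center/mnt/<nfs-server>:_<export_path>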
 
 Thanks,
 Andrew

Hi Andrew,
Indeed, we're not looking into the Gluster flow now as it has some
known issues. Additionally (just to make it clear), local NFS will
not provide any tolerance if the hosting server dies, so we should
be looking at shared storage regardless of the hypervisors.


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-25 Thread Bob Doolittle

Just for the record, what Andrew reports is not specific to GlusterFS.

I have not yet found a way to shut down my single-node Hosted deployment 
cleanly without experiencing NFS hangs/timeouts on the way down.

My NFS storage is local to my host.

Also curious is that when I say poweroff it actually reboots and comes 
up again. Could that be due to the timeouts on the way down?


-Bob

On 05/25/2014 08:13 AM, Doron Fediuck wrote:


- Original Message -

From: Andrew Lau and...@andrewklau.com
To: users users@ovirt.org
Sent: Saturday, May 24, 2014 9:59:26 AM
Subject: [ovirt-users] Can HA Agent control NFS Mount?

Hi,

I was just wondering, within the whole complexity of hosted-engine.
Would it be possible for the hosted-engine ha-agent control the mount
point?

I'm basing this off a few people I've been talking to who have their
NFS server running on the same host that the hosted-engine servers are
running. Most normally also running that on top of gluster.

The main motive for this, is currently if the nfs server is running on
the localhost and the server goes for a clean shutdown it will hang
because the nfs mount is hard mounted and as the nfs server has gone
away, we're stuck at an infinite hold waiting for it to cleanly
unmount (which it never will)

If it's possible for instead one of the ha components to unmount this
nfs mount when it shuts down, this could potentially prevent this.
There are other alternatives and I know this is not the supported
scenario, but just hoping to bounce a few ideas.

Thanks,
Andrew

Hi Andrew,
Indeed we're not looking into the Gluster flow now as it has some
known issues. Additionally (just to make it clear) local nfs will
not provide any tolerance if the hosting server dies. So we should
be looking at a shared storage regardless of the hypervisors.


Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-25 Thread Joop

On 25-5-2014 19:38, Bob Doolittle wrote:


Also curious is that when I say poweroff it actually reboots and 
comes up again. Could that be due to the timeouts on the way down?


Ah, that's something my F19 host does too. Some more info: if engine
hasn't been started on the host, then I can shut it down and it will
power off. If engine has been run on it, then it will reboot.

It's not vdsm (I think), because my shutdown sequence is (on my F19 host):
 service ovirt-agent-ha stop
 service ovirt-agent-broker stop
 service vdsmd stop
 ssh root@engine01 init 0
init 0

I don't use maintenance mode because when I poweron my host (= my 
desktop) I want engine to power on automatically which it does most of 
the time within 10 min.

I think wdmd or sanlock are causing the reboot instead of poweroff

Joop



Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-25 Thread Bob Doolittle


On 05/25/2014 02:51 PM, Joop wrote:

On 25-5-2014 19:38, Bob Doolittle wrote:


Also curious is that when I say poweroff it actually reboots and 
comes up again. Could that be due to the timeouts on the way down?


Ah, that's something my F19 host does too. Some more info: if engine 
hasn't been started on the host then I can shutdown it and it will 
poweroff. IF engine has been run on it then it will reboot.

Its not vdsm (I think) because my shutdown sequence is (on my f19 host):
 service ovirt-agent-ha stop
 service ovirt-agent-broker stop
 service vdsmd stop
 ssh root@engine01 init 0
init 0

I don't use maintenance mode because when I poweron my host (= my 
desktop) I want engine to power on automatically which it does most of 
the time within 10 min.


For comparison, I see this issue and I *do* use maintenance mode 
(because presumably that's the 'blessed' way to shut things down and I'm 
scared to mess this complex system up by straying off the beaten path 
;). My process is:


ssh root@engine init 0
(wait for vdsClient -s 0 list | grep Status: to show the vm as down)
hosted-engine --set-maintenance --mode=global
poweroff

And then on startup:
hosted-engine --set-maintenance --mode=none
hosted-engine --vm-start

There are two issues here. I am not sure if they are related or not.
1. The NFS timeout during shutdown (Joop do you see this also? Or just #2?)
2. The system reboot instead of poweroff (which messes up remote machine 
management)


Thanks,
 Bob


I think wdmd or sanlock are causing the reboot instead of poweroff

Joop



Re: [ovirt-users] Can HA Agent control NFS Mount?

2014-05-25 Thread Andrew Lau
On Mon, May 26, 2014 at 5:10 AM, Bob Doolittle b...@doolittle.us.com wrote:

 On 05/25/2014 02:51 PM, Joop wrote:

 On 25-5-2014 19:38, Bob Doolittle wrote:


 Also curious is that when I say poweroff it actually reboots and comes
 up again. Could that be due to the timeouts on the way down?

 Ah, that's something my F19 host does too. Some more info: if engine
 hasn't been started on the host then I can shutdown it and it will poweroff.
 IF engine has been run on it then it will reboot.
 Its not vdsm (I think) because my shutdown sequence is (on my f19 host):
  service ovirt-agent-ha stop
  service ovirt-agent-broker stop
  service vdsmd stop
  ssh root@engine01 init 0
 init 0

 I don't use maintenance mode because when I poweron my host (= my desktop)
 I want engine to power on automatically which it does most of the time
 within 10 min.


 For comparison, I see this issue and I *do* use maintenance mode (because
 presumably that's the 'blessed' way to shut things down and I'm scared to
 mess this complex system up by straying off the beaten path ;). My process
 is:

 ssh root@engine init 0
 (wait for vdsClient -s 0 list | grep Status: to show the vm as down)
 hosted-engine --set-maintenance --mode=global
 poweroff

 And then on startup:
 hosted-engine --set-maintenance --mode=none
 hosted-engine --vm-start

 There are two issues here. I am not sure if they are related or not.
 1. The NFS timeout during shutdown (Joop do you see this also? Or just #2?)
 2. The system reboot instead of poweroff (which messes up remote machine
 management)


For 1, I was wondering if perhaps we could have an option to specify
the mount options. If I understand correctly, applying a soft mount
instead of a hard mount would prevent this from happening. I'm not sure,
however, of the implications this would have on data integrity.
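
For illustration only (this is not an option vdsm or hosted-engine exposes
as-is, and a soft mount trades indefinite hangs for possible I/O errors to
the VM, with the data-integrity caveat above), a soft NFS mount would look
roughly like:

   # hypothetical manual mount; server and paths are placeholders
   mount -t nfs -o soft,timeo=600,retrans=3 nfs-server:/export/vm_he1 /mnt/he-storage
   # soft: give up after a bounded number of retries instead of retrying forever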

I would really like to see it happen in the ha-agent: as it's the one
that connects/mounts the storage, it should also unmount it on shutdown.
However, its stability is flaky at best. I've noticed that if `df` hangs
because another NFS mount has timed out, the agent will die. That's not a
good sign; this is what actually caused my hosted-engine to run twice in
one case.
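
As an aside, a small workaround for the `df` hang itself (assuming GNU
coreutils; it only protects ad-hoc checks, it does not fix the agent):

   timeout 5 df -h || echo "df timed out - stale NFS mount?"
   df -h -x nfs -x nfs4    # or skip NFS filesystems entirely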

 Thanks,
  Bob


 I think wdmd or sanlock are causing the reboot instead of poweroff

 Joop
