Re: [ovirt-users] Hosted engine install failed; vdsm upset about broker

2017-04-21 Thread Jamie Lawrence

> On Apr 21, 2017, at 6:38 AM, knarra  wrote:
> 
> On 04/21/2017 06:34 PM, Jamie Lawrence wrote:
>>> On Apr 20, 2017, at 10:36 PM, knarra  wrote:
 The installer claimed it did, but I believe it didn’t. Below the error 
 from my original email, there’s the below (apologies for not including it 
 earlier; I missed it). Note: 04ff4cf1-135a-4918-9a1f-8023322f89a3 is the 
 HE - I’m pretty sure it is complaining about itself. (In any case, I 
 verified that there are no other VMs running with both virsh and 
 vdsClient.)
>> ^^^
>> 
 2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:128 Stage 
 late_setup METHOD otopi.plugins.gr_he_setup.vm.runvm.Plugin._late_setup
 2017-04-19 12:27:02 DEBUG otopi.plugins.gr_he_setup.vm.runvm 
 runvm._late_setup:83 {'status': {'message': 'Done', 'code': 0}, 'items': 
 [u'04ff4cf1-135a-4918-9a1f-8023322f89a3']}
 2017-04-19 12:27:02 ERROR otopi.plugins.gr_he_setup.vm.runvm 
 runvm._late_setup:91 The following VMs have been found: 
 04ff4cf1-135a-4918-9a1f-8023322f89a3
 2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:142 method 
 exception
 Traceback (most recent call last):
   File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in 
 _executeMethod
 method['method']()
   File 
 "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/vm/runvm.py",
  line 95, in _late_setup
 _('Cannot setup Hosted Engine with other VMs running')
 RuntimeError: Cannot setup Hosted Engine with other VMs running
 2017-04-19 12:27:02 ERROR otopi.context context._executeMethod:151 Failed 
 to execute stage 'Environment setup': Cannot setup Hosted Engine with 
 other VMs running
 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:760 
 ENVIRONMENT DUMP - BEGIN
 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
 BASE/error=bool:'True'
 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
 BASE/exceptionInfo=list:'[(, 
 RuntimeError('Cannot setup Hosted Engine with other VMs running',), 
 )]'
 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:774 
 ENVIRONMENT DUMP - END
>>> James, generally this issue happens when the setup failed once and you 
>>> tried rerunning it.  Can you clean it up and deploy it again?  HE 
>>> should come up successfully. Below are the steps for cleaning it up.
>> Knarra,
>> 
>> I realize that. However, that is not the situation in my case. See above, at 
>> the mark - the UUID it is complaining about is the UUID of the hosted-engine 
>> it just installed. From the answers file generated from the run (whole thing 
>> below):
>> 
>> OVEHOSTED_VM/vmUUID=str:04ff4cf1-135a-4918-9a1f-8023322f89a3
>> Also see the WARNs I mentioned previously, quoted below. Excerpt:
>> 
>> Apr 19 12:29:20 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root WARN 
>> File: 
>> /var/lib/libvirt/qemu/channels/04ff4cf1-135a-4918-9a1f-8023322f89a3.com.redhat.rhevm.vdsm
>>  already removed
>> Apr 19 12:29:20 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root WARN 
>> File: 
>> /var/lib/libvirt/qemu/channels/04ff4cf1-135a-4918-9a1f-8023322f89a3.org.qemu.guest_agent.0
>>  already removed
>> Apr 19 12:29:30 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm 
>> ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect 
>> to broker, the number of errors has exceeded the limit (1)
>> I’m not clear on what it is attempting to do there, but it seems relevant.
> I remember that you said HE vm was not started when the installation was 
> successful. Is Local Maintenance enabled on that host?
> 
> Can you please check whether the services 'ovirt-ha-agent' and 'ovirt-ha-broker' 
> are running fine, and try restarting them once?

Agent and broker logs from before are down in the original message quoting.  
They’re running, but not fine.

[root@sc5-ovirt-2 jlawrence]# ps ax|grep ha-
130599 ?        Ssl    3:52 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
132869 ?        Ss     0:13 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
133501 pts/0    S+     0:00 grep --color=auto ha-
[root@sc5-ovirt-2 jlawrence]# systemctl restart ovirt-ha-agent ovirt-ha-broker
[root@sc5-ovirt-2 jlawrence]# tail -40 
/var/log/ovirt-hosted-engine-ha/broker.log
Thread-46::INFO::2017-04-21 
10:52:57,058::storage_backends::119::ovirt_hosted_engine_ha.lib.storage_backends::(_check_symlinks)
 Cleaning up stale LV link 
'/rhev/data-center/mnt/glusterSD/sc5-gluster-1.squaretrade.com:_ovirt__engine/a1155699-0bcf-44c5-aa55-a574ca3ad313/ha_agent/hosted-engine.metadata'
Thread-53::INFO::2017-04-21 
10:52:57,070::listener::186::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle)
 Connection closed
Thread-50::INFO::2017-04-21 
10:52:57,118::mem_free::

Re: [ovirt-users] Hosted engine install failed; vdsm upset about broker

2017-04-21 Thread knarra

On 04/21/2017 06:34 PM, Jamie Lawrence wrote:

On Apr 20, 2017, at 10:36 PM, knarra  wrote:

The installer claimed it did, but I believe it didn’t. Below the error from my 
original email, there’s the below (apologies for not including it earlier; I 
missed it). Note: 04ff4cf1-135a-4918-9a1f-8023322f89a3 is the HE - I’m pretty 
sure it is complaining about itself. (In any case, I verified that there are no 
other VMs running with both virsh and vdsClient.)

^^^


2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:128 Stage 
late_setup METHOD otopi.plugins.gr_he_setup.vm.runvm.Plugin._late_setup
2017-04-19 12:27:02 DEBUG otopi.plugins.gr_he_setup.vm.runvm 
runvm._late_setup:83 {'status': {'message': 'Done', 'code': 0}, 'items': 
[u'04ff4cf1-135a-4918-9a1f-8023322f89a3']}
2017-04-19 12:27:02 ERROR otopi.plugins.gr_he_setup.vm.runvm 
runvm._late_setup:91 The following VMs have been found: 
04ff4cf1-135a-4918-9a1f-8023322f89a3
2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:142 method 
exception
Traceback (most recent call last):
   File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in 
_executeMethod
 method['method']()
   File 
"/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/vm/runvm.py",
 line 95, in _late_setup
 _('Cannot setup Hosted Engine with other VMs running')
RuntimeError: Cannot setup Hosted Engine with other VMs running
2017-04-19 12:27:02 ERROR otopi.context context._executeMethod:151 Failed to 
execute stage 'Environment setup': Cannot setup Hosted Engine with other VMs 
running
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT 
DUMP - BEGIN
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
BASE/error=bool:'True'
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
BASE/exceptionInfo=list:'[(, RuntimeError('Cannot 
setup Hosted Engine with other VMs running',), )]'
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:774 ENVIRONMENT 
DUMP - END

James, generally this issue happens when the setup failed once and you tried 
rerunning it.  Can you clean it up and deploy it again?  HE should come up 
successfully. Below are the steps for cleaning it up.

Knarra,

I realize that. However, that is not the situation in my case. See above, at 
the mark - the UUID it is complaining about is the UUID of the hosted-engine it 
just installed. From the answers file generated from the run (whole thing 
below):


OVEHOSTED_VM/vmUUID=str:04ff4cf1-135a-4918-9a1f-8023322f89a3

Also see the WARNs I mentioned previously, quoted below. Excerpt:


Apr 19 12:29:20 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root WARN File: 
/var/lib/libvirt/qemu/channels/04ff4cf1-135a-4918-9a1f-8023322f89a3.com.redhat.rhevm.vdsm
 already removed
Apr 19 12:29:20 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root WARN File: 
/var/lib/libvirt/qemu/channels/04ff4cf1-135a-4918-9a1f-8023322f89a3.org.qemu.guest_agent.0
 already removed
Apr 19 12:29:30 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm 
ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to 
broker, the number of errors has exceeded the limit (1)

I’m not clear on what it is attempting to do there, but it seems relevant.
I remember that you said HE vm was not started when the installation was 
successful. Is Local Maintenance enabled on that host?


Can you please check whether the services 'ovirt-ha-agent' and 
'ovirt-ha-broker' are running fine, and try restarting them once?





I know there is no failed install left on the gluster volume, because when I 
attempt an install, part of my scripted prep process is deleting and recreating 
the Gluster volume. The below instructions are more or less what I’m doing 
already in a script[1]. (the gluster portion of the script process is: stop the 
volume, delete the volume, remove the mount point directory to avoid Gluster’s 
xattr problem with recycling directories, recreate the directory, change perms, 
create the volume, start the volume, set Ovirt-recc’ed volume options.)

-j

[1] We have a requirement for automated setup of all production resources, so 
all of this ends up being scripted.


1) vdsClient -s 0 list table | awk '{print $1}' | xargs vdsClient -s 0 destroy

2) stop the volume and delete all the information inside the bricks from all 
the hosts

3) try to umount storage from /rhev/data-center/mnt/ - umount -f 
/rhev/data-center/mnt/  if it is mounted

4) remove all dirs from /rhev/data-center/mnt/ - rm -rf /rhev/data-center/mnt/*

5) start  volume again and start the deployment.

Thanks
kasturi



If I start it manually, the default DC is down, the default cluster has the 
installation host in the cluster,  there is no storage, and the VM doesn’t show 
up in the GUI. In this install run, I have not yet started the engine manually.

You won't see the HE VM until the HE storage is imported into the UI. The HE storage 
will be automatically imported into the

Re: [ovirt-users] Hosted engine install failed; vdsm upset about broker

2017-04-21 Thread Jamie Lawrence

> On Apr 20, 2017, at 10:36 PM, knarra  wrote:

>> The installer claimed it did, but I believe it didn’t. Below the error from 
>> my original email, there’s the below (apologies for not including it 
>> earlier; I missed it). Note: 04ff4cf1-135a-4918-9a1f-8023322f89a3 is the HE 
>> - I’m pretty sure it is complaining about itself. (In any case, I verified 
>> that there are no other VMs running with both virsh and vdsClient.)

^^^ 

>> 2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:128 Stage 
>> late_setup METHOD otopi.plugins.gr_he_setup.vm.runvm.Plugin._late_setup
>> 2017-04-19 12:27:02 DEBUG otopi.plugins.gr_he_setup.vm.runvm 
>> runvm._late_setup:83 {'status': {'message': 'Done', 'code': 0}, 'items': 
>> [u'04ff4cf1-135a-4918-9a1f-8023322f89a3']}
>> 2017-04-19 12:27:02 ERROR otopi.plugins.gr_he_setup.vm.runvm 
>> runvm._late_setup:91 The following VMs have been found: 
>> 04ff4cf1-135a-4918-9a1f-8023322f89a3
>> 2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:142 method 
>> exception
>> Traceback (most recent call last):
>>   File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in 
>> _executeMethod
>> method['method']()
>>   File 
>> "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/vm/runvm.py",
>>  line 95, in _late_setup
>> _('Cannot setup Hosted Engine with other VMs running')
>> RuntimeError: Cannot setup Hosted Engine with other VMs running
>> 2017-04-19 12:27:02 ERROR otopi.context context._executeMethod:151 Failed to 
>> execute stage 'Environment setup': Cannot setup Hosted Engine with other VMs 
>> running
>> 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:760 
>> ENVIRONMENT DUMP - BEGIN
>> 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
>> BASE/error=bool:'True'
>> 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
>> BASE/exceptionInfo=list:'[(, 
>> RuntimeError('Cannot setup Hosted Engine with other VMs running',), 
>> )]'
>> 2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:774 
>> ENVIRONMENT DUMP - END
> James, generally this issue happens when the setup failed once and you tried 
> rerunning it.  Can you clean it up and deploy it again?  HE should come 
> up successfully. Below are the steps for cleaning it up.

Knarra,

I realize that. However, that is not the situation in my case. See above, at 
the mark - the UUID it is complaining about is the UUID of the hosted-engine it 
just installed. From the answers file generated from the run (whole thing 
below):

 OVEHOSTED_VM/vmUUID=str:04ff4cf1-135a-4918-9a1f-8023322f89a3
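
The comparison described above (the UUID in the installer error against the UUID in the answers file) could be automated roughly as follows. This is only a sketch: the function names are made up for illustration, and the assumption that several UUIDs in the "have been found" message would be separated by commas or whitespace is a guess, since the log here only ever shows one.

```python
import re

# Canonical lowercase UUID, as it appears in the answers file and the log.
UUID_RE = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"


def answer_file_vm_uuid(answers_text):
    """Return the hosted-engine VM UUID recorded in the answers file, or None."""
    m = re.search(
        r"^\s*OVEHOSTED_VM/vmUUID=str:(" + UUID_RE + r")\s*$",
        answers_text,
        re.MULTILINE,
    )
    return m.group(1) if m else None


def reported_vm_uuids(log_text):
    """Return every UUID listed after 'The following VMs have been found:'.

    Assumption: multiple UUIDs, if any, are separated by commas/whitespace.
    """
    found = []
    pattern = r"The following VMs have been found:\s*((?:" + UUID_RE + r"[,\s]*)+)"
    for m in re.finditer(pattern, log_text):
        found.extend(re.findall(UUID_RE, m.group(1)))
    return found


def only_engine_vm_reported(answers_text, log_text):
    """True when the only VM the installer complained about is the HE itself."""
    engine = answer_file_vm_uuid(answers_text)
    reported = reported_vm_uuids(log_text)
    return engine is not None and bool(reported) and set(reported) == {engine}
```

If `only_engine_vm_reported` returns True for the setup log and the generated answers file, the installer is tripping over the VM it created itself rather than over a leftover from an earlier run.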

Also see the WARNs I mentioned previously, quoted below. Excerpt:

 Apr 19 12:29:20 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root WARN 
 File: 
 /var/lib/libvirt/qemu/channels/04ff4cf1-135a-4918-9a1f-8023322f89a3.com.redhat.rhevm.vdsm
  already removed
 Apr 19 12:29:20 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root WARN 
 File: 
 /var/lib/libvirt/qemu/channels/04ff4cf1-135a-4918-9a1f-8023322f89a3.org.qemu.guest_agent.0
  already removed
 Apr 19 12:29:30 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm 
 ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect 
 to broker, the number of errors has exceeded the limit (1)

I’m not clear on what it is attempting to do there, but it seems relevant.
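
To see how often vdsm logs these warnings and how many of them mention the hosted-engine VM, something like the following sketch could be run over the journal text. It assumes one unwrapped journal line per record in the layout quoted above (timestamp, host, `vdsm[pid]:`, logger, level, message); the function names are hypothetical.

```python
import re
from collections import Counter

# One vdsm journal record, as in the excerpt above (assumed to be unwrapped).
LINE_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"vdsm\[(?P<pid>\d+)\]: vdsm (?P<logger>\S+) (?P<level>WARN|ERROR) (?P<msg>.*)$"
)


def vdsm_events(lines):
    """Yield a dict for every line that looks like a vdsm WARN/ERROR record."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            yield m.groupdict()


def summarize(lines, vm_uuid):
    """Count records per (logger, level) and those mentioning vm_uuid."""
    counts = Counter()
    mentions = 0
    for event in vdsm_events(lines):
        counts[(event["logger"], event["level"])] += 1
        if vm_uuid in event["msg"]:
            mentions += 1
    return counts, mentions
```

Calling `summarize(open(path), "04ff4cf1-135a-4918-9a1f-8023322f89a3")` on a saved copy of the journal would show whether the BrokerLink errors outnumber the channel-file warnings, which helps decide which component to look at first.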

I know there is no failed install left on the gluster volume, because when I 
attempt an install, part of my scripted prep process is deleting and recreating 
the Gluster volume. The below instructions are more or less what I’m doing 
already in a script[1]. (the gluster portion of the script process is: stop the 
volume, delete the volume, remove the mount point directory to avoid Gluster’s 
xattr problem with recycling directories, recreate the directory, change perms, 
create the volume, start the volume, set Ovirt-recc’ed volume options.)

-j

[1] We have a requirement for automated setup of all production resources, so 
all of this ends up being scripted.

> 1) vdsClient -s 0 list table | awk '{print $1}' | xargs vdsClient -s 0 destroy
> 
> 2) stop the volume and delete all the information inside the bricks from all 
> the hosts
> 
> 3) try to umount storage from /rhev/data-center/mnt/ - umount -f 
> /rhev/data-center/mnt/  if it is mounted
> 
> 4) remove all dirs from /rhev/data-center/mnt/ - rm -rf 
> /rhev/data-center/mnt/*
> 
> 5) start  volume again and start the deployment.
> 
> Thanks
> kasturi
>> 
>> 
 If I start it manually, the default DC is down, the default cluster has 
 the installation host in the cluster,  there is no storage, and the VM 
 doesn’t show up in the GUI. In this install run, I have not yet started 
 the engine manually.
>>> You won't see the HE VM until the HE storage is imported into the UI. The HE 
>>> storage will be automatically imported into the UI (which will import the HE VM 
>>> too) once a master domain is pr

Re: [ovirt-users] Hosted engine install failed; vdsm upset about broker

2017-04-20 Thread knarra

On 04/20/2017 10:48 PM, Jamie Lawrence wrote:

On Apr 19, 2017, at 11:35 PM, knarra  wrote:

On 04/20/2017 03:15 AM, Jamie Lawrence wrote:

I trialed installing the hosted engine, following the instructions at  
http://www.ovirt.org/documentation/self-hosted/chap-Deploying_Self-Hosted_Engine/
  . This is using Gluster as the backend storage subsystem.

Answer file at the end.

Per the docs,

"When the hosted-engine deployment script completes successfully, the oVirt 
Engine is configured and running on your host. The Engine has already configured the 
data center, cluster, host, the Engine virtual machine, and a shared storage domain 
dedicated to the Engine virtual machine.”

In my case, this is false. The installation claims success, but  the hosted 
engine VM stays stopped, unless I start it manually.

During the install process there is a step where HE vm is stopped and started. 
Can you check if this has happened correctly ?

The installer claimed it did, but I believe it didn’t. Below the error from my 
original email, there’s the below (apologies for not including it earlier; I 
missed it). Note: 04ff4cf1-135a-4918-9a1f-8023322f89a3 is the HE - I’m pretty 
sure it is complaining about itself. (In any case, I verified that there are no 
other VMs running with both virsh and vdsClient.)

2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:128 Stage 
late_setup METHOD otopi.plugins.gr_he_setup.vm.runvm.Plugin._late_setup
2017-04-19 12:27:02 DEBUG otopi.plugins.gr_he_setup.vm.runvm 
runvm._late_setup:83 {'status': {'message': 'Done', 'code': 0}, 'items': 
[u'04ff4cf1-135a-4918-9a1f-8023322f89a3']}
2017-04-19 12:27:02 ERROR otopi.plugins.gr_he_setup.vm.runvm 
runvm._late_setup:91 The following VMs have been found: 
04ff4cf1-135a-4918-9a1f-8023322f89a3
2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:142 method 
exception
Traceback (most recent call last):
   File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in 
_executeMethod
 method['method']()
   File 
"/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/vm/runvm.py",
 line 95, in _late_setup
 _('Cannot setup Hosted Engine with other VMs running')
RuntimeError: Cannot setup Hosted Engine with other VMs running
2017-04-19 12:27:02 ERROR otopi.context context._executeMethod:151 Failed to 
execute stage 'Environment setup': Cannot setup Hosted Engine with other VMs 
running
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT 
DUMP - BEGIN
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
BASE/error=bool:'True'
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
BASE/exceptionInfo=list:'[(, RuntimeError('Cannot 
setup Hosted Engine with other VMs running',), )]'
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:774 ENVIRONMENT 
DUMP - END
James, generally this issue happens when the setup failed once and you 
tried rerunning it.  Can you clean it up and deploy it again?  HE 
should come up successfully. Below are the steps for cleaning it up.


1) vdsClient -s 0 list table | awk '{print $1}' | xargs vdsClient -s 0 
destroy


2) stop the volume and delete all the information inside the bricks from 
all the hosts


3) try to umount storage from /rhev/data-center/mnt/ - umount 
-f /rhev/data-center/mnt/  if it is mounted


4) remove all dirs from /rhev/data-center/mnt/ - rm 
-rf /rhev/data-center/mnt/*


5) start  volume again and start the deployment.
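
For anyone scripting the five steps above, here is a rough sketch that only builds the command list and executes nothing. The function name and the `mounted_paths` parameter are made up for illustration. The `-r -n1` flags on xargs are an addition not in the steps as written (skip the call when no VM is listed, and pass one VM ID per destroy call). Deleting the brick contents in step 2 is left as a comment because the brick paths differ per host.

```python
import shlex

MOUNT_ROOT = "/rhev/data-center/mnt"


def cleanup_plan(volume, mounted_paths=()):
    """Return the cleanup steps as shell command strings (dry run only)."""
    q = shlex.quote
    plan = [
        # Step 1: destroy every VM vdsm still lists.
        # '-r -n1' is an addition: no call on empty input, one ID per call.
        "vdsClient -s 0 list table | awk '{print $1}' "
        "| xargs -r -n1 vdsClient -s 0 destroy",
        # Step 2: stop the volume. Clearing the brick directories on every
        # host is site-specific and deliberately not generated here.
        "gluster volume stop " + q(volume),
    ]
    # Step 3: force-unmount whatever is still mounted under MOUNT_ROOT.
    plan += ["umount -f " + q(path) for path in mounted_paths]
    # Step 4: remove the leftover directories; the glob must stay unquoted.
    plan.append("rm -rf " + MOUNT_ROOT + "/*")
    # Step 5: start the volume again before redeploying.
    plan.append("gluster volume start " + q(volume))
    return plan
```

Printing `cleanup_plan("ovirt_engine")` line by line gives something that can be reviewed before it is run by hand; note that `gluster volume stop` asks for confirmation when run interactively.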

Thanks
kasturi




If I start it manually, the default DC is down, the default cluster has the 
installation host in the cluster,  there is no storage, and the VM doesn’t show 
up in the GUI. In this install run, I have not yet started the engine manually.

You won't see the HE VM until the HE storage is imported into the UI. The HE storage 
will be automatically imported into the UI (which will import the HE VM too) once a 
master domain is present.

Sure; I’m just attempting to provide context.


I assume this is related to the errors in ovirt-hosted-engine-setup.log, below. 
(The timestamps are confusing; it looks like the Python errors are logged some 
time after they’re captured or something.) The HA broker and agent logs just 
show them looping in the sequence below.

Is there a decent way to pick this up and continue? If not, how do I make this 
work?

Can you please check the following things.

1) Is glusterd running on all the nodes? 'systemctl status glusterd'
2) Are you able to connect to your storage server, which is ovirt_engine in your 
case?
3) Can you check whether all the brick processes in the volume are up?


1) Verified that glusterd is running on all three nodes.

2)
[root@sc5-thing-1]# mount -tglusterfs sc5-gluster-1:/ovirt_engine 
/mnt/ovirt_engine
[root@sc5-thing-1]# df -h
Filesystem  Size  Used Avail Use% Mounted on
[…]
sc5-gluster-1:/ovirt_engine 300G  2.6G  298G   1% /mnt/ovirt_engine


3)
[root@sc5-gluster-1 jla

Re: [ovirt-users] Hosted engine install failed; vdsm upset about broker (revised)

2017-04-20 Thread Jamie Lawrence

> On Apr 20, 2017, at 9:18 AM, Simone Tiraboschi  wrote:

> Could you please share the output of 
>   sudo -u vdsm sudo service sanlock status

That command line prompts for vdsm’s password, which it doesn’t have. But 
output returned as root is below. Is that ‘operation not permitted’ related?

Thanks,

-j

[root@sc5-ovirt-2 jlawrence]# service sanlock status
Redirecting to /bin/systemctl status  sanlock.service
● sanlock.service - Shared Storage Lease Manager
   Loaded: loaded (/usr/lib/systemd/system/sanlock.service; disabled; vendor 
preset: disabled)
   Active: active (running) since Wed 2017-04-19 16:56:40 PDT; 17h ago
  Process: 16764 ExecStart=/usr/sbin/sanlock daemon (code=exited, 
status=0/SUCCESS)
 Main PID: 16765 (sanlock)
   CGroup: /system.slice/sanlock.service
   ├─16765 /usr/sbin/sanlock daemon
   └─16766 /usr/sbin/sanlock daemon

Apr 19 16:56:40 sc5-ovirt-2.squaretrade.com systemd[1]: Starting Shared Storage 
Lease Manager...
Apr 19 16:56:40 sc5-ovirt-2.squaretrade.com systemd[1]: Started Shared Storage 
Lease Manager.
Apr 19 16:56:40 sc5-ovirt-2.squaretrade.com sanlock[16765]: 2017-04-19 
16:56:40-0700 482 [16765]: set scheduler RR|RESET_ON_FORK priority 99 failed: 
Operation not permitted



Re: [ovirt-users] Hosted engine install failed; vdsm upset about broker

2017-04-20 Thread Jamie Lawrence

> On Apr 19, 2017, at 11:35 PM, knarra  wrote:
> 
> On 04/20/2017 03:15 AM, Jamie Lawrence wrote:
>> I trialed installing the hosted engine, following the instructions at  
>> http://www.ovirt.org/documentation/self-hosted/chap-Deploying_Self-Hosted_Engine/
>>   . This is using Gluster as the backend storage subsystem.
>> 
>> Answer file at the end.
>> 
>> Per the docs,
>> 
>> "When the hosted-engine deployment script completes successfully, the oVirt 
>> Engine is configured and running on your host. The Engine has already 
>> configured the data center, cluster, host, the Engine virtual machine, and a 
>> shared storage domain dedicated to the Engine virtual machine.”
>> 
>> In my case, this is false. The installation claims success, but  the hosted 
>> engine VM stays stopped, unless I start it manually.
> During the install process there is a step where HE vm is stopped and 
> started. Can you check if this has happened correctly ?

The installer claimed it did, but I believe it didn’t. Below the error from my 
original email, there’s the below (apologies for not including it earlier; I 
missed it). Note: 04ff4cf1-135a-4918-9a1f-8023322f89a3 is the HE - I’m pretty 
sure it is complaining about itself. (In any case, I verified that there are no 
other VMs running with both virsh and vdsClient.)

2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:128 Stage 
late_setup METHOD otopi.plugins.gr_he_setup.vm.runvm.Plugin._late_setup
2017-04-19 12:27:02 DEBUG otopi.plugins.gr_he_setup.vm.runvm 
runvm._late_setup:83 {'status': {'message': 'Done', 'code': 0}, 'items': 
[u'04ff4cf1-135a-4918-9a1f-8023322f89a3']}
2017-04-19 12:27:02 ERROR otopi.plugins.gr_he_setup.vm.runvm 
runvm._late_setup:91 The following VMs have been found: 
04ff4cf1-135a-4918-9a1f-8023322f89a3
2017-04-19 12:27:02 DEBUG otopi.context context._executeMethod:142 method 
exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in 
_executeMethod
method['method']()
  File 
"/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/vm/runvm.py",
 line 95, in _late_setup
_('Cannot setup Hosted Engine with other VMs running')
RuntimeError: Cannot setup Hosted Engine with other VMs running
2017-04-19 12:27:02 ERROR otopi.context context._executeMethod:151 Failed to 
execute stage 'Environment setup': Cannot setup Hosted Engine with other VMs 
running
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT 
DUMP - BEGIN
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
BASE/error=bool:'True'
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:770 ENV 
BASE/exceptionInfo=list:'[(, 
RuntimeError('Cannot setup Hosted Engine with other VMs running',), )]'
2017-04-19 12:27:02 DEBUG otopi.context context.dumpEnvironment:774 ENVIRONMENT 
DUMP - END


>> If I start it manually, the default DC is down, the default cluster has the 
>> installation host in the cluster,  there is no storage, and the VM doesn’t 
>> show up in the GUI. In this install run, I have not yet started the engine 
>> manually.
> You won't see the HE VM until the HE storage is imported into the UI. The HE storage 
> will be automatically imported into the UI (which will import the HE VM too) once 
> a master domain is present.

Sure; I’m just attempting to provide context.

>> I assume this is related to the errors in ovirt-hosted-engine-setup.log, 
>> below. (The timestamps are confusing; it looks like the Python errors are 
>> logged some time after they’re captured or something.) The HA broker and 
>> agent logs just show them looping in the sequence below.
>> 
>> Is there a decent way to pick this up and continue? If not, how do I make 
>> this work?
> Can you please check the following things.
> 
> 1) Is glusterd running on all the nodes? 'systemctl status glusterd'
> 2) Are you able to connect to your storage server, which is ovirt_engine in 
> your case?
> 3) Can you check whether all the brick processes in the volume are up?


1) Verified that glusterd is running on all three nodes.

2) 
[root@sc5-thing-1]# mount -tglusterfs sc5-gluster-1:/ovirt_engine 
/mnt/ovirt_engine
[root@sc5-thing-1]# df -h
Filesystem  Size  Used Avail Use% Mounted on
[…]
sc5-gluster-1:/ovirt_engine 300G  2.6G  298G   1% /mnt/ovirt_engine


3)
[root@sc5-gluster-1 jlawrence]# gluster volume status
Status of volume: ovirt_engine
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick sc5-gluster-1:/gluster-bricks/ovirt_e
ngine/ovirt_engine-1                        49217     0          Y       22102
Brick sc5-gluster-2:/gluster-bricks/ovirt_e
ngine/ovirt_engine-1                        49157     0          Y       37842
Brick sc5-gluster-3:/gluster-bricks/ovirt_e
ngine/ovirt_engine-1                        49157     0          Y       112018
Self-

Re: [ovirt-users] Hosted engine install failed; vdsm upset about broker (revised)

2017-04-20 Thread Simone Tiraboschi
On Thu, Apr 20, 2017 at 2:14 AM, Jamie Lawrence 
wrote:

>
> So, tracing this further, I’m pretty sure this is something about sanlock.
>
> As best I can tell this[1]  seems to be the failure that is blocking
> importing the pool, creating storage domains, importing the HE, etc.
> Contrary to the log, sanlock is running; I verified it starts on
> system-boot and restarts just fine.
>
> I found one reference to someone having a similar problem in 3.6, but that
> appeared to have been a permission issue I’m not afflicted with.
>
> How can I move past this?
>

Could you please share the output of
  sudo -u vdsm sudo service sanlock status
?


>
> TIA,
>
> -j
>
>
> [1] agent.log:
> MainThread::WARNING::2017-04-19 17:07:13,537::agent::209::
> ovirt_hosted_engine_ha.agent.agent.Agent::(_run_agent) Restarting agent,
> attempt '6'
> MainThread::INFO::2017-04-19 17:07:13,567::hosted_engine::
> 242::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_get_hostname)
> Found certificate common name: sc5-ovirt-2.squaretrade.com
> MainThread::INFO::2017-04-19 17:07:13,569::hosted_engine::
> 604::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_vdsm) Initializing VDSM
> MainThread::INFO::2017-04-19 17:07:16,044::hosted_engine::
> 630::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_storage_images) Connecting the storage
> MainThread::INFO::2017-04-19 17:07:16,045::storage_server::
> 219::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> Connecting storage server
> MainThread::INFO::2017-04-19 17:07:20,876::storage_server::
> 226::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> Connecting storage server
> MainThread::INFO::2017-04-19 17:07:20,893::storage_server::
> 233::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server)
> Refreshing the storage domain
> MainThread::INFO::2017-04-19 17:07:21,160::hosted_engine::
> 657::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_storage_images) Preparing images
> MainThread::INFO::2017-04-19 17:07:21,160::image::126::
> ovirt_hosted_engine_ha.lib.image.Image::(prepare_images) Preparing images
> MainThread::INFO::2017-04-19 17:07:23,954::hosted_engine::
> 660::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_storage_images) Refreshing vm.conf
> MainThread::INFO::2017-04-19 17:07:23,955::config::485::
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.config::(refresh_vm_conf)
> Reloading vm.conf from the shared storage domain
> MainThread::INFO::2017-04-19 17:07:23,955::config::412::
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
> config::(_get_vm_conf_content_from_ovf_store) Trying to get a fresher
> copy of vm configuration from the OVF_STORE
> MainThread::WARNING::2017-04-19 17:07:26,741::ovf_store::107::
> ovirt_hosted_engine_ha.lib.ovf.ovf_store.OVFStore::(scan) Unable to find
> OVF_STORE
> MainThread::ERROR::2017-04-19 17:07:26,744::config::450::
> ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine.
> config::(_get_vm_conf_content_from_ovf_store) Unable to identify the
> OVF_STORE volume, falling back to initial vm.conf. Please ensure you
> already added your first data domain for regular VMs
> MainThread::INFO::2017-04-19 17:07:26,770::hosted_engine::
> 509::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_broker) Initializing ha-broker connection
> MainThread::INFO::2017-04-19 17:07:26,771::brokerlink::130:
> :ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
> Starting monitor ping, options {'addr': '10.181.26.1'}
> MainThread::INFO::2017-04-19 17:07:26,774::brokerlink::141:
> :ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
> Success, id 140621269798096
> MainThread::INFO::2017-04-19 17:07:26,774::brokerlink::130:
> :ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
> Starting monitor mgmt-bridge, options {'use_ssl': 'true', 'bridge_name':
> 'ovirtmgmt', 'address': '0'}
> MainThread::INFO::2017-04-19 17:07:26,791::brokerlink::141:
> :ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
> Success, id 140621269798544
> MainThread::INFO::2017-04-19 17:07:26,792::brokerlink::130:
> :ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
> Starting monitor mem-free, options {'use_ssl': 'true', 'address': '0'}
> MainThread::INFO::2017-04-19 17:07:26,793::brokerlink::141:
> :ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
> Success, id 140621269798224
> MainThread::INFO::2017-04-19 17:07:26,794::brokerlink::130:
> :ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
> Starting monitor cpu-load-no-engine, options {'use_ssl': 'true', 'vm_uuid':
> '04ff4cf1-135a-4918-9a1f-8023322f89a3', 'address': '0'}
> MainThread::INFO::2017-04-19 17:07:26,796::brokerlink::141:
> :ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(start_monitor)
>
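
Since the agent log loops through the same sequence, it can help to filter it down to the WARNING and ERROR records. The sketch below assumes the raw agent.log layout visible in the quote above, with fields separated by "::" (thread, level, timestamp, module, line number, logger, then "(function) message"), and one unwrapped record per line; the names `Record`, `parse_record` and `problems` are illustrative only.

```python
import re
from datetime import datetime
from typing import NamedTuple


class Record(NamedTuple):
    thread: str
    level: str
    when: datetime
    module: str
    lineno: int
    logger: str
    func: str
    message: str


def parse_record(line):
    """Parse one '::'-separated agent.log record; return None if it doesn't fit."""
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None
    thread, level, stamp, module, lineno, logger, rest = parts
    m = re.match(r"\((\w+)\)\s*(.*)$", rest)
    if not m or not lineno.isdigit():
        return None
    try:
        # Timestamps look like '2017-04-19 17:07:26,741' (milliseconds after the comma).
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None
    return Record(thread, level, when, module, int(lineno), logger,
                  m.group(1), m.group(2))


def problems(lines):
    """Return only the records logged at WARNING level or above."""
    wanted = {"WARNING", "ERROR", "CRITICAL"}
    records = (parse_record(line) for line in lines)
    return [r for r in records if r is not None and r.level in wanted]
```

Run over the excerpt above, this would leave the agent restart warning and the two OVF_STORE records, which are the lines that differ from an otherwise routine startup.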

Re: [ovirt-users] Hosted engine install failed; vdsm upset about broker

2017-04-19 Thread knarra

On 04/20/2017 03:15 AM, Jamie Lawrence wrote:

I trialed installing the hosted engine, following the instructions at  
http://www.ovirt.org/documentation/self-hosted/chap-Deploying_Self-Hosted_Engine/
  . This is using Gluster as the backend storage subsystem.

Answer file at the end.

Per the docs,

"When the hosted-engine deployment script completes successfully, the oVirt 
Engine is configured and running on your host. The Engine has already configured the 
data center, cluster, host, the Engine virtual machine, and a shared storage domain 
dedicated to the Engine virtual machine.”

In my case, this is false. The installation claims success, but  the hosted 
engine VM stays stopped, unless I start it manually.
During the install process there is a step where HE vm is stopped and 
started. Can you check if this has happened correctly ?

If I start it manually, the default DC is down, the default cluster has the 
installation host in the cluster,  there is no storage, and the VM doesn’t show 
up in the GUI. In this install run, I have not yet started the engine manually.
You won't see the HE VM until the HE storage is imported into the UI. The HE 
storage will be automatically imported into the UI (which will import the HE 
VM too) once a master domain is present.


I assume this is related to the errors in ovirt-hosted-engine-setup.log, below. 
(The timestamps are confusing; it looks like the Python errors are logged some 
time after they’re captured or something.) The HA broker and agent logs just 
show them looping in the sequence below.

Is there a decent way to pick this up and continue? If not, how do I make this 
work?

Can you please check the following things.

1) Is glusterd running on all the nodes? 'systemctl status glusterd'
2) Are you able to connect to your storage server, which is ovirt_engine 
in your case?

3) Can you check whether all the brick processes in the volume are up?

Thanks
kasturi.



Thanks,

-j

- - - - ovirt-hosted-engine-setup.log snippet: - - - -

2017-04-19 12:29:55 DEBUG otopi.context context._executeMethod:128 Stage 
late_setup METHOD otopi.plugins.gr_he_setup.system.vdsmenv.Plugin._late_setup
2017-04-19 12:29:55 DEBUG otopi.plugins.otopi.services.systemd 
systemd.status:90 check service vdsmd status
2017-04-19 12:29:55 DEBUG otopi.plugins.otopi.services.systemd 
plugin.executeRaw:813 execute: ('/bin/systemctl', 'status', 'vdsmd.service'), 
executable='None', cwd='None', env=None
2017-04-19 12:29:55 DEBUG otopi.plugins.otopi.services.systemd 
plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'status', 
'vdsmd.service'), rc=0
2017-04-19 12:29:55 DEBUG otopi.plugins.otopi.services.systemd 
plugin.execute:921 execute-output: ('/bin/systemctl', 'status', 
'vdsmd.service') stdout:
● vdsmd.service - Virtual Desktop Server Manager
Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor 
preset: enabled)
Active: active (running) since Wed 2017-04-19 12:26:59 PDT; 2min 55s ago
   Process: 67370 ExecStopPost=/usr/libexec/vdsm/vdsmd_init_common.sh 
--post-stop (code=exited, status=0/SUCCESS)
   Process: 69995 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh 
--pre-start (code=exited, status=0/SUCCESS)
  Main PID: 70062 (vdsm)
CGroup: /system.slice/vdsmd.service
└─70062 /usr/bin/python2 /usr/share/vdsm/vdsm

Apr 19 12:29:00 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm 
ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink ERROR Failed to connect to 
broker, the number of errors has exceeded the limit (1)
Apr 19 12:29:00 sc5-ovirt-2.squaretrade.com vdsm[70062]: vdsm root ERROR failed 
to retrieve Hosted Engine HA info
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo
    stats = instance.get_all_stats()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 102, in get_all_stats
    with broker.connection(self._retries, self._wait):
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 99, in connection
    self.connect(retries, wait)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 78, in connect
    raise BrokerConnectionError(error_msg)