I don't notice anything wrong on the gluster end. Maybe Simone can help take a look at HE behaviour?
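In the meantime, a rough sketch of what I would try next to see whether the engine VM itself comes up and whether the ovirt-engine service inside it starts (this assumes oVirt 4.1-era hosted-engine tooling; the exact options, --console in particular, may differ on your version):

Put the cluster in global maintenance so the agents stop force-stopping the VM while you debug, then start the VM and attach to its console:

# hosted-engine --set-maintenance --mode=global
# hosted-engine --vm-start
# hosted-engine --console
(if --console is not available on your build, "hosted-engine --add-console-password" plus a VNC client to the host works too)

Inside the HE VM, check the engine service and the VM's network - your ping to the engine returns "Destination Host Unreachable" even though the VM reports Up:

# systemctl status ovirt-engine
# journalctl -u ovirt-engine --since "1 hour ago"
# ip addr

Remember to run "hosted-engine --set-maintenance --mode=none" once the engine is healthy again.
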
On Fri, Jun 16, 2017 at 6:14 PM, Joel Diaz <[email protected]> wrote:

> Good morning,
>
> Info requested below.
>
> [root@ovirt-hyp-02 ~]# hosted-engine --vm-start
> Exception in thread Client localhost:54321 (most likely raised during interpreter shutdown):VM exists and its status is Up
>
> [root@ovirt-hyp-02 ~]# ping engine
> PING engine.example.lan (192.168.170.149) 56(84) bytes of data.
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=1 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=2 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=3 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=4 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=5 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=6 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=7 Destination Host Unreachable
> From ovirt-hyp-02.example.lan (192.168.170.143) icmp_seq=8 Destination Host Unreachable
>
> [root@ovirt-hyp-02 ~]# gluster volume status engine
> Status of volume: engine
> Gluster process                                        TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick 192.168.170.141:/gluster_bricks/engine/engine    49159     0          Y       1799
> Brick 192.168.170.143:/gluster_bricks/engine/engine    49159     0          Y       2900
> Self-heal Daemon on localhost                          N/A       N/A        Y       2914
> Self-heal Daemon on ovirt-hyp-01.example.lan           N/A       N/A        Y       1854
>
> Task Status of Volume engine
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> [root@ovirt-hyp-02 ~]# gluster volume heal engine info
> Brick 192.168.170.141:/gluster_bricks/engine/engine
> Status: Connected
> Number of entries: 0
>
> Brick 192.168.170.143:/gluster_bricks/engine/engine
> Status: Connected
> Number of entries: 0
>
> Brick 192.168.170.147:/gluster_bricks/engine/engine
> Status: Connected
> Number of entries: 0
>
> [root@ovirt-hyp-02 ~]# cat /var/log/glusterfs/rhev-data-center-mnt-glusterSD-ovirt-hyp-01.example.lan\:engine.log
> [2017-06-15 13:37:02.009436] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>
> Each of the three hosts sends out the following notifications about every 15 minutes:
>
> Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineDown-EngineStart.
> Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStart-EngineStarting.
> Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineStarting-EngineForceStop.
> Hosted engine host: ovirt-hyp-01.example.lan changed state: EngineForceStop-EngineDown.
>
> Please let me know if you need any additional information.
>
> Thank you,
>
> Joel
>
>
> On Jun 16, 2017 2:52 AM, "Sahina Bose" <[email protected]> wrote:
>
>> From the agent.log:
>> MainThread::INFO::2017-06-15 11:16:50,583::states::473::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm is running on host ovirt-hyp-02.reis.com (id 2)
>>
>> It looks like the HE VM was started successfully. Is it possible that the ovirt-engine service could not be started on the HE VM? Could you try to start the HE VM using the command below and then log into the VM console.
>> # hosted-engine --vm-start
>>
>> Also, please check
>> # gluster volume status engine
>> # gluster volume heal engine info
>>
>> Please also check if there are errors in the gluster mount logs - at
>> /var/log/glusterfs/rhev-data-center-mnt..<engine>.log
>>
>>
>> On Thu, Jun 15, 2017 at 8:53 PM, Joel Diaz <[email protected]> wrote:
>>
>>> Sorry. I forgot to attach the requested logs in the previous email.
>>>
>>> Thanks,
>>>
>>> On Jun 15, 2017 9:38 AM, "Joel Diaz" <[email protected]> wrote:
>>>
>>> Good morning,
>>>
>>> Requested info below, along with some additional info.
>>>
>>> You'll notice the data volume is not mounted.
>>>
>>> Any help in getting HE back running would be greatly appreciated.
>>>
>>> Thank you,
>>>
>>> Joel
>>>
>>> [root@ovirt-hyp-01 ~]# hosted-engine --vm-status
>>>
>>> --== Host 1 status ==--
>>>
>>> conf_on_shared_storage             : True
>>> Status up-to-date                  : False
>>> Hostname                           : ovirt-hyp-01.example.lan
>>> Host ID                            : 1
>>> Engine status                      : unknown stale-data
>>> Score                              : 3400
>>> stopped                            : False
>>> Local maintenance                  : False
>>> crc32                              : 5558a7d3
>>> local_conf_timestamp               : 20356
>>> Host timestamp                     : 20341
>>> Extra metadata (valid at timestamp):
>>> metadata_parse_version=1
>>> metadata_feature_version=1
>>> timestamp=20341 (Fri Jun 9 14:38:57 2017)
>>> host-id=1
>>> score=3400
>>> vm_conf_refresh_time=20356 (Fri Jun 9 14:39:11 2017)
>>> conf_on_shared_storage=True
>>> maintenance=False
>>> state=EngineDown
>>> stopped=False
>>>
>>> --== Host 2 status ==--
>>>
>>> conf_on_shared_storage             : True
>>> Status up-to-date                  : False
>>> Hostname                           : ovirt-hyp-02.example.lan
>>> Host ID                            : 2
>>> Engine status                      : unknown stale-data
>>> Score                              : 3400
>>> stopped                            : False
>>> Local maintenance                  : False
>>> crc32                              : 936d4cf3
>>> local_conf_timestamp               : 20351
>>> Host timestamp                     : 20337
>>> Extra metadata (valid at timestamp):
>>> metadata_parse_version=1
>>> metadata_feature_version=1
>>> timestamp=20337 (Fri Jun 9 14:39:03 2017)
>>> host-id=2
>>> score=3400
>>> vm_conf_refresh_time=20351 (Fri Jun 9 14:39:17 2017)
>>> conf_on_shared_storage=True
>>> maintenance=False
>>> state=EngineDown
>>> stopped=False
>>>
>>> --== Host 3 status ==--
>>>
>>> conf_on_shared_storage             : True
>>> Status up-to-date                  : False
>>> Hostname                           : ovirt-hyp-03.example.lan
>>> Host ID                            : 3
>>> Engine status                      : unknown stale-data
>>> Score                              : 3400
>>> stopped                            : False
>>> Local maintenance                  : False
>>> crc32                              : f646334e
>>> local_conf_timestamp               : 20391
>>> Host timestamp                     : 20377
>>> Extra metadata (valid at timestamp):
>>> metadata_parse_version=1
>>> metadata_feature_version=1
>>> timestamp=20377 (Fri Jun 9 14:39:37 2017)
>>> host-id=3
>>> score=3400
>>> vm_conf_refresh_time=20391 (Fri Jun 9 14:39:51 2017)
>>> conf_on_shared_storage=True
>>> maintenance=False
>>> state=EngineStop
>>> stopped=False
>>> timeout=Thu Jan 1 00:43:08 1970
>>>
>>> [root@ovirt-hyp-01 ~]# gluster peer status
>>> Number of Peers: 2
>>>
>>> Hostname: 192.168.170.143
>>> Uuid: b2b30d05-cf91-4567-92fd-022575e082f5
>>> State: Peer in Cluster (Connected)
>>> Other names:
>>> 10.0.0.2
>>>
>>> Hostname: 192.168.170.147
>>> Uuid: 4e50acc4-f3cb-422d-b499-fb5796a53529
>>> State: Peer in Cluster (Connected)
>>> Other names:
>>> 10.0.0.3
>>>
>>> [root@ovirt-hyp-01 ~]# gluster volume info all
>>>
>>> Volume Name: data
>>> Type: Replicate
>>> Volume ID: 1d6bb110-9be4-4630-ae91-36ec1cf6cc02
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x (2 + 1) = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 192.168.170.141:/gluster_bricks/data/data
>>> Brick2: 192.168.170.143:/gluster_bricks/data/data
>>> Brick3: 192.168.170.147:/gluster_bricks/data/data (arbiter)
>>> Options Reconfigured:
>>> nfs.disable: on
>>> performance.readdir-ahead: on
>>> transport.address-family: inet
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.low-prio-threads: 32
>>> network.remote-dio: off
>>> cluster.eager-lock: enable
>>> cluster.quorum-type: auto
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> cluster.locking-scheme: granular
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 10000
>>> features.shard: on
>>> user.cifs: off
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>> network.ping-timeout: 30
>>> performance.strict-o-direct: on
>>> cluster.granular-entry-heal: enable
>>>
>>> Volume Name: engine
>>> Type: Replicate
>>> Volume ID: b160f0b2-8bd3-4ff2-a07c-134cab1519dd
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x (2 + 1) = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 192.168.170.141:/gluster_bricks/engine/engine
>>> Brick2: 192.168.170.143:/gluster_bricks/engine/engine
>>> Brick3: 192.168.170.147:/gluster_bricks/engine/engine (arbiter)
>>> Options Reconfigured:
>>> nfs.disable: on
>>> performance.readdir-ahead: on
>>> transport.address-family: inet
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.low-prio-threads: 32
>>> network.remote-dio: off
>>> cluster.eager-lock: enable
>>> cluster.quorum-type: auto
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> cluster.locking-scheme: granular
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 10000
>>> features.shard: on
>>> user.cifs: off
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>> network.ping-timeout: 30
>>> performance.strict-o-direct: on
>>> cluster.granular-entry-heal: enable
>>>
>>> [root@ovirt-hyp-01 ~]# df -h
>>> Filesystem                                    Size  Used  Avail  Use%  Mounted on
>>> /dev/mapper/centos_ovirt--hyp--01-root         50G  4.1G    46G    9%  /
>>> devtmpfs                                      7.7G     0   7.7G    0%  /dev
>>> tmpfs                                         7.8G     0   7.8G    0%  /dev/shm
>>> tmpfs                                         7.8G  8.7M   7.7G    1%  /run
>>> tmpfs                                         7.8G     0   7.8G    0%  /sys/fs/cgroup
>>> /dev/mapper/centos_ovirt--hyp--01-home         61G   33M    61G    1%  /home
>>> /dev/mapper/gluster_vg_sdb-gluster_lv_engine   50G  7.6G    43G   16%  /gluster_bricks/engine
>>> /dev/mapper/gluster_vg_sdb-gluster_lv_data    730G  157G   574G   22%  /gluster_bricks/data
>>> /dev/sda1                                     497M  173M   325M   35%  /boot
>>> ovirt-hyp-01.example.lan:engine                50G  7.6G    43G   16%  /rhev/data-center/mnt/glusterSD/ovirt-hyp-01.example.lan:engine
>>> tmpfs                                         1.6G     0   1.6G    0%  /run/user/0
>>>
>>> [root@ovirt-hyp-01 ~]# systemctl list-unit-files|grep ovirt
>>> ovirt-ha-agent.service                        enabled
>>> ovirt-ha-broker.service                       enabled
>>> ovirt-imageio-daemon.service                  disabled
>>> ovirt-vmconsole-host-sshd.service             enabled
>>>
>>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
>>>    Active: active (running) since Thu 2017-06-15 08:56:15 EDT; 21min ago
>>>  Main PID: 3150 (ovirt-ha-agent)
>>>    CGroup: /system.slice/ovirt-ha-agent.service
>>>            └─3150 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>>
>>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
>>> Jun 15 08:56:15 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
>>> Jun 15 09:17:18 ovirt-hyp-01.example.lan ovirt-ha-agent[3150]: ovirt-ha-agent ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine ERROR Engine VM stopped on localhost
>>>
>>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
>>>    Active: active (running) since Thu 2017-06-15 08:54:06 EDT; 24min ago
>>>  Main PID: 968 (ovirt-ha-broker)
>>>    CGroup: /system.slice/ovirt-ha-broker.service
>>>            └─968 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>>
>>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
>>> Jun 15 08:54:06 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>> Jun 15 08:56:16 ovirt-hyp-01.example.lan ovirt-ha-broker[968]: ovirt-ha-broker ovirt_hosted_engine_ha.broker.listener.ConnectionHandler ERROR Error handling request, data: '...1b55bcf76'
>>>            Traceback (most recent call last):
>>>              File "/usr/lib/python2.7/site-packages/ovirt...
>>> Hint: Some lines were ellipsized, use -l to show in full.
>>>
>>> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-agent.service
>>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-agent.service
>>> ● ovirt-ha-agent.service - oVirt Hosted Engine High Availability Monitoring Agent
>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-agent.service; enabled; vendor preset: disabled)
>>>    Active: active (running) since Thu 2017-06-15 09:19:21 EDT; 26s ago
>>>  Main PID: 8563 (ovirt-ha-agent)
>>>    CGroup: /system.slice/ovirt-ha-agent.service
>>>            └─8563 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-agent --no-daemon
>>>
>>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Monitoring Agent.
>>> Jun 15 09:19:21 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Monitoring Agent...
>>>
>>> [root@ovirt-hyp-01 ~]# systemctl restart ovirt-ha-broker.service
>>> [root@ovirt-hyp-01 ~]# systemctl status ovirt-ha-broker.service
>>> ● ovirt-ha-broker.service - oVirt Hosted Engine High Availability Communications Broker
>>>    Loaded: loaded (/usr/lib/systemd/system/ovirt-ha-broker.service; enabled; vendor preset: disabled)
>>>    Active: active (running) since Thu 2017-06-15 09:20:59 EDT; 28s ago
>>>  Main PID: 8844 (ovirt-ha-broker)
>>>    CGroup: /system.slice/ovirt-ha-broker.service
>>>            └─8844 /usr/bin/python /usr/share/ovirt-hosted-engine-ha/ovirt-ha-broker --no-daemon
>>>
>>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Started oVirt Hosted Engine High Availability Communications Broker.
>>> Jun 15 09:20:59 ovirt-hyp-01.example.lan systemd[1]: Starting oVirt Hosted Engine High Availability Communications Broker...
>>>
>>>
>>> On Jun 14, 2017 4:45 AM, "Sahina Bose" <[email protected]> wrote:
>>>
>>>> What does the output of "hosted-engine --vm-status" and "gluster volume status engine" tell you? Are all the bricks running as per gluster vol status?
>>>>
>>>> Can you try to restart the ovirt-ha-agent and ovirt-ha-broker services?
>>>>
>>>> If HE still has issues powering up, please provide agent.log and broker.log from /var/log/ovirt-hosted-engine-ha and gluster mount logs from /var/log/glusterfs/rhev-data-center-mnt <engine>.log
>>>>
>>>> On Thu, Jun 8, 2017 at 6:57 PM, Joel Diaz <[email protected]> wrote:
>>>>
>>>>> Good morning oVirt community,
>>>>>
>>>>> I'm running a three-host gluster environment with hosted engine.
>>>>>
>>>>> Yesterday the engine went down and has not been able to come up properly. It tries to start on all three hosts.
>>>>>
>>>>> I have two gluster volumes, data and engine. The data storage domain volume is no longer mounted but the engine volume is up. I've restarted the gluster service and made sure both volumes were running. The data volume will not mount.
>>>>>
>>>>> How can I get the engine running properly again?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Joel
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list
>>>>> [email protected]
>>>>> http://lists.ovirt.org/mailman/listinfo/users
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

