Re: [ovirt-users] Hosted engine on gluster problem

Sahina Bose Wed, 13 Apr 2016 07:21:43 -0700


On 04/12/2016 01:33 PM, Sandro Bonazzola wrote:

On Mon, Apr 11, 2016 at 11:44 PM, Bond, Darryl <[email protected]> wrote:
    My setup is hyperconverged. I have placed my test results in
    https://bugzilla.redhat.com/show_bug.cgi?id=1298693
OK, so you're aware of the single-point-of-failure limitation. If you drop the host referenced in the hosted engine configuration for the initial setup, it won't be able to connect to the shared storage even if the other hosts in the cluster are up, since the entry point is down.
Note that hyperconverged deployment is not supported in 3.6.
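The entry-point limitation can be illustrated with a toy model. The sketch assumes that a new GlusterFS FUSE mount fetches its volume file only from the server named in the mount spec (here the /etc/hosts alias hosted-storage, mapped to host1), while a localhost:/volume spec on a host running glusterd asks that host itself. The helper names and host map are hypothetical; this is not oVirt or Gluster code.

```python
from __future__ import annotations


def parse_mount_spec(spec: str) -> tuple[str, str]:
    """Split a 'server:/volume' spec into (server, volume)."""
    server, sep, volume = spec.partition(":/")
    if not sep or not server or not volume:
        raise ValueError(f"not a server:/volume spec: {spec!r}")
    return server, volume


def can_fetch_volfile(spec: str, mounting_host: str,
                      hosts_map: dict[str, str], up_hosts: set[str]) -> bool:
    """Toy model: a new mount succeeds only if the volfile server is up.

    hosts_map stands in for /etc/hosts aliases (hypothetical values).
    """
    server, _volume = parse_mount_spec(spec)
    if server == "localhost":
        # Each host asks its own glusterd, so only the mounting host must be up.
        target = mounting_host
    else:
        # Resolve an alias such as hosted-storage -> host1.
        target = hosts_map.get(server, server)
    return target in up_hosts


if __name__ == "__main__":
    aliases = {"hosted-storage": "host1"}
    up = {"host2", "host3"}  # host1 is rebooting
    print(can_fetch_volfile("hosted-storage:/hosted-engine", "host2", aliases, up))
    print(can_fetch_volfile("localhost:/engine", "host2", aliases, up))
```

With host1 down, the aliased spec cannot be mounted from host2 while the localhost spec can. The model only covers mounting; it says nothing about whether the HA agent keeps an existing mount up.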

This issue does not seem related to the single point of failure. I tested this on a 3-node setup with each node mounting the volume hosting HE as localhost:/engine. Since all nodes have glusterd running and belong to the same cluster, the mount should continue to work with any one node down.

But the HE VM is restarted once a node is powered off.
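The expectation that a replica 3 volume keeps working with one node down can be checked against Gluster's client-side quorum. The sketch assumes the volume uses cluster.quorum-type auto, under which writes are allowed when more than half of the bricks in a replica set are up, or exactly half including the first brick. The thread does not say which quorum settings were in use, and server-side quorum is not modelled.

```python
from __future__ import annotations


def client_quorum_met(bricks_up: list[bool]) -> bool:
    """Client-side quorum for one replica set under the assumed 'auto' rule.

    bricks_up[i] is True if the i-th brick of the set is reachable;
    index 0 is the first brick listed for that set.
    """
    if not bricks_up:
        return False
    up = sum(bricks_up)
    total = len(bricks_up)
    if 2 * up > total:
        return True  # strict majority of bricks is up
    # Exactly half up: quorum holds only if the first brick is one of them.
    return 2 * up == total and bricks_up[0]


if __name__ == "__main__":
    # Replica 3: take each node down in turn, then two at once.
    for down in range(3):
        state = [i != down for i in range(3)]
        print(f"node {down + 1} down: writes allowed = {client_quorum_met(state)}")
    print(f"two nodes down: writes allowed = {client_quorum_met([True, False, False])}")
```

Any single node down leaves 2 of 3 bricks, a strict majority, so under this assumption the HE VM restart is not explained by quorum loss.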

broker.log:

Thread-4602::ERROR::2016-04-13 18:50:28,249::listener::192::ovirt_hosted_engine_ha.broker.listener.ConnectionHandler::(handle) Error handling request, data: 'set-storage-domain FilesystemBackend dom_type=glusterfs sd_uuid=7fe3707b-2435-4e71-b831-4daba08cc72c'
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 166, in handle
    data)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/listener.py", line 299, in _dispatch
    .set_storage_domain(client, sd_type, **options)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/broker/storage_broker.py", line 66, in set_storage_domain
    self._backends[client].connect()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 456, in connect
    self._dom_type)
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py", line 108, in get_domain_path
    " in {1}".format(sd_uuid, parent))
BackendFailureException: path to storage domain 7fe3707b-2435-4e71-b831-4daba08cc72c not found in /rhev/data-center/mnt/glusterSD
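The BackendFailureException says the broker could not find a directory for the storage domain UUID under /rhev/data-center/mnt/glusterSD, which is what one would expect once the localhost:_engine mount is gone. The sketch below is a hypothetical re-creation of such a lookup (scan each mount directory under the parent for a child named after the domain UUID); it is not the actual get_domain_path code, and find_domain_path and simulate_unmount are made-up names. A temporary directory stands in for the mount tree.

```python
from __future__ import annotations

import os
import shutil
import tempfile


class BackendFailureException(Exception):
    """Stand-in for the exception name seen in broker.log."""


def find_domain_path(parent: str, sd_uuid: str) -> str:
    """Hypothetical lookup: return <parent>/<mount dir>/<sd_uuid> if present."""
    if os.path.isdir(parent):
        for entry in sorted(os.listdir(parent)):
            candidate = os.path.join(parent, entry, sd_uuid)
            if os.path.isdir(candidate):
                return candidate
    raise BackendFailureException(
        "path to storage domain {0} not found in {1}".format(sd_uuid, parent))


def simulate_unmount(sd_uuid: str) -> tuple[str, str]:
    """Look the domain up while 'mounted', then again after 'unmounting'."""
    with tempfile.TemporaryDirectory() as root:
        parent = os.path.join(root, "glusterSD")
        mount_dir = os.path.join(parent, "localhost:_engine")
        os.makedirs(os.path.join(mount_dir, sd_uuid))
        found = os.path.relpath(find_domain_path(parent, sd_uuid), parent)
        shutil.rmtree(mount_dir)  # stand-in for the FUSE unmount
        try:
            find_domain_path(parent, sd_uuid)
            error = ""
        except BackendFailureException as exc:
            error = str(exc)
    return found, error


if __name__ == "__main__":
    before, after = simulate_unmount("7fe3707b-2435-4e71-b831-4daba08cc72c")
    print("mounted:  ", before)
    print("unmounted:", after)
```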

agent.log:

MainThread::INFO::2016-04-13 18:50:26,020::storage_server::207::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(connect_storage_server) Connecting storage server
MainThread::INFO::2016-04-13 18:50:28,054::hosted_engine::807::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor for 7fe3707b-2435-4e71-b831-4daba08cc72c
MainThread::INFO::2016-04-13 18:50:28,055::image::184::ovirt_hosted_engine_ha.lib.image.Image::(teardown_images) Teardown images
MainThread::WARNING::2016-04-13 18:50:28,177::hosted_engine::675::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_storage_images) Disconnecting the storage
MainThread::INFO::2016-04-13 18:50:28,177::storage_server::229::ovirt_hosted_engine_ha.lib.storage_server.StorageServer::(disconnect_storage_server) Disconnecting storage server
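Lines in agent.log and broker.log follow a Thread::LEVEL::timestamp::module::line::logger::(function) message layout, which makes it easy to pull out the sequence of events around a failure. The regular expression below is inferred from the lines quoted in this thread, not from the ovirt-hosted-engine-ha logging configuration, and it tolerates the missing space between date and time seen in mangled copies of these logs.

```python
from __future__ import annotations

import re
from datetime import datetime

# Layout inferred from the quoted lines (assumption, not the official format).
LOG_RE = re.compile(
    r"^(?P<thread>[^:]+)::(?P<level>[A-Z]+)::"
    r"(?P<date>\d{4}-\d{2}-\d{2}) ?(?P<time>\d{2}:\d{2}:\d{2},\d{3})::"
    r"(?P<module>[^:]+)::(?P<lineno>\d+)::(?P<logger>[^:]+)::"
    r"\((?P<func>[^)]*)\)\s*(?P<msg>.*)$"
)


def parse_line(line: str) -> dict | None:
    """Return the fields of one HA log line, or None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Merge date and time into one datetime; ',%f' takes the milliseconds.
    rec["ts"] = datetime.strptime(f'{rec.pop("date")} {rec.pop("time")}',
                                  "%Y-%m-%d %H:%M:%S,%f")
    rec["lineno"] = int(rec["lineno"])
    return rec


if __name__ == "__main__":
    sample = [
        "MainThread::INFO::2016-04-13 18:50:28,054::hosted_engine::807::"
        "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
        "(_stop_domain_monitor) Stopped VDSM domain monitor for "
        "7fe3707b-2435-4e71-b831-4daba08cc72c",
        # Mangled form: no space between date and time or after the ')'.
        "MainThread::WARNING::2016-04-1318:50:28,177::hosted_engine::675::"
        "ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::"
        "(_initialize_storage_images)Disconnecting the storage",
    ]
    for rec in filter(None, map(parse_line, sample)):
        print(rec["ts"].time(), rec["level"], rec["func"], "-", rec["msg"])
```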




The gluster mount logs for this time frame contain unmount messages:

[2016-04-13 13:20:28.199429] I [fuse-bridge.c:4997:fuse_thread_proc] 0-fuse: unmounting /rhev/data-center/mnt/glusterSD/localhost:_engine
[2016-04-13 13:20:28.199934] W [glusterfsd.c:1251:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7dc5) [0x7ff9b3ceddc5] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7ff9b53588b5] -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7ff9b5358739] ) 0-: received signum (15), shutting down
[2016-04-13 13:20:28.199970] I [fuse-bridge.c:5704:fini] 0-fuse: Unmounting '/rhev/data-center/mnt/glusterSD/localhost:_engine'.
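The gluster timestamps (13:20:28) and the HA timestamps (18:50:28) differ by exactly 5 h 30 min, which suggests the glusterfs client logs in UTC while the HA agent and broker log in local time (UTC+05:30). That offset is an inference from the matching seconds, not something stated here. The sketch converts a gluster timestamp under that assumption and decodes the signal number; gluster_ts_to_local is a hypothetical helper.

```python
import signal
from datetime import datetime, timedelta, timezone

# Assumed local offset of the HA logs, inferred from the matching seconds.
LOCAL = timezone(timedelta(hours=5, minutes=30))


def gluster_ts_to_local(stamp: str, tz: timezone = LOCAL) -> datetime:
    """Convert a '[YYYY-MM-DD HH:MM:SS.ffffff]' gluster stamp (assumed UTC)."""
    utc = datetime.strptime(stamp.strip("[]"), "%Y-%m-%d %H:%M:%S.%f")
    return utc.replace(tzinfo=timezone.utc).astimezone(tz)


if __name__ == "__main__":
    unmount = gluster_ts_to_local("[2016-04-13 13:20:28.199429]")
    agent_disconnect = datetime(2016, 4, 13, 18, 50, 28, 177000, tzinfo=LOCAL)
    broker_error = datetime(2016, 4, 13, 18, 50, 28, 249000, tzinfo=LOCAL)
    print(unmount.isoformat())                         # 2016-04-13T18:50:28.199429+05:30
    print(agent_disconnect < unmount < broker_error)   # True
    print(signal.Signals(15).name)                     # SIGTERM
```

Under that assumption the unmount falls between the agent's "Disconnecting storage server" entry (18:50:28,177) and the broker's BackendFailureException (18:50:28,249), and signum 15 is SIGTERM, which is consistent with the mount being torn down deliberately rather than failing on its own.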



    Short description of setup:

    3 hosts with 2 disks each, set up with gluster replica 3 across the
    6 disks; volume name hosted-engine.

    Hostname hosted-storage configured in /etc/hosts to point to
    host1.

    Installed hosted engine on host1 with the hosted engine storage
    path = hosted-storage:/hosted-engine

    First engine install on h1 successful. Hosts h2 and h3 added to
    the hosted engine. All works fine.

    Additional storage and non-hosted engine hosts added etc.

    Additional VMs added to hosted-engine storage (oVirt Reports VM
    and Cinder VM). Other VMs are hosted on other storage (Cinder
    and NFS).

    The system is in production.


    Engine can be migrated around with the web interface.
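With replica 3 across 6 disks the volume is distributed-replicated: Gluster groups bricks into replica sets from consecutive entries in the order they are listed at volume creation, so each set of 3 consecutive bricks should land on 3 different hosts if the volume is to survive any one host going down. The brick order is not given here; the brick paths below are hypothetical and the helpers are only a sketch for checking a proposed order.

```python
from __future__ import annotations


def replica_sets(bricks: list[str], replica: int) -> list[list[str]]:
    """Group bricks into replica sets of consecutive entries."""
    if replica < 1 or len(bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def spans_distinct_hosts(group: list[str]) -> bool:
    """True if no two bricks in the set share a host ('host:/path' form)."""
    hosts = [brick.split(":", 1)[0] for brick in group]
    return len(set(hosts)) == len(hosts)


if __name__ == "__main__":
    # Hypothetical brick paths: two disks (d1, d2) on each of h1..h3.
    good = [f"{h}:/bricks/{d}/hosted-engine"
            for d in ("d1", "d2") for h in ("h1", "h2", "h3")]
    bad = [f"{h}:/bricks/{d}/hosted-engine"
           for h in ("h1", "h2", "h3") for d in ("d1", "d2")]
    for name, order in (("host-interleaved", good), ("disk-grouped", bad)):
        ok = all(spans_distinct_hosts(s) for s in replica_sets(order, 3))
        print(f"{name}: every replica set spans 3 hosts = {ok}")
```

The disk-grouped order puts two bricks of the first replica set on h1, so losing h1 would leave that set with only one of its three bricks.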


    - 3.6.4 upgrade released; following the upgrade guide, the engine is
    upgraded first; the new CentOS kernel requires a host reboot.

    - Engine placed on h2; h3 into local maintenance, upgraded and
    rebooted. No issues. Local maintenance removed from h3.

    - Engine placed on h3; h2 into local maintenance, upgraded and
    rebooted. No issues. Local maintenance removed from h2.

    - Engine placed on h3; h1 into local maintenance, upgraded and
    rebooted. The engine crashes and does not start elsewhere;
    VM(cinder) on h3 on the same gluster volume pauses.

    - Host 1 takes about 5 minutes to reboot (enterprise box with all
    its normal BIOS probing).

    - Engine starts after h1 comes back and stabilises

    - VM(cinder) unpauses itself; VM(reports) continued fine the
    whole time. I can't diagnose either VM as the engine is
    not available.

    - Local maintenance removed from h1


    I don't believe the issue is with gluster itself, as the volume
    remains accessible on all hosts during this time, albeit with a
    missing server (per gluster volume status) as each gluster server
    is rebooted.

    Gluster was upgraded as part of the process; no issues were seen here.


    I have been able to duplicate the issue without the upgrade by
    following the same sort of timeline.


    ________________________________
    From: Sandro Bonazzola <[email protected]>
    Sent: Monday, 11 April 2016 7:11 PM
    To: Richard Neuboeck; Simone Tiraboschi; Roy Golan; Martin Sivak;
    Sahina Bose
    Cc: Bond, Darryl; users
    Subject: Re: [ovirt-users] Hosted engine on gluster problem



    On Mon, Apr 11, 2016 at 9:37 AM, Richard Neuboeck
    <[email protected]> wrote:
    Hi Darryl,

    I'm still experimenting with my oVirt installation so I tried to
    recreate the problems you've described.

    My setup has three HA hosts for virtualization and three machines
    for the gluster replica 3 setup.

    I manually migrated the Engine from the initial install host (one)
    to host three. Then shut down host one manually and interrupted the
    fencing mechanisms so the host stayed down. This didn't bother the
    Engine VM at all.

    Did you move the host one to maintenance before shutting down?
    Or is this a crash recovery test?



    To make things a bit more challenging I then shut down host three
    while running the Engine VM. Of course the Engine was down for some
    time until host two detected the problem. It started the Engine VM
    and everything seems to be running quite well without the initial
    install host.

    Thanks for the feedback!



    My only problem is that the HA agents on hosts two and three refuse
    to start after a reboot because the hosted engine configuration is
    missing. I wrote another mail to [email protected]
    about that.

    This is weird. Martin, Simone, can you please investigate this?




    Cheers
    Richard

    On 04/08/2016 01:38 AM, Bond, Darryl wrote:
    > There seems to be a pretty severe bug with using hosted engine
    on gluster.
    >
    > If the host that was used as the initial hosted-engine --deploy
    host goes away, the engine VM will crash and cannot be restarted
    until the host comes back.

    Is this a hyperconverged setup?


    >
    > This is regardless of which host the engine was currently running on.
    >
    >
    > The issue seems to be buried in the bowels of VDSM and is not an
    issue with gluster itself.

    Sahina, can you please investigate this?


    >
    > The gluster filesystem is still accessible from the host that
    was running the engine. The issue has been submitted to bugzilla
    but the fix is some way off (4.1).
    >
    >
    > Can my hosted engine be converted to use NFS (using the gluster
    NFS server on the same filesystem) without rebuilding my hosted
    engine (i.e. change domainType=glusterfs to domainType=nfs)?

    >
    > What effect would that have on the hosted-engine storage domain
    inside oVirt, i.e. would the same filesystem be mounted twice, or
    would it just break?
    >
    >
    > Will this actually fix the problem, or does the same issue occur
    when the hosted engine is on NFS?
    >
    >
    > Darryl
    >
    >
    >
    >
    > _______________________________________________
    > Users mailing list
    > [email protected]
    >http://lists.ovirt.org/mailman/listinfo/users
    >


    --
    /dev/null






    --
    Sandro Bonazzola
    Better technology. Faster innovation. Powered by community
    collaboration.
    See how it works at redhat.com





