Thanks Steve!
I am still perplexed that one of two identical VMs would fail migration... logic says they should both fail.

On 02/19/2014 06:29 PM, Steve Dainard wrote:
I added another VLAN on both hosts and designated it a migration network. Still the same issue: one of the two VMs failed to migrate.

I then deleted a failed POSIX domain on another gluster volume that had some pending heal tasks and no hosts attached to it, and after that both VMs migrated successfully. Perhaps gluster isn't passing storage errors up properly for non-dependent volumes. Anyway, this is solved for now; I just wanted this here for posterity.
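
For anyone hitting this later, the heal checks I used to find the stuck volume were roughly the following (VOLNAME is a placeholder and the commands are from memory, so treat this as a sketch):

    # list entries still pending self-heal, and entries whose heal has failed
    gluster volume heal VOLNAME info
    gluster volume heal VOLNAME info heal-failed

The domain removal itself was done on the engine side, not as a gluster-level operation.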



On Tue, Feb 18, 2014 at 5:00 PM, Steve Dainard <[email protected]> wrote:

    sanlock.log on the second host (ovirt002) doesn't have any entries anywhere near the time of the failure.

    I see some heal-failed errors in gluster, but since the storage is exposed via NFS I'd be surprised if that were the issue. I'm working on fixing those files now; I'll update if I make any progress.
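
    In case it helps anyone else, listing the heal-failed entries is roughly (VOLNAME is a placeholder; from memory):

        gluster volume heal VOLNAME info heal-failed

    which prints, per brick, the path or gfid of each file the self-heal daemon gave up on.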





    On Mon, Feb 17, 2014 at 5:32 PM, Dafna Ron <[email protected]> wrote:

        Really interesting case :) Maybe gluster related?
        Elad, can you please try to reproduce this?
        Gluster storage -> at least two server-type VMs created from a template as thin provision (it's a clone copy).
        After creating them, run and migrate all VMs from one host to the second host.
        I think it would be a locking issue.

        Steve, can you please also check the sanlock log on the second host, and look for any errors in the gluster logs (on both hosts)?
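
        For reference, on an EL6 host these should be in the usual places (standard paths, as far as I remember):

            # sanlock daemon log
            less /var/log/sanlock.log
            # glusterfs client/brick logs (error lines are marked " E ")
            grep " E " /var/log/glusterfs/*.log
            # vdsm log, for the migration attempt itself
            grep -i error /var/log/vdsm/vdsm.log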

        Thanks,

        Dafna



        On 02/17/2014 06:52 PM, Steve Dainard wrote:

            VMs are identical: same template, same CPU/memory/NIC. Server type, thin provisioned on NFS (backend is glusterfs 3.4).

            Does "monitor" mean the SPICE console? I don't believe either of them had a SPICE connection open.
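
            If it matters, a quick way to check for an open console would be something like this (VM name as an example; commands from memory):

                # show the configured SPICE/VNC display of the running guest
                virsh -r domdisplay puppet-agent2
                # look for established client connections to the qemu-kvm process
                netstat -tnp | grep qemu-kvm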

            I don't see anything in the ovirt001 sanlock.log:

            2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
            2014-02-14 11:16:05-0500 255246 [5111]: cmd_inq_lockspace 4,14 done 0
            2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
            2014-02-14 11:16:15-0500 255256 [5110]: cmd_inq_lockspace 4,14 done 0
            2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
            2014-02-14 11:16:25-0500 255266 [5111]: cmd_inq_lockspace 4,14 done 0
            2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
            2014-02-14 11:16:36-0500 255276 [5110]: cmd_inq_lockspace 4,14 done 0
            2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
            2014-02-14 11:16:46-0500 255286 [5111]: cmd_inq_lockspace 4,14 done 0
            2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
            2014-02-14 11:16:56-0500 255296 [5110]: cmd_inq_lockspace 4,14 done 0
            2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
            2014-02-14 11:17:06-0500 255306 [5111]: cmd_inq_lockspace 4,14 done 0
            2014-02-14 11:17:06-0500 255307 [5105]: cmd_register ci 4 fd 14 pid 31132
            2014-02-14 11:17:06-0500 255307 [5105]: cmd_restrict ci 4 fd 14 pid 31132 flags 1
            2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
            2014-02-14 11:17:16-0500 255316 [5110]: cmd_inq_lockspace 5,15 done 0
            2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
            2014-02-14 11:17:26-0500 255326 [5111]: cmd_inq_lockspace 5,15 done 0
            2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 ci_in 5 fd 15 count 0
            2014-02-14 11:17:26-0500 255326 [5110]: cmd_acquire 4,14,31132 result 0 pid_dead 0
            2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 ci_in 6 fd 16 count 0
            2014-02-14 11:17:26-0500 255326 [5111]: cmd_acquire 4,14,31132 result 0 pid_dead 0
            2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
            2014-02-14 11:17:36-0500 255336 [5110]: cmd_inq_lockspace 5,15 done 0
            2014-02-14 11:17:39-0500 255340 [5105]: cmd_register ci 5 fd 15 pid 31319
            2014-02-14 11:17:39-0500 255340 [5105]: cmd_restrict ci 5 fd 15 pid 31319 flags 1
            2014-02-14 11:17:39-0500 255340 [5105]: client_pid_dead 5,15,31319 cmd_active 0 suspend 0
            2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
            2014-02-14 11:17:46-0500 255346 [5111]: cmd_inq_lockspace 5,15 done 0
            2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0
            2014-02-14 11:17:56-0500 255356 [5110]: cmd_inq_lockspace 5,15 done 0
            2014-02-14 11:18:06-0500 255366 [5111]: cmd_inq_lockspace 5,15 a52938f7-2cf4-4771-acb2-0c78d14999e5:1:/rhev/data-center/mnt/gluster-store-vip:_rep1/a52938f7-2cf4-4771-acb2-0c78d14999e5/dom_md/ids:0 flags 0

            The ovirt002 sanlock.log has no entries during that time frame.



            On Mon, Feb 17, 2014 at 12:59 PM, Dafna Ron <[email protected]> wrote:

                Mmm... that is very interesting...
                Both VMs are identical? Are they server or desktop type? Created as a thin copy or a clone? What storage type are you using? Did you happen to have an open monitor on the VM that failed migration?
                I wonder if it could be a sanlock lock on the source template, but I can only see that bug happening if the VMs are linked to the template. Can you look at the sanlock log and see if there are any warnings or errors?

                All the logs are already in debug, so I don't think we can get anything more from them, but I am adding Meital and Omer to this mail to help debug this - perhaps they can think of something that could cause this from the trace.

                This case is really interesting... sorry, probably not what you want to hear... thanks for helping with this :)

                Dafna



                On 02/17/2014 05:08 PM, Steve Dainard wrote:

                    Failed live migration is more widespread than these two VMs, but they are a good example because they were both built from the same template and have had no modifications since they were created. They were also migrated one after the other, with one migrating successfully and the other not.

                    Are there any increased logging levels that might help determine what the issue is?
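
                    For reference, the knobs I'd consider turning are roughly these (paths from memory, values are my guesses):

                        # /etc/libvirt/libvirtd.conf - more verbose libvirtd logging
                        log_level = 1
                        log_outputs = "1:file:/var/log/libvirt/libvirtd.log"

                    followed by a libvirtd/vdsmd restart on both hosts. vdsm itself already logs at DEBUG.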

                    Thanks,



                    On Mon, Feb 17, 2014 at 11:47 AM, Dafna Ron <[email protected]> wrote:

                        Did you install these VMs from a CD? Did you run them as run-once with a special monitor?
                        Try to think whether there is anything different in the configuration of these VMs compared to the VMs that do migrate successfully.
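
                        A quick way to compare them might be to diff the live domain XML on the source host, e.g. (read-only virsh; VM names are just examples):

                            virsh -r dumpxml puppet-agent1 > /tmp/agent1.xml
                            virsh -r dumpxml puppet-agent2 > /tmp/agent2.xml
                            diff -u /tmp/agent1.xml /tmp/agent2.xml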


                        On 02/17/2014 04:36 PM, Steve Dainard wrote:

                            Hi Dafna,

                            No snapshots have been taken of either of those VMs, and there are no updates available for any of those packages on EL 6.5.



                            On Sun, Feb 16, 2014 at 7:05 AM, Dafna Ron <[email protected]> wrote:

                                Does the VM that fails migration have a live snapshot? If so, how many snapshots does the VM have?
                                I think there are newer packages of vdsm, libvirt and qemu - can you try to update?
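
                                On EL6 that would be roughly (assuming the oVirt repos are already configured):

                                    yum update vdsm vdsm-cli libvirt qemu-kvm-rhev
                                    service vdsmd restart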



                                On 02/16/2014 12:33 AM, Steve Dainard wrote:

                                    Versions are the same:

                                    [root@ovirt001 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
                                    gpxe-roms-qemu-0.9.7-6.10.el6.noarch
                                    libvirt-0.10.2-29.el6_5.3.x86_64
                                    libvirt-client-0.10.2-29.el6_5.3.x86_64
                                    libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
                                    libvirt-python-0.10.2-29.el6_5.3.x86_64
                                    qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
                                    qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
                                    qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
                                    vdsm-4.13.3-3.el6.x86_64
                                    vdsm-cli-4.13.3-3.el6.noarch
                                    vdsm-gluster-4.13.3-3.el6.noarch
                                    vdsm-python-4.13.3-3.el6.x86_64
                                    vdsm-xmlrpc-4.13.3-3.el6.noarch

                                    [root@ovirt002 ~]# rpm -qa | egrep 'libvirt|vdsm|qemu' | sort
                                    gpxe-roms-qemu-0.9.7-6.10.el6.noarch
                                    libvirt-0.10.2-29.el6_5.3.x86_64
                                    libvirt-client-0.10.2-29.el6_5.3.x86_64
                                    libvirt-lock-sanlock-0.10.2-29.el6_5.3.x86_64
                                    libvirt-python-0.10.2-29.el6_5.3.x86_64
                                    qemu-img-rhev-0.12.1.2-2.355.el6.5.x86_64
                                    qemu-kvm-rhev-0.12.1.2-2.355.el6.5.x86_64
                                    qemu-kvm-rhev-tools-0.12.1.2-2.355.el6.5.x86_64
                                    vdsm-4.13.3-3.el6.x86_64
                                    vdsm-cli-4.13.3-3.el6.noarch
                                    vdsm-gluster-4.13.3-3.el6.noarch
                                    vdsm-python-4.13.3-3.el6.x86_64
                                    vdsm-xmlrpc-4.13.3-3.el6.noarch
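
                                    (A one-liner to diff them directly, assuming root ssh from ovirt002 to ovirt001, would be roughly:

                                        diff <(ssh ovirt001 "rpm -qa | egrep 'libvirt|vdsm|qemu' | sort") <(rpm -qa | egrep 'libvirt|vdsm|qemu' | sort)

                                    with no output meaning identical versions.)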

                                    Logs attached, thanks.



                                    On Sat, Feb 15, 2014 at 6:24 AM, Dafna Ron <[email protected]> wrote:

                                        the migration fails in libvirt:


            Thread-153709::ERROR::2014-02-14 11:17:40,420::vm::337::vm.Vm::(run) vmId=`08434c90-ffa3-4b63-aa8e-5613f7b0e0cd`::Failed to migrate
            Traceback (most recent call last):
              File "/usr/share/vdsm/vm.py", line 323, in run
                self._startUnderlyingMigration()
              File "/usr/share/vdsm/vm.py", line 403, in _startUnderlyingMigration
                None, maxBandwidth)
              File "/usr/share/vdsm/vm.py", line 841, in f
                ret = attr(*args, **kwargs)
              File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 76, in wrapper
                ret = f(*args, **kwargs)
              File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1178, in migrateToURI2
                if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
            libvirtError: Unable to read from monitor: Connection reset by peer
            Thread-54041::DEBUG::2014-02-14 11:17:41,752::task::579::TaskManager.Task::(_updateState) Task=`094c412a-43dc-4c29-a601-d759486469a8`::moving from state init -> state preparing
            Thread-54041::INFO::2014-02-14 11:17:41,753::logUtils::44::dispatcher::(wrapper) Run and protect: getVolumeSize(sdUUID='a52938f7-2cf4-4771-acb2-0c78d14999e5', spUUID='fcb89071-6cdb-4972-94d1-c9324cebf814', imgUUID='97c9108f-a506-415f-ad2c-370d707cb130', volUUID='61f82f7f-18e4-4ea8-9db3-71ddd9d4e836', options=None)
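
                                        "Unable to read from monitor: Connection reset by peer" usually means a qemu process closed its monitor (crashed or was killed) mid-migration, so the qemu logs on both hosts may say why - something like (standard EL6 path; VM name as an example):

                                            less /var/log/libvirt/qemu/puppet-agent2.log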

                                        Do you have the same libvirt/vdsm/qemu on both your hosts?
                                        Please attach the libvirt and vm logs from both hosts.

                                        Thanks,
                                        Dafna



                                        On 02/14/2014 04:50 PM, Steve Dainard wrote:

                                            Quick overview:
                                            oVirt 3.3.2 running on CentOS 6.5
                                            Two hosts: ovirt001, ovirt002
                                            Migrating two VMs, puppet-agent1 and puppet-agent2, from ovirt002 to ovirt001.

                                            The first VM, puppet-agent1, migrates successfully. The second VM, puppet-agent2, fails with "Migration failed due to Error: Fatal error during migration (VM: puppet-agent2, Source: ovirt002, Destination: ovirt001)."

                                            I've attached the logs if anyone can help me track down the issue.
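
                                            In case it helps, the engine-side errors can be pulled with something like (standard path, I believe):

                                                grep -i 'migrat' /var/log/ovirt-engine/engine.log | grep -i 'error'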

                                            Thanks,











--
Dafna Ron
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users
