After thinking further on the problem, I don't think the MAC address/ARP
cache has anything to do with the issue, because other networking is
fine and that would tend to be affected as well.
 
In addition, I noticed the following right after a migrate (note that
14:40 was when it completed):
 
root      3158     7  0 14:40 ?        00:00:00 [migration/1]
root      3159     7  0 14:40 ?        00:00:00 [ksoftirqd/1]
root      3160     7  0 14:40 ?        00:00:00 [watchdog/1]
root      3162     7  0 14:40 ?        00:00:00 [rpciod/1]
root      3163     7  0 14:40 ?        00:00:00 [kmpathd/1]
root      3164     7  0 14:40 ?        00:00:00 [aio/1]
root      3165     7  0 14:40 ?        00:00:00 [cqueue/1]
root      3166     7  0 14:40 ?        00:00:00 [kblockd/1]
root      3167     7  0 14:40 ?        00:00:00 [events/1]
root      3168   340  0 14:40 ?        00:00:00 /sbin/udevd -d
root      3169  3168  0 14:40 ?        00:00:00 [udev_run_hotplu]
<defunct>

After a migrate back to the original server (a few minutes later -
14:42):
 
root      3168   340  0 16:43 ?        00:00:00 /sbin/udevd -d
root      3169  3168  0 16:43 ?        00:00:00 [udev_run_hotplu]
<defunct>
root      3179     7  0 14:42 ?        00:00:00 [migration/1]
root      3180     7  0 14:42 ?        00:00:00 [ksoftirqd/1]
root      3181     7  0 14:42 ?        00:00:00 [watchdog/1]
root      3182     7  0 14:42 ?        00:00:00 [rpciod/1]
root      3183     7  0 14:42 ?        00:00:00 [kmpathd/1]
root      3184     7  0 14:42 ?        00:00:00 [aio/1]
root      3185     7  0 14:42 ?        00:00:00 [cqueue/1]
root      3186     7  0 14:42 ?        00:00:00 [kblockd/1]
root      3187     7  0 14:42 ?        00:00:00 [events/1]

Notice that the start time shown for the udevd processes is now off by
about 2 hours? That seems very wrong to me!
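
As a next step, I'm thinking of comparing the clocks on both Dom0s and
the DomU right after a migrate completes (a rough sketch - the host
names below are placeholders), and checking whether the guest keeps its
own wallclock:

    # compare clocks on both Dom0s and the guest right after a migrate;
    # node0/node1/domu are placeholders for the real host names
    for h in node0 node1 domu; do
        printf '%-8s ' "$h"
        ssh "$h" 'date "+%F %T"'
    done

    # on a PV guest, this sysctl (if present) controls whether the DomU
    # keeps its own wallclock or inherits the Dom0 clock after a
    # save/restore/migrate
    cat /proc/sys/xen/independent_wallclock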
 
Kevin

________________________________

From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dave Costakos
Sent: Friday, July 11, 2008 2:35 PM
To: Red Hat Enterprise Linux 5 (Tikanga) discussion mailing-list
Subject: Re: [rhelv5-list] Ping fails after Xen live migration


By "machine", I mean whatever machine you are pinging from.

You are right that the MAC does not change on a migration (whether you
have specified the MAC in your configuration file or not), but the
switch port that the MAC is connected to does (since you migrate the
guest to a different host).  If your router or switches are caching
MACs, other machines won't be able to reach your guest because the
switch may still think that MAC is on the old port.
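
One way to nudge those caches along (a rough sketch - the interface
name is a placeholder, and this assumes the iputils arping) is to send
a few gratuitous ARPs from inside the guest right after the migration
completes:

        # run inside the DomU after the migrate; eth0 is a placeholder
        # for the guest's interface.  -U sends unsolicited (gratuitous)
        # ARP announcements for the guest's own IP so switches and
        # neighbours relearn where it lives
        arping -U -c 3 -I eth0 <guest_ip>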

-Dave.


On Fri, Jul 11, 2008 at 11:39 AM, Collins, Kevin [Beeline]
<[EMAIL PROTECTED]> wrote:


        When you say "machine", are you talking about the DomU, one of
the Dom0s, or the host being pinged? I tried all of them and none made a
difference... In addition, I am confused as to why this would be an
issue - the MAC is not changing anywhere.
         
        One other thing, maybe "ping fails" is not the best description
- the ping actually responds once and then "hangs":
         
        root# ping cpafisxe
        PING cpafisxe (146.27.79.182) 56(84) bytes of data.
        64 bytes from cpafisxe (146.27.79.182): icmp_seq=1 ttl=64
time=0.000 ms
         
        --- cpafisxe ping statistics ---
        1 packets transmitted, 1 received, 0% packet loss, time 0ms
        rtt min/avg/max/mdev = 0.000/0.000/0.000/0.000 ms
        
        I am not yet using RHCS for managing the VMs, but this is on a
GFS cluster... my man page for clusvcadm has no "-M" option.
         
        Thanks,
         
        Kevin

________________________________

        From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dave Costakos
        Sent: Friday, July 11, 2008 10:51 AM
        To: Red Hat Enterprise Linux 5 (Tikanga) discussion mailing-list
        Subject: Re: [rhelv5-list] Ping fails after Xen live migration
        
        
        I have seen this caused by MAC address caching.  After a
migration completes, try checking your machine's arp cache like so for
the host in question:
        
           arp -a 
        If your MAC is cached, try running something like this to see if
it clears up the issue:
        
                arp -d <hostname_or_ip>
                ping -c 3 -w 1 <hostname_or_ip>
        
        In a RHCS clustered environment, you can add these to the
/usr/share/cluster/vm.sh script to automate the task when you use
"clusvcadm -M" to migrate your guests.
        
        I updated my "migrate" function in vm.sh like so on all my
hosts:
        
        migrate()
        {
                declare target=$1
        
                # changed here to add "--live" to xm migrate
                xm migrate --live $OCF_RESKEY_name $target
                # change here to store return value of xm migrate
                rv=$?
        
                # added these three lines to clear the network mac cache
                # since our bridges don't (and shouldn't) participate in
                # spanning tree
                arp -d $OCF_RESKEY_name > /dev/null 2>&1 || true
                arp -d ${OCF_RESKEY_name}.FQDN > /dev/null 2>&1 || true
                ping -c 1 -w 1 $OCF_RESKEY_name > /dev/null 2>&1 || true
        
                # now return the return code from xm migrate rather than
                # the arp/ping return codes
                return ${rv}
        }
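
        With that change in place, a migration is still kicked off the
same way; for example (the service and member names below are just
placeholders):

                # migrate the vm:myguest service to the member node1.example.com
                clusvcadm -M vm:myguest -m node1.example.com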
        
        Hope this helps,
        
        -Dave.
        
        
        
        
        
        On Fri, Jul 11, 2008 at 10:18 AM, Collins, Kevin [Beeline]
<[EMAIL PROTECTED]> wrote:
        

                Hi, 

                        I am just starting tests of Xen live migrations
and I am seeing something weird. I initiated a ping from the DomU I was
about to migrate, saw that it was working, and then initiated a
migration (from node0 to node1). Once the DomU was running on the other
node, the ping was hanging. I migrated back (node1 to node0) and it
started working again. Further tests back and forth proved this to be
consistent.

                I then shut down the DomU and rebooted node0 and node1.
This time I initially started the DomU on node1, and pinging was
working. Following the same test as above, I found similar results - the
ping would work while the DomU was running on node1, but not after
migrating it to node0. Both of the Dom0 nodes and the DomU are
RHEL5.2...

                During this process, I assumed some iptables magic was
happening behind the scenes. It appears as if the VIF in the iptables
rules changes each time the DomU migrates (see the "PHYSDEV" line):

                root# iptables -L 
                Chain INPUT (policy ACCEPT) 
                target     prot opt source               destination

                ACCEPT     udp  --  anywhere             anywhere
udp dpt:domain 
                ACCEPT     tcp  --  anywhere             anywhere
tcp dpt:domain 
                ACCEPT     udp  --  anywhere             anywhere
udp dpt:bootps 
                ACCEPT     tcp  --  anywhere             anywhere
tcp dpt:bootps 

                Chain FORWARD (policy ACCEPT) 
                target     prot opt source               destination

                ACCEPT     all  --  anywhere
192.168.122.0/24    state RELATED,ESTABLISHED 
                ACCEPT     all  --  192.168.122.0/24     anywhere

                ACCEPT     all  --  anywhere             anywhere

                REJECT     all  --  anywhere             anywhere
reject-with icmp-port-unreachable 
                REJECT     all  --  anywhere             anywhere
reject-with icmp-port-unreachable 
                ACCEPT     all  --  anywhere             anywhere
PHYSDEV match --physdev-in vif1.0 

                Chain OUTPUT (policy ACCEPT) 
                target     prot opt source               destination


                root# iptables -L 
                Chain INPUT (policy ACCEPT) 
                target     prot opt source               destination

                ACCEPT     udp  --  anywhere             anywhere
udp dpt:domain 
                ACCEPT     tcp  --  anywhere             anywhere
tcp dpt:domain 
                ACCEPT     udp  --  anywhere             anywhere
udp dpt:bootps 
                ACCEPT     tcp  --  anywhere             anywhere
tcp dpt:bootps 

                Chain FORWARD (policy ACCEPT) 
                target     prot opt source               destination

                ACCEPT     all  --  anywhere
192.168.122.0/24    state RELATED,ESTABLISHED 
                ACCEPT     all  --  192.168.122.0/24     anywhere

                ACCEPT     all  --  anywhere             anywhere

                REJECT     all  --  anywhere             anywhere
reject-with icmp-port-unreachable 
                REJECT     all  --  anywhere             anywhere
reject-with icmp-port-unreachable 
                ACCEPT     all  --  anywhere             anywhere
PHYSDEV match --physdev-in vif2.0 

                Chain OUTPUT (policy ACCEPT) 
                target     prot opt source               destination


                Has anyone seen this before? The DomU seems to be fine
other than that - I can log in to it remotely and it seems functional on
the network...

                Thanks, 

                Kevin 






        -- 
        Dave Costakos
        mailto:[EMAIL PROTECTED] 





-- 
Dave Costakos
mailto:[EMAIL PROTECTED] 
_______________________________________________
rhelv5-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/rhelv5-list
