please forgive the cross-posting. having not had much luck on the
xen-users list, and having seen similarly-complex threads on this list,
i thought i'd see if anyone here had any ideas or pointers.
all packets are being dropped in a debian 7 (wheezy) guest, but only when
they are coming from a debian 5 (lenny) guest on the same host. the
console and kernel log report 'net eth0: Invalid extra type: 4' when
packets are being dropped. the problem goes away if i change the wheezy
guest's configuration from 1 vcpu to >1 vcpu. i tested all of this on fresh,
minimal installs, so AFAICT there are no firewalls or other esoteric
configurations involved.
i noticed that there is a backport kernel for lenny that includes the PV
on HVM drivers. after upgrading to that kernel, the problem also goes
away. i also confirmed that the problem occurs when trying to connect
to a debian 8 (jessie) guest from an HVM lenny guest.
so it appears that the problem here is specific to HVM guests attempting
to communicate with single-vcpu PV guests. if my analysis is correct,
that would seem to imply that the root of the problem is in the
packet-handling code in the host, no?
this is a strange one, so please forgive me if i omit some useful details.
i have a pair of xen hosts running several HA pairs of guests, with
various HA solutions implemented within the guests. this is not
germane to the problem itself, only to how i discovered it.
for the sake of balancing, i have configured the guests' HA preferences
so that the active nodes tend to be on different hosts. so under normal
circumstances, apache-guest1 and haproxy-guest2 would be the active
nodes. no problem at all in that situation.
but i discovered that i cannot communicate between apache-guest1 and
haproxy-guest1, located on the same host. after much tcpdumping in the
host and guests, i discovered that the problem is unidirectional and
specific to a particular OS combination.
a) inbound packets to a debian wheezy guest are dropped only when they
originate from a debian lenny guest on the same host
b) outbound packets from a wheezy guest to a lenny guest are passed
correctly, even though the wheezy guest cannot see the return communication
from the lenny guest
c) there is no problem communicating in either direction between the
wheezy guest and an identically-configured lenny guest on the other host
d) there is no problem communicating in either direction between other
combinations of guests on the same host, e.g. jessie to wheezy, lenny to
lenny, wheezy to wheezy, etc.
even stranger, my attempts to narrow it down to the simplest possible
test case led me to discover that, for the exact same guest, changing
the vcpu setting from 1 to >1 makes the problem go away.
sburton@host:~$ virsh -c xen:/// dumpxml wheezy-guest > ~/cannot-ping.xml
# test and reconfigure
sburton@host:~$ virsh -c xen:/// dumpxml wheezy-guest > ~/can-ping.xml
sburton@host:~$ diff ~/can-ping.xml ~/cannot-ping.xml
< <vcpu placement='static'>2</vcpu>
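the diff only shows one side of the change. to read the vcpu count out of each saved dump directly, something like the function below could be used. it is a minimal sketch: `vcpu_count` is a hypothetical helper name (not part of virsh), and it assumes the `<vcpu>` element sits on a single line, as it does in virsh dumpxml output.

```shell
# vcpu_count: hypothetical helper, not part of virsh.
# prints the number inside the <vcpu ...>N</vcpu> element of a domain XML
# dump (the maximum vcpu count; a current='…' attribute, if any, is ignored).
# assumes the element is written on one line, as virsh dumpxml does.
vcpu_count() {
    sed -n 's:.*<vcpu[^>]*>\([0-9][0-9]*\)</vcpu>.*:\1:p' "$1"
}
```

given the diff above, `vcpu_count ~/can-ping.xml` should print 2, and `vcpu_count ~/cannot-ping.xml` should print 1.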
the test case is a simple ping between the two guests.
it is initially broken because ARP traffic from the lenny guest, both
'who-has' requests and 'is-at' replies, is dropped on the way into the
wheezy guest. therefore the guests cannot discover one another.
after manually setting the ARP cache entries on both guests:
pinging from lenny to wheezy, tcpdump shows ICMP echo requests in the
lenny guest and on the VIFs for both guests in the host, but the
requests never appear in the wheezy guest.
pinging from wheezy to lenny, tcpdump shows ICMP echo requests and
replies in the lenny guest and on the VIFs for both guests in the host.
ICMP requests are seen in the wheezy guest, since they originate there,
but the replies from the lenny guest are unseen.
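for anyone trying to reproduce this, static ARP entries can be pinned with iproute2's `ip neigh replace ... nud permanent`. the function below is a hypothetical sketch that only prints the command rather than running it, so it can be checked before being run as root in each guest. the interface defaults to eth0, and any addresses used with it are placeholders, not the real ones from my setup.

```shell
# static_arp_cmd: hypothetical helper that prints (does not run) an iproute2
# command pinning a permanent neighbour (ARP) entry for a peer guest.
# usage: static_arp_cmd PEER_IP PEER_MAC [DEV]   (DEV defaults to eth0)
static_arp_cmd() {
    printf 'ip neigh replace %s lladdr %s dev %s nud permanent\n' \
        "$1" "$2" "${3:-eth0}"
}
```

e.g. in the wheezy guest, `static_arp_cmd 192.0.2.20 00:16:3e:aa:bb:cc | sh` with the lenny guest's real address and MAC substituted, and the mirror-image command in the lenny guest.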
the problem is not limited to ARP or ICMP; all other communication i
have tried fails similarly.
the smoking gun (i hope):
when packets are being dropped in the wheezy guest, the console and
various logs report
[ 6977.669408] net eth0: Invalid extra type: 4
and the only reference i have found via my searching is this thread:
which seems to be unresolved.
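to see whether type 4 is the only extra type being rejected, and how often, the kernel log can be summarised per type. a minimal sketch, assuming the log text is fed on stdin (e.g. from dmesg or /var/log/kern.log); `count_invalid_extra` is a hypothetical name.

```shell
# count_invalid_extra: hypothetical helper.
# reads kernel log text on stdin and prints "TYPE COUNT" for each distinct
# "Invalid extra type: N" message, sorted numerically by type.
count_invalid_extra() {
    grep -o 'Invalid extra type: [0-9]*' |
        awk '{ n[$NF]++ } END { for (t in n) print t, n[t] }' |
        sort -n
}
```

usage in the wheezy guest: `dmesg | count_invalid_extra`.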
i'm hoping that some part of this tickles someone's memory, or piques
their interest, or at least that someone can point me to some more
troubleshooting steps i haven't thought of.
sburton@host:~$ cat /etc/issue
Debian GNU/Linux 8 \n \l
sburton@host:~$ uname -a
Linux host 4.7.0-0.bpo.1-amd64 #1 SMP Debian 4.7.8-1~bpo8+1 (2016-10-19)
sburton@host:~$ dpkg -l | grep -F -e libvirt-daemon -e xen-hypervisor -e
ii  libvirt-daemon            1.2.9-9+deb8u3       amd64  programs for the libvirt library
ii  libvirt-daemon-system     1.2.9-9+deb8u3       amd64  Libvirt daemon configuration files
ii  qemu-system-common        1:2.7+dfsg-3~bpo8+2  amd64  QEMU full system emulation binaries (common files)
ii  qemu-system-x86           1:2.7+dfsg-3~bpo8+2  amd64  QEMU full system emulation binaries (x86)
ii  xen-hypervisor-4.4-amd64  4.4.1-9+deb8u8       amd64  Xen Hypervisor on AMD64
sburton@host:~$ grep -F -A1 '<os>' ~/cannot-ping.xml
<type arch='x86_64' machine='xenfv'>hvm</type>
sburton@host:~$ grep -F -C2 'xenbr0' ~/cannot-ping.xml
sburton@host:~$ ip addr show xenbr0
8: xenbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether bc:30:5b:f0:32:b4 brd ff:ff:ff:ff:ff:ff
    inet 192.168.240.52/20 brd 192.168.255.255 scope global xenbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::be30:5bff:fef0:32b4/64 scope link
       valid_lft forever preferred_lft forever
both guests are fullvirt installs, created from the netinst ISO via
virt-manager running on my workstation and manipulated through some
combination of virt-manager and local virsh commands.
root@wheezy-guest:~# uname -a
Linux wheezy-guest 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.36-1+deb8u2~bpo70+1 (2016-10-19) x86_64 GNU/Linux
root@wheezy-guest:~# cat /etc/issue
Debian GNU/Linux 7 \n \l
root@lenny-guest:~# uname -a
Linux lenny-guest 2.6.26-2-amd64 #1 SMP Sun Mar 4 21:48:06 UTC 2012
root@lenny-guest:~# cat /etc/issue
Debian GNU/Linux 5.0 \n \l
Xen-devel mailing list