[lxc-users] LXC networking stops working between containers and the real network

2016-07-18 Thread Ruzsinszky Attila
Hi,

There is an up-to-date Ubuntu 14.04 64-bit host.
LXC version: 2.0.3 (from the backports packages)
OpenvSwitch: 2.0.2.

Container1: Ubuntu 14.04
Container2: Ubuntu 16.04 (both of them were installed from root.fs.zx,
because lxc-create doesn't work through an authenticating Squid proxy)
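
For reference, the kind of invocation that failed behind the proxy (the
proxy host and credentials below are placeholders):

http_proxy=http://user:pass@proxy.example:3128 \
https_proxy=http://user:pass@proxy.example:3128 \
lxc-create -t download -n lub4 -- -d ubuntu -r trusty -a amd64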

Both containers are working perfectly in "standalone" mode.
I use lxcbr0 as a bridge between the containers. dnsmasq provides DHCP,
and it is working: the containers get IP addresses from the 10.0.3.0/24
range.
There is an OVS bridge, vbr0, and lxcbr0 is one of its ports on the host.
The real Ethernet interface is eth0, which is connected to the real
network. There is a virtual management interface, mgmtlxc0, whose IP is
10.0.3.2/24. I can ping every machine in the 10.0.3.0/24 range.
The MAC addresses of the containers are different; I checked them.
mgmtlxc0 and lxcbr0 are tagged for VLAN (tag=800 in the OVS config).
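
Roughly, the OVS side looks like this (ovs-vsctl show, trimmed to the
relevant parts):

Bridge "vbr0"
    Port "lxcbr0"
        tag: 800
        Interface "lxcbr0"
    Port "mgmtlxc0"
        tag: 800
        Interface "mgmtlxc0"
            type: internal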

I want to MASQUERADE the lxc-net to the real network:
Chain POSTROUTING (policy ACCEPT 54626 packets, 5252K bytes)
 pkts bytes target      prot opt in   out  source        destination
  246 20520 MASQUERADE  all  --   *    *   10.0.3.0/24   !10.0.3.0/24
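
That counter line corresponds to the usual lxc-net rule:

iptables -t nat -A POSTROUTING -s 10.0.3.0/24 ! -d 10.0.3.0/24 -j MASQUERADE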

Routing table:
root@fcubi:~# route
Kernel IP routing table
Destination   Gateway       Genmask         Flags  Metric  Ref  Use  Iface
default       real_router   0.0.0.0         UG     0       0      0  eth0
LXCnet        *             255.255.255.0   U      0       0      0  mgmtlxc0
FCnet         *             255.255.255.0   U      1       0      0  eth0
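
A quick way to ask the kernel which route it picks for a forwarded
container packet is ip route get (192.0.2.10 stands in for the real host,
10.0.3.216 for a container; both addresses are placeholders):

ip route get 192.0.2.10 from 10.0.3.216 iif mgmtlxc0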

The problem is:
I try to ping from container1 (lub4) to a host on the real network. It
works.
I try to ping from container2 (lub5) to the same host, and it does not
work! DNS resolution is OK, but there is no answer from the real host.

I checked the traffic on eth0 inside the containers (lub4 and lub5). I can
see the ICMP echo REQ packets.
They arrive at the host's lxcbr0 interface, so that part looks good.
I checked the host's mgmtlxc0 interface, which is the routing interface at
the IP level. I can see the REQ packets there too.
IPv4 forwarding is enabled (net.ipv4.ip_forward = 1).
The next interface is eth0, and there is no traffic from the containers on
it! I filtered for ICMP and saw no REQ! So the host "filters out" (or fails
to route) my MASQUERADEd ICMP packets.
I don't think it is a MASQUERADE problem: without MASQUERADING I would
still see outgoing REQ packets, just with the wrong source IP (10.0.3.x),
and of course there would be no answer, because the real host knows nothing
about routing back to the 10.0.3.0 lxcnet. But there are no outgoing
packets at all.
I tried removing all iptables rules except the MASQUERADE one, and nothing
changed.
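
For reference, the per-hop checks were along these lines (plain tcpdump
plus a sysctl read):

tcpdump -ni lxcbr0 icmp     # echo REQ visible
tcpdump -ni mgmtlxc0 icmp   # echo REQ visible
tcpdump -ni eth0 icmp       # nothing from the containers
sysctl net.ipv4.ip_forward  # net.ipv4.ip_forward = 1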

Pinging between lub4 and lub5 (the virtual network) works, while pinging
the real network does not.

If I restart the containers one by one and reverse the order of the ping
test (lub5 first, lub4 second), the second one fails to ping, so the
problem does not depend on the container's OS version.

I think the problem may be in the MASQUERADE or in the routing between
mgmtlxc0 and eth0.
netstat-nat doesn't work, and I don't know why.
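
As an alternative, the NAT table can be dumped directly with conntrack
(assuming the conntrack-tools package is installed):

conntrack -L | grep 10.0.3.
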
Do you have any clue?

I've got another host running Fedora 23 64-bit (OVS 2.5) with 3 U14.04
containers, and that one seems to work.

I'll do some more tests, for example creating a new U14.04 container,
because on F23 the containers' versions are all the same.
LXD is installed but not used or configured.

TIA,
Ruzsi

[lxc-users] kernel:NMI watchdog: BUG: soft lockup - renders the system unresponsive

2016-07-18 Thread Király, István
Hello list, ...

Today, after an upgrade from LXC 2.0.0 to 2.0.3 plus a full OS update,
some strange errors came to light.

In the terminal:

kernel:NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s!
[rpc.nfsd:25666]

.. beeps every 10 seconds or so.

top indeed shows rpc.nfsd with the given PID at a continuous 100% CPU
usage. It renders the system completely out of control; even a reboot
command times out.

This behavior starts after launching a couple of containers.

On another server the same upgrade took place, and there were a couple of
errors in the same style (with kworker, as I remember), but the system
started up and has been running normally so far.

On a third system there were no errors, but that one had only one
container.
...

My only hope now is that a distro upgrade from Fedora 23 to 24 will
eliminate the problem.
I might try an LXC downgrade first, though...

Any suggestions appreciated.

Thank you, ...

-- 
 Király István
+36 209 753 758
lak...@d250.hu
