[Bug 1325560] [NEW] kvm vm loses network connectivity under enough load

2014-06-02 Thread Izhar ul Hassan
Public bug reported:

Networking breaks after a while in KVM guests that use virtio networking. We
run data-intensive jobs on our virtual cluster (OpenStack Grizzly installed
on Ubuntu 12.04 Server). The job runs fine on a single worker VM (no data
transfer involved). As soon as I add more nodes, so that the workers have to
exchange data, one of the worker VMs goes down: ping answers with 'host
unreachable'. Logging in via the serial console shows nothing obviously
wrong: eth0 is up and the guest can ping the local host, but it has no
outside connectivity. Restarting the network (/etc/init.d/networking restart)
does nothing; rebooting the machine brings it back.
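
For completeness, these are the sort of in-guest recovery steps one can try
from the serial console before falling back to a full reboot. This is only a
sketch: the interface name eth0 matches the report, and the virtio_net module
reload is an extra idea, not something we have confirmed helps:

  # cycle the interface by hand
  ip link set eth0 down
  ip link set eth0 up
  dhclient eth0          # re-acquire the lease if the guest uses DHCP

  # reload the virtio network driver entirely
  ifdown eth0
  modprobe -r virtio_net
  modprobe virtio_net
  ifup eth0

In our case only a full reboot restores connectivity. The Spark driver log
from one failing run follows; note the lost executor heartbeats once the
worker drops off the network.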

14/06/01 18:30:06 INFO YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
14/06/01 18:30:06 INFO MemoryStore: ensureFreeSpace(190758) called with curMem=0, maxMem=308713881
14/06/01 18:30:06 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 186.3 KB, free 294.2 MB)
14/06/01 18:30:06 INFO FileInputFormat: Total input paths to process : 1
14/06/01 18:30:06 INFO NetworkTopology: Adding a new node: /default-rack/10.20.20.28:50010
14/06/01 18:30:06 INFO NetworkTopology: Adding a new node: /default-rack/10.20.20.23:50010
14/06/01 18:30:06 INFO SparkContext: Starting job: count at hello_spark.py:15
14/06/01 18:30:06 INFO DAGScheduler: Got job 0 (count at hello_spark.py:15) with 2 output partitions (allowLocal=false)
14/06/01 18:30:06 INFO DAGScheduler: Final stage: Stage 0 (count at hello_spark.py:15)
14/06/01 18:30:06 INFO DAGScheduler: Parents of final stage: List()
14/06/01 18:30:06 INFO DAGScheduler: Missing parents: List()
14/06/01 18:30:06 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[2] at count at hello_spark.py:15), which has no missing parents
14/06/01 18:30:07 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (PythonRDD[2] at count at hello_spark.py:15)
14/06/01 18:30:07 INFO YarnClientClusterScheduler: Adding task set 0.0 with 2 tasks
14/06/01 18:30:08 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@host-10-20-20-28.novalocal:44417/user/Executor#-1352071582] with ID 1
14/06/01 18:30:08 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 1: host-10-20-20-28.novalocal (PROCESS_LOCAL)
14/06/01 18:30:08 INFO TaskSetManager: Serialized task 0.0:0 as 3123 bytes in 14 ms
14/06/01 18:30:09 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager host-10-20-20-28.novalocal:42960 with 588.8 MB RAM
14/06/01 18:30:16 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_0 in memory on host-10-20-20-28.novalocal:42960 (size: 308.2 MB, free: 280.7 MB)
14/06/01 18:30:17 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@host-10-20-20-23.novalocal:58126/user/Executor#1079893974] with ID 2
14/06/01 18:30:17 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 2: host-10-20-20-23.novalocal (PROCESS_LOCAL)
14/06/01 18:30:17 INFO TaskSetManager: Serialized task 0.0:1 as 3123 bytes in 1 ms
14/06/01 18:30:17 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager host-10-20-20-23.novalocal:56776 with 588.8 MB RAM
14/06/01 18:31:20 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, host-10-20-20-28.novalocal, 42960, 0) with no recent heart beats: 55828ms exceeds 45000ms
14/06/01 18:42:23 INFO YarnClientSchedulerBackend: Executor 2 disconnected, so removing it
14/06/01 18:42:23 ERROR YarnClientClusterScheduler: Lost executor 2 on host-10-20-20-23.novalocal: remote Akka client disassociated

The same job finishes flawlessly on a single worker.
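
On the host (compute node) side, a few checks could narrow down where the
packets stop once a worker drops off the network. The domain and tap/bridge
names below are placeholders; the real ones come from `virsh domiflist` for
the affected instance:

  # find the tap device and bridge backing the guest NIC
  virsh domiflist instance-00000001

  # host-side interface counters and bridge membership
  ip -s link show tap0
  brctl show

  # check whether the guest's traffic still reaches the tap device
  tcpdump -ni tap0 icmp

If pings from the guest show up on the tap device but never beyond the
bridge (or the other way around), that at least separates a guest/virtio
problem from a host bridging problem.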

System Information:
==

Description: Ubuntu 12.04.4 LTS
Release: 12.04

Linux 3.8.0-35-generic #52~precise1-Ubuntu SMP Thu Jan 30 17:24:40 UTC 2014 x86_64

libvirt-bin:
--
  Installed: 1.1.1-0ubuntu8~cloud2
  Candidate: 1.1.1-0ubuntu8.7~cloud1
  Version table:
     1.1.1-0ubuntu8.7~cloud1 0
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/havana/main amd64 Packages
 *** 1.1.1-0ubuntu8~cloud2 0
        100 /var/lib/dpkg/status
     0.9.8-2ubuntu17.19 0
        500 http://se.archive.ubuntu.com/ubuntu/ precise-updates/main amd64 Packages
     0.9.8-2ubuntu17.17 0
        500 http://security.ubuntu.com/ubuntu/ precise-security/main amd64 Packages
     0.9.8-2ubuntu17 0
        500 http://se.archive.ubuntu.com/ubuntu/ precise/main amd64 Packages

qemu-kvm:
---
  Installed: 1.5.0+dfsg-3ubuntu5~cloud0
  Candidate: 1.5.0+dfsg-3ubuntu5.4~cloud0
  Version table:
     1.5.0+dfsg-3ubuntu5.4~cloud0 0
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/havana/main amd64 Packages
 *** 1.5.0+dfsg-3ubuntu5~cloud0 0
        100 /var/lib/dpkg/status
     1.0+noroms-0ubuntu14.15 0
        500 http://se.archive.ubuntu.com/ubuntu/ precise-updates/main amd64 Packages
     1.0+noroms-0ubuntu14.14 0
        500 http://security.ubuntu.com/ubuntu/ 
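
Both libvirt-bin and qemu-kvm are behind their candidate versions from the
Ubuntu Cloud Archive. Upgrading to the candidates would rule out virtio
problems that are already fixed there; this is only a suggested next step,
not something we have tried yet:

  sudo apt-get update
  sudo apt-get install libvirt-bin qemu-kvm
  sudo service libvirt-bin restart
  # running guests keep the old qemu binary until they are stopped and started again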

Re: [Bug 1325560] [NEW] kvm vm loses network connectivity under enough load

2014-06-02 Thread Serge Hallyn
Thanks for reporting this bug.  When you say "reboot the machine and it comes
alive again", do you mean that a soft reboot from inside the vm guest (i.e. not
restarting qemu at all) brings the guest's network back up?
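
To make the distinction concrete: a soft reboot keeps the same qemu process
running and only restarts the guest kernel, while destroying and starting the
domain recreates the qemu process. One way to test both from the compute node
(the domain name is a placeholder):

  # soft reboot: qemu keeps running, only the guest kernel restarts
  virsh reboot instance-00000001     # or run `sudo reboot` inside the guest

  # hard restart: the qemu process is torn down and recreated
  virsh destroy instance-00000001
  virsh start instance-00000001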

 status: incomplete
 importance: high


** Changed in: qemu-kvm (Ubuntu)
   Importance: Undecided => High

** Changed in: qemu-kvm (Ubuntu)
   Status: New => Incomplete
