On 2012-02-04 3:34, Benjamin Reiter, Aginion IT-Consulting wrote:
To verify whether this is a hardware or configuration problem on my
side: do SL 6.2 guests on SL 6.2 hosts work reliably without virtio-net
hiccups for other people?

Even when they are stressed with a bit of network traffic? (~100 GB/hour)

Any reports are highly appreciated.
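For reference, a comparable load can be generated inside a guest with
something like this (mount point and file name are just examples;
~100 GB/hour works out to roughly 28 MB/s):

    # sustained sequential reads from an NFS mount
    while true; do
        dd if=/mnt/nfs/testfile of=/dev/null bs=1M count=2048
    done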



-------- Original Message --------
Subject: Bug report: Page allocation failure with virtio-net in kvm
guest on 2.6.32-220.4.1
Date: Thu, 02 Feb 2012 16:21:38 +0100
From: Benjamin Reiter, Aginion IT-Consulting
<[email protected]>
To: [email protected]

Page allocation failure with virtio-net in kvm guest on 2.6.32-220.4.1

Reproducibly, after a couple of minutes or hours and 100 MB - 30 GB of
network traffic (NFS), the network interface in the guest goes down. The
guest can still be shut down from the host via an ACPI event.

This only happens with the virtio-net driver; with e1000 the guest
is stable for days.
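
Since the guest still reacts to ACPI events, the kernel itself seems to
stay alive. From inside the guest, one could check the NIC state and
attempt a driver reload (a generic sketch, not a confirmed workaround):

    # check link state and error counters from inside the guest
    ip -s link show eth0
    # attempt to revive the NIC by reloading the driver
    modprobe -r virtio_net && modprobe virtio_net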

Host and guest run 2.6.32-220.4.1.el6.x86_64

Host runs kvm version 0.12.1.2-2.209.el6_2.4.x86_64




Feb  2 13:04:02 host656 kernel: rpciod/0: page allocation failure. order:0, mode:0x20
Feb  2 13:04:02 host656 kernel: Pid: 1081, comm: rpciod/0 Not tainted 2.6.32-220.4.1.el6.x86_64 #1
Feb  2 13:04:02 host656 kernel: Call Trace:
Feb  2 13:04:02 host656 kernel: <IRQ>  [<ffffffff81123daf>] ? __alloc_pages_nodemask+0x77f/0x940
Feb  2 13:04:02 host656 kernel: [<ffffffff81158a1a>] ? alloc_pages_current+0xaa/0x110
Feb  2 13:04:02 host656 kernel: [<ffffffffa0108d22>] ? try_fill_recv+0x262/0x280 [virtio_net]
Feb  2 13:04:02 host656 kernel: [<ffffffff8142df18>] ? netif_receive_skb+0x58/0x60
Feb  2 13:04:02 host656 kernel: [<ffffffffa01091fd>] ? virtnet_poll+0x42d/0x8d0 [virtio_net]
Feb  2 13:04:02 host656 kernel: [<ffffffff814307c3>] ? net_rx_action+0x103/0x2f0
Feb  2 13:04:02 host656 kernel: [<ffffffff81072001>] ? __do_softirq+0xc1/0x1d0
Feb  2 13:04:02 host656 kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Feb  2 13:04:02 host656 kernel: <EOI>  [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Feb  2 13:04:02 host656 kernel: [<ffffffff81071f0a>] ? local_bh_enable+0x9a/0xb0
Feb  2 13:04:02 host656 kernel: [<ffffffff8147a8e7>] ? tcp_rcv_established+0x107/0x800
Feb  2 13:04:02 host656 kernel: [<ffffffff81482c13>] ? tcp_v4_do_rcv+0x2e3/0x430
Feb  2 13:04:02 host656 kernel: [<ffffffff8147ead6>] ? tcp_write_xmit+0x1f6/0x9e0
Feb  2 13:04:02 host656 kernel: [<ffffffff8141cc75>] ? release_sock+0x65/0xe0
Feb  2 13:04:02 host656 kernel: [<ffffffff8146fb4c>] ? tcp_sendmsg+0x73c/0xa10
Feb  2 13:04:02 host656 kernel: [<ffffffff81419a0a>] ? sock_sendmsg+0x11a/0x150
Feb  2 13:04:02 host656 kernel: [<ffffffff81038488>] ? pvclock_clocksource_read+0x58/0xd0
Feb  2 13:04:02 host656 kernel: [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
Feb  2 13:04:02 host656 kernel: [<ffffffff81061c95>] ? enqueue_entity+0x125/0x420
Feb  2 13:04:02 host656 kernel: [<ffffffff81419a81>] ? kernel_sendmsg+0x41/0x60
Feb  2 13:04:02 host656 kernel: [<ffffffffa018ab6e>] ? xs_send_kvec+0x8e/0xa0 [sunrpc]
Feb  2 13:04:02 host656 kernel: [<ffffffffa018acf3>] ? xs_sendpages+0x173/0x220 [sunrpc]
Feb  2 13:04:02 host656 kernel: [<ffffffffa018aedd>] ? xs_tcp_send_request+0x5d/0x160 [sunrpc]
Feb  2 13:04:02 host656 kernel: [<ffffffffa0188e63>] ? xprt_transmit+0x83/0x2e0 [sunrpc]
Feb  2 13:04:02 host656 kernel: [<ffffffffa0185c48>] ? call_transmit+0x1d8/0x2c0 [sunrpc]
Feb  2 13:04:02 host656 kernel: [<ffffffffa018e23e>] ? __rpc_execute+0x5e/0x2a0 [sunrpc]
Feb  2 13:04:02 host656 kernel: [<ffffffffa018e4d0>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
Feb  2 13:04:02 host656 kernel: [<ffffffffa018e4e5>] ? rpc_async_schedule+0x15/0x20 [sunrpc]
Feb  2 13:04:02 host656 kernel: [<ffffffff8108b150>] ? worker_thread+0x170/0x2a0
Feb  2 13:04:02 host656 kernel: [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
Feb  2 13:04:02 host656 kernel: [<ffffffff8108afe0>] ? worker_thread+0x0/0x2a0
Feb  2 13:04:02 host656 kernel: [<ffffffff81090726>] ? kthread+0x96/0xa0
Feb  2 13:04:02 host656 kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
Feb  2 13:04:02 host656 kernel: [<ffffffff81090690>] ? kthread+0x0/0xa0
Feb  2 13:04:02 host656 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
...
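
For what it's worth: order:0, mode:0x20 is GFP_ATOMIC, i.e. a single-page
allocation from softirq context (the RX buffer refill in try_fill_recv),
which fails as soon as free memory dips below the atomic reserve. One
mitigation that is sometimes suggested for such failures is raising that
reserve; untested here, and the value is only an example:

    # show the current reserve kept for atomic allocations
    sysctl vm.min_free_kbytes
    # raise it so softirq-context allocations have more headroom
    sysctl -w vm.min_free_kbytes=16384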


VM is started with:

qemu      2347 61.7  3.7 537704 281556 ?       Sl   13:09  67:29
/usr/libexec/qemu-kvm -S -M rhel6.2.0 -enable-kvm -m 256 -smp
1,sockets=1,cores=1,threads=1 -name kvm_host656.net31 -uuid
97eae23f-bb13-58da-b4bc-258c6bf275a2 -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/kvm_host656.net31.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -drive
file=/dev/disk/by-path/ip-10.224.2.20:3260-iscsi-iqn.1986-03.com.sun:02:e9e63ad1-3f29-4d5c-9da9-b10e44a1520f.vmstore12.net31-lun-1,if=none,id=drive-virtio-disk0,format=raw,cache=none
-device
virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=21,id=hostnet0 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:6a:c7:d8,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -usb -vnc 0.0.0.0:21 -k en-us
-vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
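
Since the guest appears to be libvirt-managed (the monitor socket path
above suggests it), the NIC model can be switched to e1000 for comparison
like this (a sketch; the domain name is taken from the -name flag):

    virsh edit kvm_host656.net31
    # in the <interface> section, change
    #     <model type='virtio'/>
    # to
    #     <model type='e1000'/>
    # then power-cycle the guest:
    virsh shutdown kvm_host656.net31 && virsh start kvm_host656.net31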

Hi, for me virtio-net works fine, but with a sufficient amount of memory.

In your case, I think the memory is just too low for the kernel to work
reliably under pressure.

I have an amd64 host machine; one of the guests is configured as i386
with 1 CPU, 1 GB memory, and virtio-net, and it does single-threaded data
analysis from an NFS share without any problems. I have had issues under
low-memory conditions, but not just with virtio :-)

HTH, Z.



Thanks for your input. I can try increasing the RAM size, but I don't think this qualifies as a low-memory condition. This is normal operation after about 30 hours of uptime:

       244680  total memory
       215732  used memory
        58152  active memory
        85692  inactive memory
        28948  free memory
         1456  buffer memory
       107944  swap cache
      3561464  total swap
         4256  used swap
      3557208  free swap
       172194 non-nice user cpu ticks
          673 nice user cpu ticks
       127903 system cpu ticks
     12064827 idle cpu ticks
       665005 IO-wait cpu ticks
       208048 IRQ cpu ticks
       561345 softirq cpu ticks
            0 stolen cpu ticks
       764023 pages paged in
       480384 pages paged out
           98 pages swapped in
         1081 pages swapped out
    103629433 interrupts
     54954195 CPU context switches
   1328188168 boot time
         2993 forks

Seems sufficient, doesn't it?
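
That said, those are point-in-time and cumulative numbers; an atomic
order-0 allocation only has to catch free memory below the zone watermark
for an instant. I can sample the allocator state during the NFS load with
something like this (rough sketch):

    # per-order free pages and the zone watermarks, every 5 seconds
    while sleep 5; do
        date
        cat /proc/buddyinfo
        grep -E ' (min|low|high) ' /proc/zoneinfo
    done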

Why would virtio be affected by it but e1000 not at all? Even if memory really were the problem, I don't think killing the network is the right response.

Btw, this looks a lot like this bug from 2009: https://bugzilla.redhat.com/show_bug.cgi?id=520119

Benjamin
