Bug#592187: [stable] Bug#576838: virtio network crashes again
On Thursday, 2010-09-09 at 04:23 +0100, Ben Hutchings wrote:
> On Tue, 2010-09-07 at 13:25 +0200, Lukas Kolbe wrote:
> > On Wed, 2010-09-01 at 05:26 +0100, Ben Hutchings wrote:
> > > On Tue, 2010-08-31 at 17:34 +0200, Lukas Kolbe wrote:
> > > > On Tue, 2010-08-31 at 06:35 -0700, Greg KH wrote:
> > > > > [...]
> > > > > Then how about convincing the Debian kernel developers to accept these patches, and work through any regressions that might be found, and after that, reporting back to us? Ben?
> > > >
> > > > The reason I contacted you was precisely because it went into 2.6.33.2, i.e. was already accepted into a -stable release. I didn't expect it to be such an issue.
> > >
> > > That's not likely if people spread FUD about the backlog patches!
> > >
> > > Dave, did you explicitly exclude these patches from 2.6.32 when you submitted them to stable, or is it just that 5534979 "udp: use limited socket backlog" depends on a1ab77f "ipv6: udp: Optimise multicast reception"? The former patch doesn't look too hard to backport to 2.6.32 (see below). Anybody?
> >
> > We've currently rolled out our own 2.6.32 kernel with these fixes applied, and they indeed fix a system crash under our NFS load. What else can I do to get these fixes into either Debian's 2.6.32 or Greg's stable 2.6.32 series?
>
> [...]
>
> These patches will be included in Debian's version 2.6.32-22. We'll see how that goes.

I owe you a few beers. Thanks a million!

> Ben.

Lukas

--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/1284025201.15888.1.ca...@larosa.fritz.box
Bug#592187: [stable] Bug#576838: virtio network crashes again
On Wed, 2010-09-01 at 05:26 +0100, Ben Hutchings wrote:
> On Tue, 2010-08-31 at 17:34 +0200, Lukas Kolbe wrote:
> > On Tue, 2010-08-31 at 06:35 -0700, Greg KH wrote:
> > > [...]
> > > Then how about convincing the Debian kernel developers to accept these patches, and work through any regressions that might be found, and after that, reporting back to us? Ben?
> >
> > The reason I contacted you was precisely because it went into 2.6.33.2, i.e. was already accepted into a -stable release. I didn't expect it to be such an issue.
>
> That's not likely if people spread FUD about the backlog patches!
>
> Dave, did you explicitly exclude these patches from 2.6.32 when you submitted them to stable, or is it just that 5534979 "udp: use limited socket backlog" depends on a1ab77f "ipv6: udp: Optimise multicast reception"? The former patch doesn't look too hard to backport to 2.6.32 (see below). Anybody?

We've currently rolled out our own 2.6.32 kernel with these fixes applied, and they indeed fix a system crash under our NFS load. What else can I do to get these fixes into either Debian's 2.6.32 or Greg's stable 2.6.32 series?

> Ben.
>
> From: Zhu Yi yi@intel.com
> Date: Thu, 4 Mar 2010 18:01:42 +
> Subject: [PATCH] udp: use limited socket backlog
>
> [ Upstream commit 55349790d7cbf0d381873a7ece1dcafcffd4aaa9 ]
>
> Make udp adapt to the limited socket backlog change.
>
> Cc: David S. Miller da...@davemloft.net
> Cc: Alexey Kuznetsov kuz...@ms2.inr.ac.ru
> Cc: Pekka Savola (ipv6) pek...@netcore.fi
> Cc: Patrick McHardy ka...@trash.net
> Signed-off-by: Zhu Yi yi@intel.com
> Acked-by: Eric Dumazet eric.duma...@gmail.com
> Signed-off-by: David S. Miller da...@davemloft.net
> Signed-off-by: Greg Kroah-Hartman gre...@suse.de
> [bwh: Backport to 2.6.32]

Regards,
Lukas

Archive: http://lists.debian.org/1283858701.5722.78.ca...@quux.techfak.uni-bielefeld.de
Bug#592187: [stable] Bug#576838: virtio network crashes again
On Monday, 2010-08-30 at 10:21 -0700, Greg KH wrote:
> On Mon, Aug 30, 2010 at 09:46:36AM -0700, David Miller wrote:
> > From: Greg KH g...@kroah.com
> > Date: Mon, 30 Aug 2010 07:50:17 -0700
> >
> > > As I stated above, I need the ACK from David to be able to add these patches. David?
> >
> > I believe there were some regressions caused by these changes that were fixed later, a bit after those commits went into the tree. I'm only comfortable ACK'ing this if someone does due diligence and checks for any such follow-on fixes to that series. It's a pretty non-trivial set of patches and has the potential to kill performance, which would be a very serious regression.
>
> Fair enough.

Yep, thanks! Who's done the checks to find out any problems with these patches? I'll skim the changelogs in 2.6.3[345].x to see if there are any related patches.

> And what is keeping you from moving to the .35 kernel tree instead?

Basically, distribution support. Debian Squeeze will ship with 2.6.32, as Ubuntu already did for their current LTS - and I really want Debian's kernel to be as reliable and stable as possible (btw, that's why I initially reported this as a Debian bug, because at that time I wasn't using vanilla kernels; now that I know how git bisect works, it will hopefully be easier for me to pinpoint regressions in the future). Also, we do not really have enough hardware to test new upstream releases thoroughly before going into production (e.g., we only have one big tape library with one big disk pool, so no way to test whether tape-on-mptsas and aacraid work properly and stably in newer upstream releases).

Kind regards,
Lukas

Archive: http://lists.debian.org/1283242616.14680.832.ca...@larosa.fritz.box
Bug#592187: [stable] Bug#576838: virtio network crashes again
> Who's done the checks to find out any problems with these patches? I'll skim the changelogs in 2.6.3[345].x to see if there are any related patches.

This is all I could find in current 2.6.36-rc2 (via git log | grep, minus rps/rfs patches). I don't know anything about these, but they sound related. If anybody with insight into the actual codebase could take a look there ...

commit c377411f2494a931ff7facdbb3a6839b1266bcf6
Author: Eric Dumazet eric.duma...@gmail.com
Date: Tue Apr 27 15:13:20 2010 -0700

    net: sk_add_backlog() take rmem_alloc into account

    Current socket backlog limit is not enough to really stop DDOS attacks, because user thread spend many time to process a full backlog each round, and user might crazy spin on socket lock.

    We should add backlog size and receive_queue size (aka rmem_alloc) to pace writers, and let user run without being slow down too much.

    Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in stress situations.

    Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp receiver can now process ~200.000 pps (instead of ~100 pps before the patch) on a 8 core machine.

    Signed-off-by: Eric Dumazet eric.duma...@gmail.com
    Signed-off-by: David S. Miller da...@davemloft.net

commit 6cce09f87a04797fae5b947ef2626c14a78f0b49
Author: Eric Dumazet eric.duma...@gmail.com
Date: Sun Mar 7 23:21:57 2010 +

    tcp: Add SNMP counters for backlog and min_ttl drops

    Commit 6b03a53a (tcp: use limited socket backlog) added the possibility of dropping frames when backlog queue is full.

    Commit d218d111 (tcp: Generalized TTL Security Mechanism) added the possibility of dropping frames when TTL is under a given limit.

    This patch adds new SNMP MIB entries, named TCPBacklogDrop and TCPMinTTLDrop, published in /proc/net/netstat in TcpExt: line

    netstat -s | egrep "TCPBacklogDrop|TCPMinTTLDrop"
        TCPBacklogDrop: 0
        TCPMinTTLDrop: 0

    Signed-off-by: Eric Dumazet eric.duma...@gmail.com
    Signed-off-by: David S. Miller da...@davemloft.net

commit 4045635318538d3ddd2007720412fdc4b08f6a62
Author: Zhu Yi yi@intel.com
Date: Sun Mar 7 16:21:39 2010 +

    net: add __must_check to sk_add_backlog

    Add the __must_check tag to sk_add_backlog() so that any failure to check and drop packets will be warned about.

    Signed-off-by: Zhu Yi yi@intel.com
    Signed-off-by: David S. Miller da...@davemloft.net

commit b1faf5666438090a4dc4fceac8502edc7788b7e3
Author: Eric Dumazet eric.duma...@gmail.com
Date: Mon May 31 23:44:05 2010 -0700

    net: sock_queue_err_skb() dont mess with sk_forward_alloc

    Correct sk_forward_alloc handling for error_queue would need to use a backlog of frames that softirq handler could not deliver because socket is owned by user thread. Or extend backlog processing to be able to process normal and error packets.

    Another possibility is to not use mem charge for error queue, this is what I implemented in this patch.

    Note: this reverts commit 29030374 (net: fix sk_forward_alloc corruptions), since we dont need to lock socket anymore.

    Signed-off-by: Eric Dumazet eric.duma...@gmail.com
    Signed-off-by: David S. Miller da...@davemloft.net

commit dee42870a423ad485129f43cddfe7275479f11d8
Author: Changli Gao xiao...@gmail.com
Date: Sun May 2 05:42:16 2010 +

    net: fix softnet_stat

    Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context without any protection. And enqueue_to_backlog should update the netdev_rx_stat of the target CPU.

    This patch renames softnet_data.total to softnet_data.processed: the number of packets processed in upper levels (IP stacks).

    softnet_stat data is moved into softnet_data.

    Signed-off-by: Changli Gao xiao...@gmail.com
    Signed-off-by: Eric Dumazet eric.duma...@gmail.com
    Signed-off-by: David S. Miller da...@davemloft.net

     include/linux/netdevice.h | 17 +++--
     net/core/dev.c            | 26 --
     net/sched/sch_generic.c   |  2 +-
     3 files changed, 20 insertions(+), 25 deletions(-)

commit 4b0b72f7dd617b13abd1b04c947e15873e011a24
Author: Eric Dumazet eric.duma...@gmail.com
Date: Wed Apr 28 14:35:48 2010 -0700

    net: speedup udp receive path

    Since commit 95766fff ([UDP]: Add memory accounting.), each received packet needs one extra sock_lock()/sock_release() pair. This added latency because of possible backlog handling. Then later, ticket spinlocks added yet another latency source in case of DDOS.

    This patch introduces lock_sock_bh() and unlock_sock_bh() synchronization primitives, avoiding one atomic operation and backlog processing.

    skb_free_datagram_locked() uses them instead of full blown
Bug#592187: [stable] Bug#576838: virtio network crashes again
On Tue, 2010-08-31 at 06:35 -0700, Greg KH wrote:
> On Tue, Aug 31, 2010 at 10:16:56AM +0200, Lukas Kolbe wrote:
> > On Monday, 2010-08-30 at 10:21 -0700, Greg KH wrote:
> > > On Mon, Aug 30, 2010 at 09:46:36AM -0700, David Miller wrote:
> > > > From: Greg KH g...@kroah.com
> > > > Date: Mon, 30 Aug 2010 07:50:17 -0700
> > > >
> > > > > As I stated above, I need the ACK from David to be able to add these patches. David?
> > > >
> > > > I believe there were some regressions caused by these changes that were fixed later, a bit after those commits went into the tree. I'm only comfortable ACK'ing this if someone does due diligence and checks for any such follow-on fixes to that series. It's a pretty non-trivial set of patches and has the potential to kill performance, which would be a very serious regression.
> > >
> > > Fair enough.
> >
> > Yep, thanks! Who's done the checks to find out any problems with these patches? I'll skim the changelogs in 2.6.3[345].x to see if there are any related patches.
> >
> > > And what is keeping you from moving to the .35 kernel tree instead?
> >
> > Basically, distribution support. Debian Squeeze will ship with 2.6.32, as Ubuntu already did for their current LTS - and I really want Debian's kernel to be as reliable and stable as possible (btw, that's why I initially reported this as a Debian bug, because at that time I wasn't using vanilla kernels; now that I know how git bisect works, it will hopefully be easier for me to pinpoint regressions in the future). Also, we do not really have enough hardware to test new upstream releases thoroughly before going into production (e.g., we only have one big tape library with one big disk pool, so no way to test whether tape-on-mptsas and aacraid work properly and stably in newer upstream releases).
>
> Then how about convincing the Debian kernel developers to accept these patches, and work through any regressions that might be found, and after that, reporting back to us? Ben?

The reason I contacted you was precisely because it went into 2.6.33.2, i.e. was already accepted into a -stable release. I didn't expect it to be such an issue.

> thanks,
> greg k-h

Regards,
Lukas

Archive: http://lists.debian.org/1283268890.5722.75.ca...@quux.techfak.uni-bielefeld.de
Bug#592187: [stable] Bug#576838: virtio network crashes again
On Thu, 2010-08-26 at 09:32 +0200, Lukas Kolbe wrote:
> > Hi,
> >
> > I was finally able to identify the patch series that introduced the fix (they were introduced to -stable in 2.6.33.2):
> >
> > cb63112 net: add __must_check to sk_add_backlog
> > a12a9a2 net: backlog functions rename
> > 51c5db4 x25: use limited socket backlog
> > c531ab2 tipc: use limited socket backlog
> > 37d60aa sctp: use limited socket backlog
> > 9b3d968 llc: use limited socket backlog
> > 230401e udp: use limited socket backlog
> > 20a92ec tcp: use limited socket backlog
> > ab9dd05 net: add limit for socket backlog
> >
> > After applying these to 2.6.32.17, I wasn't able to trigger the failure anymore.
>
> What failure?
>
> > 230401e didn't apply cleanly with git cherry-pick on top of 2.6.32.17, so there might be some additional work needed.
> >
> > @Greg: would it be possible to have these fixes in the next 2.6.32? See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=592187#69 for details: they fix a guest network crash during heavy nfs-io using virtio.
>
> These are a lot of patches, looking like they are adding a new feature. I would need to get the ack of the network maintainer before I can add them. David?

I don't mean to nag (hm well, maybe I do) and I know you were busy preparing the guard-page fixes, but what's the status of this? In the meantime, we triggered this bug also on barebone hardware using nfs over tcp with default [rw]sizes of about 1MiB. On the real hardware, the kernel oopsed, not only the network stack ... With these patches applied, everything works smoothly. I'd really love to see a stable 2.6.32 ...

Is there anything I can do to help reach a decision on this issue?

Regards,
Lukas

Archive: http://lists.debian.org/1283176797.5722.13.ca...@quux.techfak.uni-bielefeld.de
Bug#592187: [stable] Bug#576838: virtio network crashes again
> > Hi all,
> >
> > I was finally able to identify the patch series that introduced the fix (they were introduced to -stable in 2.6.33.2):
> >
> > cb63112 net: add __must_check to sk_add_backlog
> > a12a9a2 net: backlog functions rename
> > 51c5db4 x25: use limited socket backlog
> > c531ab2 tipc: use limited socket backlog
> > 37d60aa sctp: use limited socket backlog
> > 9b3d968 llc: use limited socket backlog
> > 230401e udp: use limited socket backlog
> > 20a92ec tcp: use limited socket backlog
> > ab9dd05 net: add limit for socket backlog
> >
> > After applying these to 2.6.32.17, I wasn't able to trigger the failure anymore.
>
> What failure?
>
> > 230401e didn't apply cleanly with git cherry-pick on top of 2.6.32.17, so there might be some additional work needed.
> >
> > @Greg: would it be possible to have these fixes in the next 2.6.32? See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=592187#69 for details: they fix a guest network crash during heavy nfs-io using virtio.
>
> These are a lot of patches, looking like they are adding a new feature. I would need to get the ack of the network maintainer before I can add them. David?

I don't mean to nag (hm well, maybe I do) and I know you were busy preparing the guard-page fixes, but what's the status of this? In the meantime, we triggered this bug also on barebone hardware using nfs over tcp with default [rw]sizes of about 1MiB. On the real hardware, the kernel oopsed, not only the network stack ... With these patches applied, everything works smoothly. I'd really love to see a stable 2.6.32 ...

> thanks,
> greg k-h

Regards,
Lukas Kolbe

Archive: http://lists.debian.org/1282807979.16456.10.ca...@larosa.fritz.box
Bug#592187: [stable] Bug#576838: virtio network crashes again
> > Hi all,
> >
> > I was finally able to identify the patch series that introduced the fix (they were introduced to -stable in 2.6.33.2):
> >
> > cb63112 net: add __must_check to sk_add_backlog
> > a12a9a2 net: backlog functions rename
> > 51c5db4 x25: use limited socket backlog
> > c531ab2 tipc: use limited socket backlog
> > 37d60aa sctp: use limited socket backlog
> > 9b3d968 llc: use limited socket backlog
> > 230401e udp: use limited socket backlog
> > 20a92ec tcp: use limited socket backlog
> > ab9dd05 net: add limit for socket backlog
> >
> > After applying these to 2.6.32.17, I wasn't able to trigger the failure anymore.
>
> What failure?

From my other mail, for public reference: With 2.6.32.17 as a KVM guest using virtio_net, large nfs reads and writes cause the network to crash. Only rmmod virtio_net/modprobe virtio_net fixes it. I found that this bug was fixed in 2.6.33.2, and git bisect pointed me to the patch series quoted above, which, when applied to 2.6.32.17, fixes the problem.

I have to add that this also happens to bare-bone systems on real hardware - we just had a machine crash during its nightly nfs backup with a slew of page allocation failures.

> > 230401e didn't apply cleanly with git cherry-pick on top of 2.6.32.17, so there might be some additional work needed.
> >
> > @Greg: would it be possible to have these fixes in the next 2.6.32? See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=592187#69 for details: they fix a guest network crash during heavy nfs-io using virtio.
>
> These are a lot of patches, looking like they are adding a new feature. I would need to get the ack of the network maintainer before I can add them. David?

fyi, the comment of the original patch series applied to 2.6.33. It's not a new feature per se, but a fix to a general problem.

Author: Zhu Yi yi@intel.com
Date: Thu Mar 4 18:01:40 2010 +

    net: add limit for socket backlog

    [ Upstream commit 8eae939f1400326b06d0c9afe53d2a484a326871 ]

    We got system OOM while running some UDP netperf testing on the loopback device. The case is multiple senders sent stream UDP packets to a single receiver via loopback on local host. Of course, the receiver is not able to handle all the packets in time. But we surprisingly found that these packets were not discarded due to the receiver's sk->sk_rcvbuf limit. Instead, they are kept queuing to sk->sk_backlog and finally ate up all the memory. We believe this is a secure hole that a none privileged user can crash the system.

    The root cause for this problem is, when the receiver is doing __release_sock() (i.e. after userspace recv, kernel udp_recvmsg -> skb_free_datagram_locked -> release_sock), it moves skbs from backlog to sk_receive_queue with the softirq enabled. In the above case, multiple busy senders will almost make it an endless loop. The skbs in the backlog end up eat all the system memory.

    The issue is not only for UDP. Any protocols using socket backlog is potentially affected. The patch adds limit for socket backlog so that the backlog size cannot be expanded endlessly.

    Reported-by: Alex Shi alex@intel.com
    Cc: David Miller da...@davemloft.net
    Cc: Arnaldo Carvalho de Melo a...@ghostprotocols.net
    Cc: Alexey Kuznetsov kuz...@ms2.inr.ac.ru
    Cc: Pekka Savola (ipv6) pek...@netcore.fi
    Cc: Patrick McHardy ka...@trash.net
    Cc: Vlad Yasevich vladislav.yasev...@hp.com
    Cc: Sridhar Samudrala s...@us.ibm.com
    Cc: Jon Maloy jon.ma...@ericsson.com
    Cc: Allan Stephens allan.steph...@windriver.com
    Cc: Andrew Hendry andrew.hen...@gmail.com
    Signed-off-by: Zhu Yi yi@intel.com
    Signed-off-by: Eric Dumazet eric.duma...@gmail.com
    Acked-by: Arnaldo Carvalho de Melo a...@redhat.com
    Signed-off-by: David S. Miller da...@davemloft.net
    Signed-off-by: Greg Kroah-Hartman gre...@suse.de

> thanks,
> greg k-h

Regards,
Lukas

Archive: http://lists.debian.org/1282310856.8635.7.ca...@quux.techfak.uni-bielefeld.de
Bug#592187: Bug#576838: virtio network crashes again
Hi Ben, Greg,

I was finally able to identify the patch series that introduced the fix (they were introduced to -stable in 2.6.33.2):

cb63112 net: add __must_check to sk_add_backlog
a12a9a2 net: backlog functions rename
51c5db4 x25: use limited socket backlog
c531ab2 tipc: use limited socket backlog
37d60aa sctp: use limited socket backlog
9b3d968 llc: use limited socket backlog
230401e udp: use limited socket backlog
20a92ec tcp: use limited socket backlog
ab9dd05 net: add limit for socket backlog

After applying these to 2.6.32.17, I wasn't able to trigger the failure anymore. 230401e didn't apply cleanly with git cherry-pick on top of 2.6.32.17, so there might be some additional work needed.

@Greg: would it be possible to have these fixes in the next 2.6.32? See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=592187#69 for details: they fix a guest network crash during heavy nfs-io using virtio.

Kind regards,
Lukas

Archive: http://lists.debian.org/1281857854.2475.80.ca...@larosa.fritz.box
Bug#592187: Bug#576838: virtio network crashes again
On Wednesday, 2010-08-11 at 04:13 +0100, Ben Hutchings wrote:
> On Mon, 2010-08-09 at 11:24 +0200, Lukas Kolbe wrote:
> > So, testing begins. First conclusion: not all traffic patterns produce the page allocation failure. rdiff-backup only writing to an nfs-share does no harm; rdiff-backup reading and writing (incremental backup) leads to (nearly immediate) error. The nfs-share is always mounted with proto=tcp and nfsv3; /proc/mounts says:
> >
> > fileserver.backup...:/export/backup/lbork /.cbackup-mp nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=65535,timeo=600,retrans=2,sec=sys,mountport=65535,addr=x.x.x.x 0 0
> [...]
> I've seen some recent discussion of a bug in the Linux NFS client that can cause it to stop working entirely in case of some packet loss events: https://bugzilla.kernel.org/show_bug.cgi?id=16494. It is possible that you are running into that bug. I haven't yet seen an agreement on the fix for it.

Thanks, I'll look into it. I ran some further tests with vanilla and debian kernels:

VERSION                 WORKING
-------------------     -------------------------
2.6.35                  yes
2.6.33.6                yes
2.6.32.17               doesn't boot as kvm guest
2.6.32.17-2.6.32-19     no
2.6.32.17-2.6.32-18     no
2.6.32.16               no

I don't know if this is related to #16494 since I'm unable to trigger it on 2.6.33.6 or 2.6.35. I'll test 2.6.32 with the patch from http://lkml.org/lkml/2010/8/10/52 applied as well and bisect between 2.6.32.17 and 2.6.33.6 in the next few days.

> I also wonder whether the extremely large request sizes (rsize and wsize) you have selected are more likely to trigger the allocation failure in virtio_net. Please can you test whether reducing them helps?

The large rsize/wsize were automatically chosen, but I'll test with a failing kernel and [rw]size of 32768.

Kind regards,
Lukas

Archive: http://lists.debian.org/1281518672.11319.146.ca...@larosa.fritz.box
Re: Uploading linux-2.6 (2.6.32-20)
On Wednesday, 2010-08-11 at 04:28 +0100, Ben Hutchings wrote:
> I intend to upload linux-2.6 to unstable on Wednesday evening or Thursday morning (GMT+1). This should fix the FTBFS, and contains many other bug fixes besides. Let me know if there's anything I should wait for.

If it is not too much trouble for you, could you wait until Thursday morning? I'm doing a git bisect for http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=592187 that will probably take until tonight; if I find a single cause for it I'd love to see the patch applied!

--
Lukas

Archive: http://lists.debian.org/1281534305.11319.315.ca...@larosa.fritz.box
Bug#592187: Bug#576838: virtio network crashes again
Hi Ben,

On Sunday, 2010-08-08 at 03:36 +0100, Ben Hutchings wrote:
> This is not the same bug as was originally reported, which is that virtio_net failed to retry refilling its RX buffer ring. That is definitely fixed. So I'm treating this as a new bug report, #592187.

Okay, thanks.

> > > I think you need to give your guests more memory.
> >
> > They all have between 512M and 2G - and it happens to all of them using virtio_net, and none of them using rtl8139 as a network driver, reproducibly.
>
> The RTL8139 hardware uses a single fixed RX DMA buffer. The virtio 'hardware' allows the host to write into RX buffers anywhere in guest memory. This results in very different allocation patterns.
>
> Please try specifying 'e1000' hardware, i.e. an Intel gigabit controller. I think the e1000 driver will have a similar allocation pattern to virtio_net, so you can see whether it also triggers allocation failures and a network stall in the guest.
>
> Also, please test Linux 2.6.35 in the guest. This is packaged in the 'experimental' suite.

I'll rig up a test machine (the crashes all occurred on production guests, unfortunately) and report back.

> > [...] If it would be an OOM situation, wouldn't the OOM-killer be supposed to kick in? [...]
>
> The log you sent shows failure to allocate memory in an 'atomic' context where there is no opportunity to wait for pages to be swapped out. The OOM killer isn't triggered until the system is running out of memory despite swapping out pages.

Ah, good to know, thanks!

> Also, I note that following the failure of virtio_net to refill its RX buffer ring, I see failures to allocate buffers for sending TCP ACKs. So the guest drops the ACKs, and that TCP connection will stall temporarily (until the peer re-sends the unacknowledged packets). I also see 'nfs: server fileserver.backup.TechFak.Uni-Bielefeld.DE not responding, still trying'. This suggests that the allocation failure in virtio_net has resulted in dropping packets from the NFS server. And it just makes matters worse as it becomes impossible to free memory by flushing out buffers over NFS!

This sounds quite bad. This problem *seems* to be fixed by 2.6.32-19: we upgraded to that on a different machine for host and guests, and an rsync of ~1TiB of data didn't produce any page allocation failures using virtio. But I'd wait for my tests with rsync/nfs and 2.6.32-18+e1000, 2.6.32-18+virtio, 2.6.32-19+virtio and 2.6.35+virtio to conclude that.

Thanks for taking your time to explain things!

--
Lukas

Archive: http://lists.debian.org/1281338417.11319.20.ca...@larosa.fritz.box
Bug#592187: Bug#576838: virtio network crashes again
So, testing begins. First conclusion: not all traffic patterns produce the page allocation failure. rdiff-backup only writing to an nfs-share does no harm; rdiff-backup reading and writing (incremental backup) leads to (nearly immediate) error. The nfs-share is always mounted with proto=tcp and nfsv3; /proc/mounts says:

fileserver.backup...:/export/backup/lbork /.cbackup-mp nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=65535,timeo=600,retrans=2,sec=sys,mountport=65535,addr=x.x.x.x 0 0

This is the result of 2.6.32-18 with virtio (/proc/meminfo within ten seconds of the page allocation failure, if that helps):

MemTotal:         509072 kB
MemFree:           10356 kB
Buffers:            4244 kB
Cached:           419996 kB
SwapCached:            0 kB
Active:            50856 kB
Inactive:         422424 kB
Active(anon):      24948 kB
Inactive(anon):    25084 kB
Active(file):      25908 kB
Inactive(file):   397340 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       4194296 kB
SwapFree:        4194296 kB
Dirty:              5056 kB
Writeback:             0 kB
AnonPages:         49080 kB
Mapped:             7868 kB
Shmem:               952 kB
Slab:              11736 kB
SReclaimable:       5604 kB
SUnreclaim:         6132 kB
KernelStack:        1920 kB
PageTables:         3728 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4448832 kB
Committed_AS:    1419384 kB
VmallocTotal:   34359738367 kB
VmallocUsed:        5536 kB
VmallocChunk:   34359728048 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8180 kB
DirectMap2M:      516096 kB

[ 170.625928] rdiff-backup.bi: page allocation failure. order:0, mode:0x20
[ 170.625934] Pid: 2398, comm: rdiff-backup.bi Not tainted 2.6.32-5-amd64 #1
[ 170.625935] Call Trace:
[ 170.625937] <IRQ> [810b8b7f] ? __alloc_pages_nodemask+0x55b/0x5d0
[ 170.625993] [81245a6c] ? __alloc_skb+0x69/0x15a
[ 170.626002] [a01aee52] ? try_fill_recv+0x8b/0x18b [virtio_net]
[ 170.626004] [a01af8c6] ? virtnet_poll+0x543/0x5c9 [virtio_net]
[ 170.626010] [8124cb8b] ? net_rx_action+0xae/0x1c9
[ 170.626032] [81052735] ? __do_softirq+0xdd/0x1a0
[ 170.626035] [a01ae153] ? skb_recv_done+0x28/0x34 [virtio_net]
[ 170.626044] [81011cac] ? call_softirq+0x1c/0x30
[ 170.626049] [81013207] ? do_softirq+0x3f/0x7c
[ 170.626051] [810525a4] ? irq_exit+0x36/0x76
[ 170.626053] [810128fe] ? do_IRQ+0xa0/0xb6
[ 170.626061] [810114d3] ? ret_from_intr+0x0/0x11
[ 170.626062] <EOI>
[ 170.626063] Mem-Info:
[ 170.626065] Node 0 DMA per-cpu:
[ 170.626072] CPU0: hi: 0, btch: 1 usd: 0
[ 170.626073] CPU1: hi: 0, btch: 1 usd: 0
[ 170.626074] Node 0 DMA32 per-cpu:
[ 170.626076] CPU0: hi: 186, btch: 31 usd: 30
[ 170.626078] CPU1: hi: 186, btch: 31 usd: 181
[ 170.626082] active_anon:6237 inactive_anon:6271 isolated_anon:0
[ 170.626083] active_file:6476 inactive_file:100535 isolated_file:32
[ 170.626084] unevictable:0 dirty:1008 writeback:0 unstable:2050
[ 170.626084] free:729 slab_reclaimable:1401 slab_unreclaimable:1762
[ 170.626085] mapped:1967 shmem:238 pagetables:932 bounce:0
[ 170.626087] Node 0 DMA free:1980kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:13856kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15372kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:32kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 170.626099] lowmem_reserve[]: 0 489 489 489
[ 170.626101] Node 0 DMA32 free:936kB min:2784kB low:3480kB high:4176kB active_anon:24948kB inactive_anon:25084kB active_file:25904kB inactive_file:388284kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:500948kB mlocked:0kB dirty:4032kB writeback:0kB mapped:7868kB shmem:952kB slab_reclaimable:5572kB slab_unreclaimable:7040kB kernel_stack:1912kB pagetables:3728kB unstable:8200kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 170.626110] lowmem_reserve[]: 0 0 0 0
[ 170.626112] Node 0 DMA: 0*4kB 1*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1976kB
[ 170.626118] Node 0 DMA32: 0*4kB 1*8kB 0*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 936kB
[ 170.626125] 107278 total pagecache pages
[ 170.626126] 0 pages in swap cache
[ 170.626127] Swap cache stats: add 0, delete 0, find 0/0
[ 170.626128] Free swap = 4194296kB
[ 170.626130] Total swap = 4194296kB
[ 170.631675] 131069 pages RAM
[ 170.631677] 3801 pages reserved
[ 170.631678] 23548 pages shared
[ 170.631679] 113310
Bug#592187: Bug#576838: virtio network crashes again
Okay, next round: This time, 2.6.32-19 and virtio in guest, 2.6.32-18 in the host and sadly, it's not fixed: [ 159.772700] rdiff-backup.bi: page allocation failure. order:0, mode:0x20 [ 159.772708] Pid: 2524, comm: rdiff-backup.bi Not tainted 2.6.32-5-amd64 #1 [ 159.772710] Call Trace: [ 159.772712] IRQ [810b8b6f] ? __alloc_pages_nodemask+0x55b/0x5d0 [ 159.772759] [81245b2c] ? __alloc_skb+0x69/0x15a [ 159.772779] [a0202e52] ? try_fill_recv+0x8b/0x18b [virtio_net] [ 159.772784] [a02038c6] ? virtnet_poll+0x543/0x5c9 [virtio_net] [ 159.772799] [8124cc4b] ? net_rx_action+0xae/0x1c9 [ 159.772817] [8105274d] ? __do_softirq+0xdd/0x1a0 [ 159.772829] [a0202153] ? skb_recv_done+0x28/0x34 [virtio_net] [ 159.772838] [81011cac] ? call_softirq+0x1c/0x30 [ 159.772843] [81013207] ? do_softirq+0x3f/0x7c [ 159.772845] [810525bc] ? irq_exit+0x36/0x76 [ 159.772847] [810128fe] ? do_IRQ+0xa0/0xb6 [ 159.772850] [810114d3] ? ret_from_intr+0x0/0x11 [ 159.772851] EOI [81242beb] ? kmap_skb_frag+0x3/0x43 [ 159.772856] [81243b2d] ? skb_checksum+0xfa/0x23f [ 159.772858] [8124726d] ? __skb_checksum_complete_head+0x15/0x55 [ 159.772868] [81282d4f] ? tcp_checksum_complete_user+0x1f/0x3c [ 159.772870] [812835dd] ? tcp_rcv_established+0x3c5/0x6d9 [ 159.772875] [8128a87b] ? tcp_v4_do_rcv+0x1bb/0x376 [ 159.772877] [812876e8] ? tcp_write_xmit+0x883/0x96c [ 159.772880] [81240ac1] ? release_sock+0x46/0x96 [ 159.772882] [8127ca05] ? tcp_sendmsg+0x78a/0x87e [ 159.772885] [8123e515] ? sock_sendmsg+0xa3/0xbb [ 159.772894] [8106386e] ? autoremove_wake_function+0x0/0x2e [ 159.772902] [810c5b9c] ? zone_statistics+0x3c/0x5d [ 159.772906] [8104085f] ? pick_next_task_fair+0xcd/0xd8 [ 159.772919] [8123e814] ? kernel_sendmsg+0x32/0x3f [ 159.772943] [a02acca6] ? xs_send_kvec+0x78/0x7f [sunrpc] [ 159.772948] [a02acd36] ? xs_sendpages+0x89/0x1a1 [sunrpc] [ 159.772953] [a02acf43] ? xs_tcp_send_request+0x44/0x131 [sunrpc] [ 159.772961] [a02ab263] ? 
xprt_transmit+0x17b/0x25a [sunrpc]
[ 340.048248] serial8250: too much work for irq4
[ 159.772996] [ffffa033af51] ? nfs3_xdr_readargs+0x7a/0x89 [nfs] [ 159.773000] [a02a8c14] ? call_transmit+0x1fb/0x246 [sunrpc] [ 159.773009] [a02af2ab] ? __rpc_execute+0x7d/0x24d [sunrpc] [ 159.773032] [a02a94d4] ? rpc_run_task+0x53/0x5b [sunrpc] [ 159.773042] [a0334844] ? nfs_read_rpcsetup+0x1d2/0x1f4 [nfs] [ 159.773048] [a03344b5] ? readpage_async_filler+0x0/0xbf [nfs] [ 159.773061] [a0332c14] ? nfs_pageio_doio+0x2a/0x51 [nfs] [ 159.773067] [a0332d00] ? nfs_pageio_add_request+0xc5/0xd5 [nfs] [ 159.773072] [a0334532] ? readpage_async_filler+0x7d/0xbf [nfs] [ 159.773076] [810ba59c] ? read_cache_pages+0x91/0x105 [ 159.773082] [a033430a] ? nfs_readpages+0x155/0x1b4 [nfs] [ 159.773087] [a0334be0] ? nfs_pagein_one+0x0/0xd0 [nfs] [ 159.773092] [81046ccf] ? finish_task_switch+0x3a/0xaf [ 159.773094] [810ba0b9] ? __do_page_cache_readahead+0x11b/0x1b4 [ 159.773097] [810ba16e] ? ra_submit+0x1c/0x20 [ 159.773099] [810ba45d] ? page_cache_async_readahead+0x75/0xad [ 159.773109] [810b3c82] ? generic_file_aio_read+0x23a/0x52b [ 159.773118] [810ecdc9] ? do_sync_read+0xce/0x113 [ 159.773124] [8100f79c] ? __switch_to+0x285/0x297 [ 159.773126] [8106386e] ? autoremove_wake_function+0x0/0x2e [ 159.773129] [81046ccf] ? finish_task_switch+0x3a/0xaf [ 159.773131] [810ed812] ? vfs_read+0xa6/0xff [ 159.773133] [810ed927] ? sys_read+0x45/0x6e [ 159.773136] [81010b42] ?
system_call_fastpath+0x16/0x1b [ 159.773138] Mem-Info: [ 159.773139] Node 0 DMA per-cpu: [ 159.773141] CPU0: hi:0, btch: 1 usd: 0 [ 159.773143] CPU1: hi:0, btch: 1 usd: 0 [ 159.773144] Node 0 DMA32 per-cpu: [ 159.773146] CPU0: hi: 186, btch: 31 usd: 184 [ 159.773147] CPU1: hi: 186, btch: 31 usd: 39 [ 159.773151] active_anon:5153 inactive_anon:2765 isolated_anon:0 [ 159.773152] active_file:17029 inactive_file:65343 isolated_file:0 [ 159.773153] unevictable:0 dirty:8266 writeback:0 unstable:443 [ 159.773154] free:787 slab_reclaimable:25621 slab_unreclaimable:3017 [ 159.773154] mapped:1946 shmem:238 pagetables:921 bounce:0 [ 159.773156] Node 0 DMA free:1992kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:3276kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15372kB mlocked:0kB dirty:1232kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:60kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB
Bug#576838: virtio network crashes again
On Saturday, 2010-08-07 at 12:18 +0100, Ben Hutchings wrote: On Sat, 2010-08-07 at 11:21 +0200, Lukas Kolbe wrote: Hi, I sent this earlier today, but the bug was archived so it didn't appear anywhere, hence the resend. I believe this issue is not fixed at all in 2.6.32-18. We have seen this behaviour in various kvm guests using virtio_net with the same kernel in the guest, only minutes after starting the nightly backup (rdiff-backup to an NFS volume on a remote server), eventually leading to a non-functional network. Often, the machines do not even reboot and hang instead. Using rtl8139 instead of virtio helps, but that's really only a clumsy workaround. [...] I think you need to give your guests more memory. They all have between 512M and 2G - and it happens, reproducibly, to all of them using virtio_net and to none of them using the rtl8139 network driver. I would be delighted if it were as simple as giving them more RAM, but sadly it isn't. This is how we start the guests:

#!/bin/bash
KERNEL=2.6.32-5-amd64
NAME=tin
kvm -smp 2 \
  -drive if=virtio,file=/dev/system/tin_root,cache=off,boot=on \
  -drive if=virtio,file=/dev/system/tin_log,cache=off,boot=off \
  -drive if=virtio,file=/dev/system/tin_swap,cache=off,boot=off \
  -drive if=virtio,file=/dev/system/tin_data,cache=off,boot=off \
  -m 1024 \
  -nographic \
  -daemonize \
  -name ${NAME} \
  -kernel /boot/kvm/${NAME}/vmlinuz-${KERNEL} \
  -initrd /boot/kvm/${NAME}/initrd.img-${KERNEL} \
  -append "root=/dev/vda ro console=ttyS0,115200" \
  -serial mon:unix:/etc/kvm/consoles/${NAME}.sock,server,nowait \
  -net nic,macaddr=00:1A:4A:00:8E:3c,model=rtl8139 \
  -net tap,script=/etc/kvm/kvm-ifup-vlan142

Change model=rtl8139 to virtio, and the next time rdiff-backup runs, the network stops working and eventually the guest hangs and can't be halted any more. qemu-kvm is version 0.12.4+dfsg-1, kernel is 2.6.32-18 on both host and guest.
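For clarity, the only difference between the stable and the crashing setup is the NIC model passed to kvm. A minimal sketch (MAC address and ifup script are the ones from the start script quoted above; the MODEL switch itself is illustrative, and the command is printed rather than executed so it does not require a VM host):

```shell
#!/bin/sh
# Illustrative sketch: build the -net arguments for either NIC model.
# MODEL=virtio reproduces the hang described above; MODEL=rtl8139 is
# the (clumsy) workaround.
MODEL="${MODEL:-rtl8139}"
NET_ARGS="-net nic,macaddr=00:1A:4A:00:8E:3c,model=${MODEL} -net tap,script=/etc/kvm/kvm-ifup-vlan142"
# Print instead of launching kvm, so the sketch runs anywhere.
echo "kvm ... ${NET_ARGS}"
```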
And the page allocation failures look suspiciously similar to the ones the original bug reporter saw when using 2.6.32-12. If it were an OOM situation, wouldn't the OOM killer be supposed to kick in? /proc/meminfo on the host:

sajama:~# cat /proc/meminfo
MemTotal:        8197652 kB
MemFree:         2444964 kB
Buffers:           13560 kB
Cached:           128812 kB
SwapCached:         6892 kB
Active:          5102584 kB
Inactive:         316616 kB
Active(anon):    5035456 kB
Inactive(anon):   242180 kB
Active(file):      67128 kB
Inactive(file):    74436 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8388600 kB
SwapFree:        8355640 kB
Dirty:                 8 kB
Writeback:             0 kB
AnonPages:       5271936 kB
Mapped:             5892 kB
Shmem:               804 kB
Slab:              79844 kB
SReclaimable:      21184 kB
SUnreclaim:        58660 kB
KernelStack:        1880 kB
PageTables:        14256 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12487424 kB
Committed_AS:    6440192 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      305788 kB
VmallocChunk:   34359332988 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7872 kB
DirectMap2M:     8380416 kB

/proc/meminfo on the guest (currently using rtl8139 as the network model):

lin...@tin:~$ cat /proc/meminfo
MemTotal:        1027200 kB
MemFree:           84336 kB
Buffers:           99588 kB
Cached:           152592 kB
SwapCached:         3160 kB
Active:           370304 kB
Inactive:         401924 kB
Active(anon):     264088 kB
Inactive(anon):   256724 kB
Active(file):     106216 kB
Inactive(file):   145200 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       4194296 kB
SwapFree:        4175892 kB
Dirty:                16 kB
Writeback:             0 kB
AnonPages:        517608 kB
Mapped:            31348 kB
Shmem:               764 kB
Slab:             147396 kB
SReclaimable:     140440 kB
SUnreclaim:         6956 kB
KernelStack:        1472 kB
PageTables:         9948 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     4707896 kB
Committed_AS:     893160 kB
VmallocTotal:   34359738367 kB
VmallocUsed:        9096 kB
VmallocChunk:   34359724404 kB
HardwareCorrupted: 0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8180 kB
DirectMap2M:
1040384 kB -- Lukas -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/1281197867.11319.6.ca...@larosa.fritz.box
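On the OOM question above: mode:0x20 in the trace is GFP_ATOMIC, i.e. the allocation happens in softirq context and may not sleep, so the OOM killer (which only runs for allocations that can block) is never invoked; the failure just means the reserved free pages were exhausted at that instant. A hedged sketch for inspecting the reserve that atomic allocations draw from (the sysctl value shown is purely illustrative, not a fix verified on this setup):

```shell
#!/bin/sh
# GFP_ATOMIC (mode:0x20) failures drain the pool of free pages the
# kernel keeps for allocations that cannot wait. The pool size is
# tunable via vm.min_free_kbytes:
echo "vm.min_free_kbytes = $(cat /proc/sys/vm/min_free_kbytes)"
# A commonly suggested mitigation (untested here) is to enlarge it,
# e.g.:  sysctl -w vm.min_free_kbytes=16384   # needs root
```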
Bug#574348: similar crash on a different host
Package: linux-2.6 Version: 2.6.33-1 Hi again, we've just seen a very similar crash on a different host with 2.6.33-1. After the last trace, the machine crashed. I'll report this to upstream bugzilla as well and let you know about the bugnumber. Mar 18 20:11:22 simon kernel: [87960.628069] kswapd0 D 0002 047 2 0x Mar 18 20:11:22 simon kernel: [87960.628073] 88022b4e8700 0046 812ee74e Mar 18 20:11:22 simon kernel: [87960.628078] b45edcbe f8e0 88022e16ffd8 Mar 18 20:11:22 simon kernel: [87960.628081] 00015680 00015680 88022f135400 88022f1356f0 Mar 18 20:11:22 simon kernel: [87960.628085] Call Trace: Mar 18 20:11:22 simon kernel: [87960.628095] [812ee74e] ? common_interrupt+0xe/0x13 Mar 18 20:11:22 simon kernel: [87960.628100] [81066acd] ? ktime_get_ts+0x68/0xb2 Mar 18 20:11:22 simon kernel: [87960.628105] [8109401a] ? delayacct_end+0x74/0x7f Mar 18 20:11:22 simon kernel: [87960.628135] [a0386c0c] ? nfs_wait_bit_uninterruptible+0x0/0xd [nfs] Mar 18 20:11:22 simon kernel: [87960.628138] [812ed082] ? io_schedule+0x73/0xb7 Mar 18 20:11:22 simon kernel: [87960.628145] [a0386c15] ? nfs_wait_bit_uninterruptible+0x9/0xd [nfs] Mar 18 20:11:22 simon kernel: [87960.628147] [812ed5ad] ? __wait_on_bit+0x41/0x70 Mar 18 20:11:22 simon kernel: [87960.628151] [81183c0b] ? __lookup_tag+0xad/0x11b Mar 18 20:11:22 simon kernel: [87960.628158] [a0386c0c] ? nfs_wait_bit_uninterruptible+0x0/0xd [nfs] Mar 18 20:11:22 simon kernel: [87960.628161] [812ed647] ? out_of_line_wait_on_bit+0x6b/0x77 Mar 18 20:11:22 simon kernel: [87960.628164] [8105f2e0] ? wake_bit_function+0x0/0x23 Mar 18 20:11:22 simon kernel: [87960.628172] [a038ab87] ? nfs_sync_mapping_wait+0xfa/0x227 [nfs] Mar 18 20:11:22 simon kernel: [87960.628179] [a038ad48] ? nfs_wb_page+0x94/0xc3 [nfs] Mar 18 20:11:22 simon kernel: [87960.628185] [a037de04] ? nfs_release_page+0x3a/0x57 [nfs] Mar 18 20:11:22 simon kernel: [87960.628189] [810b91e6] ? shrink_page_list+0x48d/0x617 Mar 18 20:11:22 simon kernel: [87960.628192] [810b82a1] ? 
isolate_pages_global+0x1a0/0x20f Mar 18 20:11:22 simon kernel: [87960.628194] [810b9d47] ? shrink_zone+0x710/0xae9 Mar 18 20:11:22 simon kernel: [87960.628198] [8100948e] ? apic_timer_interrupt+0xe/0x20 Mar 18 20:11:22 simon kernel: [87960.628201] [810bab7e] ? kswapd+0x5d3/0x80c Mar 18 20:11:22 simon kernel: [87960.628203] [810b8101] ? isolate_pages_global+0x0/0x20f Mar 18 20:11:22 simon kernel: [87960.628206] [8105f2b2] ? autoremove_wake_function+0x0/0x2e Mar 18 20:11:22 simon kernel: [87960.628208] [810ba5ab] ? kswapd+0x0/0x80c Mar 18 20:11:22 simon kernel: [87960.628211] [8105ee79] ? kthread+0x79/0x81 Mar 18 20:11:22 simon kernel: [87960.628214] [810098e4] ? kernel_thread_helper+0x4/0x10 Mar 18 20:11:22 simon kernel: [87960.628217] [8105ee00] ? kthread+0x0/0x81 Mar 18 20:11:22 simon kernel: [87960.628219] [810098e0] ? kernel_thread_helper+0x0/0x10 Mar 18 20:11:22 simon kernel: [87960.628404] flush-0:24D 0 6946 2 0x Mar 18 20:11:22 simon kernel: [87960.628407] 88022f075400 0046 812ee74e Mar 18 20:11:22 simon kernel: [87960.628411] b45edcbe f8e0 8801fc4affd8 Mar 18 20:11:22 simon kernel: [87960.628413] 00015680 00015680 88022b4e8700 88022b4e89f0 Mar 18 20:11:22 simon kernel: [87960.628416] Call Trace: Mar 18 20:11:22 simon kernel: [87960.628419] [812ee74e] ? common_interrupt+0xe/0x13 Mar 18 20:11:22 simon kernel: [87960.628422] [81066acd] ? ktime_get_ts+0x68/0xb2 Mar 18 20:11:22 simon kernel: [87960.628424] [8109401a] ? delayacct_end+0x74/0x7f Mar 18 20:11:22 simon kernel: [87960.628431] [a0386c0c] ? nfs_wait_bit_uninterruptible+0x0/0xd [nfs] Mar 18 20:11:22 simon kernel: [87960.628433] [812ed082] ? io_schedule+0x73/0xb7 Mar 18 20:11:22 simon kernel: [87960.628440] [a0386c15] ? nfs_wait_bit_uninterruptible+0x9/0xd [nfs] Mar 18 20:11:22 simon kernel: [87960.628442] [812ed5ad] ? __wait_on_bit+0x41/0x70 Mar 18 20:11:22 simon kernel: [87960.628445] [81183c0b] ? __lookup_tag+0xad/0x11b Mar 18 20:11:22 simon kernel: [87960.628451] [a0386c0c] ? 
nfs_wait_bit_uninterruptible+0x0/0xd [nfs] Mar 18 20:11:22 simon kernel: [87960.628454] [812ed647] ? out_of_line_wait_on_bit+0x6b/0x77 Mar 18 20:11:22 simon kernel: [87960.628457] [8105f2e0] ? wake_bit_function+0x0/0x23 Mar 18 20:11:22 simon kernel: [87960.628464] [a038ab87] ? nfs_sync_mapping_wait+0xfa/0x227 [nfs] Mar 18 20:11:22 simon
Bug#574348: upstream bug
reported to bugzilla.kernel.org as https://bugzilla.kernel.org/show_bug.cgi?id=15578 thanks, Lukas -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/1268992407.27945.6.ca...@larosa.zuhause
Bug#574348: Kernel 2.6.32-9: crash
Package: linux-2.6 Version: 2.6.32-9 Hi all, we have numerous fileservers currently running variants of 2.6.30, 2.6.32 and 2.6.33. When we used 2.6.32-9 (with ABI version 3), we got repeated crashes on one server in kswapd and flush. Three hours after these traces, the machine crashed hard (no console, no sysrq). A day later, the machine crashed again with 2.6.32-9, albeit leaving no messages in syslog. We're now back to using 2.6.30, which seems more stable. The server is moderately loaded with nfs3/4 traffic, has 151 exports, and has 4 ext4 filesystems (1.5 TiB in total). Mar 11 06:45:14 river kernel: [40200.628071] kswapd0 D 0002 047 2 0x Mar 11 06:45:14 river kernel: [40200.628076] 88022f073880 0046 810114ce Mar 11 06:45:14 river kernel: [40200.628080] b468199e f8a0 88022c8e3fd8 000155c0 Mar 11 06:45:14 river kernel: [40200.628084] 000155c0 88022f135bd0 88022f135ec8 0001 Mar 11 06:45:14 river kernel: [40200.628088] Call Trace: Mar 11 06:45:14 river kernel: [40200.628112] [810114ce] ? common_interrupt+0xe/0x13 Mar 11 06:45:14 river kernel: [40200.628116] [81098e5e] ? delayacct_end+0x74/0x7f Mar 11 06:45:14 river kernel: [40200.628129] [a038dc38] ? nfs_wait_bit_uninterruptible+0x0/0xd [nfs] Mar 11 06:45:14 river kernel: [40200.628134] [812ee03d] ? io_schedule+0x73/0xb7 Mar 11 06:45:14 river kernel: [40200.628140] [a038dc41] ? nfs_wait_bit_uninterruptible+0x9/0xd [nfs] Mar 11 06:45:14 river kernel: [40200.628143] [812ee53d] ? __wait_on_bit+0x41/0x70 Mar 11 06:45:14 river kernel: [40200.628148] [8118a10f] ? __lookup_tag+0xad/0x11b Mar 11 06:45:14 river kernel: [40200.628154] [a038dc38] ? nfs_wait_bit_uninterruptible+0x0/0xd [nfs] Mar 11 06:45:14 river kernel: [40200.628157] [812ee5d7] ? out_of_line_wait_on_bit+0x6b/0x77 Mar 11 06:45:14 river kernel: [40200.628161] [81064a64] ? wake_bit_function+0x0/0x23 Mar 11 06:45:14 river kernel: [40200.628168] [a0391bc3] ? nfs_sync_mapping_wait+0xfa/0x227 [nfs] Mar 11 06:45:14 river kernel: [40200.628175] [a0391d84] ?
nfs_wb_page+0x94/0xc3 [nfs] Mar 11 06:45:14 river kernel: [40200.628179] [810b4968] ? __remove_from_page_cache+0x33/0xb6 Mar 11 06:45:14 river kernel: [40200.628185] [a0384e54] ? nfs_release_page+0x3a/0x57 [nfs] Mar 11 06:45:14 river kernel: [40200.628189] [810bd244] ? shrink_page_list+0x481/0x617 Mar 11 06:45:14 river kernel: [40200.628192] [8101166e] ? apic_timer_interrupt+0xe/0x20 Mar 11 06:45:14 river kernel: [40200.628195] [810bc317] ? isolate_pages_global+0x1a0/0x20f Mar 11 06:45:14 river kernel: [40200.628198] [810bdaf1] ? shrink_list+0x44a/0x725 Mar 11 06:45:14 river kernel: [40200.628206] [a0276828] ? jbd2_journal_release_jbd_inode+0x55/0x10e [jbd2] Mar 11 06:45:14 river kernel: [40200.628211] [810e2fd7] ? add_partial+0x11/0x58 Mar 11 06:45:14 river kernel: [40200.628214] [810be04c] ? shrink_zone+0x280/0x342 Mar 11 06:45:14 river kernel: [40200.628216] [810be24f] ? shrink_slab+0x141/0x153 Mar 11 06:45:14 river kernel: [40200.628219] [810bea71] ? kswapd+0x4b9/0x683 Mar 11 06:45:14 river kernel: [40200.628222] [810bc177] ? isolate_pages_global+0x0/0x20f Mar 11 06:45:14 river kernel: [40200.628224] [81064a36] ? autoremove_wake_function+0x0/0x2e Mar 11 06:45:14 river kernel: [40200.628227] [810be5b8] ? kswapd+0x0/0x683 Mar 11 06:45:14 river kernel: [40200.628229] [81064769] ? kthread+0x79/0x81 Mar 11 06:45:14 river kernel: [40200.628232] [81011baa] ? child_rip+0xa/0x20 Mar 11 06:45:14 river kernel: [40200.628234] [810646f0] ? kthread+0x0/0x81 Mar 11 06:45:14 river kernel: [40200.628237] [81011ba0] ? child_rip+0x0/0x20 Mar 11 06:45:14 river kernel: [40200.628409] flush-0:24D 0002 0 4682 2 0x Mar 11 06:45:14 river kernel: [40200.628412] 88022f0754c0 0046 88020bfbf1ac Mar 11 06:45:14 river kernel: [40200.628415] f8a0 88020bfbffd8 000155c0 Mar 11 06:45:14 river kernel: [40200.628418] 000155c0 880203973170 880203973468 0002 Mar 11 06:45:14 river kernel: [40200.628421] Call Trace: Mar 11 06:45:14 river kernel: [40200.628423] [81098e5e] ? 
delayacct_end+0x74/0x7f Mar 11 06:45:14 river kernel: [40200.628430] [a038dc38] ? nfs_wait_bit_uninterruptible+0x0/0xd [nfs] Mar 11 06:45:14 river kernel: [40200.628433] [812ee03d] ? io_schedule+0x73/0xb7 Mar 11 06:45:14 river kernel: [40200.628439] [a038dc41] ? nfs_wait_bit_uninterruptible+0x9/0xd [nfs] Mar 11 06:45:14 river kernel: [40200.628442] [812ee53d] ? __wait_on_bit+0x41/0x70
Bug#496917: BUG: soft lockup - CPU#1 stuck for 4096s! [java:3174] and crash (kvm guest)
Hi Moritz, Does this bug still persist with the current Lenny kernel? Lucky me (sort of): we stumbled upon this bug again on another server (8 cores, 8 GB RAM) using 2.6.26-1-amd64_2.6.26-11, and as fortune would have it, that server will be under heavy load tomorrow, so we can try the -12 kernel. I'll report back on it. Cheers, Moritz -- Lukas
Bug#496917: BUG: soft lockup - CPU#1 stuck for 4096s! [java:3174] and crash (kvm guest)
On Sunday, 2008-12-14 at 23:50 +0100, Moritz Muehlenhoff wrote: On Mon, Sep 01, 2008 at 03:56:49PM +0200, Lukas Kolbe wrote: Hi! seeing if it is fixed in 2.6.27-rc5 might be more interesting. thanks 2.6.27-rc5 has now been running fine in the guest for more than four hours (and me restarting jboss every now and then). I'll report back tomorrow evening; that would be the timeframe in which the bug should have triggered. So far 2.6.27-rc5 seems to be stable; at least it hasn't crashed on me. What can I do to help get the needed fix into testing? (I know that this kernel won't make it, and that's a good thing, but I don't really know how to identify what's needed to fix this.) Does this bug still persist with the current Lenny kernel? I am sorry to report that I cannot reproduce this bug anymore, but that's because the system now locks up hard (I have no serial console to it and it is only remotely accessible). When I do an rsync of ca. 100GB of data on a guest from a remote location (both guest and host with 2.6.26-1-amd64 2.6.26-12, but it also happened with -9), after around 10-25GB the host (!) locks up. Ping still works, but the shells are dead and no further access (via ssh) is possible at all. After a forced reboot, the logs are completely unhelpful (syslog marks, but nothing else until the new bootup messages). When I do the rsync in the host system (chroot'ed to the guest system for simplicity), I can transfer all 100GB without a hitch.
I am using virtio, by the way, as in:

#!/bin/bash
KERNEL=2.6.26-1-amd64
NAME=myname
kvm -smp 2 \
  -drive if=virtio,file=/dev/vg0/${NAME}-root,cache=on,boot=on \
  -drive if=virtio,file=/dev/vg0/${NAME}-log,cache=on,boot=off \
  -m 512 \
  -nographic \
  -daemonize \
  -name ${NAME} \
  -kernel /boot/kvm/${NAME}/vmlinuz-${KERNEL} \
  -initrd /boot/kvm/${NAME}/initrd.img-${KERNEL} \
  -append "root=/dev/vda ro console=ttyS0,115200" \
  -serial mon:unix:/etc/kvm/consoles/${NAME}.sock,server,nowait \
  -net nic,macaddr=DE:AD:BE:EF:21:75,model=virtio \
  -net tap,ifname=tap04,script=/etc/kvm/kvm-ifup \
  -net nic,macaddr=DE:AD:BE:EF:21:76,model=virtio \
  -net tap,ifname=tap14,script=/etc/kvm/kvm-ifup-${NAME}

I'm sorry that I can't at the moment reproduce the problem that led to the opening of this bug. Maybe I should open a new bug for this? Cheers, Moritz All the best and a happy new year, Lukas
Bug#496917: BUG: soft lockup - CPU#1 stuck for 4096s! [java:3174] and crash (kvm guest)
Hi, So far 2.6.27-rc5 seems to be stable; at least it hasn't crashed on me. What can I do to help get the needed fix into testing? (I know that this kernel won't make it, and that's a good thing, but I don't really know how to identify what's needed to fix this.) Does this bug still persist with the current Lenny kernel? Sorry, I have no machine to test this on at the moment. Perhaps I'll find one in the next few days ... Cheers, Moritz -- Lukas
Bug#496917: BUG: soft lockup - CPU#1 stuck for 4096s! [java:3174] and crash (kvm guest)
Hi! seeing if it is fixed in 2.6.27-rc5 might be more interesting. thanks 2.6.27-rc5 has now been running fine in the guest for more than four hours (and me restarting jboss every now and then). I'll report back tomorrow evening; that would be the timeframe in which the bug should have triggered. So far 2.6.27-rc5 seems to be stable; at least it hasn't crashed on me. What can I do to help get the needed fix into testing? (I know that this kernel won't make it, and that's a good thing, but I don't really know how to identify what's needed to fix this.) -- maks -- Lukas
Bug#496917: BUG: soft lockup - CPU#1 stuck for 4096s! [java:3174] and crash (kvm guest)
maximilian attems wrote: should be fixed in 2.6.26-4, should be available tomorrow in unstable. otherwise find sid snapshots http://wiki.debian.org/DebianKernel Sorry, but it crashed on me again - this time stuck in swapper. [36037.786125] BUG: soft lockup - CPU#1 stuck for 4097s! [swapper:0] [36037.786125] Modules linked in: ipv6 snd_pcm snd_timer snd soundcore snd_page_alloc parport_pc parport serio_raw psmouse pcspkr i2c_piix4 i2c_core button evdev joydev dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_generic floppy piix ide_pci_generic thermal fan virtio_balloon virtio_pci virtio_ring virtio_rng rng_core virtio_net virtio_blk virtio freq_table processor thermal_sys raid1 raid0 md_mod atiixp ahci sata_nv sata_sil sata_via libata dock via82cxxx ide_core 3w_9xxx 3w_ scsi_mod xfs ext3 jbd ext2 mbcache reiserfs [36037.786125] CPU 1: [36037.786125] Modules linked in: ipv6 snd_pcm snd_timer snd soundcore snd_page_alloc parport_pc parport serio_raw psmouse pcspkr i2c_piix4 i2c_core button evdev joydev dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_generic floppy piix ide_pci_generic thermal fan virtio_balloon virtio_pci virtio_ring virtio_rng rng_core virtio_net virtio_blk virtio freq_table processor thermal_sys raid1 raid0 md_mod atiixp ahci sata_nv sata_sil sata_via libata dock via82cxxx ide_core 3w_9xxx 3w_ scsi_mod xfs ext3 jbd ext2 mbcache reiserfs [36037.786125] Pid: 0, comm: swapper Not tainted 2.6.26-1-amd64 #1 [36037.786125] RIP: 0010:[8021eb20] [8021eb20] native_safe_halt+0x2/0x3 [36037.786125] RSP: 0018:8100c6ea5f38 EFLAGS: 0246 [36037.786125] RAX: 8100c6ea5fd8 RBX: RCX: [36037.786125] RDX: RSI: 0001 RDI: 804fadf0 [36037.786125] RBP: 00ce8b7c R08: 8100010276a0 R09: 8100c55b4b10 [36037.786125] R10: 8100c45dbbc8 R11: 8100c485f160 R12: 8100c6ea5ed8 [36037.786125] R13: R14: 8023cd02 R15: 196c2157187c [36037.786125] FS: 414e8960() GS:8100c6e4a0c0() knlGS: [36037.786125] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b [36037.786125] CR2: 7f4a0d0c4000 
CR3: c548c000 CR4: 06e0 [36037.786125] DR0: DR1: DR2: [36037.786125] DR3: DR6: 0ff0 DR7: 0400 [36037.786125] [36037.786125] Call Trace: [36037.786125] [8020b0cd] ? default_idle+0x2a/0x49 [36037.786125] [8020ac79] ? cpu_idle+0x89/0xb3 [36037.786125] [40435.828242] BUG: soft lockup - CPU#0 stuck for 8193s! [cron:2082] [9223956581.178656] Modules linked in: ipv6 snd_pcm snd_timer snd soundcore snd_page_alloc parport_pc parport serio_raw psmouse pcspkr i2c_piix4 i2c_core button evdev joydev dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_generic floppy piix ide_pci_generic thermal fan virtio_balloon virtio_pci virtio_ring virtio_rng rng_core virtio_net virtio_blk virtio freq_table processor thermal_sys raid1 raid0 md_mod atiixp ahci sata_nv sata_sil sata_via libata dock via82cxxx ide_core 3w_9xxx 3w_ scsi_mod xfs ext3 jbd ext2 mbcache reiserfs [9223956581.178656] CPU 0: [9223956581.178656] Modules linked in: ipv6 snd_pcm snd_timer snd soundcore snd_page_alloc parport_pc parport serio_raw psmouse pcspkr i2c_piix4 i2c_core button evdev joydev dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_generic floppy piix ide_pci_generic thermal fan virtio_balloon virtio_pci virtio_ring virtio_rng rng_core virtio_net virtio_blk virtio freq_table processor thermal_sys raid1 raid0 md_mod atiixp ahci sata_nv sata_sil sata_via libata dock via82cxxx ide_core 3w_9xxx 3w_ scsi_mod xfs ext3 jbd ext2 mbcache reiserfs [9223956581.178656] Pid: 2082, comm: cron Not tainted 2.6.26-1-amd64 #1 [9223956581.178656] RIP: 0010:[8024aa86] [8024aa86] getnstimeofday+0x9/0x98 [9223956581.178656] RSP: 0018:8100c545df18 EFLAGS: 0202 [9223956581.178656] RAX: 00ce8b7d RBX: 00ce8b7d RCX: 07d8 [9223956581.178656] RDX: 0002 RSI: RDI: 8100c545df38 [9223956581.178656] RBP: 0008 R08: 0003 R09: 07d8 [9223956581.178656] R10: 07d8 R11: 0246 R12: 1000 [9223956581.178656] R13: 01c8 R14: R15: [9223956581.178656] FS: 7fcf5ad0e6d0() GS:8053b000() knlGS: [9223956581.178656] CS: 0010 DS: ES: CR0: 8005003b 
[9223956581.178656] CR2: 00418238 CR3: c41ac000 CR4: 06e0 [9223956581.178656] DR0: DR1: DR2: [9223956581.178656] DR3: DR6: 0ff0 DR7: 0400 [9223956581.178656] [9223956581.178656] Call Trace: [9223956581.178656] [8024aab6] ? getnstimeofday+0x39/0x98
Bug#496917: BUG: soft lockup - CPU#1 stuck for 4096s! [java:3174] and crash (kvm guest)
Sorry, my previous answer didn't make it through my mail setup. I was using 2.6.26-4snapshot.12144 when the crash happened. I'll try it with the current snapshot again, though the changelog doesn't say anything about actual changes :) -- Lukas
Bug#496917: BUG: soft lockup - CPU#1 stuck for 4096s! [java:3174] and crash (kvm guest)
maximilian attems wrote: On Fri, Aug 29, 2008 at 02:36:00PM +0200, Lukas Kolbe wrote: Sorry, my previous answer didn't make it through my mail setup. I was using 2.6.26-4snapshot.12144 when the crash happened. I'll try it with the current snapshot again, though the changelog doesn't say anything about actual changes :) seeing if it is fixed in 2.6.27-rc5 might be more interesting. thanks 2.6.27-rc5 has now been running fine in the guest for more than four hours (and me restarting jboss every now and then). I'll report back tomorrow evening, that would be the timeframe the bug should've triggered. -- maks -- Lukas
Bug#496917: BUG: soft lockup - CPU#1 stuck for 4096s! [java:3174] and crash (kvm guest)
Package: linux-image-2.6.26-1-amd64 Version: 2.6.26-3 Severity: important Using kvm 72, the guest is started with:

kvm -smp 2 \
  -net nic,macaddr=DE:AD:BE:EF:21:71,model=virtio \
  -net tap,ifname=tap02,script=/etc/kvm/kvm-ifup \
  -net nic,macaddr=DE:AD:BE:EF:21:72,model=virtio \
  -net tap,ifname=tap12,script=/etc/kvm/kvm-ifup-web \
  -drive if=virtio,boot=on,file=/dev/vg0/web-root \
  -drive if=virtio,file=/dev/vg0/web-log \
  -drive if=virtio,file=/dev/vg0/web-srv \
  -drive if=virtio,file=/dev/vg0/web-swap \
  -m 3192 \
  -kernel /boot/vmlinuz-2.6.26-1-amd64 \
  -initrd /boot/initrd.img-2.6.26-1-amd64 \
  -append "root=/dev/vda ro" \
  -curses

On the guest, I run the latest JDK from Sun (1.6.0_07) and the latest JBoss from Redhat (4.2.3.GA), default configuration, except that it's started via runit: exec chpst -u jboss:jboss -e ./env -o 65536 /opt/jboss/bin/run.sh -b host After restarting it a few times, I get the following crash of the guest kernel: Aug 28 14:38:06 web-new kernel: [ 6677.939474] BUG: soft lockup - CPU#0 stuck for 4096s!
[master:2066] Aug 28 14:38:06 web-new kernel: [ 6677.941978] Modules linked in: ipv6 snd_pcsp snd_pcm parport_pc parport snd_timer serio_raw snd psmouse soundcore i2c_piix4 snd_page_alloc i2c_core button evdev dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_generic floppy piix ide_pci_generic thermal fan virtio_balloon virtio_pci virtio_ring virtio_rng rng_core virti o_net virtio_blk virtio freq_table processor thermal_sys raid1 raid0 md_mod atiixp ahci sata_nv sata_sil sata_via libata dock via82cxxx ide_core 3w_9xxx 3w_ scsi_mod xfs ext3 jbd ext2 mbcache reiserfs Aug 28 14:38:06 web-new kernel: [ 6677.941978] CPU 0: Aug 28 14:38:06 web-new kernel: [ 6677.941978] Modules linked in: ipv6 snd_pcsp snd_pcm parport_pc parport snd_timer serio_raw snd psmouse soundcore i2c_piix4 snd_page_alloc i2c_core button evdev dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_generic floppy piix ide_pci_generic thermal fan virtio_balloon virtio_pci virtio_ring virtio_rng rng_core virti o_net virtio_blk virtio freq_table processor thermal_sys raid1 raid0 md_mod atiixp ahci sata_nv sata_sil sata_via libata dock via82cxxx ide_core 3w_9xxx 3w_ scsi_mod xfs ext3 jbd ext2 mbcache reiserfs Aug 28 14:38:06 web-new kernel: [ 6677.941978] Pid: 2066, comm: master Not tainted 2.6.26-1-amd64 #1 Aug 28 14:38:06 web-new kernel: [ 6677.941978] RIP: 0033:[7fcf1e3d4c0a] [7fcf1e3d4c0a] Aug 28 14:38:06 web-new kernel: [ 6677.941978] RSP: 002b:7fff264e18e0 EFLAGS: 0202 Aug 28 14:38:06 web-new kernel: [ 6677.941978] RAX: 0002 RBX: 0086 RCX: Aug 28 14:38:06 web-new kernel: [ 6677.941978] RDX: 0430 RSI: 7fcf1e4dc903 RDI: 00401895 Aug 28 14:38:06 web-new kernel: [ 6677.941978] RBP: 005078bc R08: 7fcf1e4e4c98 R09: Aug 28 14:38:06 web-new kernel: [ 6677.941978] R10: R11: 0064 R12: 7fff264e1e50 Aug 28 14:38:06 web-new kernel: [ 6677.941978] R13: 7fff264e1ed0 R14: 7fcf1e298de0 R15: 7fff264e1f50 Aug 28 14:38:06 web-new kernel: [ 6677.941978] FS: 7fcf1e4d96d0() GS:8053b000() knlGS: Aug 28 14:38:06 
web-new kernel: [ 6677.941978] CS: 0010 DS: ES: CR0: 8005003b Aug 28 14:38:06 web-new kernel: [ 6677.941978] CR2: 7f65642cb860 CR3: c4162000 CR4: 06e0 Aug 28 14:38:06 web-new kernel: [ 6677.941978] DR0: DR1: DR2: Aug 28 14:38:06 web-new kernel: [ 6677.941978] DR3: DR6: 0ff0 DR7: 0400 Aug 28 14:38:06 web-new kernel: [ 6677.941978] Aug 28 14:38:06 web-new kernel: [ 6677.941978] Call Trace: Aug 28 14:38:06 web-new kernel: [ 6677.941978] Aug 28 14:38:06 web-new kernel: [ 6677.939462] BUG: soft lockup - CPU#1 stuck for 4096s! [java:3174] Aug 28 14:38:06 web-new kernel: [ 6677.939462] Modules linked in: ipv6 snd_pcsp snd_pcm parport_pc parport snd_timer serio_raw snd psmouse soundcore i2c_piix4 snd_page_alloc i2c_core button evdev dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_generic floppy piix ide_pci_generic thermal fan virtio_balloon virtio_pci virtio_ring virtio_rng rng_core virti o_net virtio_blk virtio freq_table processor thermal_sys raid1 raid0 md_mod atiixp ahci sata_nv sata_sil sata_via libata dock via82cxxx ide_core 3w_9xxx 3w_ scsi_mod xfs ext3 jbd ext2 mbcache reiserfs Aug 28 14:38:06 web-new kernel: [ 6677.939462] CPU 1: Aug 28 14:38:06 web-new kernel: [ 6677.939462] Modules linked in: ipv6 snd_pcsp snd_pcm parport_pc parport snd_timer serio_raw snd psmouse soundcore i2c_piix4 snd_page_alloc i2c_core button evdev dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_generic floppy piix ide_pci_generic thermal fan virtio_balloon virtio_pci virtio_ring virtio_rng rng_core virti o_net virtio_blk virtio freq_table
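The runit invocation quoted in the report can be written out as a regular run script. A sketch under the paths given in the report (printed rather than installed, since neither JBoss nor runit's chpst is assumed to be present):

```shell
#!/bin/sh
# Emit the runit "run" script described in the report. chpst drops
# privileges to jboss:jboss, loads environment variables from ./env,
# and raises the open-file limit to 65536 before exec'ing JBoss.
cat <<'EOF'
#!/bin/sh
exec chpst -u jboss:jboss -e ./env -o 65536 /opt/jboss/bin/run.sh -b host
EOF
```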
Bug#271315: Same problem here (sun keyboard obviously gets wrong keymap)
I see this bug is now 183 days old, but I fell into the same trap. Recently, after upgrading to kernel-image-2.6.8-2-sparc64, the keyboard got a wrong layout - pretty much every key now has a different meaning. Unfortunately, the old 2.4 kernel is not bootable any more (silo has forgotten about it), so I can't do much testing. I'll try to rescue that machine in a few days. -- Lukas