Package: linux-image-2.6.26-1-686
Severity: important


I had a working setup for lenny with 2.6.25, but now that 2.6.26 has entered
lenny I have found that the machine hangs.

Even though I have spent a lot of time trying to isolate the problem, I can't
say much right now. The machines I started seeing this on are [EMAIL PROTECTED]
which had to establish an IPsec tunnel on their eth2 interface.

As I said, the machines run fine on 2.6.25 but hang on 2.6.26. I tested both
the current lenny -3 and sid's -4, with the same result: a hang. I even
tested an -amd64 kernel, copying the 32-bit files from the PIII to a Core Duo
machine, and it also hung. Typically it hangs when openntpd is restarted
while the network interfaces are being configured. The messages I was getting
looked like this one:
[  125.628018] BUG: soft lockup - CPU#0 stuck for 61s! [ntpd:1915]
and they repeat roughly every 64 seconds on average.

The hang almost always happens while the machine is setting up the network
during boot. At that point it reloads the NTP daemon (I'm using openntpd),
but removing this daemon didn't make a difference. Sometimes it stops before
this point, when mounting the swap, but on a faster machine the problem
appeared at a later stage, when starting the IKE server (racoon). It was on
this faster machine that I was able to boot into single-user mode using the
amd64 kernel. However, when I tried to ping over the IPsec tunnel (I even
had the network cables unplugged, so no tunnel had been established yet), the
machine hung and started printing a register dump and a call trace to the
screen.

Here is part of the call trace, typed in by hand:

[ffffffff80429785] ? _spin_lock_bh+0x9/0x1f
[ffffffff80410505] ? __xfrm_state_destroy+0x3a/0xaf
[ffffffff80411c1a] ? xfrm_state_find+0x542/0x5a9
[ffffffff8040c51d] ? xfrm_tmpl_resolve+0x1af/0x2ed
[ffffffff803c585a] ? flow_cache_lookup+0x30a/0x35d
[ffffffff8040e367] ? xfrm_policy_lookup+0x0/0x1d4
[ffffffff804296bf] ? _read_lock_bh+0x9/0x19
[ffffffff8040caae] ? __xfrm_lookup+0x197/0x8e4
[ffffffff803d69bd] ? __ip_route_output_flow+0x83a/0x8f8
[ffffffff803d6ae4] ? ip_route_output_flow+0x69/0x1de
[ffffffff803f59e5] ? ip4_datagram_connect+0x165/0x248
[ffffffff803b047f] ? sys_connect+0x76/0xa6
[ffffffff802ab114] ? d_instantiate+0x52/0x67
[ffffffff803af60c] ? sock_attach_fd+0x84/0xaf
[ffffffff80299129] ? fd_install+0x25/0x56
[ffffffff803af686] ? sock_map_fd+0x4f/0x5a
[ffffffff803c8e1e] ? compat_sys_socketcall+0x73/0x172
[ffffffff80224bb2] ? sysenter_do_call+0x1b/0x66

This dump alternated with another one that was identical, except that it had
no "?" markers, lacked the __xfrm_state_destroy+0x3a/0xaf line, and had this
extra line at the end:
[ffffffff80306316] cap_task_post_setuid+0x0/0x1d3

The only way I could stop these messages was to rename eth2 (to eth3, for
example) without touching any other configuration. That way the eth2 card no
longer existed and the services trying to use it didn't work, but the
machine did boot. After booting, I brought eth3 up manually with ifconfig,
just to make sure that the interface merely being down wasn't what avoided
the hang, and the machine still didn't hang.

Just in case it was a problem with a specific network driver, I replaced the
network cards (three in total) with e100-driven cards in one test and
8139too-driven cards in another. The results were similar to those from my
first setup, which had a mixture of cards.

I tried changing BIOS settings and not loading some of the unused drivers,
just in case, but nothing changed. I also cloned the machine without the
software RAID that the first one had, but again nothing changed.

Hope this helps finding the problem.


