Re: [Xenomai-help] kernel 2.6.32.11 with xenomai 2.5.3 fails to boot on ubuntu lucid system

Stefan Kisdaroczi Wed, 18 Aug 2010 05:15:11 -0700

On 18.08.2010 10:27, Philippe Gerum wrote:
> On Tue, 2010-08-17 at 19:43 +0200, Stefan Kisdaroczi wrote:
>   
>> On 17.08.2010 12:27, Philippe Gerum wrote:
>>     
>>> On Mon, 2010-08-16 at 21:14 +0200, Theo Veenker wrote:
>>>   
>>>       
>>>> On 08/16/2010 04:26 PM, Theo Veenker wrote:
>>>>     
>>>>         
>>>>> Gilles Chanteperdrix wrote:
>>>>>       
>>>>>           
>>>>>> Theo Veenker wrote:
>>>>>>         
>>>>>>             
>>>>>>> Hi,
>>>>>>>
>>>>>>> I want to upgrade all our PC's from Ubuntu hardy to lucid and in the
>>>>>>> process I'm also going from kernel 2.6.29.5 with Xenomai 2.4.8 to
>>>>>>> kernel 2.6.32.11 with Xenomai 2.5.3.
>>>>>>>
>>>>>>> I first built and tested the 2.6.32.11 kernel with 2.5.3 on my hardy
>>>>>>> system and all went fine. But the problem is it just doesn't run on
>>>>>>> the lucid distro.
>>>>>>>           
>>>>>>>               
>>>>>> This, I do not understand, the kernel does not need any support from the
>>>>>> distribution for booting, how can the same kernel boot with one
>>>>>> distribution, and not with the other? When you say the "same kernel", do
>>>>>> you mean the exact same zImage or bzImage, or do you mean the kernel
>>>>>> with the same configuration, but with a different compiler, or only the
>>>>>> version is identical?
>>>>>>
>>>>>>         
>>>>>>             
>>>>> It is a complete mystery to me too. I compiled my kernel into a deb
>>>>> package and installed the very same deb package on three machines:
>>>>> MSI p45 neo3 with Hardy on it -> works OK
>>>>> MSI p45 neo3 with Lucid on it -> nothing (works fine with regular kernel)
>>>>> MSI 945P with Lucid on it: -> nothing (works fine with regular kernel)
>>>>>
>>>>> I'll try the suggestions posted and keep you informed.
>>>>>       
>>>>>           
>>>> OK. Connected a terminal to catch early kernel messages. Still no output
>>>> unfortunately (with the regular kernel I do get output on the terminal,
>>>> so the connection works).
>>>>
>>>> Meanwhile also built and tested kernel 2.6.32.15 + xenomai 2.5.4. Still 
>>>> nothing.
>>>> I'm clueless. I've been running Xenomai for years on dozens of systems and I've
>>>> never run into problems like this. I think I'll have to sit down and take a
>>>> close look at what I'm doing. I've always built my kernels using make-kpkg,
>>>> maybe that somehow introduces a problem here. I'll try without it.
>>>>
>>>> (unfortunately/luckily I have to work from home for a few days so I can't
>>>> get to the test system until later this week)
>>>>     
>>>>         
>>> I have failed to reproduce the issue so far, but it very much looks like an
>>> I-pipe bug. Could you try the following config variants when time
>>> allows:
>>>   
>>>       
>> I installed the kernel (2.6.32.15 2.5.4 x86 32bit) which is working on
>> my laptop in a kvm machine.
>> In the virtual machine the kernel never starts and hangs.
>> I attached gdb to kvm and according to the cpu registers and system.map
>> it hangs in 'doublefault_fn'. As I'm not really familiar with gdb, I'd be
>> thankful if someone has a hint on how to proceed. Thanks
>>     
> If you could ask for a backtrace ("bt" command) in gdb once attached to
> the hung kernel, and post the output here, that would be great.
>


Hi Philippe, hope this helps:

(gdb) bt
#0  doublefault_fn () at arch/x86/kernel/doublefault_32.c:47
#1  0x00000000 in ?? ()

I set two breakpoints:
1) do_test_wp_bit()
2) zap_low_mappings()

The second breakpoint is never reached, the fault seems to happen in
do_test_wp_bit().
arch/x86/mm/init_32.c : mem_init() -> test_wp_bit() -> do_test_wp_bit()

Breakpoint 1, do_test_wp_bit () at arch/x86/mm/init_32.c:981
981             __asm__ __volatile__(
(gdb) info registers
eax            0xffdff000       -2101248
ecx            0x7fc    2044
edx            0x13e8025        20873253
ebx            0xff7fe000       -8396800
esp            0xc1345fc0       0xc1345fc0
ebp            0x3830   0x3830
esi            0x160    352
edi            0x48d    1165
eip            0xc101a308       0xc101a308 <do_test_wp_bit>
eflags         0x2      [ ]
cs             0x60     96
ss             0x68     104
ds             0x7b     123
es             0x7b     123
fs             0xd8     216
gs             0x0      0
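
For anyone reading the dump, here is a small Python sketch that decodes the segment selectors and eflags shown above. It is not part of the gdb session; the function names are made up for illustration, and only the architectural x86 bit layout is assumed (selector = index << 3 | TI << 2 | RPL, eflags bit 9 = IF).

```python
# Illustrative helpers (hypothetical names) for reading the register dump.

def decode_selector(sel):
    """Split a 16-bit x86 segment selector into (index, table, rpl)."""
    index = sel >> 3                      # descriptor table index
    table = "LDT" if sel & 0x4 else "GDT"  # TI bit selects the table
    rpl = sel & 0x3                       # requested privilege level
    return index, table, rpl

def interrupts_enabled(eflags):
    """EFLAGS bit 9 is IF, the interrupt-enable flag."""
    return bool(eflags & (1 << 9))

def as_signed32(value):
    """Reinterpret a 32-bit value as signed, like gdb's second column."""
    return value - (1 << 32) if value & (1 << 31) else value

# Values copied from the "info registers" output above.
SELECTORS = {"cs": 0x60, "ss": 0x68, "ds": 0x7b, "es": 0x7b,
             "fs": 0xd8, "gs": 0x0}

if __name__ == "__main__":
    for name, sel in SELECTORS.items():
        index, table, rpl = decode_selector(sel)
        print(f"{name}: index={index} table={table} rpl={rpl}")
    print("interrupts enabled:", interrupts_enabled(0x2))
    print("eax as signed:", as_signed32(0xffdff000))
```

Read this way, cs and ss are GDT entries with RPL 0, and eflags 0x2 has IF clear, so interrupts are off when the breakpoint at do_test_wp_bit() is hit.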

> Meanwhile, I tried to reproduce the issue in kvm with no luck so far.
> Aside from timing issues making the boot over kvm quite shaky and most of
> the time impossible with the APIC enabled, using a legacy 8254 mode
> boots but never hangs. Pure emulation with -no-kvm or enabling kvm on
> the host does not make a difference. I've been trying with a 32bit guest
> over a 64bit host, and both host and guest in 32bit mode to no avail so
> far (QEMU PC emulator version 0.12.3 (qemu-kvm-0.12.3)).
>
> I had a bit more luck on real hw though; a m65 Dell workstation (core2
> duo) seems to be kind enough to break during early boot. The failure
> ratio is variable, but 1 crash over 3-5 boots is common; sometimes it
> even crashes several times in a row. The bad news is that no rs232 is
> available from this machine, and the crash happens way too early to count
> on any usb<->serial converter to get any debug output; so this is going
> to take some time to nail down the bug on this hw. I don't expect
> netconsole to help me in any way either, for the same reason. Here is
> some more information I could get, though:
>
> - CONFIG_SMP, CONFIG_*_APIC/IO_APIC do not make any difference. I still
> have a kernel crashing against the wall in plain, basic uniprocessor
> mode (i.e. 8254 legacy IRQ and timing).
>
> - The very same kernel image does not break when booted via tftp here.
> It really seems to need a boot of the kernel image from the hard drive
> to get the issue. However, having the rootfs over NFS or on the hdd does
> not seem to make any difference. This could be the sign of a mishandled
> early access fault, which would be confirmed by your trace showing that
> the double fault handler is called.
>
> - CONFIG_IPIPE introduces the issue alone; no need for CONFIG_XENOMAI.
>
> Since you are lucky enough to reproduce the bug over kvm, could you
> confirm my findings on your setup? i.e. that CONFIG_SMP, CONFIG_*APIC*
> and CONFIG_XENOMAI are not involved in this?
>
> PS: At this point, I think this bug only occurs in 32bit mode, but this
> has to be verified.
>
> TIA,
>
>

_______________________________________________
Xenomai-help mailing list
[email protected]
https://mail.gna.org/listinfo/xenomai-help
