Philippe Gerum wrote:
> On Sun, 2007-10-07 at 17:27 +0200, Jan Kiszka wrote:
>> This patch fixes another bug of I-pipe for 2.6.22:
>> Due to the introduction of a pgd page cache (quicklist) into that
>> kernel, __ipipe_pin_range_globally no longer addressed all spots that
>> need to be updated after vmalloc'ed memory was mapped into the kernel
>> address range. The result was that, after inserting modular Xenomai, new
>> applications sometimes received an outdated pgd from the quicklist, and
>> the next timer IRQ triggered a minor fault over xeno_nucleus. As
>> handling faults inside non-root domains with the Linux handler doesn't
>> fly, the box blew up sooner or later.
> Good spot. That said, the page cache is fairly old stuff, introduced a
> long time ago and already present in 2.6.10, so this means that all
> patches featuring the on-demand mapping disable support do have the same
> problem.

Indeed. But somehow the switch to quicklist or some other piece of
2.6.22 must have changed the preconditions of this issue. I have been
using Xenomai in modular form for ages on my notebook but only got these
lockups with 2.6.22. Anyway, we should back-port my patch and also
spread it to the other archs.

>> So I've reworked __ipipe_pin_range_globally, basing it on pgd_list, the
>> list of all pgd pages (in use or cached) in the system, and folding
>> __ipipe_pin_range_mapping into it. That makes __ipipe_pin_range_globally
>> an arch-specific thing from now on.
>> So far the quicklist is only biting us on i386, but I would suggest
>> checking if/how we can apply this new pattern on other archs as well.
>> Jan
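
The reworked approach described above (walk every pgd page in the system, whether in use or cached, and bring its kernel entries in line with a reference pgd) can be modelled in user space roughly as follows. This is only a sketch under stated assumptions, not the code from the patch: `struct pgd_page`, `pgd_index()` and `pin_range_globally()` are hypothetical stand-ins, the list is a plain singly linked list rather than the kernel's pgd_list, and the constants assume i386 without PAE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PGD 1024  /* assumed: i386 without PAE */
#define PGDIR_SHIFT  22    /* assumed: each top-level entry covers 4 MiB */

typedef unsigned long pgd_entry_t;

/* Hypothetical model of one pgd page on a global list. */
struct pgd_page {
    pgd_entry_t entries[PTRS_PER_PGD];
    int cached;              /* 1 if the page sits in a quicklist-like cache */
    struct pgd_page *next;   /* next pgd page, in use or cached */
};

/* Index of the top-level entry that maps a 32-bit virtual address. */
size_t pgd_index(uint32_t addr)
{
    return addr >> PGDIR_SHIFT;
}

/*
 * Copy the reference entries covering [start, end) into every pgd page
 * on the list. Cached pages are deliberately not skipped: a pgd handed
 * out from the cache later must already carry the new mappings.
 * Returns the number of entries that had to be updated.
 */
int pin_range_globally(struct pgd_page *list, const struct pgd_page *ref,
                       uint32_t start, uint32_t end)
{
    int updated = 0;
    size_t first, last, i;
    struct pgd_page *p;

    assert(start < end);
    first = pgd_index(start);
    last = pgd_index(end - 1);

    for (p = list; p != NULL; p = p->next) {
        for (i = first; i <= last; i++) {
            if (p->entries[i] != ref->entries[i]) {
                p->entries[i] = ref->entries[i];
                updated++;
            }
        }
    }
    return updated;
}
```

The point of the model is the loop over the whole list: an update that only visited the pgds of live address spaces would leave cached pages stale, which is the failure mode described at the top of this thread.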
>> PS: UP is now stable with latest Xenomai here, but SMP unfortunately
>> still misbehaves (I suspect host timer issues).
> I still have a problem with UP here, but this one is due to a Xenomai
> bug -- the host timer is no longer forwarded when the nucleus timer starts.
> Does disabling NOHZ & HIRES get things working on your setup?

Yes, I have HIRES on, and I guess that's the point: My current
impression is that some bits are missing in Xenomai to migrate
running hires timers from Linux's lapic clockevent device over to xntimers.
The effect here is that CPU0 continues (probably due to higher timer
load) while CPU1 stops scheduling timers:

0    2729        2727        31168      -          NULL         [host-timer/0]
0    11          10          305103844  1000000000  xnpod_watch  [watchdog]
1    11          10          309365472  1000000000  xnpod_watch  [watchdog]


Xenomai-core mailing list