On Sun, 2007-10-07 at 18:40 +0200, Jan Kiszka wrote:
> Philippe Gerum wrote:
> > On Sun, 2007-10-07 at 17:27 +0200, Jan Kiszka wrote:
> >> This patch fixes another bug of I-pipe for 2.6.22:
> >> Due to the introduction of a pgd page cache (quicklist) into that
> >> kernel, __ipipe_pin_range_globally no longer addressed all spots that
> >> need to be updated after vmalloc'ed memory was mapped into the kernel
> >> address range. The result was that, after inserting modular Xenomai, new
> >> applications sometimes received an outdated pgd from the quicklist, and
> >> the next timer IRQ triggered a minor fault over xeno_nucleus. As
> >> handling faults inside non-root domains with the Linux handler doesn't
> >> fly, the box blew up sooner or later.
> > Good spot. That said, the page cache is fairly old stuff, introduced a
> > long time ago and already present in 2.6.10, so all patches featuring
> > support for disabling on-demand mapping have the same
> > problem.
> Indeed. But somehow the switch to quicklist or some other pieces of
> 2.6.22 must have changed the preconditions of this issue. I'm using
> Xenomai in modular form on my notebook for ages, but only got these
> lockups with 2.6.22.
We've been pretty lucky it seems, or most users end up compiling the
> Anyway, we should back-port my patch and also
> port it to the other archs.
When applicable, yes.
> >> So I've reworked __ipipe_pin_range_globally, basing it on pgd_list, the
> >> list of all pgd pages (in use or cached) in the system, and folding
> >> __ipipe_pin_range_mapping into it. That makes __ipipe_pin_range_globally
> >> an arch-specific thing from now on.
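A much-simplified user-space model of the idea, assuming a pgd is just an array of entries (0 meaning "not mapped") and pgd_list is a singly linked list holding every pgd page, whether attached to a process or parked in the cache. All names below (`struct pgd_page`, `reference_pgd`, `pin_range_globally`, `pin_range_in_use_only`) are hypothetical stand-ins, not the actual I-pipe or kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Tiny, hypothetical table size; a real i386 pgd has far more entries. */
#define PTRS_PER_PGD 16

/* Hypothetical model of one pgd page. */
struct pgd_page {
    unsigned long entry[PTRS_PER_PGD]; /* 0 means "not mapped" */
    int in_use;                        /* 1 = attached to an mm, 0 = parked in the cache */
    struct pgd_page *next;             /* link in the global list of all pgds */
};

/* Models the kernel's reference page directory. */
static unsigned long reference_pgd[PTRS_PER_PGD];

/* Models pgd_list: every pgd page, in use or cached. */
static struct pgd_page *pgd_list;

static void pgd_list_add(struct pgd_page *p)
{
    p->next = pgd_list;
    pgd_list = p;
}

/* Copy reference entries [first, last] into every pgd that lacks them,
 * including pages sitting in the cache. */
static void pin_range_globally(size_t first, size_t last)
{
    assert(first <= last && last < PTRS_PER_PGD);
    for (struct pgd_page *p = pgd_list; p != NULL; p = p->next)
        for (size_t i = first; i <= last; i++)
            if (p->entry[i] == 0)
                p->entry[i] = reference_pgd[i];
}

/* For contrast: a walk that only visits pgds currently in use leaves
 * cached pages stale, so whoever receives them next lacks the mapping. */
static void pin_range_in_use_only(size_t first, size_t last)
{
    assert(first <= last && last < PTRS_PER_PGD);
    for (struct pgd_page *p = pgd_list; p != NULL; p = p->next)
        if (p->in_use)
            for (size_t i = first; i <= last; i++)
                if (p->entry[i] == 0)
                    p->entry[i] = reference_pgd[i];
}
```

In this model, calling pin_range_in_use_only and then handing the cached page to a new process mimics the stale-pgd situation described above; pin_range_globally avoids it because the list also covers cached pages.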
> >> So far the quicklist is only biting us on i386, but I would suggest
> >> checking if/how we can apply this new pattern on other archs as well.
> >> Jan
> >> PS: UP is now stable with the latest Xenomai here, but SMP unfortunately
> >> still misbehaves (I suspect host timer issues).
> > I still have a problem with UP here, but this one is due to a Xenomai
> > bug: the host timer is no longer forwarded once the nucleus timer starts.
> > Does disabling NOHZ & HIRES get things working on your setup?
> Yes, I have HIRES on, and I guess that's the point. My current
> impression is that some bits are missing in Xenomai to migrate
> running hires timers from Linux's LAPIC clockevent device onto xntimers.
> The effect here is that CPU0 keeps going (probably due to higher timer
> load) while CPU1 stops scheduling timers:
> CPU  SCHEDULED  FIRED  TIMEOUT    INTERVAL    HANDLER      NAME
> 0    2729       2727   31168      -           NULL         [host-timer/0]
> 0    11         10     305103844  1000000000  xnpod_watch  [watchdog]
> 1    11         10     309365472  1000000000  xnpod_watch  [watchdog]
The issue I see seems to be different: I can reproduce the problem
in UP + PIT mode, with the LAPIC off.
Xenomai-core mailing list