Philippe Gerum wrote:
> On Sun, 2007-10-07 at 17:27 +0200, Jan Kiszka wrote:
>> This patch fixes another bug of I-pipe for 2.6.22:
>>
>> Due to the introduction of a pgd page cache (quicklist) into that
>> kernel, __ipipe_pin_range_globally no longer addressed all spots that
>> need to be updated after vmalloc'ed memory was mapped into the kernel
>> address range. The result was that, after inserting modular Xenomai, new
>> application sometimes received an outdated pgd from the quicklist, and
>> the next timer IRQ triggered a minor fault over xeno_nucleus. As
>> handling faults inside non-root domains with the Linux handler doesn't
>> fly, the box blew up sooner or later.
>>
>
> Good spot. This said, the page cache is fairly old stuff, introduced a
> long time ago and already present in 2.6.10, so this means that all
> patches featuring the on-demand mapping disable support do have the same
> problem.

Indeed. But somehow the switch to quicklist or some other pieces of
2.6.22 must have changed the preconditions of this issue. I have been
using Xenomai in modular form for ages on my notebook, but I only got
these lockups with 2.6.22.

Anyway, we should back-port my patch and also spread it to the other
archs.

>
>> So I've reworked __ipipe_pin_range_globally, basing it on pgd_list, the
>> list of all pgd pages (in use or cached) in the system, and folding
>> __ipipe_pin_range_mapping into it. That makes __ipipe_pin_range_globally
>> an arch-specific thing from now on.
>>
>> So far the quicklist is only biting us on i386, but I would suggest to
>> check if/how we can apply this new pattern on other archs as well.
>>
>> Jan
>>
>> PS: UP is now stable with latest Xenomai here, but SMP unfortunately
>> still misbehaves (I suspect host timer issues).
>>
>
> I still have a problem with UP here, but this one is due to a Xenomai
> bug -- host timer is no more forwarded when the nucleus timer starts.
> Does disabling NOHZ & HIRES get things working on your setup?
>

Yes, I have HIRES on, and I guess that's the point: My current
impression is that some bits are missing in Xenomai to migrate running
hires timers from Linux's lapic clockevent device over to xntimers. The
effect here is that CPU0 continues (probably due to higher timer load)
while CPU1 stops scheduling timers:

CPU  SCHEDULED  FIRED  TIMEOUT    INTERVAL    HANDLER      NAME
0    2729       2727   31168      -           NULL         [host-timer/0]
0    11         10     305103844  1000000000  xnpod_watch  [watchdog]
1    11         10     309365472  1000000000  xnpod_watch  [watchdog]

Jan
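
For readers following the thread, here is a minimal user-space sketch of
the idea behind walking a list of all pgd pages. It is not the patch
itself and contains no kernel code: struct toy_pgd, init_pgd, all_pgds,
PTRS_PER_PGD = 8 and the toy_* helpers are all hypothetical stand-ins
chosen for illustration. The point it tries to show is that a pgd parked
in a cache must be updated too, otherwise whoever gets it next starts
with stale kernel mappings.

```c
#include <assert.h>
#include <string.h>

#define PTRS_PER_PGD 8            /* toy size, not the real i386 value */

typedef unsigned long pgd_entry_t;

struct toy_pgd {
    pgd_entry_t entry[PTRS_PER_PGD];
    int cached;                   /* 1 = parked in the page cache, 0 = in use */
    struct toy_pgd *next;         /* link in the list of all pgds */
};

static struct toy_pgd init_pgd;   /* stands in for the kernel master table */
static struct toy_pgd *all_pgds;  /* stands in for a list of all pgd pages */

/* Put a pgd on the global list, whether it is in use or merely cached. */
static void toy_pgd_register(struct toy_pgd *pgd, int cached)
{
    memset(pgd->entry, 0, sizeof(pgd->entry));
    pgd->cached = cached;
    pgd->next = all_pgds;
    all_pgds = pgd;
}

/* Count listed pgds whose entries in [first, last] differ from the master. */
static int toy_count_stale(unsigned first, unsigned last)
{
    int stale = 0;

    assert(first <= last && last < PTRS_PER_PGD);
    for (struct toy_pgd *p = all_pgds; p; p = p->next) {
        for (unsigned i = first; i <= last; i++) {
            if (p->entry[i] != init_pgd.entry[i]) {
                stale++;
                break;
            }
        }
    }
    return stale;
}

/*
 * Copy master entries [first, last] into every listed pgd, cached ones
 * included. Returns the number of pgds visited.
 */
static int toy_pin_range_globally(unsigned first, unsigned last)
{
    int visited = 0;

    assert(first <= last && last < PTRS_PER_PGD);
    for (struct toy_pgd *p = all_pgds; p; p = p->next) {
        for (unsigned i = first; i <= last; i++)
            p->entry[i] = init_pgd.entry[i];
        visited++;
    }
    return visited;
}
```

In this toy model, registering one in-use and one cached pgd, changing a
master entry, and then calling toy_pin_range_globally() on that slot
leaves no stale table behind. A walk that skipped the cached pgd would
leave exactly the kind of outdated table described above.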
_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core