Krzysztof Błaszkowski wrote:
> On Fri, 2010-08-20 at 11:54 +0200, Gilles Chanteperdrix wrote:
>> Krzysztof Błaszkowski wrote:
>>>> Yes, now if you find the culprit option, it would be nice to report here
>>>> so that we can fix the I-pipe patch.
>>> I still do not know it. All i have are two configs. One which does not work
>>> and one working. I have tried so far breaking working one and also
>>> fixing broken. Both attempts have been unsuccessful.
>>> I tried many "obvious" settings mainly in "processor type and features"
>>> with no luck.
>>> This process must take some time ( i can't spend whole days on trying
>>> one-by-one each difference, recompile kernel, sync target's rootfs,
>>> reboot target and run fork regression test even that many steps i have
>> ever heard about bisecting ?
> sure i had.
>> List the diffs between the two configs
>> apply half of them
>> if it still works, apply half of the rest
>> if it does not unapply half of the one you applied
>> if there are 65000 differences, you will get to the result in 16 steps.
>> you can keep the same rootfs, all you have to do is rebuild the kernel
>> (without "make clean", so that only what changed in the .config is
> i used to use more fine grained changes set until it made me tired.
> and as you may know, most changes in "processor features" lead to
> recompiling the whole kernel, so not cleaning won't save anything.
>> It should take just an hour or two.
> poss. but i don't think so.
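
(For what it is worth, the 16-step figure quoted above is just repeated
halving. A quick sketch in C, assuming a single culprit option and that
each build/test round rules out half of the remaining differences;
bisect_steps is a made-up helper name, not part of any tool.)

```c
#include <assert.h>

/*
 * Worst-case number of build/test rounds needed to isolate one culprit
 * among n candidate config differences, halving the set each round.
 * Hypothetical helper, for illustration only.
 */
static unsigned bisect_steps(unsigned long n)
{
    unsigned steps = 0;

    while (n > 1) {
        n = (n + 1) / 2;    /* worst case: the culprit is in the larger half */
        steps++;
    }
    return steps;
}
```

bisect_steps(65000) gives 16, since 2^15 < 65000 <= 2^16. This says
nothing about how long each rebuild takes, which is the actual point of
disagreement above.
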
Well actually, bisecting was not the right approach; debugging the
segfault directly worked much better. We have only two atfork handlers,
and one is in the posix skin, which Krzysztof's test does not use, so we
knew where the bug was...
Anyway, here is the explanation: it is a big fastsync bug. In the atfork
handler, we unmap the father's private semaphore heap, in order to map
the child's private semaphore heap. In order to find the heap size, the
unmapping code issues a system call which ends up looking for the thread
ppd, in order to find its private heap. The problem is that the thread
has not yet bound any skin, so it has no ppd, and it ends up using the
global ppd instead.
As long as the two heaps have the same size, it works. However, if they
have different sizes, we get a segmentation fault upon the call to munmap.
But that is not the worst: after unmapping the father's private heap, we
try to map the child's private heap. Here again, we issue the system
call, and here again, we get the global semaphore heap's data. This means
that the global semaphore heap gets used instead of the child's private
semaphore heap. That is definitely a bug, and would cause all sorts of
problems.
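
The munmap failure described above comes down to the length passed to
munmap not matching the one used at mmap time. One way to rule that out
is to record the size when mapping and reuse it when unmapping, instead
of looking it up again. A minimal sketch, assuming an anonymous mapping
as a stand-in for the semaphore heap; struct heap_mapping, heap_map and
heap_unmap are hypothetical names, not Xenomai code:

```c
#define _DEFAULT_SOURCE     /* for MAP_ANONYMOUS on strict compilers */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical bookkeeping record: remembers what was mapped and how big. */
struct heap_mapping {
    void *addr;
    size_t size;
};

/* Map 'size' bytes and record the size for the later unmap. */
static int heap_map(struct heap_mapping *m, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    m->addr = p;
    m->size = size;
    return 0;
}

/* Unmap using the recorded size; no lookup is needed at this point. */
static int heap_unmap(struct heap_mapping *m)
{
    if (m->addr == NULL)
        return 0;           /* nothing mapped */
    if (munmap(m->addr, m->size) != 0)
        return -1;
    m->addr = NULL;
    m->size = 0;
    return 0;
}
```

With this, the unmap path no longer depends on which ppd the kernel
finds for the calling thread.
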
Worse still, since the child process has not bound any skin, it is
unable to use skin services properly. Only the posix skin rebinds
itself at fork.
So, I propose the following fix:
- in the semaphore heaps atfork handler, unmap the private heap (using
the size which was used at map time), do not remap it.
- register atfork handlers for all skins, in order to rebind them after
fork, so that the skin services may be used in the child, the fork
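
The second item could look roughly like this for one skin. This is only
a sketch under assumptions: skin_bind stands in for whatever bind call
the skin really uses, and skin_bound, rebind_count, skin_init and
child_sees_rebind are names invented for the example. Only
pthread_atfork, fork and waitpid are real interfaces here.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int skin_bound;      /* hypothetical: 1 once this process has bound */
static int rebind_count;    /* hypothetical: rebinds done in this process */

/* Hypothetical stand-in for the real bind system call. */
static int skin_bind(void)
{
    skin_bound = 1;
    return 0;
}

/* Runs in the child right after fork: the father's binding is not ours. */
static void skin_atfork_child(void)
{
    skin_bound = 0;
    if (skin_bind() == 0)
        rebind_count++;
}

/* Bind once, then arrange for every forked child to rebind itself. */
static int skin_init(void)
{
    int err = skin_bind();

    if (err)
        return err;
    return pthread_atfork(NULL, NULL, skin_atfork_child);
}

/* Fork and report 1 if the child found itself rebound, 0 if not, -1 on error. */
static int child_sees_rebind(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)           /* child: the atfork handler has already run */
        _exit(skin_bound && rebind_count == 1 ? 0 : 1);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling skin_init() once and then child_sees_rebind() should return 1,
while rebind_count stays 0 in the father, since the handler only runs in
the child.
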
There are other issues to consider, such as detecting that a private
mutex created in the father continues to be used in the child.
Any comments, anyone?
Xenomai-core mailing list