[Xenomai-core] [PATCH] bug found in psos queue
Hello all, I found a bug in xenomai-2.4/src/skins/psos+/queue.c, in function q_vreceive, struct modifier is not corectly fill with argument data. Here the patch to fix this bug regards, Alexandre diff --git a/CREDITS b/CREDITS index f2ce6ca..6342a8c 100644 --- a/CREDITS +++ b/CREDITS @@ -142,3 +142,7 @@ D: Autoconf-related fixes. N: Frederic Villeneuve E: frederic.villene...@laposte.net D: udev support for message pipes. + +N: Alexandre Coffignal +E: alexandre.coffig...@cenosys.com +D: psos skin q_vreceive bug fix diff --git a/src/skins/psos+/queue.c b/src/skins/psos+/queue.c index 228f5bd..c54f966 100644 --- a/src/skins/psos+/queue.c +++ b/src/skins/psos+/queue.c @@ -82,6 +82,10 @@ u_long q_vreceive(u_long qid, u_long flags, u_long timeout, u_long flags; u_long timeout; } modifiers;/* Combine to fit into available arg space (i.e. 5) */ + + modifiers.flags=flags; + modifiers.timeout=timeout; + return XENOMAI_SKINCALL5(__psos_muxid, __psos_q_vreceive, qid, modifiers, msgbuf_r, buflen, msglen_r); } -- Alexandre COFFIGNAL, Chef de Projet Email: alexandre.coffign...@]cenosys.com mailto:alexandre.coffig...@cenosys.com 10, Rue Xavier Bichat F-72000 Le MANS web : http://www.cenosys.com ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PATCH v3 5/9] nucleus: Avoid returning errors from xnheap_destroy_mapped
On Sat, 2009-10-24 at 19:22 +0200, Philippe Gerum wrote: On Tue, 2009-10-20 at 13:37 +0200, Jan Kiszka wrote: Allowing xnheap_delete_mapped to return an error and then attempting to recover from it does not work out very well: Corner cases are racy, intransparent to the user, and proper error handling imposes a lot of complexity on the caller - if it actually bothers to check the return value... Fortunately, there is no reason for this function to fail: If the heap is still mapped, just install the provide cleanup handler and switch to deferred removal. If the unmapping fails, we either raced with some other caller of unmap or user space provided a bogus address, or something else is wrong. In any case, leaving the cleanup callback behind is the best we can do anyway. Removing the return value immediately allows to simplify the callers, namemly rt_queue_delete and rt_heap_delete. Note: This is still not 100% waterproof. If we issue xnheap_destroy_mapped from module cleanup passing a release handler that belongs to the module text, deferred release will cause a crash. But this corner case is no new regression, so let's keep the head in the sand. I agree with this one, eventually. This does make things clearer, and removes some opportunities for the upper interfaces to shot themselves in the foot. Merged, thanks. Well, actually, it does make things clearer, but it is broken. Enabling list debugging makes the nucleus pull the break after a double unlink in vmclose(). Basically, the issue is that calling rt_queue/heap_delete() explicitly from userland will break, due to the vmclose() handler being indirectly called by do_munmap() for the last mapping. The nasty thing is that without debugs on, kheapq is just silently trashed. Fix is on its way, along with nommu support for shared heaps as well. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- include/nucleus/heap.h|6 +++--- ksrc/nucleus/heap.c | 45 +++-- ksrc/skins/native/heap.c | 21 ++--- ksrc/skins/native/queue.c | 21 ++--- ksrc/skins/rtai/shm.c |6 +- 5 files changed, 43 insertions(+), 56 deletions(-) diff --git a/include/nucleus/heap.h b/include/nucleus/heap.h index ca691bf..44db738 100644 --- a/include/nucleus/heap.h +++ b/include/nucleus/heap.h @@ -204,9 +204,9 @@ int xnheap_init_mapped(xnheap_t *heap, u_long heapsize, int memflags); -int xnheap_destroy_mapped(xnheap_t *heap, - void (*release)(struct xnheap *heap), - void __user *mapaddr); +void xnheap_destroy_mapped(xnheap_t *heap, + void (*release)(struct xnheap *heap), + void __user *mapaddr); #define xnheap_mapped_offset(heap,ptr) \ (((caddr_t)(ptr)) - ((caddr_t)(heap)-archdep.heapbase)) diff --git a/ksrc/nucleus/heap.c b/ksrc/nucleus/heap.c index 4958daa..96c46f8 100644 --- a/ksrc/nucleus/heap.c +++ b/ksrc/nucleus/heap.c @@ -1173,42 +1173,51 @@ int xnheap_init_mapped(xnheap_t *heap, u_long heapsize, int memflags) return 0; } -int xnheap_destroy_mapped(xnheap_t *heap, void (*release)(struct xnheap *heap), - void __user *mapaddr) +void xnheap_destroy_mapped(xnheap_t *heap, + void (*release)(struct xnheap *heap), + void __user *mapaddr) { - int ret = 0, ccheck; unsigned long len; - ccheck = mapaddr ? 1 : 0; + /* +* Trying to unmap user memory without providing a release handler for +* deferred cleanup is a bug. +*/ + XENO_ASSERT(NUCLEUS, !mapaddr || release, /* nop */); spin_lock(kheapq_lock); - if (heap-archdep.numaps ccheck) { - heap-archdep.release = release; - spin_unlock(kheapq_lock); - return -EBUSY; - } - removeq(kheapq, heap-link); /* Prevent further mapping. */ + + heap-archdep.release = release; + + if (heap-archdep.numaps == 0) + mapaddr = NULL; /* nothing left to unmap */ + else + release = NULL; /* will be called by Linux on unmap */ + spin_unlock(kheapq_lock); len = xnheap_extentsize(heap); if (mapaddr) { down_write(current-mm-mmap_sem); - heap-archdep.release = NULL; - ret = do_munmap(current-mm, (unsigned long)mapaddr, len); + do_munmap(current-mm, (unsigned long)mapaddr, len); up_write(current-mm-mmap_sem); } - if (ret == 0) { + if (heap-archdep.numaps 0) { + /* The release handler is supposed to clean up the rest. */ + XENO_ASSERT(NUCLEUS, heap-archdep.release, /* nop */); + return; + } + + if (!mapaddr) {
Re: [Xenomai-core] [PATCH v3 5/9] nucleus: Avoid returning errors from xnheap_destroy_mapped
Philippe Gerum wrote: On Sat, 2009-10-24 at 19:22 +0200, Philippe Gerum wrote: On Tue, 2009-10-20 at 13:37 +0200, Jan Kiszka wrote: Allowing xnheap_delete_mapped to return an error and then attempting to recover from it does not work out very well: Corner cases are racy, intransparent to the user, and proper error handling imposes a lot of complexity on the caller - if it actually bothers to check the return value... Fortunately, there is no reason for this function to fail: If the heap is still mapped, just install the provide cleanup handler and switch to deferred removal. If the unmapping fails, we either raced with some other caller of unmap or user space provided a bogus address, or something else is wrong. In any case, leaving the cleanup callback behind is the best we can do anyway. Removing the return value immediately allows to simplify the callers, namemly rt_queue_delete and rt_heap_delete. Note: This is still not 100% waterproof. If we issue xnheap_destroy_mapped from module cleanup passing a release handler that belongs to the module text, deferred release will cause a crash. But this corner case is no new regression, so let's keep the head in the sand. I agree with this one, eventually. This does make things clearer, and removes some opportunities for the upper interfaces to shot themselves in the foot. Merged, thanks. Well, actually, it does make things clearer, but it is broken. Enabling list debugging makes the nucleus pull the break after a double unlink in vmclose(). Basically, the issue is that calling rt_queue/heap_delete() explicitly from userland will break, due to the vmclose() handler being indirectly called by do_munmap() for the last mapping. The nasty thing is that without debugs on, kheapq is just silently trashed. Fix is on its way, along with nommu support for shared heaps as well. OK, I see. Just on minor add-on to your fix: diff --git a/ksrc/nucleus/heap.c b/ksrc/nucleus/heap.c index ec14f73..1ae6af6 100644 --- a/ksrc/nucleus/heap.c +++ b/ksrc/nucleus/heap.c @@ -1241,6 +1241,7 @@ void xnheap_destroy_mapped(xnheap_t *heap, down_write(current-mm-mmap_sem); heap-archdep.release = NULL; do_munmap(current-mm, (unsigned long)mapaddr, len); + heap-archdep.release = release; up_write(current-mm-mmap_sem); } @@ -1252,7 +1253,6 @@ void xnheap_destroy_mapped(xnheap_t *heap, if (heap-archdep.numaps 0) { /* The release handler is supposed to clean up the rest. */ XENO_ASSERT(NUCLEUS, release != NULL, /* nop */); - heap-archdep.release = release; return; } This is safer than leaving a potential race window open between dropping mmap_sem and fixing up archdep.release again. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PATCH v3 5/9] nucleus: Avoid returning errors from xnheap_destroy_mapped
On Mon, 2009-11-02 at 17:41 +0100, Jan Kiszka wrote: Philippe Gerum wrote: On Sat, 2009-10-24 at 19:22 +0200, Philippe Gerum wrote: On Tue, 2009-10-20 at 13:37 +0200, Jan Kiszka wrote: Allowing xnheap_delete_mapped to return an error and then attempting to recover from it does not work out very well: Corner cases are racy, intransparent to the user, and proper error handling imposes a lot of complexity on the caller - if it actually bothers to check the return value... Fortunately, there is no reason for this function to fail: If the heap is still mapped, just install the provide cleanup handler and switch to deferred removal. If the unmapping fails, we either raced with some other caller of unmap or user space provided a bogus address, or something else is wrong. In any case, leaving the cleanup callback behind is the best we can do anyway. Removing the return value immediately allows to simplify the callers, namemly rt_queue_delete and rt_heap_delete. Note: This is still not 100% waterproof. If we issue xnheap_destroy_mapped from module cleanup passing a release handler that belongs to the module text, deferred release will cause a crash. But this corner case is no new regression, so let's keep the head in the sand. I agree with this one, eventually. This does make things clearer, and removes some opportunities for the upper interfaces to shot themselves in the foot. Merged, thanks. Well, actually, it does make things clearer, but it is broken. Enabling list debugging makes the nucleus pull the break after a double unlink in vmclose(). Basically, the issue is that calling rt_queue/heap_delete() explicitly from userland will break, due to the vmclose() handler being indirectly called by do_munmap() for the last mapping. The nasty thing is that without debugs on, kheapq is just silently trashed. Fix is on its way, along with nommu support for shared heaps as well. OK, I see. Just on minor add-on to your fix: diff --git a/ksrc/nucleus/heap.c b/ksrc/nucleus/heap.c index ec14f73..1ae6af6 100644 --- a/ksrc/nucleus/heap.c +++ b/ksrc/nucleus/heap.c @@ -1241,6 +1241,7 @@ void xnheap_destroy_mapped(xnheap_t *heap, down_write(current-mm-mmap_sem); heap-archdep.release = NULL; do_munmap(current-mm, (unsigned long)mapaddr, len); + heap-archdep.release = release; up_write(current-mm-mmap_sem); } @@ -1252,7 +1253,6 @@ void xnheap_destroy_mapped(xnheap_t *heap, if (heap-archdep.numaps 0) { /* The release handler is supposed to clean up the rest. */ XENO_ASSERT(NUCLEUS, release != NULL, /* nop */); - heap-archdep.release = release; return; } This is safer than leaving a potential race window open between dropping mmap_sem and fixing up archdep.release again. Actually, we have to hold the kheap lock, in case weird code starts mapping randomly from userland without getting a valid descriptor through a skin call. Jan -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PATCH v3 5/9] nucleus: Avoid returning errors from xnheap_destroy_mapped
Jan Kiszka wrote: Philippe Gerum wrote: On Mon, 2009-11-02 at 17:41 +0100, Jan Kiszka wrote: Philippe Gerum wrote: On Sat, 2009-10-24 at 19:22 +0200, Philippe Gerum wrote: On Tue, 2009-10-20 at 13:37 +0200, Jan Kiszka wrote: Allowing xnheap_delete_mapped to return an error and then attempting to recover from it does not work out very well: Corner cases are racy, intransparent to the user, and proper error handling imposes a lot of complexity on the caller - if it actually bothers to check the return value... Fortunately, there is no reason for this function to fail: If the heap is still mapped, just install the provide cleanup handler and switch to deferred removal. If the unmapping fails, we either raced with some other caller of unmap or user space provided a bogus address, or something else is wrong. In any case, leaving the cleanup callback behind is the best we can do anyway. Removing the return value immediately allows to simplify the callers, namemly rt_queue_delete and rt_heap_delete. Note: This is still not 100% waterproof. If we issue xnheap_destroy_mapped from module cleanup passing a release handler that belongs to the module text, deferred release will cause a crash. But this corner case is no new regression, so let's keep the head in the sand. I agree with this one, eventually. This does make things clearer, and removes some opportunities for the upper interfaces to shot themselves in the foot. Merged, thanks. Well, actually, it does make things clearer, but it is broken. Enabling list debugging makes the nucleus pull the break after a double unlink in vmclose(). Basically, the issue is that calling rt_queue/heap_delete() explicitly from userland will break, due to the vmclose() handler being indirectly called by do_munmap() for the last mapping. The nasty thing is that without debugs on, kheapq is just silently trashed. Fix is on its way, along with nommu support for shared heaps as well. OK, I see. Just on minor add-on to your fix: diff --git a/ksrc/nucleus/heap.c b/ksrc/nucleus/heap.c index ec14f73..1ae6af6 100644 --- a/ksrc/nucleus/heap.c +++ b/ksrc/nucleus/heap.c @@ -1241,6 +1241,7 @@ void xnheap_destroy_mapped(xnheap_t *heap, down_write(current-mm-mmap_sem); heap-archdep.release = NULL; do_munmap(current-mm, (unsigned long)mapaddr, len); + heap-archdep.release = release; up_write(current-mm-mmap_sem); } @@ -1252,7 +1253,6 @@ void xnheap_destroy_mapped(xnheap_t *heap, if (heap-archdep.numaps 0) { /* The release handler is supposed to clean up the rest. */ XENO_ASSERT(NUCLEUS, release != NULL, /* nop */); - heap-archdep.release = release; return; } This is safer than leaving a potential race window open between dropping mmap_sem and fixing up archdep.release again. Actually, we have to hold the kheap lock, in case weird code starts mapping randomly from userland without getting a valid descriptor through a skin call. Yep, that as well. Note that 6b1a185b46 doesn't obsolete my patch (pull it from my tree if you like). Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [PATCH v3 5/9] nucleus: Avoid returning errors from xnheap_destroy_mapped
On Mon, 2009-11-02 at 19:01 +0100, Jan Kiszka wrote: Jan Kiszka wrote: Philippe Gerum wrote: On Mon, 2009-11-02 at 17:41 +0100, Jan Kiszka wrote: Philippe Gerum wrote: On Sat, 2009-10-24 at 19:22 +0200, Philippe Gerum wrote: On Tue, 2009-10-20 at 13:37 +0200, Jan Kiszka wrote: Allowing xnheap_delete_mapped to return an error and then attempting to recover from it does not work out very well: Corner cases are racy, intransparent to the user, and proper error handling imposes a lot of complexity on the caller - if it actually bothers to check the return value... Fortunately, there is no reason for this function to fail: If the heap is still mapped, just install the provide cleanup handler and switch to deferred removal. If the unmapping fails, we either raced with some other caller of unmap or user space provided a bogus address, or something else is wrong. In any case, leaving the cleanup callback behind is the best we can do anyway. Removing the return value immediately allows to simplify the callers, namemly rt_queue_delete and rt_heap_delete. Note: This is still not 100% waterproof. If we issue xnheap_destroy_mapped from module cleanup passing a release handler that belongs to the module text, deferred release will cause a crash. But this corner case is no new regression, so let's keep the head in the sand. I agree with this one, eventually. This does make things clearer, and removes some opportunities for the upper interfaces to shot themselves in the foot. Merged, thanks. Well, actually, it does make things clearer, but it is broken. Enabling list debugging makes the nucleus pull the break after a double unlink in vmclose(). Basically, the issue is that calling rt_queue/heap_delete() explicitly from userland will break, due to the vmclose() handler being indirectly called by do_munmap() for the last mapping. The nasty thing is that without debugs on, kheapq is just silently trashed. Fix is on its way, along with nommu support for shared heaps as well. OK, I see. Just on minor add-on to your fix: diff --git a/ksrc/nucleus/heap.c b/ksrc/nucleus/heap.c index ec14f73..1ae6af6 100644 --- a/ksrc/nucleus/heap.c +++ b/ksrc/nucleus/heap.c @@ -1241,6 +1241,7 @@ void xnheap_destroy_mapped(xnheap_t *heap, down_write(current-mm-mmap_sem); heap-archdep.release = NULL; do_munmap(current-mm, (unsigned long)mapaddr, len); + heap-archdep.release = release; up_write(current-mm-mmap_sem); } @@ -1252,7 +1253,6 @@ void xnheap_destroy_mapped(xnheap_t *heap, if (heap-archdep.numaps 0) { /* The release handler is supposed to clean up the rest. */ XENO_ASSERT(NUCLEUS, release != NULL, /* nop */); - heap-archdep.release = release; return; } This is safer than leaving a potential race window open between dropping mmap_sem and fixing up archdep.release again. Actually, we have to hold the kheap lock, in case weird code starts mapping randomly from userland without getting a valid descriptor through a skin call. Yep, that as well. Note that 6b1a185b46 doesn't obsolete my patch (pull it from my tree if you like). Are you still referring to a race with the vmclose() handler? Jan -- Philippe. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] lots of mode switches in xenomai-head tree?
Hi Jan, we updated the git on Oct. 29 (3 days ago). We do use the posix skin, i.e., we use the xeno-config --posix-ldflags. This worked all fine without mode switches under Xenomai 2.4.8. My git does include 4a2cb7b817. I will try to reproduce the error in a test program. -Stefan On Nov 1, 2009, at 23:43, Jan Kiszka wrote: Stefan Schaal wrote: Hi, I am working with the latest xenomai-head tree (we need analogy for our NI board ...). Under Xenomai 2.4.8 our code did not have any mode switches. Using the xenomai-head, we get a lot of mode switches. Using he backtrace_symbols_fd, we get print-outs like: xsimulation[0x808553b] [0xe400] /usr/xenomai/lib/librtdk.so.0(assert_nrt+0x85)[0xb7fa2ea5] /usr/xenomai/lib/librtdk.so.0(__wrap_clock_gettime+0x17)[0xb7fa2ef7] xsimulation[0x807cd16] xsimulation[0x807d7fb] /usr/xenomai/lib/libnative.so.3[0xb7fab689] /lib/tls/i686/cmov/libpthread.so.0[0xb7f824ff] /lib/tls/i686/cmov/libc.so.6(clone+0x5e)[0xb7e8f49e] Which indicates that the wrapper for clock_gettime causes this trouble, which is also confirmed by commenting clock_gettime out, and the mode switches disappear. Maybe something that needs fixing? Do you wrap link against the POSIX library, ie. use that skin as well? If not, your code is actually using clock_gettime incorrectly as it then falls back to the Linux service which can trigger syscalls (or even deadlocks when the TSC is used). If you do use libpthread_rt, then my next question is if your work is based on today's git head or some older version not including 4a2cb7b817. Jan ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] lots of mode switches in xenomai-head tree?
Hi Jan, you pointer to the 4a2cb7b817 help! We had -lrtdk before - lpthread -lpthread_rt in our compile statement. Just in 2.4.8, this seems to make no difference. -Stefan On Nov 2, 2009, at 10:42, Stefan Schaal wrote: Hi Jan, we updated the git on Oct. 29 (3 days ago). We do use the posix skin, i.e., we use the xeno-config --posix-ldflags. This worked all fine without mode switches under Xenomai 2.4.8. My git does include 4a2cb7b817. I will try to reproduce the error in a test program. -Stefan On Nov 1, 2009, at 23:43, Jan Kiszka wrote: Stefan Schaal wrote: Hi, I am working with the latest xenomai-head tree (we need analogy for our NI board ...). Under Xenomai 2.4.8 our code did not have any mode switches. Using the xenomai-head, we get a lot of mode switches. Using he backtrace_symbols_fd, we get print-outs like: xsimulation[0x808553b] [0xe400] /usr/xenomai/lib/librtdk.so.0(assert_nrt+0x85)[0xb7fa2ea5] /usr/xenomai/lib/librtdk.so.0(__wrap_clock_gettime+0x17)[0xb7fa2ef7] xsimulation[0x807cd16] xsimulation[0x807d7fb] /usr/xenomai/lib/libnative.so.3[0xb7fab689] /lib/tls/i686/cmov/libpthread.so.0[0xb7f824ff] /lib/tls/i686/cmov/libc.so.6(clone+0x5e)[0xb7e8f49e] Which indicates that the wrapper for clock_gettime causes this trouble, which is also confirmed by commenting clock_gettime out, and the mode switches disappear. Maybe something that needs fixing? Do you wrap link against the POSIX library, ie. use that skin as well? If not, your code is actually using clock_gettime incorrectly as it then falls back to the Linux service which can trigger syscalls (or even deadlocks when the TSC is used). If you do use libpthread_rt, then my next question is if your work is based on today's git head or some older version not including 4a2cb7b817. Jan ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] lots of mode switches in xenomai-head tree?
Stefan Schaal wrote: Hi Jan, you pointer to the 4a2cb7b817 help! We had -lrtdk before - lpthread -lpthread_rt in our compile statement. Just in 2.4.8, this seems to make no difference. Do you use the wrap-link.sh script, or is this order change unrelated? -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core