Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
GIT version control wrote: Module: xenomai-jki Branch: for-upstream Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 Author: Jan Kiszka jan.kis...@siemens.com Date: Wed Apr 28 15:08:11 2010 +0200 native: Improve fault tolerance /wrt multiple task deletions As we may pass the pthread handle of an RT_TASK directly to glibc, we may trigger a SIGSEGV if the underlying thread was already terminated. Try to catch this application mistakes by clearing the handle at least in that task descriptor which successfully ran rt_task_delete or rt_task_join. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Ok. I have tested this patch (though I could not find whether it was discussed on the mailing list). And in fact, it looks to me like it turns an application error into a silently working application. So, I would propose: - to use 0x8 instead of 0, to cause a segfault if an invalid handle is used, without any risk of side effect (such as killing or joining the wrong task if ever a pthread_t is reused) - if Xenomai is compiled with --enable-debug, return -EINVAL, or use an assert if such a value is encountered. -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
Gilles Chanteperdrix wrote: GIT version control wrote: Module: xenomai-jki Branch: for-upstream Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 Author: Jan Kiszka jan.kis...@siemens.com Date: Wed Apr 28 15:08:11 2010 +0200 native: Improve fault tolerance /wrt multiple task deletions As we may pass the pthread handle of an RT_TASK directly to glibc, we may trigger a SIGSEGV if the underlying thread was already terminated. Try to catch this application mistakes by clearing the handle at least in that task descriptor which successfully ran rt_task_delete or rt_task_join. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Ok. I have tested this patch (though I could not find whether it was discussed on the mailing list). And in fact, it looks to me like it turns an application error into a silently working application. Then there is probably something broken: rt_task_delete is supposed to return -EIDRM of the passed handle no longer exists. That's at least what the doc says. The point of this patch is to turn an application crash into a proper error return value (and that not only for --enable-debug). Jan signature.asc Description: OpenPGP digital signature ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
Jan Kiszka wrote: Gilles Chanteperdrix wrote: GIT version control wrote: Module: xenomai-jki Branch: for-upstream Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 Author: Jan Kiszka jan.kis...@siemens.com Date: Wed Apr 28 15:08:11 2010 +0200 native: Improve fault tolerance /wrt multiple task deletions As we may pass the pthread handle of an RT_TASK directly to glibc, we may trigger a SIGSEGV if the underlying thread was already terminated. Try to catch this application mistakes by clearing the handle at least in that task descriptor which successfully ran rt_task_delete or rt_task_join. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Ok. I have tested this patch (though I could not find whether it was discussed on the mailing list). And in fact, it looks to me like it turns an application error into a silently working application. Then there is probably something broken: rt_task_delete is supposed to return -EIDRM of the passed handle no longer exists. That's at least what the doc says. The point of this patch is to turn an application crash into a proper error return value (and that not only for --enable-debug). Here is the test I used: #include stdio.h #include native/task.h #include sys/mman.h void task_main(void* arg) { rt_task_sleep(10); } int main(void) { RT_TASK task; mlockall(MCL_CURRENT|MCL_FUTURE); rt_task_create(task, task, 128*1024, 99, T_FPU|T_JOINABLE); rt_task_start(task, task_main, NULL); fprintf(stderr, join: %d\n, rt_task_join(task)); fprintf(stderr, delete: %d\n, rt_task_delete(task)); } it prints: join: 0 delete: 0 This said, I like the segfault, because people probably never check the return value of rt_task_join/rt_task_delete (which is, I guess, the reason why phtread_cancel and pthread_join segfault themselves, because the posix spec allows to return ESRCH in that case). -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
Gilles Chanteperdrix wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: GIT version control wrote: Module: xenomai-jki Branch: for-upstream Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 Author: Jan Kiszka jan.kis...@siemens.com Date: Wed Apr 28 15:08:11 2010 +0200 native: Improve fault tolerance /wrt multiple task deletions As we may pass the pthread handle of an RT_TASK directly to glibc, we may trigger a SIGSEGV if the underlying thread was already terminated. Try to catch this application mistakes by clearing the handle at least in that task descriptor which successfully ran rt_task_delete or rt_task_join. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Ok. I have tested this patch (though I could not find whether it was discussed on the mailing list). And in fact, it looks to me like it turns an application error into a silently working application. Then there is probably something broken: rt_task_delete is supposed to return -EIDRM of the passed handle no longer exists. That's at least what the doc says. The point of this patch is to turn an application crash into a proper error return value (and that not only for --enable-debug). Here is the test I used: #include stdio.h #include native/task.h #include sys/mman.h void task_main(void* arg) { rt_task_sleep(10); } int main(void) { RT_TASK task; mlockall(MCL_CURRENT|MCL_FUTURE); rt_task_create(task, task, 128*1024, 99, T_FPU|T_JOINABLE); rt_task_start(task, task_main, NULL); fprintf(stderr, join: %d\n, rt_task_join(task)); fprintf(stderr, delete: %d\n, rt_task_delete(task)); } it prints: join: 0 delete: 0 This said, I like the segfault, because people probably never check the return value of rt_task_join/rt_task_delete (which is, I guess, the reason why phtread_cancel and pthread_join segfault themselves, because the posix spec allows to return ESRCH in that case). Well, that's kind of hard to sell to people writing software based on documentation. We documented -EIDRM, so we should make it work like this. Will look into this later. The fact that pthread_cancel /may/ crash on invalid thread handles is an implementation detail: The handle is a pointer to an object that /may/ no longer exist in memory when dereferenced. Jan signature.asc Description: OpenPGP digital signature ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: GIT version control wrote: Module: xenomai-jki Branch: for-upstream Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 Author: Jan Kiszka jan.kis...@siemens.com Date: Wed Apr 28 15:08:11 2010 +0200 native: Improve fault tolerance /wrt multiple task deletions As we may pass the pthread handle of an RT_TASK directly to glibc, we may trigger a SIGSEGV if the underlying thread was already terminated. Try to catch this application mistakes by clearing the handle at least in that task descriptor which successfully ran rt_task_delete or rt_task_join. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Ok. I have tested this patch (though I could not find whether it was discussed on the mailing list). And in fact, it looks to me like it turns an application error into a silently working application. Then there is probably something broken: rt_task_delete is supposed to return -EIDRM of the passed handle no longer exists. That's at least what the doc says. The point of this patch is to turn an application crash into a proper error return value (and that not only for --enable-debug). Here is the test I used: #include stdio.h #include native/task.h #include sys/mman.h void task_main(void* arg) { rt_task_sleep(10); } int main(void) { RT_TASK task; mlockall(MCL_CURRENT|MCL_FUTURE); rt_task_create(task, task, 128*1024, 99, T_FPU|T_JOINABLE); rt_task_start(task, task_main, NULL); fprintf(stderr, join: %d\n, rt_task_join(task)); fprintf(stderr, delete: %d\n, rt_task_delete(task)); } it prints: join: 0 delete: 0 This said, I like the segfault, because people probably never check the return value of rt_task_join/rt_task_delete (which is, I guess, the reason why phtread_cancel and pthread_join segfault themselves, because the posix spec allows to return ESRCH in that case). Well, that's kind of hard to sell to people writing software based on documentation. We documented -EIDRM, so we should make it work like this. Will look into this later. The fact that pthread_cancel /may/ crash on invalid thread handles is an implementation detail: The handle is a pointer to an object that /may/ no longer exist in memory when dereferenced. Ok, another point is that when cancelling a joinable thread, pthread_join should still be called after pthread_cancel. So, rt_task_join should still be able to call pthread_join after a call to rt_task_delete. Running pthread_join inside rt_task_delete is not an option, as there would be a risk for deadlocks. -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions
Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: Jan Kiszka wrote: Gilles Chanteperdrix wrote: GIT version control wrote: Module: xenomai-jki Branch: for-upstream Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 URL: http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705 Author: Jan Kiszka jan.kis...@siemens.com Date: Wed Apr 28 15:08:11 2010 +0200 native: Improve fault tolerance /wrt multiple task deletions As we may pass the pthread handle of an RT_TASK directly to glibc, we may trigger a SIGSEGV if the underlying thread was already terminated. Try to catch this application mistakes by clearing the handle at least in that task descriptor which successfully ran rt_task_delete or rt_task_join. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Ok. I have tested this patch (though I could not find whether it was discussed on the mailing list). And in fact, it looks to me like it turns an application error into a silently working application. Then there is probably something broken: rt_task_delete is supposed to return -EIDRM of the passed handle no longer exists. That's at least what the doc says. The point of this patch is to turn an application crash into a proper error return value (and that not only for --enable-debug). Here is the test I used: #include stdio.h #include native/task.h #include sys/mman.h void task_main(void* arg) { rt_task_sleep(10); } int main(void) { RT_TASK task; mlockall(MCL_CURRENT|MCL_FUTURE); rt_task_create(task, task, 128*1024, 99, T_FPU|T_JOINABLE); rt_task_start(task, task_main, NULL); fprintf(stderr, join: %d\n, rt_task_join(task)); fprintf(stderr, delete: %d\n, rt_task_delete(task)); } it prints: join: 0 delete: 0 This said, I like the segfault, because people probably never check the return value of rt_task_join/rt_task_delete (which is, I guess, the reason why phtread_cancel and pthread_join segfault themselves, because the posix spec allows to return ESRCH in that case). Well, that's kind of hard to sell to people writing software based on documentation. We documented -EIDRM, so we should make it work like this. Will look into this later. The fact that pthread_cancel /may/ crash on invalid thread handles is an implementation detail: The handle is a pointer to an object that /may/ no longer exist in memory when dereferenced. Ok, another point is that when cancelling a joinable thread, pthread_join should still be called after pthread_cancel. So, rt_task_join should still be able to call pthread_join after a call to rt_task_delete. Running pthread_join inside rt_task_delete is not an option, as there would be a risk for deadlocks. So we should always pthread_detach before canceling a thread in rt_task_delete? Err, after, but only if it was joinable. Think we need to track T_JOINABLE in user space as well. Jan signature.asc Description: OpenPGP digital signature ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core