Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions

2010-05-01 Thread Gilles Chanteperdrix
GIT version control wrote:
 Module: xenomai-jki
 Branch: for-upstream
 Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
 URL:
 http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
 
 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Wed Apr 28 15:08:11 2010 +0200
 
 native: Improve fault tolerance /wrt multiple task deletions
 
 As we may pass the pthread handle of an RT_TASK directly to glibc, we
 may trigger a SIGSEGV if the underlying thread was already terminated.
 Try to catch this application mistakes by clearing the handle at least
 in that task descriptor which successfully ran rt_task_delete or
 rt_task_join.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Ok. I have tested this patch (though I could not find whether it was
discussed on the mailing list). And in fact, it looks to me like it
turns an application error into a silently working application. So, I
would propose:
- to use 0x8 instead of 0, to cause a segfault if an invalid handle is
used, without any risk of side effect (such as killing or joining the
wrong task if ever a pthread_t is reused)
- if Xenomai is compiled with --enable-debug, return -EINVAL, or use an
assert if such a value is encountered.


-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions

2010-05-01 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
 GIT version control wrote:
 Module: xenomai-jki
 Branch: for-upstream
 Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
 URL:
 http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705

 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Wed Apr 28 15:08:11 2010 +0200

 native: Improve fault tolerance /wrt multiple task deletions

 As we may pass the pthread handle of an RT_TASK directly to glibc, we
 may trigger a SIGSEGV if the underlying thread was already terminated.
 Try to catch this application mistakes by clearing the handle at least
 in that task descriptor which successfully ran rt_task_delete or
 rt_task_join.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 
 Ok. I have tested this patch (though I could not find whether it was
 discussed on the mailing list). And in fact, it looks to me like it
 turns an application error into a silently working application.

Then there is probably something broken: rt_task_delete is supposed to
return -EIDRM of the passed handle no longer exists. That's at least
what the doc says. The point of this patch is to turn an application
crash into a proper error return value (and that not only for
--enable-debug).

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions

2010-05-01 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 GIT version control wrote:
 Module: xenomai-jki
 Branch: for-upstream
 Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
 URL:
 http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705

 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Wed Apr 28 15:08:11 2010 +0200

 native: Improve fault tolerance /wrt multiple task deletions

 As we may pass the pthread handle of an RT_TASK directly to glibc, we
 may trigger a SIGSEGV if the underlying thread was already terminated.
 Try to catch this application mistakes by clearing the handle at least
 in that task descriptor which successfully ran rt_task_delete or
 rt_task_join.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 Ok. I have tested this patch (though I could not find whether it was
 discussed on the mailing list). And in fact, it looks to me like it
 turns an application error into a silently working application.
 
 Then there is probably something broken: rt_task_delete is supposed to
 return -EIDRM of the passed handle no longer exists. That's at least
 what the doc says. The point of this patch is to turn an application
 crash into a proper error return value (and that not only for
 --enable-debug).

Here is the test I used:
#include stdio.h
#include native/task.h
#include sys/mman.h

void task_main(void* arg)
{
rt_task_sleep(10);
}

int main(void)
{
 RT_TASK task;

 mlockall(MCL_CURRENT|MCL_FUTURE);
 rt_task_create(task, task, 128*1024, 99, T_FPU|T_JOINABLE);
 rt_task_start(task, task_main, NULL);
 fprintf(stderr, join: %d\n, rt_task_join(task));
 fprintf(stderr, delete: %d\n, rt_task_delete(task));
}

it prints:
join: 0
delete: 0

This said, I like the segfault, because people probably never check the
return value of rt_task_join/rt_task_delete (which is, I guess, the
reason why phtread_cancel and pthread_join segfault themselves, because
the posix spec allows to return ESRCH in that case).

-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions

2010-05-01 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 GIT version control wrote:
 Module: xenomai-jki
 Branch: for-upstream
 Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
 URL:
 http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705

 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Wed Apr 28 15:08:11 2010 +0200

 native: Improve fault tolerance /wrt multiple task deletions

 As we may pass the pthread handle of an RT_TASK directly to glibc, we
 may trigger a SIGSEGV if the underlying thread was already terminated.
 Try to catch this application mistakes by clearing the handle at least
 in that task descriptor which successfully ran rt_task_delete or
 rt_task_join.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 Ok. I have tested this patch (though I could not find whether it was
 discussed on the mailing list). And in fact, it looks to me like it
 turns an application error into a silently working application.
 Then there is probably something broken: rt_task_delete is supposed to
 return -EIDRM of the passed handle no longer exists. That's at least
 what the doc says. The point of this patch is to turn an application
 crash into a proper error return value (and that not only for
 --enable-debug).
 
 Here is the test I used:
 #include stdio.h
 #include native/task.h
 #include sys/mman.h
 
 void task_main(void* arg)
 {
 rt_task_sleep(10);
 }
 
 int main(void)
 {
  RT_TASK task;
 
  mlockall(MCL_CURRENT|MCL_FUTURE);
  rt_task_create(task, task, 128*1024, 99, T_FPU|T_JOINABLE);
  rt_task_start(task, task_main, NULL);
  fprintf(stderr, join: %d\n, rt_task_join(task));
  fprintf(stderr, delete: %d\n, rt_task_delete(task));
 }
 
 it prints:
 join: 0
 delete: 0
 
 This said, I like the segfault, because people probably never check the
 return value of rt_task_join/rt_task_delete (which is, I guess, the
 reason why phtread_cancel and pthread_join segfault themselves, because
 the posix spec allows to return ESRCH in that case).

Well, that's kind of hard to sell to people writing software based on
documentation. We documented -EIDRM, so we should make it work like
this. Will look into this later.

The fact that pthread_cancel /may/ crash on invalid thread handles is an
implementation detail: The handle is a pointer to an object that /may/
no longer exist in memory when dereferenced.

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions

2010-05-01 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 GIT version control wrote:
 Module: xenomai-jki
 Branch: for-upstream
 Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
 URL:
 http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705

 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Wed Apr 28 15:08:11 2010 +0200

 native: Improve fault tolerance /wrt multiple task deletions

 As we may pass the pthread handle of an RT_TASK directly to glibc, we
 may trigger a SIGSEGV if the underlying thread was already terminated.
 Try to catch this application mistakes by clearing the handle at least
 in that task descriptor which successfully ran rt_task_delete or
 rt_task_join.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 Ok. I have tested this patch (though I could not find whether it was
 discussed on the mailing list). And in fact, it looks to me like it
 turns an application error into a silently working application.
 Then there is probably something broken: rt_task_delete is supposed to
 return -EIDRM of the passed handle no longer exists. That's at least
 what the doc says. The point of this patch is to turn an application
 crash into a proper error return value (and that not only for
 --enable-debug).
 Here is the test I used:
 #include stdio.h
 #include native/task.h
 #include sys/mman.h

 void task_main(void* arg)
 {
 rt_task_sleep(10);
 }

 int main(void)
 {
  RT_TASK task;

  mlockall(MCL_CURRENT|MCL_FUTURE);
  rt_task_create(task, task, 128*1024, 99, T_FPU|T_JOINABLE);
  rt_task_start(task, task_main, NULL);
  fprintf(stderr, join: %d\n, rt_task_join(task));
  fprintf(stderr, delete: %d\n, rt_task_delete(task));
 }

 it prints:
 join: 0
 delete: 0

 This said, I like the segfault, because people probably never check the
 return value of rt_task_join/rt_task_delete (which is, I guess, the
 reason why phtread_cancel and pthread_join segfault themselves, because
 the posix spec allows to return ESRCH in that case).
 
 Well, that's kind of hard to sell to people writing software based on
 documentation. We documented -EIDRM, so we should make it work like
 this. Will look into this later.
 
 The fact that pthread_cancel /may/ crash on invalid thread handles is an
 implementation detail: The handle is a pointer to an object that /may/
 no longer exist in memory when dereferenced.

Ok, another point is that when cancelling a joinable thread,
pthread_join should still be called after pthread_cancel. So,
rt_task_join should still be able to call pthread_join after a call to
rt_task_delete.

Running pthread_join inside rt_task_delete is not an option, as there
would be a risk for deadlocks.

-- 
Gilles.

___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] [Xenomai-git] Jan Kiszka : native: Improve fault tolerance /wrt multiple task deletions

2010-05-01 Thread Jan Kiszka
Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 Jan Kiszka wrote:
 Gilles Chanteperdrix wrote:
 GIT version control wrote:
 Module: xenomai-jki
 Branch: for-upstream
 Commit: 5d2fa6c7578683e036d88bc6dbb6a7f458dfe705
 URL:
 http://git.xenomai.org/?p=xenomai-jki.git;a=commit;h=5d2fa6c7578683e036d88bc6dbb6a7f458dfe705

 Author: Jan Kiszka jan.kis...@siemens.com
 Date:   Wed Apr 28 15:08:11 2010 +0200

 native: Improve fault tolerance /wrt multiple task deletions

 As we may pass the pthread handle of an RT_TASK directly to glibc, we
 may trigger a SIGSEGV if the underlying thread was already terminated.
 Try to catch this application mistakes by clearing the handle at least
 in that task descriptor which successfully ran rt_task_delete or
 rt_task_join.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 Ok. I have tested this patch (though I could not find whether it was
 discussed on the mailing list). And in fact, it looks to me like it
 turns an application error into a silently working application.
 Then there is probably something broken: rt_task_delete is supposed to
 return -EIDRM of the passed handle no longer exists. That's at least
 what the doc says. The point of this patch is to turn an application
 crash into a proper error return value (and that not only for
 --enable-debug).
 Here is the test I used:
 #include stdio.h
 #include native/task.h
 #include sys/mman.h

 void task_main(void* arg)
 {
 rt_task_sleep(10);
 }

 int main(void)
 {
  RT_TASK task;

  mlockall(MCL_CURRENT|MCL_FUTURE);
  rt_task_create(task, task, 128*1024, 99, T_FPU|T_JOINABLE);
  rt_task_start(task, task_main, NULL);
  fprintf(stderr, join: %d\n, rt_task_join(task));
  fprintf(stderr, delete: %d\n, rt_task_delete(task));
 }

 it prints:
 join: 0
 delete: 0

 This said, I like the segfault, because people probably never check the
 return value of rt_task_join/rt_task_delete (which is, I guess, the
 reason why phtread_cancel and pthread_join segfault themselves, because
 the posix spec allows to return ESRCH in that case).
 Well, that's kind of hard to sell to people writing software based on
 documentation. We documented -EIDRM, so we should make it work like
 this. Will look into this later.

 The fact that pthread_cancel /may/ crash on invalid thread handles is an
 implementation detail: The handle is a pointer to an object that /may/
 no longer exist in memory when dereferenced.
 Ok, another point is that when cancelling a joinable thread,
 pthread_join should still be called after pthread_cancel. So,
 rt_task_join should still be able to call pthread_join after a call to
 rt_task_delete.

 Running pthread_join inside rt_task_delete is not an option, as there
 would be a risk for deadlocks.

 
 So we should always pthread_detach before canceling a thread in
 rt_task_delete?

Err, after, but only if it was joinable. Think we need to track
T_JOINABLE in user space as well.

Jan




signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core