Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Ack. There are rt_printf (rtdk) and the task-self services of vxworks,
>>> vrtx and native skins. So if I get an OK for the proposal, I'll convert
>>> the rest, too.
>> And I really do not understand how the keys can be chosen at
>> compilation time, especially in the situation where multiple libraries
>> allocate __thread objects separately: how does the compiler know which
>> key to give to each library?
>
> The keys are known at linking time when a) their context is module-local
> and b) their TLS storage can be appended to main's TLS area (and that
> can only happen during initial linking, of course).
>
> BTW, another advantage of TLS over getspecific w.r.t. task-self is that it
> overcomes its laziness: it's now cheap and easy to set self in the thread
> trampoline.
>
> Philippe, could it be that vrtx's sc_tinquiry implementation is broken?
> I only find the corresponding pthread_setspecific in the lib
> constructor. That pattern totally disagrees with native and vxworks.
>

Oh my... I killed this dead O/S once again!

--- src/skins/vrtx/task.c	(revision 4211)
+++ src/skins/vrtx/task.c	(working copy)
@@ -80,6 +80,7 @@
 	struct sched_param param;
 	int policy;
 	long err;
+	TCB *tcb;
 
 	/* Backup the arg struct, it might vanish after completion. */
 	memcpy(&_iargs, iargs, sizeof(_iargs));
@@ -94,6 +95,15 @@
 	/* vrtx_task_delete requires asynchronous cancellation */
 	pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
 
+	tcb = (TCB *) malloc(sizeof(*tcb));
+	if (tcb == NULL) {
+		fprintf(stderr, "Xenomai: failed to allocate local TCB?!\n");
+		err = -ENOMEM;
+		goto fail;
+	}
+
+	pthread_setspecific(__vrtx_tskey, tcb);
+
 	old_sigharden_handler = signal(SIGHARDEN, &vrtx_task_sigharden);
 
 	bulk.a1 = (u_long)iargs->tid;
@@ -116,9 +126,7 @@
 
 	if (!err)
 		_iargs.entry(_iargs.param);
-
- fail:
-
+fail:
 	pthread_exit((void *)err);
 }
 
Index: src/skins/vrtx/init.c
===================================================================
--- src/skins/vrtx/init.c	(revision 4211)
+++ src/skins/vrtx/init.c	(working copy)
@@ -39,8 +39,6 @@
 
 static __attribute__ ((constructor)) void __init_xeno_interface(void)
 {
-	TCB *tcb;
-
 	__vrtx_muxid = xeno_bind_skin(VRTX_SKIN_MAGIC, "vrtx", "xeno_vrtx");
 
 	__vrtx_muxid = __xn_mux_shifted_id(__vrtx_muxid);
@@ -51,13 +49,4 @@
 		fprintf(stderr, "Xenomai: failed to allocate new TSD key?!\n");
 		exit(1);
 	}
-
-	tcb = (TCB *) malloc(sizeof(*tcb));
-
-	if (!tcb) {
-		fprintf(stderr, "Xenomai: failed to allocate local TCB?!\n");
-		exit(1);
-	}
-
-	pthread_setspecific(__vrtx_tskey, tcb);
 }

> Jan
>

-- 
Philippe.
_______________________________________________
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core
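Why the constructor-only pthread_setspecific was a bug can be shown with a small standalone sketch. POSIX gives every new thread a NULL value for each existing key, so a value stored by the thread that runs the library constructor is never seen by tasks created later. The names below (demo_key, seen_with_creator_side_set, seen_with_trampoline_set) are hypothetical and only mimic the shape of the vrtx code; this is not Xenomai code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical key standing in for __vrtx_tskey; not Xenomai code. */
static pthread_key_t demo_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;

static void demo_key_init(void)
{
	int err = pthread_key_create(&demo_key, NULL);
	assert(err == 0);
	(void)err;
}

/* Thread body that only reads its slot, as a task-self service would. */
static void *read_only_body(void *arg)
{
	(void)arg;
	return pthread_getspecific(demo_key);
}

/* Trampoline-style body: publish the per-thread pointer first, then read. */
static void *trampoline_body(void *arg)
{
	pthread_setspecific(demo_key, arg);
	return pthread_getspecific(demo_key);
}

/* Run body(arg) in a fresh thread and hand back what it returned. */
static void *run_in_new_thread(void *(*body)(void *), void *arg)
{
	pthread_t tid;
	void *seen = NULL;
	int err;

	err = pthread_create(&tid, NULL, body, arg);
	assert(err == 0);
	err = pthread_join(tid, &seen);
	assert(err == 0);
	(void)err;
	return seen;
}

/* Constructor pattern: the value is set in the calling thread only. */
void *seen_with_creator_side_set(void *value)
{
	pthread_once(&demo_key_once, demo_key_init);
	pthread_setspecific(demo_key, value);
	return run_in_new_thread(read_only_body, NULL);
}

/* Trampoline pattern: the value is set inside the new thread itself. */
void *seen_with_trampoline_set(void *value)
{
	pthread_once(&demo_key_once, demo_key_init);
	return run_in_new_thread(trampoline_body, value);
}
```

With the first helper the new thread reads NULL; with the second it reads back the pointer it was given, which is the behaviour the task.c hunk restores.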
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Ack. There are rt_printf (rtdk) and the task-self services of vxworks,
>> vrtx and native skins. So if I get an OK for the proposal, I'll convert
>> the rest, too.
>
> And I really do not understand how the keys can be chosen at
> compilation time, especially in the situation where multiple libraries
> allocate __thread objects separately: how does the compiler know which
> key to give to each library?

The keys are known at linking time when a) their context is module-local
and b) their TLS storage can be appended to main's TLS area (and that
can only happen during initial linking, of course).

BTW, another advantage of TLS over getspecific w.r.t. task-self is that it
overcomes its laziness: it's now cheap and easy to set self in the thread
trampoline.

Philippe, could it be that vrtx's sc_tinquiry implementation is broken?
I only find the corresponding pthread_setspecific in the lib
constructor. That pattern totally disagrees with native and vxworks.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Jan Kiszka wrote:
> Ack. There are rt_printf (rtdk) and the task-self services of vxworks,
> vrtx and native skins. So if I get an OK for the proposal, I'll convert
> the rest, too.

And I really do not understand how the keys can be chosen at
compilation time, especially in the situation where multiple libraries
allocate __thread objects separately: how does the compiler know which
key to give to each library?

-- 
Gilles.
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Jan Kiszka wrote:
> So if I get an OK for the proposal, I'll convert the rest, too.

From my point of view, it really is a micro-optimization that needlessly
clutters the code and the configure script. But I have no concrete
objection, especially since you say that __thread is the future, not
pthread_getspecific.

-- 
Gilles.
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Gilles Chanteperdrix wrote: >>> Jan Kiszka wrote: Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Gilles Chanteperdrix wrote: >>> Jan Kiszka wrote: It will always remain orders of magnitude heavier than __thread variables which are a) inlined and b) should only need two memory accesses at worst. Moreover, it is clearly the future, while the importance of pthread_getspecific will decrease over the time. The __thread storage class is C99 standard (though its implementation remains a separate topic). >>> You are exagerating a bit: pthread_getspecific is pretty efficient >>> already (from the few things that I have timed on ARM, it is the only >>> one which takes under the microsecond). That you will gain something >>> with __thread is not guaranteed by the C99 standard either: in fact the >>> implementation could use exactly the same functions. >> As long as we do not loose anything (performance or portability), > You loose portability. But I agree that we do not care much. The fallback remains - must remain in order to obtain true optimization from the TLS-based version without locking out some corner-case usage. Find a proposal below (on top of handle-based xeno_get_current). We have to set initial-exec as TLS model, otherwise we end up with a dynamic lookup similar (maybe still faster, dunno) to the pthread service. This model requires start-time linking, will not work with dlopen (I strongly assume the linker will bail out). But I consider runtime loading of Xenomai libs as a uncommon corner case, and the user can still re-enable it via --without-__thread. >>> glibc has a separate test to know whether the tsl_model attribute is >>> supported: >>> >>> if test "$libc_cv_gcc___thread" = yes; then >>> dnl Check whether the compiler supports the tls_model attribute. 
>>> AC_CACHE_CHECK([for tls_model attribute], libc_cv_gcc_tls_model_attr, [dnl >>> cat > conftest.c <<\EOF >>> extern __thread int a __attribute__((tls_model ("initial-exec"))); >>> EOF >>> if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -S -Werror conftest.c &AS_MESSA >>> GE_LOG_FD]); then >>> libc_cv_gcc_tls_model_attr=yes >>> else >>> libc_cv_gcc_tls_model_attr=no >>> fi >>> rm -f conftest*]) >>> if test "$libc_cv_gcc_tls_model_attr" = yes; then >>> AC_DEFINE(HAVE_TLS_MODEL_ATTRIBUTE) >>> fi >>> fi >>> >> OK, but for us the question is if we want __thread without initial-exec >> at all. If not (I think so), I could add -Werror to the __thread test >> and that should combine both tests into a single one, sufficient for our >> use case. > > Yes, my point was that if an implementation supports __thread, it does > not necessarily mean that it supports the tls_model attribute. We should > fallback to pthread_key if either one is not supported. However, other > parts of Xenomai skins use pthread_specific, so, if we implement > something based on __thread, I think we should factor it and use it > everywhere. Ack. There are rt_printf (rtdk) and the task-self services of vxworks, vrtx and native skins. So if I get an OK for the proposal, I'll convert the rest, too. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >>> Gilles Chanteperdrix wrote: Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >>> It will always remain orders of magnitude heavier than __thread >>> variables which are a) inlined and b) should only need two memory >>> accesses at worst. Moreover, it is clearly the future, while the >>> importance of pthread_getspecific will decrease over the time. The >>> __thread storage class is C99 standard (though its implementation >>> remains a separate topic). >> You are exagerating a bit: pthread_getspecific is pretty efficient >> already (from the few things that I have timed on ARM, it is the only >> one which takes under the microsecond). That you will gain something >> with __thread is not guaranteed by the C99 standard either: in fact the >> implementation could use exactly the same functions. > As long as we do not loose anything (performance or portability), You loose portability. But I agree that we do not care much. >>> The fallback remains - must remain in order to obtain true optimization >>> from the TLS-based version without locking out some corner-case usage. >>> Find a proposal below (on top of handle-based xeno_get_current). >>> >>> We have to set initial-exec as TLS model, otherwise we end up with a >>> dynamic lookup similar (maybe still faster, dunno) to the pthread >>> service. This model requires start-time linking, will not work with >>> dlopen (I strongly assume the linker will bail out). But I consider >>> runtime loading of Xenomai libs as a uncommon corner case, and the user >>> can still re-enable it via --without-__thread. >> glibc has a separate test to know whether the tsl_model attribute is >> supported: >> >> if test "$libc_cv_gcc___thread" = yes; then >> dnl Check whether the compiler supports the tls_model attribute. 
>> AC_CACHE_CHECK([for tls_model attribute], libc_cv_gcc_tls_model_attr, [dnl >> cat > conftest.c <<\EOF >> extern __thread int a __attribute__((tls_model ("initial-exec"))); >> EOF >> if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -S -Werror conftest.c >>> &AS_MESSA >> GE_LOG_FD]); then >> libc_cv_gcc_tls_model_attr=yes >> else >> libc_cv_gcc_tls_model_attr=no >> fi >> rm -f conftest*]) >> if test "$libc_cv_gcc_tls_model_attr" = yes; then >> AC_DEFINE(HAVE_TLS_MODEL_ATTRIBUTE) >> fi >> fi >> > > OK, but for us the question is if we want __thread without initial-exec > at all. If not (I think so), I could add -Werror to the __thread test > and that should combine both tests into a single one, sufficient for our > use case. Yes, my point was that if an implementation supports __thread, it does not necessarily mean that it supports the tls_model attribute. We should fallback to pthread_key if either one is not supported. However, other parts of Xenomai skins use pthread_specific, so, if we implement something based on __thread, I think we should factor it and use it everywhere. -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Gilles Chanteperdrix wrote: >>> Jan Kiszka wrote: Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> It will always remain orders of magnitude heavier than __thread >> variables which are a) inlined and b) should only need two memory >> accesses at worst. Moreover, it is clearly the future, while the >> importance of pthread_getspecific will decrease over the time. The >> __thread storage class is C99 standard (though its implementation >> remains a separate topic). > You are exagerating a bit: pthread_getspecific is pretty efficient > already (from the few things that I have timed on ARM, it is the only > one which takes under the microsecond). That you will gain something > with __thread is not guaranteed by the C99 standard either: in fact the > implementation could use exactly the same functions. As long as we do not loose anything (performance or portability), >>> You loose portability. But I agree that we do not care much. >> The fallback remains - must remain in order to obtain true optimization >> from the TLS-based version without locking out some corner-case usage. >> Find a proposal below (on top of handle-based xeno_get_current). >> >> We have to set initial-exec as TLS model, otherwise we end up with a >> dynamic lookup similar (maybe still faster, dunno) to the pthread >> service. This model requires start-time linking, will not work with >> dlopen (I strongly assume the linker will bail out). But I consider >> runtime loading of Xenomai libs as a uncommon corner case, and the user >> can still re-enable it via --without-__thread. > > glibc has a separate test to know whether the tsl_model attribute is > supported: > > if test "$libc_cv_gcc___thread" = yes; then > dnl Check whether the compiler supports the tls_model attribute. 
> AC_CACHE_CHECK([for tls_model attribute], libc_cv_gcc_tls_model_attr, [dnl > cat > conftest.c <<\EOF > extern __thread int a __attribute__((tls_model ("initial-exec"))); > EOF > if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -S -Werror conftest.c >> &AS_MESSA > GE_LOG_FD]); then > libc_cv_gcc_tls_model_attr=yes > else > libc_cv_gcc_tls_model_attr=no > fi > rm -f conftest*]) > if test "$libc_cv_gcc_tls_model_attr" = yes; then > AC_DEFINE(HAVE_TLS_MODEL_ATTRIBUTE) > fi > fi > OK, but for us the question is if we want __thread without initial-exec at all. If not (I think so), I could add -Werror to the __thread test and that should combine both tests into a single one, sufficient for our use case. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >>> Gilles Chanteperdrix wrote: Jan Kiszka wrote: > It will always remain orders of magnitude heavier than __thread > variables which are a) inlined and b) should only need two memory > accesses at worst. Moreover, it is clearly the future, while the > importance of pthread_getspecific will decrease over the time. The > __thread storage class is C99 standard (though its implementation > remains a separate topic). You are exagerating a bit: pthread_getspecific is pretty efficient already (from the few things that I have timed on ARM, it is the only one which takes under the microsecond). That you will gain something with __thread is not guaranteed by the C99 standard either: in fact the implementation could use exactly the same functions. >>> As long as we do not loose anything (performance or portability), >> You loose portability. But I agree that we do not care much. > > The fallback remains - must remain in order to obtain true optimization > from the TLS-based version without locking out some corner-case usage. > Find a proposal below (on top of handle-based xeno_get_current). > > We have to set initial-exec as TLS model, otherwise we end up with a > dynamic lookup similar (maybe still faster, dunno) to the pthread > service. This model requires start-time linking, will not work with > dlopen (I strongly assume the linker will bail out). But I consider > runtime loading of Xenomai libs as a uncommon corner case, and the user > can still re-enable it via --without-__thread. glibc has a separate test to know whether the tsl_model attribute is supported: if test "$libc_cv_gcc___thread" = yes; then dnl Check whether the compiler supports the tls_model attribute. 
AC_CACHE_CHECK([for tls_model attribute], libc_cv_gcc_tls_model_attr, [dnl cat > conftest.c <<\EOF extern __thread int a __attribute__((tls_model ("initial-exec"))); EOF if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -S -Werror conftest.c >&AS_MESSA GE_LOG_FD]); then libc_cv_gcc_tls_model_attr=yes else libc_cv_gcc_tls_model_attr=no fi rm -f conftest*]) if test "$libc_cv_gcc_tls_model_attr" = yes; then AC_DEFINE(HAVE_TLS_MODEL_ATTRIBUTE) fi fi -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Gilles Chanteperdrix wrote: >>> Jan Kiszka wrote: It will always remain orders of magnitude heavier than __thread variables which are a) inlined and b) should only need two memory accesses at worst. Moreover, it is clearly the future, while the importance of pthread_getspecific will decrease over the time. The __thread storage class is C99 standard (though its implementation remains a separate topic). >>> You are exagerating a bit: pthread_getspecific is pretty efficient >>> already (from the few things that I have timed on ARM, it is the only >>> one which takes under the microsecond). That you will gain something >>> with __thread is not guaranteed by the C99 standard either: in fact the >>> implementation could use exactly the same functions. >> As long as we do not loose anything (performance or portability), > > You loose portability. But I agree that we do not care much. The fallback remains - must remain in order to obtain true optimization from the TLS-based version without locking out some corner-case usage. Find a proposal below (on top of handle-based xeno_get_current). We have to set initial-exec as TLS model, otherwise we end up with a dynamic lookup similar (maybe still faster, dunno) to the pthread service. This model requires start-time linking, will not work with dlopen (I strongly assume the linker will bail out). But I consider runtime loading of Xenomai libs as a uncommon corner case, and the user can still re-enable it via --without-__thread. 
Jan

---
 configure.in                       |   23 +++
 include/asm-generic/bits/bind.h    |   44 ++---
 include/asm-generic/bits/current.h |   13 +-
 3 files changed, 66 insertions(+), 14 deletions(-)

Index: b/configure.in
===================================================================
--- a/configure.in
+++ b/configure.in
@@ -762,6 +762,29 @@ LIBS="$LIBS -lrt"
 AC_CHECK_FUNCS([shm_open shm_unlink])
 LIBS="$save_LIBS"
 
+AC_ARG_WITH([__thread],
+	    AC_HELP_STRING([--without-__thread],
+			   [do not use TLS features (allows for dlopen'ing Xenomai libs)]),
+	    [use__thread=$withval],
+	    [use__thread=yes])
+
+dnl Check whether the compiler supports the __thread keyword.
+if test "x$use__thread" != xno; then
+	AC_CACHE_CHECK([for __thread], libc_cv_gcc___thread,
+	[cat > conftest.c <<\EOF
+__thread int a __attribute__ ((tls_model ("initial-exec"))) = 42;
+EOF
+	if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -c conftest.c >&AS_MESSAGE_LOG_FD]); then
+		libc_cv_gcc___thread=yes
+	else
+		libc_cv_gcc___thread=no
+	fi
+	rm -f conftest*])
+	if test "$libc_cv_gcc___thread" = yes; then
+		AC_DEFINE(HAVE___THREAD,1,[config])
+	fi
+fi
+
 dnl
 dnl Build the Makefiles
 dnl
Index: b/include/asm-generic/bits/bind.h
===================================================================
--- a/include/asm-generic/bits/bind.h
+++ b/include/asm-generic/bits/bind.h
@@ -11,26 +11,26 @@
 #include
 #include
 #include
+#include
 #include
 
+#ifdef HAVE___THREAD
+__thread xnhandle_t xeno_current __attribute__ ((tls_model ("initial-exec"))) =
+	XN_NO_HANDLE;
+
+static inline void __xeno_set_current(xnhandle_t current)
+{
+	xeno_current = current;
+}
+#else /* !HAVE___THREAD */
 __attribute__ ((weak))
 pthread_key_t xeno_current_key;
 __attribute__ ((weak))
 pthread_once_t xeno_init_current_key_once = PTHREAD_ONCE_INIT;
 
-__attribute__ ((weak))
-void xeno_set_current(void)
+static inline void __xeno_set_current(xnhandle_t current)
 {
-	void *kthread_cb;
-	int err;
-
-	err = XENOMAI_SYSCALL1(__xn_sys_current, &kthread_cb);
-	if (err) {
-		fprintf(stderr, "Xenomai: error obtaining handle for current "
-			"thread: %s\n", strerror(err));
-		exit(1);
-	}
-	pthread_setspecific(xeno_current_key, kthread_cb);
+	pthread_setspecific(xeno_current_key, (void *)current);
 }
 
 static void init_current_key(void)
@@ -42,6 +42,22 @@ static void init_current_key(void)
 		exit(1);
 	}
 }
+#endif /* !HAVE___THREAD */
+
+__attribute__ ((weak))
+void xeno_set_current(void)
+{
+	xnhandle_t current;
+	int err;
+
+	err = XENOMAI_SYSCALL1(__xn_sys_current, &current);
+	if (err) {
+		fprintf(stderr, "Xenomai: error obtaining handle for current "
+			"thread: %s\n", strerror(err));
+		exit(1);
+	}
+	__xeno_set_current(current);
+}
 
 #ifdef CONFIG_XENO_FASTSYNCH
 __attribute__ ((weak))
@@ -175,7 +191,9 @@ xeno_bind_skin(unsigned skin_magic, cons
 	sa.sa_flags = 0;
 	sigaction(SIGXCPU, &sa, NULL);
 
+#ifndef HAVE___THREAD
 	pthread_once(&xeno_init_current_key_once, &init_current_key);
+#endif /* !HAVE___THREAD */
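The archived patch breaks off before the current.h part, so the reader side is not visible here. As a rough illustration only, the following standalone sketch shows how a setter/getter pair could be selected at compile time in the same spirit: an initial-exec __thread variable when HAVE___THREAD is defined, a pthread key otherwise. All sketch_* names, the handle type and the "no handle" value of 0 are assumptions made for this example and are not taken from Xenomai.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

typedef uintptr_t sketch_handle_t;             /* stand-in for xnhandle_t */
#define SKETCH_NO_HANDLE ((sketch_handle_t)0)  /* assumed "no handle" value */

#ifdef HAVE___THREAD
/* Fast path: one thread-local word, resolved at link time (initial-exec). */
static __thread sketch_handle_t sketch_current
	__attribute__ ((tls_model ("initial-exec"))) = SKETCH_NO_HANDLE;

void sketch_set_current(sketch_handle_t handle)
{
	sketch_current = handle;
}

sketch_handle_t sketch_get_current(void)
{
	return sketch_current;
}
#else /* !HAVE___THREAD */
/* Fallback: classic thread-specific data behind a lazily created key. */
static pthread_key_t sketch_current_key;
static pthread_once_t sketch_current_once = PTHREAD_ONCE_INIT;

static void sketch_init_current_key(void)
{
	int err = pthread_key_create(&sketch_current_key, NULL);
	assert(err == 0);
	(void)err;
}

void sketch_set_current(sketch_handle_t handle)
{
	pthread_once(&sketch_current_once, sketch_init_current_key);
	pthread_setspecific(sketch_current_key, (void *)handle);
}

sketch_handle_t sketch_get_current(void)
{
	pthread_once(&sketch_current_once, sketch_init_current_key);
	/* A slot that was never set reads as NULL, i.e. SKETCH_NO_HANDLE. */
	return (sketch_handle_t)pthread_getspecific(sketch_current_key);
}
#endif /* !HAVE___THREAD */
```

Callers use sketch_set_current() and sketch_get_current() identically in both builds; compiling with -DHAVE___THREAD switches to the TLS path, which is the kind of switch the configure test is meant to drive.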
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> It will always remain orders of magnitude heavier than __thread
>>> variables which are a) inlined and b) should only need two memory
>>> accesses at worst. Moreover, it is clearly the future, while the
>>> importance of pthread_getspecific will decrease over the time. The
>>> __thread storage class is C99 standard (though its implementation
>>> remains a separate topic).
>> You are exaggerating a bit: pthread_getspecific is pretty efficient
>> already (from the few things that I have timed on ARM, it is the only
>> one which takes under a microsecond). That you will gain something
>> with __thread is not guaranteed by the C99 standard either: in fact the
>> implementation could use exactly the same functions.
>
> As long as we do not lose anything (performance or portability),

You lose portability. But I agree that we do not care much.

-- 
Gilles.
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Gilles Chanteperdrix wrote: >>> Jan Kiszka wrote: Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Hi, >> >> looking into the "xeno_in_primary_mode" thing I wondered how to make the >> thread state quickly retrievable. Going via pthread_getspecific as we do >> for xeno_get_current appears logical - but not optimal. Though >> getspecific is optimized for speed, it remains a function call, a few >> sanity checks, and only finally a TLS variable access. That could be >> achieved in a much lighter way by using a __thread variable. >> >> But can we assume that all target we support also support the __thread >> storage class? TLS is surely mandatory now: I assume pthread_getspecific >> would become non-RT safe without it, right? Is there anything we >> can/must check for during configure to verify __thread support? > I really think that this optimization is not worth the trouble. Anyway, As long as we cannot specify the amount of "trouble", it's hard to decide. Me current feeling is that it should rather simplify the implementation + save us quite a few ops in the fast path (even more with upcoming thread-mode check). >>> The trouble is to make some reliable detections in the configure script, >>> so that the user will know early that Xenomai can not work with its >>> current toolchain. And to make this detection work with uclibc as well >>> as with glibc, gcc 4 versus gcc 3, etc... >> Will work out a test program for configure. > > glibc does that: > > AC_ARG_WITH([__thread], > AC_HELP_STRING([--without-__thread], >[do not use TLS features even when supporting > them]), > [use__thread=$withval], > [use__thread=yes]) > > dnl Check whether the compiler supports the __thread keyword. 
> if test "x$use__thread" != xno; then > AC_CACHE_CHECK([for __thread], libc_cv_gcc___thread, > [cat > conftest.c <<\EOF > __thread int a = 42; > EOF > if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -c conftest.c > >&AS_MESSAGE_LOG_F > D]); then > libc_cv_gcc___thread=yes > else > libc_cv_gcc___thread=no > fi > rm -f conftest*]) > if test "$libc_cv_gcc___thread" = yes; then > AC_DEFINE(HAVE___THREAD) > fi > else > libc_cv_gcc___thread=no > fi > Cool, thanks. Will play with this for xeno_get_current soon. Then we can review and test if it works and makes sense based on some real code. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> It will always remain orders of magnitude heavier than __thread
>> variables which are a) inlined and b) should only need two memory
>> accesses at worst. Moreover, it is clearly the future, while the
>> importance of pthread_getspecific will decrease over the time. The
>> __thread storage class is C99 standard (though its implementation
>> remains a separate topic).
>
> You are exaggerating a bit: pthread_getspecific is pretty efficient
> already (from the few things that I have timed on ARM, it is the only
> one which takes under a microsecond). That you will gain something
> with __thread is not guaranteed by the C99 standard either: in fact the
> implementation could use exactly the same functions.

As long as we do not lose anything (performance or portability), there is
no point in sticking with pthread_getspecific. At least on x86 the
advantage is easily visible (== no more function calls). That's also due
to the lighter concept of __thread: it pushes the key resolution from
runtime to compile/link time.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
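The difference being argued about can be sketched side by side. The snippet keeps the same per-thread value in a __thread variable and in a pthread key: the first getter is an ordinary variable access the compiler may inline, the second always goes through a library call that resolves the key at runtime. Note that __thread is a GCC extension (also accepted by Clang), and C11 later standardized the equivalent _Thread_local, so the example assumes such a compiler. The demo_* names are hypothetical and only mimic the shape of xeno_get_current.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical names; they only mimic the shape of xeno_get_current. */
static __thread uintptr_t demo_current_tls;   /* compiler extension */

static pthread_key_t demo_current_key;
static pthread_once_t demo_current_once = PTHREAD_ONCE_INIT;

static void demo_current_key_init(void)
{
	int err = pthread_key_create(&demo_current_key, NULL);
	assert(err == 0);
	(void)err;
}

/* Store the same handle through both mechanisms for the calling thread. */
void demo_set_current(uintptr_t handle)
{
	pthread_once(&demo_current_once, demo_current_key_init);
	demo_current_tls = handle;
	pthread_setspecific(demo_current_key, (void *)handle);
}

/* TLS variant: a plain load relative to the thread pointer. */
uintptr_t demo_get_current_tls(void)
{
	return demo_current_tls;
}

/* TSD variant: a function call that looks the key up at runtime. */
uintptr_t demo_get_current_tsd(void)
{
	pthread_once(&demo_current_once, demo_current_key_init);
	return (uintptr_t)pthread_getspecific(demo_current_key);
}
```

Both getters return the same value for a given thread; how much cheaper the first one is depends on the architecture and the TLS model, which is the point under discussion in the thread.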
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Jan Kiszka wrote: > Gilles Chanteperdrix wrote: >> Jan Kiszka wrote: >>> Gilles Chanteperdrix wrote: Jan Kiszka wrote: > Hi, > > looking into the "xeno_in_primary_mode" thing I wondered how to make the > thread state quickly retrievable. Going via pthread_getspecific as we do > for xeno_get_current appears logical - but not optimal. Though > getspecific is optimized for speed, it remains a function call, a few > sanity checks, and only finally a TLS variable access. That could be > achieved in a much lighter way by using a __thread variable. > > But can we assume that all target we support also support the __thread > storage class? TLS is surely mandatory now: I assume pthread_getspecific > would become non-RT safe without it, right? Is there anything we > can/must check for during configure to verify __thread support? I really think that this optimization is not worth the trouble. Anyway, >>> As long as we cannot specify the amount of "trouble", it's hard to >>> decide. Me current feeling is that it should rather simplify the >>> implementation + save us quite a few ops in the fast path (even more >>> with upcoming thread-mode check). >> The trouble is to make some reliable detections in the configure script, >> so that the user will know early that Xenomai can not work with its >> current toolchain. And to make this detection work with uclibc as well >> as with glibc, gcc 4 versus gcc 3, etc... > > Will work out a test program for configure. glibc does that: AC_ARG_WITH([__thread], AC_HELP_STRING([--without-__thread], [do not use TLS features even when supporting them]), [use__thread=$withval], [use__thread=yes]) dnl Check whether the compiler supports the __thread keyword. 
if test "x$use__thread" != xno; then AC_CACHE_CHECK([for __thread], libc_cv_gcc___thread, [cat > conftest.c <<\EOF __thread int a = 42; EOF if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -c conftest.c >&AS_MESSAGE_LOG_F D]); then libc_cv_gcc___thread=yes else libc_cv_gcc___thread=no fi rm -f conftest*]) if test "$libc_cv_gcc___thread" = yes; then AC_DEFINE(HAVE___THREAD) fi else libc_cv_gcc___thread=no fi -- Gilles. ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Jan Kiszka wrote:
> It will always remain orders of magnitude heavier than __thread
> variables which are a) inlined and b) should only need two memory
> accesses at worst. Moreover, it is clearly the future, while the
> importance of pthread_getspecific will decrease over the time. The
> __thread storage class is C99 standard (though its implementation
> remains a separate topic).

You are exaggerating a bit: pthread_getspecific is pretty efficient
already (from the few things that I have timed on ARM, it is the only
one which takes under a microsecond). That you will gain something
with __thread is not guaranteed by the C99 standard either: in fact the
implementation could use exactly the same functions.

-- 
Gilles.
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Gilles Chanteperdrix wrote: > Jan Kiszka wrote: >> Gilles Chanteperdrix wrote: >>> Jan Kiszka wrote: Hi, looking into the "xeno_in_primary_mode" thing I wondered how to make the thread state quickly retrievable. Going via pthread_getspecific as we do for xeno_get_current appears logical - but not optimal. Though getspecific is optimized for speed, it remains a function call, a few sanity checks, and only finally a TLS variable access. That could be achieved in a much lighter way by using a __thread variable. But can we assume that all target we support also support the __thread storage class? TLS is surely mandatory now: I assume pthread_getspecific would become non-RT safe without it, right? Is there anything we can/must check for during configure to verify __thread support? >>> I really think that this optimization is not worth the trouble. Anyway, >> As long as we cannot specify the amount of "trouble", it's hard to >> decide. Me current feeling is that it should rather simplify the >> implementation + save us quite a few ops in the fast path (even more >> with upcoming thread-mode check). > > The trouble is to make some reliable detections in the configure script, > so that the user will know early that Xenomai can not work with its > current toolchain. And to make this detection work with uclibc as well > as with glibc, gcc 4 versus gcc 3, etc... Will work out a test program for configure. > > Besides, pthread_getspecific can be implemented pretty efficiently in > user-space without __thread support: using a hash table would be enough. > So, if we rely on pthread_getspecific, we do not have to know if ptd > are implemented with some hardware trick. It will always remain orders of magnitude heavier than __thread variables which are a) inlined and b) should only need two memory accesses at worst. Moreover, it is clearly the future, while the importance of pthread_getspecific will decrease over the time. 
The __thread storage class is C99 standard (though its implementation remains a separate topic). > >>> I have one question: is an implementation guaranteed to support more >>> than one __thread variable? Because from ARM implementation I would say >>> that ARM has only one __thread variable. >> That would be weird - there is no such limitation known to me. Anyway, >> you could easily verify this with a simple test program I guess. /me >> also wonders how the glibc/NPTL is maintaining certain per-thread >> variables (and there are surely > 1) internally. > > I would say the __thread variable is used to store an array, which is, > according to what Philippe said yesterday, turned into a multilevel > structure when creating more than 32 keys. > > At the hardware level, I am pretty sure ARM has only one per thread > token. However I do not know how it is used to implement __thread > variable (or variables). Will look into some ELF definitions on this, but that there is a single hw token (CPU register) for accessing the TLS root is surely not uncommon. The rest is linker magic, specifically when it comes to dealing with offsets of cross-lib TLS variables. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux ___ Xenomai-core mailing list Xenomai-core@gna.org https://mail.gna.org/listinfo/xenomai-core
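The "simple test program" suggested for the question of multiple __thread variables could look roughly like the sketch below. It declares two thread-local variables, modifies them in a second thread, and checks that each thread kept its own values and that the variables live at different addresses per thread. This assumes a toolchain with __thread support; the multi_* names are made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Two independent thread-local variables in one module (hypothetical). */
static __thread int multi_mode = 1;
static __thread long multi_handle = 100;

struct multi_snapshot {
	int mode;
	long handle;
	uintptr_t mode_addr;
	uintptr_t handle_addr;
};

/* Record the calling thread's view of both variables. */
static void multi_take_snapshot(struct multi_snapshot *s)
{
	s->mode = multi_mode;
	s->handle = multi_handle;
	s->mode_addr = (uintptr_t)&multi_mode;
	s->handle_addr = (uintptr_t)&multi_handle;
}

static void *multi_worker(void *arg)
{
	/* A new thread starts from the initializers, not from main's values. */
	multi_mode += 1;
	multi_handle += 1;
	multi_take_snapshot(arg);
	return NULL;
}

/* Returns 1 if both variables behave as separate per-thread objects. */
int multi_tls_variables_work(void)
{
	struct multi_snapshot mine, theirs;
	pthread_t tid;
	int err;

	multi_mode = 10;
	multi_handle = 1000;

	err = pthread_create(&tid, NULL, multi_worker, &theirs);
	assert(err == 0);
	err = pthread_join(tid, NULL);
	assert(err == 0);
	(void)err;

	multi_take_snapshot(&mine);

	return mine.mode == 10 && mine.handle == 1000 &&
	       theirs.mode == 2 && theirs.handle == 101 &&
	       mine.mode_addr != theirs.mode_addr &&
	       mine.handle_addr != theirs.handle_addr;
}
```

A single hardware thread pointer is enough for this to work because each variable is addressed as an offset from that pointer, with the offsets assigned by the linker or dynamic loader.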
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Hi,
>>>
>>> looking into the "xeno_in_primary_mode" thing I wondered how to make the
>>> thread state quickly retrievable. Going via pthread_getspecific as we do
>>> for xeno_get_current appears logical - but not optimal. Though
>>> getspecific is optimized for speed, it remains a function call, a few
>>> sanity checks, and only finally a TLS variable access. That could be
>>> achieved in a much lighter way by using a __thread variable.
>>>
>>> But can we assume that all targets we support also support the __thread
>>> storage class? TLS is surely mandatory now: I assume pthread_getspecific
>>> would become non-RT safe without it, right? Is there anything we
>>> can/must check for during configure to verify __thread support?
>> I really think that this optimization is not worth the trouble. Anyway,
>
> As long as we cannot specify the amount of "trouble", it's hard to
> decide. My current feeling is that it should rather simplify the
> implementation + save us quite a few ops in the fast path (even more
> with upcoming thread-mode check).

The trouble is to make some reliable detections in the configure script,
so that the user will know early that Xenomai can not work with its
current toolchain. And to make this detection work with uclibc as well
as with glibc, gcc 4 versus gcc 3, etc...

Besides, pthread_getspecific can be implemented pretty efficiently in
user-space without __thread support: using a hash table would be enough.
So, if we rely on pthread_getspecific, we do not have to know if ptd
are implemented with some hardware trick.

>
>> I have one question: is an implementation guaranteed to support more
>> than one __thread variable? Because from the ARM implementation I would say
>> that ARM has only one __thread variable.
>
> That would be weird - there is no such limitation known to me. Anyway,
> you could easily verify this with a simple test program I guess. /me
> also wonders how the glibc/NPTL is maintaining certain per-thread
> variables (and there are surely > 1) internally.

I would say the __thread variable is used to store an array, which is,
according to what Philippe said yesterday, turned into a multilevel
structure when creating more than 32 keys.

At the hardware level, I am pretty sure ARM has only one per thread
token. However I do not know how it is used to implement __thread
variable (or variables).

--
Gilles.
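The array that turns into a multilevel structure beyond 32 keys might be pictured roughly as below. This is only a sketch of the idea, not glibc's actual code: struct tsd_table, tsd_init, tsd_get, tsd_set and the L1_SIZE limit are invented for illustration, and only the figure of 32 comes from the discussion above. One such table would exist per thread and be reached through the single per-thread pointer.

```c
#include <stdlib.h>
#include <string.h>

#define L2_SIZE 32	/* slots per block, matching the 32 keys mentioned above */
#define L1_SIZE 32	/* number of blocks; an arbitrary choice for this sketch */

/* Hypothetical per-thread key table. */
struct tsd_table {
	void *first_block[L2_SIZE];	/* keys 0..31, usable without allocation */
	void **blocks[L1_SIZE];		/* blocks[0] points at first_block */
};

void tsd_init(struct tsd_table *t)
{
	memset(t, 0, sizeof(*t));
	t->blocks[0] = t->first_block;
}

/* Lookup: two index computations and at most two dependent loads. */
void *tsd_get(const struct tsd_table *t, unsigned int key)
{
	unsigned int l1 = key / L2_SIZE, l2 = key % L2_SIZE;

	if (l1 >= L1_SIZE || t->blocks[l1] == NULL)
		return NULL;
	return t->blocks[l1][l2];
}

/* Store: second-level blocks are allocated lazily, on first use. */
int tsd_set(struct tsd_table *t, unsigned int key, void *value)
{
	unsigned int l1 = key / L2_SIZE, l2 = key % L2_SIZE;

	if (l1 >= L1_SIZE)
		return -1;
	if (t->blocks[l1] == NULL) {
		t->blocks[l1] = calloc(L2_SIZE, sizeof(void *));
		if (t->blocks[l1] == NULL)
			return -1;
	}
	t->blocks[l1][l2] = value;
	return 0;
}
```

In this layout the first 32 keys never trigger an allocation, which is presumably why a flat array is kept for the common case. The lazy calloc() in tsd_set is the part that would not be real-time safe.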
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Hi,
>>
>> looking into the "xeno_in_primary_mode" thing I wondered how to make the
>> thread state quickly retrievable. Going via pthread_getspecific as we do
>> for xeno_get_current appears logical - but not optimal. Though
>> getspecific is optimized for speed, it remains a function call, a few
>> sanity checks, and only finally a TLS variable access. That could be
>> achieved in a much lighter way by using a __thread variable.
>>
>> But can we assume that all targets we support also support the __thread
>> storage class? TLS is surely mandatory now: I assume pthread_getspecific
>> would become non-RT safe without it, right? Is there anything we
>> can/must check for during configure to verify __thread support?
>
> I really think that this optimization is not worth the trouble. Anyway,

As long as we cannot specify the amount of "trouble", it's hard to
decide. My current feeling is that it should rather simplify the
implementation + save us quite a few ops in the fast path (even more
with upcoming thread-mode check).

> I have one question: is an implementation guaranteed to support more
> than one __thread variable? Because from the ARM implementation I would say
> that ARM has only one __thread variable.

That would be weird - there is no such limitation known to me. Anyway,
you could easily verify this with a simple test program I guess. /me
also wonders how the glibc/NPTL is maintaining certain per-thread
variables (and there are surely > 1) internally.

Jan

--
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
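The simple test program suggested above, checking that more than one __thread variable works, might look like this sketch. It assumes a GCC-compatible compiler and POSIX threads; tls_a, tls_b, tls_c, worker and check_multiple_tls are illustrative names only.

```c
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <string.h>

#define NTHREADS 4

/* Three independent thread-local variables of different types. */
static __thread int tls_a;
static __thread long tls_b;
static __thread char tls_c[8];

/* Each worker fills all three variables with values derived from its id,
 * yields, then checks that no other thread overwrote them. */
static void *worker(void *arg)
{
	intptr_t id = (intptr_t)arg;
	size_t i;

	tls_a = (int)id;
	tls_b = 1000L * id;
	memset(tls_c, 'a' + (int)id, sizeof(tls_c));

	sched_yield();	/* give the other threads a chance to run */

	if (tls_a != (int)id || tls_b != 1000L * id)
		return (void *)(intptr_t)1;
	for (i = 0; i < sizeof(tls_c); i++)
		if (tls_c[i] != 'a' + (int)id)
			return (void *)(intptr_t)1;
	return NULL;
}

/* Returns 0 if every thread kept its own copy of all three variables. */
int check_multiple_tls(void)
{
	pthread_t tids[NTHREADS];
	void *res;
	int started, i, failed = 0;

	tls_a = -1;	/* the calling thread's copy must survive as well */

	for (started = 0; started < NTHREADS; started++)
		if (pthread_create(&tids[started], NULL, worker,
				   (void *)(intptr_t)started) != 0)
			break;
	if (started < NTHREADS)
		failed = 1;

	for (i = 0; i < started; i++)
		if (pthread_join(tids[i], &res) != 0 || res != NULL)
			failed = 1;

	return (failed || tls_a != -1) ? 1 : 0;
}
```

A passing run only shows that this toolchain and libc handle several __thread variables. It does not say how the target implements them underneath.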
Re: [Xenomai-core] __thread instead of pthread_get/setspecific
Jan Kiszka wrote:
> Hi,
>
> looking into the "xeno_in_primary_mode" thing I wondered how to make the
> thread state quickly retrievable. Going via pthread_getspecific as we do
> for xeno_get_current appears logical - but not optimal. Though
> getspecific is optimized for speed, it remains a function call, a few
> sanity checks, and only finally a TLS variable access. That could be
> achieved in a much lighter way by using a __thread variable.
>
> But can we assume that all targets we support also support the __thread
> storage class? TLS is surely mandatory now: I assume pthread_getspecific
> would become non-RT safe without it, right? Is there anything we
> can/must check for during configure to verify __thread support?

I really think that this optimization is not worth the trouble. Anyway,
I have one question: is an implementation guaranteed to support more
than one __thread variable? Because from the ARM implementation I would
say that ARM has only one __thread variable.

--
Gilles.
[Xenomai-core] __thread instead of pthread_get/setspecific
Hi,

looking into the "xeno_in_primary_mode" thing I wondered how to make the
thread state quickly retrievable. Going via pthread_getspecific as we do
for xeno_get_current appears logical - but not optimal. Though
getspecific is optimized for speed, it remains a function call, a few
sanity checks, and only finally a TLS variable access. That could be
achieved in a much lighter way by using a __thread variable.

But can we assume that all targets we support also support the __thread
storage class? TLS is surely mandatory now: I assume pthread_getspecific
would become non-RT safe without it, right? Is there anything we
can/must check for during configure to verify __thread support?

Jan
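To make the comparison concrete, the two access paths could be sketched side by side as follows. This is not the actual xeno_get_current code: struct demo_thread and all demo_* names are hypothetical stand-ins, and the sketch assumes a GCC-compatible compiler with __thread support.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread handle. */
struct demo_thread {
	int id;
};

static pthread_key_t demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;
static int demo_key_err;

/* The __thread variant needs no key and no one-time setup. */
static __thread struct demo_thread *demo_current;

static void demo_key_init(void)
{
	demo_key_err = pthread_key_create(&demo_key, NULL);
}

/* Would be called once per thread, e.g. from the thread trampoline.
 * Returns 0 on success or an errno-style code. */
int demo_set_current(struct demo_thread *self)
{
	pthread_once(&demo_once, demo_key_init);
	if (demo_key_err)
		return demo_key_err;

	demo_current = self;
	return pthread_setspecific(demo_key, self);
}

/* Key-based lookup: a library call on every access.
 * Only meaningful after demo_set_current() has created the key. */
struct demo_thread *demo_get_current_key(void)
{
	return pthread_getspecific(demo_key);
}

/* __thread lookup: typically compiled to a plain load relative to the
 * thread pointer, so it can be inlined into the caller. */
struct demo_thread *demo_get_current_tls(void)
{
	return demo_current;
}
```

How cheap the __thread path really is depends on the TLS access model the compiler picks. Code built into a shared library may still end up calling a runtime helper, so the gain would need to be measured per target.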