Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Philippe Gerum
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Ack. There are rt_printf (rtdk) and the task-self services of vxworks,
>>> vrtx and native skins. So if I get an OK for the proposal, I'll convert
>>> the rest, too.
>> And I really do not understand how the keys can be chosen at
>> compilation-time, especially in the situation where multiple libraries
>> allocate __thread objects separately: how does the compiler know which
>> key to give for each library ?
> 
> The keys are known at linking time when a) their context is module-local
> and b) their TLS storage can be appended to main's TLS area (and that
> can only happen during initial linking, of course).
> 
> 
> BTW, another advantage of TLS over getspecific /wrt task-self is that it
> overcomes its laziness: it's now cheap and easy to set self in the thread
> trampoline.
> 
> Philippe, could it be that vrtx's sc_tinquiry implementation is broken?
> I only find the corresponding pthread_setspecific in the lib
> constructor. That pattern totally disagrees with native and vxworks.
>

Oh my... I killed this dead O/S once again!

--- src/skins/vrtx/task.c   (revision 4211)
+++ src/skins/vrtx/task.c   (working copy)
@@ -80,6 +80,7 @@
struct sched_param param;
int policy;
long err;
+   TCB *tcb;

/* Backup the arg struct, it might vanish after completion. */
memcpy(&_iargs, iargs, sizeof(_iargs));
@@ -94,6 +95,15 @@
/* vrtx_task_delete requires asynchronous cancellation */
pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);

+   tcb = (TCB *) malloc(sizeof(*tcb));
+   if (tcb == NULL) {
+   fprintf(stderr, "Xenomai: failed to allocate local TCB?!\n");
+   err = -ENOMEM;
+   goto fail;
+   }
+
+   pthread_setspecific(__vrtx_tskey, tcb);
+
old_sigharden_handler = signal(SIGHARDEN, &vrtx_task_sigharden);

bulk.a1 = (u_long)iargs->tid;
@@ -116,9 +126,7 @@

if (!err)
_iargs.entry(_iargs.param);
-
-  fail:
-
+fail:
pthread_exit((void *)err);
 }

Index: src/skins/vrtx/init.c
===
--- src/skins/vrtx/init.c   (revision 4211)
+++ src/skins/vrtx/init.c   (working copy)
@@ -39,8 +39,6 @@
 static __attribute__ ((constructor))
 void __init_xeno_interface(void)
 {
-   TCB *tcb;
-
__vrtx_muxid =
xeno_bind_skin(VRTX_SKIN_MAGIC, "vrtx", "xeno_vrtx");
__vrtx_muxid = __xn_mux_shifted_id(__vrtx_muxid);
@@ -51,13 +49,4 @@
fprintf(stderr, "Xenomai: failed to allocate new TSD key?!\n");
exit(1);
}
-
-   tcb = (TCB *) malloc(sizeof(*tcb));
-
-   if (!tcb) {
-   fprintf(stderr, "Xenomai: failed to allocate local TCB?!\n");
-   exit(1);
-   }
-
-   pthread_setspecific(__vrtx_tskey, tcb);
 }


> Jan
> 


-- 
Philippe.


___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Ack. There are rt_printf (rtdk) and the task-self services of vxworks,
>> vrtx and native skins. So if I get an OK for the proposal, I'll convert
>> the rest, too.
> 
> And I really do not understand how the keys can be chosen at
> compilation-time, especially in the situation where multiple libraries
> allocate __thread objects separately: how does the compiler know which
> key to give for each library ?

The keys are known at linking time when a) their context is module-local
and b) their TLS storage can be appended to main's TLS area (and that
can only happen during initial linking, of course).


BTW, another advantage of TLS over getspecific /wrt task-self is that it
overcomes its laziness: it's now cheap and easy to set self in the thread
trampoline.
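
A minimal sketch of that trampoline pattern, assuming GCC's __thread
extension and plain pthreads. The names struct task, current_task,
task_self and run_trampoline_demo are made up for illustration and are
not Xenomai APIs:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical task descriptor; not the actual Xenomai TCB. */
struct task {
	const char *name;
	int self_ok;		/* set by the task body */
};

/* Per-thread "self" pointer; NULL until the trampoline sets it. */
static __thread struct task *current_task;

static struct task *task_self(void)
{
	return current_task;	/* plain TLS load, no library call */
}

static void task_body(struct task *t)
{
	/* Any code running in this thread can now find its descriptor. */
	t->self_ok = (task_self() == t);
}

static void *trampoline(void *arg)
{
	struct task *t = arg;

	current_task = t;	/* set eagerly, before the body runs */
	task_body(t);
	return NULL;
}

/* Spawn two tasks; return 0 if each one saw its own descriptor. */
int run_trampoline_demo(void)
{
	struct task a = { "taskA", 0 }, b = { "taskB", 0 };
	pthread_t ta, tb;

	if (pthread_create(&ta, NULL, trampoline, &a))
		return -1;
	if (pthread_create(&tb, NULL, trampoline, &b)) {
		pthread_join(ta, NULL);
		return -1;
	}
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);

	/* The main thread never went through the trampoline. */
	return (a.self_ok && b.self_ok && task_self() == NULL) ? 0 : 1;
}
```

Calling run_trampoline_demo() from main() should return 0; the main
thread's own current_task stays NULL because it never passed through
the trampoline.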

Philippe, could it be that vrtx's sc_tinquiry implementation is broken?
I only find the corresponding pthread_setspecific in the lib
constructor. That pattern totally disagrees with native and vxworks.
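
To illustrate the concern: POSIX gives every new thread a NULL value for
each key, so a value stored from a library constructor is only visible
to the thread that ran the constructor. A small sketch with hypothetical
names (this is not the vrtx code):

```c
#include <pthread.h>
#include <stddef.h>

static pthread_key_t self_key;
static int main_marker;	/* stands in for a TCB allocated in the constructor */

/* Runs once, in the context of the thread that loads the program. */
static __attribute__((constructor)) void init_key(void)
{
	pthread_key_create(&self_key, NULL);
	pthread_setspecific(self_key, &main_marker);
}

static void *probe(void *arg)
{
	/* A new thread starts with NULL for every key. */
	*(void **)arg = pthread_getspecific(self_key);
	return NULL;
}

/* Return 0 if the constructor's value is visible only to the main thread. */
int run_key_scope_demo(void)
{
	void *seen_by_thread = &main_marker;	/* overwritten by probe() */
	pthread_t t;

	if (pthread_create(&t, NULL, probe, &seen_by_thread))
		return -1;
	pthread_join(t, NULL);

	return (pthread_getspecific(self_key) == &main_marker &&
		seen_by_thread == NULL) ? 0 : 1;
}
```

When called from main(), run_key_scope_demo() should return 0: the
spawned thread reads NULL, so any per-thread value has to be set from
within each thread.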

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> Ack. There are rt_printf (rtdk) and the task-self services of vxworks,
> vrtx and native skins. So if I get an OK for the proposal, I'll convert
> the rest, too.

And I really do not understand how the keys can be chosen at
compilation-time, especially in the situation where multiple libraries
allocate __thread objects separately: how does the compiler know which
key to give for each library ?

-- 
 Gilles.



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> So if I get an OK for the proposal, I'll convert the rest, too.

From my point of view, it really is a micro-optimization, which
uselessly clutters code and configure script. But I really have no
concrete objection, especially since you say that __thread is the
future, not pthread_getspecific.

-- 
 Gilles.



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Gilles Chanteperdrix wrote:
>>>>>>> Jan Kiszka wrote:
>>>>>>>> It will always remain orders of magnitude heavier than __thread
>>>>>>>> variables which are a) inlined and b) should only need two memory
>>>>>>>> accesses at worst. Moreover, it is clearly the future, while the
>>>>>>>> importance of pthread_getspecific will decrease over the time. The
>>>>>>>> __thread storage class is C99 standard (though its implementation
>>>>>>>> remains a separate topic).
>>>>>>> You are exaggerating a bit: pthread_getspecific is pretty efficient
>>>>>>> already (from the few things that I have timed on ARM, it is the only
>>>>>>> one which takes under the microsecond). That you will gain something
>>>>>>> with __thread is not guaranteed by the C99 standard either: in fact the
>>>>>>> implementation could use exactly the same functions.
>>>>>> As long as we do not lose anything (performance or portability),
>>>>> You lose portability. But I agree that we do not care much.
>>>> The fallback remains - must remain in order to obtain true optimization
>>>> from the TLS-based version without locking out some corner-case usage.
>>>> Find a proposal below (on top of handle-based xeno_get_current).
>>>>
>>>> We have to set initial-exec as TLS model, otherwise we end up with a
>>>> dynamic lookup similar (maybe still faster, dunno) to the pthread
>>>> service. This model requires start-time linking, will not work with
>>>> dlopen (I strongly assume the linker will bail out). But I consider
>>>> runtime loading of Xenomai libs as an uncommon corner case, and the user
>>>> can still re-enable it via --without-__thread.
>>> glibc has a separate test to know whether the tls_model attribute is
>>> supported:
>>>
>>> if test "$libc_cv_gcc___thread" = yes; then
>>>   dnl Check whether the compiler supports the tls_model attribute.
>>>   AC_CACHE_CHECK([for tls_model attribute], libc_cv_gcc_tls_model_attr, [dnl
>>>   cat > conftest.c <<\EOF
>>> extern __thread int a __attribute__((tls_model ("initial-exec")));
>>> EOF
>>>   if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -S -Werror conftest.c >&AS_MESSAGE_LOG_FD]); then
>>>     libc_cv_gcc_tls_model_attr=yes
>>>   else
>>>     libc_cv_gcc_tls_model_attr=no
>>>   fi
>>>   rm -f conftest*])
>>>   if test "$libc_cv_gcc_tls_model_attr" = yes; then
>>>     AC_DEFINE(HAVE_TLS_MODEL_ATTRIBUTE)
>>>   fi
>>> fi
>>>
>> OK, but for us the question is if we want __thread without initial-exec
>> at all. If not (I think so), I could add -Werror to the __thread test
>> and that should combine both tests into a single one, sufficient for our
>> use case.
> 
> Yes, my point was that if an implementation supports __thread, it does
> not necessarily mean that it supports the tls_model attribute. We should
> fall back to pthread_key if either one is not supported. However, other
> parts of Xenomai skins use pthread_specific, so, if we implement
> something based on __thread, I think we should factor it and use it
> everywhere.

Ack. There are rt_printf (rtdk) and the task-self services of vxworks,
vrtx and native skins. So if I get an OK for the proposal, I'll convert
the rest, too.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Gilles Chanteperdrix wrote:
>>>>>> Jan Kiszka wrote:
>>>>>>> It will always remain orders of magnitude heavier than __thread
>>>>>>> variables which are a) inlined and b) should only need two memory
>>>>>>> accesses at worst. Moreover, it is clearly the future, while the
>>>>>>> importance of pthread_getspecific will decrease over the time. The
>>>>>>> __thread storage class is C99 standard (though its implementation
>>>>>>> remains a separate topic).
>>>>>> You are exaggerating a bit: pthread_getspecific is pretty efficient
>>>>>> already (from the few things that I have timed on ARM, it is the only
>>>>>> one which takes under the microsecond). That you will gain something
>>>>>> with __thread is not guaranteed by the C99 standard either: in fact the
>>>>>> implementation could use exactly the same functions.
>>>>> As long as we do not lose anything (performance or portability),
>>>> You lose portability. But I agree that we do not care much.
>>> The fallback remains - must remain in order to obtain true optimization
>>> from the TLS-based version without locking out some corner-case usage.
>>> Find a proposal below (on top of handle-based xeno_get_current).
>>>
>>> We have to set initial-exec as TLS model, otherwise we end up with a
>>> dynamic lookup similar (maybe still faster, dunno) to the pthread
>>> service. This model requires start-time linking, will not work with
>>> dlopen (I strongly assume the linker will bail out). But I consider
>>> runtime loading of Xenomai libs as an uncommon corner case, and the user
>>> can still re-enable it via --without-__thread.
>> glibc has a separate test to know whether the tls_model attribute is
>> supported:
>>
>> if test "$libc_cv_gcc___thread" = yes; then
>>   dnl Check whether the compiler supports the tls_model attribute.
>>   AC_CACHE_CHECK([for tls_model attribute], libc_cv_gcc_tls_model_attr, [dnl
>>   cat > conftest.c <<\EOF
>> extern __thread int a __attribute__((tls_model ("initial-exec")));
>> EOF
>>   if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -S -Werror conftest.c >&AS_MESSAGE_LOG_FD]); then
>>     libc_cv_gcc_tls_model_attr=yes
>>   else
>>     libc_cv_gcc_tls_model_attr=no
>>   fi
>>   rm -f conftest*])
>>   if test "$libc_cv_gcc_tls_model_attr" = yes; then
>>     AC_DEFINE(HAVE_TLS_MODEL_ATTRIBUTE)
>>   fi
>> fi
>>
> 
> OK, but for us the question is if we want __thread without initial-exec
> at all. If not (I think so), I could add -Werror to the __thread test
> and that should combine both tests into a single one, sufficient for our
> use case.

Yes, my point was that if an implementation supports __thread, it does
not necessarily mean that it supports the tls_model attribute. We should
fall back to pthread_key if either one is not supported. However, other
parts of Xenomai skins use pthread_specific, so, if we implement
something based on __thread, I think we should factor it and use it
everywhere.
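
A rough sketch of what such a factored accessor could look like, with
the pthread key kept as the fallback. HAVE___THREAD is the configure
symbol from the proposal; cur_handle_t, NO_HANDLE, set_current,
get_current and run_current_demo are illustrative stand-ins, not
existing Xenomai names:

```c
#include <pthread.h>
#include <stdint.h>

/* Define HAVE___THREAD (normally from config.h) to select the TLS path. */

typedef unsigned long cur_handle_t;	/* stand-in for xnhandle_t */
#define NO_HANDLE ((cur_handle_t)0)	/* stand-in for XN_NO_HANDLE */

#ifdef HAVE___THREAD
static __thread cur_handle_t current_handle
	__attribute__((tls_model("initial-exec"))) = NO_HANDLE;

static inline void set_current(cur_handle_t h) { current_handle = h; }
static inline cur_handle_t get_current(void) { return current_handle; }
#else /* !HAVE___THREAD */
static pthread_key_t current_key;
static pthread_once_t current_once = PTHREAD_ONCE_INIT;

static void init_current_key(void)
{
	pthread_key_create(&current_key, NULL);
}

static inline void set_current(cur_handle_t h)
{
	pthread_once(&current_once, init_current_key);
	pthread_setspecific(current_key, (void *)(uintptr_t)h);
}

static inline cur_handle_t get_current(void)
{
	pthread_once(&current_once, init_current_key);
	/* An unset key reads as NULL, which maps to NO_HANDLE (0). */
	return (cur_handle_t)(uintptr_t)pthread_getspecific(current_key);
}
#endif /* !HAVE___THREAD */

static void *worker(void *arg)
{
	cur_handle_t *io = arg;	/* in: handle to store; out: handle read back */
	cur_handle_t before = get_current();

	set_current(*io);
	*io = (before == NO_HANDLE) ? get_current() : NO_HANDLE;
	return NULL;
}

/* Return 0 if each thread reads back only its own handle. */
int run_current_demo(void)
{
	cur_handle_t h = 42;
	pthread_t t;

	set_current(7);
	if (pthread_create(&t, NULL, worker, &h))
		return -1;
	pthread_join(t, NULL);
	return (h == 42 && get_current() == 7) ? 0 : 1;
}
```

Callers only ever see set_current()/get_current(), so the choice between
the two mechanisms stays confined to one header and the skins would not
need their own #ifdefs.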

-- 
 Gilles.



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> Jan Kiszka wrote:
>>>>>> It will always remain orders of magnitude heavier than __thread
>>>>>> variables which are a) inlined and b) should only need two memory
>>>>>> accesses at worst. Moreover, it is clearly the future, while the
>>>>>> importance of pthread_getspecific will decrease over the time. The
>>>>>> __thread storage class is C99 standard (though its implementation
>>>>>> remains a separate topic).
>>>>> You are exaggerating a bit: pthread_getspecific is pretty efficient
>>>>> already (from the few things that I have timed on ARM, it is the only
>>>>> one which takes under the microsecond). That you will gain something
>>>>> with __thread is not guaranteed by the C99 standard either: in fact the
>>>>> implementation could use exactly the same functions.
>>>> As long as we do not lose anything (performance or portability),
>>> You lose portability. But I agree that we do not care much.
>> The fallback remains - must remain in order to obtain true optimization
>> from the TLS-based version without locking out some corner-case usage.
>> Find a proposal below (on top of handle-based xeno_get_current).
>>
>> We have to set initial-exec as TLS model, otherwise we end up with a
>> dynamic lookup similar (maybe still faster, dunno) to the pthread
>> service. This model requires start-time linking, will not work with
>> dlopen (I strongly assume the linker will bail out). But I consider
>> runtime loading of Xenomai libs as an uncommon corner case, and the user
>> can still re-enable it via --without-__thread.
> 
> glibc has a separate test to know whether the tls_model attribute is
> supported:
> 
> if test "$libc_cv_gcc___thread" = yes; then
>   dnl Check whether the compiler supports the tls_model attribute.
>   AC_CACHE_CHECK([for tls_model attribute], libc_cv_gcc_tls_model_attr, [dnl
>   cat > conftest.c <<\EOF
> extern __thread int a __attribute__((tls_model ("initial-exec")));
> EOF
>   if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -S -Werror conftest.c >&AS_MESSAGE_LOG_FD]); then
>     libc_cv_gcc_tls_model_attr=yes
>   else
>     libc_cv_gcc_tls_model_attr=no
>   fi
>   rm -f conftest*])
>   if test "$libc_cv_gcc_tls_model_attr" = yes; then
>     AC_DEFINE(HAVE_TLS_MODEL_ATTRIBUTE)
>   fi
> fi
> 

OK, but for us the question is if we want __thread without initial-exec
at all. If not (I think so), I could add -Werror to the __thread test
and that should combine both tests into a single one, sufficient for our
use case.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> It will always remain orders of magnitude heavier than __thread
>>>>> variables which are a) inlined and b) should only need two memory
>>>>> accesses at worst. Moreover, it is clearly the future, while the
>>>>> importance of pthread_getspecific will decrease over the time. The
>>>>> __thread storage class is C99 standard (though its implementation
>>>>> remains a separate topic).
>>>> You are exaggerating a bit: pthread_getspecific is pretty efficient
>>>> already (from the few things that I have timed on ARM, it is the only
>>>> one which takes under the microsecond). That you will gain something
>>>> with __thread is not guaranteed by the C99 standard either: in fact the
>>>> implementation could use exactly the same functions.
>>> As long as we do not lose anything (performance or portability),
>> You lose portability. But I agree that we do not care much.
> 
> The fallback remains - must remain in order to obtain true optimization
> from the TLS-based version without locking out some corner-case usage.
> Find a proposal below (on top of handle-based xeno_get_current).
> 
> We have to set initial-exec as TLS model, otherwise we end up with a
> dynamic lookup similar (maybe still faster, dunno) to the pthread
> service. This model requires start-time linking, will not work with
> dlopen (I strongly assume the linker will bail out). But I consider
> runtime loading of Xenomai libs as an uncommon corner case, and the user
> can still re-enable it via --without-__thread.

glibc has a separate test to know whether the tls_model attribute is
supported:

if test "$libc_cv_gcc___thread" = yes; then
  dnl Check whether the compiler supports the tls_model attribute.
  AC_CACHE_CHECK([for tls_model attribute], libc_cv_gcc_tls_model_attr, [dnl
  cat > conftest.c <<\EOF
extern __thread int a __attribute__((tls_model ("initial-exec")));
EOF
  if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -S -Werror conftest.c >&AS_MESSAGE_LOG_FD]); then
    libc_cv_gcc_tls_model_attr=yes
  else
    libc_cv_gcc_tls_model_attr=no
  fi
  rm -f conftest*])
  if test "$libc_cv_gcc_tls_model_attr" = yes; then
    AC_DEFINE(HAVE_TLS_MODEL_ATTRIBUTE)
  fi
fi


-- 
 Gilles.



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> It will always remain orders of magnitude heavier than __thread
>>>> variables which are a) inlined and b) should only need two memory
>>>> accesses at worst. Moreover, it is clearly the future, while the
>>>> importance of pthread_getspecific will decrease over the time. The
>>>> __thread storage class is C99 standard (though its implementation
>>>> remains a separate topic).
>>> You are exaggerating a bit: pthread_getspecific is pretty efficient
>>> already (from the few things that I have timed on ARM, it is the only
>>> one which takes under the microsecond). That you will gain something
>>> with __thread is not guaranteed by the C99 standard either: in fact the
>>> implementation could use exactly the same functions.
>> As long as we do not lose anything (performance or portability),
> 
> You lose portability. But I agree that we do not care much.

The fallback remains - must remain in order to obtain true optimization
from the TLS-based version without locking out some corner-case usage.
Find a proposal below (on top of handle-based xeno_get_current).

We have to set initial-exec as TLS model, otherwise we end up with a
dynamic lookup similar (maybe still faster, dunno) to the pthread
service. This model requires start-time linking, will not work with
dlopen (I strongly assume the linker will bail out). But I consider
runtime loading of Xenomai libs as an uncommon corner case, and the user
can still re-enable it via --without-__thread.

Jan

---
 configure.in   |   23 +++
 include/asm-generic/bits/bind.h|   44 ++---
 include/asm-generic/bits/current.h |   13 +-
 3 files changed, 66 insertions(+), 14 deletions(-)

Index: b/configure.in
===
--- a/configure.in
+++ b/configure.in
@@ -762,6 +762,29 @@ LIBS="$LIBS -lrt"
 AC_CHECK_FUNCS([shm_open shm_unlink])
 LIBS="$save_LIBS"
 
+AC_ARG_WITH([__thread],
+   AC_HELP_STRING([--without-__thread],
+  [do not use TLS features (allows for dlopen'ing Xenomai libs)]),
+   [use__thread=$withval],
+   [use__thread=yes])
+
+dnl Check whether the compiler supports the __thread keyword.
+if test "x$use__thread" != xno; then
+   AC_CACHE_CHECK([for __thread], libc_cv_gcc___thread,
+   [cat > conftest.c <<\EOF
+__thread int a __attribute__ ((tls_model ("initial-exec"))) = 42;
+EOF
+   if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -c conftest.c >&AS_MESSAGE_LOG_FD]); then
+   libc_cv_gcc___thread=yes
+   else
+   libc_cv_gcc___thread=no
+   fi
+   rm -f conftest*])
+   if test "$libc_cv_gcc___thread" = yes; then
+   AC_DEFINE(HAVE___THREAD,1,[config])
+   fi
+fi
+
 dnl
 dnl Build the Makefiles
 dnl
Index: b/include/asm-generic/bits/bind.h
===
--- a/include/asm-generic/bits/bind.h
+++ b/include/asm-generic/bits/bind.h
@@ -11,26 +11,26 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
+#ifdef HAVE___THREAD
+__thread xnhandle_t xeno_current __attribute__ ((tls_model ("initial-exec"))) =
+   XN_NO_HANDLE;
+
+static inline void __xeno_set_current(xnhandle_t current)
+{
+   xeno_current = current;
+}
+#else /* !HAVE___THREAD */
 __attribute__ ((weak))
 pthread_key_t xeno_current_key;
 __attribute__ ((weak))
 pthread_once_t xeno_init_current_key_once = PTHREAD_ONCE_INIT;
 
-__attribute__ ((weak))
-void xeno_set_current(void)
+static inline void __xeno_set_current(xnhandle_t current)
 {
-   void *kthread_cb;
-   int err;
-
-   err = XENOMAI_SYSCALL1(__xn_sys_current, &kthread_cb);
-   if (err) {
-   fprintf(stderr, "Xenomai: error obtaining handle for current "
-   "thread: %s\n", strerror(err));
-   exit(1);
-   }
-   pthread_setspecific(xeno_current_key, kthread_cb);
+   pthread_setspecific(xeno_current_key, (void *)current);
 }
 
 static void init_current_key(void)
@@ -42,6 +42,22 @@ static void init_current_key(void)
exit(1);
}
 }
+#endif /* !HAVE___THREAD */
+
+__attribute__ ((weak))
+void xeno_set_current(void)
+{
+   xnhandle_t current;
+   int err;
+
+   err = XENOMAI_SYSCALL1(__xn_sys_current, &current);
+   if (err) {
+   fprintf(stderr, "Xenomai: error obtaining handle for current "
+   "thread: %s\n", strerror(err));
+   exit(1);
+   }
+   __xeno_set_current(current);
+}
 
 #ifdef CONFIG_XENO_FASTSYNCH
 __attribute__ ((weak))
@@ -175,7 +191,9 @@ xeno_bind_skin(unsigned skin_magic, cons
sa.sa_flags = 0;
sigaction(SIGXCPU, &sa, NULL);
 
+#ifndef HAVE___THREAD
pthread_once(&xeno_init_current_key_once, &init_current_key);
+#endif /* !HAVE___THREAD */
 

Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> It will always remain orders of magnitude heavier than __thread
>>> variables which are a) inlined and b) should only need two memory
>>> accesses at worst. Moreover, it is clearly the future, while the
>>> importance of pthread_getspecific will decrease over the time. The
>>> __thread storage class is C99 standard (though its implementation
>>> remains a separate topic).
>> You are exaggerating a bit: pthread_getspecific is pretty efficient
>> already (from the few things that I have timed on ARM, it is the only
>> one which takes under the microsecond). That you will gain something
>> with __thread is not guaranteed by the C99 standard either: in fact the
>> implementation could use exactly the same functions.
> 
> As long as we do not lose anything (performance or portability),

You lose portability. But I agree that we do not care much.

-- 
 Gilles.



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> Gilles Chanteperdrix wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Hi,
>>>>>>
>>>>>> looking into the "xeno_in_primary_mode" thing I wondered how to make the
>>>>>> thread state quickly retrievable. Going via pthread_getspecific as we do
>>>>>> for xeno_get_current appears logical - but not optimal. Though
>>>>>> getspecific is optimized for speed, it remains a function call, a few
>>>>>> sanity checks, and only finally a TLS variable access. That could be
>>>>>> achieved in a much lighter way by using a __thread variable.
>>>>>>
>>>>>> But can we assume that all targets we support also support the __thread
>>>>>> storage class? TLS is surely mandatory now: I assume pthread_getspecific
>>>>>> would become non-RT safe without it, right? Is there anything we
>>>>>> can/must check for during configure to verify __thread support?
>>>>> I really think that this optimization is not worth the trouble. Anyway,
>>>> As long as we cannot specify the amount of "trouble", it's hard to
>>>> decide. My current feeling is that it should rather simplify the
>>>> implementation + save us quite a few ops in the fast path (even more
>>>> with upcoming thread-mode check).
>>> The trouble is to make some reliable detections in the configure script,
>>> so that the user will know early that Xenomai can not work with its
>>> current toolchain. And to make this detection work with uclibc as well
>>> as with glibc, gcc 4 versus gcc 3, etc...
>> Will work out a test program for configure.
> 
> glibc does that:
> 
> AC_ARG_WITH([__thread],
>             AC_HELP_STRING([--without-__thread],
>                            [do not use TLS features even when supporting them]),
>             [use__thread=$withval],
>             [use__thread=yes])
> 
> dnl Check whether the compiler supports the __thread keyword.
> if test "x$use__thread" != xno; then
>   AC_CACHE_CHECK([for __thread], libc_cv_gcc___thread,
>   [cat > conftest.c <<\EOF
> __thread int a = 42;
> EOF
>   if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -c conftest.c >&AS_MESSAGE_LOG_FD]); then
>     libc_cv_gcc___thread=yes
>   else
>     libc_cv_gcc___thread=no
>   fi
>   rm -f conftest*])
>   if test "$libc_cv_gcc___thread" = yes; then
>     AC_DEFINE(HAVE___THREAD)
>   fi
> else
>   libc_cv_gcc___thread=no
> fi
> 

Cool, thanks. Will play with this for xeno_get_current soon. Then we can
review and test if it works and makes sense based on some real code.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> It will always remain orders of magnitude heavier than __thread
>> variables which are a) inlined and b) should only need two memory
>> accesses at worst. Moreover, it is clearly the future, while the
>> importance of pthread_getspecific will decrease over the time. The
>> __thread storage class is C99 standard (though its implementation
>> remains a separate topic).
> 
> You are exaggerating a bit: pthread_getspecific is pretty efficient
> already (from the few things that I have timed on ARM, it is the only
> one which takes under the microsecond). That you will gain something
> with __thread is not guaranteed by the C99 standard either: in fact the
> implementation could use exactly the same functions.

As long as we do not lose anything (performance or portability), there
is no point in sticking with pthread_getspecific. At least on x86 the
advantage is easily visible (==no more function calls). That's also due
to the lighter concept of __thread: it pushes the key resolution from
runtime to compile/link-time.
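
For anyone who wants to measure the difference on their own target, a
minimal timing sketch follows. It assumes GCC's __thread extension and
clock_gettime(); bench_setup, time_getspecific and time_tls are
hypothetical helper names, and no particular result is implied since the
numbers depend on architecture, libc and TLS model:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <time.h>

static pthread_key_t key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

/* volatile forces a real load on every iteration of the TLS loop. */
static __thread void * volatile tls_slot;

static void make_key(void)
{
	pthread_key_create(&key, NULL);
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Store the same value in both per-thread slots of the calling thread. */
int bench_setup(void *value)
{
	pthread_once(&key_once, make_key);
	tls_slot = value;
	return pthread_setspecific(key, value);
}

/* Time `loops` reads through pthread_getspecific(); returns elapsed ns. */
uint64_t time_getspecific(unsigned long loops, uintptr_t *sum)
{
	uint64_t t0 = now_ns();
	unsigned long i;

	for (i = 0; i < loops; i++)
		*sum += (uintptr_t)pthread_getspecific(key);
	return now_ns() - t0;
}

/* Time `loops` direct reads of the __thread variable; returns elapsed ns. */
uint64_t time_tls(unsigned long loops, uintptr_t *sum)
{
	uint64_t t0 = now_ns();
	unsigned long i;

	for (i = 0; i < loops; i++)
		*sum += (uintptr_t)tls_slot;
	return now_ns() - t0;
}
```

The sum arguments keep the compiler from dropping the loops and double
as a sanity check: after the same number of iterations both sums should
be equal.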

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Gilles Chanteperdrix wrote:
>>>> Jan Kiszka wrote:
>>>>> Hi,
>>>>>
>>>>> looking into the "xeno_in_primary_mode" thing I wondered how to make the
>>>>> thread state quickly retrievable. Going via pthread_getspecific as we do
>>>>> for xeno_get_current appears logical - but not optimal. Though
>>>>> getspecific is optimized for speed, it remains a function call, a few
>>>>> sanity checks, and only finally a TLS variable access. That could be
>>>>> achieved in a much lighter way by using a __thread variable.
>>>>>
>>>>> But can we assume that all targets we support also support the __thread
>>>>> storage class? TLS is surely mandatory now: I assume pthread_getspecific
>>>>> would become non-RT safe without it, right? Is there anything we
>>>>> can/must check for during configure to verify __thread support?
>>>> I really think that this optimization is not worth the trouble. Anyway,
>>> As long as we cannot specify the amount of "trouble", it's hard to
>>> decide. My current feeling is that it should rather simplify the
>>> implementation + save us quite a few ops in the fast path (even more
>>> with upcoming thread-mode check).
>> The trouble is to make some reliable detections in the configure script,
>> so that the user will know early that Xenomai can not work with its
>> current toolchain. And to make this detection work with uclibc as well
>> as with glibc, gcc 4 versus gcc 3, etc...
> 
> Will work out a test program for configure.

glibc does that:

AC_ARG_WITH([__thread],
            AC_HELP_STRING([--without-__thread],
                           [do not use TLS features even when supporting them]),
            [use__thread=$withval],
            [use__thread=yes])

dnl Check whether the compiler supports the __thread keyword.
if test "x$use__thread" != xno; then
  AC_CACHE_CHECK([for __thread], libc_cv_gcc___thread,
  [cat > conftest.c <<\EOF
__thread int a = 42;
EOF
  if AC_TRY_COMMAND([${CC-cc} $CFLAGS $CPPFLAGS -c conftest.c >&AS_MESSAGE_LOG_FD]); then
    libc_cv_gcc___thread=yes
  else
    libc_cv_gcc___thread=no
  fi
  rm -f conftest*])
  if test "$libc_cv_gcc___thread" = yes; then
    AC_DEFINE(HAVE___THREAD)
  fi
else
  libc_cv_gcc___thread=no
fi



-- 
 Gilles.



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> It will always remain orders of magnitude heavier than __thread
> variables which are a) inlined and b) should only need two memory
> accesses at worst. Moreover, it is clearly the future, while the
> importance of pthread_getspecific will decrease over the time. The
> __thread storage class is C99 standard (though its implementation
> remains a separate topic).

You are exaggerating a bit: pthread_getspecific is pretty efficient
already (from the few things that I have timed on ARM, it is the only
one which takes under the microsecond). That you will gain something
with __thread is not guaranteed by the C99 standard either: in fact the
implementation could use exactly the same functions.

-- 
 Gilles.



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Gilles Chanteperdrix wrote:
>>> Jan Kiszka wrote:
>>>> Hi,
>>>>
>>>> looking into the "xeno_in_primary_mode" thing I wondered how to make the
>>>> thread state quickly retrievable. Going via pthread_getspecific as we do
>>>> for xeno_get_current appears logical - but not optimal. Though
>>>> getspecific is optimized for speed, it remains a function call, a few
>>>> sanity checks, and only finally a TLS variable access. That could be
>>>> achieved in a much lighter way by using a __thread variable.
>>>>
>>>> But can we assume that all targets we support also support the __thread
>>>> storage class? TLS is surely mandatory now: I assume pthread_getspecific
>>>> would become non-RT safe without it, right? Is there anything we
>>>> can/must check for during configure to verify __thread support?
>>> I really think that this optimization is not worth the trouble. Anyway,
>> As long as we cannot specify the amount of "trouble", it's hard to
>> decide. My current feeling is that it should rather simplify the
>> implementation + save us quite a few ops in the fast path (even more
>> with upcoming thread-mode check).
> 
> The trouble is to make some reliable detections in the configure script,
> so that the user will know early that Xenomai can not work with its
> current toolchain. And to make this detection work with uclibc as well
> as with glibc, gcc 4 versus gcc 3, etc...

Will work out a test program for configure.

> 
> Besides, pthread_getspecific can be implemented pretty efficiently in
> user-space without __thread support: using a hash table would be enough.
>  So, if we rely on pthread_getspecific, we do not have to know if ptd
> are implemented with some hardware trick.

It will always remain orders of magnitude heavier than __thread
variables which are a) inlined and b) should only need two memory
accesses at worst. Moreover, it is clearly the future, while the
importance of pthread_getspecific will decrease over time. The
__thread storage class is a compiler extension rather than part of the
C99 standard (and its implementation remains a separate topic).
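
To make the comparison concrete, here is a rough sketch of the two access
paths side by side. All names (struct demo_current and the demo_* functions)
are hypothetical and are not Xenomai's actual xeno_get_current code; the
sketch assumes POSIX threads and a compiler with the __thread extension.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread descriptor, standing in for the real handle. */
struct demo_current {
	int in_primary_mode;
};

/* Variant 1: POSIX thread-specific data, looked up through a key. */
static pthread_key_t demo_key;
static pthread_once_t demo_key_once = PTHREAD_ONCE_INIT;

static void demo_key_init(void)
{
	int err = pthread_key_create(&demo_key, NULL);

	assert(err == 0);
	(void)err;
}

void demo_set_current_tsd(struct demo_current *cur)
{
	pthread_once(&demo_key_once, demo_key_init);
	pthread_setspecific(demo_key, cur);
}

struct demo_current *demo_get_current_tsd(void)
{
	pthread_once(&demo_key_once, demo_key_init);
	return pthread_getspecific(demo_key);	/* out-of-line library call */
}

/* Variant 2: a __thread variable; usually a thread-pointer-relative load. */
static __thread struct demo_current *demo_current_tls;

static inline void demo_set_current_tls(struct demo_current *cur)
{
	demo_current_tls = cur;
}

static inline struct demo_current *demo_get_current_tls(void)
{
	return demo_current_tls;
}

/* Usage: both variants hand back what the calling thread stored. */
int demo_roundtrip(void)
{
	static struct demo_current me = { 1 };

	demo_set_current_tsd(&me);
	demo_set_current_tls(&me);
	return demo_get_current_tsd() == &me && demo_get_current_tls() == &me;
}
```

How cheap the second variant really is depends on the TLS model the compiler
picks; in a shared library the access may still go through a helper call.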

> 
>>> I have one question: is an implementation guaranteed to support more
>>> than one __thread variable? Because from the ARM implementation I would say
>>> that ARM has only one __thread variable.
>> That would be weird - there is no such limitation known to me. Anyway,
>> you could easily verify this with a simple test program I guess. /me
>> also wonders how the glibc/NPTL is maintaining certain per-thread
>> variables (and there are surely > 1) internally.
> 
> I would say the __thread variable is used to store an array, which is,
> according to what Philippe said yesterday, turned into a multilevel
> structure when creating more than 32 keys.
> 
> At the hardware level, I am pretty sure ARM has only one per thread
> token. However I do not know how it is used to implement a __thread
> variable (or variables).

I will look into some ELF definitions on this, but having a single hw
token (CPU register) for accessing the TLS root is surely not uncommon.
The rest is linker magic, specifically when it comes to dealing with
offsets of cross-lib TLS variables.
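
The "one root pointer plus per-variable offsets" picture can be checked with
a small experiment. This is a sketch under assumptions: tls_a, tls_b and the
tls_layout_* names are invented for the example, and the constant distance
between two variables of the same module is a property of typical ELF TLS
layouts, not something any standard guarantees.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Two independent thread-local variables living in the same module. */
static __thread int tls_a = 1;
static __thread long tls_b = 2;

struct tls_layout {
	uintptr_t addr_a;	/* address of the calling thread's tls_a */
	uintptr_t addr_b;	/* address of the calling thread's tls_b */
	int a;
	long b;
};

/* Record where the calling thread's copies live and what they contain. */
static void tls_snapshot(struct tls_layout *out)
{
	assert(out != NULL);
	out->addr_a = (uintptr_t)&tls_a;
	out->addr_b = (uintptr_t)&tls_b;
	out->a = tls_a;
	out->b = tls_b;
}

static void *tls_layout_thread(void *arg)
{
	tls_snapshot(arg);
	return NULL;
}

/* Returns 0 if both variables are per-thread and laid out consistently. */
int tls_layout_check(void)
{
	struct tls_layout mine, other;
	pthread_t tid;

	tls_a = 100;	/* change only the calling thread's copies */
	tls_b = 200;
	if (pthread_create(&tid, NULL, tls_layout_thread, &other) != 0)
		return -1;
	if (pthread_join(tid, NULL) != 0)
		return -1;
	tls_snapshot(&mine);

	/* Each thread has its own storage for both variables. */
	if (mine.addr_a == other.addr_a || mine.addr_b == other.addr_b)
		return 1;
	/* Same distance between the variables: one base, fixed offsets. */
	if (mine.addr_b - mine.addr_a != other.addr_b - other.addr_a)
		return 2;
	/* The new thread saw the initialisers, not our modified values. */
	if (other.a != 1 || other.b != 2)
		return 3;
	return 0;
}
```

A zero result on a given target would indicate that more than one __thread
variable is supported there, each at its own offset from the per-thread base.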

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> Gilles Chanteperdrix wrote:
>> Jan Kiszka wrote:
>>> Hi,
>>>
>>> looking into the "xeno_in_primary_mode" thing I wondered how to make the
>>> thread state quickly retrievable. Going via pthread_getspecific as we do
>>> for xeno_get_current appears logical - but not optimal. Though
>>> getspecific is optimized for speed, it remains a function call, a few
>>> sanity checks, and only finally a TLS variable access. That could be
>>> achieved in a much lighter way by using a __thread variable.
>>>
>>> But can we assume that all targets we support also support the __thread
>>> storage class? TLS is surely mandatory now: I assume pthread_getspecific
>>> would become non-RT safe without it, right? Is there anything we
>>> can/must check for during configure to verify __thread support?
>> I really think that this optimization is not worth the trouble. Anyway,
> 
> As long as we cannot specify the amount of "trouble", it's hard to
> decide. My current feeling is that it should rather simplify the
> implementation + save us quite a few ops in the fast path (even more
> with upcoming thread-mode check).

The trouble is to make some reliable detections in the configure script,
so that the user will know early that Xenomai cannot work with their
current toolchain. And to make this detection work with uclibc as well
as with glibc, gcc 4 versus gcc 3, etc...

Besides, pthread_getspecific can be implemented pretty efficiently in
user-space without __thread support: using a hash table would be enough.
So, if we rely on pthread_getspecific, we do not have to know whether
per-thread data is implemented with some hardware trick.
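
As an illustration of the hash-table idea, a deliberately simplified sketch
could look like this. Everything here is hypothetical (the ptd_* names, the
fixed table size, the single implicit key), entries are never removed, and a
mutex is used for brevity although a real-time lookup path would need
something lighter. It also assumes pthread_t converts to an integer, as on
Linux, and note that pthread_self() may itself rely on a thread pointer, so a
genuinely TLS-free implementation would need another way to identify the
caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define PTD_SLOTS 64	/* hypothetical fixed capacity, must be a power of two */

struct ptd_slot {
	int used;
	pthread_t owner;
	void *value;
};

static struct ptd_slot ptd_table[PTD_SLOTS];
static pthread_mutex_t ptd_lock = PTHREAD_MUTEX_INITIALIZER;

/* Assumption: pthread_t can be cast to an integer (true on Linux). */
static size_t ptd_hash(pthread_t tid)
{
	uintptr_t v = (uintptr_t)tid;

	assert((PTD_SLOTS & (PTD_SLOTS - 1)) == 0);
	return (size_t)((v >> 4) ^ (v >> 12)) & (PTD_SLOTS - 1);
}

/* Linear probing: return tid's slot, or claim a free one if create is set. */
static struct ptd_slot *ptd_find(pthread_t tid, int create)
{
	size_t i, start = ptd_hash(tid);

	for (i = 0; i < PTD_SLOTS; i++) {
		struct ptd_slot *s = &ptd_table[(start + i) & (PTD_SLOTS - 1)];

		if (s->used && pthread_equal(s->owner, tid))
			return s;
		if (!s->used) {
			if (!create)
				return NULL;
			s->used = 1;
			s->owner = tid;
			return s;
		}
	}
	return NULL;	/* table full */
}

/* Store a value for the calling thread; returns 0 on success, -1 if full. */
int ptd_set(void *value)
{
	struct ptd_slot *s;

	pthread_mutex_lock(&ptd_lock);
	s = ptd_find(pthread_self(), 1);
	if (s)
		s->value = value;
	pthread_mutex_unlock(&ptd_lock);
	return s ? 0 : -1;
}

/* Fetch the calling thread's value, or NULL if none was stored. */
void *ptd_get(void)
{
	struct ptd_slot *s;
	void *value;

	pthread_mutex_lock(&ptd_lock);
	s = ptd_find(pthread_self(), 0);
	value = s ? s->value : NULL;
	pthread_mutex_unlock(&ptd_lock);
	return value;
}
```

Even in this reduced form, every lookup costs a hash, a probe loop and a
lock, which is the overhead the __thread proposal is trying to avoid.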

> 
>> I have one question: is an implementation guaranteed to support more
>> than one __thread variable? Because from the ARM implementation I would say
>> that ARM has only one __thread variable.
> 
> That would be weird - there is no such limitation known to me. Anyway,
> you could easily verify this with a simple test program I guess. /me
> also wonders how the glibc/NPTL is maintaining certain per-thread
> variables (and there are surely > 1) internally.

I would say the __thread variable is used to store an array, which is,
according to what Philippe said yesterday, turned into a multilevel
structure when creating more than 32 keys.

At the hardware level, I am pretty sure ARM has only one per thread
token. However I do not know how it is used to implement a __thread
variable (or variables).

-- 
 Gilles.



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Jan Kiszka
Gilles Chanteperdrix wrote:
> Jan Kiszka wrote:
>> Hi,
>>
>> looking into the "xeno_in_primary_mode" thing I wondered how to make the
>> thread state quickly retrievable. Going via pthread_getspecific as we do
>> for xeno_get_current appears logical - but not optimal. Though
>> getspecific is optimized for speed, it remains a function call, a few
>> sanity checks, and only finally a TLS variable access. That could be
>> achieved in a much lighter way by using a __thread variable.
>>
>> But can we assume that all targets we support also support the __thread
>> storage class? TLS is surely mandatory now: I assume pthread_getspecific
>> would become non-RT safe without it, right? Is there anything we
>> can/must check for during configure to verify __thread support?
> 
> I really think that this optimization is not worth the trouble. Anyway,

As long as we cannot specify the amount of "trouble", it's hard to
decide. My current feeling is that it should rather simplify the
implementation + save us quite a few ops in the fast path (even more
with upcoming thread-mode check).

> I have one question: is an implementation guaranteed to support more
> than one __thread variable? Because from the ARM implementation I would say
> that ARM has only one __thread variable.

That would be weird - there is no such limitation known to me. Anyway,
you could easily verify this with a simple test program I guess. /me
also wonders how the glibc/NPTL is maintaining certain per-thread
variables (and there are surely > 1) internally.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



Re: [Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-14 Thread Gilles Chanteperdrix
Jan Kiszka wrote:
> Hi,
> 
> looking into the "xeno_in_primary_mode" thing I wondered how to make the
> thread state quickly retrievable. Going via pthread_getspecific as we do
> for xeno_get_current appears logical - but not optimal. Though
> getspecific is optimized for speed, it remains a function call, a few
> sanity checks, and only finally a TLS variable access. That could be
> achieved in a much lighter way by using a __thread variable.
> 
> But can we assume that all targets we support also support the __thread
> storage class? TLS is surely mandatory now: I assume pthread_getspecific
> would become non-RT safe without it, right? Is there anything we
> can/must check for during configure to verify __thread support?

I really think that this optimization is not worth the trouble. Anyway,
I have one question: is an implementation guaranteed to support more
than one __thread variable? Because from the ARM implementation I would say
that ARM has only one __thread variable.

-- 
 Gilles.



[Xenomai-core] __thread instead of pthread_get/setspecific

2008-10-13 Thread Jan Kiszka
Hi,

looking into the "xeno_in_primary_mode" thing I wondered how to make the
thread state quickly retrievable. Going via pthread_getspecific as we do
for xeno_get_current appears logical - but not optimal. Though
getspecific is optimized for speed, it remains a function call, a few
sanity checks, and only finally a TLS variable access. That could be
achieved in a much lighter way by using a __thread variable.

But can we assume that all targets we support also support the __thread
storage class? TLS is surely mandatory now: I assume pthread_getspecific
would become non-RT safe without it, right? Is there anything we
can/must check for during configure to verify __thread support?

Jan
