David McDaniel wrote:
I've heard various people make the claim that using __thread variables is *much* faster than pthread_get/setspecific(). However, when I look at the code emitted as well as the implementation of the get/setspecific API, it appears to me that __thread variables might be somewhat faster in binaries but actually might be slower in shared libraries. Does anyone have any fine-grained experience that would help in determining the tradeoffs involved? Thanks
Well, this is one of those engineering trade-off problems...

__thread variables are indeed faster for binaries. pthread_getspecific()/pthread_setspecific() are pretty fast, especially if you don't need a lot of different keys. Note that relative speeds depend on lots of factors such as machine architecture, processor type, etc., so these are generalizations.

One significant advantage of the POSIX APIs is that the only space allocated per thread per key is a pointer; this is not the case for __thread declarations of space. Of course, nothing prevents one from always making __thread declarations simply pointers to allocated storage.

In the end, a test (evaluating exactly how you're going to use this feature) is worth 1000 or more expert (or not so expert) opinions. A quick check with libmicro indicates that pthread_getspecific takes less than 8 nanoseconds on my 2 GHz Opteron, so cache misses, etc., are far more important here than code path. Still, if your code reads like this:

    extern pthread_key_t key;

    int foo()
    {
        return (((struct bar *)pthread_getspecific(key))->foobar);
    }

you're going to want to consider passing the thread-specific data explicitly for peak performance.

- Bart

--
Bart Smaalders    Solaris Kernel Performance
[EMAIL PROTECTED]    http://blogs.sun.com/barts
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
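
For anyone who wants to run their own test, here is a minimal sketch of the three access styles discussed in the reply: a lookup through a POSIX key, a __thread pointer to allocated storage, and passing the data explicitly. The layout of struct bar and the names bar_init, foo_key, foo_tls and foo_arg are hypothetical, made up for illustration. __thread is a compiler extension (GCC and Sun Studio accept it), so treat this as a starting point for a benchmark under those assumptions, not a definitive implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct bar { int foobar; };            /* hypothetical layout, for illustration */

static pthread_key_t key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;
static int key_err;                    /* result of pthread_key_create() */

/* A __thread pointer costs one pointer per thread, like a POSIX key. */
static __thread struct bar *tls_bar;

static void make_key(void)
{
    /* free() runs as the destructor when a thread with a value exits */
    key_err = pthread_key_create(&key, free);
}

/* Allocate the calling thread's struct bar (first call only) and publish
 * it through both the POSIX key and the __thread pointer.
 * Returns NULL on failure. */
struct bar *bar_init(int value)
{
    struct bar *b;

    pthread_once(&key_once, make_key);
    if (key_err != 0)
        return NULL;

    b = pthread_getspecific(key);
    if (b == NULL) {
        b = malloc(sizeof *b);
        if (b == NULL)
            return NULL;
        if (pthread_setspecific(key, b) != 0) {
            free(b);
            return NULL;
        }
    }
    b->foobar = value;
    tls_bar = b;
    return b;
}

/* Style 1: look the data up through the POSIX key on every call. */
int foo_key(void)
{
    struct bar *b = pthread_getspecific(key);

    assert(b != NULL);                 /* bar_init() must have run in this thread */
    return b->foobar;
}

/* Style 2: read the __thread pointer. */
int foo_tls(void)
{
    assert(tls_bar != NULL);           /* bar_init() must have run in this thread */
    return tls_bar->foobar;
}

/* Style 3: the caller passes the thread-specific data explicitly. */
int foo_arg(const struct bar *b)
{
    return b->foobar;
}
```

Each thread calls bar_init() once before using foo_key() or foo_tls(). Timing the three foo_* variants in a loop that matches your real call pattern, with -DNDEBUG so the asserts compile away, should show whether the difference matters next to cache misses on your hardware.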