David McDaniel wrote:
I've heard various people make the claim that using __thread variables is *much* faster than pthread_get/setspecific(). However, when I look at the code emitted as well as the implementation of the get/setspecific API, it appears to me that __thread variables might be somewhat faster in binaries but actually might be slower in shared libraries. Does anyone have any fine-grained experience that would help in determining the tradeoffs involved? Thanks

Well, this is one of those engineering trade-off problems....

__thread variables are indeed faster for binaries.
pthread_get/set are pretty fast, esp. if you don't need
a lot of different keys.  Note that relative speeds depend
on lots of factors such as machine architecture, processor
type, etc, so these are generalizations.

One significant advantage of the POSIX APIs is that the only
space allocated per thread per key is a pointer; this is not
the case for __thread declarations, which take up the full
declared size per thread. Of course, nothing prevents you from
always making __thread declarations simply pointers to
allocated storage.
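
To illustrate that last point, here is a minimal sketch of a __thread
pointer to lazily allocated storage. The struct, its size, and the
function names are made up for the example. Note that __thread is a
compiler extension and, unlike pthread_key_create(), gives you no
destructor hook, so each thread has to free its own storage.

```c
#include <stdlib.h>

/* Hypothetical per-thread payload; name and size are assumptions. */
struct scratch {
        char    buf[4096];
        int     used;
};

/* Only one pointer of thread-local space is declared per thread. */
static __thread struct scratch *tls_scratch;

/* Return this thread's scratch area, allocating it on first use. */
struct scratch *
get_scratch(void)
{
        if (tls_scratch == NULL)
                tls_scratch = calloc(1, sizeof (*tls_scratch));
        return (tls_scratch);   /* NULL if the allocation failed */
}

/* Each thread must call this itself before exiting. */
void
release_scratch(void)
{
        free(tls_scratch);
        tls_scratch = NULL;
}
```

Threads that never call get_scratch() pay only for the pointer.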

In the end, a test (evaluating exactly how you're going to use
this feature) is worth 1000 or more expert (or not so expert)
opinions.
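
If you want to run such a test yourself, something along these lines
is a rough starting point. It is only a sketch, not libmicro: the
function names are invented, it uses clock() for CPU time, and it
measures a tight loop with warm caches, which may look nothing like
your real access pattern (in particular it says nothing about the
shared library case).

```c
#include <pthread.h>
#include <time.h>

/* volatile so the compiler really reads the variable on each pass */
static __thread volatile int tls_value;

static double
ns_per_call(clock_t start, clock_t end, long iters)
{
        return ((double)(end - start) / CLOCKS_PER_SEC * 1e9 /
            (double)iters);
}

/* Average cost of pthread_getspecific(); returns -1.0 on setup failure. */
double
bench_getspecific(long iters)
{
        static int slot = 42;
        pthread_key_t k;
        volatile long sink = 0;
        clock_t start, end;

        if (pthread_key_create(&k, NULL) != 0)
                return (-1.0);
        if (pthread_setspecific(k, &slot) != 0) {
                (void) pthread_key_delete(k);
                return (-1.0);
        }
        start = clock();
        for (long i = 0; i < iters; i++)
                sink += *(int *)pthread_getspecific(k);
        end = clock();
        (void) pthread_key_delete(k);
        return (ns_per_call(start, end, iters));
}

/* Average cost of reading a __thread variable. */
double
bench_tls(long iters)
{
        volatile long sink = 0;
        clock_t start, end;

        tls_value = 42;
        start = clock();
        for (long i = 0; i < iters; i++)
                sink += tls_value;
        end = clock();
        return (ns_per_call(start, end, iters));
}
```

Call each function with a large iteration count (tens of millions)
and compare the results; both include loop overhead, so only the
difference between them means much.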

A quick check with libmicro indicates that pthread_getspecific
takes less than 8 nanoseconds on my 2 GHz Opteron, so cache misses,
etc, are far more important here than code path.  Still, if your
code reads like this:

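#include <pthread.h>

/* struct bar is assumed to be defined elsewhere, with an int member foobar */
extern pthread_key_t key;

int
foo(void)
{
        return (((struct bar *)pthread_getspecific(key))->foobar);
}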

you're going to want to consider passing the thread-specific
data explicitly for peak performance.
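
A sketch of what passing the data explicitly could look like: do the
lookup once at the top of a call chain and hand the pointer down. The
definition of struct bar and the helper names (bar_init, foo_explicit,
sum_explicit) are assumptions made up for illustration.

```c
#include <pthread.h>
#include <stdlib.h>

struct bar {
        int     foobar;         /* assumed layout, for illustration only */
};

static pthread_key_t key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void
make_key(void)
{
        /* free() runs on the stored pointer when a thread exits */
        (void) pthread_key_create(&key, free);
}

/* Set up the calling thread's data; call once per thread. 0 on success. */
int
bar_init(int value)
{
        struct bar *b;

        (void) pthread_once(&key_once, make_key);
        if ((b = malloc(sizeof (*b))) == NULL)
                return (-1);
        b->foobar = value;
        if (pthread_setspecific(key, b) != 0) {
                free(b);
                return (-1);
        }
        return (0);
}

/* Original style: every call pays for a lookup. */
int
foo_lookup(void)
{
        return (((struct bar *)pthread_getspecific(key))->foobar);
}

/* Explicit style: the caller supplies the thread's data. */
int
foo_explicit(const struct bar *b)
{
        return (b->foobar);
}

/* One lookup up front, then n calls with no lookups at all. */
long
sum_explicit(int n)
{
        const struct bar *b = pthread_getspecific(key);
        long total = 0;

        for (int i = 0; i < n; i++)
                total += foo_explicit(b);
        return (total);
}
```

The pointer stays valid for the life of the thread, so hoisting the
lookup out of the loop is safe as long as nobody calls
pthread_setspecific() on the key in between.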

- Bart



--
Bart Smaalders                  Solaris Kernel Performance
[EMAIL PROTECTED]               http://blogs.sun.com/barts
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org
