http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55812
Bug #: 55812
Summary: Unnecessary TLS accesses
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: gli...@gcc.gnu.org
Target: x86_64-linux-gnu

Hello,

TLS accesses are expensive, so as much as possible gcc should copy the address to a local variable and use that instead. The following example may not be very good, I am just trying to illustrate the issue.

#include <vector>
thread_local std::vector<int> v;
int main(){
  for(long i=0;i<400000000;++i){
    v.push_back(i);
  }
  return v.size();
}

compiled with g++ -std=c++11 -O2 -Wall -DNDEBUG.

If I remove "thread_local", the speed-up is about 20%. It seems to me that the compiler should get the address of v once at the beginning of main and use that for the rest of the function, and thus the performance difference should be negligible.

If I add "static" in front of "thread_local", the program fails to link, but my gcc snapshot is a bit old (Nov 20) and I think I've already seen that reported.

I was surprised not to find any compiler option that would disable threads, so I could write thread_local but not pay the price when compiling a single-threaded program. Without -pthread, glibc uses cheap thread-unsafe functions, but I still pay for TLS.
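For illustration, the hoisting requested above (get the address of v once and reuse it) can be written by hand. This is only a sketch of a possible source-level workaround, not what gcc does or is proposed to do. The function names fill_direct and fill_cached are hypothetical and not part of the original test case, and whether the second form is actually faster depends on the compiler version and TLS model, so no timing is claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

thread_local std::vector<int> v;

// Baseline, as in the report: the loop body names the thread_local
// variable directly, so each iteration may go through a TLS access.
std::size_t fill_direct(long n) {
    assert(n >= 0);
    for (long i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
    }
    return v.size();
}

// Hand-written hoisting (hypothetical workaround): bind a local
// reference to v once, then use only that reference in the loop.
// The reference always refers to the calling thread's own copy of v.
std::size_t fill_cached(long n) {
    assert(n >= 0);
    std::vector<int>& local = v;  // v is named once, before the loop
    for (long i = 0; i < n; ++i) {
        local.push_back(static_cast<int>(i));
    }
    return local.size();
}
```

Both functions append to the same per-thread vector and return its size afterwards, so they are interchangeable in the test case; only the way v is reached inside the loop differs. Built with g++ -std=c++11 -O2, the two can be timed against each other to see how much of the reported gap the manual version recovers.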