[Bug middle-end/26461] liveness of thread local references across function calls

2017-09-04 Thread cbcode at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461

cbcode at gmail dot com changed:

   What|Removed |Added

 CC||cbcode at gmail dot com

--- Comment #19 from cbcode at gmail dot com ---
As a compromise, I would like to suggest that '__thread volatile' or 'volatile
__thread' always reloads the thread-local storage while __thread without
volatile keeps the current caching behavior.

The C and C++ standards recognize that stack-switching exists and indicate
existing situations where variables need to be volatile-qualified in order to
survive a task-switch, see e.g.
http://en.cppreference.com/w/cpp/utility/program/setjmp .
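The cppreference page cited above summarizes a rule from the C standard: after longjmp, a non-volatile automatic variable that was modified between the setjmp call and the longjmp has an indeterminate value. Below is a minimal sketch of the volatile-qualified case. The function names are hypothetical and only illustrate the rule; they are not part of any proposal in this thread.

```c
#include <setjmp.h>

static jmp_buf env;

/* Hypothetical helper: transfers control back to the setjmp point. */
static void jump_back(void) {
    longjmp(env, 1);
}

/* Returns the value of `count` observed after longjmp.  `count` is
   modified between setjmp and longjmp, so it must be volatile for its
   value after the jump to be determinate. */
int setjmp_volatile_demo(void) {
    volatile int count = 0;
    if (setjmp(env) == 0) {
        count = 3;    /* modified after setjmp ...              */
        jump_back();  /* ... then control re-enters setjmp      */
    }
    return count;     /* 3, because count is volatile-qualified */
}
```

Without `volatile`, the standard leaves the value read after the jump indeterminate. This is the analogy the comment draws for `__thread volatile`.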

[Bug middle-end/26461] liveness of thread local references across function calls

2017-07-13 Thread stephan at tobies dot info
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461

--- Comment #18 from Stephan Tobies  ---
+1 - I have a use case where a QuickThread is migrated from one pthread to
another. TLS would need to be re-fetched after this migration, but is not due
to CSE optimizations. Having a way to disable this optimization, either locally
or on a per-file basis would be very useful!

[Bug middle-end/26461] liveness of thread local references across function calls

2017-03-31 Thread tmyklebu at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461

Tor Myklebust  changed:

   What|Removed |Added

 CC||tmyklebu at gmail dot com

--- Comment #17 from Tor Myklebust  ---
(In reply to Jakub Jelinek from comment #14)
> Even if we have an option that avoids CSE of TLS addresses across function
> calls (or attribute for specific function), what would you expect to happen
> when user takes address of TLS variables himself:
> __thread int a;
> void
> foo ()
> {
>   int *p = &a;
>   *p = 10;
>   bar (); // Changes threads
>   *p += 10;
> }
> ?  The address can be stored anywhere, so the compiler can't do anything
> with it.  And of course such an option would cause major slowdown of
> anything using TLS, not only it would need to stop CSEing TLS addresses
> late, but stop treating TLS addresses as constant in all early optimizations
> as well.

When you take &a, gcc docs specify that you get the address of the running
thread's instance of a, which is a reasonable pointer for any thread to use as
long as the running thread is alive.  So everyone already expects that code
like this:

#include <pthread.h>
#include <stdio.h>

__thread int a;

void *bar(void *p) { printf("%i %i\n", *(int *)p, a); return 0; }
int main() {
  a = 42;
  pthread_t pth;
  pthread_create(&pth, 0, bar, &a);
  pthread_join(pth, 0);
}

should print "42 0" as p should point to the main thread's instance of a while
the reference of a in the third argument to printf in bar should reference the
child thread's instance of a, which is zero because TLS is initialised to zero.

It seems that your example:

__thread int a;

void foo() {
  int *p = &a;
  *p = 10;
  bar (); // Changes threads
  *p += 10;
}

must twice modify the instance of a in the thread that started running foo,
which is different behaviour from:

__thread int a;

void baz() {
  int *p = &a;
  *p = 10;
  bar (); // Changes threads
  p = &a;
  *p += 10;
}

which must modify the instance of a in the thread that started running baz()
once, and the instance of a in the thread that finishes running baz() once,
since bar may change the value at %fs:0 by changing threads.

Perhaps there is a more serious problem with this whole idea if signal handlers
are permitted to twiddle the running thread.

[Bug middle-end/26461] liveness of thread local references across function calls

2017-03-31 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461

--- Comment #16 from Richard Biener  ---
Implementing a switch that assumes that function calls (what about async
events??) can switch threads, to the effect that the location of TLS variables
changes, would be a challenge.  Basically you have to implement something that
prevents assumptions about a variable's location inside a function, not only its
value.  That's currently done nowhere and I don't know how to model such a kind
of dependency.

So I don't think it's easy to model.  It might be possible to put in place
more strict constraints to avoid the issue, like instructing the compiler
that it can't ever take the address of a __tls object.  But then array
accesses are modeled as the object's address plus an index, so I can't see how
this would work reliably.

You'd have to expose __tls'ness by lowering it very early, not only during
RTL expansion.  That is, place TLS base register loads and do accesses via them,
then make sure calls clobber that base register load.  So put all TLS data into
something like a static frame, with a global variable pointing to it.  I expect
this to be not-fun(TM) for performance.
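Richard's suggestion (route every TLS access through a per-thread base pointer that calls are assumed to clobber) can be approximated by hand in user code. The sketch below is an illustration under stated assumptions, not a GCC feature. `struct tls_frame`, `tls_frame_get()` and `maybe_switch_threads()` are hypothetical names. The re-fetch relies on calling through a `volatile` function pointer, which obliges the compiler to reload the pointer and make an opaque call at every use instead of reusing a base address computed before the call.

```c
/* Hypothetical "TLS frame": all per-thread data of one translation
   unit gathered into a single struct. */
struct tls_frame {
    int counter;
    int array[4];
};

static __thread struct tls_frame frame;

/* Computes the address of the running kernel thread's frame. */
static struct tls_frame *tls_frame_impl(void) {
    return &frame;
}

/* Volatile function pointer: every call must re-read the pointer and
   invoke an unknown callee, so the result cannot be reused across
   calls the way a cached TLS address can. */
static struct tls_frame *(*volatile tls_frame_get)(void) = tls_frame_impl;

/* Hypothetical stand-in for a call that may resume the current user
   thread on a different kernel thread. */
static void maybe_switch_threads(void) { }

void bump_counter_twice(void) {
    tls_frame_get()->counter += 1;
    maybe_switch_threads();
    /* Re-derive the base after the call instead of reusing the old one. */
    tls_frame_get()->counter += 1;
}

int read_counter(void) {
    return tls_frame_get()->counter;
}
```

Each base fetch then costs an indirect call, which is the kind of overhead Richard expects. The address of `frame` is only computed inside `tls_frame_impl()`, on whichever kernel thread is running at that moment.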

[Bug middle-end/26461] liveness of thread local references across function calls

2017-03-31 Thread torvald at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461

--- Comment #15 from torvald at gcc dot gnu.org ---
From an ISO C/C++ conformance perspective, this is not a bug:

* Pre-C11/C++11, threading was basically undefined.

* Starting with C11 and C++11, "threads of execution" (as they're called in
C++11 and more recent, often abbreviated in the standard as "threads") are the
lowest level of how execution is modelled.  They are defined as single flows of
control.  Nothing is promised about any resources that may be used to implement
threads of execution (e.g., similar to the "execution context" notion mentioned
in comment #10).

* Thread-specific state is bound to a particular thread of execution (e.g.,
regarding lifetime).  A thread of execution accessing a __thread variable
accesses the thread-specific state of this thread of execution in the abstract
machine.  (Of course, one can still access other threads' thread-specific
state through pointers initially provided by those threads.)

* Only the standards' mechanisms can create threads of execution.  There is
nothing in these standards that would break up the concept of a single flow of
control (ie, that what "looks" like a single flow of control in a program is
actually not always the same thread of execution when executed).  (Also note
that fork/join parallelism is not a counter-example to this.)


That said, I can see that this doesn't match nicely with the fact that we have
things like swapcontext elsewhere.  Do we have any data on what the performance
impact would be if the compiler assumed that calls to functions it cannot
analyze could switch the thread of execution?  Data for several architectures
and different TLS models would be helpful.

Coming back to C++, currently I think there is only one Technical Specification
(TS) that allows breaking up a single flow of control: .then() in the
Concurrency TS (whose specification is certainly not ready for the standard). 
Maybe the Networking TS has something similar, but I can't remember right now. 
There are a few proposals that either allow it (Task Blocks, targeting
Parallelism TS version 2) or require it for good performance ("stackful"
coroutines).
The "stackless" coroutines in the upcoming Coroutines TS are not really
affected by thread-specific state; it's not the coroutines code that would
potentially switch threads, but any runtime that would supply a particular
Awaitable implementation that then may switch threads (e.g., if using .then()).
The Coroutines TS does not specify any actual runtime.

However, I don't think the existence of some C++ proposals that may switch
threads necessarily means that the compiler would have to take this into
account when those proposals should become part of the standard.  The other way
this could play out is that the standard simply makes using thread-specific
state undefined for those threads of execution that can use these proposals.

Overall, I think it may be useful to experiment with a command line switch or
something like that, primarily to assess how large the performance degradation
would be if the compiler assumed that threads can switch on function calls.

(In reply to Jakub Jelinek from comment #14)
> Even if we have an option that avoids CSE of TLS addresses across function
> calls (or attribute for specific function), what would you expect to happen
> when user takes address of TLS variables himself:
> __thread int a;
> void
> foo ()
> {
>   int *p = &a;
>   *p = 10;
>   bar (); // Changes threads
>   *p += 10;
> }

I think that p would point to the initial thread's TLS, even after bar().   The
user would be wrong to assume that it still is the initial thread's object "a"
after having called bar().

[Bug middle-end/26461] liveness of thread local references across function calls

2017-03-31 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||torvald at gcc dot gnu.org

--- Comment #14 from Jakub Jelinek  ---
Even if we have an option that avoids CSE of TLS addresses across function
calls (or attribute for specific function), what would you expect to happen
when user takes address of TLS variables himself:
__thread int a;
void
foo ()
{
  int *p = &a;
  *p = 10;
  bar (); // Changes threads
  *p += 10;
}
?  The address can be stored anywhere, so the compiler can't do anything with
it.  And of course such an option would cause major slowdown of anything using
TLS, not only it would need to stop CSEing TLS addresses late, but stop
treating TLS addresses as constant in all early optimizations as well.

[Bug middle-end/26461] liveness of thread local references across function calls

2017-03-30 Thread ouster at cs dot stanford.edu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461

John Ousterhout  changed:

   What|Removed |Added

 CC||ouster at cs dot stanford.edu

--- Comment #13 from John Ousterhout  ---
Kernel threads are great, and it may seem like there's no need for user-level
threads now that kernel threads are universally available. But layering
user-level threads on top of kernel threads can offer a speedup of at least 10x
for common operations. The fact that so many different people have responded on
this issue and issue 26461 is pretty good evidence that it can be useful to do
"context switching" on top of kernel threads. My research group has recently
run up against this same problem. Thread-local variables (i.e.
kernel-thread-locals) are still useful in this environment (for example, we use
one to keep track of the user thread that's loaded on the current kernel
thread).

One of the great things about gcc is that it has supported a huge variety of
applications and design styles; it would be a shame for gcc to preclude this
particular class of applications.

Is it unreasonably difficult to add a mechanism to force gcc to flush cached
thread-local addresses? Those of us using the mechanism would be happy to pay a
small performance penalty for it, but presumably that won't affect applications
that don't use the mechanism.

Please reconsider?

[Bug middle-end/26461] liveness of thread local references across function calls

2016-04-01 Thread andy at miniciv dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461

Andy Robbins  changed:

   What|Removed |Added

 CC||andy at miniciv dot com

--- Comment #12 from Andy Robbins  ---
Cross posting to help others who need this feature. From a similar ticket on
LLVM, about adding the /GT flag (which fixes the OP's problem, while being
optional, and MSVC supports this):


[...] The option is /GT as specified in the title, and it is not enabled by
default.

There's one particular use case where this kind of option is really important:
a fiber-based job system, something that has been used in video game
development for multi-core machines.

In a system like this, it's common for one job (occupying a fiber) to be paused
(ie: swapped for another fiber in the thread it is running) while it waits for
some other work to finish, and then be resumed (ie: swapped to) from the next
available worker thread, which will be essentially a random worker thread. The
whole point here is to distribute jobs to all available CPU cores evenly and
automatically, so this TLS situation is inevitable and by design.

Yes, TLS is slower in this use case, but it is the correct behaviour. Not
having the /GT flag means having to manually inspect all code and roll a custom
replacement TLS, which is a considerable effort.

Please reconsider having this option.


Reference: https://llvm.org/bugs/show_bug.cgi?id=19177

[Bug middle-end/26461] liveness of thread local references across function calls

2016-03-08 Thread gpderetta at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461

--- Comment #11 from Giovanni Deretta  ---
In the last few years it has been clear that threads are not enough and
coroutines have seen a resurgence in many languages. Go, which is directly
supported by GCC, makes them a first-class construct; Boost has had them for a
while (and is affected by this issue); the HPX library uses them to great
effect for HPC work; it even seems that the C++ standard will eventually include
them officially [1].

Any chance this resolution will be reviewed? 

Note that an opt-in flag would be enough, and it should have very little effect
on x86, where the compiler doesn't bother to cache TLS address computations, as
it has fast TLS access, unless the address is explicitly taken.

[1] If MS gets its way, stackless coroutines shouldn't be affected by this
issue, as the switch point can be statically identified by the compiler. But
there is still a chance that we will get proper, non-crippled stackful
coroutines.

[Bug middle-end/26461] liveness of thread local references across function calls

2010-08-23 Thread dwood at sybase dot com


--- Comment #8 from dwood at sybase dot com  2010-08-23 21:13 ---
I believe this is a bug or a serious oversight in understanding the need for
support of USER thread local storage.  There are two kinds of software
threads: a) kernel threads (AKA LWPs on Solaris) scheduled by the OS; and b)
user threads scheduled by the user and/or threading library.  Databases such as
Informix and Sybase both manage their own user threads (one per client
connection).  These run on a small pool of engines which are either kernel
threads or processes (no more than one per CPU core).  The user threads do
cooperative scheduling by calling their own yield implementation, and they never
yield in a critical section.  These products, and perhaps others, are not little
weird cases.

These user threads can migrate between kernel threads as load balancing occurs.
Of course, this requires that the implementation of the user level thread
context switch must save/restore USER TLS variables from/to the TLS areas of
the kernel threads involved.  If the model where M user threads are handled
across N kernel threads is valid then the address of USER TLS variables can
change across a context switch.

We, Sybase, have run into the same problem on Solaris-Sparc and are evaluating
whether gcc __thread on x86_64 will have the same problem.

Of course, the database folks often hear from the OS folks the common reply
like: Why are you doing this?  Just trust the OS and leave it all to us.  But
for us we have to trust AIX, Linux, Solaris, HP-UX, Windows, etc.  This is the
same whether we are talking about using USER threads or the dreaded Linux
O_DIRECT debate.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461



[Bug middle-end/26461] liveness of thread local references across function calls

2010-08-23 Thread pinskia at gcc dot gnu dot org


--- Comment #9 from pinskia at gcc dot gnu dot org  2010-08-23 21:19 ---
I think you should read:
http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/C99-Thread_002dLocal-Edits.html#C99-Thread_002dLocal-Edits

--- CUT ---
In GCC's case, threads can only be created via pthreads.

Note C++0x defines a threading interface and such.  I know of no implementation
that will allow the use of user threads really.  Really I think it is wrong
to even think about using user threads any more.  The main reason why they
existed was to support OS's which don't have threads but those don't really
exist any more.  Not to mention, support for things like TLS is only for
OS-provided threads.


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||WONTFIX


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461



[Bug middle-end/26461] liveness of thread local references across function calls

2010-08-23 Thread dwood at sybase dot com


--- Comment #10 from dwood at sybase dot com  2010-08-24 03:31 ---
(In reply to comment #9)
I don't disagree with the thread-local writeup; however, it is lacking in
clarity.  A flow of control must be associated with an execution context.
The existence of getcontext/setcontext allows both:

1) One flow of control to switch to another flow of control within a single
   execution context;
2) and, a flow of control to move from one execution context to another.

In fact, __thread variables are not actually bound to a flow of control but to
a specific execution context, part of which is usually some kind of thread_t
structure associated with a kernel thread.  If they were bound to a logical
flow of control then we wouldn't even need this discussion.  Nowhere above did
I refer to user threads.

The above should be consistent with your comment that TLS is supported for OS
threads only.  But the problem isn't the OS but the compiler.

When the address-of operator is applied to a thread-local variable, it is
evaluated at run-time and returns the address of the current thread's
instance of that variable.

The question is which OS thread is the current OS thread when the address is
obtained to actually fetch the current thread-local value.  setcontext() avoids
changing %fs on Intel and %g7 on Sparc, as it is really restoring a suspended
flow of control context onto a kernel thread's context as referenced by
%fs/%g7.  Caching these across function calls in an MT program is faulty.

Just as users of compilers shouldn't depend on implementation-defined behavior,
neither should compiler writers assume OS implementation-defined behavior.  OS
support for threads/setcontext existed prior to C's decision to provide basic
efficient access to TLS variables.  If some OS implementations of setcontext
allow the context to be pushed onto a different kernel thread than the one it
was obtained from, then caching the current kernel thread handle or TLS address
across function calls is wrong.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461



[Bug middle-end/26461] liveness of thread local references across function calls

2009-03-19 Thread gpderetta at gmail dot com


--- Comment #7 from gpderetta at gmail dot com  2009-03-19 12:14 ---
Hi, I'm the author of Boost.Coroutine (not yet part of boost, but one day...).

I have the exact same problem: gcc caches the address of TLS variables across
function calls which breaks when coroutines move from one thread to another.

Note that in my case I'm definitely *not* reinventing threads in user space.
Coroutines are for different use cases than threads (i.e. when you do not need
preemption but simply a way to organize event driven code). One use of
boost.coroutine is on top of boost.asio.

POSIX has both threads and swapcontext, and nowhere does it say that swapcontext
can't be used in threaded applications. In fact it simply states that the saved
context is restored after a call to setcontext, and IMHO any POSIX-compatible
compiler should support this.
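For readers less familiar with the API under discussion, here is a minimal sketch of a coroutine built on getcontext/makecontext/swapcontext. It assumes glibc on Linux: POSIX.1-2008 removed these functions, but glibc still provides them. The function names and the 64 KiB stack are arbitrary choices for illustration. Everything here stays on one kernel thread. The problem in this report only appears when a saved context is resumed by a different kernel thread.

```c
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];   /* arbitrary stack size for the sketch */
static int trace[8];
static int trace_len;

static void record(int step) { trace[trace_len++] = step; }

/* Coroutine body: runs on co_stack, yields once, then finishes. */
static void co_body(void) {
    record(2);
    swapcontext(&co_ctx, &main_ctx);   /* yield back to the caller */
    record(4);
    /* Returning resumes co_ctx.uc_link, i.e. main_ctx. */
}

/* Runs the coroutine to completion; returns the number of recorded steps. */
int run_coroutine_demo(void) {
    if (getcontext(&co_ctx) == -1)
        return -1;
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, co_body, 0);

    record(1);
    swapcontext(&main_ctx, &co_ctx);   /* start the coroutine        */
    record(3);
    swapcontext(&main_ctx, &co_ctx);   /* resume it after its yield  */
    record(5);
    return trace_len;
}
```

A single call to `run_coroutine_demo()` records the steps 1, 2, 3, 4, 5 in that order, showing control passing back and forth at each `swapcontext`.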

FWIW the Microsoft C++ compiler has the /GT (fiber-safe TLS) flag to prevent
exactly this kind of optimization. GCC should probably support something like
that too.

See: 
http://www.crystalclearsoftware.com/soc/coroutine/coroutinecoroutine_thread.html

for details.

Finally I see the problem even with plain pointers and references, not only
arrays, at least with gcc4.3:

#include <ucontext.h>

void bar(int&);

__thread int x = 0;

void foo(ucontext_t& oucp, ucontext_t& ucp) {
  bar(x);
  swapcontext(&oucp, &ucp);
  bar(x);
}

Compiles down to this (with -O3, on x86_64):

_Z3fooR8ucontextS0_:
        movq    %fs:0, %rax
        movq    %rbp, -16(%rsp)
        movq    %rbx, -24(%rsp)
        movq    %r12, -8(%rsp)
        movq    %rsi, %rbx
        subq    $24, %rsp
        movq    %rdi, %r12
        leaq    x@tpoff(%rax), %rbp
        movq    %rbp, %rdi
        call    _Z3barRi
        movq    %r12, %rdi
        movq    %rbx, %rsi
        call    swapcontext
        movq    %rbp, %rdi
        movq    (%rsp), %rbx
        movq    8(%rsp), %rbp
        movq    16(%rsp), %r12
        addq    $24, %rsp
        jmp     _Z3barRi


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461



[Bug middle-end/26461] liveness of thread local references across function calls

2006-02-24 Thread pinskia at gcc dot gnu dot org


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
  Component|c   |middle-end


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461



[Bug middle-end/26461] liveness of thread local references across function calls

2006-02-24 Thread yichen dot xie at gmail dot com


--- Comment #2 from yichen dot xie at gmail dot com  2006-02-24 22:12 
---
(In reply to comment #1)
> It seems like you are trying to
> deal with your own threading system instead of allowing the OS do its work.

This is indeed what I am trying to do, and C seems to be the perfect language
for doing this. I agree it's not common to be switching thread contexts across
function calls, but I don't think it should be prohibited by GCC.

In my case, h simply saves its context, puts it on the ready queue, and waits
for another thread to pick it up and resume execution with the new thread-local
copy of array. So the question is: is there a way to force recomputation of
array[0] after h? Is it reasonable to request a mechanism to force
recomputation of array[0]?

BTW, the solution IMO is simple: either make sure all thread local values and
addresses (the problem seems to exist only with arrays, the compiler is more
conservative dealing with pointers, etc) are dead after a function call, or add
a mechanism (__attribute__((thread_switch))?) to force it.


-- 

yichen dot xie at gmail dot com changed:

   What|Removed |Added

 CC||yichen dot xie at gmail dot
   ||com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461



[Bug middle-end/26461] liveness of thread local references across function calls

2006-02-24 Thread pinskia at gcc dot gnu dot org


--- Comment #3 from pinskia at gcc dot gnu dot org  2006-02-24 22:38 ---
(In reply to comment #2)
> (In reply to comment #1)
> > It seems like you are trying to
> > deal with your own threading system instead of allowing the OS do its work.
>
> This is indeed what I am trying to do, and C seems to be the perfect language
> for doing this. I agree it's not common to be switching thread contexts across
> function calls, but I don't think it should be prohibited by GCC.

Why not let the OS do its job?  I still don't understand that idea.
Actually, no, it is not reasonable in general, since GCC assumes the address is
invariant, which is correct except for your little weird case.  What function
are you using to save/restore the context?  There is no standard C function
which allows for that.  Even get/setcontext are POSIX, but I doubt they support
being used across threads correctly anyway.  I know setjmp/longjmp don't, for sure.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461



[Bug middle-end/26461] liveness of thread local references across function calls

2006-02-24 Thread yichen dot xie at gmail dot com


--- Comment #4 from yichen dot xie at gmail dot com  2006-02-24 23:06 
---
> Why not let the OS do its job?  I still don't understand that idea.

It's a thread library that builds on top of pthreads, so yes, the OS is doing
its job, and we're doing more on top of that. C is a natural choice for us.
Does it help if we rename h to reschedule?

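__thread int array[1];

void reschedule(void);  // context switch; may resume on another pthread

void worker(void) {     // hypothetical name for the user thread's body
  for (;;) {
    // do something with array
    reschedule();
  }
}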

> Actually no it is not reasonable in general since GCC assumes the address is
> invariant which is correct except for your little weird case.  What function

Well, it may be a bit weird for any other language, but not C (IMO). It's
definitely not weird if you compare it to the kernel, which is largely written
in C. 

Thread local objects are invariant within a thread, not across threads. I think
it could be dangerous for gcc to assume that function calls preserve thread
context, esp. when the function is written in assembly. At least there should
be a way to tell the compiler not to assume that, given C is a low-level
language where everything should be possible.

> are you using to save/restore the context?  There is no standard C function

Very simple assembly code that stores/restores a few registers, including %esp.
C is a low level language, and it should interoperate well not only with
standard C functions, but also with assembly or any other weird functions.
That's what C is good for, isn't it?

> which allows for that.  Even get/setcontext are POSIX but I doubt they support
> across threads correctly anyways.  I know setjmp/longjmp don't for sure.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461



[Bug middle-end/26461] liveness of thread local references across function calls

2006-02-24 Thread pinskia at gcc dot gnu dot org


--- Comment #5 from pinskia at gcc dot gnu dot org  2006-02-25 00:02 ---
ISO C is not your normal low level language any more.  It actually tries to be
a high level language.

So this is not a bug.


-- 

pinskia at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution||INVALID


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461



[Bug middle-end/26461] liveness of thread local references across function calls

2006-02-24 Thread yichen dot xie at gmail dot com


--- Comment #6 from yichen dot xie at gmail dot com  2006-02-25 01:55 
---
(In reply to comment #5)
> ISO C is not your normal low level language any more.  It actually tries to be
> a high level language.
>
> So this is not a bug.

I still don't think it's a good idea to treat thread-local array addresses as
invariant. If you look at the implementation of getcontext/swapcontext, they
intentionally leave the gs segment register out of the context, leaving open the
possibility that a context saved by one thread is resumed by another. What will
gcc do in this case?

If you don't mind, could you point me to the section of ISO C where it
specifies that function calls must preserve thread contexts? If not, by all
means it's a bug in the optimizer.

Does any one else have an opinion?


-- 

yichen dot xie at gmail dot com changed:

   What|Removed |Added

 Status|RESOLVED|UNCONFIRMED
 Resolution|INVALID |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26461