Re: TLS with static non-PIE binaries

2017-11-09 Thread Philip Guenther
On Thu, Nov 9, 2017 at 12:07 AM, Charles Collicutt wrote:

> On Wed, Nov 08, 2017 at 10:58:09PM -0800, Philip Guenther wrote:
> > @nobits means the section won't have filespace (where the initialization
> > data is) included in the loaded data.  It should work with @progbits
> > instead.
>
> Wow, I suck at assembly. However, once again binutils saved me:
>
> $ readelf -SW test.o | grep .tdata
>   [ 4] .tdata            PROGBITS        0000000000000000 00004c 000004 00 WAT  0   0  4
>

Dang it, binutils keeps breaking all my pat answers!  Totally unfair.



> I'm afraid changing it to @progbits in the source doesn't make any
> difference to the result. It still returns the correct thing when
> dynamically linked but zero when statically linked.
>

More alignment fun.  The problem will go away if your data has an alignment
of 8 or a size that's a multiple of 8.  I need to think about the right way
to represent the necessary bits at some time when I'm not falling asleep.


Philip Guenther


Re: TLS with static non-PIE binaries

2017-11-09 Thread Charles Collicutt
On Wed, Nov 08, 2017 at 10:58:09PM -0800, Philip Guenther wrote:
> @nobits means the section won't have filespace (where the initialization
> data is) included in the loaded data.  It should work with @progbits
> instead.

Wow, I suck at assembly. However, once again binutils saved me:

$ readelf -SW test.o | grep .tdata
  [ 4] .tdata            PROGBITS        0000000000000000 00004c 000004 00 WAT  0   0  4

I'm afraid changing it to @progbits in the source doesn't make any
difference to the result. It still returns the correct thing when
dynamically linked but zero when statically linked.

-- 
Charles



Re: TLS with static non-PIE binaries

2017-11-08 Thread Philip Guenther
On Wed, Nov 8, 2017 at 10:43 PM, Charles Collicutt wrote:

> On Sun, Nov 05, 2017 at 01:02:36PM -0800, Philip Guenther wrote:
> > Well, ld.so and libc _should_ currently support startup-time TLS using the
> > initial-exec and local-exec models.
>
> I can't see support for R_x_TPOFF64 relocations in ld.so(1) so I
> don't think initial-exec will work. But local-exec doesn't require any
> relocations by the dynamic linker so that should work.
>

Hmm, yeah, that makes sense.



> > The diff below fixes that, at least on amd64, by checking whether no
> > AUX_phdr value was found and, if so, trying to instead find them via the
> > ELF header referenced via the linker-provided __executable_start symbol.
>
> The patch fixed the segfaults I was seeing. However, initialized TLS
> data doesn't work in static executables: it appears as zero.
>
> $ cat test.s
> 	.globl	foo
> 	.section	.tdata,"awT",@nobits
>

@nobits means the section won't have filespace (where the initialization
data is) included in the loaded data.  It should work with @progbits
instead.


Philip Guenther


Re: TLS with static non-PIE binaries

2017-11-08 Thread Charles Collicutt
On Sun, Nov 05, 2017 at 01:02:36PM -0800, Philip Guenther wrote:
> Well, ld.so and libc _should_ currently support startup-time TLS using the
> initial-exec and local-exec models.

I can't see support for R_x_TPOFF64 relocations in ld.so(1) so I
don't think initial-exec will work. But local-exec doesn't require any
relocations by the dynamic linker so that should work.


> The diff below fixes that, at least on amd64, by checking whether no 
> AUX_phdr value was found and, if so, trying to instead find them via the 
> ELF header referenced via the linker-provided __executable_start symbol. 

The patch fixed the segfaults I was seeing. However, initialized TLS
data doesn't work in static executables: it appears as zero.

$ cat test.s
	.globl	foo
	.section	.tdata,"awT",@nobits
	.align 4
	.size	foo, 4
foo:
	.int	42
	.text
	.globl	get_foo
	.type	get_foo, @function
get_foo:
	movl	%fs:foo@tpoff, %eax
	ret

$ cc -o test.o -c test.s
$ cc -o test test.o main.c
$ ./test
42
$ cc -static -o test test.o main.c
$ ./test
0

PIE/no-PIE doesn't seem to make any difference to this.
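
For comparison, roughly the same test can be written in C instead of
assembly. This is only a sketch: the file name is made up, main.c is not
shown in the thread, and depending on the target's defaults the compiler
may lower the access to emulated TLS rather than the %fs-relative
local-exec sequence in test.s above.

```c
/* tls_foo.c: hypothetical C counterpart of test.s */

/*
 * Initialized thread-local variable.  A non-zero initializer is what
 * puts the object in .tdata; a zero-initialized one would go to .tbss.
 */
__thread int foo = 42;

/* Return the calling thread's copy of foo. */
int
get_foo(void)
{
	return foo;
}
```

A driver along the lines of main.c would presumably just print the
result of get_foo(), giving 42 when the initialization data is copied
correctly and 0 when it is not.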

I don't actually need initialized TLS data for my use, which is
statically linking Go binaries, so you've fixed my problem: thank you!

I've run the complete Go test-suite with your patch applied and
everything that I'd expect to pass does pass. This is on AMD64 only; I
don't have access to any other architecture for testing.

-- 
Charles



Re: TLS with static non-PIE binaries

2017-11-06 Thread Charles Collicutt
On Sun, Nov 05, 2017 at 04:17:50PM -0800, Philip Guenther wrote:
> BTW, in the .s file in your original message you had this line:
> .type   foo, @object
> 
> Since foo is in an SHF_TLS section, it has to be of type STT_TLS.  

You're right. But...


> Indeed, binutils is silently overriding what you wrote.

Yes, see: http://www.cygwin.com/ml/binutils/2002-11/msg00409.html

That assembly was actually generated by GCC. Clang generates the same.
The reason is that not all assemblers understand @tls_object. For
example, Sun as(1) wants @tls_obj instead. So GCC relies on the fact
that it will be fixed later.

See: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00284.html

Anyway, that was just a simple example to demonstrate the problem. I
don't expect that code to outlive the e-mail. (At least, I hope not...)

-- 
Charles



Re: TLS with static non-PIE binaries

2017-11-05 Thread Philip Guenther
On Mon, 6 Nov 2017, Charles Collicutt wrote:
> On Sun, Nov 05, 2017 at 01:02:36PM -0800, Philip Guenther wrote:
> > The problem with static non-PIE executables is that we don't pass an AUX 
> > vector to such processes, so the startup code can't find the TLS segment 
> > and doesn't leave any space for it.
> 
> Ah, thank you.

BTW, in the .s file in your original message you had this line:
.type   foo, @object

Since foo is in an SHF_TLS section, it has to be of type STT_TLS.  
Indeed, binutils is silently overriding what you wrote.  I suggest you 
either update it to match the reality:
.type   foo, @tls_object
or
.type   foo, STT_TLS

...or just delete the .type declaration entirely.
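
To see which type actually ended up in the object file, the symbol's
st_info byte can be decoded. The sketch below uses the symbol-type
values from the ELF gABI (STT_OBJECT is 1, STT_TLS is 6); the enum and
function names are made up for illustration and are not from any header.

```c
#include <stdint.h>

/* Symbol-type values from the ELF gABI, under local names. */
enum { SYM_STT_OBJECT = 1, SYM_STT_TLS = 6 };

/* The type is the low four bits of st_info; the binding is the high four. */
unsigned
sym_type(uint8_t st_info)
{
	return st_info & 0xf;
}

/* Nonzero if the symbol is typed as thread-local. */
int
sym_is_tls(uint8_t st_info)
{
	return sym_type(st_info) == SYM_STT_TLS;
}
```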


Philip Guenther



Re: TLS with static non-PIE binaries

2017-11-05 Thread Charles Collicutt
On Sun, Nov 05, 2017 at 01:02:36PM -0800, Philip Guenther wrote:
> The problem with static non-PIE executables is that we don't pass an AUX 
> vector to such processes, so the startup code can't find the TLS segment 
> and doesn't leave any space for it.

Ah, thank you.


> The diff below fixes that, at least on amd64 [...]

I will test it. Thanks for your help and explanation.

-- 
Charles



Re: TLS with static non-PIE binaries

2017-11-05 Thread Philip Guenther
On Sun, 5 Nov 2017, Stuart Henderson wrote:
> On 2017/11/05 21:08, Charles Collicutt wrote:
> > Hello,
> > 
> > I have a program that uses Thread-Local Storage (TLS) with the 'Local Exec'
> > access model [1] on AMD64. This looks like it should work on OpenBSD and,
> > indeed, it mostly does.
> 
> OpenBSD doesn't have real thread-local storage yet.

Well, ld.so and libc _should_ currently support startup-time TLS using the 
initial-exec and local-exec models.

The problem with static non-PIE executables is that we don't pass an AUX 
vector to such processes, so the startup code can't find the TLS segment 
and doesn't leave any space for it.  On a variant II arch like amd64, the 
negative offset of a TLS variable results in accessing before the 
beginning of the page the TCB allocation is on, thus the fault.
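
The negative offset can be illustrated with a little arithmetic. This is
a simplified sketch, not the linker's actual code: it assumes the TLS
block ends exactly at the thread pointer and that the segment starts on
an aligned address. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of align (align must be a power of two). */
uint64_t
round_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/*
 * Variant II: the TLS block sits just below the thread pointer, so a
 * symbol at offset sym_off inside the PT_TLS segment gets a negative
 * thread-pointer offset.
 */
int64_t
tpoff_variant2(uint64_t sym_off, uint64_t p_memsz, uint64_t p_align)
{
	return (int64_t)sym_off - (int64_t)round_up(p_memsz, p_align);
}

/* Address touched by an access like %fs:sym@tpoff. */
uintptr_t
tls_addr(uintptr_t tp, int64_t tpoff)
{
	return tp + (uintptr_t)tpoff;
}
```

With a 4-byte, 4-aligned segment the offset is -4, so if the TCB sits at
the start of a page and nothing was reserved below it, the access lands
on the previous page.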

The diff below fixes that, at least on amd64, by checking whether no 
AUX_phdr value was found and, if so, trying to instead find them via the 
ELF header referenced via the linker-provided __executable_start symbol. 
That's the second and third chunks below.
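
The lookup amounts to walking the program headers that the ELF header
points at. A minimal standalone sketch, assuming a 64-bit image: the two
structs are local stand-ins laid out like Elf64_Ehdr and Elf64_Phdr, and
find_tls_phdr() is a hypothetical helper, not a libc function.

```c
#include <stddef.h>
#include <stdint.h>

#define PT_TLS_TYPE 7	/* value of PT_TLS in the ELF gABI */

/* Local stand-ins with the same layout as Elf64_Ehdr / Elf64_Phdr. */
struct ehdr64 {
	unsigned char e_ident[16];
	uint16_t e_type, e_machine;
	uint32_t e_version;
	uint64_t e_entry, e_phoff, e_shoff;
	uint32_t e_flags;
	uint16_t e_ehsize, e_phentsize, e_phnum;
	uint16_t e_shentsize, e_shnum, e_shstrndx;
};

struct phdr64 {
	uint32_t p_type, p_flags;
	uint64_t p_offset, p_vaddr, p_paddr;
	uint64_t p_filesz, p_memsz, p_align;
};

/*
 * Given the address of a mapped ELF header, return its PT_TLS program
 * header, or NULL if the image has none.
 */
const struct phdr64 *
find_tls_phdr(const void *image)
{
	const struct ehdr64 *eh = image;
	const struct phdr64 *ph;
	uint16_t i;

	if (eh == NULL)
		return NULL;
	/* e_phoff is the byte offset of the phdr table from the header */
	ph = (const struct phdr64 *)((const char *)image + eh->e_phoff);
	for (i = 0; i < eh->e_phnum; i++)
		if (ph[i].p_type == PT_TLS_TYPE)
			return &ph[i];
	return NULL;
}
```

This only works when the ELF header and program headers are part of a
loaded segment, which is the assumption behind using
__executable_start in the diff.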

The first and fourth chunks correct the sizing of the allocation when the 
requested alignment was less than the natural alignment of the TIB itself,
resulting in a misaligned TIB.
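
The sizing rule can be checked in isolation. The sketch below is not the
libc code: round_up_pow2() stands in for the ELF_ROUND macro used in the
diff, static_tls_size() is a hypothetical name, and tib_align is passed
in rather than taken from __alignof__(struct tib).

```c
#include <stddef.h>

/* Round x up to a multiple of align (a power of two). */
size_t
round_up_pow2(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Space to reserve for the static TLS block on a variant II arch.
 * Rounding to the larger of the two alignments keeps the TIB that
 * follows the block properly aligned.
 */
size_t
static_tls_size(size_t p_memsz, size_t p_align, size_t tib_align)
{
	size_t align = p_align > tib_align ? p_align : tib_align;

	return round_up_pow2(p_memsz, align);
}
```

With the values from the comment in the patch, p_memsz=24 p_align=16
gives 32 and p_memsz=4 p_align=4 gives 8 when the TIB needs 8-byte
alignment. Rounding to p_align alone would give 4 in the second case.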

The last chunk correctly relocates the pointer to the TLS initialization 
data (the .tdata segment), so that initialized TLS data works in static 
executables.
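
For context, setting up a thread's TLS block from the PT_TLS segment is
conceptually a copy followed by a zero fill. A minimal sketch under that
assumption; init_tls_block() is a hypothetical helper, not the function
libc uses.

```c
#include <stddef.h>
#include <string.h>

/*
 * Fill a fresh TLS block: copy the filesz bytes of initialization data
 * (.tdata), then zero the remainder up to memsz (.tbss).
 */
void
init_tls_block(void *block, const void *init_image, size_t filesz,
    size_t memsz)
{
	if (filesz > memsz)
		filesz = memsz;	/* malformed segment; don't overrun */
	if (filesz != 0)
		memcpy(block, init_image, filesz);
	memset((char *)block + filesz, 0, memsz - filesz);
}
```

If init_image does not point at where the initialization data really
lives, or if filesz is 0, initialized variables do not get their values,
and in the filesz 0 case they all read back as zero.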


This will obviously require some testing...


Philip


Index: dlfcn/init.c
===================================================================
RCS file: /data/src/openbsd/src/lib/libc/dlfcn/init.c,v
retrieving revision 1.5
diff -u -p -r1.5 init.c
--- dlfcn/init.c	6 Sep 2016 18:49:34 -0000	1.5
+++ dlfcn/init.c	5 Nov 2017 21:02:13 -0000
@@ -34,6 +34,8 @@
 
 #include "init.h"
 
+#define MAX(a,b)	(((a)>(b))?(a):(b))
+
 /* XXX should be in an include file shared with csu */
 char	***_csu_finish(char **_argv, char **_envp, void (*_cleanup)(void));
 
@@ -53,8 +55,10 @@ struct dl_phdr_info	_static_phdr_info = 
 
 static inline void early_static_init(char **_argv, char **_envp);
 static inline void setup_static_tib(Elf_Phdr *_phdr, int _phnum);
-#endif /* PIC */
 
+/* provided by the linker */
+extern Elf_Ehdr __executable_start[] __attribute__((weak));
+#endif /* PIC */
 
 /*
  * extract useful bits from the auxiliary vector and either
@@ -99,6 +103,15 @@ _csu_finish(char **argv, char **envp, vo
 	}
 
 #ifndef PIC
+	if (cleanup == NULL && phdr == NULL && __executable_start != NULL) {
+		/*
+		 * Static non-PIE processes don't get an AUX vector,
+		 * so find the phdrs through the ELF header
+		 */
+		phdr = (void *)((char *)__executable_start +
+		    __executable_start->e_phoff);
+		phnum = __executable_start->e_phnum;
+	}
 	/* static libc in a static link? */
 	if (cleanup == NULL)
 		setup_static_tib(phdr, phnum);
@@ -169,13 +182,22 @@ setup_static_tib(Elf_Phdr *phdr, int phn
 #elif TLS_VARIANT == 2
 			/*
 			 * variant 2 places the data before the TIB
-			 * so we need to round up to the alignment
+			 * so we need to round up the size to the
+			 * larger of the TLS data alignment and the
+			 * TIB's alignment.
+			 * Example A: p_memsz=24 p_align=16 align(TIB)=8
+			 * - need to allocate 32 bytes for TLS as compiler
+			 * - will give the first TLS symbol an offset of -32
+			 * Example B: p_memsz=4 p_align=4 align(TIB)=8
+			 * - need to allocate 8 bytes so that the TIB is
+			 * - properly aligned
 			 */
 			_static_tls_size = ELF_ROUND(phdr[i].p_memsz,
-			    phdr[i].p_align);
+			    MAX(__alignof__(struct tib), phdr[i].p_align));
 #endif
 			if (phdr[i].p_vaddr != 0 && phdr[i].p_filesz != 0) {
-				static_tls = (void *)phdr[i].p_vaddr;
+				static_tls = (void *)phdr[i].p_vaddr +
+				    _static_phdr_info.dlpi_addr;
 				static_tls_fsize = phdr[i].p_filesz;
 			}
 			break;



Re: TLS with static non-PIE binaries

2017-11-05 Thread Marc Espie
The only "tls" support we have that's currently working is emulated tls, 
as exemplified by both clang and the gcc 4.9 port.

See /usr/src/lib/libcompiler_rt/emutls.c
for how it works.  It's not incredibly fast, but it does the job...

Fortunately, the gcc and clang people managed to make some kind of
standard, so both emutls flavors are actually compatible.



Re: TLS with static non-PIE binaries

2017-11-05 Thread Charles Collicutt
On Mon, Nov 06, 2017 at 12:58:18AM +0200, Paul Irofti wrote:
> We are missing dlopen-time TLS, dynamic TLS relocations and
> thread storage-specifier support.

Thanks. None of those things are needed for the Local Exec access model though,
so that should be OK.

Do you have any idea what might cause the segfault I saw? If it's expected
behaviour at this stage then that's fine. I just didn't think it looked right.

-- 
Charles



Re: TLS with static non-PIE binaries

2017-11-05 Thread Paul Irofti
On Sun, Nov 05, 2017 at 10:51:41PM +0000, Charles Collicutt wrote:
> On Sun, Nov 05, 2017 at 10:15:57PM +0000, Stuart Henderson wrote:
> > On 2017/11/05 21:08, Charles Collicutt wrote:
> > > I have a program that uses Thread-Local Storage (TLS) with the 'Local Exec'
> > > access model [1] on AMD64. This looks like it should work on OpenBSD and,
> > > indeed, it mostly does.
> > 
> > OpenBSD doesn't have real thread-local storage yet.
> 
> What does 'real' mean here? OpenBSD looks like it has everything needed for
> 'Local Exec' thread-local storage.
> 
> You've got TLS block setup in ld.so(1), _csu_finish, and pthread_create(3).
> You've got the binutils linker, ld(1), which can handle R_x_TPOFF32 relocations.
> You don't need anything else for local exec TLS.
> 
> This is mainly interesting to me because the Go runtime uses local exec TLS to
> store a pointer to the current goroutine. I know people use Go on OpenBSD: if
> you're saying that isn't actually expected to work, that would be good to know.

We are missing dlopen-time TLS, dynamic TLS relocations and
thread storage-specifier support.



Re: TLS with static non-PIE binaries

2017-11-05 Thread Charles Collicutt
On Sun, Nov 05, 2017 at 10:15:57PM +0000, Stuart Henderson wrote:
> On 2017/11/05 21:08, Charles Collicutt wrote:
> > I have a program that uses Thread-Local Storage (TLS) with the 'Local Exec'
> > access model [1] on AMD64. This looks like it should work on OpenBSD and,
> > indeed, it mostly does.
> 
> OpenBSD doesn't have real thread-local storage yet.

What does 'real' mean here? OpenBSD looks like it has everything needed for
'Local Exec' thread-local storage.

You've got TLS block setup in ld.so(1), _csu_finish, and pthread_create(3).
You've got the binutils linker, ld(1), which can handle R_x_TPOFF32 relocations.
You don't need anything else for local exec TLS.

This is mainly interesting to me because the Go runtime uses local exec TLS to
store a pointer to the current goroutine. I know people use Go on OpenBSD: if
you're saying that isn't actually expected to work, that would be good to know.

-- 
Charles



Re: TLS with static non-PIE binaries

2017-11-05 Thread Stuart Henderson
On 2017/11/05 21:08, Charles Collicutt wrote:
> Hello,
> 
> I have a program that uses Thread-Local Storage (TLS) with the 'Local Exec'
> access model [1] on AMD64. This looks like it should work on OpenBSD and,
> indeed, it mostly does.

OpenBSD doesn't have real thread-local storage yet.