Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-17 Thread Uros Bizjak
On Mon, May 12, 2014 at 7:38 PM, Wei Mi w...@google.com wrote:
 Here is a patch for the test. It contains two changes:
 1. For emutls, there will be an explicit call generated at expand
 pass, and no stack adjustment is needed. So add /* {
 dg-require-effective-target tls_native } */ in the test.
 2. Replace cfi_def_cfa_offset with insn sequence check.

 Is it ok?

 No, the test FAILs for 32-bit i386-pc-solaris2.11 with Sun as/ld:

 FAIL: gcc.target/i386/pr58066.c scan-assembler 
 sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr.*sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr

 The TLS code sequence is different here:

 subl$8, %esp
 lealccc1@tlsgd(,%ebx,1), %eax
 callccc1@tlsgdplt

 I fear this insn scanning is going to be extremely fragile.

 Rainer

 Thanks for trying the testcase. rtl scanning will be slightly better
 than assembly scanning. So how about this one?

This is OK, with a small effective-target addition, as shown below.

Thanks,
Uros.


 Index: testsuite/gcc.target/i386/pr58066.c
 ===
 --- testsuite/gcc.target/i386/pr58066.c (revision 210222)
 +++ testsuite/gcc.target/i386/pr58066.c (working copy)
 @@ -1,5 +1,6 @@
  /* { dg-do compile } */
 -/* { dg-options -fPIC -O2 } */
 +/* { dg-require-effective-target tls_native } */

Please also add

/* { dg-require-effective-target fpic } */

 +/* { dg-options -fPIC -fomit-frame-pointer -O2 -fdump-rtl-final } */

  /* Check whether the stack frame starting addresses of tls expanded calls
 in foo and goo are 16bytes aligned.  */
 @@ -15,4 +16,6 @@ void* goo()
   return ccc2;
  }

 -/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */
 +/* { dg-final { scan-rtl-dump Function
 foo.*set\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*plus\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*const_int
 -8.*UNSPEC_TLS.*Function goo final } } */
 +/* { dg-final { scan-rtl-dump Function
 goo.*set\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*plus\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*const_int
 -8.*UNSPEC_TLS final } } */
 +/* { dg-final { cleanup-rtl-dump final } } */


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-15 Thread Rainer Orth
Wei Mi w...@google.com writes:

 Can I checkin this testcase fix?

I think this is for Uros to approve.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-14 Thread Wei Mi
Can I checkin this testcase fix?

Thanks,
Wei.


On Tue, May 13, 2014 at 1:39 AM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 Wei Mi w...@google.com writes:

 Thanks for trying the testcase. rtl scanning will be slightly better
 than assembly scanning. So how about this one?

 This one works fine for me.

 Thanks.
 Rainer

 --
 -
 Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-13 Thread Rainer Orth
Wei Mi w...@google.com writes:

 Thanks for trying the testcase. rtl scanning will be slightly better
 than assembly scanning. So how about this one?

This one works fine for me.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-12 Thread Rainer Orth
Hi Wei,

please teach your mailer not to break/mangle long lines.  Thanks.

 Here is a patch for the test. It contains two changes:
 1. For emutls, there will be an explicit call generated at expand
 pass, and no stack adjustment is needed. So add /* {
 dg-require-effective-target tls_native } */ in the test.
 2. Replace cfi_def_cfa_offset with insn sequence check.

 Is it ok?

No, the test FAILs for 32-bit i386-pc-solaris2.11 with Sun as/ld:

FAIL: gcc.target/i386/pr58066.c scan-assembler 
sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr.*sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr

The TLS code sequence is different here:

subl$8, %esp
lealccc1@tlsgd(,%ebx,1), %eax
callccc1@tlsgdplt

I fear this insn scanning is going to be extremely fragile.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-12 Thread Wei Mi
 Here is a patch for the test. It contains two changes:
 1. For emutls, there will be an explicit call generated at expand
 pass, and no stack adjustment is needed. So add /* {
 dg-require-effective-target tls_native } */ in the test.
 2. Replace cfi_def_cfa_offset with insn sequence check.

 Is it ok?

 No, the test FAILs for 32-bit i386-pc-solaris2.11 with Sun as/ld:

 FAIL: gcc.target/i386/pr58066.c scan-assembler 
 sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr.*sub[^\r\n]*8[^\r\n]*sp.*call[^\r\n]*__tls_get_addr

 The TLS code sequence is different here:

 subl$8, %esp
 lealccc1@tlsgd(,%ebx,1), %eax
 callccc1@tlsgdplt

 I fear this insn scanning is going to be extremely fragile.

 Rainer

Thanks for trying the testcase. rtl scanning will be slightly better
than assembly scanning. So how about this one?

Thanks,
Wei.

Index: testsuite/gcc.target/i386/pr58066.c
===
--- testsuite/gcc.target/i386/pr58066.c (revision 210222)
+++ testsuite/gcc.target/i386/pr58066.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options -fPIC -O2 } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-options -fPIC -fomit-frame-pointer -O2 -fdump-rtl-final } */

 /* Check whether the stack frame starting addresses of tls expanded calls
in foo and goo are 16bytes aligned.  */
@@ -15,4 +16,6 @@ void* goo()
  return ccc2;
 }

-/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */
+/* { dg-final { scan-rtl-dump Function
foo.*set\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*plus\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*const_int
-8.*UNSPEC_TLS.*Function goo final } } */
+/* { dg-final { scan-rtl-dump Function
goo.*set\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*plus\[^\r\n\]*sp\\)\[\r\n\]\[^\r\n\]*const_int
-8.*UNSPEC_TLS final } } */
+/* { dg-final { cleanup-rtl-dump final } } */


Re: Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-10 Thread Dominique Dhumieres
 This is the updated patch of pr58066-3.patch. ...

On x86_64-apple-darwin13 I get

FAIL: gcc.target/i386/pr58066.c scan-assembler-times .cfi_def_cfa_offset 16 2

Dominique


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-10 Thread Rainer Orth
domi...@lps.ens.fr (Dominique Dhumieres) writes:

 This is the updated patch of pr58066-3.patch. ...

 On x86_64-apple-darwin13 I get

 FAIL: gcc.target/i386/pr58066.c scan-assembler-times .cfi_def_cfa_offset 16 2

Same on i386-pc-solaris2.* with Sun as (which doesn't support cfi
directives).

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-10 Thread Wei Mi
Here is a patch for the test. It contains two changes:
1. For emutls, there will be an explicit call generated at expand
pass, and no stack adjustment is needed. So add /* {
dg-require-effective-target tls_native } */ in the test.
2. Replace cfi_def_cfa_offset with insn sequence check.

Is it ok?

Thanks,
Wei.

Index: testsuite/gcc.target/i386/pr58066.c
===
--- testsuite/gcc.target/i386/pr58066.c (revision 210301)
+++ testsuite/gcc.target/i386/pr58066.c (working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options -fPIC -O2 } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-options -fPIC -fomit-frame-pointer -O2 } */

 /* Check whether the stack frame starting addresses of tls expanded calls
in foo and goo are 16bytes aligned.  */
@@ -15,4 +16,4 @@ void* goo()
  return ccc2;
 }

-/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */
+/* { dg-final { scan-assembler
sub\[^\r\n\]*8\[^\r\n\]*sp.*call\[^\r\n\]*__tls_get_addr.*sub\[^\r\n\]*8\[^\r\n\]*sp.*call\[^\r\n\]*__tls_get_addr
} } */

On Sat, May 10, 2014 at 6:47 AM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 domi...@lps.ens.fr (Dominique Dhumieres) writes:

 This is the updated patch of pr58066-3.patch. ...

 On x86_64-apple-darwin13 I get

 FAIL: gcc.target/i386/pr58066.c scan-assembler-times .cfi_def_cfa_offset 16 2

 Same on i386-pc-solaris2.* with Sun as (which doesn't support cfi
 directives).

 Rainer

 --
 -
 Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-08 Thread Uros Bizjak
On Thu, May 8, 2014 at 12:59 AM, Wei Mi w...@google.com wrote:

 The calls added in the templates of tls_local_dynamic_base_32 and
 tls_global_dynamic_32 in pr58066-3.patch are used to prevent sched2
 from moving sp setting across implicit tls calls, but those calls make
 the combine of UNSPEC_TLS_LD_BASE and UNSPEC_DTPOFF difficult, so that
 the optimization in tls_local_dynamic_32_once to convert local_dynamic
 to global_dynamic mode for single tls reference cannot take effect. In
 the updated patch, I remove those calls from insn templates and add
 reg:SI SP_REG explicitly in the templates of UNSPEC_TLS_GD and
 UNSPEC_TLS_LD_BASE. It solves the sched2 and combine problems above,
 and now the optimization in tls_local_dynamic_32_once works.

 bootstrapped ok on x86_64-linux-gnu. regression is going on. Is it OK
 if regression passes?

Please update ChangeLog with all changes, see below:

 ChangeLog:

 gcc/
 2014-05-07  Wei Mi  w...@google.com

 * config/i386/i386.c (ix86_compute_frame_layout):
 preferred_stack_boundary updated for tls expanded call.

(...): Update preferred_stack_boundary for call, expanded from tls descriptor.

 * config/i386/i386.md: Set ix86_tls_descriptor_calls_expanded_in_cfun.

* config/i386/i386.md (*tls_global_dynamic_32_gnu): Depend on SP register.
(*tls_local_dynamic_base_32_gnu): Ditto.
...
(tls_global_dynamic_32): Set
ix86_tls_descriptor_calls_expanded_in_cfun.  Update RTX to depend on
SP register.
(tls_local_dynamic_base_32): Ditto.
...

The patch is OK for mainline with updated and complete ChangeLog entry.

Thanks,
Uros.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-07 Thread Wei Mi
This is the updated patch of pr58066-3.patch.

The calls added in the templates of tls_local_dynamic_base_32 and
tls_global_dynamic_32 in pr58066-3.patch are used to prevent sched2
from moving sp setting across implicit tls calls, but those calls make
the combine of UNSPEC_TLS_LD_BASE and UNSPEC_DTPOFF difficult, so that
the optimization in tls_local_dynamic_32_once to convert local_dynamic
to global_dynamic mode for single tls reference cannot take effect. In
the updated patch, I remove those calls from insn templates and add
reg:SI SP_REG explicitly in the templates of UNSPEC_TLS_GD and
UNSPEC_TLS_LD_BASE. It solves the sched2 and combine problems above,
and now the optimization in tls_local_dynamic_32_once works.

bootstrapped ok on x86_64-linux-gnu. regression is going on. Is it OK
if regression passes?

Thanks.
Wei.

ChangeLog:

gcc/
2014-05-07  Wei Mi  w...@google.com

* config/i386/i386.c (ix86_compute_frame_layout):
preferred_stack_boundary updated for tls expanded call.
* config/i386/i386.md: Set ix86_tls_descriptor_calls_expanded_in_cfun.

gcc/testsuite/
2014-05-07  Wei Mi  w...@google.com

* gcc.target/i386/pr58066.c: New test.

Index: testsuite/gcc.target/i386/pr58066.c
===
--- testsuite/gcc.target/i386/pr58066.c (revision 0)
+++ testsuite/gcc.target/i386/pr58066.c (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options -fPIC -O2 } */
+
+/* Check whether the stack frame starting addresses of tls expanded calls
+   in foo and goo are 16bytes aligned.  */
+static __thread char ccc1;
+void* foo()
+{
+ return ccc1;
+}
+
+__thread char ccc2;
+void* goo()
+{
+ return ccc2;
+}
+
+/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 209979)
+++ config/i386/i386.c  (working copy)
@@ -9485,20 +9485,30 @@ ix86_compute_frame_layout (struct ix86_f
   frame-nregs = ix86_nsaved_regs ();
   frame-nsseregs = ix86_nsaved_sseregs ();

-  stack_alignment_needed = crtl-stack_alignment_needed / BITS_PER_UNIT;
-  preferred_alignment = crtl-preferred_stack_boundary / BITS_PER_UNIT;
-
   /* 64-bit MS ABI seem to require stack alignment to be always 16 except for
  function prologues and leaf.  */
-  if ((TARGET_64BIT_MS_ABI  preferred_alignment  16)
+  if ((TARGET_64BIT_MS_ABI  crtl-preferred_stack_boundary  128)
(!crtl-is_leaf || cfun-calls_alloca != 0
   || ix86_current_function_calls_tls_descriptor))
 {
-  preferred_alignment = 16;
-  stack_alignment_needed = 16;
   crtl-preferred_stack_boundary = 128;
   crtl-stack_alignment_needed = 128;
 }
+  /* preferred_stack_boundary is never updated for call
+ expanded from tls descriptor. Update it here. We don't update it in
+ expand stage because according to the comments before
+ ix86_current_function_calls_tls_descriptor, tls calls may be optimized
+ away.  */
+  else if (ix86_current_function_calls_tls_descriptor
+   crtl-preferred_stack_boundary  PREFERRED_STACK_BOUNDARY)
+{
+  crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+  if (crtl-stack_alignment_needed  PREFERRED_STACK_BOUNDARY)
+   crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
+}
+
+  stack_alignment_needed = crtl-stack_alignment_needed / BITS_PER_UNIT;
+  preferred_alignment = crtl-preferred_stack_boundary / BITS_PER_UNIT;

   gcc_assert (!size || stack_alignment_needed);
   gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT);
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 209979)
+++ config/i386/i386.md (working copy)
@@ -12530,7 +12530,8 @@
(unspec:SI
 [(match_operand:SI 1 register_operand b)
  (match_operand 2 tls_symbolic_operand)
- (match_operand 3 constant_call_address_operand z)]
+ (match_operand 3 constant_call_address_operand z)
+ (reg:SI SP_REG)]
 UNSPEC_TLS_GD))
(clobber (match_scratch:SI 4 =d))
(clobber (match_scratch:SI 5 =c))
@@ -12555,11 +12556,14 @@
 [(set (match_operand:SI 0 register_operand)
  (unspec:SI [(match_operand:SI 2 register_operand)
  (match_operand 1 tls_symbolic_operand)
- (match_operand 3 constant_call_address_operand)]
+ (match_operand 3 constant_call_address_operand)
+ (reg:SI SP_REG)]
 UNSPEC_TLS_GD))
  (clobber (match_scratch:SI 4))
  (clobber (match_scratch:SI 5))
- (clobber (reg:CC FLAGS_REG))])])
+ (clobber (reg:CC FLAGS_REG))])]
+  
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;)

 (define_insn *tls_global_dynamic_64_mode
   [(set (match_operand:P 0 register_operand =a)
@@ -12614,13 +12618,15 @@
   (const_int 0)))
   

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-01 Thread Uros Bizjak
On Thu, May 1, 2014 at 6:42 AM, Wei Mi w...@google.com wrote:
 Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk?

None of these patches have correct ChangeLog entries. Please follow
the rules, outlined in http://gcc.gnu.org/contribute.html (Submitting
Patches section), otherwise your patches will be simply ignored.

 I attached the patch which combined your two patches and the fix in
 legitimize_tls_address. I tried pr58066.c and c.i in ia32/x32/x86_64,
 the code looked fine. Do you think it is ok?

 Thanks,
 Wei.

 Either pr58066-3.patch or pr58066-4.patch looks good to me.

pr58066-4 patch is definitely not OK. I wonder, how it works at all,
since you can't split the insn to the same pattern. The generic code
detects this condition and forces ICE (IIRC: this is the reason for
UNSPEC_DIV_ALREADY_SPLIT tag in divmodmode4_1).

From pr58066-3 patch:

-;; Local dynamic of a single variable is a lose.  Show combine how
-;; to convert that back to global dynamic.
-
-(define_insn_and_split *tls_local_dynamic_32_once
-  [(set (match_operand:SI 0 register_operand =a)
-(plus:SI
- (unspec:SI [(match_operand:SI 1 register_operand b)
- (match_operand 2 constant_call_address_operand z)]
-UNSPEC_TLS_LD_BASE)
- (const:SI (unspec:SI
-[(match_operand 3 tls_symbolic_operand)]
-UNSPEC_DTPOFF
-   (clobber (match_scratch:SI 4 =d))
-   (clobber (match_scratch:SI 5 =c))
-   (clobber (reg:CC FLAGS_REG))]
-  
-  #
-  
-  [(parallel
- [(set (match_dup 0)
-   (unspec:SI [(match_dup 1) (match_dup 3) (match_dup 2)]
-  UNSPEC_TLS_GD))
-  (clobber (match_dup 4))
-  (clobber (match_dup 5))
-  (clobber (reg:CC FLAGS_REG))])])

Why did you remove this splitter?

Please do not write:

+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})

but use a short form:

+  ix86_tls_descriptor_calls_expanded_in_cfun = true;)

Please also add a testcase (from one of the previous mails):

--- testsuite/gcc.dg/pr58066.c (revision 0)
+++ testsuite/gcc.dg/pr58066.c (revision 0)

Put this test to gcc.target/i386 directory ...
@@ -0,0 +1,18 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* }  { ! ia32 } } } } */

... to avoid target selector.

+/* { dg-options -fPIC -O2 } */
+
+/* Check whether the stack frame starting addresses of tls expanded calls
+   in foo and goo are 16bytes aligned.  */
+static __thread char ccc1;
+void* foo()
+{
+ return ccc1;
+}
+
+__thread char ccc2;
+void* goo()
+{
+ return ccc2;
+}
+
+/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */

Please repost the complete patch with a proper ChangeLog.

Uros.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-05-01 Thread Wei Mi
On Wed, Apr 30, 2014 at 11:44 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Thu, May 1, 2014 at 6:42 AM, Wei Mi w...@google.com wrote:
 Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk?

 None of these patches have correct ChangeLog entries. Please follow
 the rules, outlined in http://gcc.gnu.org/contribute.html (Submitting
 Patches section), otherwise your patches will be simply ignored.


Will add Changelog in the final patch.


 pr58066-4 patch is definitely not OK. I wonder, how it works at all,
 since you can't split the insn to the same pattern. The generic code
 detects this condition and forces ICE (IIRC: this is the reason for
 UNSPEC_DIV_ALREADY_SPLIT tag in divmodmode4_1).


If the generic code detects same pattern, it will not ICE but return
the original insn (See the code in try_split under the comment /*
Avoid infinite loop if any insn of the result matches the original
pattern. */). ix86_tls_descriptor_calls_expanded_in_cfun will be set
to true even if original insn is returned by try_split. That is how
pr58066-4 works.

 From pr58066-3 patch:

 -;; Local dynamic of a single variable is a lose.  Show combine how
 -;; to convert that back to global dynamic.
 -
 -(define_insn_and_split *tls_local_dynamic_32_once
 -  [(set (match_operand:SI 0 register_operand =a)
 -(plus:SI
 - (unspec:SI [(match_operand:SI 1 register_operand b)
 - (match_operand 2 constant_call_address_operand z)]
 -UNSPEC_TLS_LD_BASE)
 - (const:SI (unspec:SI
 -[(match_operand 3 tls_symbolic_operand)]
 -UNSPEC_DTPOFF
 -   (clobber (match_scratch:SI 4 =d))
 -   (clobber (match_scratch:SI 5 =c))
 -   (clobber (reg:CC FLAGS_REG))]
 -  
 -  #
 -  
 -  [(parallel
 - [(set (match_dup 0)
 -   (unspec:SI [(match_dup 1) (match_dup 3) (match_dup 2)]
 -  UNSPEC_TLS_GD))
 -  (clobber (match_dup 4))
 -  (clobber (match_dup 5))
 -  (clobber (reg:CC FLAGS_REG))])])

 Why did you remove this splitter?


After we add add call into the pattern of tls_local_dynamic_base_32,
it is difficult for the pattern tls_local_dynamic_32_once to be
matched. I don't have a good way to rewrite tls_local_dynamic_32_once.
Do you have any idea?

(define_expand tls_local_dynamic_base_32
  [(parallel
 [(set (match_operand:SI 0 register_operand)
   (call:SI
(mem:QI (match_operand 2 constant_call_address_operand))
(const_int 0)))
  (unspec:SI [(match_operand:SI 1 register_operand)]
 UNSPEC_TLS_LD_BASE)
  (clobber (match_scratch:SI 3))
  (clobber (match_scratch:SI 4))
  (clobber (reg:CC FLAGS_REG))])]

 Please do not write:

 +{
 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;
 +})

 but use a short form:

 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;)

 Please also add a testcase (from one of the previous mails):

 --- testsuite/gcc.dg/pr58066.c (revision 0)
 +++ testsuite/gcc.dg/pr58066.c (revision 0)

 Put this test to gcc.target/i386 directory ...
 @@ -0,0 +1,18 @@
 +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* }  { ! ia32 } } } } */

 ... to avoid target selector.

 +/* { dg-options -fPIC -O2 } */
 +
 +/* Check whether the stack frame starting addresses of tls expanded calls
 +   in foo and goo are 16bytes aligned.  */
 +static __thread char ccc1;
 +void* foo()
 +{
 + return ccc1;
 +}
 +
 +__thread char ccc2;
 +void* goo()
 +{
 + return ccc2;
 +}
 +
 +/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */

 Please repost the complete patch with a proper ChangeLog.

 Uros.

will do that.

Thanks,
Wei.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-04-30 Thread Wei Mi
Ping. Is pr58066-3.patch or pr58066-4.patch ok for trunk?

Thanks,
Wei.

 I attached the patch which combined your two patches and the fix in
 legitimize_tls_address. I tried pr58066.c and c.i in ia32/x32/x86_64,
 the code looked fine. Do you think it is ok?

 Thanks,
 Wei.

 Either pr58066-3.patch or pr58066-4.patch looks good to me.

 Thanks.

 --
 H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-13 Thread H.J. Lu
On Wed, Mar 12, 2014 at 10:52 PM, Wei Mi w...@google.com wrote:
 I saw the problem last patch had on ia32. Without explicit call in rtl
 template, scheduler may schedule the sp adjusting insn across tls
 descriptor and break the alignment assumption.
 I am testing the updated patch on x86_64.

 Can we combine the last two patches, both adding call explicitly in
 rtl template for tls_local_dynamic_base_32/tls_global_dynamic_32, and
 set ix86_tls_descriptor_calls_expanded_in_cfun to true only after
 reload complete?


My ia32 change generates much worse code:

[hjl@gnu-6 gcc]$ cat /tmp/c.i
static __thread char ccc, bbb;

int __cxa_get_globals()
{
 return ccc - bbb;
}
[hjl@gnu-6 gcc]$ ./xgcc -B./ -S -O2 -fPIC /tmp/c.i
[hjl@gnu-6 gcc]$ cat c.s
.file c.i
.section .text.unlikely,ax,@progbits
.LCOLDB0:
.text
.LHOTB0:
.p2align 4,,15
.globl __cxa_get_globals
.type __cxa_get_globals, @function
__cxa_get_globals:
.LFB0:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
leaq ccc@tlsld(%rip), %rdi
call __tls_get_addr@PLT
addq $8, %rsp
.cfi_def_cfa_offset 8
leaq ccc@dtpoff(%rax), %rcx
leaq bbb@dtpoff(%rax), %rdx
movq %rcx, %rax
subq %rdx, %rax
ret
.cfi_endproc
.LFE0:
.size __cxa_get_globals, .-__cxa_get_globals
.section .text.unlikely
.LCOLDE0:
.text
.LHOTE0:
.section .tbss,awT,@nobits
.type bbb, @object
.size bbb, 1
bbb:
.zero 1
.type ccc, @object
.size ccc, 1
ccc:
.zero 1
.ident GCC: (GNU) 4.9.0 20140312 (experimental)
.section .note.GNU-stack,,@progbits
[hjl@gnu-6 gcc]$ cat /tmp/c.i
static __thread char ccc, bbb;

int __cxa_get_globals()
{
 return ccc - bbb;
}
[hjl@gnu-6 gcc]$ ./xgcc -B./ -S -O2 -fPIC /tmp/c.i -m32
[hjl@gnu-6 gcc]$ cat c.s
.file c.i
.section .text.unlikely,ax,@progbits
.LCOLDB0:
.text
.LHOTB0:
.p2align 4,,15
.globl __cxa_get_globals
.type __cxa_get_globals, @function
__cxa_get_globals:
.LFB0:
.cfi_startproc
pushl %esi
.cfi_def_cfa_offset 8
.cfi_offset 6, -8
pushl %ebx
.cfi_def_cfa_offset 12
.cfi_offset 3, -12
call __x86.get_pc_thunk.bx
addl $_GLOBAL_OFFSET_TABLE_, %ebx
subl $4, %esp
.cfi_def_cfa_offset 16
leal ccc@tlsldm(%ebx), %eax
call ___tls_get_addr@PLT
leal ccc@dtpoff(%eax), %esi
leal ccc@tlsldm(%ebx), %eax
call ___tls_get_addr@PLT
addl $4, %esp
.cfi_def_cfa_offset 12
leal bbb@dtpoff(%eax), %eax
popl %ebx
.cfi_restore 3
.cfi_def_cfa_offset 8
subl %eax, %esi
movl %esi, %eax
popl %esi
.cfi_restore 6
.cfi_def_cfa_offset 4
ret
.cfi_endproc

Maybe we should keep the original patterns and
split them to add CALL.

-- 
H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-13 Thread Wei Mi
pr58066-2.patch worked for pr58066.c on ia32/x32/x86_64, but it failed
on bootstrap.

/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/./gcc/xgcc
-B/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/./gcc/
-B/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/install/x86_64-unknown-linux-gnu/bin/
-B/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/install/x86_64-unknown-linux-gnu/lib/
-isystem 
/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/install/x86_64-unknown-linux-gnu/include
-isystem 
/usr/local/google/home/wmi/workarea/gcc-r208410-2/build/install/x86_64-unknown-linux-gnu/sys-include
   -g -O2 -m64 -O2  -g -O2 -DIN_GCC-W -Wall -Wwrite-strings
-Wcast-qual -Wno-format -Wstrict-prototypes -Wmissing-prototypes
-Wold-style-definition  -isystem ./include   -fpic -mlong-double-80 -g
-DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector   -fpic
-mlong-double-80 -I. -I. -I../../.././gcc -I../../../../src/libgcc
-I../../../../src/libgcc/. -I../../../../src/libgcc/../gcc
-I../../../../src/libgcc/../include
-I../../../../src/libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT
-DHAVE_CC_TLS  -DUSE_TLS -o bid_decimal_globals.o -MT
bid_decimal_globals.o -MD -MP -MF bid_decimal_globals.dep -c
../../../../src/libgcc/config/libbid/bid_decimal_globals.c

(call_insn 5 2 6 2 (parallel [
(set (reg/f:SI 85)
(call:SI (mem:QI (symbol_ref:SI (___tls_get_addr)) [0  S1 A8])
(const_int 0 [0])))
(unspec:SI [
(reg:SI 3 bx)
(symbol_ref:SI (__bid_IDEC_glbflags) [flags
0x10]  var_decl 0x761c1da8 __bid_IDEC_glbflags)
] UNSPEC_TLS_GD)
(clobber (reg:SI 91))
(clobber (reg:SI 92))
(clobber (reg:CC 17 flags))
]) ../../../../src/libgcc/config/libbid/bid_decimal_globals.c:51
772 {*tls_global_dynamic_32_gnu}
 (expr_list:REG_UNUSED (reg:SI 92)
(expr_list:REG_UNUSED (reg:SI 91)
(nil)))
(nil))
../../../../src/libgcc/config/libbid/bid_decimal_globals.c:52:1:
internal compiler error: in curr_insn_transform, at
lra-constraints.c:3262
0xad8453 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
../../src/gcc/rtl-error.c:109
0x9d1221 curr_insn_transform
../../src/gcc/lra-constraints.c:3262
0x9d40e4 lra_constraints(bool)
../../src/gcc/lra-constraints.c:4157
0x9c0ad8 lra(_IO_FILE*)
../../src/gcc/lra.c:2340
0x96e310 do_reload
../../src/gcc/ira.c:5457
0x96e622 rest_of_handle_reload
../../src/gcc/ira.c:5598
0x96e66c execute
../../src/gcc/ira.c:5627

The problem is the return value of the call may be assigned to a
different hardreg than AX_REG. But LRA cannot do reload for output
operand of call. The fix is to change the above pattern to the
following pattern in legitimize_tls_address() in config/i386/i386.c.

(call_insn/u 5 4 6 (parallel [
(set (reg:SI 0 ax)
(call:SI (mem:QI (symbol_ref:SI (___tls_get_addr)) [0  S1 A8])
(const_int 0 [0])))
(unspec:SI [
(reg:SI 3 bx)
(symbol_ref:SI (__bid_IDEC_glbflags) [flags
0x10]  var_decl 0x75ef3da8 __bid_IDEC_glbflags)
] UNSPEC_TLS_GD)
(clobber (scratch:SI))
(clobber (scratch:SI))
(clobber (reg:CC 17 flags))
]) ../../../../src/libgcc/config/libbid/bid_decimal_globals.c:51 -1
 (expr_list:REG_EH_REGION (const_int -2147483648 [0x8000])
(nil))
(nil))

(insn 6 5 7 (set (reg/f:SI 85)
(reg:SI 0 ax))
../../../../src/libgcc/config/libbid/bid_decimal_globals.c:51 -1
 (expr_list:REG_EQUAL (symbol_ref:SI (__bid_IDEC_glbflags)
[flags 0x10]  var_decl 0x75ef3da8 __bid_IDEC_glbflags)

After the problem is fixed, bootstrap and regression test on x86-64 are ok.

Thanks,
Wei.
Index: config/i386/i386.md
===
--- config/i386/i386.md	(revision 208410)
+++ config/i386/i386.md	(working copy)
@@ -12859,13 +12859,14 @@
 
 (define_insn *tls_global_dynamic_32_gnu
   [(set (match_operand:SI 0 register_operand =a)
-	(unspec:SI
-	 [(match_operand:SI 1 register_operand b)
-	  (match_operand 2 tls_symbolic_operand)
-	  (match_operand 3 constant_call_address_operand z)]
-	 UNSPEC_TLS_GD))
-   (clobber (match_scratch:SI 4 =d))
-   (clobber (match_scratch:SI 5 =c))
+	(call:SI
+	 (mem:QI (match_operand 3 constant_call_address_operand z))
+	 (match_operand 4)))
+   (unspec:SI [(match_operand:SI 1 register_operand b)
+	   (match_operand 2 tls_symbolic_operand)]
+	  UNSPEC_TLS_GD)
+   (clobber (match_scratch:SI 5 =d))
+   (clobber (match_scratch:SI 6 =c))
(clobber (reg:CC FLAGS_REG))]
   !TARGET_64BIT  TARGET_GNU_TLS
 {
@@ -12885,13 +12886,19 @@
 (define_expand tls_global_dynamic_32
   [(parallel
 [(set (match_operand:SI 0 register_operand)
-	  (unspec:SI [(match_operand:SI 2 register_operand)
-		  (match_operand 1 

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-13 Thread Wei Mi

 My ia32 change generates much worse code:

 [hjl@gnu-6 gcc]$ cat /tmp/c.i
 static __thread char ccc, bbb;

 int __cxa_get_globals()
 {
  return ccc - bbb;
 }
 [hjl@gnu-6 gcc]$ ./xgcc -B./ -S -O2 -fPIC /tmp/c.i
 [hjl@gnu-6 gcc]$ cat c.s
 .file c.i
 .section .text.unlikely,ax,@progbits
 .LCOLDB0:
 .text
 .LHOTB0:
 .p2align 4,,15
 .globl __cxa_get_globals
 .type __cxa_get_globals, @function
 __cxa_get_globals:
 .LFB0:
 .cfi_startproc
 subq $8, %rsp
 .cfi_def_cfa_offset 16
 leaq ccc@tlsld(%rip), %rdi
 call __tls_get_addr@PLT
 addq $8, %rsp
 .cfi_def_cfa_offset 8
 leaq ccc@dtpoff(%rax), %rcx
 leaq bbb@dtpoff(%rax), %rdx
 movq %rcx, %rax
 subq %rdx, %rax
 ret
 .cfi_endproc
 .LFE0:
 .size __cxa_get_globals, .-__cxa_get_globals
 .section .text.unlikely
 .LCOLDE0:
 .text
 .LHOTE0:
 .section .tbss,awT,@nobits
 .type bbb, @object
 .size bbb, 1
 bbb:
 .zero 1
 .type ccc, @object
 .size ccc, 1
 ccc:
 .zero 1
 .ident GCC: (GNU) 4.9.0 20140312 (experimental)
 .section .note.GNU-stack,,@progbits
 [hjl@gnu-6 gcc]$ cat /tmp/c.i
 static __thread char ccc, bbb;

 int __cxa_get_globals()
 {
  return ccc - bbb;
 }
 [hjl@gnu-6 gcc]$ ./xgcc -B./ -S -O2 -fPIC /tmp/c.i -m32
 [hjl@gnu-6 gcc]$ cat c.s
 .file c.i
 .section .text.unlikely,ax,@progbits
 .LCOLDB0:
 .text
 .LHOTB0:
 .p2align 4,,15
 .globl __cxa_get_globals
 .type __cxa_get_globals, @function
 __cxa_get_globals:
 .LFB0:
 .cfi_startproc
 pushl %esi
 .cfi_def_cfa_offset 8
 .cfi_offset 6, -8
 pushl %ebx
 .cfi_def_cfa_offset 12
 .cfi_offset 3, -12
 call __x86.get_pc_thunk.bx
 addl $_GLOBAL_OFFSET_TABLE_, %ebx
 subl $4, %esp
 .cfi_def_cfa_offset 16
 leal ccc@tlsldm(%ebx), %eax
 call ___tls_get_addr@PLT
 leal ccc@dtpoff(%eax), %esi
 leal ccc@tlsldm(%ebx), %eax
 call ___tls_get_addr@PLT
 addl $4, %esp
 .cfi_def_cfa_offset 12
 leal bbb@dtpoff(%eax), %eax
 popl %ebx
 .cfi_restore 3
 .cfi_def_cfa_offset 8
 subl %eax, %esi
 movl %esi, %eax
 popl %esi
 .cfi_restore 6
 .cfi_def_cfa_offset 4
 ret
 .cfi_endproc

 Maybe we should keep the original patterns and
 split them to add CALL.

 --
 H.J.

I tried pr58066-3.patch on the above testcase, the code it generated
seems ok. I think after we change the 32bits pattern in i386.md to be
similar as 64bits pattern, we should change 32bit expand to be similar
as 64bit expand in legitimize_tls_address too?

Thanks,
Wei.

~/workarea/gcc-r208410-2/build/install/bin/gcc -m32 -S -fPIC 1.c

.file   1.c
.section.tbss,awT,@nobits
.type   ccc, @object
.size   ccc, 1
ccc:
.zero   1
.type   bbb, @object
.size   bbb, 1
bbb:
.zero   1
.text
.globl  __cxa_get_globals
.type   __cxa_get_globals, @function
__cxa_get_globals:
.LFB0:
.cfi_startproc
pushl   %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl%esp, %ebp
.cfi_def_cfa_register 5
pushl   %esi
pushl   %ebx
.cfi_offset 6, -12
.cfi_offset 3, -16
call__x86.get_pc_thunk.bx
addl$_GLOBAL_OFFSET_TABLE_, %ebx
lealccc@tlsgd(,%ebx,1), %eax
call___tls_get_addr@PLT
movl%eax, %esi
lealbbb@tlsgd(,%ebx,1), %eax
call___tls_get_addr@PLT
subl%eax, %esi
movl%esi, %eax
popl%ebx
.cfi_restore 3
popl%esi
.cfi_restore 6
popl%ebp
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size   __cxa_get_globals, .-__cxa_get_globals
.section
.text.__x86.get_pc_thunk.bx,axG,@progbits,__x86.get_pc_thunk.bx,comdat
.globl  __x86.get_pc_thunk.bx
.hidden __x86.get_pc_thunk.bx
.type   __x86.get_pc_thunk.bx, @function
__x86.get_pc_thunk.bx:
.LFB1:
.cfi_startproc
movl(%esp), %ebx
ret
.cfi_endproc
.LFE1:
.ident  GCC: (GNU) 4.9.0 20140307 (experimental)
.section.note.GNU-stack,,@progbits


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-13 Thread Wei Mi
 Can we combine the last two patches, both adding call explicitly in
 rtl template for tls_local_dynamic_base_32/tls_global_dynamic_32, and
 set ix86_tls_descriptor_calls_expanded_in_cfun to true only after
 reload complete?


Hi H.J.

I attached the patch which combined your two patches and the fix in
legitimize_tls_address. I tried pr58066.c and c.i in ia32/x32/x86_64,
the code looked fine. Do you think it is ok?

Thanks,
Wei.
Index: config/i386/i386.c
===
--- config/i386/i386.c	(revision 208410)
+++ config/i386/i386.c	(working copy)
@@ -9082,7 +9082,7 @@ ix86_frame_pointer_required (void)
  we've not got a leaf function.  */
   if (TARGET_OMIT_LEAF_FRAME_POINTER
(!crtl-is_leaf
-	  || ix86_current_function_calls_tls_descriptor))
+	  || ix86_tls_descriptor_calls_expanded_in_cfun))
 return true;
 
   if (crtl-profile  !flag_fentry)
@@ -9331,7 +9331,7 @@ ix86_select_alt_pic_regnum (void)
 {
   if (crtl-is_leaf
!crtl-profile
-   !ix86_current_function_calls_tls_descriptor)
+   !ix86_tls_descriptor_calls_expanded_in_cfun)
 {
   int i, drap;
   /* Can't use the same register for both PIC and DRAP.  */
@@ -9490,20 +9490,28 @@ ix86_compute_frame_layout (struct ix86_f
   frame-nregs = ix86_nsaved_regs ();
   frame-nsseregs = ix86_nsaved_sseregs ();
 
-  stack_alignment_needed = crtl-stack_alignment_needed / BITS_PER_UNIT;
-  preferred_alignment = crtl-preferred_stack_boundary / BITS_PER_UNIT;
-
   /* 64-bit MS ABI seem to require stack alignment to be always 16 except for
  function prologues and leaf.  */
-  if ((TARGET_64BIT_MS_ABI  preferred_alignment  16)
+  if ((TARGET_64BIT_MS_ABI  crtl-preferred_stack_boundary  128)
(!crtl-is_leaf || cfun-calls_alloca != 0
-  || ix86_current_function_calls_tls_descriptor))
+  || ix86_tls_descriptor_calls_expanded_in_cfun))
 {
-  preferred_alignment = 16;
-  stack_alignment_needed = 16;
   crtl-preferred_stack_boundary = 128;
   crtl-stack_alignment_needed = 128;
 }
+  /* preferred_stack_boundary is never updated for call expanded from
+ tls descriptor. Update it here. We don't update it in expand stage
+ because tls calls may be optimized away.  */
+  else if (ix86_tls_descriptor_calls_expanded_in_cfun
+   crtl-preferred_stack_boundary  PREFERRED_STACK_BOUNDARY)
+{
+  crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+  if (crtl-stack_alignment_needed  PREFERRED_STACK_BOUNDARY)
+   crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
+}
+
+  stack_alignment_needed = crtl-stack_alignment_needed / BITS_PER_UNIT;
+  preferred_alignment = crtl-preferred_stack_boundary / BITS_PER_UNIT;
 
   gcc_assert (!size || stack_alignment_needed);
   gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT);
@@ -9608,7 +9616,7 @@ ix86_compute_frame_layout (struct ix86_f
   || size != 0
   || !crtl-is_leaf
   || cfun-calls_alloca
-  || ix86_current_function_calls_tls_descriptor)
+  || ix86_tls_descriptor_calls_expanded_in_cfun)
 offset = (offset + stack_alignment_needed - 1)  -stack_alignment_needed;
 
   /* Frame pointer points here.  */
@@ -9623,7 +9631,7 @@ ix86_compute_frame_layout (struct ix86_f
  of stack frame are unused.  */
   if (ACCUMULATE_OUTGOING_ARGS
(!crtl-is_leaf || cfun-calls_alloca
-	  || ix86_current_function_calls_tls_descriptor))
+	  || ix86_tls_descriptor_calls_expanded_in_cfun))
 {
   offset += crtl-outgoing_args_size;
   frame-outgoing_arguments_size = crtl-outgoing_args_size;
@@ -9634,7 +9642,7 @@ ix86_compute_frame_layout (struct ix86_f
   /* Align stack boundary.  Only needed if we're calling another function
  or using alloca.  */
   if (!crtl-is_leaf || cfun-calls_alloca
-  || ix86_current_function_calls_tls_descriptor)
+  || ix86_tls_descriptor_calls_expanded_in_cfun)
 offset = (offset + preferred_alignment - 1)  -preferred_alignment;
 
   /* We've reached end of stack frame.  */
@@ -9650,7 +9658,7 @@ ix86_compute_frame_layout (struct ix86_f
   if (ix86_using_red_zone ()
crtl-sp_is_unchanging
crtl-is_leaf
-   !ix86_current_function_calls_tls_descriptor)
+   !ix86_tls_descriptor_calls_expanded_in_cfun)
 {
   frame-red_zone_size = to_allocate;
   if (frame-save_regs_using_mov)
@@ -10623,7 +10631,7 @@ ix86_finalize_stack_realign_flags (void)
crtl-is_leaf
flag_omit_frame_pointer
crtl-sp_is_unchanging
-   !ix86_current_function_calls_tls_descriptor
+   !ix86_tls_descriptor_calls_expanded_in_cfun
!crtl-accesses_prior_frames
!cfun-calls_alloca
!crtl-calls_eh_return
@@ -13437,26 +13445,25 @@ legitimize_tls_address (rtx x, enum tls_
   else
 	{
 	  rtx caddr = ix86_tls_get_addr ();
+	  rtx ax = gen_rtx_REG (Pmode, AX_REG);
+	  rtx insns;
+	  start_sequence ();
 
 	  if (TARGET_64BIT)
-	{
-	  

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-13 Thread H.J. Lu
On Thu, Mar 13, 2014 at 10:55 AM, Wei Mi w...@google.com wrote:
 Can we combine the last two patches, both adding call explicitly in
 rtl template for tls_local_dynamic_base_32/tls_global_dynamic_32, and
 set ix86_tls_descriptor_calls_expanded_in_cfun to true only after
 reload complete?


 Hi H.J.

 I attached the patch which combined your two patches and the fix in
 legitimize_tls_address. I tried pr58066.c and c.i in ia32/x32/x86_64,
 the code looked fine. Do you think it is ok?

 Thanks,
 Wei.

Either pr58066-3.patch or pr58066-4.patch looks good to me.

Thanks.

-- 
H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread H.J. Lu
On Fri, Mar 7, 2014 at 2:33 PM, Wei Mi w...@google.com wrote:
 Yes, x32 has the same problem. It should be tested. Fixed.

 Thanks,
 Wei.


 On Fri, Mar 7, 2014 at 2:06 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi w...@google.com wrote:
 Hi,

 This patch is to fix the problem described here:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

 I follow Ian's suggestion and set
 ix86_tls_descriptor_calls_expanded_in_cfun in
 tls_global_dynamic_64_mode and tls_local_dynamic_base_64_mode.
 Although 32bit doesn't have the problem,
 ix86_tls_descriptor_calls_expanded_in_cfun is also set for
 tls_global_dynamic_32 and tls_local_dynamic_base_32 to make
 ix86_tls_descriptor_calls_expanded_in_cfun setting consistent across
 32bits and 64bits.

 If ix86_current_function_calls_tls_descriptor is set, we know that
 there is tls expanded call in current function. Update
 crtl-preferred_stack_boundary and crtl-stack_alignment_needed to be
 no less than PREFERED_STACK_ALIGNMENT at the start of
 ix86_compute_frame_layout. We don't do the update in
 legitimize_tls_address in cfgexpand stage, which is too early because
 according to the comments before
 ix86_current_function_calls_tls_descriptor, tls call may be optimized
 away. ix86_compute_frame_layout is the latest place to do the update.

 bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok
 for trunk if tests pass?

 Thanks,
 Wei.

 gcc/ChangeLog:

 2014-03-07  Wei Mi  w...@google.com

 * config/i386/i386.c (ix86_compute_frame_layout): Update
 preferred_stack_boundary when there is tls expanded call.
 * config/i386/i386.md: Set
 ix86_tls_descriptor_calls_expanded_in_cfun.

 gcc/testsuite/ChangeLog:

 2014-03-07  Wei Mi  w...@google.com

 * g++.dg/pr58066.C: New test.


 Index: gcc/config/i386/i386.c
 ===
 --- gcc/config/i386/i386.c  (revision 208410)
 +++ gcc/config/i386/i386.c  (working copy)
 @@ -9504,6 +9504,19 @@ ix86_compute_frame_layout (struct ix86_f
crtl-preferred_stack_boundary = 128;
crtl-stack_alignment_needed = 128;
  }
 +  /* For 64-bit target, preferred_stack_boundary is never updated for call
 + expanded from tls descriptor. Update it here. We don't update it in
 + expand stage because according to the comments before
 + ix86_current_function_calls_tls_descriptor, tls calls may be optimized
 + away.  */
 +  else if (TARGET_64BIT
 +   ix86_current_function_calls_tls_descriptor
 +   crtl-preferred_stack_boundary  PREFERRED_STACK_BOUNDARY)
 +{
 +  crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
 +  if (crtl-stack_alignment_needed  PREFERRED_STACK_BOUNDARY)
 +   crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
 +}


There are several problems with this:

1.  It doesn't work with C.
2.  IA32 has the same issue and isn't fixed.
3.  There is no testcase for global dynamic model.

-- 
H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
 There are several problems with this:

 1.  It doesn't work with C.

Ok, I will change the testcase using C.

 2.  IA32 has the same issue and isn't fixed.

I thought IA32 didn't have the same issue because abi only requires 32
bit alignment for stack starting address.

oh, I found the old patch
http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00298.html which changed
the default alignment to 128bit. Ok, will remove the TARGET_64BIT
constraint.

 3.  There is no testcase for global dynamic model.

 --
 H.J.

Will add the testcase.

Thanks,
Wei.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread H.J. Lu
On Wed, Mar 12, 2014 at 2:03 PM, Wei Mi w...@google.com wrote:
 There are several problems with this:

 1.  It doesn't work with C.

 Ok, I will change the testcase using C.

 2.  IA32 has the same issue and isn't fixed.

 I thought IA32 didn't have the same issue because abi only requires 32
 bit alignment for stack starting address.

 oh, I found the old patch
 http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00298.html which changed
 the default alignment to 128bit. Ok, will remove the TARGET_64BIT
 constraint.

 3.  There is no testcase for global dynamic model.

 --
 H.J.

 Will add the testcase.


I posted a different patch in

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

-- 
H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
Hi H.J.,

Could you show me why you postpone the setting
ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and
use ix86_tls_descriptor_calls_expanded_in_cfun instead of
ix86_current_function_calls_tls_descriptor? Isn't
ix86_current_function_calls_tls_descriptor useful to consider the case
that tls call is optimized away?

Thanks,
Wei.

On Wed, Mar 12, 2014 at 2:07 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Mar 12, 2014 at 2:03 PM, Wei Mi w...@google.com wrote:
 There are several problems with this:

 1.  It doesn't work with C.

 Ok, I will change the testcase using C.

 2.  IA32 has the same issue and isn't fixed.

 I thought IA32 didn't have the same issue because abi only requires 32
 bit alignment for stack starting address.

 oh, I found the old patch
 http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00298.html which changed
 the default alignment to 128bit. Ok, will remove the TARGET_64BIT
 constraint.

 3.  There is no testcase for global dynamic model.

 --
 H.J.

 Will add the testcase.


 I posted a different patch in

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

 --
 H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread H.J. Lu
On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi w...@google.com wrote:
 Hi H.J.,

 Could you show me why you postpone the setting
 ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and
 use ix86_tls_descriptor_calls_expanded_in_cfun instead of
 ix86_current_function_calls_tls_descriptor? Isn't
 ix86_current_function_calls_tls_descriptor useful to consider the case
 that tls call is optimized away?


When a tls call is optimized away, it won't survive reload.
If it does survive reload, it isn't optimized away.  Also
checking df_regs_ever_live_p (SP_REG) isn't reliable
when called from ix86_compute_frame_layout.

-- 
H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
Oh, I see. Thanks!

Wei.

On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi w...@google.com wrote:
 Hi H.J.,

 Could you show me why you postpone the setting
 ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and
 use ix86_tls_descriptor_calls_expanded_in_cfun instead of
 ix86_current_function_calls_tls_descriptor? Isn't
 ix86_current_function_calls_tls_descriptor useful to consider the case
 that tls call is optimized away?


 When a tls call is optimized away, it won't survive reload.
 If it does survive reload, it isn't optimized away.  Also
 checking df_regs_ever_live_p (SP_REG) isn't reliable
 when called from ix86_compute_frame_layout.

 --
 H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
This is the updated testcase.

Thanks,
Wei.

===
--- testsuite/gcc.dg/pr58066.c (revision 0)
+++ testsuite/gcc.dg/pr58066.c (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* }  { ! ia32 } } } } */
+/* { dg-options -fPIC -O2 } */
+
+/* Check whether the stack frame starting addresses of tls expanded calls
+   in foo and goo are 16bytes aligned.  */
+static __thread char ccc1;
+void* foo()
+{
+ return ccc1;
+}
+
+__thread char ccc2;
+void* goo()
+{
+ return ccc2;
+}
+
+/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */

On Wed, Mar 12, 2014 at 2:51 PM, Wei Mi w...@google.com wrote:
 Oh, I see. Thanks!

 Wei.

 On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi w...@google.com wrote:
 Hi H.J.,

 Could you show me why you postpone the setting
 ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and
 use ix86_tls_descriptor_calls_expanded_in_cfun instead of
 ix86_current_function_calls_tls_descriptor? Isn't
 ix86_current_function_calls_tls_descriptor useful to consider the case
 that tls call is optimized away?


 When a tls call is optimized away, it won't survive reload.
 If it does survive reload, it isn't optimized away.  Also
 checking df_regs_ever_live_p (SP_REG) isn't reliable
 when called from ix86_compute_frame_layout.

 --
 H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread H.J. Lu
On Wed, Mar 12, 2014 at 2:58 PM, Wei Mi w...@google.com wrote:
 This is the updated testcase.

Does my patch fix the original problem?

 Thanks,
 Wei.

 ===
 --- testsuite/gcc.dg/pr58066.c (revision 0)
 +++ testsuite/gcc.dg/pr58066.c (revision 0)
 @@ -0,0 +1,18 @@
 +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* }  { ! ia32 } } } } */

Since it is a C testcase and we should test it under ia32, it
should be moved to gcc.target/i386 and remove target.

 +/* { dg-options -fPIC -O2 } */
 +
 +/* Check whether the stack frame starting addresses of tls expanded calls
 +   in foo and goo are 16bytes aligned.  */
 +static __thread char ccc1;
 +void* foo()
 +{
 + return ccc1;
 +}
 +
 +__thread char ccc2;
 +void* goo()
 +{
 + return ccc2;
 +}
 +
 +/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */

 On Wed, Mar 12, 2014 at 2:51 PM, Wei Mi w...@google.com wrote:
 Oh, I see. Thanks!

 Wei.

 On Wed, Mar 12, 2014 at 2:42 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Mar 12, 2014 at 2:36 PM, Wei Mi w...@google.com wrote:
 Hi H.J.,

 Could you show me why you postpone the setting
 ix86_tls_descriptor_calls_expanded_in_cfun until reload_complete and
 use ix86_tls_descriptor_calls_expanded_in_cfun instead of
 ix86_current_function_calls_tls_descriptor? Isn't
 ix86_current_function_calls_tls_descriptor useful to consider the case
 that tls call is optimized away?


 When a tls call is optimized away, it won't survive reload.
 If it does survive reload, it isn't optimized away.  Also
 checking df_regs_ever_live_p (SP_REG) isn't reliable
 when called from ix86_compute_frame_layout.

 --
 H.J.



-- 
H.J.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
On Wed, Mar 12, 2014 at 3:07 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Mar 12, 2014 at 2:58 PM, Wei Mi w...@google.com wrote:
 This is the updated testcase.

 Does my patch fix the original problem?

Yes, it works. I am doing bootstrap and regression test for your patch. Thanks!


 Thanks,
 Wei.

 ===
 --- testsuite/gcc.dg/pr58066.c (revision 0)
 +++ testsuite/gcc.dg/pr58066.c (revision 0)
 @@ -0,0 +1,18 @@
 +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* }  { ! ia32 } } } } */

 Since it is a C testcase and we should test it under ia32, it
 should be moved to gcc.target/i386 and remove target.


Fixed.

Thanks,
Wei.

Index: testsuite/gcc.target/i386/pr58066.c
===
--- testsuite/gcc.target/i386/pr58066.c (revision 0)
+++ testsuite/gcc.target/i386/pr58066.c (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options -fPIC -O2 } */
+
+/* Check whether the stack frame starting addresses of tls expanded calls
+   in foo and goo are 16bytes aligned.  */
+static __thread char ccc1;
+void* foo()
+{
+ return ccc1;
+}
+
+__thread char ccc2;
+void* goo()
+{
+ return ccc2;
+}
+
+/* { dg-final { scan-assembler-times .cfi_def_cfa_offset 16 2 } } */


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
 Does my patch fix the original problem?

 Yes, it works. I am doing bootstrap and regression test for your patch. 
 Thanks!


The patch passes bootstrap and regression test on x86_64-linux-gnu.

Thanks,
Wei.


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread H.J. Lu
On Wed, Mar 12, 2014 at 5:28 PM, Wei Mi w...@google.com wrote:
 Does my patch fix the original problem?

 Yes, it works. I am doing bootstrap and regression test for your patch. 
 Thanks!


 The patch passes bootstrap and regression test on x86_64-linux-gnu.


My patch fails to handle ia32.  Here is the updated one.

-- 
H.J.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9e33d53..da603b6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -9490,20 +9490,30 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   frame-nregs = ix86_nsaved_regs ();
   frame-nsseregs = ix86_nsaved_sseregs ();
 
-  stack_alignment_needed = crtl-stack_alignment_needed / BITS_PER_UNIT;
-  preferred_alignment = crtl-preferred_stack_boundary / BITS_PER_UNIT;
-
   /* 64-bit MS ABI seem to require stack alignment to be always 16 except for
  function prologues and leaf.  */
-  if ((TARGET_64BIT_MS_ABI  preferred_alignment  16)
+  if ((TARGET_64BIT_MS_ABI  crtl-preferred_stack_boundary  128)
(!crtl-is_leaf || cfun-calls_alloca != 0
   || ix86_current_function_calls_tls_descriptor))
 {
-  preferred_alignment = 16;
-  stack_alignment_needed = 16;
   crtl-preferred_stack_boundary = 128;
   crtl-stack_alignment_needed = 128;
 }
+  /* preferred_stack_boundary is never updated for call
+ expanded from tls descriptor. Update it here. We don't update it in
+ expand stage because according to the comments before
+ ix86_current_function_calls_tls_descriptor, tls calls may be optimized
+ away.  */
+  else if (ix86_current_function_calls_tls_descriptor
+	crtl-preferred_stack_boundary  PREFERRED_STACK_BOUNDARY)
+{
+  crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+  if (crtl-stack_alignment_needed  PREFERRED_STACK_BOUNDARY)
+	crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
+}
+
+  stack_alignment_needed = crtl-stack_alignment_needed / BITS_PER_UNIT;
+  preferred_alignment = crtl-preferred_stack_boundary / BITS_PER_UNIT;
 
   gcc_assert (!size || stack_alignment_needed);
   gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ea1d85f..bc8fb1f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12859,13 +12859,14 @@
 
 (define_insn *tls_global_dynamic_32_gnu
   [(set (match_operand:SI 0 register_operand =a)
-	(unspec:SI
-	 [(match_operand:SI 1 register_operand b)
-	  (match_operand 2 tls_symbolic_operand)
-	  (match_operand 3 constant_call_address_operand z)]
-	 UNSPEC_TLS_GD))
-   (clobber (match_scratch:SI 4 =d))
-   (clobber (match_scratch:SI 5 =c))
+	(call:SI
+	 (mem:QI (match_operand 3 constant_call_address_operand z))
+	 (match_operand 4)))
+   (unspec:SI [(match_operand:SI 1 register_operand b)
+	   (match_operand 2 tls_symbolic_operand)]
+	  UNSPEC_TLS_GD)
+   (clobber (match_scratch:SI 5 =d))
+   (clobber (match_scratch:SI 6 =c))
(clobber (reg:CC FLAGS_REG))]
   !TARGET_64BIT  TARGET_GNU_TLS
 {
@@ -12885,13 +12886,19 @@
 (define_expand tls_global_dynamic_32
   [(parallel
 [(set (match_operand:SI 0 register_operand)
-	  (unspec:SI [(match_operand:SI 2 register_operand)
-		  (match_operand 1 tls_symbolic_operand)
-		  (match_operand 3 constant_call_address_operand)]
-		 UNSPEC_TLS_GD))
+	  (call:SI
+	   (mem:QI (match_operand 3 constant_call_address_operand))
+	   (const_int 0)))
+ (unspec:SI [(match_operand:SI 2 register_operand)
+		 (match_operand 1 tls_symbolic_operand)]
+		UNSPEC_TLS_GD)
  (clobber (match_scratch:SI 4))
  (clobber (match_scratch:SI 5))
- (clobber (reg:CC FLAGS_REG))])])
+ (clobber (reg:CC FLAGS_REG))])]
+  
+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})
 
 (define_insn *tls_global_dynamic_64_mode
   [(set (match_operand:P 0 register_operand =a)
@@ -12946,16 +12953,20 @@
 	   (const_int 0)))
  (unspec:P [(match_operand 1 tls_symbolic_operand)]
 	   UNSPEC_TLS_GD)])]
-  TARGET_64BIT)
+  TARGET_64BIT
+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})
 
 (define_insn *tls_local_dynamic_base_32_gnu
   [(set (match_operand:SI 0 register_operand =a)
-	(unspec:SI
-	 [(match_operand:SI 1 register_operand b)
-	  (match_operand 2 constant_call_address_operand z)]
-	 UNSPEC_TLS_LD_BASE))
-   (clobber (match_scratch:SI 3 =d))
-   (clobber (match_scratch:SI 4 =c))
+	(call:SI
+	 (mem:QI (match_operand 2 constant_call_address_operand z))
+	 (match_operand 3)))
+   (unspec:SI [(match_operand:SI 1 register_operand b)]
+	  UNSPEC_TLS_LD_BASE)
+   (clobber (match_scratch:SI 4 =d))
+   (clobber (match_scratch:SI 5 =c))
(clobber (reg:CC FLAGS_REG))]
   !TARGET_64BIT  TARGET_GNU_TLS
 {
@@ -12976,13 +12987,18 @@
 (define_expand tls_local_dynamic_base_32
   [(parallel
  [(set (match_operand:SI 0 register_operand)
-	   (unspec:SI
-	[(match_operand:SI 1 register_operand)
-	 (match_operand 2 

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-12 Thread Wei Mi
I saw the problem last patch had on ia32. Without explicit call in rtl
template, scheduler may schedule the sp adjusting insn across tls
descriptor and break the alignment assumption.
I am testing the updated patch on x86_64.

Can we combine the last two patches, both adding call explicitly in
rtl template for tls_local_dynamic_base_32/tls_global_dynamic_32, and
set ix86_tls_descriptor_calls_expanded_in_cfun to true only after
reload complete?

Regards,
Wei.

On Wed, Mar 12, 2014 at 5:33 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Wed, Mar 12, 2014 at 5:28 PM, Wei Mi w...@google.com wrote:
 Does my patch fix the original problem?

 Yes, it works. I am doing bootstrap and regression test for your patch. 
 Thanks!


 The patch passes bootstrap and regression test on x86_64-linux-gnu.


 My patch fails to handle ia32.  Here is the updated one.

 --
 H.J.


[PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-07 Thread Wei Mi
Hi,

This patch is to fix the problem described here:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

I follow Ian's suggestion and set
ix86_tls_descriptor_calls_expanded_in_cfun in
tls_global_dynamic_64_mode and tls_local_dynamic_base_64_mode.
Although 32bit doesn't have the problem,
ix86_tls_descriptor_calls_expanded_in_cfun is also set for
tls_global_dynamic_32 and tls_local_dynamic_base_32 to make
ix86_tls_descriptor_calls_expanded_in_cfun setting consistent across
32bits and 64bits.

If ix86_current_function_calls_tls_descriptor is set, we know that
there is tls expanded call in current function. Update
crtl-preferred_stack_boundary and crtl-stack_alignment_needed to be
no less than PREFERED_STACK_ALIGNMENT at the start of
ix86_compute_frame_layout. We don't do the update in
legitimize_tls_address in cfgexpand stage, which is too early because
according to the comments before
ix86_current_function_calls_tls_descriptor, tls call may be optimized
away. ix86_compute_frame_layout is the latest place to do the update.

bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok
for trunk if tests pass?

Thanks,
Wei.

gcc/ChangeLog:

2014-03-07  Wei Mi  w...@google.com

* config/i386/i386.c (ix86_compute_frame_layout): Update
preferred_stack_boundary when there is tls expanded call.
* config/i386/i386.md: Set
ix86_tls_descriptor_calls_expanded_in_cfun.

gcc/testsuite/ChangeLog:

2014-03-07  Wei Mi  w...@google.com

* g++.dg/pr58066.C: New test.


Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 208410)
+++ gcc/config/i386/i386.c  (working copy)
@@ -9504,6 +9504,19 @@ ix86_compute_frame_layout (struct ix86_f
   crtl-preferred_stack_boundary = 128;
   crtl-stack_alignment_needed = 128;
 }
+  /* For 64-bit target, preferred_stack_boundary is never updated for call
+ expanded from tls descriptor. Update it here. We don't update it in
+ expand stage because according to the comments before
+ ix86_current_function_calls_tls_descriptor, tls calls may be optimized
+ away.  */
+  else if (TARGET_64BIT
+   ix86_current_function_calls_tls_descriptor
+   crtl-preferred_stack_boundary  PREFERRED_STACK_BOUNDARY)
+{
+  crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
+  if (crtl-stack_alignment_needed  PREFERRED_STACK_BOUNDARY)
+   crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
+}

   gcc_assert (!size || stack_alignment_needed);
   gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT);
Index: gcc/config/i386/i386.md
===
--- gcc/config/i386/i386.md (revision 208410)
+++ gcc/config/i386/i386.md (working copy)
@@ -12891,7 +12891,11 @@
 UNSPEC_TLS_GD))
  (clobber (match_scratch:SI 4))
  (clobber (match_scratch:SI 5))
- (clobber (reg:CC FLAGS_REG))])])
+ (clobber (reg:CC FLAGS_REG))])]
+  
+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})

 (define_insn *tls_global_dynamic_64_mode
   [(set (match_operand:P 0 register_operand =a)
@@ -12946,7 +12950,10 @@
   (const_int 0)))
  (unspec:P [(match_operand 1 tls_symbolic_operand)]
   UNSPEC_TLS_GD)])]
-  TARGET_64BIT)
+  TARGET_64BIT
+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})

 (define_insn *tls_local_dynamic_base_32_gnu
   [(set (match_operand:SI 0 register_operand =a)
@@ -12982,7 +12989,11 @@
UNSPEC_TLS_LD_BASE))
   (clobber (match_scratch:SI 3))
   (clobber (match_scratch:SI 4))
-  (clobber (reg:CC FLAGS_REG))])])
+  (clobber (reg:CC FLAGS_REG))])]
+  
+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})

 (define_insn *tls_local_dynamic_base_64_mode
   [(set (match_operand:P 0 register_operand =a)
@@ -13029,7 +13040,10 @@
(mem:QI (match_operand 1))
(const_int 0)))
   (unspec:P [(const_int 0)] UNSPEC_TLS_LD_BASE)])]
-  TARGET_64BIT)
+  TARGET_64BIT
+{
+  ix86_tls_descriptor_calls_expanded_in_cfun = true;
+})

 ;; Local dynamic of a single variable is a lose.  Show combine how
 ;; to convert that back to global dynamic.
Index: gcc/testsuite/g++.dg/pr58066.C
===
--- gcc/testsuite/g++.dg/pr58066.C  (revision 0)
+++ gcc/testsuite/g++.dg/pr58066.C  (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target {{ i?86-*-* x86_64-*-* }  lp64 } } } */
+/* { dg-options -fPIC -O2 -m64 } */
+
+/* Check whether the stack frame starting address of tls expanded call
+   in __cxa_get_globals() is 16bytes aligned.  */
+static __thread char ccc;
+extern C void* __cxa_get_globals() throw()
+{
+ return ccc;
+}
+
+/* { dg-final { scan-assembler .cfi_def_cfa_offset 16 } } */


Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-07 Thread Wei Mi
Regression test is ok.

Thanks,
Wei.


On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi w...@google.com wrote:
 Hi,

 This patch is to fix the problem described here:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

 I follow Ian's suggestion and set
 ix86_tls_descriptor_calls_expanded_in_cfun in
 tls_global_dynamic_64_mode and tls_local_dynamic_base_64_mode.
 Although 32bit doesn't have the problem,
 ix86_tls_descriptor_calls_expanded_in_cfun is also set for
 tls_global_dynamic_32 and tls_local_dynamic_base_32 to make
 ix86_tls_descriptor_calls_expanded_in_cfun setting consistent across
 32bits and 64bits.

 If ix86_current_function_calls_tls_descriptor is set, we know that
 there is tls expanded call in current function. Update
 crtl-preferred_stack_boundary and crtl-stack_alignment_needed to be
 no less than PREFERED_STACK_ALIGNMENT at the start of
 ix86_compute_frame_layout. We don't do the update in
 legitimize_tls_address in cfgexpand stage, which is too early because
 according to the comments before
 ix86_current_function_calls_tls_descriptor, tls call may be optimized
 away. ix86_compute_frame_layout is the latest place to do the update.

 bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok
 for trunk if tests pass?

 Thanks,
 Wei.

 gcc/ChangeLog:

 2014-03-07  Wei Mi  w...@google.com

 * config/i386/i386.c (ix86_compute_frame_layout): Update
 preferred_stack_boundary when there is tls expanded call.
 * config/i386/i386.md: Set
 ix86_tls_descriptor_calls_expanded_in_cfun.

 gcc/testsuite/ChangeLog:

 2014-03-07  Wei Mi  w...@google.com

 * g++.dg/pr58066.C: New test.


 Index: gcc/config/i386/i386.c
 ===
 --- gcc/config/i386/i386.c  (revision 208410)
 +++ gcc/config/i386/i386.c  (working copy)
 @@ -9504,6 +9504,19 @@ ix86_compute_frame_layout (struct ix86_f
crtl-preferred_stack_boundary = 128;
crtl-stack_alignment_needed = 128;
  }
 +  /* For 64-bit target, preferred_stack_boundary is never updated for call
 + expanded from tls descriptor. Update it here. We don't update it in
 + expand stage because according to the comments before
 + ix86_current_function_calls_tls_descriptor, tls calls may be optimized
 + away.  */
 +  else if (TARGET_64BIT
 +   ix86_current_function_calls_tls_descriptor
 +   crtl-preferred_stack_boundary  PREFERRED_STACK_BOUNDARY)
 +{
 +  crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
 +  if (crtl-stack_alignment_needed  PREFERRED_STACK_BOUNDARY)
 +   crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
 +}

gcc_assert (!size || stack_alignment_needed);
gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT);
 Index: gcc/config/i386/i386.md
 ===
 --- gcc/config/i386/i386.md (revision 208410)
 +++ gcc/config/i386/i386.md (working copy)
 @@ -12891,7 +12891,11 @@
  UNSPEC_TLS_GD))
   (clobber (match_scratch:SI 4))
   (clobber (match_scratch:SI 5))
 - (clobber (reg:CC FLAGS_REG))])])
 + (clobber (reg:CC FLAGS_REG))])]
 +  
 +{
 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;
 +})

  (define_insn *tls_global_dynamic_64_mode
[(set (match_operand:P 0 register_operand =a)
 @@ -12946,7 +12950,10 @@
(const_int 0)))
   (unspec:P [(match_operand 1 tls_symbolic_operand)]
UNSPEC_TLS_GD)])]
 -  TARGET_64BIT)
 +  TARGET_64BIT
 +{
 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;
 +})

  (define_insn *tls_local_dynamic_base_32_gnu
[(set (match_operand:SI 0 register_operand =a)
 @@ -12982,7 +12989,11 @@
 UNSPEC_TLS_LD_BASE))
(clobber (match_scratch:SI 3))
(clobber (match_scratch:SI 4))
 -  (clobber (reg:CC FLAGS_REG))])])
 +  (clobber (reg:CC FLAGS_REG))])]
 +  
 +{
 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;
 +})

  (define_insn *tls_local_dynamic_base_64_mode
[(set (match_operand:P 0 register_operand =a)
 @@ -13029,7 +13040,10 @@
 (mem:QI (match_operand 1))
 (const_int 0)))
(unspec:P [(const_int 0)] UNSPEC_TLS_LD_BASE)])]
 -  TARGET_64BIT)
 +  TARGET_64BIT
 +{
 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;
 +})

  ;; Local dynamic of a single variable is a lose.  Show combine how
  ;; to convert that back to global dynamic.
 Index: gcc/testsuite/g++.dg/pr58066.C
 ===
 --- gcc/testsuite/g++.dg/pr58066.C  (revision 0)
 +++ gcc/testsuite/g++.dg/pr58066.C  (revision 0)
 @@ -0,0 +1,12 @@
 +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* }  lp64 } } } */
 +/* { dg-options -fPIC -O2 -m64 } */
 +
 +/* Check whether the stack frame starting address of tls expanded call
 +   in __cxa_get_globals() is 16bytes aligned.  */
 +static __thread char ccc;
 +extern C 

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-07 Thread H.J. Lu
On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi w...@google.com wrote:
 Hi,

 This patch is to fix the problem described here:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

 I follow Ian's suggestion and set
 ix86_tls_descriptor_calls_expanded_in_cfun in
 tls_global_dynamic_64_mode and tls_local_dynamic_base_64_mode.
 Although 32bit doesn't have the problem,
 ix86_tls_descriptor_calls_expanded_in_cfun is also set for
 tls_global_dynamic_32 and tls_local_dynamic_base_32 to make
 ix86_tls_descriptor_calls_expanded_in_cfun setting consistent across
 32bits and 64bits.

 If ix86_current_function_calls_tls_descriptor is set, we know that
 there is tls expanded call in current function. Update
 crtl-preferred_stack_boundary and crtl-stack_alignment_needed to be
 no less than PREFERED_STACK_ALIGNMENT at the start of
 ix86_compute_frame_layout. We don't do the update in
 legitimize_tls_address in cfgexpand stage, which is too early because
 according to the comments before
 ix86_current_function_calls_tls_descriptor, tls call may be optimized
 away. ix86_compute_frame_layout is the latest place to do the update.

 bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok
 for trunk if tests pass?

 Thanks,
 Wei.

 gcc/ChangeLog:

 2014-03-07  Wei Mi  w...@google.com

 * config/i386/i386.c (ix86_compute_frame_layout): Update
 preferred_stack_boundary when there is tls expanded call.
 * config/i386/i386.md: Set
 ix86_tls_descriptor_calls_expanded_in_cfun.

 gcc/testsuite/ChangeLog:

 2014-03-07  Wei Mi  w...@google.com

 * g++.dg/pr58066.C: New test.


 Index: gcc/config/i386/i386.c
 ===
 --- gcc/config/i386/i386.c  (revision 208410)
 +++ gcc/config/i386/i386.c  (working copy)
 @@ -9504,6 +9504,19 @@ ix86_compute_frame_layout (struct ix86_f
crtl-preferred_stack_boundary = 128;
crtl-stack_alignment_needed = 128;
  }
 +  /* For 64-bit target, preferred_stack_boundary is never updated for call
 + expanded from tls descriptor. Update it here. We don't update it in
 + expand stage because according to the comments before
 + ix86_current_function_calls_tls_descriptor, tls calls may be optimized
 + away.  */
 +  else if (TARGET_64BIT
 +   ix86_current_function_calls_tls_descriptor
 +   crtl-preferred_stack_boundary  PREFERRED_STACK_BOUNDARY)
 +{
 +  crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
 +  if (crtl-stack_alignment_needed  PREFERRED_STACK_BOUNDARY)
 +   crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
 +}

gcc_assert (!size || stack_alignment_needed);
gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT);
 Index: gcc/config/i386/i386.md
 ===
 --- gcc/config/i386/i386.md (revision 208410)
 +++ gcc/config/i386/i386.md (working copy)
 @@ -12891,7 +12891,11 @@
  UNSPEC_TLS_GD))
   (clobber (match_scratch:SI 4))
   (clobber (match_scratch:SI 5))
 - (clobber (reg:CC FLAGS_REG))])])
 + (clobber (reg:CC FLAGS_REG))])]
 +  
 +{
 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;
 +})

  (define_insn *tls_global_dynamic_64_mode
[(set (match_operand:P 0 register_operand =a)
 @@ -12946,7 +12950,10 @@
(const_int 0)))
   (unspec:P [(match_operand 1 tls_symbolic_operand)]
UNSPEC_TLS_GD)])]
 -  TARGET_64BIT)
 +  TARGET_64BIT
 +{
 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;
 +})

  (define_insn *tls_local_dynamic_base_32_gnu
[(set (match_operand:SI 0 register_operand =a)
 @@ -12982,7 +12989,11 @@
 UNSPEC_TLS_LD_BASE))
(clobber (match_scratch:SI 3))
(clobber (match_scratch:SI 4))
 -  (clobber (reg:CC FLAGS_REG))])])
 +  (clobber (reg:CC FLAGS_REG))])]
 +  
 +{
 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;
 +})

  (define_insn *tls_local_dynamic_base_64_mode
[(set (match_operand:P 0 register_operand =a)
 @@ -13029,7 +13040,10 @@
 (mem:QI (match_operand 1))
 (const_int 0)))
(unspec:P [(const_int 0)] UNSPEC_TLS_LD_BASE)])]
 -  TARGET_64BIT)
 +  TARGET_64BIT
 +{
 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;
 +})

  ;; Local dynamic of a single variable is a lose.  Show combine how
  ;; to convert that back to global dynamic.
 Index: gcc/testsuite/g++.dg/pr58066.C
 ===
 --- gcc/testsuite/g++.dg/pr58066.C  (revision 0)
 +++ gcc/testsuite/g++.dg/pr58066.C  (revision 0)
 @@ -0,0 +1,12 @@
 +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* }  lp64 } } } */

 ^ It should be not ia32.
since we also want to test it for x32.

 +/* { dg-options -fPIC -O2 -m64 } */
   No need to add -m64.
 +
 +/* Check whether the stack frame starting address 

Re: [PATCH, PR58066] preferred_stack_boundary update for tls expanded call

2014-03-07 Thread Wei Mi
Yes, x32 has the same problem. It should be tested. Fixed.

Thanks,
Wei.


On Fri, Mar 7, 2014 at 2:06 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, Mar 7, 2014 at 1:26 PM, Wei Mi w...@google.com wrote:
 Hi,

 This patch is to fix the problem described here:
 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

 I follow Ian's suggestion and set
 ix86_tls_descriptor_calls_expanded_in_cfun in
 tls_global_dynamic_64_mode and tls_local_dynamic_base_64_mode.
 Although 32bit doesn't have the problem,
 ix86_tls_descriptor_calls_expanded_in_cfun is also set for
 tls_global_dynamic_32 and tls_local_dynamic_base_32 to make
 ix86_tls_descriptor_calls_expanded_in_cfun setting consistent across
 32bits and 64bits.

 If ix86_current_function_calls_tls_descriptor is set, we know that
 there is tls expanded call in current function. Update
 crtl-preferred_stack_boundary and crtl-stack_alignment_needed to be
 no less than PREFERED_STACK_ALIGNMENT at the start of
 ix86_compute_frame_layout. We don't do the update in
 legitimize_tls_address in cfgexpand stage, which is too early because
 according to the comments before
 ix86_current_function_calls_tls_descriptor, tls call may be optimized
 away. ix86_compute_frame_layout is the latest place to do the update.

 bootstrap on x86_64-linux-gnu is ok. regression test is going on. Ok
 for trunk if tests pass?

 Thanks,
 Wei.

 gcc/ChangeLog:

 2014-03-07  Wei Mi  w...@google.com

 * config/i386/i386.c (ix86_compute_frame_layout): Update
 preferred_stack_boundary when there is tls expanded call.
 * config/i386/i386.md: Set
 ix86_tls_descriptor_calls_expanded_in_cfun.

 gcc/testsuite/ChangeLog:

 2014-03-07  Wei Mi  w...@google.com

 * g++.dg/pr58066.C: New test.


 Index: gcc/config/i386/i386.c
 ===
 --- gcc/config/i386/i386.c  (revision 208410)
 +++ gcc/config/i386/i386.c  (working copy)
 @@ -9504,6 +9504,19 @@ ix86_compute_frame_layout (struct ix86_f
crtl-preferred_stack_boundary = 128;
crtl-stack_alignment_needed = 128;
  }
 +  /* For 64-bit target, preferred_stack_boundary is never updated for call
 + expanded from tls descriptor. Update it here. We don't update it in
 + expand stage because according to the comments before
 + ix86_current_function_calls_tls_descriptor, tls calls may be optimized
 + away.  */
 +  else if (TARGET_64BIT
 +   ix86_current_function_calls_tls_descriptor
 +   crtl-preferred_stack_boundary  PREFERRED_STACK_BOUNDARY)
 +{
 +  crtl-preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
 +  if (crtl-stack_alignment_needed  PREFERRED_STACK_BOUNDARY)
 +   crtl-stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
 +}

gcc_assert (!size || stack_alignment_needed);
gcc_assert (preferred_alignment = STACK_BOUNDARY / BITS_PER_UNIT);
 Index: gcc/config/i386/i386.md
 ===
 --- gcc/config/i386/i386.md (revision 208410)
 +++ gcc/config/i386/i386.md (working copy)
 @@ -12891,7 +12891,11 @@
  UNSPEC_TLS_GD))
   (clobber (match_scratch:SI 4))
   (clobber (match_scratch:SI 5))
 - (clobber (reg:CC FLAGS_REG))])])
 + (clobber (reg:CC FLAGS_REG))])]
 +  
 +{
 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;
 +})

  (define_insn *tls_global_dynamic_64_mode
[(set (match_operand:P 0 register_operand =a)
 @@ -12946,7 +12950,10 @@
(const_int 0)))
   (unspec:P [(match_operand 1 tls_symbolic_operand)]
UNSPEC_TLS_GD)])]
 -  TARGET_64BIT)
 +  TARGET_64BIT
 +{
 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;
 +})

  (define_insn *tls_local_dynamic_base_32_gnu
[(set (match_operand:SI 0 register_operand =a)
 @@ -12982,7 +12989,11 @@
 UNSPEC_TLS_LD_BASE))
(clobber (match_scratch:SI 3))
(clobber (match_scratch:SI 4))
 -  (clobber (reg:CC FLAGS_REG))])])
 +  (clobber (reg:CC FLAGS_REG))])]
 +  
 +{
 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;
 +})

  (define_insn *tls_local_dynamic_base_64_mode
[(set (match_operand:P 0 register_operand =a)
 @@ -13029,7 +13040,10 @@
 (mem:QI (match_operand 1))
 (const_int 0)))
(unspec:P [(const_int 0)] UNSPEC_TLS_LD_BASE)])]
 -  TARGET_64BIT)
 +  TARGET_64BIT
 +{
 +  ix86_tls_descriptor_calls_expanded_in_cfun = true;
 +})

  ;; Local dynamic of a single variable is a lose.  Show combine how
  ;; to convert that back to global dynamic.
 Index: gcc/testsuite/g++.dg/pr58066.C
 ===
 --- gcc/testsuite/g++.dg/pr58066.C  (revision 0)
 +++ gcc/testsuite/g++.dg/pr58066.C  (revision 0)
 @@ -0,0 +1,12 @@
 +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* }  lp64 } } } */

  ^ It should be not ia32.
 since we also want to test it for x32.

 +/* { dg-options