Re: [PATCH] Fix force_to_mode not to modify in place the passed rtl (PR rtl-optimization/58726)

2013-12-04 Thread Jakub Jelinek
On Tue, Dec 03, 2013 at 09:43:51PM -0700, Jeff Law wrote:
 On 12/03/13 15:46, Jakub Jelinek wrote:
 As described in the PR, the problem here is that during combine
 i2 pattern is substituted into more than one place in i3 pattern,
 unique_copy is 0 (but, even if it would be non-zero, it could happen
 if the comparison was processed first before the normal set inside of the
 parallel) and thus the same RTL is (temporarily) shared between two
 locations.  force_to_mode is first called with mask 0xdc36 (that is
 actually fine for both occurrences in the andhi_2 pattern) and later on
 inside of the comparison again with mask 0x8000, and as it modifies
 the IF_THEN_ELSE in place, it modifies also the other location (it is fine
 if the comparison uses 0x8000 mask, but not in the other spot).
 As in the end we fold it to a constant, we don't undo it and end up using
 an incorrect constant.
 
 Fixed by making sure force_to_mode doesn't modify x in place.
 
 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8?
 
 2013-12-03  Jakub Jelinek  ja...@redhat.com
 
  PR rtl-optimization/58726
  * combine.c (force_to_mode): Fix comment typo.  Don't destructively
  modify x for ROTATE, ROTATERT and IF_THEN_ELSE.
 
  * gcc.c-torture/execute/pr58726.c: New test.
 I'd worry there are other latent bugs of this nature and whether we'd be
 better off avoiding the temporary sharing.  We have structure
 sharing rules for a reason -- I'd hate to think of all the code that
 would need auditing to ensure it was safe with this bogus sharing.

I'm afraid I'm not familiar enough with the unwritten rules of combine.c
to know whether the above fix is all we need, or if there are further
issues still latent.

 Any thoughts of how painful it'd be to avoid the sharing to start with?

Perhaps most of the subst function could be moved to subst_1, and subst
as a wrapper (with the unique_copy argument dropped?) could do
for_each_rtx first to find out how many occurrences of from there are
in x; if there is more than one, subst_1 would then copy_rtx each
return of to other than the last one.  IMHO doing copy_rtx unconditionally
would be too expensive for the common case, would create too much garbage.
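
Roughly what I have in mind, as an untested sketch (subst_1 and its final
count argument are hypothetical; pointer equality is a simplification):

/* Count occurrences of FROM in the pattern via for_each_rtx.  */
struct subst_count_d { rtx from; int n; };

static int
count_from_cb (rtx *loc, void *data)
{
  struct subst_count_d *d = (struct subst_count_d *) data;
  if (*loc == d->from)
    d->n++;
  return 0;  /* Keep walking the whole rtx.  */
}

static rtx
subst (rtx x, rtx from, rtx to, int in_dest, int in_cond)
{
  struct subst_count_d d;
  d.from = from;
  d.n = 0;
  for_each_rtx (&x, count_from_cb, &d);
  /* subst_1 would substitute copy_rtx (to) for all but the last of the
     d.n occurrences, so no rtl ends up temporarily shared.  */
  return subst_1 (x, from, to, in_dest, in_cond, d.n);
}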

But it doesn't look like a safe thing to do on the release branches, at
least not until it is for a few months on the trunk.

Jakub


RE: [REPOST] Invalid Code when reading from unaligned zero-sized array

2013-12-04 Thread Bernd Edlinger
On Tue, 3 Dec 2013 15:12:05, Richard Biener wrote:

 On Tue, Dec 3, 2013 at 2:10 PM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Tue, Dec 3, 2013 at 1:48 PM, Bernd Edlinger
 bernd.edlin...@hotmail.de wrote:
 Hi Jeff,

 please find attached the patch (incl. test cases) for the unaligned read 
 BUG that I found while investigating
 on PR#57748: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748

 one test case is this one:

 pr57748-3.c:
 /* PR middle-end/57748 */
 /* { dg-do run } */
 /* wrong code in expand_expr_real_1. */

 #include <stdlib.h>

 extern void abort (void);

 typedef long long V
 __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));

 typedef struct S { V a; V b[0]; } P __attribute__((aligned (1)));

 struct __attribute__((packed)) T { char c; P s; };

 void __attribute__((noinline, noclone))
 check (P *p)
 {
 if (p->b[0][0] != 3 || p->b[0][1] != 4)
 abort ();
 }

 void __attribute__((noinline, noclone))
 foo (struct T *t)
 {
 V a = { 3, 4 };
 t->s.b[0] = a;
 }

 int
 main ()
 {
 struct T *t = (struct T *) calloc (128, 1);

 foo (t);
 check (&t->s);

 free (t);
 return 0;
 }


 and the other one is
 pr57748-4.c:
 /* PR middle-end/57748 */
 /* { dg-do run } */
 /* wrong code in expand_expr_real_1. */

 #include <stdlib.h>

 extern void abort (void);

 typedef long long V
 __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));

 typedef struct S { V b[1]; } P __attribute__((aligned (1)));

 struct __attribute__((packed)) T { char c; P s; };

 void __attribute__((noinline, noclone))
 check (P *p)
 {
 if (p->b[1][0] != 3 || p->b[1][1] != 4)
 abort ();
 }

 void __attribute__((noinline, noclone))
 foo (struct T *t)
 {
 V a = { 3, 4 };
 t->s.b[1] = a;
 }

 int
 main ()
 {
 struct T *t = (struct T *) calloc (128, 1);

 foo (t);
 check (&t->s);

 free (t);
 return 0;
 }


 The patch does add a boolean expand_reference parameter to expand_expr_real
 and expand_expr_real_1.  I pass true when I intend to use the returned
 memory context as an array reference, instead of a value.  At places where
 mis-aligned values are extracted, I do not return a register with the
 extracted mis-aligned value if expand_reference is true.  When I have a
 VIEW_CONVERT_EXPR I pay attention to pass down the outer expand_reference
 to the inner expand_expr_real call.  expand_reference is pretty much
 similar to the expand_modifier EXPAND_MEMORY.

 Boot-strapped and regression-tested on X86_64-pc-linux-gnu (many times).

 Ok for trunk?

 It still feels like papering over the underlying issue. Let me have a
 second (or third?) look.

 Few comments on your patch.

 @@ -9520,6 +9526,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
 align = get_object_alignment (exp);
 if (modifier != EXPAND_WRITE
     && modifier != EXPAND_MEMORY
+    && !expand_reference
     && mode != BLKmode
     && align < GET_MODE_ALIGNMENT (mode)
 /* If the target does not have special handling for unaligned

 (TARGET_MEM_REF), expand_reference should never be true here,
 there may be no component-refs around TARGET_MEM_REFs.


Ok, I was not sure. Removed - Thanks.

 You miss adjusting the VIEW_CONVERT_EXPR path? (line-numbers
 are off a lot in your patch, context doesn't help very much :/ Does not
 seem to be against 4.8 either ...)


Sorry, the line numbers moved a lot (> 100 lines).
The patch is against 4.9 trunk.  Re-generated.

 Index: gcc/cfgexpand.c
 ===
 --- gcc/cfgexpand.c (revision 204411)
 +++ gcc/cfgexpand.c (working copy)
 @@ -2189,7 +2189,7 @@ expand_call_stmt (gimple stmt)
 if (lhs)
 expand_assignment (lhs, exp, false);
 else
 - expand_expr_real_1 (exp, const0_rtx, VOIDmode, EXPAND_NORMAL, NULL);
 + expand_expr_real_1 (exp, const0_rtx, VOIDmode, EXPAND_NORMAL, NULL, false);

 mark_transaction_restart_calls (stmt);
 }

 this should use

 expand_expr (exp, const0_rtx, VOIDmode, EXPAND_NORMAL);

 anyway.


expand_expr_real_1 is expand_expr_real without the ERROR check?

Ok, changed to expand_expr.


 @@ -10286,7 +10297,10 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
 op0 = copy_rtx (op0);
 set_mem_align (op0, MAX (MEM_ALIGN (op0), TYPE_ALIGN (type)));
 }
 - else if (mode != BLKmode
 + else if (modifier != EXPAND_WRITE
 +	  && modifier != EXPAND_MEMORY
 +	  && !expand_reference
 +	  && mode != BLKmode
	  && MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode)
 /* If the target does have special handling for unaligned
 loads of mode then use them. */

 @@ -10307,6 +10321,9 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
 return reg;
 }
 else if (STRICT_ALIGNMENT
 +	 && modifier != EXPAND_WRITE
 +	 && modifier != EXPAND_MEMORY
 +	 && !expand_reference
	 && mode != BLKmode
	 && MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode))
 {

 why the unrelated change to add the modifier checks? Looks like both
 if cases are close by and factoring out common code like

 else if (!expand_reference
	  && mode != BLKmode
	  && MEM_ALIGN (...
 {
 if (...)
 else if (STRICT_ALIGNMENT)

 would be better, also matching the other 

Re: libsanitizer merge from upstream r196090

2013-12-04 Thread Jakub Jelinek
On Wed, Dec 04, 2013 at 06:28:31AM +0100, Konstantin Serebryany wrote:
 We don't have any .cfi stuff in sanitizer_common (and I don't think we
 really need it in the internal_clone).
 Before fixing the tsan sources I'd like to see some indication that
 anyone cares about tsan working on older systems.

Of course various parties care about that.  But that doesn't necessarily
imply they can or want to run buildbots for a different compiler they don't
care about, there are e.g. security implications to that, resources etc.

For GCC new ports etc. we never require people running some buildbots,
people just report issues they encounter and those issues are fixed.

 tsan makes many assumptions about the system (address space, etc)
 which may not hold on old systems anyway.
 And tsan, even if it builds, will not work reliably.
 
 I really think that we need to disable libsanitizer on old systems
 until someone volunteers to set up a proper testing process upstream.
 If no one volunteers -- no one really needs it.

The problem is that the definition of old systems is extremely fuzzy;
for you it is clearly != Ubuntu 12.04 or similar, but the unwritten
assumptions libsanitizer makes are not related to one exact version of
a single dependent component, it is a matter of compiler, compiler
configuration, binutils, glibc, kernel, kernel headers, ...

So, how do you disable libsanitizer on old systems when it is so fuzzy?
There may be assumptions you just don't know about, and fixing bug reports
by just adding reporters' toolchain combinations to kill lists is certainly
not the way GCC is developed.

Jakub


RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4

2013-12-04 Thread Gopalasubramanian, Ganesh
Hi Uros!

Attached is the revised patch.
The target independent part has been already approved and added.

This revision of the patch adds an x86 tune definition and checks it while
deciding the unroll factor.

Accommodated the comments given by you except one.

 *x will never be null for active insns.
Since every rtx in the insn is checked for memory references, the NULL_RTX 
check is required.

Regards
Ganesh

-Original Message-
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Friday, November 22, 2013 1:46 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org; Richard Guenther richard.guent...@gmail.com 
(richard.guent...@gmail.com); borntrae...@de.ibm.com; H.J. Lu 
(hjl.to...@gmail.com); Jakub Jelinek (ja...@redhat.com)
Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4

On Wed, Nov 20, 2013 at 7:26 PM, Gopalasubramanian, Ganesh 
ganesh.gopalasubraman...@amd.com wrote:

 Steamroller processors contain a loop predictor and a loop buffer, which may 
 make unrolling small loops less important.
 When unrolling small loops for steamroller, making the unrolled loop fit in 
 the loop buffer should be a priority.

 This patch uses a heuristic approach (number of memory references) to decide 
 the unrolling factor for small loops.
 This patch has some noise in SPEC 2006 results.

 Bootstrapping passes.

 I would like to know your comments before committing.

Please split the patch into a target-dependent and a target-independent part,
and get the target-independent part reviewed first.

This part:

+  if (ix86_tune != PROCESSOR_BDVER3 && ix86_tune != PROCESSOR_BDVER4)
+ {
+return nunroll;
+  }

is wrong. You should introduce a tune variable (as H.J. suggested) and check
that variable here. Target-dependent tuning options should be in x86-tune.def,
so everything regarding tuning can be found in one place.

+    if (INSN_P (insn) && INSN_CODE (insn) != -1)
+      for_each_rtx (&insn, (rtx_function) ix86_loop_memcount,
+		    &mem_count);

if (NONDEBUG_INSN_P (insn))
  for_each_rtx (&PATTERN (insn), ...);

otherwise your heuristics will depend on -g compile option.

+  if ( (mem_count*nunroll) >= 32)

Extra parenthesis.

+static int
+ix86_loop_memcount (rtx *x, unsigned *mem_count) {
+  if (*x != NULL_RTX && MEM_P (*x))

*x will never be null for active insns.

Uros.



unroll-adjust.patch
Description: unroll-adjust.patch


Re: [PATCH] Fix SSE (pre-AVX) alignment handling (PR target/59163)

2013-12-04 Thread Uros Bizjak
On Mon, Dec 2, 2013 at 11:58 PM, Jakub Jelinek ja...@redhat.com wrote:

 As discussed in the PR, combiner can combine e.g. unaligned integral
 load (e.g. TImode) together with some SSE instruction that requires aligned
 load, but doesn't actually check it.  For AVX, most of the instructions
 actually allow unaligned operands, except for a few vmov* instructions where
 the patterns typically handle the misaligned mems through misaligned_operand
 checks, and some nontemporal move insns that have UNSPECs that should
 prevent combination.  The following patch attempts to solve this by
 rejecting combining of unaligned memory loads/stores into SSE insns that
 don't allow it.  I've added ssememalign attribute for that, but actually
 only later on realized that even for the insns which load/store  16 byte
 memory values if strict alignment checking isn't turned on in hw, the
 arguments don't have to be aligned at all, so perhaps instead of
 ssememalign in bits all we could have is a boolean attribute whether
 insn requires for pre-AVX memory operands to be as aligned as their mode, or
 not (with default that it does require that).

I think that we should only prevent invalid compiler transformations,
so the proposed approach is correct. In the attached example, the data
is properly aligned, but the compiler transforms a correct program into an
invalid one. The code created with the patched compiler is valid,
even if someone enables the trap-on-unaligned setting.

When trap on unaligned is activated, a program should not trap in a
different way due to compiler transformations. The optimized and
unoptimized code should run (and trap) in the same way.

However, I don't think the ix86_expand_special_args_builtin change is
needed. It is the duty of the programmer to pass a properly aligned
address to these builtins. The compiler shouldn't magically fix invalid
code, in the same way as it shouldn't break valid code as in the case
above.

Uros.


Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4

2013-12-04 Thread Uros Bizjak
On Wed, Dec 4, 2013 at 9:39 AM, Gopalasubramanian, Ganesh
ganesh.gopalasubraman...@amd.com wrote:

 Attached is the revised patch.
 The target independent part has been already approved and added.

 This revision of the patch adds a x86 tune definition and checks it while 
 deciding the unroll factor.

 Accommodated the comments given by you except one.

 *x will never be null for active insns.
 Since every rtx in the insn is checked for memory references, the NULL_RTX 
 check is required.

Yes, you are correct: for_each_rtx also passes NULL_RTX. I was
distracted by the "There are no sub-expressions." comment.

+    if (NONDEBUG_INSN_P (insn) && INSN_CODE (insn) != -1)

Do you need to check for INSN_CODE here? IIRC, checking for
NONDEBUG_INSN_P is enough.

+    for_each_rtx (&insn, (rtx_function) ix86_loop_memcount,
+		  &mem_count);
+}
+  free (bbs);
+
+  if (mem_count <= 32)
+    return 32 / mem_count;

Ouch... mem_count can be zero. Is there a reason to change this part
from previous patch?

Uros.


Re: [PING] 3 patches waiting for approval/review

2013-12-04 Thread Andreas Krebbel
Hi Jeff,

to my understanding a leaf function is characterized by not calling another
function in its function body. So for the patch the question is whether we
consider the mcount call as belonging to the function body or not. As I see
it from the backends, for all uses of is_leaf the function body is what comes
between the function prologue and the epilogue.  Hence calling mcount *before*
the function prologue does not speak against the leafness of a function.
What do you think?
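
To illustrate with a made-up example (not from the patch): with mcount
emitted before the prologue, a function like

int
leaf_fn (int x)
{
  return x + 1;  /* No calls in the body itself.  */
}

can keep its is_leaf status even when compiled with -p, because the mcount
call runs before the prologue and therefore uses the caller's stack frame,
not the frame of leaf_fn.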

Bye,

-Andreas-

On 22/10/13 21:28, Andreas Krebbel wrote:
 On 16/10/13 22:25, Jeff Law wrote:
...
 I still really feel this should be a target hook that is off by default 
 so that the target maintainers can audit their target to ensure it 
 operates correctly.

 Maybe I'm missing something, so perhaps another approach.  Can you 
 explain why you think it is safe to ignore calls to mcount when trying 
 to determine if a function is a leaf or not?
 
 In general it is not safe to ignore calls to mcount. But if a target does 
 insert the call to mcount
 before the function prologue then mcount will not use the stack space 
 allocated by the current
 function. It will instead use the stack space allocated by the caller for the 
 current function. The
 current function can still be a leaf function since the call does not happen 
 within the scope of its
 stack frame.
 
  The difference between PROFILE_HOOKS and FUNCTION_PROFILER is that for the
  former the call to mcount is always inserted after the function prologue,
  no matter what the backend returns for the profile_before_prologue target
  hook.



[gomp4] Restore GIMPLE_OACC_PARALLEL functionality (was: r205231 - in /branches/gomp-4_0-branch: ./ Chan...)

2013-12-04 Thread Thomas Schwinge
Hi!

On Thu, 21 Nov 2013 20:20:45 -, ja...@gcc.gnu.org wrote:
 Author: jakub
 Date: Thu Nov 21 20:20:44 2013
 New Revision: 205231
 
 URL: http://gcc.gnu.org/viewcvs?rev=205231&root=gcc&view=rev
 Log:
 svn merge -r204964:205223 svn+ssh://gcc.gnu.org/svn/gcc/trunk

Jakub, many thanks for handling the vast majority of the merge changes!
I only had to fix one additional case, r205658:

gcc/
* gimple.h (is_a_helper): Handle GIMPLE_OACC_PARALLEL.

--- gcc/gimple.h
+++ gcc/gimple.h
@@ -969,7 +969,8 @@ template <>
 inline bool
 is_a_helper <const gimple_statement_omp_parallel>::test (const_gimple gs)
 {
-  return gs->code == GIMPLE_OMP_PARALLEL || gs->code == GIMPLE_OMP_TASK || gs->code == GIMPLE_OMP_TARGET;
+  return gs->code == GIMPLE_OMP_PARALLEL || gs->code == GIMPLE_OMP_TASK
+	 || gs->code == GIMPLE_OMP_TARGET || gs->code == GIMPLE_OACC_PARALLEL;
 }
 
 template <>


Regards,
 Thomas




Re: [PATCH] Fix force_to_mode not to modify in place the passed rtl (PR rtl-optimization/58726)

2013-12-04 Thread Eric Botcazou
 Fixed by making sure force_to_mode doesn't modify x in place.

I think that it's the way to go; force_to_mode doesn't modify its argument
except for these 2 cases.  I'm not sure what the story is, but calling SUBST
for these 2 cases doesn't seem really necessary.
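
Concretely, the IF_THEN_ELSE case can build a fresh rtx instead of
SUBSTing into the shared one, along these lines (an illustrative sketch,
not necessarily the exact committed hunk):

    case IF_THEN_ELSE:
      /* Create a new IF_THEN_ELSE rather than modifying X in place,
	 so other users of the (temporarily) shared rtl are unaffected.  */
      x = simplify_gen_ternary (IF_THEN_ELSE, GET_MODE (x),
				GET_MODE (XEXP (x, 0)), XEXP (x, 0),
				force_to_mode (XEXP (x, 1), mode,
					       mask, next_select),
				force_to_mode (XEXP (x, 2), mode,
					       mask, next_select));
      break;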

 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8?
 
 2013-12-03  Jakub Jelinek  ja...@redhat.com
 
   PR rtl-optimization/58726
   * combine.c (force_to_mode): Fix comment typo.  Don't destructively
   modify x for ROTATE, ROTATERT and IF_THEN_ELSE.
 
   * gcc.c-torture/execute/pr58726.c: New test.

IMO it's the best fix at this point of the release cycles.

-- 
Eric Botcazou


Re: [PATCH] Fix SSE (pre-AVX) alignment handling (PR target/59163)

2013-12-04 Thread Jakub Jelinek
On Wed, Dec 04, 2013 at 10:20:20AM +0100, Uros Bizjak wrote:
 When trap on unaligned is activated, some program should not trap in
 different way due to compiler transformations. The optimized and
 unoptimized code should run (and trap) in the same way.

Ok.

Note, it seems I have missed an i386/sse-1.c test regression (apparently
because an earlier version of the patch failed that test and I compared
testresults against that instead of an earlier regtest), and from extra
testing that recorded all insns which were rejected by the new "return false"
in the ix86_legitimate_combined_insn hook, sse4_1-movntdqa.c regressed too.
So the attached updated patch fixes those two.  I had missed ssememalign
on the sse_loadlpd insn, where none of the alternatives #UD on misaligned,
and previously also skipped sse2_load[lh]pd because I was afraid about the
splitters, but the splitter creates a DFmode store, which is always fine
misaligned.

Incremental diff is:
--- gcc/config/i386/i386.c  2013-12-02 19:57:39.116438744 +0100
+++ gcc/config/i386/i386.c  2013-12-04 10:39:43.614269591 +0100
@@ -32563,6 +32543,15 @@
   nargs = 1;
   klass = load;
   memory = 0;
+  switch (icode)
+   {
+   case CODE_FOR_sse4_1_movntdqa:
+   case CODE_FOR_avx2_movntdqa:
+ aligned_mem = true;
+ break;
+   default:
+ break;
+   }
   break;
 case VOID_FTYPE_PV2SF_V4SF:
 case VOID_FTYPE_PV4DI_V4DI:
--- gcc/config/i386/sse.md  2013-12-02 18:02:00.325523050 +0100
+++ gcc/config/i386/sse.md  2013-12-04 11:08:48.589128840 +0100
@@ -5278,6 +5278,7 @@
    "%vmovlps\t{%2, %0|%q0, %2}"
   [(set_attr "isa" "noavx,avx,noavx,avx,*")
    (set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov")
+   (set_attr "ssememalign" "64")
    (set_attr "length_immediate" "1,1,*,*,*")
    (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
    (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
@@ -7066,6 +7067,7 @@
    "#"
   [(set_attr "isa" "noavx,avx,noavx,avx,*,*,*")
    (set_attr "type" "ssemov,ssemov,sselog,sselog,ssemov,fmov,imov")
+   (set_attr "ssememalign" "64")
    (set_attr "prefix_data16" "1,*,*,*,*,*,*")
    (set_attr "prefix" "orig,vex,orig,vex,*,*,*")
    (set_attr "mode" "V1DF,V1DF,V2DF,V2DF,DF,DF,DF")])
@@ -7134,6 +7136,7 @@
	      (const_string "imov")
	   ]
	   (const_string "ssemov")))
+   (set_attr "ssememalign" "64")
    (set_attr "prefix_data16" "*,1,*,*,*,*,1,*,*,*,*")
    (set_attr "length_immediate" "*,*,*,*,*,1,*,*,*,*,*")
    (set_attr "prefix" "maybe_vex,orig,vex,orig,vex,orig,orig,vex,*,*,*")

 However, I don't think ix86_expand_special_args_builtin is needed. It
 is the duty of the programmer to pass properly aligned address to
 these builtins. The compiler shouldn't magically fix invalid code,
 in the same way as it shouldn't break valid code as in the case
 above.

What I'm trying to do in ix86_expand_special_args_builtin is not in any way
fixing invalid code.  The problem is that those builtins don't have a memory
as an argument, they have pointer, and the MEM is compiler created from
scratch.  As it uses just gen_rtx_MEM, it always means just BITS_PER_UNIT
alignment on those MEMs, so the problem is that the combiner hook will
reject any changes whatsoever on such instructions.  get_pointer_alignment
(added in the patch) adds there alignment info from what the compiler can
find out (e.g.  if the pointer points to a decl with certain alignment
etc.), but if it is e.g.  an argument of a function, it will still return
likely BITS_PER_UNIT even when on valid code it will always be properly
aligned (you'd need to use __builtin_assume_aligned or similar to force the
alignment there).  I went through the special builtins for which gen_rtx_MEM
is called and they apparently fall into just two categories, one is various
builtins for unaligned vector loads or for < 16 byte load/store instructions
(type 5 in AVX spec, so no #UD on misalign, at most #AC if enabled), and the
other are non-temporal loads/stores, which both pre-AVX and AVX+ require
strict alignment.  Only for the latter category I'm forcing the mode
alignment, because even if the compiler can't prove correct alignment based
on ccp etc., valid program must have such MEMs properly aligned and if we
create MEMs with smaller alignment, the combine hook will just punt on them
always.
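
In code form the idea is roughly this (a simplified sketch of the expander
change; op and arg are the operand and its tree argument as in
ix86_expand_special_args_builtin, aligned_mem the per-icode flag):

  /* The MEM is compiler created, so record the alignment the compiler
     can prove for the pointer, and for non-temporal loads/stores
     additionally assume the mode alignment that valid code must have.  */
  op = gen_rtx_MEM (mode, op);
  set_mem_align (op, get_pointer_alignment (arg));
  if (aligned_mem)
    set_mem_align (op, MAX (MEM_ALIGN (op), GET_MODE_ALIGNMENT (mode)));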

So, ok for trunk this way?  I guess 4.8 patch should wait some time before
being backported.

2013-12-04  Jakub Jelinek  ja...@redhat.com
Uros Bizjak  ubiz...@gmail.com

PR target/59163
* config/i386/i386.c (ix86_legitimate_combined_insn): If for
!TARGET_AVX there is misaligned MEM operand with vector mode
and get_attr_ssememalign is 0, return false.
(ix86_expand_special_args_builtin): Add get_pointer_alignment
computed alignment and for non-temporal loads/stores also
at least GET_MODE_ALIGNMENT as MEM_ALIGN.
* config/i386/sse.md
(<sse>_loadu<ssemodesuffix><avxsizesuffix><mask_name>,

Re: [PATCH] Fix force_to_mode not to modify in place the passed rtl (PR rtl-optimization/58726)

2013-12-04 Thread Eric Botcazou
 I'd worry there are other latent bugs of this nature and whether we'd be better
 off avoiding the temporary sharing.  We have structure sharing rules for
 a reason -- I'd hate to think of all the code that would need auditing
 to ensure it was safe with this bogus sharing.

I wouldn't throw the baby out with the bathwater here; it's one of the numerous
PRs opened by Zhendong Su which clearly look machine-generated.  It fails
with 4.4.x onwards and apparently nobody noticed the problem in real life.
We know that we have latent sharing issues in the combiner because of the way
it's designed, but they are quite rare in practice.

-- 
Eric Botcazou


Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-04 Thread Richard Biener
On Tue, Dec 3, 2013 at 11:04 PM, Bill Schmidt
wschm...@linux.vnet.ibm.com wrote:
 On Tue, 2013-12-03 at 21:35 +0100, Richard Biener wrote:
 Yufeng Zhang yufeng.zh...@arm.com wrote:
 On 12/03/13 14:20, Richard Biener wrote:
  On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhangyufeng.zh...@arm.com
 wrote:
  On 12/03/13 06:48, Jeff Law wrote:
 
  On 12/02/13 08:47, Yufeng Zhang wrote:
 
  Ping~
 
  http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html
 
 
 
  Thanks,
  Yufeng
 
  On 11/26/13 15:02, Yufeng Zhang wrote:
 
  On 11/26/13 12:45, Richard Biener wrote:
 
  On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
  Zhangyufeng.zh...@arm.com wrote:
 
  On 11/13/13 20:54, Bill Schmidt wrote:
 
  The second version of your original patch is ok with me with
 the
  following changes.  Sorry for the little side adventure into
 the
  next-interp logic; in the end that's going to hurt more than
 it
  helps in
  this case.  Thanks for having a look at it, anyway.  Thanks
 also for
  cleaning up this version to be less intrusive to common
 interfaces; I
  appreciate it.
 
 
 
  Thanks a lot for the review.  I've attached an updated patch
 with the
  suggested changes incorporated.
 
  For the next-interp adventure, I was quite happy to do the
  experiment; it's
  a good chance of gaining insight into the pass.  Many thanks
 for
  your prompt
  replies and patience in guiding!
 
 
  Everything else looks OK to me.  Please ask Richard for final
  approval,
  as I'm not a maintainer.
 
  First a note, I need to check on voting for Bill as the slsr
 maintainer
  from the steering committee.   Voting was in progress just before
 the
  close of stage1 development so I haven't tallied the results :-)
 
 
  Looking forward to some good news! :)
 
 
 
  Yes, you are right that the non-trivial 'base' trees are rarely
 shared.
   The cache is introduced mainly because get_alternative_base
 () may
  be
  called twice on the same 'base' tree, once in the
  find_basis_for_candidate () for look-up and the other time in
  alloc_cand_and_find_basis () for record_potential_basis ().  I'm
 happy
  to leave out the cache if you think the benefit is trivial.
 
  Without some sense of how expensive the lookups are vs how often
 the
  cache hits it's awful hard to know if the cache is worth it.
 
  I'd say take it out unless you have some sense it's really saving
 time.
  It's a pretty minor implementation detail either way.
 
 
  I think the affine tree routines are generally expensive; it is
 worth having
  a cache to avoid calling them too many times.  I run the slsr-*.c
 tests
  under gcc.dg/tree-ssa/ and find out that the cache hit rates range
 from
  55.6% to 90%, with 73.5% as the average.  The samples may not well
 represent
  the real world scenario, but they do show the fact that the 'base'
 tree can
  be shared to some extent.  So I'd like to have the cache in the
 patch.
 
 
 
 
  +/* { dg-do compile } */
  +/* { dg-options "-O2 -fdump-tree-slsr" } */
  +
  +typedef int arr_2[50][50];
  +
  +void foo (arr_2 a2, int v1)
  +{
  +  int i, j;
  +
  +  i = v1 + 5;
  +  j = i;
  +  a2 [i-10] [j] = 2;
  +  a2 [i] [j++] = i;
  +  a2 [i+20] [j++] = i;
  +  a2 [i-3] [i-1] += 1;
  +  return;
  +}
  +
  +/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
  +/* { dg-final { cleanup-tree-dump "slsr" } } */
 
  scanning for 5 MEMs looks non-sensical.  What transform do
  you expect?  I see other slsr testcases do similar non-sensical
  checking which is bad, too.
 
 
  As the slsr optimizes CAND_REF candidates by simply lowering them
 to
  MEM_REF from e.g. ARRAY_REF, I think scanning for the number of
 MEM_REFs
  is an effective check.  Alternatively, I can add a follow-up
 patch to
  add some dumping facility in replace_ref () to print out the
 replacing
  actions when -fdump-tree-slsr-details is on.
 
  I think adding some details to the dump and scanning for them would
 be
  better.  That's the only change that is required for this to move
 forward.
 
 
  I've updated the patch to dump more details when
 -fdump-tree-slsr-details is
  on.  The tests have also been updated to scan for these new dumps
 instead of
  MEMs.
 
 
 
  I suggest doing it quickly.  We're well past stage1 close at this
 point.
 
 
  The bootstrapping on x86_64 is still running.  OK to commit if it
 succeeds?
 
  I still don't like it.  It's using the wrong and too expensive tools
 to do
  stuff.  What kind of bases are we ultimately interested in?  Browsing
  the code it looks like we're having
 
 /* Base expression for the chain of candidates:  often, but not
always, an SSA name.  */
 tree base_expr;
 
  which isn't really too informative but I suppose they are all
  kind-of-gimple_val()s?  That said, I wonder if you can simply
  use get_addr_base_and_unit_offset in place of get_alternative_base
 (),
  ignoring the returned offset.
 
 'base_expr' is essentially the base address of a handled_component_p,
 e.g. ARRAY_REF, COMPONENT_REF, etc.  In most 

Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-04 Thread Richard Biener
On Wed, Dec 4, 2013 at 11:26 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Tue, Dec 3, 2013 at 11:04 PM, Bill Schmidt
 wschm...@linux.vnet.ibm.com wrote:
 On Tue, 2013-12-03 at 21:35 +0100, Richard Biener wrote:
 Yufeng Zhang yufeng.zh...@arm.com wrote:
 On 12/03/13 14:20, Richard Biener wrote:
  On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhangyufeng.zh...@arm.com
 wrote:
  On 12/03/13 06:48, Jeff Law wrote:
 
  On 12/02/13 08:47, Yufeng Zhang wrote:
 
  Ping~
 
  http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html
 
 
 
  Thanks,
  Yufeng
 
  On 11/26/13 15:02, Yufeng Zhang wrote:
 
  On 11/26/13 12:45, Richard Biener wrote:
 
  On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
  Zhangyufeng.zh...@arm.com wrote:
 
  On 11/13/13 20:54, Bill Schmidt wrote:
 
  The second version of your original patch is ok with me with
 the
  following changes.  Sorry for the little side adventure into
 the
  next-interp logic; in the end that's going to hurt more than
 it
  helps in
  this case.  Thanks for having a look at it, anyway.  Thanks
 also for
  cleaning up this version to be less intrusive to common
 interfaces; I
  appreciate it.
 
 
 
  Thanks a lot for the review.  I've attached an updated patch
 with the
  suggested changes incorporated.
 
  For the next-interp adventure, I was quite happy to do the
  experiment; it's
  a good chance of gaining insight into the pass.  Many thanks
 for
  your prompt
  replies and patience in guiding!
 
 
  Everything else looks OK to me.  Please ask Richard for final
  approval,
  as I'm not a maintainer.
 
  First a note, I need to check on voting for Bill as the slsr
 maintainer
  from the steering committee.   Voting was in progress just before
 the
  close of stage1 development so I haven't tallied the results :-)
 
 
  Looking forward to some good news! :)
 
 
 
  Yes, you are right that the non-trivial 'base' trees are rarely
 shared.
   The cache is introduced mainly because get_alternative_base
 () may
  be
  called twice on the same 'base' tree, once in the
  find_basis_for_candidate () for look-up and the other time in
  alloc_cand_and_find_basis () for record_potential_basis ().  I'm
 happy
  to leave out the cache if you think the benefit is trivial.
 
  Without some sense of how expensive the lookups are vs how often
 the
  cache hits it's awful hard to know if the cache is worth it.
 
  I'd say take it out unless you have some sense it's really saving
 time.
  It's a pretty minor implementation detail either way.
 
 
  I think the affine tree routines are generally expensive; it is
 worth having
  a cache to avoid calling them too many times.  I run the slsr-*.c
 tests
  under gcc.dg/tree-ssa/ and find out that the cache hit rates range
 from
  55.6% to 90%, with 73.5% as the average.  The samples may not well
 represent
  the real world scenario, but they do show the fact that the 'base'
 tree can
  be shared to some extent.  So I'd like to have the cache in the
 patch.
 
 
 
 
  +/* { dg-do compile } */
  +/* { dg-options "-O2 -fdump-tree-slsr" } */
  +
  +typedef int arr_2[50][50];
  +
  +void foo (arr_2 a2, int v1)
  +{
  +  int i, j;
  +
  +  i = v1 + 5;
  +  j = i;
  +  a2 [i-10] [j] = 2;
  +  a2 [i] [j++] = i;
  +  a2 [i+20] [j++] = i;
  +  a2 [i-3] [i-1] += 1;
  +  return;
  +}
  +
  +/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
  +/* { dg-final { cleanup-tree-dump "slsr" } } */
 
  scanning for 5 MEMs looks non-sensical.  What transform do
  you expect?  I see other slsr testcases do similar non-sensical
  checking which is bad, too.
 
 
  As the slsr optimizes CAND_REF candidates by simply lowering them
 to
  MEM_REF from e.g. ARRAY_REF, I think scanning for the number of
 MEM_REFs
  is an effective check.  Alternatively, I can add a follow-up
 patch to
  add some dumping facility in replace_ref () to print out the
 replacing
  actions when -fdump-tree-slsr-details is on.
 
  I think adding some details to the dump and scanning for them would
 be
  better.  That's the only change that is required for this to move
 forward.
 
 
 I've updated the patch to dump more details when
 -fdump-tree-slsr-details is
  on.  The tests have also been updated to scan for these new dumps
 instead of
  MEMs.
 
 
 
  I suggest doing it quickly.  We're well past stage1 close at this
 point.
 
 
  The bootstrapping on x86_64 is still running.  OK to commit if it
 succeeds?
 
  I still don't like it.  It's using the wrong and too expensive tools
 to do
  stuff.  What kind of bases are we ultimately interested in?  Browsing
  the code it looks like we're having
 
 /* Base expression for the chain of candidates:  often, but not
always, an SSA name.  */
 tree base_expr;
 
  which isn't really too informative but I suppose they are all
  kind-of-gimple_val()s?  That said, I wonder if you can simply
  use get_addr_base_and_unit_offset in place of get_alternative_base
 (),
  ignoring the returned offset.
 
 'base_expr' is essentially the 

RE: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4

2013-12-04 Thread Gopalasubramanian, Ganesh
 Ouch... mem_count can be zero. Is there a reason to change this part from 
 previous patch?

Oops!  You're right. I will correct this.

The idea is to count the memory references and decide on the unrolling factor.
The previous patch does that in two steps; I thought of doing it in a single
step.  (I think I missed my step here ;) )
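
Roughly, the single-step shape I'm aiming for (untested sketch;
TARGET_ADJUST_UNROLL stands in for the new tune check):

static unsigned
ix86_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
{
  basic_block *bbs;
  rtx insn;
  unsigned i;
  unsigned mem_count = 0;

  if (!TARGET_ADJUST_UNROLL)
    return nunroll;

  /* Count the number of memory references within the loop body.  */
  bbs = get_loop_body (loop);
  for (i = 0; i < loop->num_nodes; i++)
    {
      FOR_BB_INSNS (bbs[i], insn)
	if (NONDEBUG_INSN_P (insn))
	  for_each_rtx (&PATTERN (insn), (rtx_function) ix86_loop_memcount,
			&mem_count);
    }
  free (bbs);

  if (mem_count == 0)
    return nunroll;

  /* Keep the unrolled loop's memory references within the loop buffer.  */
  if (mem_count * nunroll >= 32)
    nunroll = 32 / mem_count;

  return nunroll;
}

With the mem_count == 0 guard the division can no longer trap for loops
without memory references.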

Regards
Ganesh

-Original Message-
From: Uros Bizjak [mailto:ubiz...@gmail.com] 
Sent: Wednesday, December 04, 2013 3:17 PM
To: Gopalasubramanian, Ganesh
Cc: gcc-patches@gcc.gnu.org; Richard Guenther richard.guent...@gmail.com 
(richard.guent...@gmail.com)
Subject: Re: [RFC] [PATCH, i386] Adjust unroll factor for bdver3 and bdver4

On Wed, Dec 4, 2013 at 9:39 AM, Gopalasubramanian, Ganesh 
ganesh.gopalasubraman...@amd.com wrote:

 Attached is the revised patch.
 The target independent part has been already approved and added.

 This revision of the patch adds a x86 tune definition and checks it while 
 deciding the unroll factor.

 Accommodated the comments given by you except one.

 *x will never be null for active insns.
 Since every rtx in the insn is checked for memory references, the NULL_RTX 
 check is required.

Yes, you are correct: for_each_rtx also passes NULL_RTX. I was distracted by
the "There are no sub-expressions." comment.

+    if (NONDEBUG_INSN_P (insn) && INSN_CODE (insn) != -1)

Do you need to check for INSN_CODE here? IIRC, checking for NONDEBUG_INSN_P is 
enough.

+    for_each_rtx (&insn, (rtx_function) ix86_loop_memcount,
+		  &mem_count);
+}
+  free (bbs);
+
+  if (mem_count <= 32)
+    return 32 / mem_count;

Ouch... mem_count can be zero. Is there a reason to change this part from 
previous patch?

Uros.




Re: Fix a bug in points-to solver

2013-12-04 Thread Richard Biener
On Tue, Dec 3, 2013 at 11:54 PM, Xinliang David Li davi...@google.com wrote:
 Done. Retested with the suggested change.

 Ok for trunk?

Ok.

Thanks,
Richard.

 thanks,

 David

 On Tue, Dec 3, 2013 at 2:13 AM, Richard Biener
 richard.guent...@gmail.com wrote:
 On Mon, Dec 2, 2013 at 6:38 PM, Xinliang David Li davi...@google.com wrote:
 The points-to solver has a bug that can cause complex constraints to be
 skipped, leading to wrong points-to results. In the case that exposed
 the problem, there is an sd constraint x = *y which is never processed.
 'y''s final points-to set is { NULL READONLY ESCAPED NOLOCAL }, but 'x''s
 points-to set is {}.

 What happens is that before 'y' is processed, it is merged with another
 node 'z' during cycle elimination (the complex constraints get
 transferred to 'z'), but 'z' is not marked as 'changed', so it is
 skipped in a later iteration.

 The attached patch fixed the problem. The problem is exposed by a
 large program built with -fprofile-generate in LIPO mode -- so there
 is no small testcase attached.

 Bootstrapped and regression tested on x86_64-unknown-linux-gnu, OK for 
 trunk?

 Hmm, the unify_nodes call in eliminate_indirect_cycles is supposed to
 set the changed bit...  which in this special case (updating of complex
 constraints, not the solution!) doesn't happen because unify_nodes doesn't
 consider this as a change.  Which needs to change.

 So, can you please update your patch to return a bool from
 merge_node_constraints (any change happened?) and update changed
 accordingly in unify_nodes?
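
I.e., something like this in unify_nodes (sketch):

  /* Updating the complex constraint list of TO also counts as a
     change, even if the solution bitmap itself did not change.  */
  if (merge_node_constraints (graph, to, from))
    bitmap_set_bit (changed, to);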

 Thanks,
 Richard.

 Index: ChangeLog
 ===
 --- ChangeLog   (revision 205579)
 +++ ChangeLog   (working copy)
 @@ -1,3 +1,8 @@
 +2013-12-02  Xinliang David Li  davi...@google.com
 +
 +   * tree-ssa-structalias.c (solve_graph): Mark rep node changed
 +   after cycle elimination.
 +
  2013-12-01  Eric Botcazou  ebotca...@adacore.com

 * config/i386/winnt.c (i386_pe_asm_named_section): Be prepared for 
 an
 Index: tree-ssa-structalias.c
 ===
 --- tree-ssa-structalias.c  (revision 205579)
 +++ tree-ssa-structalias.c  (working copy)
 @@ -2655,8 +2655,13 @@ solve_graph (constraint_graph_t graph)

   /* In certain indirect cycle cases, we may merge this
  variable to another.  */
  - if (eliminate_indirect_cycles (i) && find (i) != i)
  -   continue;
  + if (eliminate_indirect_cycles (i))
  +   {
  +     unsigned int rep = find (i);
  +     bitmap_set_bit (changed, rep);
  +     if (i != rep)
  +       continue;
  +   }

   /* If the node has changed, we need to process the
  complex constraints and outgoing edges again.  */


Re: Add TREE_INT_CST_OFFSET_NUNITS

2013-12-04 Thread Richard Biener
On Wed, Dec 4, 2013 at 1:03 AM, Richard Sandiford
rdsandif...@googlemail.com wrote:
 Richard Biener richard.guent...@gmail.com writes:
 Looking at the implementation it seems it would also work with

return MIN (TREE_INT_CST_EXT_NUNITS (m_t), N / HOST_BITS_PER_WIDE_INT);

 Yeah, the MIN in the patch was probably bogus, sorry.  It only works
 if we can assume that no bitsizetype will have ADDR_MAX_PRECISION
 significant (non-sign) bits -- in particular that there's no such thing as
 an all-1s _unsigned_ bitsizetype.  That might be true in practice given
 the way we use offsets, but it isn't safe to generalise that to all N.

 A safer form would be:

    if (ext_len > OFFSET_INT_ELTS)
      TREE_INT_CST_OFFSET_NUNITS (t) = len;
    else
      TREE_INT_CST_OFFSET_NUNITS (t) = ext_len;

 The reason the general form doesn't work for all N is because of the
 compressed representation.  E.g. the example I gave a while ago about
 a 256-bit all-1s unsigned number being { -1 } as a 256-bit number and
 { -1, -1, -1, -1, 0 } as a 257+-bit number.

 But the point of the patch is to avoid any runtime checks here,
 so the TYPE_PRECISION case is never actually used now.  I just kept
 it around for completeness, since we'd been using it successfully so far.
 I can put in a gcc_unreachable if you prefer.

 Yeah, I'd prefer a gcc_unreachable and a comment.

 OK, how about this version?  Tested on x86_64-linux-gnu.

Ok.

Thanks,
Richard.

 Thanks,
 Richard


 Index: gcc/ChangeLog.wide-int
 ===
 --- gcc/ChangeLog.wide-int  2013-12-03 23:55:26.142873345 +
 +++ gcc/ChangeLog.wide-int  2013-12-03 23:59:18.823744425 +
 @@ -617,6 +617,7 @@
 (TREE_INT_CST_HIGH): Delete.
 (TREE_INT_CST_NUNITS): New.
 (TREE_INT_CST_EXT_NUNITS): Likewise.
 +   (TREE_INT_CST_OFFSET_NUNITS): Likewise.
 (TREE_INT_CST_ELT): Likewise.
 (INT_CST_LT): Use wide-int interfaces.
 (INT_CST_LE): New.
 Index: gcc/tree-core.h
 ===
 --- gcc/tree-core.h 2013-12-03 23:55:26.142873345 +
 +++ gcc/tree-core.h 2013-12-04 00:02:22.910222722 +
 @@ -764,11 +764,17 @@ struct GTY(()) tree_base {
  struct {
/* The number of HOST_WIDE_INTs if the INTEGER_CST is accessed in
  its native precision.  */
 -  unsigned short unextended;
 +  unsigned char unextended;

/* The number of HOST_WIDE_INTs if the INTEGER_CST is extended to
  wider precisions based on its TYPE_SIGN.  */
 -  unsigned short extended;
 +  unsigned char extended;
 +
 +  /* The number of HOST_WIDE_INTs if the INTEGER_CST is accessed in
 +offset_int precision, with smaller integers being extended
 +according to their TYPE_SIGN.  This is equal to one of the two
 +fields above but is cached for speed.  */
 +  unsigned char offset;
  } int_length;

  /* VEC length.  This field is only used with TREE_VEC.  */
 Index: gcc/tree.c
 ===
 --- gcc/tree.c  2013-12-03 23:55:26.142873345 +
 +++ gcc/tree.c  2013-12-03 23:59:18.821744409 +
 @@ -1285,6 +1285,7 @@ wide_int_to_tree (tree type, const wide_
 /* Make sure no one is clobbering the shared constant.  */
 	gcc_checking_assert (TREE_TYPE (t) == type
 			     && TREE_INT_CST_NUNITS (t) == 1
 +			     && TREE_INT_CST_OFFSET_NUNITS (t) == 1
 			     && TREE_INT_CST_EXT_NUNITS (t) == 1
 			     && TREE_INT_CST_ELT (t, 0) == hwi);
   else
 @@ -1964,6 +1965,13 @@ make_int_cst_stat (int len, int ext_len
TREE_SET_CODE (t, INTEGER_CST);
TREE_INT_CST_NUNITS (t) = len;
TREE_INT_CST_EXT_NUNITS (t) = ext_len;
 +  /* to_offset can only be applied to trees that are offset_int-sized
 + or smaller.  EXT_LEN is correct if it fits, otherwise the constant
 + must be exactly the precision of offset_int and so LEN is correct.  */
  +  if (ext_len <= OFFSET_INT_ELTS)
 +TREE_INT_CST_OFFSET_NUNITS (t) = ext_len;
 +  else
 +TREE_INT_CST_OFFSET_NUNITS (t) = len;

TREE_CONSTANT (t) = 1;

 Index: gcc/tree.h
 ===
 --- gcc/tree.h  2013-12-03 23:55:26.142873345 +
 +++ gcc/tree.h  2013-12-04 00:01:48.258944485 +
 @@ -907,6 +907,8 @@ #define TREE_INT_CST_NUNITS(NODE) \
    (INTEGER_CST_CHECK (NODE)->base.u.int_length.unextended)
  #define TREE_INT_CST_EXT_NUNITS(NODE) \
    (INTEGER_CST_CHECK (NODE)->base.u.int_length.extended)
 +#define TREE_INT_CST_OFFSET_NUNITS(NODE) \
 +  (INTEGER_CST_CHECK (NODE)->base.u.int_length.offset)
  #define TREE_INT_CST_ELT(NODE, I) TREE_INT_CST_ELT_CHECK (NODE, I)
  #define TREE_INT_CST_LOW(NODE) \
((unsigned HOST_WIDE_INT) TREE_INT_CST_ELT (NODE, 0))
 @@ -4623,11 +4625,15 @@ 

Re: [PATCH] Fix force_to_mode not to modify in place the passed rtl (PR rtl-optimization/58726)

2013-12-04 Thread Richard Biener
On Wed, Dec 4, 2013 at 11:07 AM, Eric Botcazou ebotca...@adacore.com wrote:
 Fixed by making sure force_to_mode doesn't modify x in place.

 I think that it's the way to go, force_to_mode doesn't modify its argument
 except for these 2 cases.  I'm not sure what the story is, but calling SUBST
 for these 2 cases doesn't seem really necessary.

 Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8?

 2013-12-03  Jakub Jelinek  ja...@redhat.com

   PR rtl-optimization/58726
   * combine.c (force_to_mode): Fix comment typo.  Don't destructively
   modify x for ROTATE, ROTATERT and IF_THEN_ELSE.

   * gcc.c-torture/execute/pr58726.c: New test.

 IMO it's the best fix at this point of the release cycles.

I agree.

Richard.

 --
 Eric Botcazou


Re: [PATCH] Fix SSE (pre-AVX) alignment handling (PR target/59163)

2013-12-04 Thread Uros Bizjak
On Wed, Dec 4, 2013 at 11:15 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Dec 04, 2013 at 10:20:20AM +0100, Uros Bizjak wrote:
 When trap on unaligned is activated, a program should not trap in a
 different way due to compiler transformations. The optimized and
 unoptimized code should run (and trap) in the same way.

 Ok.

 Note, it seems I have missed an i386/sse-1.c test regression (apparently
 because an earlier version of the patch failed that test and I compared
 testresults against that instead of an earlier regtest), and from extra
 testing that recorded all insns which were rejected by the new "return false"
 in the ix86_legitimate_combined_insn hook, sse4_1-movntdqa.c regressed too.
 So the attached updated patch fixes those two.  I had missed ssememalign
 on the sse_loadlpd insn, where none of the alternatives #UD on misaligned,
 and previously also skipped sse2_load[lh]pd because I was afraid about the
 splitters, but the splitter creates a DFmode store, which is always fine
 misaligned.

 Incremental diff is:
 --- gcc/config/i386/i386.c  2013-12-02 19:57:39.116438744 +0100
 +++ gcc/config/i386/i386.c  2013-12-04 10:39:43.614269591 +0100
 @@ -32563,6 +32543,15 @@
nargs = 1;
klass = load;
memory = 0;
 +  switch (icode)
 +   {
 +   case CODE_FOR_sse4_1_movntdqa:
 +   case CODE_FOR_avx2_movntdqa:
 + aligned_mem = true;
 + break;
 +   default:
 + break;
 +   }
break;
  case VOID_FTYPE_PV2SF_V4SF:
  case VOID_FTYPE_PV4DI_V4DI:
 --- gcc/config/i386/sse.md  2013-12-02 18:02:00.325523050 +0100
 +++ gcc/config/i386/sse.md  2013-12-04 11:08:48.589128840 +0100
 @@ -5278,6 +5278,7 @@
     "%vmovlps\t{%2, %0|%q0, %2}"
    [(set_attr "isa" "noavx,avx,noavx,avx,*")
     (set_attr "type" "sseshuf,sseshuf,ssemov,ssemov,ssemov")
 +   (set_attr "ssememalign" "64")
     (set_attr "length_immediate" "1,1,*,*,*")
     (set_attr "prefix" "orig,vex,orig,vex,maybe_vex")
     (set_attr "mode" "V4SF,V4SF,V2SF,V2SF,V2SF")])
 @@ -7066,6 +7067,7 @@
     "#"
    [(set_attr "isa" "noavx,avx,noavx,avx,*,*,*")
     (set_attr "type" "ssemov,ssemov,sselog,sselog,ssemov,fmov,imov")
 +   (set_attr "ssememalign" "64")
     (set_attr "prefix_data16" "1,*,*,*,*,*,*")
     (set_attr "prefix" "orig,vex,orig,vex,*,*,*")
     (set_attr "mode" "V1DF,V1DF,V2DF,V2DF,DF,DF,DF")])
 @@ -7134,6 +7136,7 @@
 	      (const_string "imov")
 	   ]
 	   (const_string "ssemov")))
 +   (set_attr "ssememalign" "64")
    (set_attr "prefix_data16" "*,1,*,*,*,*,1,*,*,*,*")
    (set_attr "length_immediate" "*,*,*,*,*,1,*,*,*,*,*")
    (set_attr "prefix" "maybe_vex,orig,vex,orig,vex,orig,orig,vex,*,*,*")

 However, I don't think the ix86_expand_special_args_builtin change is
 needed. It is the duty of the programmer to pass a properly aligned
 address to these builtins. The compiler shouldn't magically fix invalid
 code, in the same way as it shouldn't break valid code as in the case
 above.

 What I'm trying to do in ix86_expand_special_args_builtin is not in any way
 fixing invalid code.  The problem is that those builtins don't have a memory
 as an argument, they have pointer, and the MEM is compiler created from
 scratch.  As it uses just gen_rtx_MEM, it always means just BITS_PER_UNIT
 alignment on those MEMs, so the problem is that the combiner hook will
 reject any changes whatsoever on such instructions.  get_pointer_alignment
 (added in the patch) adds there alignment info from what the compiler can
 find out (e.g.  if the pointer points to a decl with certain alignment
 etc.), but if it is e.g.  an argument of a function, it will still return
 likely BITS_PER_UNIT even when on valid code it will always be properly
 aligned (you'd need to use __builtin_assume_aligned or similar to force the
 alignment there).  I went through the special builtins for which gen_rtx_MEM
 is called and they apparently fall into just two categories, one is various
 builtins for unaligned vector loads or for < 16 byte load/store instructions
 (type 5 in AVX spec, so no #UD on misalign, at most #AC if enabled), and the
 other are non-temporal loads/stores, which both pre-AVX and AVX+ require
 strict alignment.  Only for the latter category I'm forcing the mode
 alignment, because even if the compiler can't prove correct alignment based
 on ccp etc., valid program must have such MEMs properly aligned and if we
 create MEMs with smaller alignment, the combine hook will just punt on them
 always.

Thanks for the explanation! Maybe you should add a short comment like
the above in the code.
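
For instance (just a sketch of such a comment):

  /* These builtins take a pointer rather than a memory operand, so the
     MEM is created by the compiler via gen_rtx_MEM and would otherwise
     get mere BITS_PER_UNIT alignment.  Record what get_pointer_alignment
     can prove, and for non-temporal loads/stores assume the mode
     alignment that any valid program must provide, so that the
     ix86_legitimate_combined_insn hook does not punt on these insns.  */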

 So, ok for trunk this way?  I guess 4.8 patch should wait some time before
 being backported.

 2013-12-04  Jakub Jelinek  ja...@redhat.com
 Uros Bizjak  ubiz...@gmail.com

 PR target/59163
 * config/i386/i386.c (ix86_legitimate_combined_insn): If for
 !TARGET_AVX there is misaligned MEM operand with vector mode
 and get_attr_ssememalign is 0, return false.
 (ix86_expand_special_args_builtin): Add 

[Ada] Fix corrupted string with case expression and concatenation

2013-12-04 Thread Eric Botcazou
In Ada 2012 we now have case expressions, i.e. conditional expressions with 
more than 2 choices, and this exposes an issue with the way such conditional 
constructs are translated in gigi, resulting in released stack storage being 
used in some cases.

Tested on x86_64-suse-linux, applied on the mainline.


2013-12-04  Eric Botcazou  ebotca...@adacore.com

* gcc-interface/trans.c (Case_Statement_to_gnu): Do not push a binding
level for each branch if this is a case expression in Ada 2012.
	(gnat_to_gnu) <case N_Expression_With_Actions>: Adjust comment.


-- 
Eric Botcazou

Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 205654)
+++ gcc-interface/trans.c	(working copy)
@@ -2348,12 +2348,17 @@ Case_Statement_to_gnu (Node_Id gnat_node
 	}
 	}
 
-  /* Push a binding level here in case variables are declared as we want
-	 them to be local to this set of statements instead of to the block
-	 containing the Case statement.  */
+  /* This construct doesn't define a scope so we shouldn't push a binding
+	 level around the statement list.  Except that we have always done so
+	 historically and this makes it possible to reduce stack usage.  As a
+	 compromise, we keep doing it for case statements, for which this has
+	 never been problematic, but not for case expressions in Ada 2012.  */
   if (choices_added_p)
 	{
-	  tree group = build_stmt_group (Statements (gnat_when), true);
+	  const bool is_case_expression
+	= (Nkind (Parent (gnat_node)) == N_Expression_With_Actions);
+	  tree group
+	= build_stmt_group (Statements (gnat_when), !is_case_expression);
 	  bool group_may_fallthru = block_may_fallthru (group);
 	  add_stmt (group);
 	  if (group_may_fallthru)
@@ -7002,8 +7007,8 @@ gnat_to_gnu (Node_Id gnat_node)
 //
 
 case N_Expression_With_Actions:
-  /* This construct doesn't define a scope so we don't wrap the statement
-	 list in a BIND_EXPR; however, we wrap it in a SAVE_EXPR to protect it
+  /* This construct doesn't define a scope so we don't push a binding level
+	 around the statement list; but we wrap it in a SAVE_EXPR to protect it
 	 from unsharing.  */
   gnu_result = build_stmt_group (Actions (gnat_node), false);
   gnu_result = build1 (SAVE_EXPR, void_type_node, gnu_result);


Re: [PATCH][ARM]Use of vcvt for float to fixed point conversions.

2013-12-04 Thread Ramana Radhakrishnan

Sorry about the slow response. Been on holiday.

On 20/11/13 16:27, Renlin Li wrote:

Hi all,

This patch will make the arm back-end use vcvt for float to fixed point
conversions when applicable.

Test on arm-none-linux-gnueabi has been done on the model.
Okay for trunk?





+ (define_insn "*combine_vcvtf2i"
+   [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(fix:SI (fix:SF (mult:SF (match_operand:SF 1 "s_register_operand" "t")
+				 (match_operand 2
+				  "const_double_vcvt_power_of_two" "Dp")))))]
+   "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP3 && !flag_rounding_math"
+   "vcvt%?.s32.f32\\t%1, %1, %v2\;vmov%?\\t%0, %1"
+   [(set_attr "predicable" "yes")
+    (set_attr "predicable_short_it" "no")
+    (set_attr "ce_count" "2")
+    (set_attr "type" "f_cvtf2i")]
+ )
+


You need to set the "length" attribute to 8.


--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/fixed_float_conversion.c
@@ -0,0 +1,15 @@
+/* Check that vcvt is used for fixed and float data conversions.  */
+/* { dg-do compile } */
+/* { dg-options "-O1 -mfpu=vfp3" } */
+/* { dg-require-effective-target arm_vfp_ok } */
+float fixed_to_float(int i)
+{
+return ((float)i / (1 << 16));
+}
+
+int float_to_fixed(float f)
+{
+return ((int)(f*(1 << 16)));
+}
+/* { dg-final { scan-assembler "vcvt.f32.s32" } } */
+/* { dg-final { scan-assembler "vcvt.s32.f32" } } */



GNU coding style for functions.

Ok with those changes.




regards
Ramana




Kind regards,
Renlin Li


gcc/ChangeLog:

2013-11-20  Renlin Li  renlin...@arm.com

  * config/arm/arm-protos.h (vfp_const_double_for_bits): Declare.
  * config/arm/constraints.md (Dp): Define new constraint.
  * config/arm/predicates.md (const_double_vcvt_power_of_two): Define
  new predicate.
  * config/arm/arm.c (arm_print_operand): Add print for new function.
  (vfp3_const_double_for_bits): New function.
  * config/arm/vfp.md (combine_vcvtf2i): Define new instruction.

gcc/testsuite/ChangeLog:

2013-11-20  Renlin Li  renlin...@arm.com

  * gcc.target/arm/fixed_float_conversion.c: New test case.






[Ada] Fix layout of packed record type with zero-sized component

2013-12-04 Thread Eric Botcazou
In Ada we have the equivalent of bit-fields with zero size, but they don't
behave as in C (or rather according to most ABIs) when it comes to the layout.
This patch makes it so that they are laid out manually instead of being passed 
to stor-layout.c as for the other fields.

Tested on x86_64-suse-linux, applied on the mainline.


2013-12-04  Eric Botcazou  ebotca...@adacore.com

* gcc-interface/decl.c (components_to_record): Add specific handling
for fields with zero size and no representation clause.


2013-12-04  Eric Botcazou  ebotca...@adacore.com

* gnat.dg/pack19.adb: New test.


-- 
Eric Botcazou

Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 205654)
+++ gcc-interface/decl.c	(working copy)
@@ -6932,6 +6932,7 @@ components_to_record (tree gnu_record_ty
   tree gnu_rep_list = NULL_TREE;
   tree gnu_var_list = NULL_TREE;
   tree gnu_self_list = NULL_TREE;
+  tree gnu_zero_list = NULL_TREE;
 
   /* For each component referenced in a component declaration create a GCC
  field and add it to the list, skipping pragmas in the GNAT list.  */
@@ -7262,6 +7263,10 @@ components_to_record (tree gnu_record_ty
  to do this in a separate pass since we want to handle the discriminants
  but can't play with them until we've used them in debugging data above.
 
+ Similarly, pull out the fields with zero size and no rep clause, as they
+ would otherwise modify the layout and thus very likely run afoul of the
+ Ada semantics, which are different from those of C here.
+
  ??? If we reorder them, debugging information will be wrong but there is
  nothing that can be done about this at the moment.  */
   gnu_last = NULL_TREE;
@@ -7300,6 +7305,19 @@ components_to_record (tree gnu_record_ty
 	  continue;
 	}
 
+  if (DECL_SIZE (gnu_field) && integer_zerop (DECL_SIZE (gnu_field)))
+	{
+	  DECL_FIELD_OFFSET (gnu_field) = size_zero_node;
+	  SET_DECL_OFFSET_ALIGN (gnu_field, BIGGEST_ALIGNMENT);
+	  DECL_FIELD_BIT_OFFSET (gnu_field) = bitsize_zero_node;
+	  if (field_is_aliased (gnu_field))
+	TYPE_ALIGN (gnu_record_type)
+	  = MAX (TYPE_ALIGN (gnu_record_type),
+		 TYPE_ALIGN (TREE_TYPE (gnu_field)));
+	  MOVE_FROM_FIELD_LIST_TO (gnu_zero_list);
+	  continue;
+	}
+
   gnu_last = gnu_field;
 }
 
@@ -7392,6 +7410,11 @@ components_to_record (tree gnu_record_ty
   finish_record_type (gnu_record_type, gnu_field_list, layout_with_rep ? 1 : 0,
 		  debug_info  !maybe_unused);
 
+  /* Chain the fields with zero size at the beginning of the field list.  */
+  if (gnu_zero_list)
+TYPE_FIELDS (gnu_record_type)
+  = chainon (gnu_zero_list, TYPE_FIELDS (gnu_record_type));
+
   return (gnu_rep_list && !p_gnu_rep_list) || variants_have_rep;
 }
 -- { dg-do run }

procedure Pack19 is

   subtype Always_False is Boolean range False .. False;

   type Rec1 is record
  B1 : Boolean;
  B2 : Boolean;
  B3 : Boolean;
  B4 : Boolean;
  B5 : Boolean;
  B6 : Boolean;
  B7 : Always_False;
  B8 : Boolean;
   end record;
   pragma Pack (Rec1);

   subtype Always_True is Boolean range True .. True;

   type Rec2 is record
  B1 : Boolean;
  B2 : Boolean;
  B3 : Boolean;
  B4 : Boolean;
  B5 : Boolean;
  B6 : Boolean;
  B7 : Always_True;
  B8 : Boolean;
   end record;
   pragma Pack (Rec2);

   R1 : Rec1 := (True, True, True, True, True, True, False, False);
   R2 : Rec2 := (False, False, False, False, False, False, True, True);

begin
   R1.B8 := True;
   if R1.B7 /= False then
  raise Program_Error;
   end if;

   R1.B7 := False;
   if R1.B7 /= False then
  raise Program_Error;
   end if;

   R2.B8 := False;
   if R2.B7 /= True then
  raise Program_Error;
   end if;

   R2.B7 := True;
   if R2.B7 /= True then
  raise Program_Error;
   end if;
end;

Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-04 Thread Yufeng Zhang

On 12/04/13 10:30, Richard Biener wrote:

On Wed, Dec 4, 2013 at 11:26 AM, Richard Biener
richard.guent...@gmail.com  wrote:

On Tue, Dec 3, 2013 at 11:04 PM, Bill Schmidt
wschm...@linux.vnet.ibm.com  wrote:

On Tue, 2013-12-03 at 21:35 +0100, Richard Biener wrote:

Yufeng Zhangyufeng.zh...@arm.com  wrote:

On 12/03/13 14:20, Richard Biener wrote:

On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhangyufeng.zh...@arm.com

wrote:

On 12/03/13 06:48, Jeff Law wrote:


On 12/02/13 08:47, Yufeng Zhang wrote:


Ping~

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html





Thanks,
Yufeng

On 11/26/13 15:02, Yufeng Zhang wrote:


On 11/26/13 12:45, Richard Biener wrote:


On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
Zhangyufeng.zh...@arm.com  wrote:


On 11/13/13 20:54, Bill Schmidt wrote:


The second version of your original patch is ok with me with

the

following changes.  Sorry for the little side adventure into

the

next-interp logic; in the end that's going to hurt more than

it

helps in
this case.  Thanks for having a look at it, anyway.  Thanks

also for

cleaning up this version to be less intrusive to common

interfaces; I

appreciate it.




Thanks a lot for the review.  I've attached an updated patch

with the

suggested changes incorporated.

For the next-interp adventure, I was quite happy to do the
experiment; it's
a good chance of gaining insight into the pass.  Many thanks

for

your prompt
replies and patience in guiding!



Everything else looks OK to me.  Please ask Richard for final
approval,
as I'm not a maintainer.


First a note, I need to check on voting for Bill as the slsr

maintainer

from the steering committee.   Voting was in progress just before

the

close of stage1 development so I haven't tallied the results :-)



Looking forward to some good news! :)




Yes, you are right that non-trivial 'base' trees are rarely

shared.

 The cache is introduced mainly because get_alternative_base

() may

be
called twice on the same 'base' tree, once in the
find_basis_for_candidate () for look-up and the other time in
alloc_cand_and_find_basis () for record_potential_basis ().  I'm

happy

to leave out the cache if you think the benefit is trivial.


Without some sense of how expensive the lookups are vs how often

the

cache hits it's awful hard to know if the cache is worth it.

I'd say take it out unless you have some sense it's really saving

time.

 It's a pretty minor implementation detail either way.



I think the affine tree routines are generally expensive; it is

worth having

a cache to avoid calling them too many times.  I run the slsr-*.c

tests

under gcc.dg/tree-ssa/ and find out that the cache hit rates range

from

55.6% to 90%, with 73.5% as the average.  The samples may not well

represent

the real world scenario, but they do show the fact that the 'base'

tree can

be shared to some extent.  So I'd like to have the cache in the

patch.








+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i-10] [j] = 2;
+  a2 [i] [j++] = i;
+  a2 [i+20] [j++] = i;
+  a2 [i-3] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */

scanning for 5 MEMs looks non-sensical.  What transform do
you expect?  I see other slsr testcases do similar non-sensical
checking which is bad, too.



As the slsr optimizes CAND_REF candidates by simply lowering them

to

MEM_REF from e.g. ARRAY_REF, I think scanning for the number of

MEM_REFs

is an effective check.  Alternatively, I can add a follow-up

patch to

add some dumping facility in replace_ref () to print out the

replacing

actions when -fdump-tree-slsr-details is on.


I think adding some details to the dump and scanning for them would

be

better.  That's the only change that is required for this to move

forward.



I've updated the patch to dump more details when

-fdump-tree-slsr-details is

on.  The tests have also been updated to scan for these new dumps

instead of

MEMs.




I suggest doing it quickly.  We're well past stage1 close at this

point.



The bootstrapping on x86_64 is still running.  OK to commit if it

succeeds?


I still don't like it.  It's using the wrong and too expensive tools

to do

stuff.  What kind of bases are we ultimately interested in?  Browsing
the code it looks like we're having

/* Base expression for the chain of candidates:  often, but not
   always, an SSA name.  */
tree base_expr;

which isn't really too informative but I suppose they are all
kind-of-gimple_val()s?  That said, I wonder if you can simply
use get_addr_base_and_unit_offset in place of get_alternative_base

(),

ignoring the returned offset.


'base_expr' is essentially the base address of a handled_component_p,
e.g. ARRAY_REF, COMPONENT_REF, etc.  In most cases, it is the address of

the object returned by 
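
(In the spirit of the get_addr_base_and_unit_offset suggestion quoted
above, a rough sketch -- not from the posted patch -- of deriving an
alternative base while ignoring the returned offset:)

static tree
alternative_base_sketch (tree base)
{
  HOST_WIDE_INT offset;
  tree core = get_addr_base_and_unit_offset (base, &offset);
  /* Fall back to the original base when the offset is not constant.  */
  return core ? core : base;
}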

Re: [RFC][LIBGCC][2 of 2] 64 bit divide implementation for processor without hw divide instruction

2013-12-04 Thread Christophe Lyon
Committed on Kugan's behalf as rev 205666.

Christophe.


On 3 December 2013 20:47, Jeff Law l...@redhat.com wrote:
 On 12/02/13 23:39, Kugan wrote:


 +2013-11-27  Kugan Vivekanandarajah  kug...@linaro.org
 +
 +   * config/arm/bpabi-lib.h (TARGET_HAS_NO_HW_DIVIDE): Define for
 +   architectures that do not have a hardware divide instruction,
 +   i.e. architectures that do not define __ARM_ARCH_EXT_IDIV__.
 +


 Is this OK for trunk now?

 Yes, this part is fine too.  AFAICT, it just implements what Richard E.
 suggested ;-)

 jeff
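
(For reference, the define described in the ChangeLog amounts to something
like the following sketch; the exact committed form in bpabi-lib.h may
differ:)

/* Tell libgcc there is no hardware integer divide instruction.  */
#ifndef __ARM_ARCH_EXT_IDIV__
#define TARGET_HAS_NO_HW_DIVIDE
#endif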


Re: libsanitizer merge from upstream r196090

2013-12-04 Thread Konstantin Serebryany
 Of course various parties care about that.  But that doesn't necessarily
 imply they can or want to run buildbots for a different compiler they don't
 care about, there are e.g. security implications to that, resources etc.

This brings us back to the initial issue: the way asan&co are developed.
The fact of life is that the development happens in the LLVM tree.
The only strategy for keeping GCC's version in sync with upstream that
works for us
is to have a verbatim copy of the sources in GCC and to have the merge process
purely mechanical. It has not been purely mechanical with the last merge.
Any other strategy would mean that someone else does the merges.
This is totally fine as long as the sources do not diverge over time;
but I'm afraid they will diverge.

Testing asan&co upstream does not necessarily require testing the rest
of the LLVM compiler.
It would be really great to have someone test the entire LLVM tree on
e.g. old Fedora,
but we can limit such testing just to the compiler-rt project.

Ideally for me, GCC would use sanitizer sources from compiler-rt
completely verbatim,
retaining the entire directory structure with all the tests and not
adding any extra files inside that tree.
Maybe even use svn external for greater simplicity.
The GCC's build files (Makefile.am, etc) and GCC-specific tests would
live outside of that source tree.
This way it would be easy to set up a build bot that takes fresh
compiler-rt, puts it into the GCC tree,
and runs the GCC testing.
The merges will then become purely mechanical: check the bots, put the
new sources into gcc
(or even update the svn external revision!).




 For GCC new ports etc. we never require people running some buildbots,
 people just report issues they encounter and those issues are fixed.

 tsan makes many assumptions about the system (address space, etc)
 which may not hold on old systems anyway.
 And tsan, even if it builds, will not work reliably.

 I really think that we need to disable libsanitizer on old systems
 until someone volunteers to set up a proper testing process upstream.
 If no one volunteers -- no one really needs it.

 The problem is that the definition of old systems is extremely fuzzy,
 for you it is clearly != Ubuntu 12.04 or similar, but the unwritten
 assumptions libsanitizer makes are not related to one exact version of
 a single dependent component, it is a matter of compiler, compiler
 configuration, binutils, glibc, kernel, kernel headers, ...

 So, how do you disable libsanitizer on old systems when it is so fuzzy?

I would start from kernel version and glibc version, this should cover
the majority of use cases.

--kcc

 There may be assumptions you just don't know about, and fixing bugreports
 by just adding reporter's toolchain combinations to kill lists is certainly
 not the way GCC is developed.

 Jakub


Re: [wide-int] Add fast path for hosts with HWI widening multiplication

2013-12-04 Thread Richard Sandiford
Richard Sandiford rdsandif...@googlemail.com writes:
 This patch handles multiplications using a single HWIxHWI->2HWI multiplication
 on hosts that have one.  This removes all uses of the slow (half-HWI) path
 for insn-recog.ii.  The slow path is still used 58 times for cp/parser.ii
 and 168 times for fold-const.ii, but at that kind of level it shouldn't
 matter much.

 I followed Joseph's suggestion and reused longlong.h.  I copied it from
 libgcc rather than glibc since it seemed better for GCC to have a single
 version across both gcc/ and libgcc/.  I can put it in include/ if that
 seems better.

I've committed the patch to move longlong.h to trunk and merged back
to the branch, so all that's left is the wide-int.cc patch.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2013-12-03 23:59:08.133658567 +
+++ gcc/wide-int.cc 2013-12-04 12:55:28.466895358 +
@@ -27,6 +27,16 @@ along with GCC; see the file COPYING3.
 #include tree.h
 #include dumpfile.h
 
+#if GCC_VERSION >= 3000
+#define W_TYPE_SIZE HOST_BITS_PER_WIDE_INT
+typedef unsigned HOST_HALF_WIDE_INT UHWtype;
+typedef unsigned HOST_WIDE_INT UWtype;
+typedef unsigned int UQItype __attribute__ ((mode (QI)));
+typedef unsigned int USItype __attribute__ ((mode (SI)));
+typedef unsigned int UDItype __attribute__ ((mode (DI)));
+#include longlong.h
+#endif
+
 /* This is the maximal size of the buffer needed for dump.  */
 const unsigned int MAX_SIZE = (4 * (MAX_BITSIZE_MODE_ANY_INT / 4
+ (MAX_BITSIZE_MODE_ANY_INT
@@ -1255,8 +1265,8 @@ wi_pack (unsigned HOST_WIDE_INT *result,
record in *OVERFLOW whether the result overflowed.  SGN controls
the signedness and is used to check overflow or if HIGH is set.  */
 unsigned int
-wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1,
- unsigned int op1len, const HOST_WIDE_INT *op2,
+wi::mul_internal (HOST_WIDE_INT *val, const HOST_WIDE_INT *op1val,
+ unsigned int op1len, const HOST_WIDE_INT *op2val,
  unsigned int op2len, unsigned int prec, signop sgn,
  bool *overflow, bool high)
 {
@@ -1285,24 +1295,53 @@ wi::mul_internal (HOST_WIDE_INT *val, co
   if (needs_overflow)
 *overflow = false;
 
+  wide_int_ref op1 = wi::storage_ref (op1val, op1len, prec);
+  wide_int_ref op2 = wi::storage_ref (op2val, op2len, prec);
+
   /* This is a surprisingly common case, so do it first.  */
-  if ((op1len == 1 && op1[0] == 0) || (op2len == 1 && op2[0] == 0))
+  if (op1 == 0 || op2 == 0)
 {
   val[0] = 0;
   return 1;
 }
 
+#ifdef umul_ppmm
+  if (sgn == UNSIGNED)
+{
+  /* If the inputs are single HWIs and the output has room for at
+least two HWIs, we can use umul_ppmm directly.  */
+  if (prec >= HOST_BITS_PER_WIDE_INT * 2
+	  && wi::fits_uhwi_p (op1)
+	  && wi::fits_uhwi_p (op2))
+   {
+ umul_ppmm (val[1], val[0], op1.ulow (), op2.ulow ());
+ return 1 + (val[1] != 0 || val[0] < 0);
+   }
+  /* Likewise if the output is a full single HWI, except that the
+upper HWI of the result is only used for determining overflow.
+(We handle this case inline when overflow isn't needed.)  */
+  else if (prec == HOST_BITS_PER_WIDE_INT)
+   {
+ unsigned HOST_WIDE_INT upper;
+ umul_ppmm (upper, val[0], op1.ulow (), op2.ulow ());
+ if (needs_overflow)
+   *overflow = (upper != 0);
+ return 1;
+   }
+}
+#endif
+
   /* Handle multiplications by 1.  */
-  if (op1len == 1 && op1[0] == 1)
+  if (op1 == 1)
 {
   for (i = 0; i < op2len; i++)
-   val[i] = op2[i];
+   val[i] = op2val[i];
   return op2len;
 }
-  if (op2len == 1 && op2[0] == 1)
+  if (op2 == 1)
 {
   for (i = 0; i < op1len; i++)
-   val[i] = op1[i];
+   val[i] = op1val[i];
   return op1len;
 }
 
@@ -1316,13 +1355,13 @@ wi::mul_internal (HOST_WIDE_INT *val, co
 
   if (sgn == SIGNED)
{
- o0 = sext_hwi (op1[0], prec);
- o1 = sext_hwi (op2[0], prec);
+ o0 = op1.to_shwi ();
+ o1 = op2.to_shwi ();
}
   else
{
- o0 = zext_hwi (op1[0], prec);
- o1 = zext_hwi (op2[0], prec);
+ o0 = op1.to_uhwi ();
+ o1 = op2.to_uhwi ();
}
 
   r = o0 * o1;
@@ -1344,9 +1383,9 @@ wi::mul_internal (HOST_WIDE_INT *val, co
 }
 
   /* We do unsigned mul and then correct it.  */
-  wi_unpack (u, (const unsigned HOST_WIDE_INT*)op1, op1len,
+  wi_unpack (u, (const unsigned HOST_WIDE_INT *) op1val, op1len,
 half_blocks_needed, prec, SIGNED);
-  wi_unpack (v, (const unsigned HOST_WIDE_INT*)op2, op2len,
+  wi_unpack (v, (const unsigned HOST_WIDE_INT *) op2val, op2len,
 half_blocks_needed, prec, SIGNED);
 
   /* The 2 is for a full mult.  */
@@ -1371,7 +1410,7 @@ wi::mul_internal 
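
(For readers without longlong.h at hand, a rough illustration -- not GCC
code -- of the primitive the fast path relies on: umul_ppmm is a
HWIxHWI->2HWI widening multiply.  Assuming a 64-bit HWI and a host with
__int128, it behaves like:)

void
umul_ppmm_sketch (unsigned long long *w1, unsigned long long *w0,
                  unsigned long long u, unsigned long long v)
{
  unsigned __int128 r = (unsigned __int128) u * v;
  *w1 = (unsigned long long) (r >> 64);  /* high half */
  *w0 = (unsigned long long) r;          /* low half */
}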

Re: libsanitizer merge from upstream r196090

2013-12-04 Thread Jakub Jelinek
On Wed, Dec 04, 2013 at 04:49:22PM +0400, Konstantin Serebryany wrote:
 I would start from kernel version and glibc version, this should cover
 the majority of use cases.

Well, for the kernel headers what we perhaps can do is just add
libsanitizer/include/linux/ tree that will be maintained by GCC and will
contain (where needed) wrappers for kernel headers or their replacements
to make sure things compile, if you don't care about it in the compiler-rt
tree.  But for the ppc32 stuff, we can't avoid modifying sanitizer_common
(the first patch I've posted recently, btw, I wonder if it works on sparc*,
we'll need to wait for somebody to test it), or e.g. for the .cfi_* stuff
(if you just apply the .cfi_* related part of the patch I've posted
with say the macros __GCC_HAVE_* replaced by SANITIZER_USE_CFI_DIRECTIVES
or similar), I guess we can provide the right definition for that outside of
the compiler-rt maintained files.
Regarding the TLS size, can you e.g. just only do it for glibc 2.13 and
later, rather than having (even for glibc 2.11/2.12 incorrect) values for
older glibcs?

Jakub


Re: libsanitizer merge from upstream r196090

2013-12-04 Thread Konstantin Serebryany
On Wed, Dec 4, 2013 at 5:02 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Dec 04, 2013 at 04:49:22PM +0400, Konstantin Serebryany wrote:
 I would start from kernel version and glibc version, this should cover
 the majority of use cases.

 Well, for the kernel headers what we perhaps can do is just add
 libsanitizer/include/linux/ tree that will be maintained by GCC and will

if that works for you, no objections.

 contain (where needed) wrappers for kernel headers or their replacements
 to make sure things compile, if you don't care about it in the compiler-rt
 tree.  But for the ppc32 stuff, we can't avoid modifying sanitizer_common
 (the first patch I've posted recently, btw, I wonder if it works on sparc*,
 we'll need to wait for somebody to test it), or e.g. for the .cfi_* stuff
 (if you just apply the the .cfi_* related part of the patch I've posted
 with say the macros __GCC_HAVE_* replaced by SANITIZER_USE_CFI_DIRECTIVES
 or similar, I guess we can provide the right definition for that outside of
 the compiler-rt maintained files.

.cfi is used only in tsan sources now, and tsan is not supported
anywhere but x86_64
ppc32 never worked (last time I tried there were several different
issues so we disabled 32-bit build)
-- we should just disable it in GCC too. There is no value in
building code that does not run.

 Regarding the TLS size, can you e.g. just only do it for glibc 2.13 and
 later, rather than having (even for glibc 2.11/2.12 incorrect) values for
 older glibcs?

That would work for me, although it may bring some surprises later.
If we incorrectly compute the tls boundaries, lsan may produce false
positives or false negatives.
Having kThreadDescriptorSize=0 means that we include the thread
descriptor in the lsan's root set and thus
may miss a leak (with rather low probability). I can live with this.

Like this (tested only on my box)?
Index: sanitizer_linux_libcdep.cc
===
--- sanitizer_linux_libcdep.cc  (revision 196375)
+++ sanitizer_linux_libcdep.cc  (working copy)
@@ -207,12 +207,12 @@

 #if defined(__x86_64__) || defined(__i386__)
 // sizeof(struct thread) from glibc.
-// There has been a report of this being different on glibc 2.11 and 2.13. We
-// don't know when this change happened, so 2.14 is a conservative estimate.
-#if __GLIBC_PREREQ(2, 14)
+// This may change between glibc versions, we only support the versions we know
+// about (>= 2.13). For others we set kThreadDescriptorSize to 0.
+#if __GLIBC_PREREQ(2, 13)
 const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1216, 2304);
 #else
-const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1168, 2304);
+const uptr kThreadDescriptorSize = 0;  // Unknown.
 #endif

 uptr ThreadDescriptorSize() {
@@ -255,7 +255,7 @@
   *stk_addr = stack_bottom;
   *stk_size = stack_top - stack_bottom;

-  if (!main) {
+  if (!main && kThreadDescriptorSize) {
 // If stack and tls intersect, make them non-intersecting.
    if (*tls_addr > *stk_addr && *tls_addr < *stk_addr + *stk_size) {
   CHECK_GT(*tls_addr + *tls_size, *stk_addr);
Index: tests/sanitizer_linux_test.cc
===
--- tests/sanitizer_linux_test.cc   (revision 196375)
+++ tests/sanitizer_linux_test.cc   (working copy)
@@ -224,6 +224,7 @@

 TEST(SanitizerLinux, ThreadDescriptorSize) {
   pthread_t tid;
+  if (!ThreadDescriptorSize()) return;
   void *result;
   ASSERT_EQ(0, pthread_create(&tid, 0, thread_descriptor_size_test_func, 0));
   ASSERT_EQ(0, pthread_join(tid, &result));



<grumbling>
If I had a buildbot with old Fedora, I would simply submit the
change and see if it broke/fixed it.
</grumbling>

--kcc




 Jakub


Re: libsanitizer merge from upstream r196090

2013-12-04 Thread Jakub Jelinek
On Wed, Dec 04, 2013 at 05:28:40PM +0400, Konstantin Serebryany wrote:
  Well, for the kernel headers what we perhaps can do is just add
  libsanitizer/include/linux/ tree that will be maintained by GCC and will
 
 if that works for you, no objections.

I haven't tried to do that yet, so don't know how much work it will be,
but at least from the second patch posted recently it it might work fine, at
least for now.

 .cfi is used only in tsan sources now, and tsan is not supported
 anywhere but x86_64

But the .cfi_* issue is platform independent.  Whether the compiler
decides to emit them or not depends on how it was configured, on assembler
and on compiler flags.
I don't see how it can be a maintainance problem to just guard the few
(right now two) .cfi_* occurrences in the C++ sources, or using CFI_* macros
instead of .cfi_* directives directly in the assembly file.
Other projects (e.g. glibc) manage to do that for years without any trouble.
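
(A sketch of the kind of guard meant here; SANITIZER_USE_CFI_DIRECTIVES is
the hypothetical macro name from the earlier mail, and the CFI_* wrappers
mirror what glibc's sysdep headers do:)

#ifdef SANITIZER_USE_CFI_DIRECTIVES
# define CFI_STARTPROC .cfi_startproc
# define CFI_ENDPROC .cfi_endproc
#else
# define CFI_STARTPROC
# define CFI_ENDPROC
#endif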

 ppc32 never worked (last time I tried there were several different
 issues so we disabled 32-bit build)
 -- we should just disable it in GCC too. There is not value in
 building code that does not run.

That doesn't mean it can't be made to work, and the patch I've posted is
at least an (IMHO correct) step towards that.  Note, I had just much bigger
problems on ppc64 with the addr2line symbolization because of the ppc64
opd/plt stuff, though supposedly that might go away once I patch
libsanitizer to use libbacktrace for symbolization.
There is no inherent reason why ppc32 wouldn't work and ppc64 would, after
all ppc64 with its weirdo function descriptor stuff is much harder to
support.

  Regarding the TLS size, can you e.g. just only do it for glibc 2.13 and
  later, rather than having (even for glibc 2.11/2.12 incorrect) values for
  older glibcs?
 
 That would work for me, although it may bring some surprises later.
 If we incorrectly compute the tls boundaries, lsan my produce false
 positives or false negatives.

But is that solely for lsan and nothing else?  Because, the assertion
was failing in asan tests, without any asan options to request leak
checking.  And for non-i?86/x86_64 you ignore the tls boundaries too.

 Having kThreadDescriptorSize=0 means that we include the stack
 descriptor in the lsan's root set and thus
 may miss a leak (with rather low probability). I can live with this.
 
 Like this (tested only on my box)?

 --- sanitizer_linux_libcdep.cc  (revision 196375)
 +++ sanitizer_linux_libcdep.cc  (working copy)
 @@ -207,12 +207,12 @@
 
  #if defined(__x86_64__) || defined(__i386__)
  // sizeof(struct thread) from glibc.
 -// There has been a report of this being different on glibc 2.11 and 2.13. We
 -// don't know when this change happened, so 2.14 is a conservative estimate.
 -#if __GLIBC_PREREQ(2, 14)
 +// This may change between glibc versions, we only support the versions we
 +// know about (>= 2.13). For others we set kThreadDescriptorSize to 0.
 +#if __GLIBC_PREREQ(2, 13)
  const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1216, 2304);
  #else
 -const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1168, 2304);
 +const uptr kThreadDescriptorSize = 0;  // Unknown.

Depends (as I've asked earlier) on whether you need the exact precise
value or if a conservatively smaller value is fine.  Then you could,
say, for glibc >= 2.5 pick the minimum of the values I've gathered.

Jakub


[PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)

2013-12-04 Thread Marek Polacek
And this is the i?86 specific part of -fsanitize=signed-integer-overflow,
split out of the huge patch.  It really is dependent on the generic
parts; when committing, I'll put both parts together.

Uros, would you mind taking a look at this?

Regtested/bootstrapped on x86_64-linux.  Ok for trunk?

2013-12-04  Jakub Jelinek  ja...@redhat.com  
Marek Polacek  pola...@redhat.com

* config/i386/i386.md (addv<mode>4, subv<mode>4, mulv<mode>4,
negv<mode>3, negv<mode>3_1): Define expands.
(*addv<mode>4, *subv<mode>4, *mulv<mode>4, *negv<mode>3): Define
insns.

--- gcc/config/i386/i386.md.mp  2013-12-04 12:15:33.508905947 +0100
+++ gcc/config/i386/i386.md 2013-12-04 12:15:39.608929341 +0100
@@ -6153,6 +6153,42 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "QI")])
 
+(define_mode_attr widerintmode [(QI "HI") (HI "SI") (SI "DI") (DI "TI")])
+
+;; Add with jump on overflow.
+(define_expand "addv<mode>4"
+  [(parallel [(set (reg:CCO FLAGS_REG)
+		   (eq:CCO (plus:<widerintmode>
+			     (sign_extend:<widerintmode>
+			       (match_operand:SWI 1 "register_operand"))
+			     (sign_extend:<widerintmode>
+			       (match_operand:SWI 2 "general_operand")))
+			   (sign_extend:<widerintmode>
+			     (plus:SWI (match_dup 1) (match_dup 2)))))
+	      (set (match_operand:SWI 0 "register_operand")
+		   (plus:SWI (match_dup 1) (match_dup 2)))])
+   (set (pc) (if_then_else
+	       (eq (reg:CCO FLAGS_REG) (const_int 0))
+	       (label_ref (match_operand 3))
+	       (pc)))]
+  "")
+
+(define_insn "*addv<mode>4"
+  [(set (reg:CCO FLAGS_REG)
+	(eq:CCO (plus:<widerintmode>
+		  (sign_extend:<widerintmode>
+		    (match_operand:SWI 1 "nonimmediate_operand" "%0,0"))
+		  (sign_extend:<widerintmode>
+		    (match_operand:SWI 2 "general_operand" "g,ri")))
+		(sign_extend:<widerintmode>
+		  (plus:SWI (match_dup 1) (match_dup 2)))))
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=r,rm")
+	(plus:SWI (match_dup 1) (match_dup 2)))]
+  "ix86_binary_operator_ok (PLUS, <MODE>mode, operands)"
+  "add{<imodesuffix>}\t{%2, %0|%0, %2}"
+  [(set_attr "type" "alu")
+   (set_attr "mode" "<MODE>")])
+
 ;; The lea patterns for modes less than 32 bits need to be matched by
 ;; several insns converted to real lea by splitters.
 
@@ -6390,6 +6426,40 @@
   [(set_attr "type" "alu")
    (set_attr "mode" "SI")])
 
+;; Subtract with jump on overflow.
+(define_expand "subv<mode>4"
+  [(parallel [(set (reg:CCO FLAGS_REG)
+		   (eq:CCO (minus:<widerintmode>
+			     (sign_extend:<widerintmode>
+			       (match_operand:SWI 1 "register_operand"))
+			     (sign_extend:<widerintmode>
+			       (match_operand:SWI 2 "general_operand")))
+			   (sign_extend:<widerintmode>
+			     (minus:SWI (match_dup 1) (match_dup 2)))))
+	      (set (match_operand:SWI 0 "register_operand")
+		   (minus:SWI (match_dup 1) (match_dup 2)))])
+   (set (pc) (if_then_else
+	       (eq (reg:CCO FLAGS_REG) (const_int 0))
+	       (label_ref (match_operand 3))
+	       (pc)))]
+  "")
+
+(define_insn "*subv<mode>4"
+  [(set (reg:CCO FLAGS_REG)
+	(eq:CCO (minus:<widerintmode>
+		  (sign_extend:<widerintmode>
+		    (match_operand:SWI 1 "nonimmediate_operand" "0,0"))
+		  (sign_extend:<widerintmode>
+		    (match_operand:SWI 2 "general_operand" "ri,rm")))
+		(sign_extend:<widerintmode>
+		  (minus:SWI (match_dup 1) (match_dup 2)))))
+   (set (match_operand:SWI 0 "nonimmediate_operand" "=rm,r")
+	(minus:SWI (match_dup 1) (match_dup 2)))]
+  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
+  "sub{<imodesuffix>}\t{%2, %0|%0, %2}"
+  [(set_attr "type" "alu")
+   (set_attr "mode" "<MODE>")])
+
 (define_insn "*sub<mode>_3"
   [(set (reg FLAGS_REG)
	(compare (match_operand:SWI 1 "nonimmediate_operand" "0,0")
@@ -6704,6 +6774,59 @@
 (set_attr "bdver1_decode" "direct")
 (set_attr "mode" "QI")])
 
+;; Multiply with jump on overflow.
+(define_expand "mulv<mode>4"
+  [(parallel [(set (reg:CCO FLAGS_REG)
+		   (eq:CCO (mult:<widerintmode>
+			     (sign_extend:<widerintmode>
+			       (match_operand:SWI48 1 "register_operand"))
+			     (sign_extend:<widerintmode>
+			       (match_operand:SWI48 2 "general_operand")))
+			   (sign_extend:<widerintmode>
+			     (mult:SWI48 (match_dup 1) (match_dup 2)))))
+	      (set (match_operand:SWI48 0 "register_operand")
+		   (mult:SWI48 (match_dup 1) (match_dup 2)))])
+   (set (pc) (if_then_else
+	       (eq (reg:CCO FLAGS_REG) (const_int 0))
+	       (label_ref
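
(A rough C rendering -- not part of the patch -- of the check the
addv<mode>4 pattern encodes, assuming 32-bit int and 64-bit long long:
do the addition in the wider mode and compare it with the sign-extended
narrow result.)

int
add_overflows_si (int a, int b, int *res)
{
  long long wide = (long long) a + (long long) b;  /* plus:<widerintmode> */
  *res = (int) wide;                               /* plus:SWI */
  return wide != (long long) *res;                 /* the eq:CCO test */
}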

[PATCH 1/2] Implement -fsanitize=signed-integer-overflow (generic parts)

2013-12-04 Thread Marek Polacek
This is a repost of rebased version of the signed-integer-overflow
patch, split into generic parts and i?86 parts.  By i?86 parts I mean
the stuff that resides in config/i386, I haven't really tried to
untangle it more.
Besides the two formatting fixes, I also moved the various PROB_ macros into
predict.h and made the users include it, rather than duplicating
the defines everywhere.

Regtested/bootstrapped on x86_64-linux.  Ok for trunk?

There are still things to do, but I'd like to get this in first.

2013-12-04  Jakub Jelinek  ja...@redhat.com  
Marek Polacek  pola...@redhat.com

* opts.c (common_handle_option): Handle
-fsanitize=signed-integer-overflow.
* sanitizer.def (BUILT_IN_UBSAN_HANDLE_ADD_OVERFLOW,
BUILT_IN_UBSAN_HANDLE_SUB_OVERFLOW,
BUILT_IN_UBSAN_HANDLE_MUL_OVERFLOW,
BUILT_IN_UBSAN_HANDLE_NEGATE_OVERFLOW): Define.
* ubsan.h (PROB_VERY_UNLIKELY, PROB_EVEN, PROB_VERY_LIKELY,
PROB_ALWAYS): Define.
(ubsan_build_overflow_builtin): Declare.
* gimple-fold.c (gimple_fold_stmt_to_constant_1): Add folding of
internal functions.
* ubsan.c (PROB_VERY_UNLIKELY): Don't define here.
(ubsan_build_overflow_builtin): New function.
(instrument_si_overflow): Likewise.
(ubsan_pass): Add signed integer overflow checking.
(gate_ubsan): Enable the pass also when SANITIZE_SI_OVERFLOW.
* flag-types.h (enum sanitize_code): Add SANITIZE_SI_OVERFLOW.
* internal-fn.c: Include ubsan.h and target.h.
(ubsan_expand_si_overflow_addsub_check): New function.
(ubsan_expand_si_overflow_neg_check): Likewise.
(ubsan_expand_si_overflow_mul_check): Likewise.
(expand_UBSAN_CHECK_ADD): Likewise.
(expand_UBSAN_CHECK_SUB): Likewise.
(expand_UBSAN_CHECK_MUL): Likewise.
* fold-const.c (fold_binary_loc): Don't fold A + (-B) - A - B and
(-A) + B - B - A when doing the signed integer overflow checking.
* internal-fn.def (UBSAN_CHECK_ADD, UBSAN_CHECK_SUB, UBSAN_CHECK_MUL):
Define.
* tree-vrp.c (extract_range_basic): Handle internal calls.
* optabs.def (addv4_optab, subv4_optab, mulv4_optab, negv4_optab): New
optabs.
* asan.c: Include predict.h.
(PROB_VERY_UNLIKELY, PROB_ALWAYS): Don't define here.
* predict.c: Move the PROB_* macros...
* predict.h (enum br_predictor): ...here.
(PROB_LIKELY, PROB_UNLIKELY): Define.
* trans-mem.c: Include predict.h.
(PROB_VERY_UNLIKELY, PROB_ALWAYS, PROB_VERY_LIKELY,
PROB_LIKELY, PROB_UNLIKELY): Don't define here.
c-family/
* c-gimplify.c (c_gimplify_expr): If doing the integer-overflow
sanitization, call unsigned_type_for only when !TYPE_OVERFLOW_WRAPS.
testsuite/
* c-c++-common/ubsan/overflow-mul-2.c: New test.
* c-c++-common/ubsan/overflow-add-1.c: New test.
* c-c++-common/ubsan/overflow-add-2.c: New test.
* c-c++-common/ubsan/overflow-mul-1.c: New test.
* c-c++-common/ubsan/overflow-sub-1.c: New test.
* c-c++-common/ubsan/overflow-sub-2.c: New test.
* c-c++-common/ubsan/overflow-negate-1.c: New test.
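
(A minimal example, analogous to the overflow-add-* tests above rather
than copied from them, of code the new instrumentation diagnoses at run
time under -fsanitize=signed-integer-overflow:)

int
main (void)
{
  volatile int x = __INT_MAX__;
  x += 1;  /* signed overflow, caught by the UBSAN_CHECK_ADD runtime check */
  return 0;
}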

--- gcc/opts.c.mp   2013-12-04 12:15:33.517905987 +0100
+++ gcc/opts.c  2013-12-04 12:15:39.640929478 +0100
@@ -1460,6 +1460,8 @@ common_handle_option (struct gcc_options
	  { "vla-bound", SANITIZE_VLA, sizeof "vla-bound" - 1 },
	  { "return", SANITIZE_RETURN, sizeof "return" - 1 },
	  { "null", SANITIZE_NULL, sizeof "null" - 1 },
+	  { "signed-integer-overflow", SANITIZE_SI_OVERFLOW,
+	    sizeof "signed-integer-overflow" - 1 },
	  { NULL, 0, 0 }
};
const char *comma;
--- gcc/predict.h.mp2013-12-04 12:15:33.520905999 +0100
+++ gcc/predict.h   2013-12-04 12:15:39.645929498 +0100
@@ -20,6 +20,16 @@ along with GCC; see the file COPYING3.
 #ifndef GCC_PREDICT_H
 #define GCC_PREDICT_H
 
+/* Random guesstimation given names.
+   PROB_VERY_UNLIKELY should be small enough so basic block predicted
+   by it gets below HOT_BB_FREQUENCY_FRACTION.  */
+#define PROB_VERY_UNLIKELY (REG_BR_PROB_BASE / 2000 - 1)
+#define PROB_EVEN  (REG_BR_PROB_BASE / 2)
+#define PROB_VERY_LIKELY   (REG_BR_PROB_BASE - PROB_VERY_UNLIKELY)
+#define PROB_ALWAYS (REG_BR_PROB_BASE)
+#define PROB_UNLIKELY   (REG_BR_PROB_BASE / 5 - 1)
+#define PROB_LIKELY (PROB_ALWAYS - PROB_VERY_LIKELY)
+
 #define DEF_PREDICTOR(ENUM, NAME, HITRATE, FLAGS) ENUM,
 enum br_predictor
 {
--- gcc/c-family/c-gimplify.c.mp2013-12-04 12:15:33.506905939 +0100
+++ gcc/c-family/c-gimplify.c   2013-12-04 12:15:39.598929297 +0100
@@ -199,7 +199,9 @@ c_gimplify_expr (tree *expr_p, gimple_se
tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 0));
	if (INTEGRAL_TYPE_P (type) && c_promoting_integer_type_p (type))
  {
-   if 

Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)

2013-12-04 Thread Uros Bizjak
On Wed, Dec 4, 2013 at 2:44 PM, Marek Polacek pola...@redhat.com wrote:
 And this is the i?86 specific part of -fsanitize=signed-integer-overflow,
 split out of the huge patch.  It really is dependent on the generic
 parts; when committing, I'll put both parts together.

Just a question (I will review the patch later today): shouldn't
generic parts also work without new target patterns and use __addv*
stuff from libgcc when patterns are not present?

Uros.


Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)

2013-12-04 Thread Jakub Jelinek
On Wed, Dec 04, 2013 at 02:52:25PM +0100, Uros Bizjak wrote:
 On Wed, Dec 4, 2013 at 2:44 PM, Marek Polacek pola...@redhat.com wrote:
  And this is the i?86 specific part of -fsanitize=signed-integer-overflow,
  split out of the huge patch.  It really is dependent on the generic
  parts; when committing, I'll put both parts together.
 
 Just a question (I will review the patch later today): shouldn't
 generic parts also work without new target patterns and use __addv*
 stuff from libgcc when patterns are not present?

They work (except for multiplication checking with widest supported mode, to
be supported later), but they can't use __addv* and co., because those
functions __builtin_trap () on overflow, while for
-fsanitize=signed-integer-overflow, if we wanted a library solution, we'd
need library functions that would return us both result and bool whether
overflow happened.  As addition/subtraction/negation overflow checking
is short and easily inlinable, it is always done inline now, and
for multiplication the code right now expands WIDEN_MULT_EXPR if possible.

Note that using get_range_info the generic expansion could be supposedly
improved, for add/sub we right now at runtime compare op1 against zero
and do one thing if it is negative and another if non-negative.
If VRP info tells us that either op0 or op1 is known to be non-negative
or known to be negative, we could just simplify the expansion.
I guess similarly for the multiplication, but after all, I think the VRP
info could be useful even for normal multiplication expansion, e.g. if
we want to do a WIDEN_MULT_EXPR, but know that given the operand ranges
we can actually do a MULT_EXPR only and then just sign/zero extend the
result, that will likely be cheaper.

If VRP figures out there will never be an overflow, then we already optimize
the UBSAN_* internal builtins into normal PLUS_EXPR etc.
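
To make the shape of that inline check concrete, here is a minimal C sketch
(not GCC's actual RTL expansion; the helper name is made up) of the
sign-dependent comparison described above, for checked signed addition:

    /* Compute op0 + op1 in unsigned arithmetic (well defined on wrap),
       then detect signed overflow from the sign of op1.  If VRP proves
       the sign of op1, one of the two arms can be dropped at expansion
       time.  */
    static int
    add_overflows (int op0, int op1, int *res)
    {
      unsigned int u = (unsigned int) op0 + (unsigned int) op1;
      *res = (int) u;
      return op1 >= 0 ? *res < op0 : *res > op0;
    }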

Jakub


Re: [PATCH] Fix --with-long-double-128 for sparc32 when defaulting to 64-bit

2013-12-04 Thread Aurelien Jarno
On Wed, Dec 04, 2013 at 08:53:50AM +0100, Jakub Jelinek wrote:
 On Wed, Dec 04, 2013 at 08:49:32AM +0100, Aurelien Jarno wrote:
  On sparc, the --with-long-double-128 option doesn't change anything for
  a 64-bit compiler, as it always defaults to 128-bit long doubles. For
  a 32/64-bit compiler defaulting to 32-bit this correctly controls the
  size of long double of the 32-bit compiler, however for a 32/64-bit
  compiler defaulting to 64-bit, the built-in specs force the 
  -mlong-double-64 option. This makes the option useless in this case.
  
  The patch below fixes that by removing the -mlong-double-64 from the
  built-in spec, using the default instead.
 
 So how do you configure 64/32-bit compiler defaulting to 64-bit, where
 32-bit defaults to -mlong-double-64?

Naively I would have said by *not* passing --with-long-double-128 to
configure, like for a 64/32-bit compiler defaulting to 32-bit, but it
still defaults to 128-bit long doubles with my patch.

Actually it's also the case for a 64/32-bit compiler defaulting to
32-bit, which makes the --with-long-double-128 option completely useless
on sparc64. Whatever the option, the result would always be the same
with the current SVN:

64/32-bit compiler defaulting to 32-bit:
- 128-bit long doubles for -m32
- 128-bit long doubles for -m64

64/32-bit compiler defaulting to 64-bit:
- 64-bit long doubles for -m32
- 128-bit long doubles for -m64

I have to dig a bit more to see how to fix that, but even the current
code is not really consistent.

  Changelog gcc/
  
  2013-12-04  Aurelien Jarno  aurel...@aurel32.net
  
  * config/sparc/linux64.h (CC1_SPEC): When defaulting to 64-bit,
  don't force -mlong-double-64 when -m32 or -mv8plus is given.
  
  Index: gcc/config/sparc/linux64.h
  ===
  --- gcc/config/sparc/linux64.h  (revision 205647)
  +++ gcc/config/sparc/linux64.h  (working copy)
  @@ -162,9 +162,9 @@
   #else
   #define CC1_SPEC %{profile:-p} \
   %{m32:%{m64:%emay not use both -m32 and -m64}} \
  -%{m32:-mptr32 -mno-stack-bias %{!mlong-double-128:-mlong-double-64} \
  +%{m32:-mptr32 -mno-stack-bias \
 %{!mcpu*:-mcpu=cypress}} \
  -%{mv8plus:-mptr32 -mno-stack-bias %{!mlong-double-128:-mlong-double-64} \
  +%{mv8plus:-mptr32 -mno-stack-bias \
 %{!mcpu*:-mcpu=v9}} \
   %{!m32:%{!mcpu*:-mcpu=ultrasparc}} \
   %{!mno-vis:%{!m32:%{!mcpu=v9:-mvis}}} \
 
   Jakub
 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net


Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)

2013-12-04 Thread Marek Polacek
On Wed, Dec 04, 2013 at 02:52:25PM +0100, Uros Bizjak wrote:
 On Wed, Dec 4, 2013 at 2:44 PM, Marek Polacek pola...@redhat.com wrote:
  And this is the i?86 specific part of -fsanitize=signed-integer-overflow,
  split out of the huge patch.  It really is dependent on the generic
  parts, when commiting, I'll put both parts together.
 
 Just a question (I will review the patch later today): shouldn't

Perfect, thanks!

 generic parts also work without new target patterns and use __addv*
 stuff from libgcc when patterns are not present?

If we can't use target patterns, we fall back to generic
implementation, using emit_cmp_and_jump_insns/emit_jump etc.
This generic implementation is indeed modelled after libgcc routines.

Marek


.cfi in sanitizer code

2013-12-04 Thread Konstantin Serebryany
[new subject. was: libsanitizer merge from upstream r196090]

 .cfi is used only in tsan sources now, and tsan is not supported
 anywhere but x86_64

 But the .cfi_* issue is platform independent.  Whether the compiler
 decides to emit them or not depends on how it was configured, on assembler
 and on compiler flags.
 I don't see how it can be a maintenance problem to just guard the few
 (right now two) .cfi_* occurrences in the C++ sources, or using CFI_* macros
 instead of .cfi_* directives directly in the assembly file.
 Other projects (e.g. glibc) manage to do that for years without any trouble.

This is a maintenance problem because we can not test if we broke
something during development.
e.g. clang doesn't seem to support -fno-dwarf2-cfi-asm
Then, if we get notified about the problem we spend 10x more time
fixing it because
1. the context is different
2. the patch you or other GCC folks send applies to GCC tree while we
need to apply it to LLVM
  (e.g. your patch has tsan/tsan_rtl.h but our tree has
tsan/rtl/tsan_rtl.h and even with that fixed it does not apply)
3. we still can't easily verify the fix.

I can commit a change similar to your cfi-related changes
(guarded by SANITIZER_DONT_USE_CFI_ASM instead of
__GCC_HAVE_DWARF2_CFI_ASM), but the problem will arise again

--kcc


Re: [PATCH] Add signed integer overflow checking to ubsan

2013-12-04 Thread Marek Polacek
On Tue, Dec 03, 2013 at 02:14:17PM -0700, Jeff Law wrote:
 Perhaps split this patch into two parts which can be reviewed
 independently, but go into the tree at the same time.  The obvious
 hope would be that Uros or one of the other x86 backend folks could
 chime in on that part.

I posted the i?86 bits separately.

 --- gcc/ubsan.h.mp   2013-11-27 08:46:28.046629473 +0100
 +++ gcc/ubsan.h  2013-11-27 08:46:57.578753342 +0100
 @@ -21,6 +21,12 @@ along with GCC; see the file COPYING3.
   #ifndef GCC_UBSAN_H
   #define GCC_UBSAN_H
 
 +/* From predict.c.  */
 +#define PROB_VERY_UNLIKELY  (REG_BR_PROB_BASE / 2000 - 1)
 +#define PROB_EVEN   (REG_BR_PROB_BASE / 2)
 +#define PROB_VERY_LIKELY(REG_BR_PROB_BASE - PROB_VERY_UNLIKELY)
 +#define PROB_ALWAYS (REG_BR_PROB_BASE)
 Seems like this should factor out rather than get duplicated.

I moved them all into predict.h; the users now include predict.h.

 --- gcc/gimple-fold.c.mp 2013-11-27 08:46:27.979629191 +0100
 +++ gcc/gimple-fold.c2013-11-27 08:46:57.556753251 +0100
 @@ -2660,8 +2660,30 @@ gimple_fold_stmt_to_constant_1 (gimple s
  tree fn;
 
  if (gimple_call_internal_p (stmt))
 -  /* No folding yet for these functions.  */
 -  return NULL_TREE;
 +  {
 +enum tree_code subcode = ERROR_MARK;
 +switch (gimple_call_internal_fn (stmt))
 +  {
 +  case IFN_UBSAN_CHECK_ADD: subcode = PLUS_EXPR; break;
 +  case IFN_UBSAN_CHECK_SUB: subcode = MINUS_EXPR; break;
 +  case IFN_UBSAN_CHECK_MUL: subcode = MULT_EXPR; break;
 Minor detail, put the case value and associated codes on separate lines.
 
   case FU:
 code;
 more code
 break;
   case BAR
 blah;
 break;

Done.
 
 --- gcc/tree-vrp.c.mp2013-11-27 08:46:28.043629459 +0100
 +++ gcc/tree-vrp.c   2013-11-27 08:46:57.570753307 +0100
 @@ -3757,6 +3757,40 @@ extract_range_basic (value_range_t *vr,
break;
  }
   }
 +  else if (is_gimple_call (stmt)
  +	   && gimple_call_internal_p (stmt))
 +{
 +  enum tree_code subcode = ERROR_MARK;
 +  switch (gimple_call_internal_fn (stmt))
 +{
 +case IFN_UBSAN_CHECK_ADD: subcode = PLUS_EXPR; break;
 +case IFN_UBSAN_CHECK_SUB: subcode = MINUS_EXPR; break;
 +case IFN_UBSAN_CHECK_MUL: subcode = MULT_EXPR; break;
 +default: break;
 Formatting again.
 
Done.
 
 Overall the stuff outside the i386 directory looks pretty good,
 though it needs some minor updates.  I'd suggest extracting the i386
 bits and pinging them as a separate patch in the hope that we'll get
 Uros's attention.

Done, I posted a split version of the patch.  Thanks for the review.

Marek


Re: .cfi in sanitizer code

2013-12-04 Thread Jakub Jelinek
On Wed, Dec 04, 2013 at 06:09:56PM +0400, Konstantin Serebryany wrote:
 This is a maintenance problem because we can not test if we broke
 something during development.
 e.g. clang doesn't seem to support -fno-dwarf2-cfi-asm

It does, at least both clang 3.3 (from Fedora 19) and clang
3.4 r194685 (which I've built myself some time ago just to look at the
use-after-return etc. sanitization).

 I can commit a change similar to your cfi-related changes
 (guarded by SANITIZER_DONT_USE_CFI_ASM instead of
 __GCC_HAVE_DWARF2_CFI_ASM), but the problem will arise again

Why?  Is it so hard to remember that when you add .cfi_* directives
they should be guarded by that macro?  Even if the patch author
forgets about that, patch reviewer should catch that.

Jakub
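
As an illustration of the guard being discussed -- the CFI() macro and the
asm body below are hypothetical, not tsan's actual code; only
__GCC_HAVE_DWARF2_CFI_ASM is a real GCC predefine (absent under
-fno-dwarf2-cfi-asm):

    /* Emit .cfi_* from inline asm only when the compiler itself emits
       CFI directives for the surrounding code.  */
    #ifdef __GCC_HAVE_DWARF2_CFI_ASM
    # define CFI(str) str
    #else
    # define CFI(str) ""
    #endif

    #define PUSH_RBP "push %%rbp\n" CFI(".cfi_adjust_cfa_offset 8\n")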


Re: libsanitizer merge from upstream r196090

2013-12-04 Thread Konstantin Serebryany
On Wed, Dec 4, 2013 at 5:44 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Dec 04, 2013 at 05:28:40PM +0400, Konstantin Serebryany wrote:
  Well, for the kernel headers what we perhaps can do is just add
  libsanitizer/include/linux/ tree that will be maintained by GCC and will

 if that works for you, no objections.

 I haven't tried to do that yet, so don't know how much work it will be,
 but at least from the second patch posted recently it it might work fine, at
 least for now.

 .cfi is used only in tsan sources now, and tsan is not supported
 anywhere but x86_64

 But the .cfi_* issue is platform independent.  Whether the compiler
 decides to emit them or not depends on how it was configured, on assembler
 and on compiler flags.
 I don't see how it can be a maintenance problem to just guard the few
 (right now two) .cfi_* occurrences in the C++ sources, or using CFI_* macros
 instead of .cfi_* directives directly in the assembly file.
 Other projects (e.g. glibc) manage to do that for years without any trouble.

replied separately.


 ppc32 never worked (last time I tried there were several different
 issues so we disabled 32-bit build)
 -- we should just disable it in GCC too. There is no value in
 building code that does not run.

 That doesn't mean it can't be made to work, and the patch I've posted is
 at least an (IMHO correct) step towards that.

Sure it can. But all my previous grumbling about maintenance cost and
our inability to test changes, etc applies here.

   Note, I had just much bigger
 problems on ppc64 with the addr2line symbolization because of the ppc64
 opd/plt stuff, though supposedly that might go away once I patch
 libsanitizer to use libbacktrace for symbolization.
 There is no inherent reason why ppc32 wouldn't work and ppc64 would, after
 all ppc64 with its weirdo function descriptor stuff is much harder to
 support.

  Regarding the TLS size, can you e.g. just only do it for glibc 2.13 and
  later, rather than having (even for glibc 2.11/2.12 incorrect) values
  for older glibcs?

 That would work for me, although it may bring some surprises later.
 If we incorrectly compute the tls boundaries, lsan may produce false
 positives or false negatives.

 But is that solely for lsan and nothing else?

Mmm. I *think* yes, today this is lsan-only.

 Because, the assertion
 was failing in asan tests, without any asan options to request leak
 checking.  And for non-i?86/x86_64 you ignore the tls boundaries too.

My patch above should remove the assertion on < 2.13


 Having kThreadDescriptorSize=0 means that we include the stack
 descriptor in the lsan's root set and thus
 may miss a leak (with rather low probability). I can live with this.

 Like this (tested only on my box)?

 --- sanitizer_linux_libcdep.cc  (revision 196375)
 +++ sanitizer_linux_libcdep.cc  (working copy)
 @@ -207,12 +207,12 @@

  #if defined(__x86_64__) || defined(__i386__)
  // sizeof(struct thread) from glibc.
 -// There has been a report of this being different on glibc 2.11 and 2.13.  We
 -// don't know when this change happened, so 2.14 is a conservative estimate.
 -#if __GLIBC_PREREQ(2, 14)
 +// This may change between glibc versions; we only support the versions we
 +// know about (>= 2.13).  For others we set kThreadDescriptorSize to 0.
 +#if __GLIBC_PREREQ(2, 13)
  const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1216, 2304);
  #else
 -const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1168, 2304);
 +const uptr kThreadDescriptorSize = 0;  // Unknown.

 Depends (as I've asked earlier) on whether you need the exact precise
 value or if, say, a conservatively smaller value is fine.  Then you could
 say for glibc >= 2.5 pick the minimum of the values I've gathered.

precise is better, otherwise we may lose leaks.


 Jakub


Re: [wwwdocs] Update obvious fix commit policy

2013-12-04 Thread Diego Novillo
On Tue, Dec 3, 2013 at 6:55 PM, Gerald Pfeifer ger...@pfeifer.com wrote:
 On Thu, 28 Nov 2013, Richard Biener wrote:
 Why remove ChangeLog files, web pages and comments?

 I was going to complain about web pages being removed. :-)

 On Thu, 28 Nov 2013, Diego Novillo wrote:
 -<p>Fixes for obvious typos in ChangeLog files, docs, web pages, comments
 -and similar stuff.  Just check in the fix and copy it to
 -<code>gcc-patches</code>.  We don't want to get overly anal-retentive
 -about checkin policies.</p>
 +<p>Obvious fixes can be committed without prior approval.  Just check
 +in the fix and copy it to <code>gcc-patches</code>.  A good test to
 +determine whether a fix is obvious: <q>will the person who objects to
 +my work the most be able to find a fault with my fix?</q>  If the fix
 +is later found to be faulty, it can always be rolled back.  We don't
 +want to get overly restrictive about checkin policies.</p>

 I am in favor of this change.

 To some extent, this is more a clarification of what I have seen as
 our current policy than a change in policy, though to a lawyer-minded
 person it surely looks like the latter.  Not sure what kind of approval
 this needs?  Mine it has.

I have not received any feedback against this change. I will wait
another 48 hours and commit.


Diego.


[PATCH][ARM][3/3] Implement crypto intrinsics in AArch32 ARMv8-A - documentation

2013-12-04 Thread Kyrill Tkachov

Hi all,

This is the final patch in the series, adding the documentation for the 
intrinsics. Most of it is autogenerated from neon-docgen.ml and the ones that 
are not are added explicitly in neon-docgen.ml so that they appear in the 
generated .texi file. Not much else to say on this patch.


Ok for trunk?

Thanks,
Kyrill

2013-12-04  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/arm/neon-docgen.ml: Add crypto intrinsics documentation.
* doc/arm-neon-intrinsics.texi: Regenerate.
diff --git a/gcc/config/arm/neon-docgen.ml b/gcc/config/arm/neon-docgen.ml
index f17314f..41ae059 100644
--- a/gcc/config/arm/neon-docgen.ml
+++ b/gcc/config/arm/neon-docgen.ml
@@ -36,8 +36,8 @@
 
 open Neon
 
-(* The combined ops and reinterp table.  *)
-let ops_reinterp = reinterp @ ops
+(* The combined ops and reinterp tables.  *)
+let ops_reinterp = reinterp @ reinterpq @ ops
 
 (* Helper functions for extracting things from the ops table.  *)
 let single_opcode desired_opcode () =
@@ -329,6 +329,77 @@ let gnu_header chan =
   @c This file is generated automatically using gcc/config/arm/neon-docgen.ml;
   @c Please do not edit manually.]
 
+let crypto_doc =
+"
+@itemize @bullet
+@item poly128_t vldrq_p128(poly128_t const *)
+@end itemize
+
+@itemize @bullet
+@item void vstrq_p128(poly128_t *, poly128_t)
+@end itemize
+
+@itemize @bullet
+@item uint32_t vsha1h_u32 (uint32_t)
+@*@emph{Form of expected instruction(s):} @code{sha1h.32 @var{q0}, @var{q1}}
+@end itemize
+
+@itemize @bullet
+@item uint32x4_t vsha1cq_u32 (uint32x4_t, uint32_t, uint32x4_t)
+@*@emph{Form of expected instruction(s):} @code{sha1c.32 @var{q0}, @var{q1}, @var{q2}}
+@end itemize
+
+@itemize @bullet
+@item uint32x4_t vsha1pq_u32 (uint32x4_t, uint32_t, uint32x4_t)
+@*@emph{Form of expected instruction(s):} @code{sha1p.32 @var{q0}, @var{q1}, @var{q2}}
+@end itemize
+
+@itemize @bullet
+@item uint32x4_t vsha1mq_u32 (uint32x4_t, uint32_t, uint32x4_t)
+@*@emph{Form of expected instruction(s):} @code{sha1m.32 @var{q0}, @var{q1}, @var{q2}}
+@end itemize
+
+@itemize @bullet
+@item uint32x4_t vsha1su0q_u32 (uint32x4_t, uint32x4_t, uint32x4_t)
+@*@emph{Form of expected instruction(s):} @code{sha1su0.32 @var{q0}, @var{q1}, @var{q2}}
+@end itemize
+
+@itemize @bullet
+@item uint32x4_t vsha1su1q_u32 (uint32x4_t, uint32x4_t)
+@*@emph{Form of expected instruction(s):} @code{sha1su1.32 @var{q0}, @var{q1}, @var{q2}}
+@end itemize
+
+@itemize @bullet
+@item uint32x4_t vsha256hq_u32 (uint32x4_t, uint32x4_t, uint32x4_t)
+@*@emph{Form of expected instruction(s):} @code{sha256h.32 @var{q0}, @var{q1}, @var{q2}}
+@end itemize
+
+@itemize @bullet
+@item uint32x4_t vsha256h2q_u32 (uint32x4_t, uint32x4_t, uint32x4_t)
+@*@emph{Form of expected instruction(s):} @code{sha256h2.32 @var{q0}, @var{q1}, @var{q2}}
+@end itemize
+
+@itemize @bullet
+@item uint32x4_t vsha256su0q_u32 (uint32x4_t, uint32x4_t)
+@*@emph{Form of expected instruction(s):} @code{sha256su0.32 @var{q0}, @var{q1}}
+@end itemize
+
+@itemize @bullet
+@item uint32x4_t vsha256su1q_u32 (uint32x4_t, uint32x4_t, uint32x4_t)
+@*@emph{Form of expected instruction(s):} @code{sha256su1.32 @var{q0}, @var{q1}, @var{q2}}
+@end itemize
+
+@itemize @bullet
+@item poly128_t vmull_p64 (poly64_t a, poly64_t b)
+@*@emph{Form of expected instruction(s):} @code{vmull.p64 @var{q0}, @var{d1}, @var{d2}}
+@end itemize
+
+@itemize @bullet
+@item poly128_t vmull_high_p64 (poly64x2_t a, poly64x2_t b)
+@*@emph{Form of expected instruction(s):} @code{vmull.p64 @var{q0}, @var{d1}, @var{d2}}
+@end itemize
+"
+
 (* Program entry point.  *)
 let _ =
  if Array.length Sys.argv <> 2 then
@@ -339,6 +410,7 @@ let _ =
   let chan = open_out file in
 gnu_header chan;
 List.iter (document_group chan) intrinsic_groups;
+Printf.fprintf chan "%s\n" crypto_doc;
 close_out chan
  with Sys_error sys ->
    failwith ("Could not create output file " ^ file ^ ": " ^ sys)

[PATCH][ARM][0/3] Implement crypto intrinsics in AArch32 ARMv8-A

2013-12-04 Thread Kyrill Tkachov

Hi all,

This patch series implements the new arm_neon.h intrinsics that map down to the 
ARMv8-A cryptographic instructions. The instructions are considered to be part 
of NEON and they can be enabled by specifying -mfpu=crypto-neon-fp-armv8 (of 
course we still need the hard or softfp float ABI).


Two of the intrinsics: vmull_p64 and vmull_high_p64 use the new poly64_t and 
poly128_t types and therefore these patches also add support for these types and 
most of the intrinsics associated with creating, reinterpreting, loading, 
storing and extracting these types. Most of these auxiliary intrinsics are 
autogenerated from the neon.ml scripts in the arm backend, but some had to be 
hardcoded because they don't follow a regular pattern.
Note that these types and intrinsics are not available unless you specify the 
crypto-neon-fp-armv8 FPU.


The __ARM_FEATURE_CRYPTO feature test macro is defined and is used throughout 
arm_neon.h to gate the new types and intrinsics.



Patches 2 and 3 add the testsuite and documentation respectively. Most of it is 
autogenerated.


Bootstrapped on arm-none-linux-gnueabihf and tested on a model.

Note, this patch series' context depends on the CRC32 intrinsics patch that is 
in review at:

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02351.html

Thanks,
Kyrill


P.S. These patches only touch the arm backend and do not affect any other parts 
of the compiler.





Re: [patch] combine ICE fix

2013-12-04 Thread Kenneth Zadeck

On 12/03/2013 02:38 PM, Jeff Law wrote:

On 12/03/13 12:25, Kenneth Zadeck wrote:

On 12/03/2013 01:52 PM, Mike Stump wrote:

On Dec 2, 2013, at 10:26 PM, Jeff Law l...@redhat.com wrote:

On 11/27/13 17:13, Cesar Philippidis wrote:
I looked into adding support for incremental DF scanning from 
df*.[ch]

in combine but there are a couple of problems. First of all, combine
does its own DF analysis. It does so because its usage falls under 
this

category (df-core.c):

c) If the pass modifies insns several times, this incremental
   updating may be expensive.

Furthermore, combine's DF relies on the DF scanning to be 
deferred, so
the DF_REF_DEF_COUNT values would be off. Eg, calls to 
SET_INSN_DELETED

take place before it updates the notes for those insns. Also, combine
has a tendency to undo its changes occasionally.

I think at this stage of the release cycle, converting combine to
incremental DF is probably a no-go.  However, we should keep it in
mind for the future -- while hairy I'd really like to see that happen
in the long term.

I think Kenny has some thoughts in this area.  I'll cc him to ensure
he sees it.

it is the tendency to undo things (i would use the word frequently
rather than occasionally) that kept me from doing this years ago.
Shove a bunch of things together, simplify, then try to recognize the 
result.  If that fails, undo everything.


In theory, this could be replaced by making a copy of the original, 
doing the combination/simplification, then recognition. If successful, 
then update DF and remove the original I3, if not successful, drop the 
copy.  That avoids the undo nonsense.


jeff

that could certainly work.


Re: [PATCH/AARCH64 3/6] Fix up multi-lib options

2013-12-04 Thread Yufeng Zhang

Looks good to me, but I cannot approve it.

Yufeng

On 12/03/13 21:24, Andrew Pinski wrote:


Hi,
   The arguments to --with-multilib-list for AARCH64 are exclusive, but 
currently they are being treated as though they were not.  This causes problems 
in that we get four library sets with --with-multilib-list=lp64,ilp32: empty, 
lp64, ilp32 and lp64/ilp32.  The first and last ones do not make sense and 
should not be there.

This patch changes the definition of MULTILIB_OPTIONS so we have a / in 
between the options rather than a space, as illustrated below.
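
For example (an illustrative expansion, not part of the patch), with
TM_MULTILIB_CONFIG = lp64,ilp32 the old and new definitions give:

    old: MULTILIB_OPTIONS = mabi=lp64 mabi=ilp32
         (space = combinable: builds <empty>, lp64, ilp32 and lp64/ilp32)
    new: MULTILIB_OPTIONS = mabi=lp64/mabi=ilp32
         (slash = mutually exclusive: builds only lp64 and ilp32)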

OK?  Build and tested on aarch64-elf with both --with-multilib-list=lp64,ilp32 
and without it.

Thanks,
Andrew Pinski

* config/aarch64/t-aarch64 (MULTILIB_OPTIONS): Fix definition so
that options are conflicting ones.
---
  gcc/ChangeLog|2 +-
  gcc/config/aarch64/t-aarch64 |2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index 9f8d8cd..98a30d8 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -41,5 +41,5 @@ aarch-common.o: $(srcdir)/config/arm/aarch-common.c 
$(CONFIG_H) $(SYSTEM_H) \
$(srcdir)/config/arm/aarch-common.c

  comma=,
-MULTILIB_OPTIONS    = $(patsubst %, mabi=%, $(subst $(comma), ,$(TM_MULTILIB_CONFIG)))
+MULTILIB_OPTIONS    = $(subst $(comma),/, $(patsubst %, mabi=%, $(subst $(comma),$(comma)mabi=,$(TM_MULTILIB_CONFIG))))
  MULTILIB_DIRNAMES   = $(subst $(comma), ,$(TM_MULTILIB_CONFIG))





[commited] Fix up testcase

2013-12-04 Thread Marek Polacek
I'm applying the following as obvious, GCC 4.7 doesn't grok -Wpedantic.
Sorry for not testing that properly.

2013-12-04  Marek Polacek  pola...@redhat.com

PR c/59351
testsuite/
* gcc.dg/pr59351.c: Use -pedantic instead of -Wpedantic.

--- gcc/testsuite/gcc.dg/pr59351.c.mp3  2013-12-04 16:49:17.232824975 +0100
+++ gcc/testsuite/gcc.dg/pr59351.c  2013-12-04 16:49:30.380873769 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-std=c99 -Wpedantic" } */
+/* { dg-options "-std=c99 -pedantic" } */
 
 unsigned int
 foo (void)

Marek


Re: libsanitizer merge from upstream r196090

2013-12-04 Thread FX
  Well, it regresses against 4.8, so it still is a P1 regression.
 
 Does anyone care?


Well, you’re one of the maintainers of libsanitizer for GCC, so if you do not 
care about regressions in your code, it makes little sense for GCC (the whole 
project) to keep libsanitizer.

I’ve posted this regression a month ago, it was not addressed. I’m not sure 
under what specific arrangement libsanitizer was added to GCC, but in general 
there is a responsibility of maintainers not to break bootstrap in their code. 
Yes, it’s a cost, and if you are not willing to do it, why did you contribute 
in the first place?

Or is it a “hit and run” approach to maintainership?

FX

Re: [Patch, RTL] Eliminate redundant vec_select moves.

2013-12-04 Thread Tejas Belagod

Richard Sandiford wrote:

Tejas Belagod tbela...@arm.com writes:

Richard Sandiford wrote:

Tejas Belagod tbela...@arm.com writes:

The problem is that one reg rtx can span several hard registers.
E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32),
but it might instead represent two 32-bit registers (nos. 32 and 33).
Obviously the latter's not very likely for vectors this small,
but more likely for larger ones (including on NEON IIRC).

So if we had 2 32-bit registers being treated as a V4HI, it would be:

   <--32--><--33-->
   msb          lsb

   msb          lsb
   <------32------>

for big endian and:

   <--33--><--32-->
   msb          lsb

   msb          lsb
   <------32------>

for little endian.

Ah, ok, that makes things clearer. Thanks for that.

I can't find any helper function that figures out if we're writing
partial or
full result regs. Would something like

 REGNO (src) == REGNO (dst)
 && HARD_REGNO_NREGS (src) == HARD_REGNO_NREGS (dst) == 1

be a sane check for partial result regs?

Yeah, that should work.  I think a more general alternative would be:

  simplify_subreg_regno (REGNO (src), GET_MODE (src),
 offset, GET_MODE (dst)) == (int) REGNO (dst)

where:

  offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP (sel, 0, 0))

That offset is the byte offset of the first selected element from the
start of a vector in memory, which is also the way that SUBREG_BYTEs
are counted.  For little-endian it gives the offset of the lsb of the
slice, while for big-endian it gives the offset of the msb (which is
also how SUBREG_BYTEs work).

The simplify_subreg_regno should cope with both single-register vectors
and multi-register vectors.
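
As a worked instance of that formula (illustrative numbers, not from the
patch): for (vec_select:SF (reg:V4SF 32) (parallel [(const_int 3)])) we have
GET_MODE_UNIT_SIZE (V4SFmode) == 4, so offset = 4 * 3 = 12, and
simplify_subreg_regno (32, V4SFmode, 12, SFmode) names the hard register
holding bytes 12..15 of the vector; the set is a no-op exactly when that
result equals REGNO (dst).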

Sorry for the delayed response to this.

Thanks for the tip. Here's an improved patch that implements the 
simplify_subreg_regno () method of eliminating redundant moves.  Regarding the 
test case, I failed to get the ppc back-end to generate the RTL pattern that this 
patch checks for. I can easily write a test case for aarch64(big and little 
endian) on these lines


typedef float float32x4_t __attribute__ ((__vector_size__ (16)));

float foo_be (float32x4_t x)
{
   return x[3];
}

float foo_le (float32x4_t x)
{
   return x[0];
}

where I know that the vector indexing will generate a vec_select on
the same src and dst regs that could be optimized away and hence test
it. But I'm struggling to get a test case that the ppc altivec
back-end will generate such a vec_select for. I see that altivec does
not define vec_extract, so a simple indexing like this seems to happen
via memory. Also, I don't know enough about the ppc PCS or
architecture to write a test that will check for this optimization
opportunity on same src and dst hard-registers. Any hints?


Me neither, sorry.

FWIW, the MIPS tests:

  typedef float float32x2_t __attribute__ ((__vector_size__ (8)));
  void bar (float);
  void foo_be (float32x2_t x) { bar (x[1]); }
  void foo_le (float32x2_t x) { bar (x[0]); }

also exercise it, but I don't think they add anything over the aarch64
versions.  I can add them to the testsuite anyway if it helps though.


diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 0cd0c7e..ca25ce5 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set)
   dst = SUBREG_REG (dst);
 }
 
+  /* It is a NOOP if destination overlaps with selected src vector
+     elements.  */
+  if (GET_CODE (src) == VEC_SELECT
+      && REG_P (XEXP (src, 0)) && REG_P (dst)
+      && HARD_REGISTER_P (XEXP (src, 0))
+      && HARD_REGISTER_P (dst))
+{
+  rtx par = XEXP (src, 1);
+  rtx src0 = XEXP (src, 0);
+  HOST_WIDE_INT offset =
+   GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0, 0));
+
+  return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
+   offset, GET_MODE (dst)) == (int)REGNO (dst);
+}
+


Since this also (correctly) triggers for vector results, we need to keep
the check for consecutive indices that you had originally.  (It's always
the first index that should be used for the simplify_subreg_regno though.)

Looks good to me otherwise, thanks.


Thanks Richard. Here is a revised patch. Sorry about the delay - I was 
investigating to make sure an LRA ICE I was seeing on aarch64 was unrelated to 
this patch. I've added a test case that I expect to pass for aarch64. I've also 
added the tests that you suggested for MIPS, but haven't checked for the target 
because I'm not sure what optimizations happen on MIPS.


OK for trunk?

Thanks,
Tejas.

2013-12-04  Tejas Belagod  tejas.bela...@arm.com

gcc/
* rtlanal.c (set_noop_p): Return nonzero in case of redundant vec_select
for overlapping register lanes.

testsuite/
* config/gcc.dg/vect/vect-nop-move.c: New.

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 0cd0c7e..e1388c8 100644
--- 

[PATCH, ARM] Implement __builtin_trap

2013-12-04 Thread Ian Bolton
Hi,

Currently, on ARM, you have to either call abort() or raise(SIGTRAP)
to achieve a handy crash.

This patch allows you to instead call __builtin_trap() which is much
more efficient at falling over because it becomes just a single
instruction that will trap for you.
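
For instance, a minimal usage sketch (not taken from the patch):

    /* With this patch, the call below compiles to one undefined
       instruction (.inst 0xe7f000f0 in ARM state, 0xdeff in Thumb)
       rather than a call into abort() or raise().  */
    void
    store_checked (int *p)
    {
      if (!p)
        __builtin_trap ();
      *p = 0;
    }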

Two testcases have been added (for ARM and Thumb) and both pass.


Note: This is a modified version of a patch originally submitted by Mark
Mitchell back in 2010, which came in response to PR target/59091.

http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00639.html
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59091

The main update, other than cosmetic differences, is that we've chosen
the same ARM encoding as LLVM for practical purposes.  (The Thumb
encoding in Mark's patch already matched LLVM.)


OK for trunk?

Cheers,
Ian


2013-12-04  Ian Bolton  ian.bol...@arm.com
Mark Mitchell  m...@codesourcery.com

gcc/
* config/arm/arm.md (trap): New pattern.
* config/arm/types.md: Added a type for trap.

testsuite/
* gcc.target/arm/builtin-trap.c: New test.
* gcc.target/arm/thumb-builtin-trap.c: Likewise.
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index dd73366..3b7a827 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9927,6 +9927,22 @@
(set_attr "type" "mov_reg")]
 )
 
+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 0))]
+  ""
+  "*
+  if (TARGET_ARM)
+    return \".inst\\t0xe7f000f0\";
+  else
+    return \".inst\\t0xdeff\";
+  "
+  [(set (attr "length")
+	(if_then_else (eq_attr "is_thumb" "yes")
+		      (const_int 2)
+		      (const_int 4)))
+   (set_attr "type" "trap")]
+)
+
 
 ;; Patterns to allow combination of arithmetic, cond code and shifts
 
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index 1c4b9e3..6351f08 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -152,6 +152,7 @@
 ; store2 store 2 words to memory from arm registers.
 ; store3 store 3 words to memory from arm registers.
 ; store4 store 4 (or more) words to memory from arm registers.
+; trap   cause a trap in the kernel.
 ; udiv   unsigned division.
 ; umaal  unsigned multiply accumulate accumulate long.
 ; umlal  unsigned multiply accumulate long.
@@ -645,6 +646,7 @@
   store2,\
   store3,\
   store4,\
+  trap,\
   udiv,\
   umaal,\
   umlal,\
diff --git a/gcc/testsuite/gcc.target/arm/builtin-trap.c 
b/gcc/testsuite/gcc.target/arm/builtin-trap.c
new file mode 100644
index 000..4ff8d25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/builtin-trap.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm32 } */
+
+void
+trap ()
+{
+  __builtin_trap ();
+}
+
+/* { dg-final { scan-assembler "0xe7f000f0" { target { arm_nothumb } } } } */
diff --git a/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c 
b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c
new file mode 100644
index 000..22e90e7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-mthumb" } */
+/* { dg-require-effective-target arm_thumb1_ok } */
+
+void
+trap ()
+{
+  __builtin_trap ();
+}
+
+/* { dg-final { scan-assembler "0xdeff" } } */


Re: [Patch, RTL] Eliminate redundant vec_select moves.

2013-12-04 Thread H.J. Lu
On Wed, Dec 4, 2013 at 8:06 AM, Tejas Belagod tbela...@arm.com wrote:
 Thanks Richard. Here is a revised patch. Sorry about the delay - I was
 investigating to make sure an LRA ICE I was seeing on aarch64 was unrelated
 to this patch. I've added a test case that I expect to pass for aarch64.
 I've also added the tests that you suggested for MIPS, but haven't checked
 for the target because I'm not sure what optimizations happen on MIPS.

 OK for trunk?

 Thanks,
 Tejas.

 2013-12-04  Tejas Belagod  tejas.bela...@arm.com


 gcc/
 * rtlanal.c (set_noop_p): Return nonzero in case of redundant
 vec_select
 for overlapping register lanes.

 testsuite/
 * config/gcc.dg/vect/vect-nop-move.c: New.


 diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
 index 0cd0c7e..e1388c8 100644
 --- a/gcc/rtlanal.c
 +++ b/gcc/rtlanal.c
 @@ -1180,6 +1180,26 @@ set_noop_p (const_rtx set)
dst = SUBREG_REG (dst);
  }

 +  /* It is a NOOP if destination overlaps with selected src vector
 +     elements.  */
 +  if (GET_CODE (src) == VEC_SELECT
 +      && REG_P (XEXP (src, 0)) && REG_P (dst)
 +      && HARD_REGISTER_P (XEXP (src, 0))
 +      && HARD_REGISTER_P (dst))
 +{
 +  int i;
 +  rtx par = XEXP (src, 1);
 +  rtx src0 = XEXP (src, 0);
 +  int c0 = INTVAL (XVECEXP (par, 0, 0));
 +  HOST_WIDE_INT offset = GET_MODE_UNIT_SIZE (GET_MODE (src0)) * c0;
 +
 +  for (i = 1; i < XVECLEN (par, 0); i++)
 +   if (INTVAL (XVECEXP (par, 0, i)) != c0 + i)
 + return 0;
 +  return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
 +   offset, GET_MODE (dst)) == (int)REGNO
 (dst);
 +}
 +
return (REG_P (src) && REG_P (dst)
	  && REGNO (src) == REGNO (dst));
  }
 diff --git a/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
 b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
 new file mode 100644
 index 000..1941933
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/vect/vect-nop-move.c
 @@ -0,0 +1,64 @@
 +/* { dg-do run } */
 +/* { dg-require-effective-target vect_float } */
 +/* { dg-options "-O3 -fdump-rtl-combine-details" } */
 +
 +extern void abort (void);
 +
 +#define NOINLINE __attribute__((noinline))
 +
 +typedef float float32x4_t __attribute__ ((__vector_size__ (16)));
 +typedef float float32x2_t __attribute__ ((__vector_size__ (8)));
 +
 +NOINLINE float
 +foo32x4_be (float32x4_t x)
 +{
 +  return x[3];
 +}
 +
 +NOINLINE float
 +foo32x4_le (float32x4_t x)
 +{
 +  return x[0];
 +}
 +
 +NOINLINE float
 +bar (float a)
 +{
 +  return a;
 +}
 +
 +NOINLINE float
 +foo32x2_be (float32x2_t x)
 +{
 +  return bar (x[1]);
 +}
 +
 +NOINLINE float
 +foo32x2_le (float32x2_t x)
 +{
 +  return bar (x[0]);
 +}
 +
 +int
 +main()
 +{
 +  float32x4_t a = { 0.0f, 1.0f, 2.0f, 3.0f };
 +  float32x2_t b = { 0.0f, 1.0f };
 +
 +  if (foo32x4_be (a) != 3.0f)
 +abort ();
 +
 +  if (foo32x4_le (a) != 0.0f)
 +abort ();
 +
 +  if (foo32x2_be (b) != 1.0f)
 +abort ();
 +
 +  if (foo32x2_le (b) != 0.0f)
 +abort ();
 +
 +  return 0;
 +}
 +
 +/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target
 aarch64*-*-* } } } */

Any particular reason why it doesn't work for x86?

 +/* { dg-final { cleanup-rtl-dump "combine" } } */

Thanks.

-- 
H.J.


Re: [PATCH] Fix force_to_mode not to modify in place the passed rtl (PR rtl-optimization/58726)

2013-12-04 Thread Jeff Law

On 12/04/13 03:40, Richard Biener wrote:

On Wed, Dec 4, 2013 at 11:07 AM, Eric Botcazou ebotca...@adacore.com wrote:

Fixed by making sure force_to_mode doesn't modify x in place.


I think that it's the way to go, force_to_mode doesn't modify its argument
except for these 2 cases.  I'm not sure what the story is, but calling SUBST
for these 2 cases doesn't seem really necessary.


Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8?

2013-12-03  Jakub Jelinek  ja...@redhat.com

   PR rtl-optimization/58726
   * combine.c (force_to_mode): Fix comment typo.  Don't destructively
   modify x for ROTATE, ROTATERT and IF_THEN_ELSE.

   * gcc.c-torture/execute/pr58726.c: New test.


IMO it's the best fix at this point of the release cycles.


I agree.
I can live with the nagging feeling that we've got a deeper problem here 
:-)  So I won't object to this approach.


jeff



Re: [PATCH/AARCH64 6/6] Support ILP32 multi-lib

2013-12-04 Thread Yufeng Zhang
I think together with this patch, the default value for 
--with-multilib-list when it is absent can be updated to lp64,ilp32 
from lp64 only.  This will make the multi-lib default setting on 
aarch64*-*-linux* consistent with that on aarch64*-*-elf.  See gcc/config.gcc.


Thanks,
Yufeng

P.S. Copypaste related configury snippet.

aarch64*-*-linux*)
	tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h"
	tm_file="${tm_file} aarch64/aarch64-elf.h aarch64/aarch64-linux.h"
	tmake_file="${tmake_file} aarch64/t-aarch64 aarch64/t-aarch64-linux"
	case $target in
	aarch64_be-*)
		tm_defines="${tm_defines} TARGET_BIG_ENDIAN_DEFAULT=1"
		;;
	esac
	aarch64_multilibs="${with_multilib_list}"
	if test "$aarch64_multilibs" = "default"; then
		# TODO: turn on ILP32 multilib build after its support is mature.
		# aarch64_multilibs="lp64,ilp32"
		aarch64_multilibs="lp64"
	fi


On 12/03/13 21:24, Andrew Pinski wrote:

Hi,
   This is the final patch which adds support for the dynamic linker and
multi-lib directories for ILP32.  I did not change multi-arch support as
I did not know what it should be changed to and internally here at Cavium,
we don't use multi-arch.


OK?  Build and tested for aarch64-linux-gnu with and without 
--with-multilib-list=lp64,ilp32.

Thanks,
Andrew Pinski



	* config/aarch64/aarch64-linux.h (GLIBC_DYNAMIC_LINKER):
	/lib/ld-linux32-aarch64.so.1 is used for ILP32.
	(LINUX_TARGET_LINK_SPEC): Add linker script file whose name depends
	on -mabi= and -mbig-endian.
	* config/aarch64/t-aarch64-linux (MULTILIB_OSDIRNAMES): Handle LP64
	better and handle ilp32 too.
	(MULTILIB_OPTIONS): Delete.
	(MULTILIB_DIRNAMES): Delete.
---
  gcc/ChangeLog  |   11 +++
  gcc/config/aarch64/aarch64-linux.h |5 +++--
  gcc/config/aarch64/t-aarch64-linux |7 ++-
  3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-linux.h 
b/gcc/config/aarch64/aarch64-linux.h
index 83efad4..408297a 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -21,7 +21,7 @@
  #ifndef GCC_AARCH64_LINUX_H
  #define GCC_AARCH64_LINUX_H

-#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-aarch64.so.1"
+#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux%{mabi=ilp32:32}-aarch64.so.1"

  #define CPP_SPEC "%{pthread:-D_REENTRANT}"

@@ -32,7 +32,8 @@
    %{rdynamic:-export-dynamic}	\
    -dynamic-linker " GNU_USER_DYNAMIC_LINKER "	\
    -X	\
-   %{mbig-endian:-EB} %{mlittle-endian:-EL}"
+   %{mbig-endian:-EB} %{mlittle-endian:-EL}	\
+   -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b}"

  #define LINK_SPEC LINUX_TARGET_LINK_SPEC

diff --git a/gcc/config/aarch64/t-aarch64-linux 
b/gcc/config/aarch64/t-aarch64-linux
index ca1525e..5032ea9 100644
--- a/gcc/config/aarch64/t-aarch64-linux
+++ b/gcc/config/aarch64/t-aarch64-linux
@@ -22,10 +22,7 @@ LIB1ASMSRC   = aarch64/lib1funcs.asm
  LIB1ASMFUNCS = _aarch64_sync_cache_range

  AARCH_BE = $(if $(findstring TARGET_BIG_ENDIAN_DEFAULT=1, $(tm_defines)),_be)
-MULTILIB_OSDIRNAMES = .=../lib64$(call if_multiarch,:aarch64$(AARCH_BE)-linux-gnu)
+MULTILIB_OSDIRNAMES = mabi.lp64=../lib64$(call if_multiarch,:aarch64$(AARCH_BE)-linux-gnu)
  MULTIARCH_DIRNAME = $(call if_multiarch,aarch64$(AARCH_BE)-linux-gnu)

-# Disable the multilib for linux-gnu targets for the time being; focus
-# on the baremetal targets.
-MULTILIB_OPTIONS=
-MULTILIB_DIRNAMES   =
+MULTILIB_OSDIRNAMES += mabi.ilp32=../lib32
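
For illustration, the combinations these changes produce (derived from the
specs above, not part of the patch):

    -mabi=lp64 (default)      /lib/ld-linux-aarch64.so.1    -m aarch64linux     ../lib64
    -mabi=ilp32               /lib/ld-linux32-aarch64.so.1  -m aarch64linux32   ../lib32
    -mabi=ilp32 -mbig-endian  /lib/ld-linux32-aarch64.so.1  -m aarch64linux32b  ../lib32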




Re: libsanitizer merge from upstream r196090

2013-12-04 Thread Ian Lance Taylor
On Wed, Dec 4, 2013 at 8:04 AM, FX fxcoud...@gmail.com wrote:
  Well, it regresses against 4.8, so it still is a P1 regression.

 Does anyone care?


 Well, you’re one of the maintainers of libsanitizer for GCC, so if you do not 
 care about regressions in your code, it makes little sense for GCC (the whole 
 project) to keep libsanitizer.

 I’ve posted this regression a month ago, it was not addressed. I’m not sure 
 under what specific arrangement libsanitizer was added to GCC, but in general 
 there is a responsibility of maintainers not to break bootstrap in their 
 code. Yes, it’s a cost, and if you are not willing to do it, why did you 
 contribute in the first place?

 Or is it a “hit and run” approach to maintainership?

I believe this is a case where the GCC project gets more benefit from
libsanitizer than libsanitizer gets from being part of the GCC
project.  We should work with the libsanitizer developers to make this
work, not just push everything back on them.

Ian


Re: [wwwdocs] Update obvious fix commit policy

2013-12-04 Thread Jeff Law

On 12/04/13 07:20, Diego Novillo wrote:

On Tue, Dec 3, 2013 at 6:55 PM, Gerald Pfeifer ger...@pfeifer.com wrote:

On Thu, 28 Nov 2013, Richard Biener wrote:

Why remove ChangeLog files, web pages and comments?


I was going to complain about web pages being removed. :-)

On Thu, 28 Nov 2013, Diego Novillo wrote:

-<p>Fixes for obvious typos in ChangeLog files, docs, web pages, comments
-and similar stuff.  Just check in the fix and copy it to
-<code>gcc-patches</code>.  We don't want to get overly anal-retentive
-about checkin policies.</p>
+<p>Obvious fixes can be committed without prior approval.  Just check
+in the fix and copy it to <code>gcc-patches</code>.  A good test to
+determine whether a fix is obvious: <q>will the person who objects to
+my work the most be able to find a fault with my fix?</q>  If the fix
+is later found to be faulty, it can always be rolled back.  We don't
+want to get overly restrictive about checkin policies.</p>


I am in favor of this change.

To some extent, this is more a clarification of what I have seen as
our current policy than a change in policy, though to a lawyer-minded
person it surely looks like the latter.  Not sure what kind of approval
this needs?  Mine it has.


I have not received any feedback against this change. I will wait
another 48 hours and commit.

Here's feedback.  Install it now :-)

jeff


Re: libsanitizer merge from upstream r196090

2013-12-04 Thread FX
 I believe this is a case where the GCC project gets more benefit from
 libsanitizer than libsanitizer gets from being part of the GCC
 project.  We should work with the libsanitizer developers to make this
 work, not just push everything back on them.

You’re vastly better qualified than me to make this assessment, of course. My 
point is: unless someone (or multiple someones) is actually responsible for the 
thing, it cannot just work out of a sense of “someone should really do 
something about it”.

The merge model of “we can break any target, except the single one we’re 
testing, every time we merge” seems poised for failure.

FX

Re: libsanitizer merge from upstream r196090

2013-12-04 Thread H.J. Lu
On Wed, Dec 4, 2013 at 8:41 AM, Ian Lance Taylor i...@google.com wrote:
 On Wed, Dec 4, 2013 at 8:04 AM, FX fxcoud...@gmail.com wrote:
  Well, it regresses against 4.8, so it still is a P1 regression.

 Does anyone care?


 Well, you’re one of the maintainers of libsanitizer for GCC, so if you do 
 not care about regressions in your code, it makes little sense for GCC (the 
 whole project) to keep libsanitizer.

 I’ve posted this regression a month ago, it was not addressed. I’m not sure 
 under what specific arrangement libsanitizer was added to GCC, but in 
 general there is a responsibility of maintainers not to break bootstrap in 
 their code. Yes, it’s a cost, and if you are not willing to do it, why did 
 you contribute in the first place?

 Or is it a “hit and run” approach to maintainership?

 I believe this is a case where the GCC project gets more benefit from
 libsanitizer than libsanitizer gets from being part of the GCC
 project.  We should work with the libsanitizer developers to make this
 work, not just push everything back on them.


I think libsanitizer should be disabled automatically if kernel or glibc are
too old.

BTW, fixincludes should fix the bad kernel header files from SuSE.


-- 
H.J.


Re: libsanitizer merge from upstream r196090

2013-12-04 Thread FX
 I think libsanitizer should be disabled automatically if kernel or glibc are
 too old.

I think pretty much everyone agrees.  But no one’s willing to put forward a 
patch, and so far the libsanitizer maintainers have failed to even document the 
requirements. (There are also binutils requirements, as I learnt the hard way.)

FX

Re: libsanitizer merge from upstream r196090

2013-12-04 Thread H.J. Lu
On Wed, Dec 4, 2013 at 8:50 AM, FX fxcoud...@gmail.com wrote:
 I think libsanitizer should be disabled automatically if kernel or glibc are
 too old.

 I think pretty much everyone agrees. But noone’s willing to put forward a 
 patch,

What are the agreed-upon minimum kernel and glibc? I
can give it a try.

 and so far the libsanitizer maintainers have failed to even document the 
 requirements. (There are also binutils requirements, as I learnt the hard 
 way.)


What is the minimum binutils for libsanitizer?


-- 
H.J.


Re: libsanitizer merge from upstream r196090

2013-12-04 Thread Jakub Jelinek
On Wed, Dec 04, 2013 at 08:47:41AM -0800, H.J. Lu wrote:
  I believe this is a case where the GCC project gets more benefit from
  libsanitizer than libsanitizer gets from being part of the GCC
  project.  We should work with the libsanitizer developers to make this
  work, not just push everything back on them.
 
 
 I think libsanitizer should be disabled automatically if kernel or glibc are
 too old.

For very old I agree, I just strongly disagree with saying that anything
older than a year and a half is too old.
So, as very old and unsupportable I'd probably consider e.g. Linux kernels
without futex support, libsanitizer apparently uses those in various places
and doesn't have a fallback.  The question is how to do that though, because
libraries are now disabled through lib*/configure.tgt UNSUPPORTED=1, and
that is sourced in by toplevel configure, so any configure checks would need
to be in toplevel configure.  Or of course, we could in those cases
configure the libsanitizer directory, but just decide not to build anything
in there.

Anyway, my preference right now would be for the ppc32 bits to be
acceptable to Kostya (either by committing them upstream or just applying
them as a GCC-local change for the time being), so that we don't break
bootstrap on powerpc*-linux*: add those and commit the merge, then deal
with the older kernel headers through an include/linux subdirectory (I'll
work on it) and very old headers through configure.  For the CFI I hope
Kostya would accept some macro, even if it is always enabled in the
compiler-rt build and just GCC can disable the .cfi_* additions if the
compiler doesn't use them; then we can start fixing the rest of the
portability issues.

Jakub


[PATCH i386] Introduce __readeflags () and __writeeflags () intrinsics.

2013-12-04 Thread Kirill Yukhin
Hello,

MSVC and ICC (currently Windows version, Linux version soon) have
dedicated intrinsics to read/set EFLAGS register ([1], [2]).

Patch introduces these intrinsics and tests for them.

Bootstrapped. New tests pass.
Although the gate is closed, the patch is obvious.
So, is it ok for trunk?

ChangeLog/
* config/i386/ia32intrin.h (__readeflags): New.
(__writeeflags): Ditto.

testsuite/ChangeLog/
* gcc.target/i386/readeflags-1.c: New.
* gcc.target/i386/writeeflags-1.c: Ditto.

[1] - http://msdn.microsoft.com/en-us/library/aa983406(v=vs.90).aspx
[2] - http://msdn.microsoft.com/en-us/library/aa983392(v=vs.90).aspx

--
Thanks, K

diff --git a/gcc/config/i386/ia32intrin.h b/gcc/config/i386/ia32intrin.h
index b26dc46..c9e68c5 100644
--- a/gcc/config/i386/ia32intrin.h
+++ b/gcc/config/i386/ia32intrin.h
@@ -238,6 +238,34 @@ __rorq (unsigned long long __X, int __C)
  return (__X >> __C) | (__X << (64 - __C));
 }
 
+/* Read flags register */
+extern __inline unsigned long long
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__readeflags (void)
+{
+  unsigned long long result = 0;
+  __asm__ __volatile__ ("pushf\n\t"
+			"popq %0\n"
+			: "=r" (result)
+			:
+			:
+			);
+  return result;
+}
+
+/* Write flags register */
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__writeeflags (unsigned long long X)
+{
+  __asm__ __volatile__ ("pushq %0\n\t"
+			"popf\n"
+			:
+			: "r" (X)
+			: "flags"
+			);
+}
+
 #define _bswap64(a)__bswapq(a)
 #define _popcnt64(a)   __popcntq(a)
 #define _lrotl(a,b)__rolq((a), (b))
@@ -245,6 +273,35 @@ __rorq (unsigned long long __X, int __C)
 #else
 #define _lrotl(a,b)__rold((a), (b))
 #define _lrotr(a,b)__rord((a), (b))
+
+/* Read flags register */
+extern __inline unsigned int
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__readeflags (void)
+{
+  unsigned int result = 0;
+  __asm__ __volatile__ ("pushf\n\t"
+			"popl %0\n"
+			: "=r" (result)
+			:
+			:
+			);
+  return result;
+}
+
+/* Write flags register */
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__writeeflags (unsigned int X)
+{
+  __asm__ __volatile__ ("pushl %0\n\t"
+			"popf\n"
+			:
+			: "r" (X)
+			: "flags"
+			);
+}
+
 #endif
 
 #define _bit_scan_forward(a)   __bsfd(a)
diff --git a/gcc/testsuite/gcc.target/i386/readeflags-1.c 
b/gcc/testsuite/gcc.target/i386/readeflags-1.c
new file mode 100644
index 000..6b2fa7e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/readeflags-1.c
@@ -0,0 +1,40 @@
+/* { dg-do run } */
+/* { dg-options "-O0" } */
+
+#include <x86intrin.h>
+
+#ifdef __x86_64__
+#define EFLAGS_TYPE unsigned long long int
+#else
+#define EFLAGS_TYPE unsigned int
+#endif
+
+static EFLAGS_TYPE
+readeflags_test (unsigned int a, unsigned int b)
+{
+  unsigned x = (a == b);
+  return __readeflags ();
+}
+
+int
+main ()
+{
+  EFLAGS_TYPE flags;
+
+  flags = readeflags_test (100, 100);
+
+  if ((flags & 1) != 0)  /* Read CF */
+abort ();
+
+  flags = readeflags_test (100, 101);
+
+  if ((flags & 1) == 0)  /* Read CF */
+abort ();
+
+#ifdef DEBUG
+  printf ("PASSED\n");
+#endif
+
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/writeeflags-1.c 
b/gcc/testsuite/gcc.target/i386/writeeflags-1.c
new file mode 100644
index 000..446840c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/writeeflags-1.c
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-options "-O0" } */
+
+#include <x86intrin.h>
+
+#ifdef __x86_64__
+#define EFLAGS_TYPE unsigned long long int
+#else
+#define EFLAGS_TYPE unsigned int
+#endif
+
+int
+main ()
+{
+  EFLAGS_TYPE flags = 0xD7; /* 11010111b  */
+
+  __writeeflags (flags);
+
+  flags = __readeflags ();
+
+  if ((flags & 0xFF) != 0xD7)
+abort ();
+
+#ifdef DEBUG
+  printf ("PASSED\n");
+#endif
+
+  return 0;
+}
+


Re: libsanitizer merge from upstream r196090

2013-12-04 Thread H.J. Lu
On Wed, Dec 4, 2013 at 8:58 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Wed, Dec 04, 2013 at 08:47:41AM -0800, H.J. Lu wrote:
  I believe this is a case where the GCC project gets more benefit from
  libsanitizer than libsanitizer gets from being part of the GCC
  project.  We should work with the libsanitizer developers to make this
  work, not just push everything back on them.
 

 I think libsanitizer should be disabled automatically if kernel or glibc are
 too old.

 For very old I agree, I just strongly disagree with saying that anything
 older than a year and half is too old.
 So, as very old and unsupportable I'd probably consider e.g. Linux kernels
 without futex support, libsanitizer apparently uses those in various places
 and doesn't have a fallback.  The question is how to do that though, because
 libraries are now disabled through lib*/configure.tgt UNSUPPORTED=1, and
 that is sourced in by toplevel configure, so any configure checks would need
 to be in toplevel configure.  Or of course, we could in those cases
 configure the libsanitizer directory, but just decide not to build anything
 in there.


The kernel and glibc check should be done at the toplevel.
So what are the minimum kernel and glibc we want to
support?

-- 
H.J.


Re: [PATCH 1/2] Implement -fsanitize=signed-integer-overflow (generic parts)

2013-12-04 Thread Jeff Law

On 12/04/13 06:44, Marek Polacek wrote:

This is a repost of rebased version of the signed-integer-overflow
patch, split into generic parts and i?86 parts.  By i?86 parts I mean
the stuff that resides in config/i386, I haven't really tried to
untangle it more.
Except the two formatting fixes I also moved various PROB_ macros into
predict.h and made the users include it, rather than duplicating
the defines everywhere.

Regtested/bootstrapped on x86_64-linux.  Ok for trunk?
Yes, it's OK.  If it works without the x86 backend changes, you can 
install it now.  If it requires the x86 backend changes, wait until 
those are approved and check in both together.


jeff



Re: [PATCH, ARM] Implement __builtin_trap

2013-12-04 Thread Joseph S. Myers
On Wed, 4 Dec 2013, Ian Bolton wrote:

 The main update, other than cosmetic differences, is that we've chosen
 the same ARM encoding as LLVM for practical purposes.  (The Thumb
 encoding in Mark's patch already matched LLVM.)

Do the encodings match what plain udf does in recent-enough gas (too 
recent for us to assume it in GCC or glibc for now), or is it something 
else?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [wwwdocs] Update obvious fix commit policy

2013-12-04 Thread Diego Novillo
On Wed, Dec 4, 2013 at 11:24 AM, Jeff Law l...@redhat.com wrote:

 Here's feedback.  Install it now :-)

Works for me :)  Committed.

Diego.


Re: libsanitizer merge from upstream r196090

2013-12-04 Thread Joseph S. Myers
On Wed, 4 Dec 2013, H.J. Lu wrote:

 The kernel and glibc check should be done at the toplevel.
 So what are the minimum kernel and glibc we want to
 support?

Checking those at toplevel is tricky in general because you're checking 
something for the target rather than the host.  You'd need to move the 
logic from gcc/configure.ac to compute target_header_dir and 
glibc_version_*, and GCC_GLIBC_VERSION_GTE_IFELSE from gcc/acinclude.m4, 
to something in toplevel config/ (and that logic depends on lots of other 
things in gcc/configure.ac).

For binutils it's both easier to check (although the logic for binutils is 
also in gcc/acinclude.m4 at present) and more reasonable to require 
comparatively recent versions (for targets using binutils, which should 
cover everything supporting libsanitizer except Darwin) - I think there 
should be a minimum binutils version requirement generally when binutils 
is used with GCC, so we can reduce the need for conditionals on binutils 
features (unless of course the conditional code is still needed to support 
non-GNU assemblers and linkers for some target).

It can be useful to build new tools for a target with old kernel and glibc 
in order to build binaries that will work on systems with a wide range of 
glibc versions.  The oldest kernel and glibc versions I've used in that 
context with any post-4.3 GCC have been Linux 2.6.16 and glibc 2.4 (but 
the kernel headers were more recent than that, and this use case for old 
sysroots does *not* mean libsanitizer should necessarily be supported for 
them, simply that it's useful for the compiler and those libraries that 
may be used in production applications to be supported).  If GCC were to 
desupport e.g. glibc before 2.4 you still get to deal with other libraries 
such as uClibc which pretends to be an older glibc (but again, you may 
well declare it unsupported for libsanitizer).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Patch, RTL] Eliminate redundant vec_select moves.

2013-12-04 Thread Jeff Law

On 12/04/13 09:14, H.J. Lu wrote:


+
+/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target
aarch64*-*-* } } } */


Any particular reason why it doesn't work for x86?
I don't think so.  I'm pretty sure Tejas is focused on ARM platforms for 
the obvious reason.


jeff



Re: [Patch, RTL] Eliminate redundant vec_select moves.

2013-12-04 Thread H.J. Lu
On Wed, Dec 4, 2013 at 9:29 AM, Jeff Law l...@redhat.com wrote:
 On 12/04/13 09:14, H.J. Lu wrote:

 +
 +/* { dg-final { scan-rtl-dump "deleting noop move" "combine" { target
 aarch64*-*-* } } } */


 Any particular reason why it doesn't work for x86?

 I don't think so.  I'm pretty sure Tejas is focused on ARM platforms for the
 obvious reason.


Then please add i?86-*-* x86_64-*-*.

Thanks.

-- 
H.J.


Re: [PATCH] Use DW_LANG_D for D

2013-12-04 Thread Iain Buclaw
On 3 December 2013 19:42, Cary Coutant ccout...@google.com wrote:
 This patches gen_compile_unit_die to use the DW_LANG_D DWARF language
 code for D.  It is in relation to some other D language fixes that are
 going to be submitted to gdb.

 Is this for a private front end? I'm not aware of any front ends that
 set the language name to GNU D.

 Since it's so trivial, though, I have no problem with this patch for
 Stage 3 -- if you do have a separate front end that sets that language
 string, then it's arguably a bug fix. If this patch is preparation for
 more substantial changes to the GCC tree, however, I suspect you're
 going to need to wait for Stage 1 to reopen anyway.

 So, if this is a standalone patch, it's OK, but you also need a ChangeLog 
 entry.

 -cary

The frontend isn't private, but is currently external to GCC.

I've had plans to get the frontend merged for some time now.  And was
adviced last time I submitted the code for review to send patches that
can be merged into GCC prior to re-submitting the frontend - which as
you have already said will have to wait for Stage 1 to reopen.

Will make a changelog entry for the patch.

Regards
Iain.


Re: [Patch, RTL] Eliminate redundant vec_select moves.

2013-12-04 Thread Richard Sandiford
Tejas Belagod tbela...@arm.com writes:
 Richard Sandiford wrote:
 Tejas Belagod tbela...@arm.com writes:
 Richard Sandiford wrote:
 Tejas Belagod tbela...@arm.com writes:
 The problem is that one reg rtx can span several hard registers.
 E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32),
 but it might instead represent two 32-bit registers (nos. 32 and 33).
 Obviously the latter's not very likely for vectors this small,
 but more likely for larger ones (including on NEON IIRC).

 So if we had 2 32-bit registers being treated as a V4HI, it would be:

   --------32--------|--------33--------
   msb                              lsb

	(one V4HI value in the reg rtx numbered 32)

   msb                              lsb
   ------------------32----------------

 for big endian and:

   --------33--------|--------32--------
   msb                              lsb

	(one V4HI value in the reg rtx numbered 32)

   msb                              lsb
   ------------------32----------------

 for little endian.
 Ah, ok, that makes things clearer. Thanks for that.

 I can't find any helper function that figures out if we're writing
 partial or
 full result regs. Would something like

  REGNO (src) == REGNO (dst) &&
  HARD_REGNO_NREGS (src) == HARD_REGNO_NREGS (dst) == 1

 be a sane check for partial result regs?
 Yeah, that should work.  I think a more general alternative would be:

   simplify_subreg_regno (REGNO (src), GET_MODE (src),
  offset, GET_MODE (dst)) == (int) REGNO (dst)

 where:

   offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP (sel, 0, 0))

 That offset is the byte offset of the first selected element from the
 start of a vector in memory, which is also the way that SUBREG_BYTEs
 are counted.  For little-endian it gives the offset of the lsb of the
 slice, while for big-endian it gives the offset of the msb (which is
 also how SUBREG_BYTEs work).

 The simplify_subreg_regno should cope with both single-register vectors
 and multi-register vectors.
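
For a concrete feel of that formula, here is a quick sketch with made-up
numbers (a V4SF source, 4-byte units, selection starting at lane 3; none
of this is from the patch):

  #include <stdio.h>

  int
  main (void)
  {
    int unit_size  = 4;   /* GET_MODE_UNIT_SIZE (V4SFmode) */
    int first_lane = 3;   /* INTVAL (XVECEXP (sel, 0, 0)) */
    /* Byte offset of the first selected element, counted the same
       way a SUBREG_BYTE is counted.  */
    printf ("offset = %d\n", unit_size * first_lane);   /* prints 12 */
    return 0;
  }
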
 Sorry for the delayed response to this.

 Thanks for the tip. Here's an improved patch that implements the 
 simplify_subreg_regno () method of eliminating redundant moves. Regarding 
 the 
 test case, I failed to get the ppc back-end to generate RTL pattern
 that this
 patch checks for. I can easily write a test case for aarch64(big and little 
 endian) on these lines

 typedef float float32x4_t __attribute__ ((__vector_size__ (16)));

 float foo_be (float32x4_t x)
 {
return x[3];
 }

 float foo_le (float32x4_t x)
 {
return x[0];
 }

 where I know that the vector indexing will generate a vec_select on
 the same src and dst regs that could be optimized away and hence test
 it. But I'm struggling to get a test case that the ppc altivec
 back-end will generate such a vec_select for. I see that altivec does
 not define vec_extract, so a simple indexing like this seems to happen
 via memory. Also, I don't know enough about the ppc PCS or
 architecture to write a test that will check for this optimization
 opportunity on same src and dst hard-registers. Any hints?
 
 Me neither, sorry.
 
 FWIW, the MIPS tests:
 
   typedef float float32x2_t __attribute__ ((__vector_size__ (8)));
   void bar (float);
   void foo_be (float32x2_t x) { bar (x[1]); }
   void foo_le (float32x2_t x) { bar (x[0]); }
 
 also exercise it, but I don't think they add anything over the aarch64
 versions.  I can add them to the testsuite anyway if it helps though.
 
 diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
 index 0cd0c7e..ca25ce5 100644
 --- a/gcc/rtlanal.c
 +++ b/gcc/rtlanal.c
 @@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set)
dst = SUBREG_REG (dst);
  }
  
 +  /* It is a NOOP if destination overlaps with selected src vector
 + elements.  */
 +  if (GET_CODE (src) == VEC_SELECT
 +      && REG_P (XEXP (src, 0)) && REG_P (dst)
 +      && HARD_REGISTER_P (XEXP (src, 0))
 +      && HARD_REGISTER_P (dst))
 +{
 +  rtx par = XEXP (src, 1);
 +  rtx src0 = XEXP (src, 0);
 +  HOST_WIDE_INT offset =
 +   GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0, 0));
 +
 +  return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
 +   offset, GET_MODE (dst)) == (int)REGNO (dst);
 +}
 +
 
 Since this also (correctly) triggers for vector results, we need to keep
 the check for consecutive indices that you had originally.  (It's always
 the first index that should be used for the simplify_subreg_regno though.)
 
 Looks good to me otherwise, thanks.

 Thanks Richard. Here is a revised patch. Sorry about the delay - I was
 investigating to make sure an LRA ICE I was seeing on aarch64 was
 unrelated to this patch. I've added a test case that I expect to pass
 for aarch64. I've also added the tests that you suggested for MIPS,
 but haven't checked for the target because I'm not sure what
 optimizations happen on MIPS.

Thanks, looks good to me, but I can't approve it.  Just one minor
formatting nit:

 +  return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
 + offset, GET_MODE (dst)) == (int)REGNO 

Re: [PATCH/middle-end 2/6] __builtin_thread_pointer and AARCH64 ILP32

2013-12-04 Thread Yufeng Zhang

On 12/03/13 21:24, Andrew Pinski wrote:

Hi,
   With ILP32 AARCH64, Pmode (DImode) != ptrmode (SImode) so the variable decl
has a mode of SImode while the register is DImode.  So the target that gets
passed down to expand_builtin_thread_pointer is NULL as expand does not
know how to get a subreg for a pointer type.

This fixes the problem by handling a NULL target like we are able to handle
for a non register/correct mode target inside expand_builtin_thread_pointer.
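
A minimal way to see this (a sketch, assuming an ILP32 multilib is
available; tp.c is a made-up file name):

  /* gcc -mabi=ilp32 -S tp.c */
  void *
  get_tp (void)
  {
    /* The result is a 32-bit pointer (SImode) even though the
       register holding tpidr_el0 is DImode.  */
    return __builtin_thread_pointer ();
  }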

OK?  Build and tested for aarch64-elf with no regressions.

Thanks,
Andrew Pinski

* builtins.c (expand_builtin_thread_pointer): Create a new target
when the target is NULL.
---
  gcc/ChangeLog  |5 +
  gcc/builtins.c |2 +-
  2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 4f1c818..66797fa 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5699,7 +5699,7 @@ expand_builtin_thread_pointer (tree exp, rtx target)
if (icode != CODE_FOR_nothing)
  {
struct expand_operand op;
-  if (!REG_P (target) || GET_MODE (target) != Pmode)
+  if (target == NULL_RTX || !REG_P (target) || GET_MODE (target) != Pmode)
target = gen_reg_rtx (Pmode);
create_output_operand (op, target, Pmode);
expand_insn (icode, 1,op);


Shouldn't thread pointer have ptr_mode instead?  I'm aware that on 
AArch64 the thread pointer system register tpidr_el0 is 64-bit wide 
regardless of ILP32 or not, but in the abstracted view of AArch64 ILP32 
world, the thread pointer shall be a 32-bit pointer; the OS should have 
taken care of the hardware register tpidr_el0 by having its higher 32 
bits cleared.  I think expand_builtin_thread_pointer and 
expand_builtin_set_thread_pointer should use ptr_mode instead.  Correct 
me if I missed anything.


Add Chung-Lin Tang to the CC list; Chung-Lin wrote these builtins in r192364

Yufeng



Re: PR37132 – RFC patch for generation of DWARF symbol for Fortran's namelists (DW_TAG_namelist)

2013-12-04 Thread Cary Coutant
 gcc/
 2013-11-24  Tobias Burnus  bur...@net-b.de

   PR debug/37132
   * lto-streamer.h (LTO_tags): Add LTO_namelist_decl_ref.
   * tree.def (NAMELIST_DECL): Add.
   * tree.h (NAMELIST_DECL_ASSOCIATED_DECL): New macro.
   * tree.c (initialize_tree_contains_struct): Add asserts for it.
   * dwarf2out.c (gen_namelist_decl): New function.
   (gen_decl_die, dwarf2out_decl): Call it.
   (dwarf2out_imported_module_or_decl_1): Handle NAMELIST_DECL.
   * lto-streamer-in.c (lto_input_tree_ref): Handle NAMELIST_DECL.
   (lto_input_tree_ref, lto_input_tree_1): Update lto_tag_check_range
   call.
   * lto-streamer-out.c (lto_output_tree_ref): Handle NAMELIST_DECL.

 gcc/fortran
 2013-11-24  Tobias Burnus  bur...@net-b.de

   PR debug/37132
   * trans-decl.c (generate_namelist_decl, create_module_nml_decl):
   New static functions.
   (gfc_generate_module_vars, generate_local_vars): Call them.
    (gfc_trans_use_stmts): Handle namelists for debug generation.

The DWARF parts of this patch are OK with me.

-cary


On Sun, Nov 24, 2013 at 2:12 AM, Tobias Burnus bur...@net-b.de wrote:
 Hi all,

 attached is an updated version of the patch.

 Change:


 Tobias Burnus wrote:

 But for USE mod_name, only: nml, one is supposed to generate a
 DW_TAG_imported_declaration. And there I am stuck. For normal variables, the
 DW_TAG_imported_declaration refers to a DW_TAG_variable die. Analogously,
 for a namelist one would have to refer to a DW_TAG_namelist die. But such
 DW_TAG_namelist comes with a DW_TAG_namelist_item list. And for the latter,
 one needs to have the die of all variables in the namelist. But with
 use-only the symbols aren't use associate and no decl or die exists.
 (Failing call tree with the patch: gfc_trans_use_stmts -
 dwarf2out_imported_module_or_decl_1 - force_decl_die.)


 With the attached patch, one now generates DW_TAG_namelist with no
 DW_TAG_namelist_item and sets DW_AT_declaration.

 Thus, for (first file)

   module mm

 integer :: ii
 real :: rr
 namelist /nml/ ii, rr
   end module mm


 and (second file):

   subroutine test
 use mm, only: nml
 write(*,nml)
   end subroutine test


 One now generates (first file):

  11e: Abbrev Number: 2 (DW_TAG_module)
 1f   DW_AT_name: mm
 22   DW_AT_decl_file   : 1
 23   DW_AT_decl_line   : 1
 24   DW_AT_sibling : 0x59
  228: Abbrev Number: 3 (DW_TAG_variable)
 29   DW_AT_name: ii
 2c   DW_AT_decl_file   : 1
 2d   DW_AT_decl_line   : 2
 2e   DW_AT_linkage_name: (indirect string, offset: 0x15): __mm_MOD_ii
 32   DW_AT_type: 0x59
 36   DW_AT_external: 1
 36   DW_AT_location: 9 byte block: 3 0 0 0 0 0 0 0 0  (DW_OP_addr:
 0)
  240: Abbrev Number: 3 (DW_TAG_variable)
 41   DW_AT_name: rr
 44   DW_AT_decl_file   : 1
 45   DW_AT_decl_line   : 2
 46   DW_AT_linkage_name: (indirect string, offset: 0x9): __mm_MOD_rr
 4a   DW_AT_type: 0x60
 4e   DW_AT_external: 1
 4e   DW_AT_location: 9 byte block: 3 4 0 0 0 0 0 0 0  (DW_OP_addr:
 4)
  258: Abbrev Number: 0
  159: Abbrev Number: 4 (DW_TAG_base_type)
 5a   DW_AT_byte_size   : 4
 5b   DW_AT_encoding: 5(signed)
 5c   DW_AT_name: (indirect string, offset: 0x29):
 integer(kind=4)
  160: Abbrev Number: 4 (DW_TAG_base_type)
 61   DW_AT_byte_size   : 4
 62   DW_AT_encoding: 4(float)
 63   DW_AT_name: (indirect string, offset: 0x12c):
 real(kind=4)
  167: Abbrev Number: 5 (DW_TAG_namelist)
 68   DW_AT_name: nml
  26c: Abbrev Number: 6 (DW_TAG_namelist_item)
 6d   DW_AT_namelist_items: 0x28
  271: Abbrev Number: 6 (DW_TAG_namelist_item)
 72   DW_AT_namelist_items: 0x40

 Second file:

   24f: Abbrev Number: 3 (DW_TAG_imported_declaration)
 50   DW_AT_decl_file   : 1
 51   DW_AT_decl_line   : 2
 52   DW_AT_import  : 0x70   [Abbrev Number: 6 (DW_TAG_namelist)]
  256: Abbrev Number: 4 (DW_TAG_lexical_block)
 57   DW_AT_low_pc  : 0xb
 5f   DW_AT_high_pc : 0xb0
  267: Abbrev Number: 0
  168: Abbrev Number: 5 (DW_TAG_module)
 69   DW_AT_name: mm
 6c   DW_AT_declaration : 1
 6c   DW_AT_sibling : 0x76
  270: Abbrev Number: 6 (DW_TAG_namelist)
 71   DW_AT_name: nml
 75   DW_AT_declaration : 1
  275: Abbrev Number: 0


 Do the dumps look okay? For the first file, DW_TAG_namelist doesn't come
 directly after DW_TAG_module but after its sibling 0x59; does one still see
 that nml belongs to that module? (On dwarf2out level, context die should
 point to the module tag, but I don't understand the readelf/eu-readelf
 output well enough to see whether that's also the case for the generated
 dwarf.)

 I assume that the compiler can see from the DWARF of the second file that
 nml comes from module mm and doesn't search the value elsewhere. (It is
 possible to have multiple 

[PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)

2013-12-04 Thread Marek Polacek
In C99, one way how to deal with inline functions is to put definition
of the function into header:
inline void foo (void) { /* ... */ }
and put the declaration into exactly one .c file, with extern keyword
(it can also have inline keyword):
extern void foo (void);
But in this case, we shouldn't issue the missing prototype warning.
So the following should suppress that warning in C99 mode, when
-fgnu89-inline is not in effect.  (But the function could still have
the gnu_inline attribute, so it might be better to disable that
warning for all inline functions?)
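
For illustration, the idiom in question looks like this (a minimal
sketch; foo.h/foo.c are made-up names):

  /* foo.h */
  inline int foo (void) { return 42; }

  /* foo.c -- the one TU that provides the external definition */
  #include "foo.h"
  extern int foo (void);

  int main (void) { return foo (); }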

Regtested/bootstrapped on x86_64-unknown-linux-gnu.  Ok for trunk?

2013-12-04  Marek Polacek  pola...@redhat.com

PR c/54113
c/
* c-decl.c (start_function): Don't warn for missing prototype for
inline functions in C99+.
testsuite/
* gcc.dg/pr54113.c: New test.

--- gcc/c/c-decl.c.mp3  2013-12-04 17:11:43.063878926 +0100
+++ gcc/c/c-decl.c  2013-12-04 18:32:17.567008028 +0100
@@ -7974,7 +7974,10 @@ start_function (struct c_declspecs *decl
       && old_decl != error_mark_node
       && TREE_PUBLIC (decl1)
       && !MAIN_NAME_P (DECL_NAME (decl1))
-      && C_DECL_ISNT_PROTOTYPE (old_decl))
+      && C_DECL_ISNT_PROTOTYPE (old_decl)
+      && !(DECL_DECLARED_INLINE_P (decl1)
+	   && flag_isoc99
+	   && !flag_gnu89_inline))
     warning_at (loc, OPT_Wmissing_prototypes,
		 "no previous prototype for %qD", decl1);
   /* Optionally warn of any def with no previous prototype
--- gcc/testsuite/gcc.dg/pr54113.c.mp3  2013-12-04 17:52:45.671288940 +0100
+++ gcc/testsuite/gcc.dg/pr54113.c  2013-12-04 17:36:43.0 +0100
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c99" } */
+
+inline int foo (void) { return 42; } /* { dg-bogus "no previous prototype" } */
+extern int foo(void);

Marek


Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)

2013-12-04 Thread Marek Polacek
On Wed, Dec 04, 2013 at 06:47:09PM +0100, Marek Polacek wrote:
 In C99, one way how to deal with inline functions is to put definition
 of the function into header:
 inline void foo (void) { /* ... */ }
 and put the declaration into exactly one .c file, with extern keyword
 (it can also have inline keyword):
 extern void foo (void);
 But in this case, we shouldn't issue the missing prototype warning.
 So the following should suppress that warning in C99 mode, when
 -fgnu89-inline is not in effect.  (But the function could still have
 the gnu_inline attribute, so it might be better to disable that
 warning for all inline functions?)
 
 Regtested/bootstrapped on x86_64-unknown-linux-gnu.  Ok for trunk?
 
 2013-12-04  Marek Polacek  pola...@redhat.com
 
   PR c/54113
 c/
   * c-decl.c (start_function): Don't warn for missing prototype for
   inline functions in C99+.
 testsuite/
   * gcc.dg/pr54113.c: New test.
 
 --- gcc/c/c-decl.c.mp32013-12-04 17:11:43.063878926 +0100
 +++ gcc/c/c-decl.c2013-12-04 18:32:17.567008028 +0100
 @@ -7974,7 +7974,10 @@ start_function (struct c_declspecs *decl
    && old_decl != error_mark_node
    && TREE_PUBLIC (decl1)
    && !MAIN_NAME_P (DECL_NAME (decl1))
  - && C_DECL_ISNT_PROTOTYPE (old_decl))
  + && C_DECL_ISNT_PROTOTYPE (old_decl)
  + && !(DECL_DECLARED_INLINE_P (decl1)
  +      && flag_isoc99
  +      && !flag_gnu89_inline))
   warning_at (loc, OPT_Wmissing_prototypes,
	    "no previous prototype for %qD", decl1);
/* Optionally warn of any def with no previous prototype
 --- gcc/testsuite/gcc.dg/pr54113.c.mp32013-12-04 17:52:45.671288940 
 +0100
 +++ gcc/testsuite/gcc.dg/pr54113.c2013-12-04 17:36:43.0 +0100
 @@ -0,0 +1,5 @@
 +/* { dg-do compile } */
 +/* { dg-options "-std=c99" } */

-Wmissing-prototypes is missing here, in my local copy of the patch
this is fixed.

Marek


Re: [C++ PATCH] Don't ICE on POINTER_PLUS_EXPR during tsubst* (PR c++/59268)

2013-12-04 Thread Jason Merrill

OK.

Jason


Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)

2013-12-04 Thread Jakub Jelinek
On Wed, Dec 04, 2013 at 06:47:09PM +0100, Marek Polacek wrote:
 In C99, one way how to deal with inline functions is to put definition
 of the function into header:
 inline void foo (void) { /* ... */ }
 and put the declaration into exactly one .c file, with extern keyword
 (it can also have inline keyword):
 extern void foo (void);
 But in this case, we shouldn't issue the missing prototype warning.
 So the following should suppress that warning in C99 mode, when
 -fgnu89-inline is not in effect.  (But the function could still have
 the gnu_inline attribute, so it might be better to disable that
 warning for all inline functions?)

A function definition can't have attributes after the (), and
start_function is called with the attributes argument, so you can just
look through those for gnu_inline attribute.
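
That is, for a definition the attribute has to come before the
declarator anyway, e.g. (hypothetical):

  /* gnu89 inline semantics requested explicitly, even in C99 mode.  */
  extern inline __attribute__ ((gnu_inline)) int
  foo (void)
  {
    return 42;
  }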

Jakub


Re: PR37132 – RFC patch for generation of DWARF symbol for Fortran's namelists (DW_TAG_namelist)

2013-12-04 Thread Jakub Jelinek
On Wed, Dec 04, 2013 at 09:47:36AM -0800, Cary Coutant wrote:
  gcc/
  2013-11-24  Tobias Burnus  bur...@net-b.de
 
PR debug/37132
* lto-streamer.h (LTO_tags): Add LTO_namelist_decl_ref.
* tree.def (NAMELIST_DECL): Add.
* tree.h (NAMELIST_DECL_ASSOCIATED_DECL): New macro.
* tree.c (initialize_tree_contains_struct): Add asserts for it.
* dwarf2out.c (gen_namelist_decl): New function.
(gen_decl_die, dwarf2out_decl): Call it.
(dwarf2out_imported_module_or_decl_1): Handle NAMELIST_DECL.
* lto-streamer-in.c (lto_input_tree_ref): Handle NAMELIST_DECL.
(lto_input_tree_ref, lto_input_tree_1): Update lto_tag_check_range
call.
* lto-streamer-out.c (lto_output_tree_ref): Handle NAMELIST_DECL.
 
  gcc/fortran
  2013-11-24  Tobias Burnus  bur...@net-b.de
 
PR debug/37132
* trans-decl.c (generate_namelist_decl, create_module_nml_decl):
New static functions.
(gfc_generate_module_vars, generate_local_vars): Call them.
(gfc_trans_use_stmts): Handle namelists for debug generation.
 
 The DWARF parts of this patch are OK with me.

The rest is okay too.

Jakub


Re: [PATCH i386] Introduce __readeflags () and __writeeflags () intrinsics.

2013-12-04 Thread Kirill Yukhin
Hello,
On 04 Dec 19:59, Kirill Yukhin wrote:
 So, is it ok for trunk?

Small correction. I think it is better to use
popfql/pushfql instead of popf/pushf (however they're
encoded equally).

--
Thanks, K


Ping: [tilegx] Avoid genrecog warning

2013-12-04 Thread Richard Sandiford
Ping for this patch, which is the only one of the series that hasn't
been approved.

Thanks,
Richard

Richard Sandiford rdsandif...@googlemail.com writes:
 I have a patch to upgrade most genrecog warnings into errors.  This patch
 fixes those for tilegx.  There seemed to be two sources of warnings:

 - the intrinsics often used matched pointer_operands in an addition,
   so that the destination accepted constant pointers.  I think the
   direct translation would be pmode_register_operand, but since these
   additions have a specific mode, I think a modeful register_operand
   is more natural.

 - some instructions used reg_or_0_operand as a destination.

 Tested by building tilegx-elf with the warnings turned to errors, and by
 comparing the before and after assembly output at -O2 for gcc.c-torture,
 gcc.dg and g++.dg.  OK to install?

 Thanks,
 Richard


 gcc/
   * config/tilegx/tilegx.md (insn_ld_addbitsuffix): Use
   register_operand rather than pointer_operand.  Add modes to the
   operands.
   (insn_ldna_addbitsuffix): Likewise.
   (insn_ldI124MODE:ns_addI48MODE:bitsuffix): Likewise.
   (insn_ldnt_addbitsuffix): Likewise.
   (insn_ldntI124MODE:ns_addI48MODE:bitsuffix): Likewise.
   (insn_ld_add_L2bitsuffix): Likewise.
   (insn_ldna_add_L2bitsuffix): Likewise.
   (insn_ldI124MODE:ns_add_L2I48MODE:bitsuffix): Likewise.
   (insn_ldnt_add_L2bitsuffix): Likewise.
   (insn_ldntI124MODE:ns_add_L2I48MODE:bitsuffix): Likewise.
   (insn_ld_add_missbitsuffix): Likewise.
   (insn_ldna_add_missbitsuffix): Likewise.
   (insn_ldI124MODE:ns_add_missI48MODE:bitsuffix): Likewise.
   (insn_ldnt_add_missbitsuffix): Likewise.
   (insn_ldntI124MODE:ns_add_missI48MODE:bitsuffix): Likewise.
   (insn_st_addbitsuffix): Likewise.
   (insn_stI124MODE:n_addI48MODE:bitsuffix): Likewise.
   (*insn_stI124MODE:n_addI48MODE:bitsuffix): Likewise.
   (insn_stnt_addbitsuffix): Likewise.
   (insn_stntI124MODE:n_addI48MODE:bitsuffix): Likewise.
   (*insn_stntI124MODE:n_addI48MODE:bitsuffix): Likewise.
   (vec_pack_pack_optab_v4hi): Use register_operand rather than
   reg_or_0_operand for operand 0.
   (insn_v2pack_insn): Likewise.
   (vec_pack_hipart_v4hi): Likewise.
   (insn_v2packh): Likewise.
   (vec_pack_ssat_v2si): Likewise.
   (insn_v4packsc): Likewise.

 Index: gcc/config/tilegx/tilegx.md
 ===
 --- gcc/config/tilegx/tilegx.md   2013-11-16 21:52:15.083787117 +
 +++ gcc/config/tilegx/tilegx.md   2013-11-16 21:59:07.745113525 +
 @@ -3284,9 +3284,9 @@ (define_expand insn_ld
)
  
  (define_insn insn_ld_addbitsuffix
 -  [(set (match_operand:I48MODE 1 pointer_operand =r)
 -(plus:I48MODE (match_operand 3 pointer_operand 1)
 -   (match_operand 2 s8bit_cint_operand i)))
 +  [(set (match_operand:I48MODE 1 register_operand =r)
 +(plus:I48MODE (match_operand:I48MODE 3 register_operand 1)
 +   (match_operand:I48MODE 2 s8bit_cint_operand i)))
 (set (match_operand:DI 0 register_operand =r)
  (mem:DI (match_dup 3)))]

 @@ -3302,9 +3302,9 @@ (define_insn insn_ldna
[(set_attr type X1_2cycle)])
  
  (define_insn insn_ldna_addbitsuffix
 -  [(set (match_operand:I48MODE 1 pointer_operand =r)
 -(plus:I48MODE (match_operand 3 pointer_operand 1)
 -   (match_operand 2 s8bit_cint_operand i)))
 +  [(set (match_operand:I48MODE 1 register_operand =r)
 +(plus:I48MODE (match_operand:I48MODE 3 register_operand 1)
 +   (match_operand:I48MODE 2 s8bit_cint_operand i)))
 (set (match_operand:DI 0 register_operand =r)
  (mem:DI (and:DI (match_dup 3) (const_int -8]

 @@ -3318,9 +3318,9 @@ (define_expand insn_ldns
)
  
  (define_insn insn_ldI124MODE:ns_addI48MODE:bitsuffix
 -  [(set (match_operand:I48MODE 1 pointer_operand =r)
 -(plus:I48MODE (match_operand 3 pointer_operand 1)
 -   (match_operand 2 s8bit_cint_operand i)))
 +  [(set (match_operand:I48MODE 1 register_operand =r)
 +(plus:I48MODE (match_operand:I48MODE 3 register_operand 1)
 +   (match_operand:I48MODE 2 s8bit_cint_operand i)))
 (set (match_operand:DI 0 register_operand =r)
  (any_extend:DI (mem:I124MODE (match_dup 3]

 @@ -3338,9 +3338,9 @@ (define_insn insn_ldnt
[(set_attr type X1_2cycle)])
  
  (define_insn insn_ldnt_addbitsuffix
 -  [(set (match_operand:I48MODE 1 pointer_operand =r)
 -(plus:I48MODE (match_operand 3 pointer_operand 1)
 -   (match_operand 2 s8bit_cint_operand i)))
 +  [(set (match_operand:I48MODE 1 register_operand =r)
 +(plus:I48MODE (match_operand:I48MODE 3 register_operand 1)
 +   (match_operand:I48MODE 2 s8bit_cint_operand i)))
 (set (match_operand:DI 0 register_operand =r)
  (unspec:DI [(mem:DI (match_dup 3))]
  

Re: [PATCH i386] Introduce __readeflags () and __writeeflags () intrinsics.

2013-12-04 Thread H.J. Lu
On Wed, Dec 4, 2013 at 9:58 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hello,
 On 04 Dec 19:59, Kirill Yukhin wrote:
 So, is it ok for trunk?

 Small correction. I think it is better to use
 popfql/pushfql instead of popf/pushf (however they're
 encoded equally).


If you define the proper type, you can use pushf/pop
and push/popf in the same readeflags/writeflags
implementation for -m32/-mx32/-m64.


-- 
H.J.


RE: [PATCH, ARM] Implement __builtin_trap

2013-12-04 Thread Ian Bolton
 On Wed, 4 Dec 2013, Ian Bolton wrote:
 
  The main update, other than cosmetic differences, is that we've
 chosen
  the same ARM encoding as LLVM for practical purposes.  (The Thumb
  encoding in Mark's patch already matched LLVM.)
 
 Do the encodings match what plain udf does in recent-enough gas (too
 recent for us to assume it in GCC or glibc for now), or is it something
 else?

Hi Joseph,

Yes, these encodings match the UDF instruction that is defined in the most
recent edition of the ARM architecture reference manual.

Thumb: 0xde00 | imm8  (we chose 0xff for the imm8)
ARM: 0xe7f000f0 | (imm12 << 8) | imm4  (we chose to use 0 for both imms)
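
As a quick sanity check of the chosen immediates (just a sketch, not
part of the patch), the emitted words come out as:

  #include <stdint.h>
  #include <stdio.h>

  int
  main (void)
  {
    uint16_t thumb = 0xde00 | 0xff;               /* udf #0xff -> 0xdeff */
    uint32_t arm   = 0xe7f000f0 | (0 << 8) | 0;   /* udf #0 -> 0xe7f000f0 */
    printf ("%#x %#x\n", (unsigned) thumb, (unsigned) arm);
    return 0;
  }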

So as not to break old versions of gas that don't recognise UDF, the
encoding is output directly.

Apologies if I have over-explained there!

Cheers,
Ian





Re: [PATCH, ARM] Implement __builtin_trap

2013-12-04 Thread Ramana Radhakrishnan

On 04/12/13 16:05, Ian Bolton wrote:

Hi,

Currently, on ARM, you have to either call abort() or raise(SIGTRAP)
to achieve a handy crash.

This patch allows you to instead call __builtin_trap() which is much
more efficient at falling over because it becomes just a single
instruction that will trap for you.
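
For example (a hypothetical use; the call now collapses to one
trapping instruction instead of a libcall):

  void
  store_checked (int *p)
  {
    if (!p)
      __builtin_trap ();   /* single UDF-encoded instruction on ARM */
    *p = 0;
  }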

Two testcases have been added (for ARM and Thumb) and both pass.


Note: This is a modified version of a patch originally submitted by Mark
Mitchell back in 2010, which came in response to PR target/59091.


The PR came as a result of the A64 implementation of __builtin_trap. The 
original patch was much earlier than that :)




http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00639.html
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59091

The main update, other than cosmetic differences, is that we've chosen
the same ARM encoding as LLVM for practical purposes.  (The Thumb
encoding in Mark's patch already matched LLVM.)


OK for trunk?


This is OK for trunk. Please put the PR numbers in the changelog entries 
before committing i.e. PR target/59091.


FTR, these match with the encodings for the udf mnemonic with an 
immediate value of 0 in ARM state and #0xff in Thumb state. Obviously we 
cannot put out the udf mnemonic because an older gas will not 
support it. These immediates were chosen to match the values as in other 
compiler implementations (I know these match with LLVM as something I 
can point to externally) and have been double checked with folks who 
have an avid interest in the kernel world.


Thanks,
Ramana




Re: [PATCH/AARCH64 5/6] Fix TLS for ILP32.

2013-12-04 Thread Yufeng Zhang

On 12/03/13 21:24, Andrew Pinski wrote:

Hi,
   With ILP32, some simple usage of TLS variables causes an unrecognizable
instruction due to needing to use SImode for loading pointers from memory.
This fixes the three (tlsie_small, tlsle_small, tlsdesc_small) patterns to
support SImode for pointers.

OK?  Build and tested on aarch64-elf with no regressions.

Thanks,
Andrew Pinski

* config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
Handle TLS for ILP32.
* config/aarch64/aarch64.md (tlsie_small): Change to an expand to
handle ILP32.
(tlsie_small_mode): New pattern.
(tlsle_small): Change to an expand to handle ILP32.
(tlsle_small_mode): New pattern.
(tlsdesc_small): Change to an expand to handle ILP32.
(tlsdesc_small_mode): New pattern.
---
  gcc/ChangeLog |   12 ++
  gcc/config/aarch64/aarch64.c  |   23 ++--
  gcc/config/aarch64/aarch64.md |   76 ++---
  3 files changed, 94 insertions(+), 17 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b1b4eef..a3e4532 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -628,22 +628,37 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,

  case SYMBOL_SMALL_TLSDESC:
{
-   rtx x0 = gen_rtx_REG (Pmode, R0_REGNUM);
+   enum machine_mode mode = GET_MODE (dest);
+   rtx x0 = gen_rtx_REG (mode, R0_REGNUM);
rtx tp;

+   gcc_assert (mode == Pmode || mode == ptr_mode);
+
emit_insn (gen_tlsdesc_small (imm));
tp = aarch64_load_tp (NULL);
-   emit_insn (gen_rtx_SET (Pmode, dest, gen_rtx_PLUS (Pmode, tp, x0)));
+
+   if (mode != Pmode)
+ tp = gen_lowpart (mode, tp);
+
+   emit_insn (gen_rtx_SET (mode, dest, gen_rtx_PLUS (mode, tp, x0)));
set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
}

  case SYMBOL_SMALL_GOTTPREL:
{
-   rtx tmp_reg = gen_reg_rtx (Pmode);
+   enum machine_mode mode = GET_MODE (dest);
+   rtx tmp_reg = gen_reg_rtx (mode);
rtx tp = aarch64_load_tp (NULL);
+
+   gcc_assert (mode == Pmode || mode == ptr_mode);
+
emit_insn (gen_tlsie_small (tmp_reg, imm));
-   emit_insn (gen_rtx_SET (Pmode, dest, gen_rtx_PLUS (Pmode, tp, 
tmp_reg)));
+
+   if (mode != Pmode)
+ tp = gen_lowpart (mode, tp);
+
+   emit_insn (gen_rtx_SET (mode, dest, gen_rtx_PLUS (mode, tp, tmp_reg)));
set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
}
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 313517f..08fcc94 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3577,35 +3577,85 @@
[(set_attr type call)
 (set_attr length 16)])

-(define_insn tlsie_small
-  [(set (match_operand:DI 0 register_operand =r)
-(unspec:DI [(match_operand:DI 1 aarch64_tls_ie_symref S)]
+(define_expand tlsie_small
+  [(set (match_operand 0 register_operand =r)
+(unspec [(match_operand 1 aarch64_tls_ie_symref S)]
+  UNSPEC_GOTSMALLTLS))]
+  
+{
+  if (TARGET_ILP32)
+{
+  operands[0] = gen_lowpart (ptr_mode, operands[0]);
+  emit_insn (gen_tlsie_small_si (operands[0], operands[1]));
+}
+  else
+emit_insn (gen_tlsie_small_di (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn tlsie_small_mode
+  [(set (match_operand:PTR 0 register_operand =r)
+(unspec:PTR [(match_operand 1 aarch64_tls_ie_symref S)]
   UNSPEC_GOTSMALLTLS))]

-  adrp\\t%0, %A1\;ldr\\t%0, [%0, #%L1]
+  adrp\\t%0, %A1\;ldr\\t%w0, [%0, #%L1]
[(set_attr type load1)
 (set_attr length 8)]
  )

-(define_insn tlsle_small
-  [(set (match_operand:DI 0 register_operand =r)
-(unspec:DI [(match_operand:DI 1 register_operand r)
-   (match_operand:DI 2 aarch64_tls_le_symref S)]
+
+(define_expand tlsle_small
+  [(set (match_operand 0 register_operand =r)
+(unspec [(match_operand 1 register_operand r)
+   (match_operand 2 aarch64_tls_le_symref S)]
+   UNSPEC_GOTSMALLTLS))]
+  
+{
+  if (TARGET_ILP32)
+{
+  rtx temp = gen_reg_rtx (ptr_mode);
+  operands[1] = gen_lowpart (ptr_mode, operands[1]);
+  emit_insn (gen_tlsle_small_si (temp, operands[1], operands[2]));
+  emit_move_insn (operands[0], gen_lowpart (GET_MODE (operands[0]), temp));
+}


Looks like you hit the similar issue where the matched RTX can have 
either SImode or DImode in ILP32.  The mechanism looks OK but I think 
the approach that 'add_losym' adopts is neater, which checks on the mode 
instead of TARGET_ILP32 and calls gen_add_losym_di or gen_add_losym_si 
accordingly.  Note that the iterator used in add_losym_mode is P 
instead of PTR.


Same for tlsie_small above.


+  else
+emit_insn (gen_tlsle_small_di (operands[0], operands[1], 

Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)

2013-12-04 Thread Marek Polacek
On Wed, Dec 04, 2013 at 06:55:52PM +0100, Jakub Jelinek wrote:
 On Wed, Dec 04, 2013 at 06:47:09PM +0100, Marek Polacek wrote:
  In C99, one way how to deal with inline functions is to put definition
  of the function into header:
  inline void foo (void) { /* ... */ }
  and put the declaration into exactly one .c file, with extern keyword
  (it can also have inline keyword):
  extern void foo (void);
  But in this case, we shouldn't issue the missing prototype warning.
  So the following should suppress that warning in C99 mode, when
  -fgnu89-inline is not in effect.  (But the function could still have
  the gnu_inline attribute, so it might be better to disable that
  warning for all inline functions?)
 
 A function definition can't have attributes after the (), and
 start_function is called with the attributes argument, so you can just
 look through those for gnu_inline attribute.

I can, the question is whether we want that.  Anyway, this is a version
which looks for the gnu_inline attribute.

2013-12-04  Marek Polacek  pola...@redhat.com

PR c/54113
c/
* c-decl.c (start_function): Don't warn for missing prototype for
inline functions in C99+.
testsuite/
* gcc.dg/pr54113.c: New test.

--- gcc/c/c-decl.c.mp3  2013-12-04 17:11:43.063878926 +0100
+++ gcc/c/c-decl.c  2013-12-04 19:13:29.043160116 +0100
@@ -7974,7 +7974,12 @@ start_function (struct c_declspecs *decl
       && old_decl != error_mark_node
       && TREE_PUBLIC (decl1)
       && !MAIN_NAME_P (DECL_NAME (decl1))
-      && C_DECL_ISNT_PROTOTYPE (old_decl))
+      && C_DECL_ISNT_PROTOTYPE (old_decl)
+      && !(DECL_DECLARED_INLINE_P (decl1)
+	   && flag_isoc99
+	   && !flag_gnu89_inline
+	   && !lookup_attribute ("gnu_inline",
+				 DECL_ATTRIBUTES (decl1))))
     warning_at (loc, OPT_Wmissing_prototypes,
		 "no previous prototype for %qD", decl1);
   /* Optionally warn of any def with no previous prototype
--- gcc/testsuite/gcc.dg/pr54113.c.mp3  2013-12-04 17:52:45.671288940 +0100
+++ gcc/testsuite/gcc.dg/pr54113.c  2013-12-04 18:48:31.012682675 +0100
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options -std=c99 -Wmissing-prototypes } */
+
+inline int foo (void) { return 42; } /* { dg-bogus no previous prototype } */
+extern int foo(void);

Marek


Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)

2013-12-04 Thread Joseph S. Myers
On Wed, 4 Dec 2013, Marek Polacek wrote:

 In C99, one way how to deal with inline functions is to put definition
 of the function into header:
 inline void foo (void) { /* ... */ }
 and put the declaration into exactly one .c file, with extern keyword
 (it can also have inline keyword):
 extern void foo (void);
 But in this case, we shouldn't issue the missing prototype warning.
 So the following should suppress that warning in C99 mode, when
 -fgnu89-inline is not in effect.  (But the function could still have
 the gnu_inline attribute, so it might be better to disable that
 warning for all inline functions?)
 
 Regtested/bootstrapped on x86_64-unknown-linux-gnu.  Ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)

2013-12-04 Thread Marek Polacek
On Wed, Dec 04, 2013 at 06:22:28PM +, Joseph S. Myers wrote:
 On Wed, 4 Dec 2013, Marek Polacek wrote:
 
  In C99, one way how to deal with inline functions is to put definition
  of the function into header:
  inline void foo (void) { /* ... */ }
  and put the declaration into exactly one .c file, with extern keyword
  (it can also have inline keyword):
  extern void foo (void);
  But in this case, we shouldn't issue the missing prototype warning.
  So the following should suppress that warning in C99 mode, when
  -fgnu89-inline is not in effect.  (But the function could still have
  the gnu_inline attribute, so it might be better to disable that
  warning for all inline functions?)
  
  Regtested/bootstrapped on x86_64-unknown-linux-gnu.  Ok for trunk?
 
 OK.

Should I commit the version with or without the lookup for gnu_inline
attribute?  Thanks,

Marek


Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)

2013-12-04 Thread Joseph S. Myers
On Wed, 4 Dec 2013, Marek Polacek wrote:

 I can, the question is whether we want that.  Anyway, this is version
 which looks for the gnu_inline attribute.

If anything, I'd think it should apply to all inline functions.  The point 
of this warning is that non-static functions should be declared in header 
files, separate from their definition outside a header file, and inline 
functions in general are expected to be defined directly in a header file, 
so making a separate declaration redundant.
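
That is, the warning is aimed at plain external functions like this
(hypothetical):

  /* file.c, compiled with -Wmissing-prototypes */
  int
  bar (int x)   /* warning: no previous prototype for 'bar' */
  {
    return x + 1;
  }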

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Don't warn for missing prototypes on inline fns (PR c/54113)

2013-12-04 Thread Marek Polacek
On Wed, Dec 04, 2013 at 06:30:37PM +, Joseph S. Myers wrote:
 On Wed, 4 Dec 2013, Marek Polacek wrote:
 
  I can, the question is whether we want that.  Anyway, this is version
  which looks for the gnu_inline attribute.
 
 If anything, I'd think it should apply to all inline functions.  The point 
 of this warning is that non-static functions should be declared in header 
 files, separate from their definition outside a header file, and inline 
 functions in general are expected to be defined directly in a header file, 
 so making a separate declaration redundant.

In that case, I'll apply this one after one more regtest.  Thanks.

2013-12-04  Marek Polacek  pola...@redhat.com

PR c/54113
c/
* c-decl.c (start_function): Don't warn for missing prototype for
inline functions.
testsuite/
* gcc.dg/pr54113.c: New test.

--- gcc/c/c-decl.c.mp3  2013-12-04 17:11:43.063878926 +0100
+++ gcc/c/c-decl.c  2013-12-04 19:33:00.581512253 +0100
@@ -7974,7 +7974,8 @@ start_function (struct c_declspecs *decl
       && old_decl != error_mark_node
       && TREE_PUBLIC (decl1)
       && !MAIN_NAME_P (DECL_NAME (decl1))
-      && C_DECL_ISNT_PROTOTYPE (old_decl))
+      && C_DECL_ISNT_PROTOTYPE (old_decl)
+      && !DECL_DECLARED_INLINE_P (decl1))
     warning_at (loc, OPT_Wmissing_prototypes,
		 "no previous prototype for %qD", decl1);
   /* Optionally warn of any def with no previous prototype
--- gcc/testsuite/gcc.dg/pr54113.c.mp3  2013-12-04 17:52:45.671288940 +0100
+++ gcc/testsuite/gcc.dg/pr54113.c  2013-12-04 18:48:31.012682675 +0100
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-options "-Wmissing-prototypes" } */
+
+inline int foo (void) { return 42; } /* { dg-bogus "no previous prototype" } */
+extern int foo(void);

Marek


Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)

2013-12-04 Thread Uros Bizjak
On Wed, Dec 4, 2013 at 2:44 PM, Marek Polacek pola...@redhat.com wrote:
 And this is the i?86 specific part of -fsanitize=signed-integer-overflow,
 split out of the huge patch.  It really is dependent on the generic
 parts; when committing, I'll put both parts together.
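
For context, the kind of source these patterns back (a hypothetical
example; with -fsanitize=signed-integer-overflow the addition below is
expanded through the new overflow-checking patterns and diagnosed at
run time):

  #include <limits.h>
  #include <stdio.h>

  int
  main (void)
  {
    int a = INT_MAX;
    printf ("%d\n", a + 1);   /* runtime error: signed integer overflow */
    return 0;
  }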

 Uros, would you mind taking a look at this?

 Regtested/bootstrapped on x86_64-linux.  Ok for trunk?

 2013-12-04  Jakub Jelinek  ja...@redhat.com
 Marek Polacek  pola...@redhat.com

 * config/i386/i386.md (addvmode4, subvmode4, mulvmode4,
 negvmode3, negvmode3_1): Define expands.
 (*addvmode4, *subvmode4, *mulvmode4, *negvmode3): Define
 insns.

 --- gcc/config/i386/i386.md.mp  2013-12-04 12:15:33.508905947 +0100
 +++ gcc/config/i386/i386.md 2013-12-04 12:15:39.608929341 +0100
 @@ -6153,6 +6153,42 @@
[(set_attr type alu)
 (set_attr mode QI)])

 +(define_mode_attr widerintmode [(QI HI) (HI SI) (SI DI) (DI TI)])

Please name this widerint and put it just above existing DWI/dwi
mode attribute definitions. We will merge them together.

 +
 +;; Add with jump on overflow.
 +(define_expand addvmode4
 +  [(parallel [(set (reg:CCO FLAGS_REG)
 +  (eq:CCO (plus:widerintmode
 + (sign_extend:widerintmode
 +(match_operand:SWI 1 register_operand))
 + (sign_extend:widerintmode
 +(match_operand:SWI 2 general_operand)))
 +  (sign_extend:widerintmode
 + (plus:SWI (match_dup 1) (match_dup 2)
 + (set (match_operand:SWI 0 register_operand)
 +  (plus:SWI (match_dup 1) (match_dup 2)))])
 +   (set (pc) (if_then_else
 +  (eq (reg:CCO FLAGS_REG) (const_int 0))
 +  (label_ref (match_operand 3))
 +  (pc)))]
 +  )

Please use nonimmediate_operand for operand 1 and fixup input
operands with ix86_fixup_binary_operands_no_copy. Ideally, we could
use nonimmediate_operand also for operand 0, but in this case, we
would need to fixup output operand _after_ the PLUS pattern is emitted
- not worth, IMO.

Please also change sub expander below in this way.

 +(define_insn *addvmode4
 +  [(set (reg:CCO FLAGS_REG)
 +   (eq:CCO (plus:widerintmode
 +  (sign_extend:widerintmode
 + (match_operand:SWI 1 nonimmediate_operand %0,0))
 +  (sign_extend:widerintmode
 + (match_operand:SWI 2 general_operand g,ri)))
 +   (sign_extend:widerintmode
 +  (plus:SWI (match_dup 1) (match_dup 2)
 +   (set (match_operand:SWI 0 nonimmediate_operand =r,rm)
 +   (plus:SWI (match_dup 1) (match_dup 2)))]
 +  ix86_binary_operator_ok (PLUS, MODEmode, operands)
 +  add{imodesuffix}\t{%2, %0|%0, %2}
 +  [(set_attr type alu)
 +   (set_attr mode MODE)])
 +
  ;; The lea patterns for modes less than 32 bits need to be matched by
  ;; several insns converted to real lea by splitters.

 @@ -6390,6 +6426,40 @@
[(set_attr type alu)
 (set_attr mode SI)])

 +;; Subtract with jump on overflow.
 +(define_expand subvmode4
 +  [(parallel [(set (reg:CCO FLAGS_REG)
 +  (eq:CCO (minus:widerintmode
 + (sign_extend:widerintmode
 +(match_operand:SWI 1 register_operand))
 + (sign_extend:widerintmode
 +(match_operand:SWI 2 general_operand)))
 +  (sign_extend:widerintmode
 + (minus:SWI (match_dup 1) (match_dup 2)
 + (set (match_operand:SWI 0 register_operand)
 +  (minus:SWI (match_dup 1) (match_dup 2)))])
 +   (set (pc) (if_then_else
 +  (eq (reg:CCO FLAGS_REG) (const_int 0))
 +  (label_ref (match_operand 3))
 +  (pc)))]
 +  )
 +
 +(define_insn *subvmode4
 +  [(set (reg:CCO FLAGS_REG)
 +   (eq:CCO (minus:widerintmode
 +  (sign_extend:widerintmode
 + (match_operand:SWI 1 nonimmediate_operand 0,0))
 +  (sign_extend:widerintmode
 + (match_operand:SWI 2 general_operand 
 ri,rm)))
 +   (sign_extend:widerintmode
 +  (minus:SWI (match_dup 1) (match_dup 2)
 +   (set (match_operand:SWI 0 nonimmediate_operand =rm,r)
 +   (minus:SWI (match_dup 1) (match_dup 2)))]
 +  ix86_binary_operator_ok (MINUS, MODEmode, operands)
 +  sub{imodesuffix}\t{%2, %0|%0, %2}
 +  [(set_attr type alu)
 +   (set_attr mode MODE)])
 +
  (define_insn *submode_3
[(set (reg FLAGS_REG)
 (compare (match_operand:SWI 1 nonimmediate_operand 0,0)
 @@ -6704,6 +6774,59 @@
 (set_attr bdver1_decode direct)
 (set_attr mode QI)])

 +;; Multiply with jump on overflow.
 +(define_expand mulvmode4
 +  [(parallel [(set (reg:CCO FLAGS_REG)
 +  (eq:CCO (mult:widerintmode

Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)

2013-12-04 Thread Jakub Jelinek
On Wed, Dec 04, 2013 at 07:58:20PM +0100, Uros Bizjak wrote:
  @@ -8617,6 +8740,49 @@
 [(set_attr type negnot)
  (set_attr mode SI)])
 
  +;; Negate with jump on overflow.
  +(define_expand negvmode3
  +  [(parallel [(set (reg:CCO FLAGS_REG)
  +  (ne:CCO (match_operand:SWI 1 register_operand)
  +  (const_int 0)))
  + (set (match_operand:SWI 0 register_operand)
  +  (neg:SWI (match_dup 1)))])
  +   (set (pc) (if_then_else
  +  (eq (reg:CCO FLAGS_REG) (const_int 0))
  +  (label_ref (match_operand 2))
  +  (pc)))]
  +  
  +{
  +  rtx minv = GEN_INT (HOST_WIDE_INT_M1U
  +  (GET_MODE_BITSIZE (MODEmode) - 1));
  +  emit_insn (gen_negvmode3_1 (operands[0], operands[1], minv, 
  operands[2]));
  +  DONE;
  +})
 
 No, please use
 
 operands[3] = GEN_INT ();
 
 and use (match_dup 3) in the pattern. The pattern below is not needed then.

My memory is fuzzy about that, but I think that was my first version which
didn't work, because with match_dup then it requires on the internal-fn.c
side to pass 4 arguments instead of just 3.  I can try again though.

 BTW: can we use
 
 gen_int_mode (1 << (GET_MODE_BITSIZE (mode) - 1), mode)
 
 instead?

With HOST_WIDE_INT_1U instead of 1 and s/mode/<MODE>mode/g perhaps.

Jakub


Re: [PATCH i386] Introduce __readeflags () and __writeeflags () intrinsics.

2013-12-04 Thread Uros Bizjak
On Wed, Dec 4, 2013 at 5:59 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

 MSVC and ICC (currently Windows version, Linux version soon) have
 dedicated intrinsics to read/set EFLAGS register ([1], [2]).

 Patch introduces these intrinsics and tests for them.
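
For instance, user code could then read and restore the flags like
this (a hypothetical example; x86intrin.h pulls in ia32intrin.h):

  #include <x86intrin.h>

  unsigned long long
  save_restore_flags (void)
  {
    unsigned long long flags = __readeflags ();
    __writeeflags (flags);   /* write the same value straight back */
    return flags;
  }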

 Bootstrapped. New tests pass.
 Although gate is closed patch is obvious.
 So, is it ok for trunk?

 ChangeLog/
 * config/i386/ia32intrin.h (__readeflags): New.
 (__writeeflags): Ditto.

 testsuite/ChangeLog/
 * gcc.target/i386/readeflags-1.c: New.
 * gcc.target/i386/writeeflags-1.c: Ditto.

 [1] - http://msdn.microsoft.com/en-us/library/aa983406(v=vs.90).aspx
 [2] - http://msdn.microsoft.com/en-us/library/aa983392(v=vs.90).aspx

 --
 Thanks, K

 diff --git a/gcc/config/i386/ia32intrin.h b/gcc/config/i386/ia32intrin.h
 index b26dc46..c9e68c5 100644
 --- a/gcc/config/i386/ia32intrin.h
 +++ b/gcc/config/i386/ia32intrin.h
 @@ -238,6 +238,34 @@ __rorq (unsigned long long __X, int __C)
return (__X  __C) | (__X  (64 - __C));
  }

 +/* Read flags register */
 +extern __inline unsigned long long
 +__attribute__((__gnu_inline__, __always_inline__, __artificial__))
 +__readeflags (void)
 +{
 +  unsigned long long result = 0;
 +  __asm__ __volatile__ ("pushf\n\t"
 +   "popq %0\n"
 +   : "=r" (result)
 +   :
 +   :
 +   );
 +  return result;
 +}
 +
 +/* Write flags register */
 +extern __inline void
 +__attribute__((__gnu_inline__, __always_inline__, __artificial__))
 +__writeeflags (unsigned long long X)
 +{
 +  __asm__ __volatile__ ("pushq %0\n\t"
 +   "popf\n"
 +   :
 +   : "r" (X)
 +   : "flags"
 +   );
 +}
 +

Oh, no. We don't want assembly in this century ;)

The proper implementation is to introduce a
__builtin_readflags/__builtin_writeflags that expand the sequence by
calling gen_push and gen_pop functions.

You will need new patterns for pushfl and popfl, something like:

(define_insn *pushflmode
  [(set (match_operand:DWIH 0 push_operand =)
(match_operand:DWIH 0 flags_reg_operand))]
  
  pushf{mode}
  [(set_attr type push)
   (set_attr mode MODE)])

(define_insn *popflmode1
  [(set (match_operand:DWIH 0 flags_reg_operand)
(match_operand:DWIH 1 pop_operand ))]
  
  popf{imodesuffix}\t%0
  [(set_attr type pop)
   (set_attr mode MODE)])

Uros.


Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)

2013-12-04 Thread Uros Bizjak
On Wed, Dec 4, 2013 at 8:07 PM, Jakub Jelinek ja...@redhat.com wrote:

  @@ -8617,6 +8740,49 @@
 [(set_attr type negnot)
  (set_attr mode SI)])
 
  +;; Negate with jump on overflow.
  +(define_expand negvmode3
  +  [(parallel [(set (reg:CCO FLAGS_REG)
  +  (ne:CCO (match_operand:SWI 1 register_operand)
  +  (const_int 0)))
  + (set (match_operand:SWI 0 register_operand)
  +  (neg:SWI (match_dup 1)))])
  +   (set (pc) (if_then_else
  +  (eq (reg:CCO FLAGS_REG) (const_int 0))
  +  (label_ref (match_operand 2))
  +  (pc)))]
  +  
  +{
  +  rtx minv = GEN_INT (HOST_WIDE_INT_M1U
  +  (GET_MODE_BITSIZE (MODEmode) - 1));
  +  emit_insn (gen_negvmode3_1 (operands[0], operands[1], minv, 
  operands[2]));
  +  DONE;
  +})

 No, please use

 operands[3] = GEN_INT ();

 and use (match_dup 3) in the pattern. The pattern below is not needed then.

 My memory is fuzzy about that, but I think that was my first version which
 didn't work, because with match_dup then it requires on the internal-fn.c
 side to pass 4 arguments instead of just 3.  I can try again though.

I believe it should work, please see for example expNcorexf3 expander
and many of its (match_dup X) expressions.

 BTW: can we use

 gen_int_mode (1 << (GET_MODE_BITSIZE (mode) - 1), mode)

 instead?

 With HOST_WIDE_INT_1U instead of 1 and s/mode/<MODE>mode/g perhaps.

gen_int_mode calls trunc_int_for_mode that is introduced by the comment:

/* Truncate and perhaps sign-extend C as appropriate for MODE.  */

But, admittedly, I didn't test it...
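
The effect in question, in plain C (just a sketch; GCC defines the
conversion of out-of-range values to signed types as wrapping, which
matches what trunc_int_for_mode produces for QImode):

  #include <stdio.h>

  int
  main (void)
  {
    /* 1 << 7 is 128; as an 8-bit signed value it canonicalizes to -128.  */
    signed char c = (signed char) (1 << 7);
    printf ("%d\n", (int) c);   /* prints -128 */
    return 0;
  }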

Uros.


Re: [PATCH] Add reference binding instrumentation

2013-12-04 Thread Jason Merrill

On 12/03/2013 02:45 PM, Marek Polacek wrote:

You're right.  I wanted to use cp_save_expr and/or stabilize_expr, but
that didn't work out.  So I resorted to restrict the condition a bit
and only pass INDIRECT_REFs to the ubsan routine (which, after all,
has
   if (!INDIRECT_REF_P (init))
 return init;
And in that case, it seems we don't have to worry about multiple evaluation
of the initializer.


Hmm? You can have an INDIRECT_REF where the operand has side effects, i.e. 
*f() where f returns a pointer.


stabilize_expr ought to work.  Your main problem with that was probably 
that you were trying to call it here, at which point init is just what 
the user wrote, whereas you want to wait until you have an expression 
with REFERENCE_TYPE.  Try adding the instrumentation in store_init_value 
instead.


Jason



Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)

2013-12-04 Thread Jakub Jelinek
On Wed, Dec 04, 2013 at 08:23:22PM +0100, Uros Bizjak wrote:
  My memory is fuzzy about that, but I think that was my first version which
  didn't work, because with match_dup then it requires on the internal-fn.c
  side to pass 4 arguments instead of just 3.  I can try again though.

Weird, now it works, dunno what I have done differently before.
Though, I've discovered a bug in internal-fn.c for the negation case.

So is this everything you wanted?

2013-12-04  Jakub Jelinek  ja...@redhat.com  
Marek Polacek  pola...@redhat.com

* config/i386/i386.md (DWI, dwi): Add QImode and HImode cases.
(addvmode4, subvmode4, mulvmode4, negvmode3): New
expanders.
(*addvmode4, *subvmode4, *mulvmode4, *negvmode3): New insns.

* internal-fn.c (ubsan_expand_si_overflow_neg_check): The return
value lives in res rather than target.

--- gcc/config/i386/i386.md.jj  2013-12-04 12:05:46.689185140 +0100
+++ gcc/config/i386/i386.md 2013-12-04 20:40:25.417309596 +0100
@@ -905,8 +905,8 @@ (define_mode_iterator DWI [(DI !TARGET_
   (TI TARGET_64BIT)])
 
 ;; Double word integer modes as mode attribute.
-(define_mode_attr DWI [(SI DI) (DI TI)])
-(define_mode_attr dwi [(SI di) (DI ti)])
+(define_mode_attr DWI [(QI HI) (HI SI) (SI DI) (DI TI)])
+(define_mode_attr dwi [(QI hi) (HI si) (SI di) (DI ti)])
 
 ;; Half mode for double word integer modes.
 (define_mode_iterator DWIH [(SI !TARGET_64BIT)
@@ -6160,6 +6160,41 @@ (define_insn *addqi_ext_2
   [(set_attr type alu)
(set_attr mode QI)])
 
+;; Add with jump on overflow.
+(define_expand addvmode4
+  [(parallel [(set (reg:CCO FLAGS_REG)
+  (eq:CCO (plus:DWI
+ (sign_extend:DWI
+(match_operand:SWI 1 nonimmediate_operand))
+ (sign_extend:DWI
+(match_operand:SWI 2 general_operand)))
+  (sign_extend:DWI
+ (plus:SWI (match_dup 1) (match_dup 2)
+ (set (match_operand:SWI 0 register_operand)
+  (plus:SWI (match_dup 1) (match_dup 2)))])
+   (set (pc) (if_then_else
+  (eq (reg:CCO FLAGS_REG) (const_int 0))
+  (label_ref (match_operand 3))
+  (pc)))]
+  
+  ix86_fixup_binary_operands_no_copy (PLUS, MODEmode, operands);)
+
+(define_insn *addvmode4
+  [(set (reg:CCO FLAGS_REG)
+   (eq:CCO (plus:DWI
+  (sign_extend:DWI
+ (match_operand:SWI 1 nonimmediate_operand %0,0))
+  (sign_extend:DWI
+ (match_operand:SWI 2 general_operand g,ri)))
+   (sign_extend:DWI
+  (plus:SWI (match_dup 1) (match_dup 2)
+   (set (match_operand:SWI 0 nonimmediate_operand =r,rm)
+   (plus:SWI (match_dup 1) (match_dup 2)))]
+  ix86_binary_operator_ok (PLUS, MODEmode, operands)
+  add{imodesuffix}\t{%2, %0|%0, %2}
+  [(set_attr type alu)
+   (set_attr mode MODE)])
+
 ;; The lea patterns for modes less than 32 bits need to be matched by
 ;; several insns converted to real lea by splitters.
 
@@ -6397,6 +6432,41 @@ (define_insn *subsi_2_zext
   [(set_attr type alu)
(set_attr mode SI)])
 
+;; Subtract with jump on overflow.
+(define_expand subvmode4
+  [(parallel [(set (reg:CCO FLAGS_REG)
+  (eq:CCO (minus:DWI
+ (sign_extend:DWI
+(match_operand:SWI 1 nonimmediate_operand))
+ (sign_extend:DWI
+(match_operand:SWI 2 general_operand)))
+  (sign_extend:DWI
+ (minus:SWI (match_dup 1) (match_dup 2)
+ (set (match_operand:SWI 0 register_operand)
+  (minus:SWI (match_dup 1) (match_dup 2)))])
+   (set (pc) (if_then_else
+  (eq (reg:CCO FLAGS_REG) (const_int 0))
+  (label_ref (match_operand 3))
+  (pc)))]
+  
+  ix86_fixup_binary_operands_no_copy (MINUS, MODEmode, operands);)
+
+(define_insn *subvmode4
+  [(set (reg:CCO FLAGS_REG)
+   (eq:CCO (minus:DWI
+  (sign_extend:DWI
+ (match_operand:SWI 1 nonimmediate_operand 0,0))
+  (sign_extend:DWI
+ (match_operand:SWI 2 general_operand ri,rm)))
+   (sign_extend:DWI
+  (minus:SWI (match_dup 1) (match_dup 2)
+   (set (match_operand:SWI 0 nonimmediate_operand =rm,r)
+   (minus:SWI (match_dup 1) (match_dup 2)))]
+  ix86_binary_operator_ok (MINUS, MODEmode, operands)
+  sub{imodesuffix}\t{%2, %0|%0, %2}
+  [(set_attr type alu)
+   (set_attr mode MODE)])
+
 (define_insn *submode_3
   [(set (reg FLAGS_REG)
(compare (match_operand:SWI 1 nonimmediate_operand 0,0)
@@ -6711,6 +6781,58 @@ (define_insn *mulqi3_1
(set_attr bdver1_decode direct)
(set_attr mode QI)])
 
+;; Multiply 

Re: [Patch, RTL] Eliminate redundant vec_select moves.

2013-12-04 Thread Jeff Law

On 12/04/13 09:06, Tejas Belagod wrote:

Richard Sandiford wrote:

Tejas Belagod tbela...@arm.com writes:

Richard Sandiford wrote:

Tejas Belagod tbela...@arm.com writes:

The problem is that one reg rtx can span several hard registers.
E.g. (reg:V4SI 32) might represent one 64-bit register (no. 32),
but it might instead represent two 32-bit registers (nos. 32 and 33).
Obviously the latter's not very likely for vectors this small,
but more likely for larger ones (including on NEON IIRC).

So if we had 2 32-bit registers being treated as a V4HI, it would be:

      --------32--------|--------33--------
      msb                              lsb

	   (one V4HI value in the reg rtx numbered 32)

      msb                              lsb
      ------------------32----------------

for big endian and:

      --------33--------|--------32--------
      msb                              lsb

	   (one V4HI value in the reg rtx numbered 32)

      msb                              lsb
      ------------------32----------------

for little endian.

Ah, ok, that makes things clearer. Thanks for that.

I can't find any helper function that figures out if we're writing
partial or
full result regs. Would something like

 REGNO (src) == REGNO (dst) &&
 HARD_REGNO_NREGS (src) == HARD_REGNO_NREGS (dst) == 1

be a sane check for partial result regs?

Yeah, that should work.  I think a more general alternative would be:

  simplify_subreg_regno (REGNO (src), GET_MODE (src),
 offset, GET_MODE (dst)) == (int) REGNO (dst)

where:

  offset = GET_MODE_UNIT_SIZE (GET_MODE (src)) * INTVAL (XVECEXP
(sel, 0, 0))

That offset is the byte offset of the first selected element from the
start of a vector in memory, which is also the way that SUBREG_BYTEs
are counted.  For little-endian it gives the offset of the lsb of the
slice, while for big-endian it gives the offset of the msb (which is
also how SUBREG_BYTEs work).

The simplify_subreg_regno should cope with both single-register vectors
and multi-register vectors.

Sorry for the delayed response to this.

Thanks for the tip. Here's an improved patch that implements the
simplify_subreg_regno () method of eliminating redundant moves.
Regarding the test case, I failed to get the ppc back-end to generate
RTL pattern that this patch checks for. I can easily write a test
case for aarch64(big and little endian) on these lines

typedef float float32x4_t __attribute__ ((__vector_size__ (16)));

float foo_be (float32x4_t x)
{
   return x[3];
}

float foo_le (float32x4_t x)
{
   return x[0];
}

where I know that the vector indexing will generate a vec_select on
the same src and dst regs that could be optimized away and hence test
it. But I'm struggling to get a test case that the ppc altivec
back-end will generate such a vec_select for. I see that altivec does
not define vec_extract, so a simple indexing like this seems to happen
via memory. Also, I don't know enough about the ppc PCS or
architecture to write a test that will check for this optimization
opportunity on same src and dst hard-registers. Any hints?


Me neither, sorry.

FWIW, the MIPS tests:

  typedef float float32x2_t __attribute__ ((__vector_size__ (8)));
  void bar (float);
  void foo_be (float32x2_t x) { bar (x[1]); }
  void foo_le (float32x2_t x) { bar (x[0]); }

also exercise it, but I don't think they add anything over the aarch64
versions.  I can add them to the testsuite anyway if it helps though.


diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 0cd0c7e..ca25ce5 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1180,6 +1180,22 @@ set_noop_p (const_rtx set)
   dst = SUBREG_REG (dst);
 }

+  /* It is a NOOP if destination overlaps with selected src vector
+ elements.  */
+  if (GET_CODE (src) == VEC_SELECT
+      && REG_P (XEXP (src, 0)) && REG_P (dst)
+      && HARD_REGISTER_P (XEXP (src, 0))
+      && HARD_REGISTER_P (dst))
+{
+  rtx par = XEXP (src, 1);
+  rtx src0 = XEXP (src, 0);
+  HOST_WIDE_INT offset =
+GET_MODE_UNIT_SIZE (GET_MODE (src0)) * INTVAL (XVECEXP (par, 0,
0));
+
+  return simplify_subreg_regno (REGNO (src0), GET_MODE (src0),
+offset, GET_MODE (dst)) == (int)REGNO (dst);
+}
+


Since this also (correctly) triggers for vector results, we need to keep
the check for consecutive indices that you had originally.  (It's always
the first index that should be used for the simplify_subreg_regno
though.)

Looks good to me otherwise, thanks.


Thanks Richard. Here is a revised patch. Sorry about the delay - I was
investigating to make sure an LRA ICE I was seeing on aarch64 was
unrelated to this patch. I've added a test case that I expect to pass
for aarch64. I've also added the tests that you suggested for MIPS, but
haven't checked for the target because I'm not sure what optimizations
happen on MIPS.

OK for trunk?

Thanks,
Tejas.

2013-12-04  Tejas Belagod  tejas.bela...@arm.com

gcc/
 * rtlanal.c (set_noop_p): Return nonzero in case of redundant
vec_select
 for overlapping register lanes.

testsuite/
 * config/gcc.dg/vect/vect-nop-move.c: New.
Per HJ's request please test vect-nop-move on x86/x86_64 and if the 
redundant move 

Re: [PATCH 2/2] Implement -fsanitize=signed-integer-overflow (i?86 parts)

2013-12-04 Thread Uros Bizjak
On Wed, Dec 4, 2013 at 9:01 PM, Jakub Jelinek ja...@redhat.com wrote:
  My memory is fuzzy about that, but I think that was my first version,
  which didn't work because with match_dup the internal-fn.c side then
  has to pass 4 arguments instead of just 3.  I can try again though.

 Weird, now it works, dunno what I have done differently before.
 Though, I've discovered a bug in internal-fn.c for the negation case.

 So is this everything you wanted?

Yes, thanks!

 2013-12-04  Jakub Jelinek  ja...@redhat.com
 Marek Polacek  pola...@redhat.com

 * config/i386/i386.md (DWI, dwi): Add QImode and HImode cases.
 (addv<mode>4, subv<mode>4, mulv<mode>4, negv<mode>3): New
 expanders.
 (*addv<mode>4, *subv<mode>4, *mulv<mode>4, *negv<mode>3): New insns.

 * internal-fn.c (ubsan_expand_si_overflow_neg_check): The return
 value lives in res rather than target.

The i386 part is OK.

Thanks,
Uros.
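
For reference, the match_dup shape under discussion looks roughly like
this (a sketch reconstructed from the ChangeLog above; the predicates
and preparation statement are assumptions, not the verbatim committed
pattern):

  (define_expand "addv<mode>4"
    [(parallel
       [(set (reg:CCO FLAGS_REG)
             (eq:CCO (plus:<DWI>
                       (sign_extend:<DWI>
                         (match_operand:SWI 1 "nonimmediate_operand"))
                       (sign_extend:<DWI>
                         (match_operand:SWI 2 "<general_operand>")))
                     (sign_extend:<DWI>
                       (plus:SWI (match_dup 1) (match_dup 2)))))
        (set (match_operand:SWI 0 "register_operand")
             (plus:SWI (match_dup 1) (match_dup 2)))])]
    ""
    "")

Because operands 1 and 2 are reused via match_dup, the expander still
exposes only three operands to the internal-fn.c caller.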


Re: [C++ Patch] Avoid pairs of error calls in duplicate_decls

2013-12-04 Thread Jason Merrill

OK, thanks.

Jason


Re: Ping: [tilegx] Avoid genrecog warning

2013-12-04 Thread Jeff Law

On 12/04/13 11:01, Richard Sandiford wrote:

Ping for this patch, which is the only one of the series that hasn't
been approved.

Thanks,
Richard

Richard Sandiford rdsandif...@googlemail.com writes:

I have a patch to upgrade most genrecog warnings into errors.  This patch
fixes those for tilegx.  There seemed to be two sources of warnings:

- the intrinsics often used matched pointer_operands in an addition,
   so that the destination accepted constant pointers.  I think the
   direct translation would be pmode_register_operand, but since these
   additions have a specific mode, I think a modeful register_operand
   is more natural.

- some instructions used reg_or_0_operand as a destination (see the
   sketch below).
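
To make both cases concrete, here is a hand-written sketch (made-up
modes and operand numbers, not actual tilegx.md fragments):

  ;; Case 1: a modeless pointer_operand in an address computation,
  ;; which also accepts constant pointers,
  ;;   (match_operand 0 "pointer_operand" "")
  ;; becomes a modeful register_operand:
  ;;   (match_operand:DI 0 "register_operand" "")

  ;; Case 2: reg_or_0_operand also matches (const_int 0), which is not
  ;; an lvalue, so genrecog warns when it appears as a destination:
  ;;   (set (match_operand:V4HI 0 "reg_or_0_operand" "=r") ...)
  ;; becomes:
  ;;   (set (match_operand:V4HI 0 "register_operand" "=r") ...)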

Tested by building tilegx-elf with the warnings turned to errors, and by
comparing the before and after assembly output at -O2 for gcc.c-torture,
gcc.dg and g++.dg.  OK to install?

Thanks,
Richard


gcc/
* config/tilegx/tilegx.md (insn_ld_add<bitsuffix>): Use
register_operand rather than pointer_operand.  Add modes to the
operands.
(insn_ldna_add<bitsuffix>): Likewise.
(insn_ld<I124MODE:n>s_add<I48MODE:bitsuffix>): Likewise.
(insn_ldnt_add<bitsuffix>): Likewise.
(insn_ldnt<I124MODE:n>s_add<I48MODE:bitsuffix>): Likewise.
(insn_ld_add_L2<bitsuffix>): Likewise.
(insn_ldna_add_L2<bitsuffix>): Likewise.
(insn_ld<I124MODE:n>s_add_L2<I48MODE:bitsuffix>): Likewise.
(insn_ldnt_add_L2<bitsuffix>): Likewise.
(insn_ldnt<I124MODE:n>s_add_L2<I48MODE:bitsuffix>): Likewise.
(insn_ld_add_miss<bitsuffix>): Likewise.
(insn_ldna_add_miss<bitsuffix>): Likewise.
(insn_ld<I124MODE:n>s_add_miss<I48MODE:bitsuffix>): Likewise.
(insn_ldnt_add_miss<bitsuffix>): Likewise.
(insn_ldnt<I124MODE:n>s_add_miss<I48MODE:bitsuffix>): Likewise.
(insn_st_add<bitsuffix>): Likewise.
(insn_st<I124MODE:n>_add<I48MODE:bitsuffix>): Likewise.
(*insn_st<I124MODE:n>_add<I48MODE:bitsuffix>): Likewise.
(insn_stnt_add<bitsuffix>): Likewise.
(insn_stnt<I124MODE:n>_add<I48MODE:bitsuffix>): Likewise.
(*insn_stnt<I124MODE:n>_add<I48MODE:bitsuffix>): Likewise.
(vec_pack_<pack_optab>_v4hi): Use register_operand rather than
reg_or_0_operand for operand 0.
(insn_v2<pack_insn>): Likewise.
(vec_pack_hipart_v4hi): Likewise.
(insn_v2packh): Likewise.
(vec_pack_ssat_v2si): Likewise.
(insn_v4packsc): Likewise.
This looks pretty mechanical.  Hopefully there wasn't a compelling 
reason to use pointer_operand instead of either register_operand or 
other alternatives.


Let's give Walt a bit more time to chime in just in case there was a 
particular reason for the prior choice of pointer_operand.  Perhaps 
Monday morning.  If you haven't heard from Walt by then, consider the 
patch approved by me.


jeff



