[RS6000] PR61300 K&R incoming args
One of the nice features of the ELFv2 ABI is that stack frames are smaller compared to ELFv1. We don't allocate a parameter save area unless we actually use it. However, for variable argument lists, we kept the simple va_list type which is a pointer to the memory location of the next parameter. This means calls to variable argument list functions must allocate the parameter save area, and hence calls to unprototyped functions must also do so. The wrinkle with K&R style C functions is that function *definitions* may be unprototyped. So when compiling a function body we can't use !prototype_p() to say we have a parameter save area. A call in some other compilation unit might be prototyped and so not allocate a parameter save area. Another consequence of unprototyped function definitions is that the return type and argument types may not be available on the function type node. Instead you need to look at the return and arguments on the function decl. Now, function.c always passes a decl to REG_PARM_STACK_SPACE, but calls.c sometimes passes a decl and sometimes a type. This latter fact makes it necessary, I think, to define an INCOMING_REG_PARM_STACK_SPACE used by function.c. You can't blindly use a decl from calls.c as that falls foul of C++. The following implements this. Bootstrapped and regression tested powerpc64le-linux and powerpc64-linux all langs (except Ada since I didn't have gnat installed.) OK to apply? PR target/61300 * doc/tm.texi.in (INCOMING_REG_PARM_STACK_SPACE): Document. * doc/tm.texi: Regenerate. * function.c (INCOMING_REG_PARM_STACK_SPACE): Provide default. Use throughout in place of REG_PARM_STACK_SPACE. * config/rs6000/rs6000.c (rs6000_reg_parm_stack_space): Add incoming param. Pass to rs6000_function_parms_need_stack. (rs6000_function_parms_need_stack): Add incoming param, ignore prototype_p when incoming. Use function decl when incoming to handle K&R style functions. * config/rs6000/rs6000.h (REG_PARM_STACK_SPACE): Adjust. (INCOMING_REG_PARM_STACK_SPACE): Define. Index: gcc/doc/tm.texi.in === --- gcc/doc/tm.texi.in (revision 210919) +++ gcc/doc/tm.texi.in (working copy) @@ -3499,6 +3499,13 @@ which. @c above is overfull. not sure what to do. --mew 5feb93 did @c something, not sure if it looks good. --mew 10feb93 +@defmac INCOMING_REG_PARM_STACK_SPACE (@var{fndecl}) +Like @code{REG_PARM_STACK_SPACE}, but for incoming register arguments. +Define this macro if space guaranteed when compiling a function body +is different to space required when making a call, a situation that +can arise with unprototyped functions. +@end defmac + @defmac OUTGOING_REG_PARM_STACK_SPACE (@var{fntype}) Define this to a nonzero value if it is the responsibility of the caller to allocate the area reserved for arguments passed in registers Index: gcc/doc/tm.texi === --- gcc/doc/tm.texi (revision 210919) +++ gcc/doc/tm.texi (working copy) @@ -3948,6 +3948,13 @@ which. @c above is overfull. not sure what to do. --mew 5feb93 did @c something, not sure if it looks good. --mew 10feb93 +@defmac INCOMING_REG_PARM_STACK_SPACE (@var{fndecl}) +Like @code{REG_PARM_STACK_SPACE}, but for incoming register arguments. +Define this macro if space guaranteed when compiling a function body +is different to space required when making a call, a situation that +can arise with unprototyped functions. 
+@end defmac + @defmac OUTGOING_REG_PARM_STACK_SPACE (@var{fntype}) Define this to a nonzero value if it is the responsibility of the caller to allocate the area reserved for arguments passed in registers Index: gcc/function.c === --- gcc/function.c (revision 210919) +++ gcc/function.c (working copy) @@ -1348,9 +1348,13 @@ static int cfa_offset; #define STACK_POINTER_OFFSET 0 #endif +#if defined (REG_PARM_STACK_SPACE) && !defined (INCOMING_REG_PARM_STACK_SPACE) +#define INCOMING_REG_PARM_STACK_SPACE REG_PARM_STACK_SPACE +#endif + /* If not defined, pick an appropriate default for the offset of dynamically allocated memory depending on the value of ACCUMULATE_OUTGOING_ARGS, - REG_PARM_STACK_SPACE, and OUTGOING_REG_PARM_STACK_SPACE. */ + INCOMING_REG_PARM_STACK_SPACE, and OUTGOING_REG_PARM_STACK_SPACE. */ #ifndef STACK_DYNAMIC_OFFSET @@ -1362,12 +1366,12 @@ static int cfa_offset; `crtl->outgoing_args_size'. Nevertheless, we must allow for it when allocating stack dynamic objects. */ -#if defined(REG_PARM_STACK_SPACE) +#ifdef INCOMING_REG_PARM_STACK_SPACE #define STACK_DYNAMIC_OFFSET(FNDECL) \ ((ACCUMULATE_OUTGOING_ARGS \ ? (crtl->outgoing_args_size\ +
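For illustration, a minimal sketch of the K&R situation described above (illustrative only, not the PR 61300 testcase): the unprototyped definition cannot tell from its *type* whether callers allocated a parameter save area, while a prototyped caller in another translation unit will not allocate one under the ELFv2 ABI.

    /* file1.c: K&R-style definition -- the definition itself carries no
       prototype, so !prototype_p () cannot be relied on when compiling
       its body.  */
    long
    sum3 (a, b, c)
         long a, b, c;
    {
      return a + b + c;
    }

    /* file2.c: this caller sees a full prototype, so it is not treated as
       a call to an unprototyped function and need not allocate a
       parameter save area.  */
    extern long sum3 (long, long, long);

    long
    caller (void)
    {
      return sum3 (1, 2, 3);
    }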
Re: ipa-visibility TLC 2/n
Jan Hubicka hubi...@ucw.cz writes: Richard Sandiford wrote the original section anchors implementation, so he would be a good person to comment about the interaction between aliases and section anchors. Thanks! Richard, does this patch seem sane? Looks good to me in principle, but with: + struct symtab_node *snode; decl = SYMBOL_REF_DECL (symbol); + + snode = symtab_node (decl); + if (snode->alias) + { + rtx target = DECL_RTL (symtab_alias_ultimate_target (snode)->decl); + SYMBOL_REF_BLOCK_OFFSET (symbol) = SYMBOL_REF_BLOCK_OFFSET (target); + return; + } is SYMBOL_REF_BLOCK_OFFSET (target) guaranteed to be valid at this point? It looked at face value like you'd need a recursive call to place_block_symbol on the target before the copy. Thanks, Richard
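For reference, the kind of alias-plus-section-anchors situation being discussed (an assumed example, not taken from the thread):

    /* With -fsection-anchors, 'bar' is an alias of 'foo'; its SYMBOL_REF
       should end up with the block and offset of the ultimate target.  */
    int foo = 1;
    extern __typeof (foo) bar __attribute__ ((alias ("foo")));

    int
    use_both (void)
    {
      return foo + bar;   /* both references can share one anchor */
    }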
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
From: Andreas Schwab [mailto:sch...@linux-m68k.org] This adds a full byte of padding between each bitfield. If you want a single padding bit you should use :1, but you also need to update the test to check for 0x44434241 (0x88868482 is impossible, since that requires at least 8 bits per bitfield). Actually if I understood C99 correctly it depends on the storage unit allocated for the bitfield preceding the 0 length bitfield. Instead of trying to cover all possible value read from this bitfield I rewrote the test to check if bswap misinterpret the expression and replace it with a load or load+bswap. This reduce the number of possible values to 2 and thus makes the test less fragile and easier to understand. By the way, I couldn't understand how you reached the value 0x44434241. Can you explain me? Here is the ChangeLog: 2014-05-29 Thomas Preud'homme thomas.preudho...@arm.com * gcc.c-torture/execute/bswap-2.c: Add alignment constraints to bitfield and test wrong results instead of correct results to make the test more portable. And the patch: diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c index 38f18fd..a47e01a 100644 --- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c +++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c @@ -6,8 +6,11 @@ typedef __UINT32_TYPE__ unsigned; struct bitfield { unsigned char f0:7; + unsigned char :1; unsigned char f1:7; + unsigned char :1; unsigned char f2:7; + unsigned char :1; unsigned char f3:7; }; @@ -74,11 +77,17 @@ main () return 0; bfin.inval = (struct ok) { 0x83, 0x85, 0x87, 0x89 }; out = partial_read_le32 (bfin); - if (out != 0x09070503 out != 0x88868482 out != 0x78306141) + /* Test what bswap would do if its check are not strict enough instead of + what is the expected result as there is too many possible results with + bitfields. */ + if (out == 0x89878583) __builtin_abort (); bfin.inval = (struct ok) { 0x83, 0x85, 0x87, 0x89 }; out = partial_read_be32 (bfin); - if (out != 0x03050709 out != 0x82848688 out != 0x41613078) + /* Test what bswap would do if its check are not strict enough instead of + what is the expected result as there is too many possible results with + bitfields. */ + if (out == 0x83858789) __builtin_abort (); out = fake_read_le32 (cin, cin[2]); if (out != 0x89018583) Best regards, Thomas fix_bswap-2.diff Description: Binary data
Re: [Patch] Minor fixes for regtesting gfortran with -flto
Hello! With the following patch, gfortran can be regtested with -flto with no failure, but pr54852 and pr60061. -! { dg-final { scan-assembler-times myBindC 1 { target { ! { hppa*-*-hpux* } } } } } -! { dg-final { scan-assembler-times myBindC,%r2 1 { target { hppa*-*-hpux* } } } } +! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC 1 { target { ! { hppa*-*-hpux* } } } } } +! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC,%r2 1 { target { hppa*-*-hpux* } } } } The change above fails on alpha, which doesn't emit call in the assembly, but: $ grep myBindC bind_c_array_params_2.s jsr $26,myBindC Probably, alpha is not the only one that fails this assumption. Uros.
Re: -fuse-caller-save - Collect register usage information
On 29-05-14 00:42, Bill Schmidt wrote: Tom, the final version of this patch that you committed breaks bootstrap on powerpc64le-linux-gnu. The problem is that all uses of the variable i are guarded by #ifdef STACK_REGS, but the declaration of i is unconditional. We get an unused variable warning that becomes an error during stage 3. Bill, thanks for letting me know. I've bootstrapped attached patch on x86_64, and committed it. Thanks, - Tom 2014-05-29 Tom de Vries t...@codesourcery.com * final.c (collect_fn_hard_reg_usage): Guard variable declaration with #ifdef STACK_REGS. diff --git a/gcc/final.c b/gcc/final.c index a345fe7..c32e177 100644 --- a/gcc/final.c +++ b/gcc/final.c @@ -4750,7 +4750,9 @@ static void collect_fn_hard_reg_usage (void) { rtx insn; +#ifdef STACK_REGS int i; +#endif struct cgraph_rtl_info *node; /* ??? To be removed when all the ports have been fixed. */
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
Thomas Preud'homme thomas.preudho...@arm.com writes: By the way, I couldn't understand how you reached the value 0x44434241. Can you explain me? Each byte is composed of the first 7 bits of the original byte. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
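To make the arithmetic concrete, here is a worked sketch using the test's input bytes. The layout is implementation-defined, which is exactly why the test now checks for the wrong value rather than a single correct one; the sketch assumes a typical little-endian layout where each 7-bit field plus its 1-bit pad occupies one byte.

    /* Each 7-bit field keeps only the low 7 bits of the stored value:
       0x83 & 0x7f = 0x03, 0x85 & 0x7f = 0x05, and so on.  */
    #include <stdio.h>

    struct bitfield
    {
      unsigned char f0:7;
      unsigned char :1;
      unsigned char f1:7;
      unsigned char :1;
      unsigned char f2:7;
      unsigned char :1;
      unsigned char f3:7;
    };

    union pun { struct bitfield bf; unsigned int val; };

    int
    main (void)
    {
      union pun u = { { 0x83 & 0x7f, 0x85 & 0x7f, 0x87 & 0x7f, 0x89 & 0x7f } };
      printf ("%#x\n", u.val);   /* typically 0x9070503 on a little-endian target */
      return 0;
    }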
RE: [MIPS] Add basic support for MSA (SIMD)
Mike Stump mikest...@comcast.net writes: On May 28, 2014, at 7:27 AM, Richard Earnshaw rearn...@arm.com wrote: Speed of implementation. We're gradually replacing these with proper builtins, but that takes a lot more work. As an owner of a port with more builtins that yours, I can offer a technological solution to reduce the cost of builtins to: (define_builtin my_stop [ (define_outputs [(void_operand 0)]) (define_rtl_pattern my_stop []) ] ) (define_insn my_stop [(unspec_volatile [(const_int 0)] UNSPECV_STOP)] stop) for example. This creates the builtins, allows overloading, allows input/output parameters, can reorder operands, allows for complex types, allows memory reference parameters, allows pure markings, does vectors, conditional availability, generates documentation, creates test suites and more. If you wire up a speaker it even sings. Someone would have have to step forward with a need and some time to port their port over to the new scheme and help with the reason for why the technology should go in. It is mostly contained in 5600 lines of self contained python code, and is built to solve the problem generally. It adds about 800 lines to builtins.c. It has a macro system that is more powerful than the macro system .md files use, so one gets to share and collapse builtins rather nicely. It is known to work for C and C++. Other languages may need extending; C for example cost is around 250 lines to support. Myself and others at IMG would be interested in reviewing/evaluating the implementation and assuming it looks useful then we would of course help to get it in shape for submission. One promise, you will never have to create an argument list, or a type, for example here is a two output, type input functional instruction with some doc content: (define_mode_iterator MYTYPE [V8QI V4HI V2SI DI ...]) (define_builtin my_foo my_foo2_type [ (define_descDoc string for operation) (define_outputs [(var_operand:T_MYTYPE 0) (var_operand:T_MYTYPE 1)]) (define_inputs [(var_operand:T_MYTYPE 2) (var_operand:T_MYTYPE 3)]) (define_rtl_pattern my_foo2_mode [0 2 1 3]) (attributes [pure]) ] ) I stripped it so you can't know what the instruction was, but you get a flavor of multiple outputs, doc bits, pure, overloading, arguments and argument rearranging. Can you post the implementation as an RFC? I suspect the python aspect will cause the most trouble as GCC builds do not currently require python I guess that could change depending on the value added. Otherwise it would be a rewrite I guess. Before digging in too deep though it would be useful to know if RichardS would be willing to consider this kind of thing for the MIPS port? Regards, Matthew
Re: [Patch] Minor fixes for regtesting gfortran with -flto
Probably, alpha is not the only one that fails this assumption. Indeed! See the thread starting at https://gcc.gnu.org/ml/fortran/2014-05/msg00127.html Could you test the following patch --- ../_clean/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 2014-05-24 16:17:53.0 +0200 +++ gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 2014-05-29 11:34:40.0 +0200 @@ -16,7 +16,7 @@ integer :: aa(4,4) call test(aa) end -! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC 1 { target { ! { hppa*-*-hpux* } } } } } -! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC,%r2 1 { target { hppa*-*-hpux* } } } } +! { dg-final { scan-assembler-times \[ \t\]\[$,_0-9\]*myBindC 1 { target { ! { hppa*-*-hpux* } } } } } +! { dg-final { scan-assembler-times \[ \t\]\[$,_0-9\]*myBindC,%r2 1 { target { hppa*-*-hpux* } } } } ! { dg-final { scan-tree-dump-times test \\\(parm\\. 1 original } } ! { dg-final { cleanup-tree-dump original } } with make -k check-gfortran RUNTESTFLAGS=dg.exp=bind_c_array_params_2.f90 --target_board=unix'{-m32,-m64,-m32/-flto,-m64/-flto}' Can you pre-approve it? TIA Dominique
Re: [MIPS] Add basic support for MSA (SIMD)
On Wed, May 28, 2014 at 6:49 PM, Mike Stump mikest...@comcast.net wrote: On May 28, 2014, at 7:27 AM, Richard Earnshaw rearn...@arm.com wrote: Speed of implementation. We're gradually replacing these with proper builtins, but that takes a lot more work. As an owner of a port with more builtins that yours, I can offer a technological solution to reduce the cost of builtins to: (define_builtin “my_stop [ (define_outputs [(void_operand 0)]) (define_rtl_pattern “my_stop []) ] ) (define_insn “my_stop [(unspec_volatile [(const_int 0)] UNSPECV_STOP)] “stop”) for example. This creates the builtins, allows overloading, allows input/output parameters, can reorder operands, allows for complex types, allows memory reference parameters, allows pure markings, does vectors, conditional availability, generates documentation, creates test suites and more. If you wire up a speaker it even sings. Someone would have have to step forward with a need and some time to port their port over to the new scheme and help with the reason for why the technology should go in. It is mostly contained in 5600 lines of self contained python code, and is built to solve the problem generally. It adds about 800 lines to builtins.c. It has a macro system that is more powerful than the macro system .md files use, so one gets to share and collapse builtins rather nicely. It is known to work for C and C++. Other languages may need extending; C for example cost is around 250 lines to support. One promise, you will never have to create an argument list, or a type, for example here is a two output, type input functional instruction with some doc content: (define_mode_iterator MYTYPE [V8QI V4HI V2SI DI ...]) (define_builtin “my_foo” my_foo2_type [ (define_desc“Doc string for operation) (define_outputs [(var_operand:T_MYTYPE 0) (var_operand:T_MYTYPE 1)]) (define_inputs [(var_operand:T_MYTYPE 2) (var_operand:T_MYTYPE 3)]) (define_rtl_pattern “my_foo2_mode [0 2 1 3]) (attributes [pure]) ] ) I stripped it so you can’t know what the instruction was, but you get a flavor of multiple outputs, doc bits, pure, overloading, arguments and argument rearranging. Let me know if you’re interested. This sounds interesting - could you post something for an RFC or in a branch so that one can play with it ? Ramana
Re: RFA: cache enabled attribute by insn code
On 27/05/14 16:07, Richard Sandiford wrote: Richard Sandiford rdsandif...@googlemail.com writes: Richard Sandiford rsand...@linux.vnet.ibm.com writes: Does the following patch help? Bah, it won't of course: %i1 needs to be the operator. Here's v2. I tested that it worked for simple tests like: int f1 (int x, int y) { return x + (y 4); } int f2 (int x, int y) { return x - (y 4); } int f3 (int x, int y) { return x (y 4); } int f4 (int x, int y) { return x | (y 4); } int f5 (int x, int y) { return x ^ (y 4); } int f6 (int x, int y) { return (y 4) - x; } int g1 (int x, int y, int z) { return x + (y z); } int g2 (int x, int y, int z) { return x - (y z); } int g3 (int x, int y, int z) { return x (y z); } int g4 (int x, int y, int z) { return x | (y z); } int g5 (int x, int y, int z) { return x ^ (y z); } int g6 (int x, int y, int z) { return (y z) - x; } as well as the testcase. Thanks, Richard gcc/ * config/arm/iterators.md (shiftable_ops): New code iterator. (t2_binop0, arith_shift_insn): New code attributes. * config/arm/arm.md (insn_enabled): Delete. (enabled): Remove insn_enabled test. (*arith_shiftsi): Split out... (*arith_multsi): ...this pattern and remove insn_enabled attribute. Thanks, Richard. I've tweaked this as followed and committed it. I now consider shift_operator in the arm backend deprecated. We should be moving towards using shift_nomul_operator. There's one final wart still to be handled, though. 'rotate' can only take an immediate operand, not a register. We can currently deal with this, but it's not clean in terms of constraint handling. I'll see if I can fix this up sometime, but not today. R. 2014-05-29 Richard Earnshaw rearn...@arm.com Richard Sandiford rdsandif...@googlemail.com * arm/iterators.md (shiftable_ops): New code iterator. (t2_binop0, arith_shift_insn): New code attributes. * arm/predicates.md (shift_nomul_operator): New predicate. * arm/arm.md (insn_enabled): Delete. (enabled): Remove insn_enabled test. (*arith_shiftsi): Delete. Replace with ... (*arith_shift_insn_multsi): ... new pattern. (*arith_shift_insn_shiftsi): ... new pattern. * config/arm/arm.c (arm_print_operand): Handle operand format 'b'. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index ccad548..b514757 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -21271,7 +21271,15 @@ arm_print_condition (FILE *stream) } -/* If CODE is 'd', then the X is a condition operand and the instruction +/* Globally reserved letters: acln + Puncutation letters currently used: @_|?().!# + Lower case letters currently used: bcdefhimpqtvwxyz + Upper case letters currently used: ABCDFGHJKLMNOPQRSTU + Letters previously used, but now deprecated/obsolete: sVWXYZ. + + Note that the global reservation for 'c' is only for CONSTANT_ADDRESS_P. + + If CODE is 'd', then the X is a condition operand and the instruction should only be executed if the condition is true. if CODE is 'D', then the X is a condition operand and the instruction should only be executed if the condition is false: however, if the mode @@ -21411,6 +21419,19 @@ arm_print_operand (FILE *stream, rtx x, int code) } return; +case 'b': + /* Print the log2 of a CONST_INT. */ + { + HOST_WIDE_INT val; + + if (!CONST_INT_P (x) + || (val = exact_log2 (INTVAL (x) 0x)) 0) + output_operand_lossage (Unsupported operand for code '%c', code); + else + fprintf (stream, # HOST_WIDE_INT_PRINT_DEC, val); + } + return; + case 'L': /* The low 16 bits of an immediate constant. 
*/ fprintf (stream, HOST_WIDE_INT_PRINT_DEC, INTVAL(x) 0x); diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 348a89c..cd7495f 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -200,17 +200,9 @@ (const_string yes)] (const_string no))) -; Allows an insn to disable certain alternatives for reasons other than -; arch support. -(define_attr insn_enabled no,yes - (const_string yes)) - ; Enable all alternatives that are both arch_enabled and insn_enabled. (define_attr enabled no,yes - (cond [(eq_attr insn_enabled no) - (const_string no) - - (and (eq_attr predicable_short_it no) + (cond [(and (eq_attr predicable_short_it no) (and (eq_attr predicated yes) (match_test arm_restrict_it))) (const_string no) @@ -9876,38 +9868,34 @@ ;; Patterns to allow combination of arithmetic, cond code and shifts -(define_insn *arith_shiftsi - [(set (match_operand:SI 0 s_register_operand =r,r,r,r) -(match_operator:SI 1 shiftable_operator - [(match_operator:SI 3 shift_operator - [(match_operand:SI 4 s_register_operand r,r,r,r) -
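For reference, the kind of source these combined patterns cover, with the instructions one would expect (indicative only; actual output depends on register allocation and on ARM vs. Thumb-2 state):

    int
    add_shifted_by_const (int x, int y)
    {
      /* The constant shift inside the plus is canonicalized as a multiply
         by 16, which *arith_shift_insn_multsi matches; the new 'b' operand
         code prints the log2 of the power-of-two constant, giving
         something like:  add r0, r0, r1, lsl #4  */
      return x + (y << 4);
    }

    int
    and_shifted_by_reg (int x, int y, int z)
    {
      /* Register-specified shift, handled by *arith_shift_insn_shiftsi:
         and r0, r0, r1, lsl r2  */
      return x & (y << z);
    }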
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
From: Andreas Schwab [mailto:sch...@linux-m68k.org] Thomas Preud'homme thomas.preudho...@arm.com writes: By the way, I couldn't understand how you reached the value 0x44434241. Can you explain me? Each byte is composed of the first 7 bits of the original byte. Sorry, it seems I wasn't very awake when I checked that. Makes sense now. Thanks. Does the patch solve the problem you had? What about you Christophe? Best regards, Thomas
Re: [PATCH] Inline asm asan instrumentation
Cool, we don't have this in LLVM-ASan, but we have plans to instrument inline asm soon (not just constraints). asm-struct-1.c test looks like a false positive though - the code does not access any invalid memory, it only does a harmless pointer cast. On Wed, May 28, 2014 at 10:36 PM, Konstantin Serebryany konstantin.s.serebry...@gmail.com wrote: On Wed, May 28, 2014 at 5:33 PM, Marat Zakirov m.zaki...@samsung.com wrote: Hi all, Here's a patch for optional Asan instrumentation of inline assembly. This version scans gimple for GIMPLE_ASMs and performs usual instrumentation of arguments with memory constraints (m, o, etc.) with fixed size. Instrumentation is turned off by default. This was successfully bootstrapped and regtested on x64. I have also instrumented and ran ffmpeg regression testsuite (it seems to have quite some inline asm). --Marat
Re: [Patch] Minor fixes for regtesting gfortran with -flto
On Thu, May 29, 2014 at 11:38 AM, Dominique Dhumieres domi...@lps.ens.fr wrote: Probably, alpha is not the only one that fails this assumption. Indeed! see the thread starting at https://gcc.gnu.org/ml/fortran/2014-05/msg00127.html Could you test the following patch --- ../_clean/gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 2014-05-24 16:17:53.0 +0200 +++ gcc/testsuite/gfortran.dg/bind_c_array_params_2.f90 2014-05-29 11:34:40.0 +0200 @@ -16,7 +16,7 @@ integer :: aa(4,4) call test(aa) end -! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC 1 { target { ! { hppa*-*-hpux* } } } } } -! { dg-final { scan-assembler-times call\[^\n\r\]*myBindC,%r2 1 { target { hppa*-*-hpux* } } } } +! { dg-final { scan-assembler-times \[ \t\]\[$,_0-9\]*myBindC 1 { target { ! { hppa*-*-hpux* } } } } } +! { dg-final { scan-assembler-times \[ \t\]\[$,_0-9\]*myBindC,%r2 1 { target { hppa*-*-hpux* } } } } ! { dg-final { scan-tree-dump-times test \\\(parm\\. 1 original } } ! { dg-final { cleanup-tree-dump original } } with make -k check-gfortran RUNTESTFLAGS=dg.exp=bind_c_array_params_2.f90 --target_board=unix'{-m32,-m64,-m32/-flto,-m64/-flto}' This works on alpha with --target_board=unix'{,-flto}' and x86_64, so I guess it is OK. Can you pre-approved it? I'm not a testsuite maintainer (one is CC'd for a final approval), but the situation is definitely better with the patched regexp. Uros.
Re: [PATCH] Inline asm asan instrumentation
On Wed, May 28, 2014 at 05:33:44PM +0400, Marat Zakirov wrote: Here's a patch for optional Asan instrumentation of inline assembly. This version scans gimple for GIMPLE_ASMs and performs usual instrumentation of arguments with memory constraints (m, o, etc.) with fixed size. That doesn't look right to me. The fact that some region appears in m doesn't mean the inline asm actually accesses it, it could not touch it at all, or only some part of it. If you look e.g. at Linux kernel headers, you'll see lots of struct __large_struct { unsigned long buf[100]; }; #define __m(x) (*(struct __large_struct __user *)(x)) ... m (__m(addr)) and similar cases, if Asan wants to check that the whole 100*sizeof(long) region is accessible, it could often just have false positives, because the inline asm really accesses just some small part of it. Jakub
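A self-contained version of the kernel-style pattern described above (types and the asm body are illustrative; the real kernel code differs and uses __user annotations):

    /* The "m" operand nominally spans the whole 100-element struct, but the
       asm only reads one word, so checking the entire region would give a
       false positive.  x86-style asm, purely for illustration.  */
    struct __large_struct { unsigned long buf[100]; };
    #define __m(x) (*(struct __large_struct *) (x))

    unsigned long
    read_first_word (unsigned long *addr)
    {
      unsigned long val;
      __asm__ ("mov %1, %0" : "=r" (val) : "m" (__m (addr)));
      return val;
    }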
Re: RTABI half-precision conversion functions (ping)
On Thu, 19 Jul 2012 14:47:54 +0100 Julian Brown jul...@codesourcery.com wrote: On Thu, 19 Jul 2012 13:54:57 +0100 Paul Brook p...@codesourcery.com wrote: But, that means EABI-conformant callers are also perfectly entitled to sign-extend half-float values before calling our helper functions (although GCC itself won't do that). Using unsigned int and taking care to only examine the low-order bits of the value in the helper function itself serves to fix the latent bug, is compatible with existing code, allows us to be conformant with the eabi, and allows use of aliases to make the __gnu and __aeabi functions the same. As long as LTO never sees this mismatch we should be fine :-) AFAIK we don't currently have any way of expressing the actual ABI. Let's not worry about that for now :-). The patch no longer applied as-is, so I've updated it (attached, re-tested). Note that there are no longer any target-independent changes (though I'm not certain that the symbol versions are still correct). OK to apply? I think this deserves a comment in the source. Otherwise it's liable to get fixed in the future :-) Something along the lines of While the EABI describes the arguments to the half-float helper routines as 'short', it does not require that they be extended to full register width. The normal ABI requires that the caller sign/zero extend short values to 32 bit. We use unsigned int arguments to prevent the gcc making assumptions about the high half of the register. Here's a version with an explanatory comment. I also fixed a couple of minor formatting nits I noticed (they don't upset the diff too much, I don't think). It looks like this one got forgotten about. Ping? Context: https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00902.html https://gcc.gnu.org/ml/gcc-patches/2012-07/msg00912.html This is an EABI-conformance fix. Thanks, Julian
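For reference, a sketch of the calling-convention point being made (the function name and body here are hypothetical, not the actual libgcc helper):

    /* The EABI describes the argument as 'short', but does not require the
       caller to extend it to full register width, so the helper takes an
       unsigned int and only trusts the low-order 16 bits.  */
    unsigned int
    example_h2f (unsigned int hf)
    {
      hf &= 0xffffU;   /* examine only the low-order bits */
      /* ... the actual half- to single-precision conversion would go here ... */
      return hf;
    }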
Re: [DOC PATCH] Rewrite docs for inline asm
Yes. We already know that this is better than the current docs. Let's check it in. As far as I can see you did it, but didn't add a ChangeLog entry (so David isn't properly credited with the rewrite)? -- Eric Botcazou
RE: [PATCH] Inline asm asan instrumentation
asm-struct-1.c test looks like a false positive though - the code does not access any invalid memory, it only does a harmless pointer cast. It is not, because st1 has a smaller size than st2: struct st1 { int a[110]; }; struct st2 { int a[111]; }; And the asm constraint was declared as: : "=m" (*((struct st2 *)s1))); The test violates the memory access constraint by the cast (struct st2 *)s1. We check only constraints, and with such a cast we assume the user declares that he wants to access the full st2 structure, which has a bigger size than st1. -Original Message- From: Evgeniy Stepanov [mailto:eugeni.stepa...@gmail.com] Sent: Thursday, May 29, 2014 1:58 PM To: Konstantin Serebryany Cc: Marat Zakirov; GCC Patches; Konstantin Serebryany; Jakub Jelinek; Viacheslav Garbuzov; Yuri Gribov; Marat Zakirov Subject: Re: [PATCH] Inline asm asan instrumentation Cool, we don't have this in LLVM-ASan, but we have plans to instrument inline asm soon (not just constraints). asm-struct-1.c test looks like a false positive though - the code does not access any invalid memory, it only does a harmless pointer cast. On Wed, May 28, 2014 at 10:36 PM, Konstantin Serebryany konstantin.s.serebry...@gmail.com wrote: On Wed, May 28, 2014 at 5:33 PM, Marat Zakirov m.zaki...@samsung.com wrote: Hi all, Here's a patch for optional Asan instrumentation of inline assembly. This version scans gimple for GIMPLE_ASMs and performs usual instrumentation of arguments with memory constraints (m, o, etc.) with fixed size. Instrumentation is turned off by default. This was successfully bootstrapped and regtested on x64. I have also instrumented and ran ffmpeg regression testsuite (it seems to have quite some inline asm). --Marat
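A hedged reconstruction of the scenario being described (not the actual asm-struct-1.c source):

    struct st1 { int a[110]; };
    struct st2 { int a[111]; };

    void
    touch (struct st1 *s1)
    {
      /* The output constraint claims the whole, larger st2 is written, even
         though s1 only points to an st1, so the instrumentation flags it.  */
      __asm__ ("" : "=m" (*(struct st2 *) s1));
    }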
Re: RFA: cache enabled attribute by insn code
Richard Earnshaw rearn...@arm.com writes: On 27/05/14 16:07, Richard Sandiford wrote: Richard Sandiford rdsandif...@googlemail.com writes: Richard Sandiford rsand...@linux.vnet.ibm.com writes: Does the following patch help? Bah, it won't of course: %i1 needs to be the operator. Here's v2. I tested that it worked for simple tests like: int f1 (int x, int y) { return x + (y 4); } int f2 (int x, int y) { return x - (y 4); } int f3 (int x, int y) { return x (y 4); } int f4 (int x, int y) { return x | (y 4); } int f5 (int x, int y) { return x ^ (y 4); } int f6 (int x, int y) { return (y 4) - x; } int g1 (int x, int y, int z) { return x + (y z); } int g2 (int x, int y, int z) { return x - (y z); } int g3 (int x, int y, int z) { return x (y z); } int g4 (int x, int y, int z) { return x | (y z); } int g5 (int x, int y, int z) { return x ^ (y z); } int g6 (int x, int y, int z) { return (y z) - x; } as well as the testcase. Thanks, Richard gcc/ * config/arm/iterators.md (shiftable_ops): New code iterator. (t2_binop0, arith_shift_insn): New code attributes. * config/arm/arm.md (insn_enabled): Delete. (enabled): Remove insn_enabled test. (*arith_shiftsi): Split out... (*arith_multsi): ...this pattern and remove insn_enabled attribute. Thanks, Richard. I've tweaked this as followed and committed it. I now consider shift_operator in the arm backend deprecated. We should be moving towards using shift_nomul_operator. There's one final wart still to be handled, though. 'rotate' can only take an immediate operand, not a register. We can currently deal with this, but it's not clean in terms of constraint handling. I'll see if I can fix this up sometime, but not today. Thanks for picking it up. I realised later that I'd fluffed the MULT check in: @@ -9876,38 +9868,34 @@ ;; Patterns to allow combination of arithmetic, cond code and shifts -(define_insn *arith_shiftsi - [(set (match_operand:SI 0 s_register_operand =r,r,r,r) -(match_operator:SI 1 shiftable_operator - [(match_operator:SI 3 shift_operator - [(match_operand:SI 4 s_register_operand r,r,r,r) - (match_operand:SI 5 shift_amount_operand M,M,M,r)]) - (match_operand:SI 2 s_register_operand rk,rk,r,rk)]))] +(define_insn *arith_shift_insn_multsi + [(set (match_operand:SI 0 s_register_operand =r,r) + (shiftable_ops:SI + (mult:SI (match_operand:SI 2 s_register_operand r,r) + (match_operand:SI 3 power_of_two_operand )) + (match_operand:SI 1 s_register_operand rk,t2_binop0)))] TARGET_32BIT - %i1%?\\t%0, %2, %4%S3 + arith_shift_insn%?\\t%0, %1, %2, lsl %b3 + [(set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr shift 4) + (set_attr arch a,t2) + (set_attr type alu_shift_imm)]) + +(define_insn *arith_shift_insn_shiftsi + [(set (match_operand:SI 0 s_register_operand =r,r,r) + (shiftable_ops:SI + (match_operator:SI 2 shift_nomul_operator + [(match_operand:SI 3 s_register_operand r,r,r) +(match_operand:SI 4 shift_amount_operand M,M,r)]) + (match_operand:SI 1 s_register_operand rk,t2_binop0,rk)))] + TARGET_32BIT GET_CODE (operands[3]) != MULT ...this condition: operands[3] was the old numbering of the operator rather than the new numbering. It looks like shift_nomul_operator should make it redundant anyway. Richard
[PATCH, i386, Pointer Bounds Checker 11/x] Keep bounds initial values
Hi, This patch tries to keep bounds initial values when they may be needed. Even if the initial value is not fully known (e.g. we know only the low bound) it still may help to remove some redundant checks. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2014-05-29 Ilya Enkovich ilya.enkov...@intel.com * ipa.c (symtab_remove_unreachable_nodes): Keep initial values for pointer bounds to be used for check elimination. * lto-cgraph.c (compute_ltrans_boundary): Likewise. * lto/lto-partition.c (add_references_to_partition): Add references to pointer bounds vars. diff --git a/gcc/ipa.c b/gcc/ipa.c index 1d7fa35..958cabe 100644 --- a/gcc/ipa.c +++ b/gcc/ipa.c @@ -568,7 +568,8 @@ symtab_remove_unreachable_nodes (bool before_inlining_p, FILE *file) vnode->aux = NULL; /* Keep body if it may be useful for constant folding. */ - if ((init = ctor_for_folding (vnode->decl)) == error_mark_node + if ((init = ctor_for_folding (vnode->decl)) == error_mark_node + && !POINTER_BOUNDS_P (vnode->decl)) varpool_remove_initializer (vnode); else DECL_INITIAL (vnode->decl) = init; diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index 58105f0..7089516 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -829,7 +829,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder) { if (!lto_symtab_encoder_encode_initializer_p (encoder, vnode) - && ctor_for_folding (vnode->decl) != error_mark_node + && (ctor_for_folding (vnode->decl) != error_mark_node + || POINTER_BOUNDS_P (vnode->decl))) { lto_set_symtab_encoder_encode_initializer (encoder, vnode); add_references (encoder, vnode->ref_list); diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c index 2967d73..330253b 100644 --- a/gcc/lto/lto-partition.c +++ b/gcc/lto/lto-partition.c @@ -96,7 +96,8 @@ add_references_to_partition (ltrans_partition part, symtab_node *node) Recursively look into the initializers of the constant variable and add references, too. */ else if (is_a <varpool_node> (ref->referred) - && ctor_for_folding (ref->referred->decl) != error_mark_node + && (ctor_for_folding (ref->referred->decl) != error_mark_node + || POINTER_BOUNDS_P (ref->referred->decl)) && !lto_symtab_encoder_in_partition_p (part->encoder, ref->referred)) { if (!part->initializers_visited)
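Loosely illustrating the motivation (an assumed source-level example; the patch itself only keeps the initializers of the internal bounds variables so they can be constant-folded):

    extern int table[100];

    int
    first_and_last (void)
    {
      /* With the bounds of 'table' known, checks for accesses that are
         provably in range become redundant and can be removed.  */
      return table[0] + table[99];
    }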
[PATCH, i386, Pointer Bounds Checker 12/x] Recognize instrumented special functions
Hi, This patch allows to recognize instrumented call to special function by using the original function name for recognition. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2014-05-29 Ilya Enkovich ilya.enkov...@intel.com * calls.c (special_function_p): Use original decl name when analyzing instrumentation clone. diff --git a/gcc/calls.c b/gcc/calls.c index f0c92dd..e1dc8eb 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -502,8 +502,16 @@ emit_call_1 (rtx funexp, tree fntree ATTRIBUTE_UNUSED, tree fndecl ATTRIBUTE_UNU static int special_function_p (const_tree fndecl, int flags) { - if (fndecl DECL_NAME (fndecl) - IDENTIFIER_LENGTH (DECL_NAME (fndecl)) = 17 + tree name_decl = DECL_NAME (fndecl); + + /* For instrumentation clones we want to derive flags + from the original name. */ + if (cgraph_get_node (fndecl) + cgraph_get_node (fndecl)-instrumentation_clone) +name_decl = DECL_NAME (cgraph_get_node (fndecl)-orig_decl); + + if (fndecl name_decl + IDENTIFIER_LENGTH (name_decl) = 17 /* Exclude functions not at the file scope, or not `extern', since they are not the magic functions we would otherwise think they are. @@ -515,16 +523,16 @@ special_function_p (const_tree fndecl, int flags) || TREE_CODE (DECL_CONTEXT (fndecl)) == TRANSLATION_UNIT_DECL) TREE_PUBLIC (fndecl)) { - const char *name = IDENTIFIER_POINTER (DECL_NAME (fndecl)); + const char *name = IDENTIFIER_POINTER (name_decl); const char *tname = name; /* We assume that alloca will always be called by name. It makes no sense to pass it as a pointer-to-function to anything that does not understand its behavior. */ - if (((IDENTIFIER_LENGTH (DECL_NAME (fndecl)) == 6 + if (((IDENTIFIER_LENGTH (name_decl) == 6 name[0] == 'a' ! strcmp (name, alloca)) - || (IDENTIFIER_LENGTH (DECL_NAME (fndecl)) == 16 + || (IDENTIFIER_LENGTH (name_decl) == 16 name[0] == '_' ! strcmp (name, __builtin_alloca flags |= ECF_MAY_BE_ALLOCA;
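A small example of why the original name matters (clone naming details are assumed, not taken from the patch):

    /* Under instrumentation the call below may be redirected to a clone
       whose DECL_NAME is no longer "alloca"; flags such as
       ECF_MAY_BE_ALLOCA must therefore be derived from the original
       decl's name.  */
    #include <alloca.h>
    #include <string.h>

    void
    use_alloca (unsigned int n)
    {
      char *buf = alloca (n);
      memset (buf, 0, n);
    }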
[PATCH, Pointer Bounds Checker 13/x] Early versioning
Hi, This patch allows performing function versioning when some structures are not available yet. It is required to make clones for the Pointer Bounds Checker right after SSA build. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2014-05-29 Ilya Enkovich ilya.enkov...@intel.com * tree-inline.c (copy_cfg_body): Check loop tree existence before accessing it. (tree_function_versioning): Check DF info existence before accessing it. diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c index 4293241..23fef90 100644 --- a/gcc/tree-inline.c +++ b/gcc/tree-inline.c @@ -2544,7 +2544,8 @@ copy_cfg_body (copy_body_data * id, gcov_type count, int frequency_scale, /* If the loop tree in the source function needed fixup, mark the destination loop tree for fixup, too. */ - if (loops_for_fn (src_cfun)->state & LOOPS_NEED_FIXUP) + if (loops_for_fn (src_cfun) + && loops_for_fn (src_cfun)->state & LOOPS_NEED_FIXUP) loops_state_set (LOOPS_NEED_FIXUP); if (gimple_in_ssa_p (cfun)) @@ -5350,8 +5351,9 @@ tree_function_versioning (tree old_decl, tree new_decl, DECL_ARGUMENTS (new_decl) = DECL_ARGUMENTS (old_decl); initialize_cfun (new_decl, old_decl, old_entry_block->count); - DECL_STRUCT_FUNCTION (new_decl)->gimple_df->ipa_pta = id.src_cfun->gimple_df->ipa_pta; + if (DECL_STRUCT_FUNCTION (new_decl)->gimple_df) +DECL_STRUCT_FUNCTION (new_decl)->gimple_df->ipa_pta + = id.src_cfun->gimple_df->ipa_pta; /* Copy the function's static chain. */ p = DECL_STRUCT_FUNCTION (old_decl)->static_chain_decl;
RE: [PATCH] Inline asm asan instrumentation
Actually I do not think that it is a good idea to use constraints in such an arbitrary way. By setting constraints the user takes responsibility on himself. So even if full inline asm support is done one day, I do think that checking memory constraints will still exist. It is the same situation as with compiler warnings - sometimes they are bothersome, but if you think you do not need them, just do not use them. -Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Thursday, May 29, 2014 2:09 PM To: Marat Zakirov Cc: gcc-patches@gcc.gnu.org; 'Konstantin Serebryany'; 'Viacheslav Garbuzov'; 'Yuri Gribov'; 'Marat Zakirov' Subject: Re: [PATCH] Inline asm asan instrumentation On Wed, May 28, 2014 at 05:33:44PM +0400, Marat Zakirov wrote: Here's a patch for optional Asan instrumentation of inline assembly. This version scans gimple for GIMPLE_ASMs and performs usual instrumentation of arguments with memory constraints (m, o, etc.) with fixed size. That doesn't look right to me. The fact that some region appears in m doesn't mean the inline asm actually accesses it, it could not touch it at all, or only some part of it. If you look e.g. at Linux kernel headers, you'll see lots of struct __large_struct { unsigned long buf[100]; }; #define __m(x) (*(struct __large_struct __user *)(x)) ... m (__m(addr)) and similar cases, if Asan wants to check that the whole 100*sizeof(long) region is accessible, it could often just have false positives, because the inline asm really accesses just some small part of it. Jakub
Re: [PATCH 7/7] Move SYMBOL_REF_FLAGS to rtx header
On 10-05-14 22:24, Richard Sandiford wrote: /* A set of flags on a symbol_ref that are, in some respects, redundant with information derivable from the tree decl associated with this symbol. @@ -1791,7 +1794,9 @@ #define SYMBOL_REF_CONSTANT(RTX) \ this information to avoid recomputing it. Finally, this allows space for the target to store more than one bit of information, as with SYMBOL_REF_FLAG. */ -#define SYMBOL_REF_FLAGS(RTX) X0INT ((RTX), 1) +#define SYMBOL_REF_FLAGS(RTX) \ + (RTL_FLAG_CHECK1 (SYMBOL_REF_FLAGS, (RTX), SYMBOL_REF) \ + -u2.symbol_ref_flags) Richard, with an arm-linux-gnueabi non-bootstrap build with --enable-checking=yes,rtl, I ran into the following error: ... /home/vries/gcc_versions/devel/src/libgcc/libgcc2.c:819:1: internal compiler error: RTL check: attempt to treat non-block symbol as a block symbol in create_block_symbol, at varasm.c:394 }; ^ 0xc3c16b rtl_check_failed_block_symbol(char const*, int, char const*) /home/vries/gcc_versions/devel/src/gcc/rtl.c:844 0x103c09d create_block_symbol /home/vries/gcc_versions/devel/src/gcc/varasm.c:394 0x103f42d make_decl_rtl(tree_node*) /home/vries/gcc_versions/devel/src/gcc/varasm.c:1379 0x103fc87 notice_global_symbol(tree_node*) /home/vries/gcc_versions/devel/src/gcc/varasm.c:1552 0x7588bf varpool_finalize_decl(tree_node*) /home/vries/gcc_versions/devel/src/gcc/cgraphunit.c:823 0xb4eaa0 rest_of_decl_compilation(tree_node*, int, int) /home/vries/gcc_versions/devel/src/gcc/passes.c:241 0x5902c4 finish_decl(tree_node*, unsigned int, tree_node*, tree_node*, tree_node*) /home/vries/gcc_versions/devel/src/gcc/c/c-decl.c:4521 0x5e8586 c_parser_declaration_or_fndef /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1782 0x5e7644 c_parser_external_declaration /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1399 0x5e72c7 c_parser_translation_unit /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1286 0x606c6d c_parse_file() /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:14077 0x66b7fa c_common_parse_file() /home/vries/gcc_versions/devel/src/gcc/c-family/c-opts.c:1067 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See http://gcc.gnu.org/bugs.html for instructions. ... It looks like BLOCK_SYMBOL_CHECK hasn't been updated. Patch below fixes it for me. OK for trunk if bootstrap on x86_64 succeeds? Thanks, - Tom 2014-05-29 Tom de Vries t...@codesourcery.com * rtl.h (BLOCK_SYMBOL_CHECK): Use SYMBOL_REF_FLAGS. --- gcc/rtl.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/rtl.h b/gcc/rtl.h index 02ce424..51cfae5 100644 --- a/gcc/rtl.h +++ b/gcc/rtl.h @@ -708,7 +708,7 @@ struct GTY(()) rtvec_def { #define BLOCK_SYMBOL_CHECK(RTX) __extension__\ ({ __typeof (RTX) const _symbol = (RTX);\ - const unsigned int flags = RTL_CHECKC1 (_symbol, 1, SYMBOL_REF).rt_int; \ + const unsigned int flags = SYMBOL_REF_FLAGS (_symbol); \ if ((flags SYMBOL_FLAG_HAS_BLOCK_INFO) == 0) \ rtl_check_failed_block_symbol (__FILE__, __LINE__, \ __FUNCTION__); \ -- 1.9.1
Re: [DOC PATCH] Rewrite docs for inline asm
On 05/29/2014 11:22 AM, Eric Botcazou wrote: Yes. We already know that this is better than the current docs. Let's check it in. As far as I can see you did it, but didn't add a ChangeLog entry (so David isn't properly credited with the rewrite)? Fixed. Thanks, Andrew.
Re: [PATCH 7/7] Move SYMBOL_REF_FLAGS to rtx header
Tom de Vries tom_devr...@mentor.com writes: On 10-05-14 22:24, Richard Sandiford wrote: /* A set of flags on a symbol_ref that are, in some respects, redundant with information derivable from the tree decl associated with this symbol. @@ -1791,7 +1794,9 @@ #define SYMBOL_REF_CONSTANT(RTX) \ this information to avoid recomputing it. Finally, this allows space for the target to store more than one bit of information, as with SYMBOL_REF_FLAG. */ -#define SYMBOL_REF_FLAGS(RTX) X0INT ((RTX), 1) +#define SYMBOL_REF_FLAGS(RTX) \ + (RTL_FLAG_CHECK1 (SYMBOL_REF_FLAGS, (RTX), SYMBOL_REF) \ + -u2.symbol_ref_flags) Richard, with an arm-linux-gnueabi non-bootstrap build with --enable-checking=yes,rtl, I ran into the following error: ... /home/vries/gcc_versions/devel/src/libgcc/libgcc2.c:819:1: internal compiler error: RTL check: attempt to treat non-block symbol as a block symbol in create_block_symbol, at varasm.c:394 }; ^ 0xc3c16b rtl_check_failed_block_symbol(char const*, int, char const*) /home/vries/gcc_versions/devel/src/gcc/rtl.c:844 0x103c09d create_block_symbol /home/vries/gcc_versions/devel/src/gcc/varasm.c:394 0x103f42d make_decl_rtl(tree_node*) /home/vries/gcc_versions/devel/src/gcc/varasm.c:1379 0x103fc87 notice_global_symbol(tree_node*) /home/vries/gcc_versions/devel/src/gcc/varasm.c:1552 0x7588bf varpool_finalize_decl(tree_node*) /home/vries/gcc_versions/devel/src/gcc/cgraphunit.c:823 0xb4eaa0 rest_of_decl_compilation(tree_node*, int, int) /home/vries/gcc_versions/devel/src/gcc/passes.c:241 0x5902c4 finish_decl(tree_node*, unsigned int, tree_node*, tree_node*, tree_node*) /home/vries/gcc_versions/devel/src/gcc/c/c-decl.c:4521 0x5e8586 c_parser_declaration_or_fndef /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1782 0x5e7644 c_parser_external_declaration /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1399 0x5e72c7 c_parser_translation_unit /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1286 0x606c6d c_parse_file() /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:14077 0x66b7fa c_common_parse_file() /home/vries/gcc_versions/devel/src/gcc/c-family/c-opts.c:1067 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See http://gcc.gnu.org/bugs.html for instructions. ... It looks like BLOCK_SYMBOL_CHECK hasn't been updated. Patch below fixes it for me. OK for trunk if bootstrap on x86_64 succeeds? Can't really approve it, but it looks obviously correct to me. Thanks for the fix. Richard
Re: [PATCH 7/7] Move SYMBOL_REF_FLAGS to rtx header
On 05/29/14 05:27, Tom de Vries wrote: On 10-05-14 22:24, Richard Sandiford wrote: /* A set of flags on a symbol_ref that are, in some respects, redundant with information derivable from the tree decl associated with this symbol. @@ -1791,7 +1794,9 @@ #define SYMBOL_REF_CONSTANT(RTX) \ this information to avoid recomputing it. Finally, this allows space for the target to store more than one bit of information, as with SYMBOL_REF_FLAG. */ -#define SYMBOL_REF_FLAGS(RTX)X0INT ((RTX), 1) +#define SYMBOL_REF_FLAGS(RTX) \ + (RTL_FLAG_CHECK1 (SYMBOL_REF_FLAGS, (RTX), SYMBOL_REF) \ + -u2.symbol_ref_flags) Richard, with an arm-linux-gnueabi non-bootstrap build with --enable-checking=yes,rtl, I ran into the following error: ... /home/vries/gcc_versions/devel/src/libgcc/libgcc2.c:819:1: internal compiler error: RTL check: attempt to treat non-block symbol as a block symbol in create_block_symbol, at varasm.c:394 }; ^ 0xc3c16b rtl_check_failed_block_symbol(char const*, int, char const*) /home/vries/gcc_versions/devel/src/gcc/rtl.c:844 0x103c09d create_block_symbol /home/vries/gcc_versions/devel/src/gcc/varasm.c:394 0x103f42d make_decl_rtl(tree_node*) /home/vries/gcc_versions/devel/src/gcc/varasm.c:1379 0x103fc87 notice_global_symbol(tree_node*) /home/vries/gcc_versions/devel/src/gcc/varasm.c:1552 0x7588bf varpool_finalize_decl(tree_node*) /home/vries/gcc_versions/devel/src/gcc/cgraphunit.c:823 0xb4eaa0 rest_of_decl_compilation(tree_node*, int, int) /home/vries/gcc_versions/devel/src/gcc/passes.c:241 0x5902c4 finish_decl(tree_node*, unsigned int, tree_node*, tree_node*, tree_node*) /home/vries/gcc_versions/devel/src/gcc/c/c-decl.c:4521 0x5e8586 c_parser_declaration_or_fndef /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1782 0x5e7644 c_parser_external_declaration /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1399 0x5e72c7 c_parser_translation_unit /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:1286 0x606c6d c_parse_file() /home/vries/gcc_versions/devel/src/gcc/c/c-parser.c:14077 0x66b7fa c_common_parse_file() /home/vries/gcc_versions/devel/src/gcc/c-family/c-opts.c:1067 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See http://gcc.gnu.org/bugs.html for instructions. ... It looks like BLOCK_SYMBOL_CHECK hasn't been updated. Patch below fixes it for me. OK for trunk if bootstrap on x86_64 succeeds? Yes. Ok. jeff
Re: [PATCH 7/7] Move SYMBOL_REF_FLAGS to rtx header
On 05/29/14 06:07, Richard Sandiford wrote: Can't really approve it, but it looks obviously correct to me. Thanks for the fix. Is that something you'd like to see changed? Jeff
[PATCH AArch64 0/2] Correct signedness of builtins, remove casts from arm_neon.h
The __builtin_ functions registered by aarch64_init_simd_builtins use signed and/or unsigned types according to the qualifiers defined in aarch64-builtins.c and used in aarch64-simd-builtins.def. These __builtin functions are then used in arm_neon.h, with explicit casts converting between the signed/unsigned types declared for the intrinsics and for the builtins. These two patches add a few more sets of qualifiers, and use existing ones more widely, allowing removing lots of the explicit casts from arm_neon.h. There is no particular logic in the division between the two patches beyond that the second patch uses sets of qualifiers defined in the first. No regressions on aarch64-none-elf or aarch64_be-none-elf.
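To show the shape of the change, here is the before/after pattern for one intrinsic, modelled on the vrshl_u8 hunk in patch 2/2. The _before/_after suffixes are only for side-by-side comparison; the real arm_neon.h defines vrshl_u8 once, and each body only compiles against the corresponding (un)patched compiler.

    /* before: the builtin exists only in a signed flavour, so both the
       argument and the result must be cast */
    __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
    vrshl_u8_before (uint8x8_t __a, int8x8_t __b)
    {
      return (uint8x8_t) __builtin_aarch64_urshlv8qi ((int8x8_t) __a, __b);
    }

    /* after: the _uus-qualified builtin takes and returns the right types */
    __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__))
    vrshl_u8_after (uint8x8_t __a, int8x8_t __b)
    {
      return __builtin_aarch64_urshlv8qi_uus (__a, __b);
    }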
[PATCH AArch64 1/2] Correct signedness of builtins, remove casts from arm_neon.h
This adds three new sets of qualifiers to aarch64-builtins.c, and uses the already-present-but-unused USHIFTIMM. gcc/ChangeLog: * gcc/config/aarch64/aarch64-builtins.c (aarch64_types_binop_uus_qualifiers, aarch64_types_shift_to_unsigned_qualifiers, aarch64_types_unsigned_shiftacc_qualifiers): Define. * gcc/config/aarch64/aarch64-simd-builtins.def (uqshl, uqrshl, uqadd, uqsub, usqadd, usra_n, ursra_n, uqshrn_n, uqrshrn_n, usri_n, usli_n, sqshlu_n, uqshl_n): Update qualifiers. * gcc/config/aarch64/arm_neon.h (vqadd_u8, vqadd_u16, vqadd_u32, vqadd_u64, vqaddq_u8, vqaddq_u16, vqaddq_u32, vqaddq_u64, vqsub_u8, vqsub_u16, vqsub_u32, vqsub_u64, vqsubq_u8, vqsubq_u16, vqsubq_u32, vqsubq_u64, vqaddb_u8, vqaddh_u16, vqadds_u32, vqaddd_u64, vqrshl_u8, vqrshl_u16, vqrshl_u32, vqrshl_u64, vqrshlq_u8, vqrshlq_u16, vqrshlq_u32, vqrshlq_u64, vqrshlb_u8, vqrshlh_u16, vqrshls_u32, vqrshld_u64, vqrshrn_n_u16, vqrshrn_n_u32, vqrshrn_n_u64, vqrshrnh_n_u16, vqrshrns_n_u32, vqrshrnd_n_u64, vqshl_u8, vqshl_u16, vqshl_u32, vqshl_u64, vqshlq_u8, vqshlq_u16, vqshlq_u32, vqshlq_u64, vqshlb_u8, vqshlh_u16, vqshls_u32, vqshld_u64, vqshl_n_u8, vqshl_n_u16, vqshl_n_u32, vqshl_n_u64, vqshlq_n_u8, vqshlq_n_u16, vqshlq_n_u32, vqshlq_n_u64, vqshlb_n_u8, vqshlh_n_u16, vqshls_n_u32, vqshld_n_u64, vqshlu_n_s8, vqshlu_n_s16, vqshlu_n_s32, vqshlu_n_s64, vqshluq_n_s8, vqshluq_n_s16, vqshluq_n_s32, vqshluq_n_s64, vqshlub_n_s8, vqshluh_n_s16, vqshlus_n_s32, vqshlud_n_s64, vqshrn_n_u16, vqshrn_n_u32, vqshrn_n_u64, vqshrnh_n_u16, vqshrns_n_u32, vqshrnd_n_u64, vqsubb_u8, vqsubh_u16, vqsubs_u32, vqsubd_u64, vrsra_n_u8, vrsra_n_u16, vrsra_n_u32, vrsra_n_u64, vrsraq_n_u8, vrsraq_n_u16, vrsraq_n_u32, vrsraq_n_u64, vrsrad_n_u64, vsli_n_u8, vsli_n_u16, vsli_n_u32,vsli_n_u64, vsliq_n_u8, vsliq_n_u16, vsliq_n_u32, vsliq_n_u64, vslid_n_u64, vsqadd_u8, vsqadd_u16, vsqadd_u32, vsqadd_u64, vsqaddq_u8, vsqaddq_u16, vsqaddq_u32, vsqaddq_u64, vsqaddb_u8, vsqaddh_u16, vsqadds_u32, vsqaddd_u64, vsra_n_u8, vsra_n_u16, vsra_n_u32, vsra_n_u64, vsraq_n_u8, vsraq_n_u16, vsraq_n_u32, vsraq_n_u64, vsrad_n_u64, vsri_n_u8, vsri_n_u16, vsri_n_u32, vsri_n_u64, vsriq_n_u8, vsriq_n_u16, vsriq_n_u32, vsriq_n_u64, vsrid_n_u64): Remove casts. Alan Lawrence wrote: The __builtin_ functions registered by aarch64_init_simd_builtins use signed and/or unsigned types according to the qualifiers defined in aarch64-builtins.c and used in aarch64-simd-builtins.def. These __builtin functions are then used in arm_neon.h, with explicit casts converting between the signed/unsigned types declared for the intrinsics and for the builtins. These two patches add a few more sets of qualifiers, and use existing ones more widely, allowing removing lots of the explicit casts from arm_neon.h. There is no particular logic in the division between the two patches beyond that the second patch uses sets of qualifiers defined in the first. No regressions on aarch64-none-elf or aarch64_be-none-elf. 
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 591260f18bcc084bcc6cc16b6597a3d2ec098d05..036da3e8646124cda7e1cb36db2ec48f84b58214 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -173,6 +173,10 @@ aarch64_types_binopu_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned }; #define TYPES_BINOPU (aarch64_types_binopu_qualifiers) static enum aarch64_type_qualifiers +aarch64_types_binop_uus_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_unsigned, qualifier_none }; +#define TYPES_BINOP_UUS (aarch64_types_binop_uus_qualifiers) +static enum aarch64_type_qualifiers aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_poly, qualifier_poly, qualifier_poly }; #define TYPES_BINOPP (aarch64_types_binopp_qualifiers) @@ -199,9 +203,14 @@ aarch64_types_getlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] #define TYPES_GETLANE (aarch64_types_getlane_qualifiers) #define TYPES_SHIFTIMM (aarch64_types_getlane_qualifiers) static enum aarch64_type_qualifiers +aarch64_types_shift_to_unsigned_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_unsigned, qualifier_none, qualifier_immediate }; +#define TYPES_SHIFTIMM_USS (aarch64_types_shift_to_unsigned_qualifiers) +static enum aarch64_type_qualifiers aarch64_types_unsigned_shift_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate }; #define TYPES_USHIFTIMM (aarch64_types_unsigned_shift_qualifiers) + static enum aarch64_type_qualifiers aarch64_types_setlane_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_none, qualifier_none, qualifier_immediate }; @@
[PATCH AArch64 2/2] Correct signedness of builtins, remove casts from arm_neon.h
This adds another set of qualifiers to aarch64-builtins.c, and removes more casts from arm_neon.h, for the suqadd, ushl, urshl, urshr_n, ushll_n, and sshl intrinsics. gcc/ChangeLog: * gcc/config/aarch64/aarch64-builtins.c (aarch64_types_binop_ssu_qualifiers): New static data. (TYPES_BINOP_SSU): Define. * gcc/config/aarch64/aarch64-simd-builtins.def (suqadd, ushl, urshl, urshr_n, ushll_n): Use appropriate unsigned qualifiers. * gcc/config/aarch64/arm_neon.h (vrshl_u8, vrshl_u16, vrshl_u32, vrshl_u64, vrshlq_u8, vrshlq_u16, vrshlq_u32, vrshlq_u64, vrshld_u64, vrshr_n_u8, vrshr_n_u16, vrshr_n_u32, vrshr_n_u64, vrshrq_n_u8, vrshrq_n_u16, vrshrq_n_u32, vrshrq_n_u64, vrshrd_n_u64, vshll_n_u8, vshll_n_u16, vshll_n_u32, vuqadd_s8, vuqadd_s16, vuqadd_s32, vuqadd_s64, vuqaddq_s8, vuqaddq_s16, vuqaddq_s32, vuqaddq_s64, vuqaddb_s8, vuqaddh_s16, vuqadds_s32, vuqaddd_s64): Add signedness suffix to builtin function name, remove cast. (vshl_s8, vshl_s16, vshl_s32, vshl_s64, vshl_u8, vshl_u16, vshl_u32, vshl_u64, vshlq_s8, vshlq_s16, vshlq_s32, vshlq_s64, vshlq_u8, vshlq_u16, vshlq_u32, vshlq_u64, vshld_s64, vshld_u64): Remove cast.diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 036da3e8646124cda7e1cb36db2ec48f84b58214..481f68fe7f070545b63d39d2589720566288d638 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -177,6 +177,10 @@ aarch64_types_binop_uus_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_unsigned, qualifier_unsigned, qualifier_none }; #define TYPES_BINOP_UUS (aarch64_types_binop_uus_qualifiers) static enum aarch64_type_qualifiers +aarch64_types_binop_ssu_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_none, qualifier_unsigned }; +#define TYPES_BINOP_SSU (aarch64_types_binop_ssu_qualifiers) +static enum aarch64_type_qualifiers aarch64_types_binopp_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_poly, qualifier_poly, qualifier_poly }; #define TYPES_BINOPP (aarch64_types_binopp_qualifiers) diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index 03e820f426fb0d99634a3e3d675ab648fca1691b..8e3dcf1d04de2e8ccef75bd2f37d5a5ac17321c0 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -86,7 +86,7 @@ BUILTIN_VSDQ_I (BINOP, sqsub, 0) BUILTIN_VSDQ_I (BINOPU, uqsub, 0) /* Implemented by aarch64_surqaddmode. */ - BUILTIN_VSDQ_I (BINOP, suqadd, 0) + BUILTIN_VSDQ_I (BINOP_SSU, suqadd, 0) BUILTIN_VSDQ_I (BINOP_UUS, usqadd, 0) /* Implemented by aarch64_get_dregVSTRUCT:modeVDC:mode. */ @@ -197,9 +197,9 @@ BUILTIN_VSDQ_I_DI (BINOP, ashl, 3) /* Implemented by aarch64_surshlmode. */ BUILTIN_VSDQ_I_DI (BINOP, sshl, 0) - BUILTIN_VSDQ_I_DI (BINOP, ushl, 0) + BUILTIN_VSDQ_I_DI (BINOP_UUS, ushl, 0) BUILTIN_VSDQ_I_DI (BINOP, srshl, 0) - BUILTIN_VSDQ_I_DI (BINOP, urshl, 0) + BUILTIN_VSDQ_I_DI (BINOP_UUS, urshl, 0) BUILTIN_VDQ_I (SHIFTIMM, ashr, 3) VAR1 (SHIFTIMM, ashr_simd, 0, di) @@ -207,7 +207,7 @@ VAR1 (USHIFTIMM, lshr_simd, 0, di) /* Implemented by aarch64_surshr_nmode. */ BUILTIN_VSDQ_I_DI (SHIFTIMM, srshr_n, 0) - BUILTIN_VSDQ_I_DI (SHIFTIMM, urshr_n, 0) + BUILTIN_VSDQ_I_DI (USHIFTIMM, urshr_n, 0) /* Implemented by aarch64_sursra_nmode. */ BUILTIN_VSDQ_I_DI (SHIFTACC, ssra_n, 0) BUILTIN_VSDQ_I_DI (USHIFTACC, usra_n, 0) @@ -215,7 +215,7 @@ BUILTIN_VSDQ_I_DI (USHIFTACC, ursra_n, 0) /* Implemented by aarch64_surshll_nmode. 
*/ BUILTIN_VDW (SHIFTIMM, sshll_n, 0) - BUILTIN_VDW (SHIFTIMM, ushll_n, 0) + BUILTIN_VDW (USHIFTIMM, ushll_n, 0) /* Implemented by aarch64_surshll2_nmode. */ BUILTIN_VQW (SHIFTIMM, sshll2_n, 0) BUILTIN_VQW (SHIFTIMM, ushll2_n, 0) diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index af86d8e127404c7bdf2763cbefbdabb2db047dfc..f96871aeaa4613988352a10a2a2a718dd24702ab 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -22605,25 +22605,25 @@ vrshl_s64 (int64x1_t __a, int64x1_t __b) __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) vrshl_u8 (uint8x8_t __a, int8x8_t __b) { - return (uint8x8_t) __builtin_aarch64_urshlv8qi ((int8x8_t) __a, __b); + return __builtin_aarch64_urshlv8qi_uus (__a, __b); } __extension__ static __inline uint16x4_t __attribute__ ((__always_inline__)) vrshl_u16 (uint16x4_t __a, int16x4_t __b) { - return (uint16x4_t) __builtin_aarch64_urshlv4hi ((int16x4_t) __a, __b); + return __builtin_aarch64_urshlv4hi_uus (__a, __b); } __extension__ static __inline uint32x2_t __attribute__ ((__always_inline__)) vrshl_u32 (uint32x2_t __a, int32x2_t __b) { - return (uint32x2_t) __builtin_aarch64_urshlv2si ((int32x2_t) __a, __b); + return
Re: [PATCH] Inline asm asan instrumentation
The fact that some region appears in "m" doesn't mean the inline asm actually accesses it; it might not touch it at all, or only some part of it. Do we have precise semantics of "m" written somewhere? My understanding was that even though the asm may not touch the buffer at all (like e.g. in our tests), the user still declares the whole region accessible to the compiler. If Asan wants to check that the whole 100*sizeof(long) region is accessible, it could often just have false positives, because the inline asm really accesses just some small part of it. We've seen this abused (e.g. casting to struct { char x[0x]; } *) and that's the main reason why we turned this off by default. On the other hand, we've seen no problems with ffmpeg's testsuite, and the ability to detect overflows in inline asm would be rather useful. Do you see how we could make this more robust? We could e.g. only check the first byte, although this wouldn't be as useful. -Y
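To make the disagreement concrete, a minimal sketch (hypothetical, x86-64 only, not one of the tests mentioned above): the "m" operand declares the whole 100-long array, yet the template stores only the first element, so a check over the full region could fire spuriously if a caller passed a deliberately shorter buffer through a cast, while checking only the first byte would accept it.

/* The output operand spans 100*sizeof(long) bytes, but the asm writes
   just 8 of them.  */
void
touch_first (long (*buf)[100])
{
  __asm__ ("movq $1, %0" : "=m" (*buf));
}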
Re: [C++ Patch] PR 57543
On 05/28/2014 01:06 PM, Paolo Carlini wrote: Now, I got this insane idea: would it make sense to simply invert the substitutions (args and return) unconditionally? If we're going to change the order, I want to do it in a more correct, rather than differently wrong, way. DR 1227 clarified that substitution should proceed in lexical order. http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1227 Jason
Re: [C++ Patch] PR 57543
Hi, On 05/29/2014 03:34 PM, Jason Merrill wrote: On 05/28/2014 01:06 PM, Paolo Carlini wrote: Now, I got this insane idea: would it make sense to simply invert the substitutions (args and return) unconditionally? If we're going to change the order, I want to do it in a more correct, rather than differently wrong, way. DR 1227 clarified that substitution should proceed in lexical order. http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1227 Ok, I had no idea we had this kind of much more general issue. Then this is really something for you to handle ;) Thanks! Paolo.
Re: ipa-visibility TLC 2/n
Hi Honza, I can confirm that with your commit r211045 the arm-none-linux-gnueabi{hf} builds are OK now. Thanks for the fix. Yufeng On 05/28/14 22:56, Jan Hubicka wrote: Any update? I've managed to generate a simple test case from libstdc++-v3/src/c++98/strstream.cc which reproduces the issue on ARM that Ramana has reported previously: template<class _CharT> struct char_traits; template<typename _CharT, typename _Traits> class basic_ios { }; template<typename _CharT, typename _Traits = char_traits<_CharT> > class basic_istream : virtual public basic_ios<_CharT, _Traits> { protected: int _M_gcount; virtual ~basic_istream() { } }; class istrstream : public basic_istream<char> { virtual ~istrstream(); }; istrstream::~istrstream() { } -- CUT -- With an arm-none-linux-gnueabi gcc configured as: ./gcc/configure --target=arm-none-linux-gnueabi --enable-gnu-indirect-function --enable-shared --with-arch=armv7-a --with-fpu=vfpv3-d16 --with-float=softfp --with-arch=armv7-a (irrelevant parts omitted) With the following command line options: -fdata-sections -O2 -fPIC -S ./test.cpp We'll see ./test.cpp:17:7: error: istrstream::_ZTV10istrstream.localalias.0 causes a section type conflict with istrstream::_ZTV10istrstream class istrstream : public basic_istream<char> ^ ./test.cpp:17:7: note: 'istrstream::_ZTV10istrstream' was declared here This seems to be the same cause as on AIX - we compute the section for the alias decl rather than for the original. The following patch seems to fix it. Does it allow bootstrap for you? (it doesn't for AIX, but that seems to be a bug in the output machinery) Index: varasm.c === --- varasm.c (revision 210914) +++ varasm.c (working copy) @@ -1083,6 +1083,9 @@ { addr_space_t as = ADDR_SPACE_GENERIC; int reloc; + symtab_node *snode = symtab_get_node (decl); + if (snode) +decl = symtab_alias_ultimate_target (snode)->decl; if (TREE_TYPE (decl) != error_mark_node) as = TYPE_ADDR_SPACE (TREE_TYPE (decl)); Yufeng
Re: detecting container overflow bugs in std::vector
On 26/05/14 19:19 +0400, Konstantin Serebryany wrote: It does look useful, but I'm concerned about a proliferation of container checks; we already have the libstdc++ Debug Mode and I'd like to see some of the lightweight checks from the Google branch added to trunk too. Me too, but these checks are mostly orthogonal to the proposed annotations. Thanks for clarifying that (and to Paul). Aren't they still much cheaper than asan instrumentation? Of course, they are much cheaper than asan. But they do not cover the case that motivated the container overflow annotations (when the contents of the vector are accessed via vector<T>::data()). Yes, I don't think I've ever seen that error in code I work with, but if Asan can be made to detect it then I'm in favour of the changes. Thanks.
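For readers following along, a minimal sketch of the case being described (assumed names and sizes, not a test from the patch): the write stays inside the allocation, so neither Debug Mode operator[] checks nor plain asan can see it, and only poisoning the region between size() and capacity() catches it.

#include <cstring>
#include <vector>

int
main ()
{
  std::vector<int> v;
  v.reserve (100);          /* capacity 100...                */
  v.resize (10);            /* ...but only 10 live elements   */
  int src[20] = { 0 };
  /* Copies 20 ints through data(): past size() but inside capacity.  */
  std::memcpy (v.data (), src, sizeof src);
  return v[0];
}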
[PATCH][ARM] Use mov_imm type for movw operations consistently
Hi all, I noticed that in some of our move patterns the movw instruction is given the mov_reg type rather than the mov_imm type that all other uses of movw have. This patch fixes that. Scanning through our pipeline descriptions I see that mov_imm is treated the same way as mov_reg everywhere anyway. In the Cortex-A7 description we do have a bit more complicated logic: ;; ALU instruction with an immediate operand can dual-issue. (define_insn_reservation cortex_a7_alu_imm 2 (and (eq_attr tune cortexa7) (ior (eq_attr type adr,alu_imm,alus_imm,logic_imm,logics_imm,\ mov_imm,mvn_imm,extend) (and (eq_attr type mov_reg,mov_shift,mov_shift_reg) (not (eq_attr length 8) cortex_a7_ex2|cortex_a7_ex1) In the two patterns that I change the mov_imm has a length of 4 an hence will still use this reservation. Thus I don't expect codegen to change at all from this patch but for future scheduling jobs this could make a difference. Tested arm-none-eabi on qemu. Ok for trunk? Thanks, Kyrill 2014-05-29 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/thumb2.md (*thumb2_movhi_insn): Set type of movw to mov_imm. * config/arm/vfp.md (*thumb2_movsi_vfp): Likewise. diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index 10bc8b1..6ea0810 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -329,7 +329,7 @@ movw%?\\t%0, %L1\\t%@ movhi str%(h%)\\t%1, %0\\t%@ movhi ldr%(h%)\\t%0, %1\\t%@ movhi - [(set_attr type mov_reg,mov_imm,mov_imm,mov_reg,store1,load1) + [(set_attr type mov_reg,mov_imm,mov_imm,mov_imm,store1,load1) (set_attr predicable yes) (set_attr predicable_short_it yes,no,yes,no,no,no) (set_attr length 2,4,2,4,4,4) diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md index e1a48ee..8147624 100644 --- a/gcc/config/arm/vfp.md +++ b/gcc/config/arm/vfp.md @@ -100,7 +100,7 @@ [(set_attr predicable yes) (set_attr predicable_short_it yes,no,yes,no,no,no,no,no,no,no,no,no,no,no) - (set_attr type mov_reg,mov_reg,mov_reg,mvn_reg,mov_reg,load1,load1,store1,store1,f_mcr,f_mrc,fmov,f_loads,f_stores) + (set_attr type mov_reg,mov_reg,mov_reg,mvn_reg,mov_imm,load1,load1,store1,store1,f_mcr,f_mrc,fmov,f_loads,f_stores) (set_attr length 2,4,2,4,4,4,4,4,4,4,4,4,4,4) (set_attr pool_range *,*,*,*,*,1018,4094,*,*,*,*,*,1018,*) (set_attr neg_pool_range *,*,*,*,*, 0, 0,*,*,*,*,*,1008,*)]
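As a quick illustration (hypothetical example, not from the patch), this is the sort of source that exercises the affected alternative; at -O2 with a Thumb-2 -march it is expected to compile to a single movw, which is the alternative whose type attribute changes here:

unsigned short
get_magic (void)
{
  /* 16-bit immediate: expected to become "movw r0, #43981".  */
  return 0xabcd;
}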
Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant
On Wed, May 28, 2014 at 9:52 PM, Andrew Pinski pins...@gmail.com wrote: On Wed, Jul 13, 2011 at 9:39 AM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Jul 13, 2011 at 9:13 AM, Paolo Bonzini bonz...@gnu.org wrote: On 07/11/2011 05:54 PM, H.J. Lu wrote: The key is the XEXP (x, 1) == convert_memory_address_addr_space (to_mode, XEXP (x, 1), as) test. It ensures basically that the constant has 31-bit precision, because otherwise the constant would change from e.g. (const_int -0x7ffc) to (const_int 0x8004) when zero-extending it from SImode to DImode. But I'm not sure it's safe. You have, (zero_extend:DI (plus:SI FOO:SI) (const_int Y)) and you want to convert it to (plus:DI FOO:DI (zero_extend:DI (const_int Y))) (where the zero_extend is folded). Ignore that FOO is a SYMBOL_REF (this piece of code does not assume anything about its shape); if FOO == 0xfffc and Y = 8, the result will be respectively 0x4 (valid) and 0x10004 (invalid). This example contradicts what you said above It ensures basically that the constant has 31-bit precision. Why? Certainly Y = 8 has 31-bit (or less) precision. So it has the same representation in SImode and DImode, and the test above on XEXP (x, 1) succeeds. And then we permute conversion and addition, which leads to the issue you raised above. In another word, the current code permutes conversion and addition. It leads to different values in case of symbol (0xfffc) + 8. Basically the current test for 31-bit (or less) precision is bogus. The real question is for a address computation, A + B, if address wrap-around is supported in convert_memory_address_addr_space. Unless the code has already reassociated the additions already. Like in the AARCH64 ILP32 case: (plus:SI (plus:SI (mult:SI (reg/v:SI 80 [ b ]) (const_int -4 [0xfffc])) (subreg/s/u:SI (reg/v/f:DI 79 [ a ]) 0)) (const_int -1073742592 [0xbd00])) The Tree level is correct in that it did not reassociate the addition but the RTL level ignores that. So this patch is invalid and incorrect unless you know the non constant part of the addition is a pointer (which is not the case here). There is an address overflow. Is the address overflow behavior defined here? -- H.J.
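For anyone following the thread, a small self-contained illustration of the wrap-around being argued about (constants chosen purely for illustration): extending the 32-bit sum and summing the 64-bit extensions give different addresses once the SImode addition wraps.

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint32_t foo = 0xfffffffcu;                          /* plays the role of FOO */
  uint32_t y = 8;                                      /* plays the role of Y   */
  uint64_t ext_of_sum = (uint64_t) (foo + y);          /* 0x4         */
  uint64_t sum_of_ext = (uint64_t) foo + (uint64_t) y; /* 0x100000004 */
  printf ("%#llx vs %#llx\n",
          (unsigned long long) ext_of_sum,
          (unsigned long long) sum_of_ext);
  return ext_of_sum == sum_of_ext;                     /* 0: they differ */
}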
Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant
On May 29, 2014, at 9:13 AM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, May 28, 2014 at 9:52 PM, Andrew Pinski pins...@gmail.com wrote: On Wed, Jul 13, 2011 at 9:39 AM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Jul 13, 2011 at 9:13 AM, Paolo Bonzini bonz...@gnu.org wrote: On 07/11/2011 05:54 PM, H.J. Lu wrote: The key is the XEXP (x, 1) == convert_memory_address_addr_space (to_mode, XEXP (x, 1), as) test. It ensures basically that the constant has 31-bit precision, because otherwise the constant would change from e.g. (const_int -0x7ffc) to (const_int 0x8004) when zero-extending it from SImode to DImode. But I'm not sure it's safe. You have, (zero_extend:DI (plus:SI FOO:SI) (const_int Y)) and you want to convert it to (plus:DI FOO:DI (zero_extend:DI (const_int Y))) (where the zero_extend is folded). Ignore that FOO is a SYMBOL_REF (this piece of code does not assume anything about its shape); if FOO == 0xfffc and Y = 8, the result will be respectively 0x4 (valid) and 0x10004 (invalid). This example contradicts what you said above It ensures basically that the constant has 31-bit precision. Why? Certainly Y = 8 has 31-bit (or less) precision. So it has the same representation in SImode and DImode, and the test above on XEXP (x, 1) succeeds. And then we permute conversion and addition, which leads to the issue you raised above. In another word, the current code permutes conversion and addition. It leads to different values in case of symbol (0xfffc) + 8. Basically the current test for 31-bit (or less) precision is bogus. The real question is for a address computation, A + B, if address wrap-around is supported in convert_memory_address_addr_space. Unless the code has already reassociated the additions already. Like in the AARCH64 ILP32 case: (plus:SI (plus:SI (mult:SI (reg/v:SI 80 [ b ]) (const_int -4 [0xfffc])) (subreg/s/u:SI (reg/v/f:DI 79 [ a ]) 0)) (const_int -1073742592 [0xbd00])) The Tree level is correct in that it did not reassociate the addition but the RTL level ignores that. So this patch is invalid and incorrect unless you know the non constant part of the addition is a pointer (which is not the case here). There is an address overflow. Is the address overflow behavior defined here? There was no address overflow in the original code and there was no address overflow in the tree level. The rtl level does introduce an address overflow but the semantics of plus is defined to be wrapping so there is no overflow. This is blocking me from testing ilp32 under gnu/Linux as ld.so gets miscompiled and stack addresses have the sign bit set. Thanks, Andrew -- H.J.
Re: [PATCH AArch64] Remove from arm_neon.h functions not in the spec
Patch retaining vfmaq_n_f64 attached, updated gcc/ChangeLog: * config/aarch64/arm_neon.h (vmlaq_n_f64, vmlsq_n_f64, vrsrtsq_f64, vcge_p8, vcgeq_p8, vcgez_p8, vcgez_u8, vcgez_u16, vcgez_u32, vcgez_u64, vcgezq_p8, vcgezq_u8, vcgezq_u16, vcgezq_u32, vcgezq_u64, vcgezd_u64, vcgt_p8, vcgtq_p8, vcgtz_p8, vcgtz_u8, vcgtz_u16, vcgtz_u32, vcgtz_u64, vcgtzq_p8, vcgtzq_u8, vcgtzq_u16, vcgtzq_u32, vcgtzq_u64, vcgtzd_u64, vcle_p8, vcleq_p8, vclez_p8, vclez_u64, vclezq_p8, vclezd_u64, vclt_p8, vcltq_p8, vcltz_p8, vcltzq_p8, vcltzd_u64): Remove functions as they are not in the spec. Alan Lawrence wrote: No, hold that, vfmaq_n_f64 has been added back in the latest version (to which I linked). Hang on... --Alan Alan Lawrence wrote: arm_neon.h contains a bunch of functions (for example, the wonderful vcgez_u* intrinsics - that's an unsigned comparison of greater-than-or-equal-to zero) that are not present in the current ARM Neon Intrinsics spec: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/index.html This patch just deletes those intrinsics. OK for trunk? Cheers, Alan gcc/ChangeLog: 2014-05-27 Alan Lawrence alan.lawre...@arm.com * config/aarch64/arm_neon.h (vfmaq_n_f64, vmlaq_n_f64, vmlsq_n_f64, vrsrtsq_f64, vtst_p16, vtstq_p16, vcge_p8, vcgeq_p8, vcgez_p8, vcgez_u8, vcgez_u16, vcgez_u32, vcgez_u64, vcgezq_p8, vcgezq_u8, vcgezq_u16, vcgezq_u32, vcgezq_u64, vcgezd_u64, vcgt_p8, vcgtq_p8, vcgtz_p8, vcgtz_u8, vcgtz_u16, vcgtz_u32, vcgtz_u64, vcgtzq_p8, vcgtzq_u8, vcgtzq_u16, vcgtzq_u32, vcgtzq_u64, vcgtzd_u64, vcle_p8, vcleq_p8, vclez_p8, vclez_u64, vclezq_p8, vclezd_u64, vclt_p8, vcltq_p8, vcltz_p8, vcltzq_p8, vcltzd_u64): Remove functions as they are not in the spec. diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 747a292ba9b2260e74566c946fe57afaea267969..bbf47349ae0e21761637a670c6e59c5c1e3f5195 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -7243,18 +7243,6 @@ vmlaq_n_f32 (float32x4_t a, float32x4_t b, float32_t c) return result; } -__extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) -vmlaq_n_f64 (float64x2_t a, float64x2_t b, float64_t c) -{ - float64x2_t result; - float64x2_t t1; - __asm__ (fmul %1.2d, %3.2d, %4.d[0]; fadd %0.2d, %0.2d, %1.2d - : =w(result), =w(t1) - : 0(a), w(b), w(c) - : /* No clobbers */); - return result; -} - __extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) vmlaq_n_s16 (int16x8_t a, int16x8_t b, int16_t c) { @@ -7943,18 +7931,6 @@ vmlsq_n_f32 (float32x4_t a, float32x4_t b, float32_t c) return result; } -__extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) -vmlsq_n_f64 (float64x2_t a, float64x2_t b, float64_t c) -{ - float64x2_t result; - float64x2_t t1; - __asm__ (fmul %1.2d, %3.2d, %4.d[0]; fsub %0.2d, %0.2d, %1.2d - : =w(result), =w(t1) - : 0(a), w(b), x(c) - : /* No clobbers */); - return result; -} - __extension__ static __inline int16x8_t __attribute__ ((__always_inline__)) vmlsq_n_s16 (int16x8_t a, int16x8_t b, int16_t c) { @@ -11329,17 +11305,6 @@ vrsqrtss_f32 (float32_t a, float32_t b) return result; } -__extension__ static __inline float64x2_t __attribute__ ((__always_inline__)) -vrsrtsq_f64 (float64x2_t a, float64x2_t b) -{ - float64x2_t result; - __asm__ (frsqrts %0.2d,%1.2d,%2.2d - : =w(result) - : w(a), w(b) - : /* No clobbers */); - return result; -} - __extension__ static __inline int8x16_t __attribute__ ((__always_inline__)) vrsubhn_high_s16 (int8x8_t a, int16x8_t b, int16x8_t c) { @@ -16082,13 +16047,6 @@ 
vcge_f64 (float64x1_t __a, float64x1_t __b) } __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) -vcge_p8 (poly8x8_t __a, poly8x8_t __b) -{ - return (uint8x8_t) __builtin_aarch64_cmgev8qi ((int8x8_t) __a, - (int8x8_t) __b); -} - -__extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) vcge_s8 (int8x8_t __a, int8x8_t __b) { return (uint8x8_t) __builtin_aarch64_cmgev8qi (__a, __b); @@ -16152,13 +16110,6 @@ vcgeq_f64 (float64x2_t __a, float64x2_t __b) } __extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) -vcgeq_p8 (poly8x16_t __a, poly8x16_t __b) -{ - return (uint8x16_t) __builtin_aarch64_cmgev16qi ((int8x16_t) __a, - (int8x16_t) __b); -} - -__extension__ static __inline uint8x16_t __attribute__ ((__always_inline__)) vcgeq_s8 (int8x16_t __a, int8x16_t __b) { return (uint8x16_t) __builtin_aarch64_cmgev16qi (__a, __b); @@ -16252,14 +16203,6 @@ vcgez_f64 (float64x1_t __a) } __extension__ static __inline uint8x8_t __attribute__ ((__always_inline__)) -vcgez_p8 (poly8x8_t __a) -{ - poly8x8_t __b = {0, 0, 0, 0, 0, 0, 0, 0}; - return (uint8x8_t)
Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant
On Thu, May 29, 2014 at 9:23 AM, pins...@gmail.com wrote: On May 29, 2014, at 9:13 AM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, May 28, 2014 at 9:52 PM, Andrew Pinski pins...@gmail.com wrote: On Wed, Jul 13, 2011 at 9:39 AM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Jul 13, 2011 at 9:13 AM, Paolo Bonzini bonz...@gnu.org wrote: On 07/11/2011 05:54 PM, H.J. Lu wrote: The key is the XEXP (x, 1) == convert_memory_address_addr_space (to_mode, XEXP (x, 1), as) test. It ensures basically that the constant has 31-bit precision, because otherwise the constant would change from e.g. (const_int -0x7ffc) to (const_int 0x8004) when zero-extending it from SImode to DImode. But I'm not sure it's safe. You have, (zero_extend:DI (plus:SI FOO:SI) (const_int Y)) and you want to convert it to (plus:DI FOO:DI (zero_extend:DI (const_int Y))) (where the zero_extend is folded). Ignore that FOO is a SYMBOL_REF (this piece of code does not assume anything about its shape); if FOO == 0xfffc and Y = 8, the result will be respectively 0x4 (valid) and 0x10004 (invalid). This example contradicts what you said above It ensures basically that the constant has 31-bit precision. Why? Certainly Y = 8 has 31-bit (or less) precision. So it has the same representation in SImode and DImode, and the test above on XEXP (x, 1) succeeds. And then we permute conversion and addition, which leads to the issue you raised above. In another word, the current code permutes conversion and addition. It leads to different values in case of symbol (0xfffc) + 8. Basically the current test for 31-bit (or less) precision is bogus. The real question is for a address computation, A + B, if address wrap-around is supported in convert_memory_address_addr_space. Unless the code has already reassociated the additions already. Like in the AARCH64 ILP32 case: (plus:SI (plus:SI (mult:SI (reg/v:SI 80 [ b ]) (const_int -4 [0xfffc])) (subreg/s/u:SI (reg/v/f:DI 79 [ a ]) 0)) (const_int -1073742592 [0xbd00])) The Tree level is correct in that it did not reassociate the addition but the RTL level ignores that. So this patch is invalid and incorrect unless you know the non constant part of the addition is a pointer (which is not the case here). There is an address overflow. Is the address overflow behavior defined here? There was no address overflow in the original code and there was no address overflow in the tree level. The rtl level does introduce an address overflow but the semantics of plus is defined to be wrapping so there is no overflow. This is blocking me from testing ilp32 under gnu/Linux as ld.so gets miscompiled and stack addresses have the sign bit set. What is your Pmode? -- H.J.
Re: ipa-visibility TLC 2/n
Jan Hubicka hubi...@ucw.cz writes: Richard Sandiford wrote the original section anchors implementation, so he would be a good person to comment about the interaction between aliases and section anchors. Thanks! Richard, does this patch seem sane? Looks good to me in principle, but with: + struct symtab_node *snode; decl = SYMBOL_REF_DECL (symbol); + + snode = symtab_node (decl); + if (snode->alias) + { + rtx target = DECL_RTL (symtab_alias_ultimate_target (snode)->decl); + SYMBOL_REF_BLOCK_OFFSET (symbol) = SYMBOL_REF_BLOCK_OFFSET (target); + return; + } is SYMBOL_REF_BLOCK_OFFSET (target) guaranteed to be valid at this point? It looked at face value like you'd need a recursive call to place_block_symbol on the target before the copy. My reading was that SYMBOL_REF_BLOCK_OFFSET is computed at DECL_RTL calculation time. But you are right - it is done by validize_mem that is not done by DECL_RTL. Shall I just call it on target first? Honza Thanks, Richard
Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant
On May 29, 2014, at 10:09 AM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, May 29, 2014 at 9:23 AM, pins...@gmail.com wrote: On May 29, 2014, at 9:13 AM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, May 28, 2014 at 9:52 PM, Andrew Pinski pins...@gmail.com wrote: On Wed, Jul 13, 2011 at 9:39 AM, H.J. Lu hjl.to...@gmail.com wrote: On Wed, Jul 13, 2011 at 9:13 AM, Paolo Bonzini bonz...@gnu.org wrote: On 07/11/2011 05:54 PM, H.J. Lu wrote: The key is the XEXP (x, 1) == convert_memory_address_addr_space (to_mode, XEXP (x, 1), as) test. It ensures basically that the constant has 31-bit precision, because otherwise the constant would change from e.g. (const_int -0x7ffc) to (const_int 0x8004) when zero-extending it from SImode to DImode. But I'm not sure it's safe. You have, (zero_extend:DI (plus:SI FOO:SI) (const_int Y)) and you want to convert it to (plus:DI FOO:DI (zero_extend:DI (const_int Y))) (where the zero_extend is folded). Ignore that FOO is a SYMBOL_REF (this piece of code does not assume anything about its shape); if FOO == 0xfffc and Y = 8, the result will be respectively 0x4 (valid) and 0x10004 (invalid). This example contradicts what you said above It ensures basically that the constant has 31-bit precision. Why? Certainly Y = 8 has 31-bit (or less) precision. So it has the same representation in SImode and DImode, and the test above on XEXP (x, 1) succeeds. And then we permute conversion and addition, which leads to the issue you raised above. In another word, the current code permutes conversion and addition. It leads to different values in case of symbol (0xfffc) + 8. Basically the current test for 31-bit (or less) precision is bogus. The real question is for a address computation, A + B, if address wrap-around is supported in convert_memory_address_addr_space. Unless the code has already reassociated the additions already. Like in the AARCH64 ILP32 case: (plus:SI (plus:SI (mult:SI (reg/v:SI 80 [ b ]) (const_int -4 [0xfffc])) (subreg/s/u:SI (reg/v/f:DI 79 [ a ]) 0)) (const_int -1073742592 [0xbd00])) The Tree level is correct in that it did not reassociate the addition but the RTL level ignores that. So this patch is invalid and incorrect unless you know the non constant part of the addition is a pointer (which is not the case here). There is an address overflow. Is the address overflow behavior defined here? There was no address overflow in the original code and there was no address overflow in the tree level. The rtl level does introduce an address overflow but the semantics of plus is defined to be wrapping so there is no overflow. This is blocking me from testing ilp32 under gnu/Linux as ld.so gets miscompiled and stack addresses have the sign bit set. What is your Pmode? Pmode is dimode while ptr_mode is simode. Pointers are zero extended when converting between si and di modes. Thanks, Andrew -- H.J.
Patch RFA: Move x86 _mm_pause out of pragma target(sse) scope
The _mm_pause intrinsic is defined in xmmintrin.h. Right now using it with -m32 with the default -march option gives an error: /home/iant/foo.c: In function ‘f’: /home/iant/gcc/go-install/lib/gcc/x86_64-unknown-linux-gnu/4.10.0/include/xmmintrin.h:1238:1: error: inlining failed in call to always_inline ‘_mm_pause’: target specific option mismatch _mm_pause (void) ^ /home/iant/foo5.c:3:13: error: called from here void f () { _mm_pause (); } ^ This error is because _mm_pause is defined in the scope of #pragma GCC target(sse). But _mm_pause, which simply generates the pause instruction, does not require SSE support. The pause instruction has nothing really to do with SSE, and it works on all x86 processors (on processors that do not explicitly recognize it, it is a nop). I propose the following patch, which moves _mm_pause out of the pragma target scope. I know that x86intrin.h provides a similar intrinsic, __pause, but I think it's worth making _mm_pause work reasonably as well. I'm running a full testsuite run. OK for mainline if it passes? Ian gcc/ChangeLog: 2014-05-29 Ian Lance Taylor i...@google.com * config/i386/xmmintrin.h (_mm_pause): Move out of scope of pragma target(sse). gcc/testsuite/ChangeLog: 2014-05-29 Ian Lance Taylor i...@google.com * gcc.target/i386/pause-2.c: New test. Index: config/i386/xmmintrin.h === --- config/i386/xmmintrin.h (revision 211057) +++ config/i386/xmmintrin.h (working copy) @@ -1231,15 +1231,6 @@ _mm_sfence (void) __builtin_ia32_sfence (); } -/* The execution of the next instruction is delayed by an implementation - specific amount of time. The instruction does not modify the - architectural state. */ -extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) -_mm_pause (void) -{ - __builtin_ia32_pause (); -} - /* Transpose the 4x4 matrix composed of row[0-3]. */ #define _MM_TRANSPOSE4_PS(row0, row1, row2, row3) \ do { \ @@ -1262,4 +1253,15 @@ do { \ #pragma GCC pop_options #endif /* __DISABLE_SSE__ */ +/* The execution of the next instruction is delayed by an implementation + specific amount of time. The instruction does not modify the + architectural state. This is after the pop_options pragma because + it does not require SSE support in the processor--the encoding is a + nop on processors that do not support it. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_pause (void) +{ + __builtin_ia32_pause (); +} + #endif /* _XMMINTRIN_H_INCLUDED */ Index: testsuite/gcc.target/i386/pause-2.c === --- testsuite/gcc.target/i386/pause-2.c (revision 0) +++ testsuite/gcc.target/i386/pause-2.c (revision 0) @@ -0,0 +1,12 @@ +/* Test that pause instruction works even when SSE is not enabled. */ +/* { dg-do compile } */ +/* { dg-options -O2 -dp } */ +/* { dg-final { scan-assembler-times \\*pause 1 } } */ + +#include xmmintrin.h + +void +foo (void) +{ + _mm_pause (); +}
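For context, the kind of user code this change is meant to unbreak (a hypothetical spin-wait loop, not part of the patch); with the patch it should build with plain -m32 and no -msse:

#include <xmmintrin.h>

/* Poll *flag, inserting a pause between iterations.  _mm_pause only
   emits the pause encoding (a nop on older parts), so no SSE code
   generation is actually required.  */
void
spin_wait (volatile int *flag)
{
  while (!*flag)
    _mm_pause ();
}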
Re: [AArch64/ARM 3/3] Add execution tests of ARM EXT intrinsics
I've just committed this as revision 211059, with the change of adding a _1 suffix to the names of all the new tests (as per standard testsuite convention). All passed on arm-none-eabi and armeb-none-eabi. Cheers, Alan Ramana Radhakrishnan wrote: On Wed, Apr 23, 2014 at 9:32 PM, Alan Lawrence alan.lawre...@arm.com wrote: Final patch in series, adds new tests of the ARM EXT Intrinsics, that also check the execution results, reusing the test bodies introduced into AArch64 in the first patch. (These tests subsume the autogenerated ones in testsuite/gcc.target/arm/neon/ that only check assembler output.) Tests use gcc.target/arm/simd/simd.exp from corresponding patch for ZIP Intrinsics http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01500.html, will commit that first. All tests passing on arm-none-eabi. Ok if no regressions. Thanks, Ramana gcc/testsuite/ChangeLog: 2014-04-23 Alan Lawrence alan.lawre...@arm.com gcc.target/arm/simd/vextQf32.c: New file. gcc.target/arm/simd/vextQp16.c: New file. gcc.target/arm/simd/vextQp8.c: New file. gcc.target/arm/simd/vextQs16.c: New file. gcc.target/arm/simd/vextQs32.c: New file. gcc.target/arm/simd/vextQs64.c: New file. gcc.target/arm/simd/vextQs8.c: New file. gcc.target/arm/simd/vextQu16.c: New file. gcc.target/arm/simd/vextQu32.c: New file. gcc.target/arm/simd/vextQu64.c: New file. gcc.target/arm/simd/vextQu8.c: New file. gcc.target/arm/simd/vextQp64.c: New file. gcc.target/arm/simd/vextf32.c: New file. gcc.target/arm/simd/vextp16.c: New file. gcc.target/arm/simd/vextp8.c: New file. gcc.target/arm/simd/vexts16.c: New file. gcc.target/arm/simd/vexts32.c: New file. gcc.target/arm/simd/vexts64.c: New file. gcc.target/arm/simd/vexts8.c: New file. gcc.target/arm/simd/vextu16.c: New file. gcc.target/arm/simd/vextu32.c: New file. gcc.target/arm/simd/vextu64.c: New file. gcc.target/arm/simd/vextu8.c: New file. gcc.target/arm/simd/vextp64.c: New file. diff --git a/gcc/testsuite/gcc.target/arm/simd/vextQf32.c b/gcc/testsuite/gcc.target/arm/simd/vextQf32.c new file mode 100644 index 000..c1da6d3 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vextQf32.c @@ -0,0 +1,12 @@ +/* Test the `vextQf32' ARM Neon intrinsic. */ + +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options -save-temps -O3 -fno-inline } */ +/* { dg-add-options arm_neon } */ + +#include arm_neon.h +#include ../../aarch64/simd/extq_f32.x + +/* { dg-final { scan-assembler-times vext\.32\[ \t\]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n 3 } } */ +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/arm/simd/vextQp16.c b/gcc/testsuite/gcc.target/arm/simd/vextQp16.c new file mode 100644 index 000..adc0861 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vextQp16.c @@ -0,0 +1,12 @@ +/* Test the `vextQp16' ARM Neon intrinsic. 
*/ + +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options -save-temps -O3 -fno-inline } */ +/* { dg-add-options arm_neon } */ + +#include arm_neon.h +#include ../../aarch64/simd/extq_p16.x + +/* { dg-final { scan-assembler-times vext\.16\[ \t\]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n 7 } } */ +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/arm/simd/vextQp64.c b/gcc/testsuite/gcc.target/arm/simd/vextQp64.c new file mode 100644 index 000..e8b688d --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vextQp64.c @@ -0,0 +1,33 @@ +/* Test the `vextQp64' ARM Neon intrinsic. */ + +/* { dg-do run } */ +/* { dg-require-effective-target arm_crypto_ok } */ +/* { dg-options -save-temps -O3 -fno-inline } */ +/* { dg-add-options arm_crypto } */ + +#include arm_neon.h + +extern void abort (void); + +poly64x2_t +test_vextq_p64_1 (poly64x2_t a, poly64x2_t b) +{ + return vextq_p64(a, b, 1); +} + +int +main (int argc, char **argv) +{ + int i, off; + poly64x2_t in1 = {0, 1}; + poly64x2_t in2 = {2, 3}; + poly64x2_t actual = test_vextq_p64_1 (in1, in2); + for (i = 0; i 2; i++) +if (actual[i] != i + 1) + abort (); + + return 0; +} + +/* { dg-final { scan-assembler-times vext\.64\[ \t\]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #\[0-9\]+!?\(?:\[ \t\]+@\[a-zA-Z0-9 \]+\)?\n 1 } } */ +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/arm/simd/vextQp8.c b/gcc/testsuite/gcc.target/arm/simd/vextQp8.c new file mode 100644 index 000..5f2cc53 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/simd/vextQp8.c @@ -0,0 +1,12 @@ +/* Test the `vextQp8' ARM Neon intrinsic. */ + +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_ok } */ +/* { dg-options -save-temps -O3 -fno-inline } */ +/* { dg-add-options arm_neon
patch to fix PR61325
The following patch PR61325. The details can be found on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61325 The patch was bootstrapped and tested on x86/x86-64. Committed as rev. 211060 to gcc-4.9 branch and as rev.211061 to trunk. 2014-05-29 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/61325 * lra-constraints.c (process_address): Rename to process_address_1. (process_address): New function. 2014-05-29 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/61325 * gcc.target/aarch64/pr61325.c: New. Index: lra-constraints.c === --- lra-constraints.c (revision 210973) +++ lra-constraints.c (working copy) @@ -2784,9 +2784,14 @@ Add reloads to the lists *BEFORE and *AFTER. We might need to add reloads to *AFTER because of inc/dec, {pre, post} modify in the - address. Return true for any RTL change. */ + address. Return true for any RTL change. + + The function is a helper function which does not produce all + transformations which can be necessary. It does just basic steps. + To do all necessary transformations use function + process_address. */ static bool -process_address (int nop, rtx *before, rtx *after) +process_address_1 (int nop, rtx *before, rtx *after) { struct address_info ad; rtx new_reg; @@ -2986,6 +2991,18 @@ return true; } +/* Do address reloads until it is necessary. Use process_address_1 as + a helper function. Return true for any RTL changes. */ +static bool +process_address (int nop, rtx *before, rtx *after) +{ + bool res = false; + + while (process_address_1 (nop, before, after)) +res = true; + return res; +} + /* Emit insns to reload VALUE into a new register. VALUE is an auto-increment or auto-decrement RTX whose operand is a register or memory location; so reloading involves incrementing that location. @@ -3270,7 +3287,7 @@ change_p = true; lra_update_dup (curr_id, i); } - + if (change_p) /* If we've changed the instruction then any alternative that we chose previously may no longer be valid. */ Index: testsuite/gcc.target/aarch64/pr61325.c === --- testsuite/gcc.target/aarch64/pr61325.c (revision 0) +++ testsuite/gcc.target/aarch64/pr61325.c (working copy) @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ +typedef unsigned int wchar_t; +typedef long unsigned int size_t; + +size_t +wcstombs(char *s , const wchar_t *pwcs , size_t n) +{ + int count = 0; + + if (n != 0) { +do { + if ((*s++ = (char) *pwcs++) == 0) +break; + count++; +} while (--n != 0); + } + return count; +}
[patch, avr] ata6289 device ISA is updated
Hi, Device ATA6289 has a MUL instruction and therefore belongs to the avr4 ISA, but it is currently listed under avr25. The attached patch corrects this. Please commit if the patch is OK; I do not have commit access. Regards, Pitchumani 2014-05-29 Pitchumani Sivanupandi pitchuman...@atmel.com * config/avr/avr-mcus.def: Change ATA6289 ISA to AVR4. * config/avr/avr-tables.opt: Regenerate. * config/avr/t-multilib: Regenerate. * doc/avr-mmcu.texi: Regenerate. ata6289-isa-change.patch Description: ata6289-isa-change.patch
Re: [PATCH] Do not build libsanitizer also for powerpc*-*-linux*
On Wed, 2014-05-28 at 09:36 +0200, Thomas Schwinge wrote: Hi! On Mon, 26 May 2014 18:53:22 +0800, Arseny Solokha asolo...@gmx.com wrote: Recent changes in GetPcSpBp() (libsanitizer/asan/asan_linux.cc) made it impossible to build 4.10.0-alpha20140525 snapshot for powerpc targets. I hit this, too. The proposed patch disables building libsanitizer for powerpc*-*-linux* in addition to already disabled powerpc*le-*-linux* until the smarter solution will emerge. The actual issue preventing ASAN from porting to PPC seems to be inability to retrieve values of PC and BP on this architecture. This is being discussed in the thread at http://gcc.gnu.org/ml/gcc-patches/2014-05/msg02031.html. Until that has been resolved, I do agree to check in the following patch (and have successfully tested it, but cannot formally approve it for commit; thus copying the libsanitizer maintainers): The re-enablement patch was submitted to the llvm mailing list here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140526/219249.html Once that is committed and merged into gcc, we can re-enable building libsanitizer for powerpc*-linux. Peter
Re: RFA: A couple of ira_get_dup_out_num fixes
On 05/28/2014 04:32 PM, Richard Sandiford wrote: While working on patches to speed up the handling of constraints, I hit some behaviour in ira_get_dup_out_num that looked unintentional: - the check for output operands was part of the !ignored_p condition so would be skipped if the first alternative is disabled/excluded. - the first disabled/excluded alternative stops all following alternatives from being processed, since we get stuck in the first part of the if statement and never increment curr_alt. This seems to have some effect on the testsuite. E.g. at -O2 gcc.c-torture/compile/20071117-1.c has changes like: .LCFI2: movq%rsp, %rbx subq%rax, %rsp - leaq15(%rsp), %rax - andq$-16, %rax - movq%rax, %rdi + leaq15(%rsp), %rdi + andq$-16, %rdi callbar xorl%esi, %esi movq%rbx, %rsp There are also some cases where the change introduces a move though. E.g. gcc.c-torture/compat/struct-ic.c has: movabsq $4294967296, %rdx addq$8, %rsp .LCFI4: - andq%rdi, %rax + andq%rax, %rdi + movq%rdi, %rax orq %rdx, %rax ret .L9: But AFAICT the patch is what was originally intended. Tested on x86_64-linux-gnu. OK to install? Ok, Richard. Thanks for fixing this.
Re: [C++ Patch] PR 57543
Hi again, On 05/29/2014 03:34 PM, Jason Merrill wrote: On 05/28/2014 01:06 PM, Paolo Carlini wrote: Now, I got this insane idea: would it make sense to simply invert the substitutions (args and return) unconditionally? If we're going to change the order, I want to do it in a more correct, rather than differently wrong, way. DR 1227 clarified that substitution should proceed in lexical order. http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1227 So, here is another iteration, sorry about the ping-pong. I put together the below which already passes testing. How does it look? Thanks again for your patience, Paolo. / Index: cp/cp-tree.h === --- cp/cp-tree.h(revision 211052) +++ cp/cp-tree.h(working copy) @@ -125,7 +125,7 @@ c-common.h, not after. Usage of TYPE_LANG_FLAG_?: 0: TYPE_DEPENDENT_P 1: TYPE_HAS_USER_CONSTRUCTOR. - 2: unused + 2: TYPE_HAS_LATE_RETURN_TYPE (in FUNCTION_TYPE, METHOD_TYPE) 3: TYPE_FOR_JAVA. 4: TYPE_HAS_NONTRIVIAL_DESTRUCTOR 5: CLASS_TYPE_P (in RECORD_TYPE and UNION_TYPE) @@ -3404,6 +3404,11 @@ more_aggr_init_expr_args_p (const aggr_init_expr_a user-declared constructor. */ #define TYPE_HAS_USER_CONSTRUCTOR(NODE) (TYPE_LANG_FLAG_1 (NODE)) +/* Nonzero means that the FUNCTION_TYPE or METHOD_TYPE has a + late-specified return type. */ +#define TYPE_HAS_LATE_RETURN_TYPE(NODE) \ + (TYPE_LANG_FLAG_2 (FUNC_OR_METHOD_CHECK (NODE))) + /* When appearing in an INDIRECT_REF, it means that the tree structure underneath is actually a call to a constructor. This is needed when the constructor must initialize local storage (which can Index: cp/decl.c === --- cp/decl.c (revision 211052) +++ cp/decl.c (working copy) @@ -8817,6 +8817,7 @@ grokdeclarator (const cp_declarator *declarator, bool template_parm_flag = false; bool typedef_p = decl_spec_seq_has_spec_p (declspecs, ds_typedef); bool constexpr_p = decl_spec_seq_has_spec_p (declspecs, ds_constexpr); + bool late_return_type_p = false; source_location saved_loc = input_location; const char *errmsg; @@ -9660,6 +9661,9 @@ grokdeclarator (const cp_declarator *declarator, if (type == error_mark_node) return error_mark_node; + if (declarator-u.function.late_return_type) + late_return_type_p = true; + if (ctype == NULL_TREE decl_context == FIELD funcdecl_p @@ -10590,6 +10594,10 @@ grokdeclarator (const cp_declarator *declarator, decl_function_context (TYPE_MAIN_DECL (ctype)) : NULL_TREE; publicp = (! friendp || ! staticp) function_context == NULL_TREE; + + if (late_return_type_p) + TYPE_HAS_LATE_RETURN_TYPE (type) = 1; + decl = grokfndecl (ctype, type, TREE_CODE (unqualified_id) != TEMPLATE_ID_EXPR ? unqualified_id : dname, @@ -10814,6 +10822,9 @@ grokdeclarator (const cp_declarator *declarator, publicp = (ctype != NULL_TREE || storage_class != sc_static); + if (late_return_type_p) + TYPE_HAS_LATE_RETURN_TYPE (type) = 1; + decl = grokfndecl (ctype, type, original_name, parms, unqualified_id, virtualp, flags, memfn_quals, rqual, raises, 1, friendp, Index: cp/pt.c === --- cp/pt.c (revision 211052) +++ cp/pt.c (working copy) @@ -11322,8 +11322,42 @@ tsubst_function_type (tree t, /* The TYPE_CONTEXT is not used for function/method types. */ gcc_assert (TYPE_CONTEXT (t) == NULL_TREE); - /* Substitute the return type. */ - return_type = tsubst (TREE_TYPE (t), args, complain, in_decl); + /* DR 1227: Mixing immediate and non-immediate contexts in deduction + failure. */ + bool late_return_type_p = TYPE_HAS_LATE_RETURN_TYPE (t); + + if (late_return_type_p) +{ + /* Substitute the argument types. 
*/ + arg_types = tsubst_arg_types (TYPE_ARG_TYPES (t), args, NULL_TREE, + complain, in_decl); + if (arg_types == error_mark_node) + return error_mark_node; + + tree save_ccp = current_class_ptr; + tree save_ccr = current_class_ref; + tree this_type = (TREE_CODE (t) == METHOD_TYPE + ? TREE_TYPE (TREE_VALUE (arg_types)) : NULL_TREE); + bool do_inject = this_type && !dependent_type_p (this_type); + if (do_inject) + { + /* DR 1207: 'this' is in scope in the trailing return type. */ + inject_this_parameter (this_type, cp_type_quals (this_type)); + } + + /* Substitute the return type. */ + return_type = tsubst (TREE_TYPE (t), args, complain, in_decl); + +
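To make the lexical-order requirement of DR 1227 concrete, a small hypothetical example (not from the PR): the parameter type is lexically before the trailing return type, so for T = int it must be substituted, and fail in the immediate context, first; substituting the return type first would instead instantiate wrap<int> and hit a hard error.

template <typename T> struct wrap { typedef typename T::type type; };

/* Declared only; deduction must SFINAE this candidate out for T = int.  */
template <typename T>
auto f (typename T::type p) -> typename wrap<T>::type;

template <typename T> int f (...) { return 0; }

int x = f<int> (0);   /* should quietly pick the '...' overload */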
libgo patch committed: Add --without-libatomic configure option
This patch from Peter Collingbourne adds a --without-libatomic configure option to libgo, to make it easier to build libgo outside of the GCC build system. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian diff -r 9e7a28ffe425 libgo/Makefile.am --- a/libgo/Makefile.am Wed May 28 17:02:52 2014 -0700 +++ b/libgo/Makefile.am Thu May 29 13:05:27 2014 -0700 @@ -30,6 +30,8 @@ LIBFFI = @LIBFFI@ LIBFFIINCS = @LIBFFIINCS@ +LIBATOMIC = @LIBATOMIC@ + WARN_CFLAGS = $(WARN_FLAGS) $(WERROR) # -I/-D flags to pass when compiling. @@ -1949,8 +1951,7 @@ libgo_la_LIBADD = \ $(libgo_go_objs) ../libbacktrace/libbacktrace.la \ - ../libatomic/libatomic_convenience.la \ - $(LIBFFI) $(PTHREAD_LIBS) $(MATH_LIBS) $(NET_LIBS) + $(LIBATOMIC) $(LIBFFI) $(PTHREAD_LIBS) $(MATH_LIBS) $(NET_LIBS) libgobegin_a_SOURCES = \ runtime/go-main.c diff -r 9e7a28ffe425 libgo/configure.ac --- a/libgo/configure.ac Wed May 28 17:02:52 2014 -0700 +++ b/libgo/configure.ac Thu May 29 13:05:27 2014 -0700 @@ -122,6 +122,21 @@ AC_SUBST(LIBFFI) AC_SUBST(LIBFFIINCS) +# See if the user wants to configure without libatomic. This is useful if we are +# on an architecture for which libgo does not need an atomic support library and +# libatomic does not support our C compiler. +AC_ARG_WITH(libatomic, + AS_HELP_STRING([--without-libatomic], + [don't use libatomic]), + [:], + [with_libatomic=${with_libatomic_default-yes}]) + +LIBATOMIC= +if test $with_libatomic != no; then + LIBATOMIC=../libatomic/libatomic_convenience.la +fi +AC_SUBST(LIBATOMIC) + # Used to tell GNU make to include a file without telling automake to # include it. go_include=-include
Re: [DOC PATCH] Rewrite docs for inline asm
Fixed. Thanks! -- Eric Botcazou
Re: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb
On Wed, 28 May 2014, Richard Earnshaw wrote: Ah, light dawns (maybe). I guess the problems stem from the attempts to combine Neon with ARMv5. Neon shouldn't be used with anything prior to ARMv7, since that's the earliest version of the architecture that can support it. Good to know, thanks for the hint. Anyway it's the test case doing something silly or maybe just odd. After all IIUC ARMv5 code will run just fine on ARMv7/NEON hardware so mixing up ARMv5 scalar code with NEON vector code is nothing wrong per se. I guess that what is happening is that we see we have Neon, so start to generate a Neon-based copy sequence, but then notice that we don't have misaligned access (something that must exist if we have Neon) and generate VLDR instructions in a mistaken attempt to work around the first inconsistency. Maybe we should tie -mfpu=neon to having at least ARMv7 (though ARMv6 also has misaligned access support). So to move away from the odd mixture of instruction selection options just as a quick test I rebuilt the same file with `-march=armv7-a -mno-unaligned-access' and the result is the same, a pair of VLDR instructions accessing unaligned memory, i.e. the same problem. So based on observations made so far I think there are two sensible ways to move forward: 1. Fix GCC so that a manual byte-wise copy is made whenever `-mno-unaligned-access' is in effect. 2. Revert the change being discussed here as its lone purpose was to disable the use of VLD1.8, etc. where `-mno-unaligned-access' is in effect, and it does no good. Maciej
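For reference, the shape of code that trips this (a hypothetical reduction, not the actual gcc.target/arm/pr58041.c source): with -mno-unaligned-access the misaligned field below should be read with byte accesses (ldrb), never with a doubleword VLDR.

struct __attribute__ ((packed)) s
{
  char c;
  long long x;   /* 8-byte field at offset 1: always misaligned */
};

long long
get_x (struct s *p)
{
  return p->x;   /* should expand to ldrb-based copies, not vldr */
}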
run dsymutil post lto
Jack finally found the answer to a question I had back in 2010… Why, yes, one does have to arrange to run the post ld pass when lto runs but doesn’t have to relink. Committed revision 211067. Thanks Jack. PR debug/61352 * collect2.c (maybe_run_lto_and_relink): Be sure to always run post ld passes when lto is used. Index: collect2.c === --- collect2.c (revision 211062) +++ collect2.c (working copy) @@ -848,6 +848,8 @@ maybe_run_lto_and_relink (char **lto_ld_ fork_execute (ld, lto_ld_argv); post_ld_pass (false); } + else +post_ld_pass (true); } /* Main program. */