Re: [PATCH, i386] RTM support
The patch is OK for mainline, if there are no further comments in next 24h.

Thank you! According to Tobias's input, I've added a few lines about RTM to doc/invoke.texi. If there are no objections, I'll commit the patch tomorrow.

Updated patch attached.

Updated ChangeLog entry:

2012-03-11  Kirill Yukhin  kirill.yuk...@intel.com

	* doc/invoke.texi: Document -mrtm option.
	* common/config/i386/i386-common.c (OPTION_MASK_ISA_RTM_SET): New.
	(OPTION_MASK_ISA_RTM_UNSET): Ditto.
	(ix86_handle_option): Handle OPT_mrtm.
	* config.gcc (i[34567]86-*-*): Add rtmintrin.h and xtestintrin.h.
	(x86_64-*-*): Ditto.
	* i386-builtin-types.def (INT_FTYPE_VOID): New.
	* config/i386/i386-c.c (ix86_target_macros_internal): Define
	__RTM__ if needed.
	(ix86_target_string): Define -mrtm option.
	(PTA_RTM): New.
	(ix86_option_override_internal): Extend cirei7-avx with RTM option.
	Handle new option.
	(ix86_valid_target_attribute_inner_p): Add OPT_mrtm.
	(ix86_builtins): Add IX86_BUILTIN_XBEGIN, IX86_BUILTIN_XEND,
	IX86_BUILTIN_XTEST.
	(bdesc_special_args): Ditto.
	(ix86_init_mmx_sse_builtins): Add IX86_BUILTIN_XABORT.
	(ix86_expand_special_args_builtin): Handle new built-in type.
	(ix86_expand_builtin): Handle XABORT instruction.
	* config/i386/i386.h (TARGET_RTM): New.
	* config/i386/i386.md (UNSPECV_XBEGIN): New.
	(UNSPECV_XEND): Ditto.
	(UNSPECV_XABORT): Ditto.
	(UNSPECV_XTEST): Ditto.
	(xbegin): Ditto.
	(xbegin_1): Ditto.
	(xend): Ditto.
	(xabort): Ditto.
	(xtest): Ditto.
	(xtest_1): Ditto.
	* config/i386/i386.opt (mrtm): New.
	* config/i386/immintrin.h: Include rtmintrin.h and xtestintrin.h.
	* config/i386/rtmintrin.h: New header.
	* config/i386/xtestintrin.h: Ditto.

Thanks, K
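For readers who have not used the new headers, here is a minimal sketch of how the intrinsics behind IX86_BUILTIN_XBEGIN and IX86_BUILTIN_XEND might be called from user code. The helper name counter_add is hypothetical and not part of the patch. The transactional path is compiled only when __RTM__ is defined (that is, with -mrtm), and a binary built that way will fault on a CPU without RTM, so treat this as an illustration rather than production code.

```c
#include <stdbool.h>

#ifdef __RTM__
#include <immintrin.h>   /* pulls in rtmintrin.h and xtestintrin.h */
#endif

/* Hypothetical helper: add DELTA to *COUNTER, inside a hardware
   transaction when the compiler was invoked with -mrtm.  Returns true
   if the transactional path committed, false if the plain fallback
   path was used.  */
static bool
counter_add (long *counter, long delta)
{
#ifdef __RTM__
  unsigned int status = _xbegin ();
  if (status == _XBEGIN_STARTED)
    {
      *counter += delta;
      _xend ();
      return true;
    }
  /* The transaction aborted: its memory effects were rolled back and
     execution resumed here with a status other than _XBEGIN_STARTED.  */
#endif
  /* Fallback path.  A real program would take a lock here and have the
     transactional path check that lock; both are omitted in this sketch.  */
  *counter += delta;
  return false;
}
```

Whichever path runs, the counter is updated exactly once, because an aborted transaction discards its own store before the fallback executes.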
Re: PATCH: Check Pmode in lwp_slwpcb
On Sat, Mar 10, 2012 at 8:13 PM, H.J. Lu hongjiu...@intel.com wrote: Pmode may be SImode for TARGET_64BIT. This patch checks Pmode instead of TARGET_64BIT in lwp_slwpcb. Tested on Linux/x86-64. OK for trunk? 2012-03-02 H.J. Lu hongjiu...@intel.com * config/i386/i386.md (lwp_slwpcb): Check Pmode instead of TARGET_64BIT. OK. Thanks, Uros.
Re: PATCH: Use Pmode on x86_64 this parameter
On Sun, Mar 11, 2012 at 2:11 AM, H.J. Lu hongjiu...@intel.com wrote: This patch replaces DImode with Pmode on x86_64 this parameter. OK for trunk? 2012-03-10 H.J. Lu hongjiu...@intel.com * config/i386/i386.c (x86_this_parameter): Replace DImode with Pmode. OK. Thanks, Uros.
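The distinction these x32 patches rely on can be sketched with a toy model. None of the names below are GCC's real definitions; toy_mode, toy_target, toy_word_mode and toy_Pmode are hypothetical stand-ins, and the model assumes the configuration discussed in these threads, where x32 runs the 64-bit ISA with 32-bit pointers.

```c
#include <stdbool.h>

/* Toy stand-ins for machine modes; the values are the widths in bits.  */
enum toy_mode { TOY_SImode = 32, TOY_DImode = 64 };

struct toy_target
{
  bool is_64bit;   /* models TARGET_64BIT */
  bool is_x32;     /* models TARGET_X32: 64-bit ISA, 32-bit pointers */
};

/* The natural register width follows the ISA.  */
static enum toy_mode
toy_word_mode (const struct toy_target *t)
{
  return t->is_64bit ? TOY_DImode : TOY_SImode;
}

/* The address mode follows the pointer size (assumption: x32 uses a
   32-bit Pmode, as in the configuration these patches target).  */
static enum toy_mode
toy_Pmode (const struct toy_target *t)
{
  return (t->is_64bit && !t->is_x32) ? TOY_DImode : TOY_SImode;
}
```

In this model an x32 target has a 64-bit word mode but a 32-bit Pmode, which is why code that tested TARGET_64BIT and then assumed DImode addresses needed to be revisited.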
Re: [Patch ARM/ configury] Add fall-back check for gnu_unique_object
On 10 March 2012 00:39, DJ Delorie d...@redhat.com wrote:

Ping - http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00549.html

And now really add Paolo and DJ.

+ [.type foo, '$target_type_format_char'gnu_unique_object],,

This un-quoting looks incorrect if you don't know what's going on under the hood, but I don't see a clean way around it. A suitable comment would be appropriate.

Thanks for the quick review - I thought however it was kind of a standard workaround for this issue having seen this elsewhere in the same file - given this is used in multiple places/

+target_type_format_char=@
+ target_type_format_char=%

Since the string always has special characters, it's likely that single quotes are more appropriate here. The two characters in the patch don't care, but some future porter might naively do $ and wonder why (or worse, not wonder why) it doesn't work right.

Fair point - done.

Other than that it looks OK to me, assuming you tested it on all the relevant targets (i.e. arm and not-arm).

I tested x86_64-linux-gnu with a bootstrap and that showed identical auto-host.h to the previous run and thus that appeared to be fine. (This is a target that uses the default '@'.) On ARM I've done a full bootstrap and auto-host.h shows the appropriate macro defined (and it does with this version of the patch as well). Are there any other targets you'd suggest? Assuming all tests pass, is this version better?

cheers
Ramana

2012-03-11  Ramana Radhakrishnan  ramana.radhakrish...@linaro.org

	* config.gcc (target_type_format_char): New. Document it. Set it for arm*-*-*.
	* configure.ac (gnu_unique_option): Use target_type_format_char in test. Comment rationale.
	* configure: Regenerate.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 99f0b47..a769d0c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -184,6 +184,11 @@
 #			the --with-sysroot configure option or the
 #			--sysroot command line option is used this
 #			will be relative to the sysroot.
+# target_type_format_char
+#			The default character to be used for formatting
+#			the attribute in a
+#			.type symbol_name, ${t_t_f_c}property
+#			directive.
 
 # The following variables are used in each case-construct to build up the
 # outgoing variables:
@@ -235,6 +240,7 @@ target_gtfiles=
 need_64bit_hwint=
 need_64bit_isa=
 native_system_header_dir=/usr/include
+target_type_format_char='@'
 
 # Don't carry these over build-host-target. Please.
 xm_file=
@@ -321,6 +327,7 @@ am33_2.0-*-linux*)
 arm*-*-*)
 	cpu_type=arm
 	extra_headers="mmintrin.h arm_neon.h"
+	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
 	extra_options="${extra_options} arm/arm-tables.opt"
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 39302ad..4a534a1 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4188,7 +4188,9 @@ Valid choices are 'yes' and 'no'.]) ;;
 esac],
 [gcc_GAS_CHECK_FEATURE([gnu_unique_object], gcc_cv_as_gnu_unique_object,
    [elf,2,19,52],,
-   [.type foo, @gnu_unique_object],,
+#We have to unquote here to reuse the variable from
+#config.gcc.
+   [.type foo, '$target_type_format_char'gnu_unique_object],,
 # Also check for ld.so support, i.e. glibc 2.11 or higher.
     [[if test x$host = x$build -a x$host = x$target &&
        ldd --version 2>/dev/null &&
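The effect of the new variable can be sketched in C. This is only an illustration of the selection logic, not code from the patch: type_format_char and format_type_directive are hypothetical names, and matching on an "arm" prefix is a simplification of the arm*-*-* case in config.gcc. The underlying reason for the special case is that '@' begins a comment in ARM assembly syntax, so the ARM assembler expects '%' in .type directives.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: pick the character that introduces the symbol
   type in a .type directive.  '@' is the default; targets whose triplet
   starts with "arm" use '%' instead.  */
static char
type_format_char (const char *target)
{
  return strncmp (target, "arm", 3) == 0 ? '%' : '@';
}

/* Hypothetical helper: write ".type SYMBOL, <char>PROPERTY" into BUF and
   return the number of characters snprintf reports.  */
static int
format_type_directive (char *buf, size_t size, const char *target,
                       const char *symbol, const char *property)
{
  return snprintf (buf, size, ".type %s, %c%s", symbol,
                   type_format_char (target), property);
}
```

With these helpers, an x86_64-linux-gnu target yields ".type foo, @gnu_unique_object" and an arm-linux-gnueabi target yields ".type foo, %gnu_unique_object", which is the line the configure test feeds to the assembler.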
Re: [SH] PR 51244 - Improve conditional branches
On Thu, 2012-03-08 at 09:31 +0100, Oleg Endo wrote: This is the patch for the patch, as attached in the PR. Tested against rev 184966 as before and no changes in the test results for me (i.e. no new failures). This one had a bug, as discussed in the PR. I've tested the attached latest version of the patch (same as in the PR) against rev 185160 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a-single/-mb, -m4-single/-ml,-m4-single/-mb, -m4a-single/-ml,-m4a-single/-mb} once more and confirmed that there are no new failures. The new failure '21_strings/basic_string/cons/char/6.cc' mentioned in the PR is failing due to the test program allocating too much memory for the simulator. It aborts with a 'heap and stack collision'. Chaning the number of test iterations in the test case from '13' to '12' makes it pass again. OK to commit the patch? Cheers, Oleg ChangeLog: PR target/51244 * config/sh/sh.md (movnegt): Expand into respective insns immediately. Use movrt_negc instead of negc pattern for non-SH2A. (*movnegt): Remove. (*movrt_negc, *negnegt, *movtt, *movt_qi): New insns and splits. testsuite/ChangeLog: PR target/51244 * gcc.target/sh/pr51244-1.c: Fix thinkos. Index: gcc/testsuite/gcc.target/sh/pr51244-1.c === --- gcc/testsuite/gcc.target/sh/pr51244-1.c (revision 184966) +++ gcc/testsuite/gcc.target/sh/pr51244-1.c (working copy) @@ -13,20 +13,20 @@ } int -testfunc_01 (int a, char* p, int b, int c) +testfunc_01 (int a, int b, int c, int d) { - return (a == b a == c) ? b : c; + return (a == b || a == d) ? b : c; } int -testfunc_02 (int a, char* p, int b, int c) +testfunc_02 (int a, int b, int c, int d) { - return (a == b a == c) ? b : c; + return (a == b a == d) ? b : c; } int -testfunc_03 (int a, char* p, int b, int c) +testfunc_03 (int a, int b, int c, int d) { - return (a != b a != c) ? b : c; + return (a != b a != d) ? 
b : c; } Index: gcc/config/sh/sh.md === --- gcc/config/sh/sh.md (revision 184966) +++ gcc/config/sh/sh.md (working copy) @@ -9679,39 +9679,90 @@ ;; If the constant -1 can be CSE-ed or lifted out of a loop it effectively ;; becomes a one instruction operation. Moreover, care must be taken that ;; the insn can still be combined with inverted compare and branch code -;; around it. -;; The expander will reserve the constant -1, the insn makes the whole thing -;; combinable, the splitter finally emits the insn if it was not combined -;; away. -;; Notice that when using the negc variant the T bit also gets inverted. +;; around it. On the other hand, if a function returns the complement of +;; a previous comparison result in the T bit, the xor #1,r0 approach might +;; lead to better code. (define_expand movnegt - [(set (match_dup 1) (const_int -1)) - (parallel [(set (match_operand:SI 0 arith_reg_dest ) - (xor:SI (reg:SI T_REG) (const_int 1))) - (use (match_dup 1))])] + [(set (match_operand:SI 0 arith_reg_dest ) + (xor:SI (reg:SI T_REG) (const_int 1)))] { - operands[1] = gen_reg_rtx (SImode); + if (TARGET_SH2A) +emit_insn (gen_movrt (operands[0])); + else +{ + rtx val = force_reg (SImode, gen_int_mode (-1, SImode)); + emit_insn (gen_movrt_negc (operands[0], val)); +} + DONE; }) -(define_insn_and_split *movnegt +(define_insn movrt_negc [(set (match_operand:SI 0 arith_reg_dest =r) (xor:SI (reg:SI T_REG) (const_int 1))) + (set (reg:SI T_REG) (const_int 1)) (use (match_operand:SI 1 arith_reg_operand r))] TARGET_SH1 + negc %1,%0 + [(set_attr type arith)]) + +;; The *negnegt patterns help the combine pass to figure out how to fold +;; an explicit double T bit negation. +(define_insn_and_split *negnegt + [(set (reg:SI T_REG) + (eq:SI (subreg:QI (xor:SI (reg:SI T_REG) (const_int 1)) 3) +(const_int 0)))] + ! 
TARGET_LITTLE_ENDIAN # - 1 - [(const_int 0)] -{ - if (TARGET_SH2A) -emit_insn (gen_movrt (operands[0])); - else -emit_insn (gen_negc (operands[0], operands[1])); - DONE; -} + + [(const_int 0)]) + +(define_insn_and_split *negnegt + [(set (reg:SI T_REG) + (eq:SI (subreg:QI (xor:SI (reg:SI T_REG) (const_int 1)) 0) +(const_int 0)))] + TARGET_LITTLE_ENDIAN + # + + [(const_int 0)]) + +;; The *movtt patterns improve code at -O1. +(define_insn_and_split *movtt + [(set (reg:SI T_REG) + (eq:SI (zero_extend:SI (subreg:QI (reg:SI T_REG) 3)) +(const_int 1)))] + ! TARGET_LITTLE_ENDIAN + # + + [(const_int 0)]) + +(define_insn_and_split *movtt + [(set (reg:SI T_REG) + (eq:SI (zero_extend:SI (subreg:QI (reg:SI T_REG) 0)) +(const_int 1)))] + TARGET_LITTLE_ENDIAN + # + + [(const_int 0)]) + +;; The *movt_qi patterns help the combine pass convert a movrt_negc pattern +;; into a movt Rn, xor #1 Rn pattern. This can happen
Re: [SH] PR 51244 - Improve conditional branches
Oleg Endo oleg.e...@t-online.de wrote:

This one had a bug, as discussed in the PR. I've tested the attached latest version of the patch (same as in the PR) against rev 185160 with

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a-single/-mb,-m4-single/-ml,-m4-single/-mb,-m4a-single/-ml,-m4a-single/-mb}"

once more and confirmed that there are no new failures. The new failure '21_strings/basic_string/cons/char/6.cc' mentioned in the PR is failing due to the test program allocating too much memory for the simulator. It aborts with a 'heap and stack collision'. Changing the number of test iterations in the test case from '13' to '12' makes it pass again.

OK to commit the patch?

OK.

Regards,
kaz
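The movrt_negc pattern in the patch relies on a small arithmetic identity that is easy to check in C. The sketch below models the SH instruction "negc Rm,Rn" as Rn = 0 - Rm - T with the borrow written back to T; the struct and function names are hypothetical. With Rm holding the constant -1, the result is 1 - T, which equals T xor 1, and the borrow is always set, matching the (set (reg:SI T_REG) (const_int 1)) in the pattern.

```c
#include <stdint.h>

/* Result of the modelled instruction: the destination register value
   and the new T bit.  */
struct negc_result
{
  uint32_t value;
  unsigned t;
};

/* Model of SH "negc Rm,Rn": Rn = 0 - Rm - T, T = borrow.  */
static struct negc_result
negc (uint32_t rm, unsigned t)
{
  /* Do the subtraction in 64 bits so the borrow is easy to detect.  */
  uint64_t subtrahend = (uint64_t) rm + (t & 1);
  struct negc_result r;
  r.value = (uint32_t) (0 - subtrahend);
  r.t = subtrahend != 0;   /* a borrow occurs whenever Rm + T > 0 */
  return r;
}
```

Calling negc with rm = 0xFFFFFFFF gives value 1 for T = 0 and value 0 for T = 1, so the destination receives the inverted T bit in a single instruction once the -1 is already in a register.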
Re: 4.4 branch frozen
On Tue, 6 Mar 2012, Jakub Jelinek wrote:

The 4.4 branch is now frozen, all commits require RM approval. There will be the 4.4.7 release next week released from it and after that the branch will be closed.

Cool. At that point I suggest removing GCC 4.4 from the Release Series and Status; let me know, and I'll be happy to help with any web page updates.

Gerald
Re: [PATCH] [SH] Fix target/48596
On Tue, 2012-03-06 at 08:24 +0900, Kaz Kojima wrote:
Oleg Endo oleg.e...@t-online.de wrote:

I'd like to add the test case from the PR to the testsuite. Tested with

make check-gcc RUNTESTFLAGS="sh.exp=pr48596.c --target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a-single/-mb,-m4-single/-ml,-m4-single/-mb,-m4a-single/-ml,-m4a-single/-mb}"

OK?

A gcc.c-torture/compile test is better, isn't it?

I just noticed that I've accidentally added the pr48596.c to gcc.target/sh in another commit. I'm sorry about that. The attached patch moves it as suggested to gcc.c-torture/compile. Briefly tested by running the gcc.c-torture/compile set on sh-sim -m4a-single -ml.

testsuite/ChangeLog:

	PR target/48596
	* gcc.target/sh/pr48596.c: Move accidentally added new test case to ...
	* gcc.c-torture/compile/pr48596.c: ... here.

Index: gcc/testsuite/gcc.target/sh/pr48596.c
===================================================================
--- gcc/testsuite/gcc.target/sh/pr48596.c	(revision 185191)
+++ gcc/testsuite/gcc.target/sh/pr48596.c	(working copy)
@@ -1,31 +0,0 @@
-/* Check that the following code compiles without errors. */
-/* { dg-do compile { target sh*-*-* } } */
-/* { dg-options "-O1" } */
-
-enum { nrrdCenterUnknown, nrrdCenterNode, nrrdCenterCell, nrrdCenterLast };
-typedef struct { int size; int center; } NrrdAxis;
-typedef struct { int dim; NrrdAxis axis[10]; } Nrrd;
-typedef struct { } NrrdKernel;
-typedef struct { const NrrdKernel *kernel[10]; int samples[10]; } Info;
-
-void
-foo (Nrrd *nout, Nrrd *nin, const NrrdKernel *kernel, const double *parm,
-     const int *samples, const double *scalings)
-{
-  Info *info;
-  int d, p, np, center;
-  for (d=0; d<nin->dim; d++)
-    {
-      info->kernel[d] = kernel;
-      if (samples)
-        info->samples[d] = samples[d];
-      else
-        {
-          center = _nrrdCenter(nin->axis[d].center);
-          if (nrrdCenterCell == center)
-            info->samples[d] = nin->axis[d].size*scalings[d];
-          else
-            info->samples[d] = (nin->axis[d].size - 1)*scalings[d] + 1;
-        }
-    }
-}
Index: gcc/testsuite/gcc.c-torture/compile/pr48596.c
===================================================================
--- gcc/testsuite/gcc.c-torture/compile/pr48596.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/pr48596.c	(revision 0)
@@ -0,0 +1,31 @@
+/* PR target/48596 */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+enum { nrrdCenterUnknown, nrrdCenterNode, nrrdCenterCell, nrrdCenterLast };
+typedef struct { int size; int center; } NrrdAxis;
+typedef struct { int dim; NrrdAxis axis[10]; } Nrrd;
+typedef struct { } NrrdKernel;
+typedef struct { const NrrdKernel *kernel[10]; int samples[10]; } Info;
+
+void
+foo (Nrrd *nout, Nrrd *nin, const NrrdKernel *kernel, const double *parm,
+     const int *samples, const double *scalings)
+{
+  Info *info;
+  int d, p, np, center;
+  for (d=0; d<nin->dim; d++)
+    {
+      info->kernel[d] = kernel;
+      if (samples)
+        info->samples[d] = samples[d];
+      else
+        {
+          center = _nrrdCenter(nin->axis[d].center);
+          if (nrrdCenterCell == center)
+            info->samples[d] = nin->axis[d].size*scalings[d];
+          else
+            info->samples[d] = (nin->axis[d].size - 1)*scalings[d] + 1;
+        }
+    }
+}
Re: PATCH: Properly check mode for x86 call/jmp address
On Sat, Mar 10, 2012 at 5:05 PM, H.J. Lu hjl.to...@gmail.com wrote: (define_insn *call - [(call (mem:QI (match_operand:P 0 call_insn_operand czw)) + [(call (mem:QI (match_operand:C 0 call_insn_operand czw)) (match_operand 1 ))] - !SIBLING_CALL_P (insn) + !SIBLING_CALL_P (insn) + (GET_CODE (operands[0]) == SYMBOL_REF + || GET_MODE (operands[0]) == word_mode) There are enough copies of this extra constraint that I wonder if it simply ought to be folded into call_insn_operand. Which would need to be changed to define_special_predicate, since you'd be doing your own mode checking. Probably similar changes to sibcall_insn_operand. Here is the updated patch. I changed constant_call_address_operand and call_register_no_elim_operand to use define_special_predicate. OK for trunk? Please do not complicate matters that much. Just stick word_mode overrides for register operands in predicates.md, like in attached patch. These changed predicates now allow registers only in word_mode (and VOIDmode). You can now remove all new mode iterators and leave call patterns untouched. @@ -22940,14 +22940,18 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1, GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF !local_symbolic_operand (XEXP (fnaddr, 0), VOIDmode)) fnaddr = gen_rtx_MEM (QImode, construct_plt_address (XEXP (fnaddr, 0))); - else if (sibcall - ? 
!sibcall_insn_operand (XEXP (fnaddr, 0), Pmode) - : !call_insn_operand (XEXP (fnaddr, 0), Pmode)) + else if (!(constant_call_address_operand (XEXP (fnaddr, 0), Pmode) + || call_register_no_elim_operand (XEXP (fnaddr, 0), + word_mode) + || (!sibcall + !TARGET_X32 + memory_operand (XEXP (fnaddr, 0), word_mode { fnaddr = XEXP (fnaddr, 0); - if (GET_MODE (fnaddr) != Pmode) - fnaddr = convert_to_mode (Pmode, fnaddr, 1); - fnaddr = gen_rtx_MEM (QImode, copy_to_mode_reg (Pmode, fnaddr)); + if (GET_MODE (fnaddr) != word_mode) + fnaddr = convert_to_mode (word_mode, fnaddr, 1); + fnaddr = gen_rtx_MEM (QImode, + copy_to_mode_reg (word_mode, fnaddr)); } vec_len = 0; Please update the above part. It looks you don't even have to change condition with new predicates. Basically, you should only convert the address to word_mode instead of Pmode. + if (TARGET_X32) + operands[0] = convert_memory_address (word_mode, operands[0]); This addition to indirect_jump and tablejump should be the only change, needed in i386.md now. Please write the condition if (Pmode != word_mode) for consistency. BTW: The attached patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}. Uros. It doesn't work: x.i:7:1: error: unrecognizable insn: (call_insn/j 8 7 9 3 (call (mem:QI (reg:DI 62) [0 *foo.0_1 S1 A8]) (const_int 0 [0])) x.i:6 -1 (nil) (nil)) x.i:7:1: internal compiler error: in extract_insn, at recog.c:2123 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. make: *** [x.s] Error 1 I will investigate it. For reference, attached is the complete patch that uses define_special_predicate. This patch works OK with the current mainline, with additional patch to i386.h, where Index: i386.h === --- i386.h (revision 185079) +++ i386.h (working copy) @@ -1744,7 +1744,7 @@ /* Specify the machine mode that pointers have. 
After generation of rtl, the compiler makes no further distinction between pointers and any other objects of this machine mode. */ -#define Pmode (TARGET_64BIT ? DImode : SImode) +#define Pmode (TARGET_LP64 ? DImode : SImode) /* A C expression whose value is zero if pointers that need to be extended from being `POINTER_SIZE' bits wide to `Pmode' are sign-extended and I tested this patch and it passed all my x32 tests. Committed to mainline with following ChangeLog: 2012-03-11 H.J. Lu hongjiu...@intel.com Uros Bizjak ubiz...@gmail.com * config/i386/predicates.md (call_insn_operand): Allow constant_call_address_operand in Pmode only. (sibcall_insn_operand): Ditto. * config/i386/i386.md (*call): Use W mode iterator instead of P mode. (*call_vzeroupper): Ditto. (*sibcall): Ditto. (*sibcall_vzeroupper): Ditto. (*call_value): Ditto. (*call_value_vzeroupper): Ditto. (*sibcall_value): Ditto. (*sibcall_value_vzeroupper): Ditto. (*indirect_jump): Ditto. (*tablejump_1): Ditto. (indirect_jump): Convert memory address to word mode for TARGET_X32. (tablejump): Ditto. * config/i386/i386.c (ix86_expand_call): Convert indirect operands
[patch, RFA] -no-integrated-cpp documentation
While I've been cleaning up invoke.texi I noticed that the blurb about -no-integrated-cpp needed some copy-editing and markup changes. Then I noticed that the description didn't make a whole lot of sense, and that it talked about what might happen in the hypothetical case that cc1/cc1plus/cc1obj are merged, which I think only further confused things. And, I further noticed that this option was documented with the C Dialect Options instead of the Preprocessor Options, which is where users might be most likely to look for it. I dug up the original discussion that led to this option being added back in 2003 -- it's here: http://gcc.gnu.org/ml/gcc/2002-12/msg01163.html Based on that and reading the code, I've tried to rewrite the documentation so it makes more sense. Did I get this right? If I'm understanding the intended purpose of this option correctly, it sounds like a really convoluted hack and maybe not what the manual ought to recommend. (If you really want to do stuff with the preprocessed code before compiling it, why not just write a makefile rule or a shell script to use as your $(CC)?) But, I think we have a gazillion other useless options too, and it's probably more trouble to remove than it's worth Anyway, I'd appreciate another pair of eyes looking at this, and suggestions on what better to do here if this rewrite isn't adequate. -Sandra 2012-03-11 Sandra Loosemore san...@codesourcery.com gcc/ * doc/invoke.texi (Option Summary): Move -no-integrated-cpp from C Language Options to Preprocessor Options. (C Dialect Options): Move -no-integrated-cpp documentation from here... (Preprocessor Options): ...to here. Rewrite the description so it makes more sense, and remove discussion of merging front ends. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 185168) +++ gcc/doc/invoke.texi (working copy) @@ -174,7 +174,7 @@ in the following sections. 
-aux-info @var{filename} -fallow-parameterless-variadic-functions @gol -fno-asm -fno-builtin -fno-builtin-@var{function} @gol -fhosted -ffreestanding -fopenmp -fms-extensions -fplan9-extensions @gol --trigraphs -no-integrated-cpp -traditional -traditional-cpp @gol +-trigraphs -traditional -traditional-cpp @gol -fallow-single-precision -fcond-mismatch -flax-vector-conversions @gol -fsigned-bitfields -fsigned-char @gol -funsigned-bitfields -funsigned-char} @@ -433,7 +433,7 @@ Objective-C and Objective-C++ Dialects}. -M -MM -MF -MG -MP -MQ -MT -nostdinc @gol -P -fdebug-cpp -ftrack-macro-expansion -fworking-directory @gol -remap -trigraphs -undef -U@var{macro} @gol --Wp,@var{option} -Xpreprocessor @var{option}} +-Wp,@var{option} -Xpreprocessor @var{option} -no-integrated-cpp} @item Assembler Option @xref{Assembler Options,,Passing Options to the Assembler}. @@ -1794,17 +1794,6 @@ supported for C, not C++. Support ISO C trigraphs. The @option{-ansi} option (and @option{-std} options for strict ISO C conformance) implies @option{-trigraphs}. -@item -no-integrated-cpp -@opindex no-integrated-cpp -Performs a compilation in two passes: preprocessing and compiling. This -option allows a user supplied cc1, cc1plus, or cc1obj via the -@option{-B} option. The user supplied compilation step can then add in -an additional preprocessing step after normal preprocessing but before -compiling. The default is to use the integrated cpp (internal cpp) - -The semantics of this option will change if cc1, cc1plus, and -cc1obj are merged. - @cindex traditional C language @cindex C language, traditional @item -traditional @@ -9300,6 +9289,21 @@ recognize. If you want to pass an option that takes an argument, you must use @option{-Xpreprocessor} twice, once for the option and once for the argument. + +@item -no-integrated-cpp +@opindex no-integrated-cpp +Perform preprocessing as a separate pass before compilation. 
+By default, GCC performs preprocessing as an integrated part of +input tokenization and parsing. +If this option is provided, the appropriate language front end +(@command{cc1}, @command{cc1plus}, or @command{cc1obj} for C, C++, +and Objective-C, respectively) is instead invoked twice, +once for preprocessing only and once for actual compilation +of the preprocessed input. +This option may be useful in conjunction with the @option{-B} or +@option{-wrapper} options to specify an alternate preprocessor or +perform additional processing of the program source between +normal preprocessing and compilation. @end table @include cppopts.texi
Re: [PATCH 07/10] addr32: Use word_mode instead of Pmode in loop expand
On Sun, Mar 11, 2012 at 2:06 AM, H.J. Lu hjl.to...@gmail.com wrote:
On Thu, Mar 8, 2012 at 3:22 AM, Uros Bizjak ubiz...@gmail.com wrote:
On Fri, Mar 2, 2012 at 10:02 PM, H.J. Lu hongjiu...@intel.com wrote:

This patch uses word_mode instead of Pmode in loop expand since word_mode may have bigger size than Pmode. OK for trunk?

Thanks.

H.J.
---
2012-03-02  H.J. Lu  hongjiu...@intel.com

	* config/i386/i386.c (ix86_expand_movmem): Use word_mode instead of Pmode on loop.
	(ix86_expand_setmem): Likewise.

Jan, can you please comment on the changes in this patch?

Here is a complete updated patch to use word_mode in ix86_expand_movmem and ix86_expand_setmem. It also fixes ix86_zero_extend_to_Pmode to handle Pmode != DImode. OK for trunk?

Please rewrite ix86_zero_extend_to_Pmode to something like:

  rtx tmp;

  if (GET_MODE (exp) != Pmode)
    tmp = convert_to_mode (Pmode, exp, 1);

  return force_reg (Pmode, tmp);

Uros.
Re: [patch, RFA] -no-integrated-cpp documentation
On Sun, 11 Mar 2012, Sandra Loosemore wrote: Anyway, I'd appreciate another pair of eyes looking at this, and suggestions on what better to do here if this rewrite isn't adequate. Looks good to me, but better wait for Joseph's take. Gerald 2012-03-11 Sandra Loosemore san...@codesourcery.com gcc/ * doc/invoke.texi (Option Summary): Move -no-integrated-cpp from C Language Options to Preprocessor Options. (C Dialect Options): Move -no-integrated-cpp documentation from here... (Preprocessor Options): ...to here. Rewrite the description so it makes more sense, and remove discussion of merging front ends.Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 185168) +++ gcc/doc/invoke.texi (working copy) @@ -174,7 +174,7 @@ in the following sections. -aux-info @var{filename} -fallow-parameterless-variadic-functions @gol -fno-asm -fno-builtin -fno-builtin-@var{function} @gol -fhosted -ffreestanding -fopenmp -fms-extensions -fplan9-extensions @gol --trigraphs -no-integrated-cpp -traditional -traditional-cpp @gol +-trigraphs -traditional -traditional-cpp @gol -fallow-single-precision -fcond-mismatch -flax-vector-conversions @gol -fsigned-bitfields -fsigned-char @gol -funsigned-bitfields -funsigned-char} @@ -433,7 +433,7 @@ Objective-C and Objective-C++ Dialects}. -M -MM -MF -MG -MP -MQ -MT -nostdinc @gol -P -fdebug-cpp -ftrack-macro-expansion -fworking-directory @gol -remap -trigraphs -undef -U@var{macro} @gol --Wp,@var{option} -Xpreprocessor @var{option}} +-Wp,@var{option} -Xpreprocessor @var{option} -no-integrated-cpp} @item Assembler Option @xref{Assembler Options,,Passing Options to the Assembler}. @@ -1794,17 +1794,6 @@ supported for C, not C++. Support ISO C trigraphs. The @option{-ansi} option (and @option{-std} options for strict ISO C conformance) implies @option{-trigraphs}. -@item -no-integrated-cpp -@opindex no-integrated-cpp -Performs a compilation in two passes: preprocessing and compiling. 
This -option allows a user supplied cc1, cc1plus, or cc1obj via the -@option{-B} option. The user supplied compilation step can then add in -an additional preprocessing step after normal preprocessing but before -compiling. The default is to use the integrated cpp (internal cpp) - -The semantics of this option will change if cc1, cc1plus, and -cc1obj are merged. - @cindex traditional C language @cindex C language, traditional @item -traditional @@ -9300,6 +9289,21 @@ recognize. If you want to pass an option that takes an argument, you must use @option{-Xpreprocessor} twice, once for the option and once for the argument. + +@item -no-integrated-cpp +@opindex no-integrated-cpp +Perform preprocessing as a separate pass before compilation. +By default, GCC performs preprocessing as an integrated part of +input tokenization and parsing. +If this option is provided, the appropriate language front end +(@command{cc1}, @command{cc1plus}, or @command{cc1obj} for C, C++, +and Objective-C, respectively) is instead invoked twice, +once for preprocessing only and once for actual compilation +of the preprocessed input. +This option may be useful in conjunction with the @option{-B} or +@option{-wrapper} options to specify an alternate preprocessor or +perform additional processing of the program source between +normal preprocessing and compilation. @end table @include cppopts.texi
Re: PATCH: Check ptr_mode and use Pmode in ix86_trampoline_init
On Sun, Mar 11, 2012 at 2:18 AM, H.J. Lu hongjiu...@intel.com wrote: Hi, x86 trampoline depends on ptr_mode. This patch checks ptr_mode, instead of TARGET_X32. Also we should use Pmode for address mode. Tested on Linux/x86-64. OK for trunk? Why we are looking at ptr_mode here? Uros.
New Swedish PO file for 'gcc' (version 4.7-b20120128)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the Swedish team of translators. The file is available at: http://translationproject.org/latest/gcc/sv.po (This file, 'gcc-4.7-b20120128.sv.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: http://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: http://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator. coordina...@translationproject.org
Re: [PATCH 02/10] addr32: Only handle zero-extended DImode addresses
On Fri, Mar 9, 2012 at 10:15 AM, Uros Bizjak ubiz...@gmail.com wrote:
On Fri, Mar 9, 2012 at 4:26 PM, H.J. Lu hjl.to...@gmail.com wrote:
On Thu, Mar 8, 2012 at 7:20 AM, Uros Bizjak ubiz...@gmail.com wrote:
On Sun, Mar 4, 2012 at 9:13 PM, Uros Bizjak ubiz...@gmail.com wrote:

We only need to handle zero-extended addresses in DImode. OK for trunk?

2012-03-02  H.J. Lu  hongjiu...@intel.com

	* config/i386/i386.c (ix86_print_operand_address): Only handle zero-extended DImode addresses.

OK.

The patch was reverted due to PR target/52530.

Revert breaks Pmode == SImode for x32. Here is a different patch. It checks Pmode == DImode instead of TARGET_64BIT. Tested on Linux/x32. OK for trunk?

This will still emit e.g. leal 1(%rSImode), %rSImode on Pmode == SImode targets, so you win nothing really.

Attached patch finally decouples LEA operand handling from generic address handling, and by introducing the %E operand modifier, we are able to always emit DImode registers for LEAs (which is good anyway to avoid unnecessary addr32 prefixes). Luckily, the leal 1(%rSImode), %rSImode triggered some unknown problem with the Sun assembler, so we were able to detect the problem.

I would like to point out that the patched compiler now also emits address registers in their natural mode (modulo zero-extended RTXes) and fixes the following failure on Pmode == SImode targets:

--cut here--
struct foo
{
  int *f;
  int i;
};

void __attribute__ ((noinline))
bar (struct foo x)
{
  *(x.f) = 1;
}
--cut here--

For Pmode == SImode, the compiler emitted a (%rdi) address, which was wrong, since i was passed in the high part of the %rdi register.

2012-03-09  Uros Bizjak  ubiz...@gmail.com

	PR target/52530
	* config/i386/i386.c (ix86_print_operand): Handle 'E' operand modifier.
	(ix86_print_operand_address): Handle UNSPEC_LEA_ADDR. Do not fall back to set code to 'q'.
	* config/i386/i386.md (UNSPEC_LEA_ADDR): New unspec.
	(*movdi_internal_rex64): Use %E operand modifier for lea.
	(*movsi_internal): Ditto.
	(*lea_1): Ditto.
	(*lea<mode>_2): Ditto.
	(*lea_{3,4,5,6}_zext): Ditto.
	(*tls_global_dynamic_32_gnu): Ditto.
	(*tls_global_dynamic_64): Ditto.
	(*tls_dynamic_gnu2_lea_32): Ditto.
	(*tls_dynamic_gnu2_lea_64): Ditto.
	(pro_epilogue_adjust_stack_<mode>_add): Ditto.

Patch was tested on x86_64-pc-linux-gnu {,-m32}. I have also eyeballed x32 code (Pmode == SImode) and found no problems. Committed to mainline SVN.

H.J., can you please construct a runtime test from the above example code?

Uros.

It passed all my x32 tests. Thanks.

-- H.J.
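One possible shape for the requested runtime test is sketched below. This is an assumption about what such a test could look like, not the test that was actually written for the testsuite; the driver name run_bar_test is hypothetical. The idea is to put a non-zero value in the i member so that, on a target where the whole struct travels in one 64-bit register, the upper half of that register is not zero when bar dereferences x.f.

```c
struct foo
{
  int *f;
  int i;
};

void __attribute__ ((noinline))
bar (struct foo x)
{
  *(x.f) = 1;
}

/* Hypothetical driver: returns the value bar stored through x.f.
   With 32-bit pointers the struct is 8 bytes, and -1 in the i member
   fills the upper 32 bits with ones, so a compiler that wrongly used
   the full 64-bit register as the address would fault or write to the
   wrong place.  */
static int
run_bar_test (void)
{
  int target = 0;
  struct foo x = { &target, -1 };
  bar (x);
  return target;
}
```

A testsuite version would typically call abort when run_bar_test returns anything other than 1.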
Re: [PATCH 07/10] addr32: Use word_mode instead of Pmode in loop expand
On Sun, Mar 11, 2012 at 3:30 PM, Uros Bizjak ubiz...@gmail.com wrote:

This patch uses word_mode instead of Pmode in loop expand since word_mode may have bigger size than Pmode. OK for trunk?

Thanks.

H.J.
---
2012-03-02  H.J. Lu  hongjiu...@intel.com

	* config/i386/i386.c (ix86_expand_movmem): Use word_mode instead of Pmode on loop.
	(ix86_expand_setmem): Likewise.

Jan, can you please comment on the changes in this patch?

Here is a complete updated patch to use word_mode in ix86_expand_movmem and ix86_expand_setmem. It also fixes ix86_zero_extend_to_Pmode to handle Pmode != DImode. OK for trunk?

Please rewrite ix86_zero_extend_to_Pmode to something like:

  rtx tmp;

  if (GET_MODE (exp) != Pmode)
    tmp = convert_to_mode (Pmode, exp, 1);

  return force_reg (Pmode, tmp);

I am testing attached patch:

2012-03-11  Uros Bizjak  ubiz...@gmail.com

	* config/i386/i386.c (ix86_zero_extend_to_Pmode): Rewrite using convert_to_mode.
	(ix86_expand_call): Use force_reg instead of copy_to_mode_reg.

Uros.

Index: i386.c
===================================================================
--- i386.c	(revision 185193)
+++ i386.c	(working copy)
@@ -21025,14 +21025,9 @@ ix86_adjust_counter (rtx countreg, HOST_WIDE_INT v
 rtx
 ix86_zero_extend_to_Pmode (rtx exp)
 {
-  rtx r;
-  if (GET_MODE (exp) == VOIDmode)
-    return force_reg (Pmode, exp);
-  if (GET_MODE (exp) == Pmode)
-    return copy_to_mode_reg (Pmode, exp);
-  r = gen_reg_rtx (Pmode);
-  emit_insn (gen_zero_extendsidi2 (r, exp));
-  return r;
+  if (GET_MODE (exp) != Pmode)
+    exp = convert_to_mode (Pmode, exp, 1);
+  return force_reg (Pmode, exp);
 }
 
 /* Divide COUNTREG by SCALE.  */
@@ -22996,7 +22991,7 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
       fnaddr = XEXP (fnaddr, 0);
       if (GET_MODE (fnaddr) != word_mode)
         fnaddr = convert_to_mode (word_mode, fnaddr, 1);
-      fnaddr = gen_rtx_MEM (QImode, copy_to_mode_reg (word_mode, fnaddr));
+      fnaddr = gen_rtx_MEM (QImode, force_reg (word_mode, fnaddr));
     }
 
   vec_len = 0;
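The last argument of convert_to_mode in the rewritten function is the unsignedp flag, and passing 1 requests zero extension. Why that matters for addresses can be shown with a plain C analogue; extend_address is a hypothetical name and the function only mimics the widening step, not GCC's RTL machinery.

```c
#include <stdint.h>

/* Hypothetical analogue of widening a 32-bit address to 64 bits.
   UNSIGNEDP non-zero zero-extends, otherwise the value is
   sign-extended.  */
static uint64_t
extend_address (uint32_t addr, int unsignedp)
{
  if (unsignedp)
    return (uint64_t) addr;
  return (uint64_t) (int64_t) (int32_t) addr;
}
```

For an address such as 0x80000000, zero extension keeps the value at 0x0000000080000000, whereas sign extension would produce 0xFFFFFFFF80000000, which no longer refers to the same location.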
Re: PATCH: Check ptr_mode and use Pmode in ix86_trampoline_init
On Sun, Mar 11, 2012 at 4:52 PM, H.J. Lu hjl.to...@gmail.com wrote:

x86 trampoline depends on ptr_mode. This patch checks ptr_mode, instead of TARGET_X32. Also we should use Pmode for address mode. Tested on Linux/x86-64. OK for trunk?

Why are we looking at ptr_mode here?

If ptr_mode is SImode, we can always use movl to reach our target. We don't need to check anything else.

Under this assumption, the patch is OK.

Thanks, Uros.
Re: [Fortran-dev, patch] Use only lbound/extent/sm in the array descriptor
Hi Tobias,

with that patch, the array descriptor on the fortran-dev branch now uses the dimension triplet as defined in TS29113. This patch removes ubound/stride and updates all calls.

Great!

There are still 227 test-suite failures (FAIL lines) affecting 27 test-suite files. That's slightly down from the 269 lines the branch currently has. (Some issues can be fixed by modifying the tree dump patterns, but most seem to be real problems.)

Built and regtested on x86-64-linux. OK for the branch?

The library parts look OK to me. There is just one point of efficiency.

+#define GFC_DESCRIPTOR_STRIDE(desc,i) \
+  (GFC_DESCRIPTOR_SM(desc,i) / GFC_DESCRIPTOR_SIZE(desc))

In most generated files, GFC_DESCRIPTOR_SIZE is a constant known at compile-time (and usually a power of two), which means that the division can be done in a simple shift. If we get the size from the descriptor, we actually have to divide, which is expensive.

I would commit the patch now, adding

TODO:
- Fixing the regressions.
- Cleanup of the library and the front end
- Switching also from ubound -> extent for nondescriptor arrays?
- Properly implement subpointers
- Avoid division for GFC_DESCRIPTOR_STRIDE where possible.

Thomas
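The efficiency remark can be illustrated with a small C sketch. Everything in it is an assumption made for illustration: the function names are invented, and the 8-byte element size merely stands in for a size that a generated library file would know at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* Element size is only known at run time (read from the descriptor):
   the compiler has to emit a real division.  */
static ptrdiff_t
stride_from_runtime_size (ptrdiff_t sm, ptrdiff_t elem_size)
{
  assert (elem_size > 0);
  return sm / elem_size;
}

/* Element size is a compile-time power of two (8 bytes assumed here):
   the compiler can typically strength-reduce the division to a shift,
   plus a small fix-up because SM may be negative.  */
#define ELEM_SIZE_LOG2 3

static ptrdiff_t
stride_from_constant_size (ptrdiff_t sm)
{
  return sm / ((ptrdiff_t) 1 << ELEM_SIZE_LOG2);
}
```

Both functions compute the same element stride from a byte stride; the difference is only in what the compiler is able to generate for them.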
Re: PATCH: Properly generate X32 IE sequence
On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu hjl.to...@gmail.com wrote:

X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC by checking

    movq foo@gottpoff(%rip), %reg

and

    addq foo@gottpoff(%rip), %reg

It uses the REX prefix to avoid the last byte of the previous instruction. With 32bit Pmode, we may not have the REX prefix and the last byte of the previous instruction may be an offset, which may look like a REX prefix. IE-LE optimization will generate corrupted binary. This patch makes sure we always output a REX prefix for UNSPEC_GOTNTPOFF. OK for trunk?

Actually, linker has:

    case R_X86_64_GOTTPOFF:
      /* Check transition from IE access model:
           mov foo@gottpoff(%rip), %reg
           add foo@gottpoff(%rip), %reg
       */

      /* Check REX prefix first.  */
      if (offset >= 3 && (offset + 4) <= sec->size)
        {
          val = bfd_get_8 (abfd, contents + offset - 3);
          if (val != 0x48 && val != 0x4c)
            {
              /* X32 may have 0x44 REX prefix or no REX prefix.  */
              if (ABI_64_P (abfd))
                return FALSE;
            }
        }
      else
        {
          /* X32 may not have any REX prefix.  */
          if (ABI_64_P (abfd))
            return FALSE;
          if (offset < 2 || (offset + 3) > sec->size)
            return FALSE;
        }

So, it should handle the case without REX just OK. If it doesn't, then this is a bug in binutils.

The last byte of the displacement in the previous instruction may happen to look like a REX byte. In that case, linker will overwrite the last byte of the previous instruction and generate the wrong instruction sequence. I need to update linker to enforce the REX byte check.

One important observation: if we want to follow the x86_64 TLS spec strictly, we have to use existing DImode patterns only. This also means that we should NOT convert other TLS patterns to Pmode, since they explicitly state movq and addq. If this is not the case, then we need new TLS specification for X32.

Here is a patch to properly generate X32 IE sequence. This is the summary of differences between x86-64 TLS and x32 TLS:

                 x86-64                                     x32
GD
    .byte 0x66; leaq foo@tlsgd(%rip),%rdi;       leaq foo@tlsgd(%rip),%rdi;
    .word 0x6666; rex64; call __tls_get_addr@plt .word 0x6666; rex64; call __tls_get_addr@plt

GD-IE optimization
    movq %fs:0,%rax; addq x@gottpoff(%rip),%rax  movl %fs:0,%eax; addq x@gottpoff(%rip),%rax

GD-LE optimization
    movq %fs:0,%rax; leaq x@tpoff(%rax),%rax     movl %fs:0,%eax; leaq x@tpoff(%rax),%rax

LD
    leaq foo@tlsld(%rip),%rdi;                   leaq foo@tlsld(%rip),%rdi;
    call __tls_get_addr@plt                      call __tls_get_addr@plt

LD-LE optimization
    .word 0x6666; .byte 0x66; movq %fs:0, %rax   nopl 0x0(%rax); movl %fs:0, %eax

IE
    movq %fs:0,%reg64;                           movl %fs:0,%reg32;
    addq x@gottpoff(%rip),%reg64                 addl x@gottpoff(%rip),%reg32
  or                                             Not supported if Pmode == SImode
    movq x@gottpoff(%rip),%reg64;                movq x@gottpoff(%rip),%reg64;
    movq %fs:(%reg64),%reg32                     movl %fs:(%reg64), %reg32

IE-LE optimization
    movq %fs:0,%reg64;                           movl %fs:0,%reg32;
    addq x@gottpoff(%rip),%reg64                 addl x@gottpoff(%rip),%reg32
  to
    movq %fs:0,%reg64;                           movl %fs:0,%reg32;
    addq foo@tpoff, %reg64                       addl foo@tpoff, %reg32

    movq %fs:0,%reg64;                           movl %fs:0,%reg32;
    leaq foo@tpoff(%reg64), %reg64               leal foo@tpoff(%reg32), %reg32
  or
    movq x@gottpoff(%rip),%reg64                 movq x@gottpoff(%rip),%reg64;
    movl %fs:(%reg64),%reg32                     movl %fs:(%reg64), %reg32
  to
    movq foo@tpoff, %reg64                       movq foo@tpoff, %reg64
    movl %fs:(%reg64),%reg32                     movl %fs:(%reg64), %reg32

LE
    movq %fs:0,%reg64;                           movl %fs:0,%reg32;
    leaq x@tpoff(%reg64),%reg32                  leal x@tpoff(%reg32),%reg32
  or
    movq %fs:0,%reg64;                           movl %fs:0,%reg32;
    addq $x@tpoff,%reg64                         addl $x@tpoff,%reg32
  or
    movq %fs:0,%reg64;                           movl %fs:0,%reg32;
    movl x@tpoff(%reg64),%reg32                  movl x@tpoff(%reg32),%reg32
  or
    movl %fs:x@tpoff,%reg32                      movl %fs:x@tpoff,%reg32

X32 TLS implementation is straightforward, except for IE:

1. Since address override works only on the (reg32) part in fs:(reg32), we can't use it as memory operand. This patch changes ix86_decompose_address to disallow fs:(reg) if Pmode != word_mode.

2. When Pmode == SImode, there may be no REX
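The concern about a displacement byte being mistaken for a REX prefix can be shown with a few bytes. The following C sketch is not taken from binutils; the helper names and the sample byte sequence are assumptions chosen for illustration. It only mirrors the idea of the check discussed in this thread: look at the byte three positions before the relocated displacement and compare it with 0x48 and 0x4c.

```c
#include <assert.h>
#include <stddef.h>

/* The two REX prefix values compared against in the check under
   discussion.  */
static int
is_accepted_rex (unsigned char byte)
{
  return byte == 0x48 || byte == 0x4c;
}

/* Hypothetical helper: OFFSET is the position of the 4-byte displacement
   that carries the relocation.  With a REX prefix the layout is
   REX, opcode, ModRM, disp32, so the prefix would sit at OFFSET - 3.  */
static int
byte_before_insn_looks_like_rex (const unsigned char *contents, size_t offset)
{
  assert (offset >= 3);
  return is_accepted_rex (contents[offset - 3]);
}

/* Made-up code bytes:
     b8 00 00 00 48        movl $0x48000000, %eax
     03 05 00 00 00 00     addl x@gottpoff(%rip), %eax   (no REX prefix)
   The displacement of the addl starts at index 7.  */
static const unsigned char sample_code[] = {
  0xb8, 0x00, 0x00, 0x00, 0x48,
  0x03, 0x05, 0x00, 0x00, 0x00, 0x00
};
```

In this sample the addl has no REX prefix, yet the byte at index 4 is 0x48 because it is the last immediate byte of the preceding movl. A check that only inspects that position cannot tell the two cases apart, which seems to be the situation described above.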
Re: PATCH: Properly generate X32 IE sequence
On Sun, Mar 11, 2012 at 7:16 PM, H.J. Lu hjl.to...@gmail.com wrote:

	* config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg) if Pmode != word_mode.
	(legitimize_tls_address): Call gen_tls_initial_exec_x32 if Pmode == SImode for x32.
	* config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
	(tls_initial_exec_x32): Likewise.

Nice solution! OK for mainline.

Done.

BTW: Did you investigate the issue with memory aliasing?

It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32 which loads address of the TLS symbol.

Thanks.

Since we must use reg64 in %fs:(%reg) memory operand like

    movq x@gottpoff(%rip),%reg64;
    mov %fs:(%reg64),%reg

this patch optimizes x32 TLS IE load and store by wrapping %reg64 inside of UNSPEC when Pmode == SImode. OK for trunk?

I think we should just scrap all these complications and go with the idea of clearing MASK_TLS_DIRECT_SEG_REFS.

I will give it a try.

You can also revert:

	* config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg) if Pmode != word_mode.

then, since this part is handled later in the function.

Uros.
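For readers who want to look at the generated sequences themselves, a small test input can help. The sketch below is an assumption about how one might exercise the initial-exec model: tls_model is GCC's documented variable attribute, while counter, bump_counter and tls_selftest are made-up names. Compiling it with -S for the target of interest should show which TLS access form the compiler picked; no particular assembly output is claimed here.

```c
#include <assert.h>

/* A thread-local variable for which the initial-exec model is requested
   explicitly.  The variable name is hypothetical.  */
static __thread int counter __attribute__ ((tls_model ("initial-exec")));

/* Read-modify-write of the TLS variable: this is the kind of access
   whose address computation the thread is about.  */
static int
bump_counter (void)
{
  return ++counter;
}

/* Usage example: the variable starts at zero in every thread.  */
static void
tls_selftest (void)
{
  assert (bump_counter () == 1);
  assert (bump_counter () == 2);
}
```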
Re: [PATCH 07/10] addr32: Use word_mode instead of Pmode in loop expand
On Sun, Mar 11, 2012 at 5:56 PM, H.J. Lu hjl.to...@gmail.com wrote:

This patch uses word_mode instead of Pmode in loop expand since word_mode may have bigger size than Pmode. OK for trunk?

Thanks.

H.J.
---
2012-03-02 H.J. Lu hongjiu...@intel.com

	* config/i386/i386.c (ix86_expand_movmem): Use word_mode instead of Pmode on loop.
	(ix86_expand_setmem): Likewise.

Jan, can you please comment on the changes in this patch?

Here is a complete updated patch to use word_mode in ix86_expand_movmem and ix86_expand_setmem. It also fixes ix86_zero_extend_to_Pmode to handle Pmode != DImode. OK for trunk?

Please rewrite ix86_zero_extend_to_Pmode to something like:

    rtx tmp = exp;
    if (GET_MODE (exp) != Pmode)
      tmp = convert_to_mode (Pmode, exp, 1);
    return force_reg (Pmode, tmp);

I am testing attached patch:

2012-03-11 Uros Bizjak ubiz...@gmail.com

	* config/i386/i386.c (ix86_zero_extend_to_Pmode): Rewrite using convert_to_mode.
	(ix86_expand_call): Use force_reg instead of copy_to_mode_reg.

It passed all tests in GCC testsuite under Linux/x32 and glibc x32 tests.

I have committed the patch without the (ix86_expand_call) change. The latter change was wrong, since it allowed arg register in the call pattern.

Please commit your loop expand patch.

Thanks, Uros.
Re: [Fortran-dev, patch] Use only lbound/extent/sm in the array descriptor
Thomas Koenig wrote:

There are still 227 test-suite failures (FAIL lines) affecting 27 test-suite files. That's slightly down from the 269 lines the branch currently has. (Some issues can be fixed by modifying the tree dump patterns, but most seem to be real problems.)

Built and regtested on x86-64-linux. OK for the branch?

The library parts look OK to me.

I have now committed it (Rev. 185199) - thanks for looking at the library part.

There is just one point of efficiency.

+#define GFC_DESCRIPTOR_STRIDE(desc,i) \
+  (GFC_DESCRIPTOR_SM(desc,i) / GFC_DESCRIPTOR_SIZE(desc))

In most generated files, GFC_DESCRIPTOR_SIZE is a constant known at compile-time (and usually a power of two), which means that the division can be done in a simple shift. If we get the size from the descriptor, we actually have to divide, which is expensive.

I think one should also go through all the files and check whether one can replace _STRIDE by _SM. In some cases, the code actually does this: It calls _STRIDE and later multiplies by the byte size. I tried to avoid _STRIDE at some places, but I only looked at it in the context of setting the descriptor.

Similarly, but less critical: One should check whether EXTENT can replace UBOUND.

Side note: c_f_pointer0 (which is called for arrays with shape) is currently broken as dtype is not set before the call - and thus the size is not known, which is required for setting the sm. I think the best would be to replace it by inline code. (That's the reason behind 5 of the 24 failing test-case files.)

I would commit the patch now, adding

TODO:
- Fixing the regressions.
- Cleanup of the library and the front end
- Switching also from ubound -> extent for nondescriptor arrays?
- Properly implement subpointers
- Avoid division for GFC_DESCRIPTOR_STRIDE where possible.

Well, that's what I meant by cleanup: Trying to update the usage such that one avoids ubound/stride when extent/sm are required - and doing other optimizations like that one. Maybe one can also get rid of some of the macros if they are unused.

In any case, I would be happy if you could have a look. I think the real fun will start when we have to implement the other parts (esp. lower_bound semantic, elem_len and in particular the type system of TS29113).

Tobias
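The remark about code that calls _STRIDE and later multiplies by the byte size can be sketched in plain C. The names below are invented for illustration and are not libgfortran code; the sketch just contrasts addressing an element through an element stride with addressing it directly through the byte stride (sm).

```c
#include <assert.h>
#include <stddef.h>

/* Round trip: the stride is in elements, so it has to be scaled by the
   element size again before it can be added to a byte pointer.  */
static char *
elem_via_stride (char *base, ptrdiff_t stride, ptrdiff_t elem_size,
		 ptrdiff_t i)
{
  return base + i * stride * elem_size;
}

/* Direct form: the stride multiplier is already in bytes.  */
static char *
elem_via_sm (char *base, ptrdiff_t sm, ptrdiff_t i)
{
  return base + i * sm;
}

/* Usage example: every second element of an array of doubles.  */
static void
addressing_selftest (void)
{
  double a[6] = { 0 };
  char *base = (char *) a;
  ptrdiff_t size = (ptrdiff_t) sizeof (double);

  assert (elem_via_stride (base, 2, size, 2) == (char *) &a[4]);
  assert (elem_via_sm (base, 2 * size, 2) == (char *) &a[4]);
}
```

If the byte stride is what the descriptor stores, the second form needs neither the division that produces the element stride nor the multiplication that undoes it.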
Re: [PATCH 02/10] addr32: Only handle zero-extended DImode addresses
On Fri, Mar 9, 2012 at 6:58 PM, Uros Bizjak ubiz...@gmail.com wrote:

I would like to point out that the patched compiler now also emits address registers in their natural mode (modulo zero-extended RTXes) and fixes the following failure on Pmode == SImode targets:

--cut here--
struct foo
{
  int *f;
  int i;
};

void __attribute__ ((noinline))
bar (struct foo x)
{
  *(x.f) = 1;
}
--cut here--

For Pmode == SImode, the compiler emitted (%rdi) address, which was wrong, since i was passed in the high part of (%rdi) register.

The following patch adds a torture test that checks for this problem.

2012-03-11 Uros Bizjak ubiz...@gmail.com

	PR target/52530
	* gcc.dg/torture/pr52530.c: New test.

Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline.

Uros.

Index: gcc.dg/torture/pr52530.c
===================================================================
--- gcc.dg/torture/pr52530.c (revision 0)
+++ gcc.dg/torture/pr52530.c (revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+struct foo
+{
+  int *f;
+  int i;
+};
+
+int baz;
+
+void __attribute__ ((noinline))
+bar (struct foo x)
+{
+  *(x.f) = x.i;
+}
+
+int
+main ()
+{
+  struct foo x = { &baz, 0xdeadbeef };
+
+  bar (x);
+
+  if (baz != 0xdeadbeef)
+    abort ();
+
+  return 0;
+}
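Why a full 64-bit register is the wrong address here can be modelled with plain integers. This is a sketch under stated assumptions, not compiler code: it assumes 32-bit pointers, an 8-byte struct passed in a single 64-bit register with f in the low half and i in the high half (as described in the message above), and all function names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the register holding struct foo { int *f; int i; } when
   pointers are 32 bits wide: F in the low half, I in the high half.  */
static uint64_t
pack_foo (uint32_t f, uint32_t i)
{
  return ((uint64_t) i << 32) | f;
}

/* Using the whole register as the address, like a (%rdi) operand.  */
static uint64_t
address_from_full_reg (uint64_t reg)
{
  return reg;
}

/* Using only the low 32 bits, zero-extended, as the address.  */
static uint64_t
address_from_low_half (uint64_t reg)
{
  return (uint32_t) reg;
}
```

With a nonzero i the first function yields an address polluted by the upper half, while the second recovers the pointer that was actually stored in f.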
Re: PATCH: Properly generate X32 IE sequence
On Sun, Mar 11, 2012 at 11:21 AM, Uros Bizjak ubiz...@gmail.com wrote:

On Sun, Mar 11, 2012 at 7:16 PM, H.J. Lu hjl.to...@gmail.com wrote:

	* config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg) if Pmode != word_mode.
	(legitimize_tls_address): Call gen_tls_initial_exec_x32 if Pmode == SImode for x32.
	* config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
	(tls_initial_exec_x32): Likewise.

Nice solution! OK for mainline.

Done.

BTW: Did you investigate the issue with memory aliasing?

It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32 which loads address of the TLS symbol.

Thanks.

Since we must use reg64 in %fs:(%reg) memory operand like

    movq x@gottpoff(%rip),%reg64;
    mov %fs:(%reg64),%reg

this patch optimizes x32 TLS IE load and store by wrapping %reg64 inside of UNSPEC when Pmode == SImode. OK for trunk?

I think we should just scrap all these complications and go with the idea of clearing MASK_TLS_DIRECT_SEG_REFS.

I will give it a try.

You can also revert:

	* config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg) if Pmode != word_mode.

then, since this part is handled later in the function.

Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS when Pmode != word_mode. We need to keep

      else if (Pmode == SImode)
        {
          /* Always generate
                movl %fs:0, %reg32
                addl x@gottpoff(%rip), %reg32
             to support linker IE-LE optimization and avoid
             fs:(%reg32) as memory operand.  */
          dest = gen_reg_rtx (Pmode);
          emit_insn (gen_tls_initial_exec_x32 (dest, x));
          return dest;
        }

to support linker IE-LE optimization.

TARGET_TLS_DIRECT_SEG_REFS only affects TLS LE access and fs:(%reg) is only generated by combine. So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable fs:immediate memory operand for TLS LE access, which doesn't have any problems to begin with. I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only fs:(%reg), which is generated by combine.

--
H.J.
--
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b101922..1ffcc85 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11478,6 +11478,7 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
 
             case UNSPEC:
               if (XINT (op, 1) == UNSPEC_TP
+                  && Pmode == word_mode
                   && TARGET_TLS_DIRECT_SEG_REFS
                   && seg == SEG_DEFAULT)
                 seg = TARGET_64BIT ? SEG_FS : SEG_GS;
@@ -11534,11 +11535,6 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   else
     disp = addr;                        /* displacement */
 
-  /* Since address override works only on the (reg32) part in fs:(reg32),
-     we can't use it as memory operand.  */
-  if (Pmode != word_mode && seg == SEG_FS && (base || index))
-    return 0;
-
   if (index)
     {
       if (REG_P (index))
@@ -12706,7 +12702,9 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
         {
-          base = get_thread_pointer (for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+          base = get_thread_pointer (for_mov
+                                     || Pmode != word_mode
+                                     || !TARGET_TLS_DIRECT_SEG_REFS);
           return gen_rtx_PLUS (Pmode, base, off);
         }
       else
@@ -13239,7 +13237,7 @@ ix86_delegitimize_tls_address (rtx orig_x)
   rtx x = orig_x, unspec;
   struct ix86_address addr;
 
-  if (!TARGET_TLS_DIRECT_SEG_REFS)
+  if (Pmode != word_mode || !TARGET_TLS_DIRECT_SEG_REFS)
     return orig_x;
   if (MEM_P (x))
     x = XEXP (x, 0);
Re: PING PATCH: Assert DWARF register size <= saved reg size
On Fri, Mar 2, 2012 at 12:42 PM, H.J. Lu hjl.to...@gmail.com wrote:

On Fri, Nov 11, 2011 at 11:04 AM, H.J. Lu hongjiu...@intel.com wrote:

Hi,

I am working on 32bit Pmode for x32:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50797

It removes all LEAs, which convert 32bit address to 64bit, and uses 0x67 address prefix instead. I got 5% speed up in SPEC CPU 2K/2006. But assert in _Unwind_SetGRValue:

    gcc_assert (dwarf_reg_size_table[index] == sizeof (_Unwind_Context_Reg_Val));

failed on return column since init_return_column_size uses Pmode, not word_mode. In this case, _Unwind_Context_Reg_Val is 64bit, but return column size is 32bit. This patch changes it to assert DWARF register size <= saved reg size. OK for trunk?

Thanks.

H.J.
---
2011-11-11 H.J. Lu hongjiu...@intel.com

	* unwind-dw2.c (_Unwind_SetGRValue): Assert DWARF register size <= saved reg size.

diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c
index 475ad00..db1c757 100644
--- a/libgcc/unwind-dw2.c
+++ b/libgcc/unwind-dw2.c
@@ -294,7 +294,8 @@ _Unwind_SetGRValue (struct _Unwind_Context *context, int index,
 {
   index = DWARF_REG_TO_UNWIND_COLUMN (index);
   gcc_assert (index < (int) sizeof(dwarf_reg_size_table));
-  gcc_assert (dwarf_reg_size_table[index] == sizeof (_Unwind_Context_Reg_Val));
+  /* Return column size may be smaller than _Unwind_Context_Reg_Val.  */
+  gcc_assert (dwarf_reg_size_table[index] <= sizeof (_Unwind_Context_Reg_Val));
 
   context->by_value[index] = 1;
   context->reg[index] = _Unwind_Get_Unwind_Context_Reg_Val (val);

Now trunk is in stage 1. Jason, is this OK for trunk?

Thanks.

Ping.

--
H.J.
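The effect of relaxing the assertion can be shown with a toy model. Nothing below is libgcc code: reg_val is a stand-in for _Unwind_Context_Reg_Val, assumed to be 8 bytes as in the case described above, and the two predicates are hypothetical names for the old and the new condition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the saved register value type; 64 bits assumed.  */
typedef uint64_t reg_val;

/* Old condition: the column size must match the saved value exactly.  */
static int
column_size_ok_strict (size_t column_size)
{
  return column_size == sizeof (reg_val);
}

/* New condition: a narrower column is acceptable, a wider one is not.  */
static int
column_size_ok_relaxed (size_t column_size)
{
  return column_size <= sizeof (reg_val);
}

/* Usage example for the case from the message: a 4-byte return column
   next to 8-byte saved values.  */
static void
column_size_selftest (void)
{
  assert (!column_size_ok_strict (4));
  assert (column_size_ok_relaxed (4));
}
```

The relaxed form still rejects a column that would not fit into the saved value, so it keeps the part of the check that guards against truncation.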
PATCH: Properly set ix86_gen_leave and ix86_gen_monitor
Hi,

leave_rex64 works on DImode and sse3_monitor64 works on Pmode. This patch properly sets ix86_gen_leave and ix86_gen_monitor, depending on TARGET_64BIT and Pmode. Tested on Linux/x86-64. OK for trunk?

Thanks.

H.J.
---
2012-03-11 H.J. Lu hongjiu...@intel.com

	* config/i386/i386.c (ix86_option_override_internal): Properly set ix86_gen_leave and ix86_gen_monitor. Check Pmode == DImode.
	* config/i386/sse.md (sse3_monitor64): Renamed to ...
	(sse3_monitor64_<mode>): This.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d673101..f21721f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3748,11 +3748,23 @@ ix86_option_override_internal (bool main_args_p)
   if (TARGET_64BIT)
     {
       ix86_gen_leave = gen_leave_rex64;
+      if (Pmode == DImode)
+        ix86_gen_monitor = gen_sse3_monitor64_di;
+      else
+        ix86_gen_monitor = gen_sse3_monitor64_si;
+    }
+  else
+    {
+      ix86_gen_leave = gen_leave;
+      ix86_gen_monitor = gen_sse3_monitor;
+    }
+
+  if (Pmode == DImode)
+    {
       ix86_gen_add3 = gen_adddi3;
       ix86_gen_sub3 = gen_subdi3;
       ix86_gen_sub3_carry = gen_subdi3_carry;
       ix86_gen_one_cmpl2 = gen_one_cmpldi2;
-      ix86_gen_monitor = gen_sse3_monitor64;
       ix86_gen_andsp = gen_anddi3;
       ix86_gen_allocate_stack_worker = gen_allocate_stack_worker_probe_di;
       ix86_gen_adjust_stack_and_probe = gen_adjust_stack_and_probedi;
@@ -3760,12 +3772,10 @@ ix86_option_override_internal (bool main_args_p)
     }
   else
     {
-      ix86_gen_leave = gen_leave;
       ix86_gen_add3 = gen_addsi3;
       ix86_gen_sub3 = gen_subsi3;
       ix86_gen_sub3_carry = gen_subsi3_carry;
       ix86_gen_one_cmpl2 = gen_one_cmplsi2;
-      ix86_gen_monitor = gen_sse3_monitor;
       ix86_gen_andsp = gen_andsi3;
       ix86_gen_allocate_stack_worker = gen_allocate_stack_worker_probe_si;
       ix86_gen_adjust_stack_and_probe = gen_adjust_stack_and_probesi;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4afc4b3..f5935f1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -8147,8 +8147,8 @@
   "monitor\t%0, %1, %2"
   [(set_attr "length" "3")])
 
-(define_insn "sse3_monitor64"
-  [(unspec_volatile [(match_operand:DI 0 "register_operand" "a")
+(define_insn "sse3_monitor64_<mode>"
+  [(unspec_volatile [(match_operand:P 0 "register_operand" "a")
                      (match_operand:SI 1 "register_operand" "c")
                      (match_operand:SI 2 "register_operand" "d")]
                     UNSPECV_MONITOR)]
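The selection logic in the i386.c hunk separates two questions that used to be answered together: whether the target is 64-bit, and how wide Pmode is. A standalone C model of that split might look as follows. It is only a sketch: the enum, the struct and select_generators are invented, and generator functions are represented by their names as strings rather than by real function pointers.

```c
#include <assert.h>
#include <string.h>

/* Simplified stand-ins for the two address modes involved.  */
enum addr_mode { MODE_SI, MODE_DI };

/* Names of the chosen generators; real code would store function
   pointers instead.  */
struct gen_choice
{
  const char *leave;
  const char *monitor;
  const char *add3;
};

static struct gen_choice
select_generators (int target_64bit, enum addr_mode pmode)
{
  struct gen_choice c;

  /* leave depends only on the target word size; monitor depends on the
     target and, for 64-bit targets, also on the width of Pmode.  */
  if (target_64bit)
    {
      c.leave = "leave_rex64";
      c.monitor = pmode == MODE_DI ? "sse3_monitor64_di"
				   : "sse3_monitor64_si";
    }
  else
    {
      c.leave = "leave";
      c.monitor = "sse3_monitor";
    }

  /* Address arithmetic follows Pmode alone.  */
  c.add3 = pmode == MODE_DI ? "adddi3" : "addsi3";
  return c;
}

/* Usage example: a 64-bit target with 32-bit Pmode, as with x32.  */
static void
select_generators_selftest (void)
{
  struct gen_choice c = select_generators (1, MODE_SI);

  assert (strcmp (c.leave, "leave_rex64") == 0);
  assert (strcmp (c.monitor, "sse3_monitor64_si") == 0);
  assert (strcmp (c.add3, "addsi3") == 0);
}
```

The combination exercised in the self-test is the one that a single TARGET_64BIT test could not express, since it needs the 64-bit leave together with SImode address arithmetic.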
Re: [PATCH] [SH] Fix target/48596
Oleg Endo oleg.e...@t-online.de wrote:

The attached patch moves it as suggested to gcc.c-torture/compile. Briefly tested by running the gcc.c-torture/compile set on sh-sim -m4a-single -ml.

You forgot to remove two dg-* lines:

+/* { dg-do compile } */
+/* { dg-options "-O1" } */

unneeded for this gcc.c-torture/compile test. Looks OK with that change. FYI, I've tested it on i686-pc-linux-gnu with no problem.

Regards, kaz
Re: PATCH RFA: Update Go frontend on gcc 4.7 branch
Jakub Jelinek ja...@redhat.com writes: FYI, on Fedora 17 I had recent testresults without the patch, so below are just testsuite differences for that (debug/dwarf fails consistently everywhere), on RHEL5/6 I didn't have earlier go testsuite results, so I'm just providing summaries there. The reason debug/dwarf fails everywhere with the patch is simply that there are a couple of binary test files in debug/dwarf, and the patch program did not update them correctly. The debug/dwarf tests should pass now that the correct binary files have been committed to the branch. Ian