Re: [google] ThreadSanitizer instrumentation pass (issue 5303083)
I have not finished reviewing yet. This is the first batch.

David


http://codereview.appspot.com/5303083/diff/1/gcc/passes.c
File gcc/passes.c (right):

http://codereview.appspot.com/5303083/diff/1/gcc/passes.c#newcode1423
gcc/passes.c:1423: NEXT_PASS (pass_tsan);
Move this to the same place as asan. Otherwise TARGET_MEM_REF won't be handled.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c
File gcc/tree-tsan.c (right):

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode56
gcc/tree-tsan.c:56: The instrumentation module mainintains shadow call stacks
s/mainitains/maintains/

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode60
gcc/tree-tsan.c:60: Instrumentation for shadow stack maintainance is as follows:
s/maintainance/maintenance/

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode94
gcc/tree-tsan.c:94: #define RTL_STACK __tsan_shadow_stack
Please change the RTL_ prefix to TSAN_. Using RTL_ is confusing.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode100
gcc/tree-tsan.c:100: enum tsan_ignore_e
Better named tsan_ignore_type or tsan_ignore_kind.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode110
gcc/tree-tsan.c:110: enum bb_state_e
An empty line is needed here. Same for other comments that lead a declaration or function.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode110
gcc/tree-tsan.c:110: enum bb_state_e
bb_state_e --> bb_state

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode119
gcc/tree-tsan.c:119: struct bb_data_t
The _t suffix is better removed. Same for other types with a _t suffix.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode161
gcc/tree-tsan.c:161: tree __attribute__((weak))
Explain this.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode169
gcc/tree-tsan.c:169: extern __thread void **__tsan_shadow_stack; */
Two spaces are needed before */. Same for other instances.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode182
gcc/tree-tsan.c:182: Better use varpool_get_node interface.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode186
gcc/tree-tsan.c:186: TREE_STATIC (def) = 1;
Why mark TREE_STATIC (def) = 1? Should the variable be defined in the tsan library?

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode189
gcc/tree-tsan.c:189: DECL_TLS_MODEL (def) = decl_default_tls_model (def);
Check targetm.have_tls, though tsan won't be used on targets without it.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode200
gcc/tree-tsan.c:200: {
Refactor the code so that it can be shared with the function above.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode228
gcc/tree-tsan.c:228: {
The name of the function is very confusing. Change it to get_tsan_mop_handler_decl or something like that.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode251
gcc/tree-tsan.c:251: /* Adds new ignore definition to the global list */
Add documentation on the function parameters (in upper case), such as: TYPE is the ignore type, and NAME is the name of the function to be ignored. If there is a return value, document it too.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode257
gcc/tree-tsan.c:257: desc = (struct tsan_ignore_desc_t*)xmalloc (sizeof (*desc));
Use XCNEW to clear.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode264
gcc/tree-tsan.c:264: /* Checks as to whether identifier 'str' matches template 'templ'.
Use STR instead of 'str'. 'templ' --> TEMPL.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode291
gcc/tree-tsan.c:291: if (spos == NULL)
Move the check up right after spos is computed.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode349
gcc/tree-tsan.c:349: printf ("failed to open ignore file '%s'\n", flag_tsan_ignore);
Use error (..)

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode360
gcc/tree-tsan.c:360: if (line [sz-1] == '\r' || line [sz-1] == '\n')
sz-1 --> sz - 1
Change other instances.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode391
gcc/tree-tsan.c:391: src_name = expand_location(cfun->function_start_locus).file;
space before (

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode413
gcc/tree-tsan.c:413: static const char *
Missing documentation.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode443
gcc/tree-tsan.c:443: tree rtl_stack;
Do not use the rtl_ prefix. Same for other instances.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode459
gcc/tree-tsan.c:459: s = NULL;
MODIFY_EXPR? Use gimple_build_assign directly.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode725
gcc/tree-tsan.c:725: This is wrong. SSA_NAME expr should be skipped.

http://codereview.appspot.com/5303083/diff/1/gcc/tree-tsan.c#newcode730
gcc/tree-tsan.c:730: {
remove {} Same for
[PATCH] Slight improvements to vec_init code gen on sparc.
There is definitely more that can be done in this area, but at least
this is a start.

Next we can start trying to use the ASI_FL{8,16,32}_P short floating
point loads, which zero-extend an 8, 16, or 32 bit integer value into a
double precision float register.

gcc/

        * config/sparc/sparc.c (vector_init_bshuffle): New function.
        (vector_init_fpmerge): New function.
        (sparc_expand_vector_init): Use them to improve non-const cases.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180696 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog            |    4 ++
 gcc/config/sparc/sparc.c |  105 ++
 2 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 037138a..a851ba1 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,9 @@
 2011-10-30  David S. Miller  da...@davemloft.net
 
+        * config/sparc/sparc.c (vector_init_bshuffle): New function.
+        (vector_init_fpmerge): New function.
+        (sparc_expand_vector_init): Use them to improve non-const cases.
+
         * dwarf2out.c (dwarf2out_var_location): When processing several
         consecutive location notes, cache the result of next_real_insn().
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 3883dbd..fd1b190 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -11279,6 +11279,67 @@ output_v8plus_mult (rtx insn, rtx *operands, const char *name)
     }
 }
 
+static void
+vector_init_bshuffle (rtx target, rtx elt, enum machine_mode mode,
+                      enum machine_mode inner_mode)
+{
+  rtx t1, final_insn;
+  int bmask;
+
+  t1 = gen_reg_rtx (mode);
+
+  elt = convert_modes (SImode, inner_mode, elt, true);
+  emit_move_insn (gen_lowpart(SImode, t1), elt);
+
+  switch (mode)
+    {
+    case V2SImode:
+      final_insn = gen_bshufflev2si_vis (target, t1, t1);
+      bmask = 0x45674567;
+      break;
+    case V4HImode:
+      final_insn = gen_bshufflev4hi_vis (target, t1, t1);
+      bmask = 0x67676767;
+      break;
+    case V8QImode:
+      final_insn = gen_bshufflev8qi_vis (target, t1, t1);
+      bmask = 0x;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (gen_bmasksi_vis (gen_reg_rtx (SImode), CONST0_RTX (SImode),
+                              force_reg (SImode, GEN_INT (bmask))));
+  emit_insn (final_insn);
+}
+
+static void
+vector_init_fpmerge (rtx target, rtx elt, enum machine_mode inner_mode)
+{
+  rtx t1, t2, t3, t3_low;
+
+  t1 = gen_reg_rtx (V4QImode);
+  elt = convert_modes (SImode, inner_mode, elt, true);
+  emit_move_insn (gen_lowpart (SImode, t1), elt);
+
+  t2 = gen_reg_rtx (V4QImode);
+  emit_move_insn (t2, t1);
+
+  t3 = gen_reg_rtx (V8QImode);
+  t3_low = gen_lowpart (V4QImode, t3);
+
+  emit_insn (gen_fpmerge_vis (t3, t1, t2));
+  emit_move_insn (t1, t3_low);
+  emit_move_insn (t2, t3_low);
+
+  emit_insn (gen_fpmerge_vis (t3, t1, t2));
+  emit_move_insn (t1, t3_low);
+  emit_move_insn (t2, t3_low);
+
+  emit_insn (gen_fpmerge_vis (gen_lowpart (V8QImode, target), t1, t2));
+}
+
 void
 sparc_expand_vector_init (rtx target, rtx vals)
 {
@@ -11286,13 +11347,18 @@ sparc_expand_vector_init (rtx target, rtx vals)
   enum machine_mode inner_mode = GET_MODE_INNER (mode);
   int n_elts = GET_MODE_NUNITS (mode);
   int i, n_var = 0;
+  bool all_same;
   rtx mem;
 
+  all_same = true;
   for (i = 0; i < n_elts; i++)
     {
       rtx x = XVECEXP (vals, 0, i);
       if (!CONSTANT_P (x))
         n_var++;
+
+      if (i > 0 && !rtx_equal_p (x, XVECEXP (vals, 0, 0)))
+        all_same = false;
     }
 
   if (n_var == 0)
@@ -11301,6 +11367,45 @@ sparc_expand_vector_init (rtx target, rtx vals)
       return;
     }
 
+  if (GET_MODE_SIZE (inner_mode) == GET_MODE_SIZE (mode))
+    {
+      if (GET_MODE_SIZE (inner_mode) == 4)
+        {
+          emit_move_insn (gen_lowpart (SImode, target),
+                          gen_lowpart (SImode, XVECEXP (vals, 0, 0)));
+          return;
+        }
+      else if (GET_MODE_SIZE (inner_mode) == 8)
+        {
+          emit_move_insn (gen_lowpart (DImode, target),
+                          gen_lowpart (DImode, XVECEXP (vals, 0, 0)));
+          return;
+        }
+    }
+  else if (GET_MODE_SIZE (inner_mode) == GET_MODE_SIZE (word_mode)
+           && GET_MODE_SIZE (mode) == 2 * GET_MODE_SIZE (word_mode))
+    {
+      emit_move_insn (gen_highpart (word_mode, target),
+                      gen_lowpart (word_mode, XVECEXP (vals, 0, 0)));
+      emit_move_insn (gen_lowpart (word_mode, target),
+                      gen_lowpart (word_mode, XVECEXP (vals, 0, 1)));
+      return;
+    }
+
+  if (all_same && GET_MODE_SIZE (mode) == 8)
+    {
+      if (TARGET_VIS2)
+        {
+          vector_init_bshuffle (target, XVECEXP (vals, 0, 0), mode, inner_mode);
+          return;
+        }
+      if (mode == V8QImode)
+        {
+
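
The three-merge sequence in vector_init_fpmerge can be modelled in plain C to see why it replicates one byte into all eight lanes. This is only a sketch under stated assumptions: that fpmerge interleaves the bytes of its two 4-byte operands (first operand's byte first), and that the low part of an 8-byte register is its last four bytes, as on big-endian SPARC. The names fpmerge_model and broadcast_byte are made up for illustration and are not GCC functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed model of VIS fpmerge: interleave the four bytes of A and B
   into eight bytes, A[0] B[0] A[1] B[1] ...  Index 0 is the most
   significant byte.  */
static void
fpmerge_model (uint8_t out[8], const uint8_t a[4], const uint8_t b[4])
{
  int i;

  for (i = 0; i < 4; i++)
    {
      out[2 * i] = a[i];
      out[2 * i + 1] = b[i];
    }
}

/* Replicate ELT into all eight bytes of OUT with three merges,
   mirroring the order of operations in vector_init_fpmerge.  */
static void
broadcast_byte (uint8_t out[8], uint8_t elt)
{
  uint8_t t1[4] = { 0, 0, 0, elt };  /* ELT zero-extended into the low part.  */
  uint8_t t2[4];
  uint8_t t3[8];
  int round;

  memcpy (t2, t1, 4);
  for (round = 0; round < 2; round++)
    {
      fpmerge_model (t3, t1, t2);
      /* Take the low part: the last four bytes in this model.  */
      memcpy (t1, t3 + 4, 4);
      memcpy (t2, t3 + 4, 4);
    }
  fpmerge_model (out, t1, t2);
}
```

Each merge doubles the number of copies of the element in the low part (one, then two, then four), so the third merge fills all eight bytes.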
Re: C++ PATCH to add -std=c++11 ??
On Mon, Oct 31, 2011 at 12:26 AM, Jason Merrill ja...@redhat.com wrote:
> Here's my start at adjusting things to use the C++11 name; feel free to run
> with it.
>
> Looking at it again, I think adding __GXX_EXPERIMENTAL_CXX11__ is a mistake,
> we should just set __cplusplus to the C++11 value.

I tend to agree. Too many macros to control C++11 may not necessarily be a feature. Tricky.
RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
> -----Original Message-----
> From: Kai Tietz [mailto:ktiet...@googlemail.com]
> Sent: Thursday, October 27, 2011 5:36 PM
> To: Jiangning Liu
> Cc: Michael Matz; Richard Guenther; Kai Tietz; gcc-patches@gcc.gnu.org;
> Richard Henderson
> Subject: Re: [patch tree-optimization]: Improve handling of
> conditional-branches on targets with high branch costs
>
> 2011/10/27 Jiangning Liu jiangning@arm.com:
>>> -----Original Message-----
>>> From: Michael Matz [mailto:m...@suse.de]
>>> Sent: Wednesday, October 26, 2011 11:47 PM
>>> To: Kai Tietz
>>> Cc: Jiangning Liu; Richard Guenther; Kai Tietz; gcc-patc...@gcc.gnu.org;
>>> Richard Henderson
>>> Subject: Re: [patch tree-optimization]: Improve handling of
>>> conditional-branches on targets with high branch costs
>>>
>>> Hi,
>>>
>>> On Wed, 26 Oct 2011, Kai Tietz wrote:
>>>
>>>> So you mean that a memory dereference shouldn't be considered a
>>>> side effect at all?
>>>
>>> No. I haven't said this at all. Of course it's a side-effect, but
>>> we're allowed to remove existing ones (under some circumstances).
>>> We're not allowed to introduce new ones, which means that this ...
>>>
>>>> So we would happily cause a crash with code like
>>>> 'if (i && *i != 0)', since a memory dereference has no side effect
>>>> for you?
>>>
>>> ... is not allowed. But in the original example the memread was on
>>> the left side, hence occurred always, therefore we can move it to the
>>> right side, even though it might occur less often.
>>>
>>>> In your special case it might be valid to combine the conditions
>>>> regardless of whether the first condition might trap (and
>>>> C-fold-const doesn't know whether the side-effect condition is
>>>> really the first, as it might be a sub-sequence of a condition).
>>>> But branching has to cover the general cases. If you find a way to
>>>> determine that the left-hand operand in fold_const's branching code
>>>> is really the left-most condition in the chain, then we can add
>>>> such a special case, but I don't see an easy way to determine it
>>>> here.
>>>
>>> Hmm? I don't see why it's necessary to check if it's the left-most
>>> condition in a chain. If the left hand of '&&' is a memread it can
>>> always be moved to the right side (or the operator transformed into
>>> '&' which can have the same effect), of course only if the original
>>> rhs is free of side effects, but then independent of whether the &&
>>> was part of a larger expression. The memread will possibly be done
>>> fewer times than originally, but as said, that's okay.
>>
>> Agree. The point is that for the small case I gave, the RHS doesn't
>> have any side effects at all, so the optimization of changing it to
>> AND doesn't violate the C specification. We need to recover something
>> for this case, although it did improve a lot for some particular
>> benchmarks.
>>
>> Thanks,
>> -Jiangning
>>
>>> Ciao,
>>> Michael.
>
> Hmm, so we can allow merging to AND if the left-hand side might trap
> but has no side effects and the rhs neither traps nor has side
> effects. As for the case where the left-hand side has side effects but
> the right-hand side does not, we aren't allowed to do this AND/OR
> merge. For example, for 'if ((f = foo ()) != 0 && f 24)' we aren't
> allowed to make this transformation.
>
> This shouldn't be that hard. We need to give simple_operand_p_2 an
> additional argument for checking trapping or not.

Would it be OK if I file a tracker in bugzilla against this?

> Regards,
> Kai
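
To make the rule discussed above concrete, here is a small C sketch of a condition that may be merged and one that may not. It is only an illustration of the reasoning in this thread, not code from fold-const.c, and the function names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: the load from *P always happens; X > 0 is evaluated
   only when the load yields nonzero.  */
static int
short_circuit (const int *p, int x)
{
  return *p != 0 && x > 0;
}

/* Merged form: both operands are always evaluated.  This gives the
   same result because X > 0 neither traps nor has side effects, and
   the load on the left was unconditional to begin with.  */
static int
merged (const int *p, int x)
{
  return (*p != 0) & (x > 0);
}

/* Here the right operand dereferences P and may trap.  Rewriting this
   with '&' would introduce a load that the original never performs
   when P is null, so the branch has to stay.  */
static int
guarded_load (const int *p)
{
  return p != NULL && *p != 0;
}
```

The first two functions agree for every valid pointer, while the third relies on the short-circuit to stay safe for a null pointer.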
Re: resent2 [PATCH] Fix ICE in redirect_jump, at jump.c:1497 PR50496
On 2011/10/25 02:04 AM, Bernd Schmidt wrote: On 10/24/11 20:02, Chung-Lin Tang wrote: On 2011/10/18 04:03 PM, Eric Botcazou wrote: thread_prologue_and_epilogue_insns should detect all cases where a return insn can be created. So any CFG cleanup that runs before it does not need this functionality. So we're left with CFG cleanups that run after it and could forward edges to an edge from a return insn to the exit block in order to build a new return insn. We have no testcases to suggest that this ever happens. Which does mean that, at least through the two call sites that my original patch modified, it may be hard to ever find out later, if patch applied. Bernd, why can't we simply remove the assertion? The pre-reload case will fail at validation and return 0, matching pre-reload, pre-shrink-wrap behavior, while any possible remaining post-reload redirection to the exit block can just use 'ret_rtx' as the rare fallback No, after prologue insertion we have to distinguish between ret_rtx and simple_return_rtx. I'm suggesting a new patch, as attached. Before reload_completed, we directly return 0 upon nlabel == NULL, which should be identical with old behavior, while asserting fail if after reload (where we assume the simple_return/return distinction is required). This should ensure better that, if a post-prologue case of redirecting to the exit block ever happens we will more easily know (by some future PR :P) Bootstrapped and tested on i686, and cross tested on ARM using QEMU. Eric, is this approach okay? Thanks, Chung-Lin 2011-10-31 Chung-Lin Tang clt...@codesourcery.com * jump.c (redirect_jump): Assert fail on nlabel == NULL_RTX only after reload. Add comments. 
Index: jump.c
===================================================================
--- jump.c      (revision 180421)
+++ jump.c      (working copy)
@@ -1495,8 +1495,19 @@ redirect_jump (rtx jump, rtx nlabel, int delete_un
 {
   rtx olabel = JUMP_LABEL (jump);
 
-  gcc_assert (nlabel != NULL_RTX);
+  if (!nlabel)
+    {
+      /* For nlabel == NULL_RTX cases, if reload_completed == 0,
+         return/simple_return are not yet creatable, thus we return 0
+         immediately; if reload_completed, we do not accept !nlabel
+         at all, either a non-null label, or return/simple_return RTX.
+         In that case assert fail.  */
+      if (!reload_completed)
+        return 0;
+      gcc_unreachable ();
+    }
+
   if (nlabel == olabel)
     return 1;
Re: resent2 [PATCH] Fix ICE in redirect_jump, at jump.c:1497 PR50496
> I'm suggesting a new patch, as attached. Before reload_completed, we
> directly return 0 upon nlabel == NULL, which should be identical with
> old behavior, while asserting fail if after reload (where we assume the
> simple_return/return distinction is required).
>
> This should ensure better that, if a post-prologue case of redirecting
> to the exit block ever happens we will more easily know (by some future
> PR :P)
>
> Bootstrapped and tested on i686, and cross tested on ARM using QEMU.
> Eric, is this approach okay?

Don't you want epilogue_completed instead of reload_completed? Otherwise, yes, the approach is fine with me, but wait for Bernd's input.

> 2011-10-31  Chung-Lin Tang  clt...@codesourcery.com
>
>         * jump.c (redirect_jump): Assert fail on nlabel == NULL_RTX
>         only after reload. Add comments.

Minor rewording of the comment below:

> +  if (!nlabel)
> +    {

      /* If there is no label, we are asked to redirect to the EXIT block.
         Now, before the epilogue is emitted, return/simple_return cannot be
         created so we return 0 immediately.  After the epilogue is emitted,
         we always expect a label, either a non-null label, or a
         return/simple_return RTX.  */

> +      if (!reload_completed)
> +        return 0;
> +      gcc_unreachable ();
> +    }

--
Eric Botcazou
[PATCH] Re: vector shift regression on sparc
On Sun, Oct 30, 2011 at 12:38:32AM -0400, David Miller wrote: gcc.dg/pr48616.c segfaults on sparc as of a day or two ago vectorizable_shift() crashes because op1_vectype is NULL and we hit this code path: /* Vector shifted by vector. */ if (!scalar_shift_arg) { optab = optab_for_tree_code (code, vectype, optab_vector); if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, vector/vector shift/rotate found.); =if (TYPE_MODE (op1_vectype) != TYPE_MODE (vectype)) dt[1] is vect_external_def and slp_node is non-NULL. Indeed, when the 'dt' arg to vect_is_simple_use_1() is vect_external_def *vectype will be set to NULL. Here is a fix for that (and other issues that show up on these testcases with -O3 -mxop if I disable all vector/scalar shift expanders in sse.md). For SLP it currently gives up more often than for loop vectorization, I assume we could handle all dt[1] == vect_constant_def and dt[2] == vect_external_def cases for SLP (and at least the former even if the constants differ between nodes) by building the vectors by hand, though the current vect_get_vec_defs/vect_get_vec_defs_for_stmt_copy can't be used for that as is. 2011-10-28 Jakub Jelinek ja...@redhat.com * tree-vect-stmts.c (vectorizable_shift): If op1 is vect_external_def in a loop and has different type from op0, cast it to op0's type before the loop first. For slp give up. Don't crash if op1_vectype is NULL. * gcc.dg/vshift-3.c: New test. * gcc.dg/vshift-4.c: New test. * gcc.dg/vshift-5.c: New test. 
--- gcc/tree-vect-stmts.c.jj2011-10-28 16:21:06.0 +0200 +++ gcc/tree-vect-stmts.c 2011-10-31 10:27:57.0 +0100 @@ -2446,7 +2446,10 @@ vectorizable_shift (gimple stmt, gimple_ optab = optab_for_tree_code (code, vectype, optab_vector); if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, vector/vector shift/rotate found.); - if (TYPE_MODE (op1_vectype) != TYPE_MODE (vectype)) + if (!op1_vectype) + op1_vectype = get_same_sized_vectype (TREE_TYPE (op1), vectype_out); + if (op1_vectype == NULL_TREE + || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype)) { if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, unusable type for last operand in @@ -2480,9 +2483,28 @@ vectorizable_shift (gimple stmt, gimple_ /* Unlike the other binary operators, shifts/rotates have the rhs being int, instead of the same type as the lhs, so make sure the scalar is the right type if we are - dealing with vectors of short/char. */ +dealing with vectors of long long/long/short/char. */ if (dt[1] == vect_constant_def) op1 = fold_convert (TREE_TYPE (vectype), op1); + else if (!useless_type_conversion_p (TREE_TYPE (vectype), + TREE_TYPE (op1))) + { + if (slp_node + TYPE_MODE (TREE_TYPE (vectype)) +!= TYPE_MODE (TREE_TYPE (op1))) + { + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, unusable type for last operand in + vector/vector shift/rotate.); + return false; + } + if (vec_stmt !slp_node) + { + op1 = fold_convert (TREE_TYPE (vectype), op1); + op1 = vect_init_vector (stmt, op1, + TREE_TYPE (vectype), NULL); + } + } } } } --- gcc/testsuite/gcc.dg/vshift-3.c.jj 2011-10-31 10:00:57.0 +0100 +++ gcc/testsuite/gcc.dg/vshift-3.c 2011-10-31 10:00:42.0 +0100 @@ -0,0 +1,136 @@ +/* { dg-do run } */ +/* { dg-options -O3 } */ + +#include stdlib.h + +#define N 64 + +#ifndef TYPE1 +#define TYPE1 int +#define TYPE2 long long +#endif + +signed TYPE1 a[N], b, g[N]; +unsigned TYPE1 c[N], h[N]; +signed TYPE2 d[N], e, j[N]; +unsigned TYPE2 f[N], k[N]; + +#ifndef S +#define S(x) x 
+#endif + +__attribute__((noinline)) void +f1 (void) +{ + int i; + for (i = 0; i N; i++) +g[i] = a[i] S (b); +} + +__attribute__((noinline)) void +f2 (void) +{ + int i; + for (i = 0; i N; i++) +g[i] = a[i] S (b); +} + +__attribute__((noinline)) void +f3 (void) +{ + int i; + for (i = 0; i N; i++) +h[i] = c[i] S (b); +} + +__attribute__((noinline)) void +f4 (void) +{ + int i; + for (i = 0; i N; i++) +j[i] = d[i] S (e); +} + +__attribute__((noinline)) void +f5 (void) +{ + int i; + for (i = 0; i N; i++) +j[i] = d[i] S (e); +} + +__attribute__((noinline)) void +f6 (void) +{ + int i; + for (i = 0; i N; i++) +k[i] = f[i] S (e); +} +
Re: [PR50869] don't attempt to expand CFA within cselib
On Fri, Oct 28, 2011 at 07:07:18PM -0200, Alexandre Oliva wrote:
> for gcc/ChangeLog
> from Alexandre Oliva aol...@redhat.com
>
>         PR debug/50869
>         * cselib.c (cfa_base_preserved_regno): Initialize.
>         (cselib_expand_value_rtx_1): Don't expand it.
>         * var-tracking.c (vt_expand_var_loc_chain): Initialize depth.
>         Check it's only zero if result is NULL.

Ok for trunk, thanks.

        Jakub
Re: C++ PATCH to add -std=c++11 ??
On 10/31/2011 06:26 AM, Jason Merrill wrote:
> Here's my start at adjusting things to use the C++11 name; feel free to run
> with it.

Great. When you commit it, you can as well add 'PR c++/50920' to the ChangeLog!

Paolo.
Re: [ARM] Fix PR49641
On 10/25/2011 06:56 PM, Richard Earnshaw wrote:
> On 24/10/11 14:30, Sebastian Huber wrote:
>> Hello,
>>
>> what about the attached patch based on the original patch provided by
>> Bernd Schmidt with modifications suggested by Richard Earnshaw.
>>
>> pr49641.patch
>>
>>         * config/arm/arm.c (store_multiple_sequence): Avoid cases where
>>         the base reg is stored iff compiling for Thumb1.
>>
>>         * gcc.target/arm/pr49641.c: New test.
>
> OK.
>
> R.

Would someone mind committing it? Thanks.

--
Sebastian Huber, embedded brains GmbH

Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany
Phone   : +49 89 18 90 80 79-6
Fax     : +49 89 18 90 80 79-9
E-Mail  : sebastian.hu...@embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.
Re: [PATCH] Handle many consecutive location notes more efficiently in dwarf2.
On Sun, Oct 30, 2011 at 09:55:42PM -0400, David Miller wrote:
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -20149,7 +20151,35 @@ dwarf2out_var_location (rtx loc_note)
>    if (var_loc_p && !DECL_P (NOTE_VAR_LOCATION_DECL (loc_note)))
>      return;
>
> -  next_real = next_real_insn (loc_note);
> +  /* Optimize processing a large consecutive sequence of location
> +     notes so we don't spend too much time in next_real_insn.  If the
> +     next insn is another location note, remember the next_real_insn
> +     calculation for next time.  */
> +  next_real = cached_next_real_insn;
> +  if (next_real)
> +    {
> +      if (expected_next_loc_note != loc_note)
> +        next_real = NULL_RTX;
> +    }
> +
> +  next_note = NEXT_INSN (loc_note);
> +  if (! next_note
> +      || INSN_DELETED_P (next_note)
> +      || GET_CODE (next_note) != NOTE
> +      || (NOTE_KIND (next_note) != NOTE_INSN_VAR_LOCATION

I think for next_note being NOTE_INSN_VAR_LOCATION you want to set
next_note to NULL_RTX if !DECL_P (NOTE_VAR_LOCATION_DECL (next_note)).
Otherwise you risk that the above
  if (var_loc_p && !DECL_P (NOTE_VAR_LOCATION_DECL (loc_note)))
    return;
will not clear the cache. You then reach the end of the function, and in
the next function, when dwarf2out_var_location is called for the first
time, cached_next_real_insn will be non-NULL. If you have really bad luck
it will be called on an insn that has the same address as
expected_next_loc_note (a GC collection could happen in between).

Or alternatively you could remove the whole
  if (! next_note ...)
    next_note = NULL_RTX;
stmt and move your cache to a global var and clear it when reaching the
end of the function (like e.g. last_var_location_insn is cleared in
dwarf2out_end_epilogue).

        Jakub
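
The second alternative above (a global cache that is cleared at the end of each function) can be sketched in standalone C. This is a simplified model, not the dwarf2out.c code: struct node stands in for the insn chain, and next_real, next_real_cached and reset_next_real_cache are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an insn chain: IS_NOTE marks location notes.  */
struct node
{
  struct node *next;
  int is_note;
};

static long steps;  /* Counts notes skipped, to show the saving.  */

/* Uncached walk: skip notes until a real node is found.  */
static struct node *
next_real (struct node *n)
{
  for (n = n->next; n != NULL && n->is_note; n = n->next)
    steps++;
  return n;
}

static struct node *cached_next_real;
static struct node *expected_next_note;

/* Cached lookup: reuse the previous answer when called on the note
   that directly followed the previous one.  */
static struct node *
next_real_cached (struct node *note)
{
  struct node *result;

  if (cached_next_real != NULL && expected_next_note == note)
    result = cached_next_real;
  else
    result = next_real (note);

  if (note->next != NULL && note->next->is_note)
    {
      /* Another note follows, so remember the answer for it.  */
      expected_next_note = note->next;
      cached_next_real = result;
    }
  else
    {
      expected_next_note = NULL;
      cached_next_real = NULL;
    }
  return result;
}

/* Call at the end of each function so that a stale pointer cannot
   match a reused address later.  */
static void
reset_next_real_cache (void)
{
  cached_next_real = NULL;
  expected_next_note = NULL;
}
```

With a run of N consecutive notes, the uncached walk skips on the order of N*N/2 notes in total, while the cached variant walks the run once.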
Re: [PATCH] Re: vector shift regression on sparc
On 31 October 2011 11:53, Jakub Jelinek ja...@redhat.com wrote: On Sun, Oct 30, 2011 at 12:38:32AM -0400, David Miller wrote: gcc.dg/pr48616.c segfaults on sparc as of a day or two ago vectorizable_shift() crashes because op1_vectype is NULL and we hit this code path: /* Vector shifted by vector. */ if (!scalar_shift_arg) { optab = optab_for_tree_code (code, vectype, optab_vector); if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, vector/vector shift/rotate found.); = if (TYPE_MODE (op1_vectype) != TYPE_MODE (vectype)) dt[1] is vect_external_def and slp_node is non-NULL. Indeed, when the 'dt' arg to vect_is_simple_use_1() is vect_external_def *vectype will be set to NULL. Here is a fix for that (and other issues that show up on these testcases with -O3 -mxop if I disable all vector/scalar shift expanders in sse.md). For SLP it currently gives up more often than for loop vectorization, I assume we could handle all dt[1] == vect_constant_def and dt[2] == vect_external_def cases for SLP (and at least the former even if the constants differ between nodes) by building the vectors by hand, though the current vect_get_vec_defs/vect_get_vec_defs_for_stmt_copy can't be used for that as is. 2011-10-28 Jakub Jelinek ja...@redhat.com * tree-vect-stmts.c (vectorizable_shift): If op1 is vect_external_def in a loop and has different type from op0, cast it to op0's type before the loop first. For slp give up. Don't crash if op1_vectype is NULL. * gcc.dg/vshift-3.c: New test. * gcc.dg/vshift-4.c: New test. * gcc.dg/vshift-5.c: New test. 
--- gcc/tree-vect-stmts.c.jj 2011-10-28 16:21:06.0 +0200 +++ gcc/tree-vect-stmts.c 2011-10-31 10:27:57.0 +0100 @@ -2446,7 +2446,10 @@ vectorizable_shift (gimple stmt, gimple_ optab = optab_for_tree_code (code, vectype, optab_vector); if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, vector/vector shift/rotate found.); - if (TYPE_MODE (op1_vectype) != TYPE_MODE (vectype)) + if (!op1_vectype) + op1_vectype = get_same_sized_vectype (TREE_TYPE (op1), vectype_out); + if (op1_vectype == NULL_TREE + || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype)) { if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, unusable type for last operand in @@ -2480,9 +2483,28 @@ vectorizable_shift (gimple stmt, gimple_ /* Unlike the other binary operators, shifts/rotates have the rhs being int, instead of the same type as the lhs, so make sure the scalar is the right type if we are - dealing with vectors of short/char. */ + dealing with vectors of long long/long/short/char. */ if (dt[1] == vect_constant_def) op1 = fold_convert (TREE_TYPE (vectype), op1); + else if (!useless_type_conversion_p (TREE_TYPE (vectype), + TREE_TYPE (op1))) What happens in case dt[1] == vect_internal_def? Thanks, Ira + { + if (slp_node + TYPE_MODE (TREE_TYPE (vectype)) + != TYPE_MODE (TREE_TYPE (op1))) + { + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, unusable type for last operand in + vector/vector shift/rotate.); + return false; + } + if (vec_stmt !slp_node) + { + op1 = fold_convert (TREE_TYPE (vectype), op1); + op1 = vect_init_vector (stmt, op1, + TREE_TYPE (vectype), NULL); + } + } } } } Jakub
Re: [PATCH] Re: vector shift regression on sparc
On Mon, Oct 31, 2011 at 01:14:25PM +0200, Ira Rosen wrote: --- gcc/tree-vect-stmts.c.jj 2011-10-28 16:21:06.0 +0200 +++ gcc/tree-vect-stmts.c 2011-10-31 10:27:57.0 +0100 @@ -2446,7 +2446,10 @@ vectorizable_shift (gimple stmt, gimple_ optab = optab_for_tree_code (code, vectype, optab_vector); if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, vector/vector shift/rotate found.); - if (TYPE_MODE (op1_vectype) != TYPE_MODE (vectype)) + if (!op1_vectype) + op1_vectype = get_same_sized_vectype (TREE_TYPE (op1), vectype_out); + if (op1_vectype == NULL_TREE + || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype)) { if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, unusable type for last operand in @@ -2480,9 +2483,28 @@ vectorizable_shift (gimple stmt, gimple_ /* Unlike the other binary operators, shifts/rotates have the rhs being int, instead of the same type as the lhs, so make sure the scalar is the right type if we are - dealing with vectors of short/char. */ + dealing with vectors of long long/long/short/char. */ if (dt[1] == vect_constant_def) op1 = fold_convert (TREE_TYPE (vectype), op1); + else if (!useless_type_conversion_p (TREE_TYPE (vectype), + TREE_TYPE (op1))) What happens in case dt[1] == vect_internal_def? 
For !slp_node we can't reach this with dt[1] == vect_internal_def,
because of:
  if (dt[1] == vect_internal_def && !slp_node)
    scalar_shift_arg = false;
And for slp_node I'm just giving up if type modes don't match:

> +             {
> +               if (slp_node
> +                   && TYPE_MODE (TREE_TYPE (vectype))
> +                      != TYPE_MODE (TREE_TYPE (op1)))
> +                 {
> +                   if (vect_print_dump_info (REPORT_DETAILS))
> +                     fprintf (vect_dump, "unusable type for last operand in"
> +                              " vector/vector shift/rotate.");
> +                   return false;
> +                 }

BTW, even the pre-existing if (dt[1] == vect_constant_def) doesn't seem to
be 100% correct for slp_node != NULL. I think vect_get_constant_vectors
will in that case create a VECTOR_CST with the desirable vector type (same
type mode as op0's vector type mode), but the constants in the VECTOR_CST
will have a wrong type (say V4DImode VECTOR_CST with SImode constants in
its constructor). The expander doesn't ICE on it though.

        Jakub
Re: [PATCH] Re: vector shift regression on sparc
On 31 October 2011 13:23, Jakub Jelinek ja...@redhat.com wrote: On Mon, Oct 31, 2011 at 01:14:25PM +0200, Ira Rosen wrote: --- gcc/tree-vect-stmts.c.jj 2011-10-28 16:21:06.0 +0200 +++ gcc/tree-vect-stmts.c 2011-10-31 10:27:57.0 +0100 @@ -2446,7 +2446,10 @@ vectorizable_shift (gimple stmt, gimple_ optab = optab_for_tree_code (code, vectype, optab_vector); if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, vector/vector shift/rotate found.); - if (TYPE_MODE (op1_vectype) != TYPE_MODE (vectype)) + if (!op1_vectype) + op1_vectype = get_same_sized_vectype (TREE_TYPE (op1), vectype_out); + if (op1_vectype == NULL_TREE + || TYPE_MODE (op1_vectype) != TYPE_MODE (vectype)) { if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, unusable type for last operand in @@ -2480,9 +2483,28 @@ vectorizable_shift (gimple stmt, gimple_ /* Unlike the other binary operators, shifts/rotates have the rhs being int, instead of the same type as the lhs, so make sure the scalar is the right type if we are - dealing with vectors of short/char. */ + dealing with vectors of long long/long/short/char. */ if (dt[1] == vect_constant_def) op1 = fold_convert (TREE_TYPE (vectype), op1); + else if (!useless_type_conversion_p (TREE_TYPE (vectype), + TREE_TYPE (op1))) What happens in case dt[1] == vect_internal_def? For !slp_node we can't reach this with dt1[1] == vect_internal_def, because of: if (dt[1] == vect_internal_def !slp_node) scalar_shift_arg = false; And for slp_node I'm just giving up if type modes don't match: + { + if (slp_node + TYPE_MODE (TREE_TYPE (vectype)) + != TYPE_MODE (TREE_TYPE (op1))) + { + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, unusable type for last operand in + vector/vector shift/rotate.); + return false; + } Ah, OK. 
BTW, even the pre-existing if (dt[1] == vect_constant_def) doesn't seem to be 100% correct for slp_node != NULL, I think vect_get_constant_vectors will in that case create a VECTOR_CST with the desirable vector type (same type mode as op0's vector type mode), but the constants in the VECTOR_CST will have a wrong type (say V4DImode VECTOR_CST with SImode constants in its constructor). The expander doesn't ICE on it though. Right. As you wrote before, we should probably change shift vectors creation for SLP. The patch is OK. Thanks, Ira Jakub
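
As a source-level analogue of what the patch does for the loop case (casting an external shift count to op0's element type once, before the loop), consider the following C sketch. It is not vectorizer code; shift_all is a made-up function that only illustrates the hoisted conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Shift every element of D (64-bit) left by COUNT, whose type is
   narrower than the element type.  */
static void
shift_all (long long *out, const long long *d, size_t n, int count)
{
  /* Convert the loop-invariant count to the element type once, before
     the loop, so both shift operands have the same type inside it.  */
  const long long wide_count = (long long) count;
  size_t i;

  assert (count >= 0 && count < 64);
  for (i = 0; i < n; i++)
    /* Shift as unsigned to avoid undefined behavior on negative
       elements.  */
    out[i] = (long long) ((unsigned long long) d[i] << wide_count);
}
```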
Re: [patch, Fortran] Fix PR 50690
Tobias Burnus wrote: I had also a glance at the patch - and it looks reasonable; in particular, I failed to generate a failing test case. Actually, the test case is *not* OK. If one compiles the original test case of the PR (or your workshare2.f90) with -O and looks at -fdump-tree-original, one finds: #pragma omp parallel default(shared) { { real(kind=4) __var_1; { #pragma omp single { __var_1 = __builtin_cosf (b[0]) } ... #pragma omp for schedule(static) nowait for (S.1 = 1; S.1 = 5; S.1 = S.1 + 1) { a[S.1 + -1] = a[S.1 + -1] * D.1730 + a[S.1 + -1] * D.1731; Thus, __var_1 is a thread-local variable; however, COS() is not executed in all threads but only in one due to the omp single: The single construct specifies that the associated structured block is executed by only one of the threads in the team (2.5.3 single Construct, OpenMP 3.1). Jakub remarks that omp single is what we expand to omp workshare if it is not simple enough for us. * * * With the test case below, the dump looks OK, but the FE optimization does not combine the two cos() calls - I have no idea why. The dump looks as: #pragma omp parallel default(shared) { D.1743 = __builtin_cosf (b[0]); D.1745 = __builtin_cosf (b[0]); ... 
      #pragma omp for schedule(static) nowait
      for (S.2 = 1; S.2 <= 10; S.2 = S.2 + 1)
        a[S.2 + D.1750] = a[S.2 + D.1748] * D.1743 + a[S.2 + D.1749] * D.1745;

Tobias

PS: The test case is:

program workshare
  implicit none
  real, parameter :: eps = 3e-7
  integer :: j
  real :: A(10,5), B(5)
  B(1) = 3.344
  call random_number(a)
!$omp parallel default(shared)
!$omp workshare
  forall (j=1:5)
    A(:,j) = A(:,j)*cos(B(1))+A(:,j)*cos(B(1))
  end forall
!$omp end workshare
!$omp end parallel
  print *, A
end program workshare

subroutine parallel_workshare
  implicit none
  real, parameter :: eps = 3e-7
  integer :: j
  real :: A(10,5), B(5)
  B(1) = 3.344
  call random_number(a)
!$omp parallel workshare default(shared)
  forall (j=1:5)
    A(:,j) = A(:,j)*cos(B(1))+A(:,j)*cos(B(1))
  end forall
!$omp end parallel workshare
  print *, A
end subroutine parallel_workshare
fixes after the review (issue5303083)
Fixes after davidxl review. The patch is for google/main branch. 2011-10-31 Dmitriy Vyukov dvyu...@google.com * gcc/doc/invoke.texi: * gcc/tree-tsan.c (enum tsan_ignore_type): (struct bb_data): (struct mop_desc): (struct tsan_ignore_desc): (lookup_name): (build_var_decl): (get_shadow_stack_decl): (get_thread_ignore_decl): (get_handle_mop_decl): (ignore_append): (ignore_match): (ignore_load): (tsan_ignore): (decl_name): (build_stack_op): (build_rec_ignore_op): (build_stack_assign): (instr_mop): (instr_vptr_store): (instr_func): (set_location): (is_dtor_vptr_store): (is_vtbl_read): (is_load_of_const): (handle_expr): (handle_gimple): (instrument_bblock): (instrument_mops): (instrument_function): (tsan_pass): (tsan_gate): * gcc/tree-pass.h: * gcc/testsuite/gcc.dg/tsan-ignore.ignore: * gcc/testsuite/gcc.dg/tsan.h (__tsan_init): (__tsan_expect_mop): (__tsan_handle_mop): * gcc/testsuite/gcc.dg/tsan-ignore.c (foo): (int bar): (int baz): (int bla): (int xxx): (main): * gcc/testsuite/gcc.dg/tsan-ignore.h (in_tsan_ignore_header): * gcc/testsuite/gcc.dg/tsan-stack.c (foobar): * gcc/testsuite/gcc.dg/tsan-mop.c: * gcc/common.opt: * gcc/Makefile.in: * gcc/passes.c: Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 180522) +++ gcc/doc/invoke.texi (working copy) @@ -308,6 +308,7 @@ -fdump-tree-ssa@r{[}-@var{n}@r{]} -fdump-tree-pre@r{[}-@var{n}@r{]} @gol -fdump-tree-ccp@r{[}-@var{n}@r{]} -fdump-tree-dce@r{[}-@var{n}@r{]} @gol -fdump-tree-gimple@r{[}-raw@r{]} -fdump-tree-mudflap@r{[}-@var{n}@r{]} @gol +-fdump-tree-tsan@r{[}-@var{n}@r{]} @gol -fdump-tree-dom@r{[}-@var{n}@r{]} @gol -fdump-tree-dse@r{[}-@var{n}@r{]} @gol -fdump-tree-phiprop@r{[}-@var{n}@r{]} @gol @@ -381,8 +382,8 @@ -floop-parallelize-all -flto -flto-compression-level @gol -flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol --fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg @gol 
--fno-default-inline @gol +-fmove-loop-invariants -fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg @gol +-ftsan -ftsan-ignore -fno-default-inline @gol -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol -fno-inline -fno-math-errno -fno-peephole -fno-peephole2 @gol -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol @@ -5896,6 +5897,11 @@ Dump each function after adding mudflap instrumentation. The file name is made by appending @file{.mudflap} to the source file name. +@item tsan +@opindex fdump-tree-tsan +Dump each function after adding ThreadSanitizer instrumentation. The file name is +made by appending @file{.tsan} to the source file name. + @item sra @opindex fdump-tree-sra Dump each function after performing scalar replacement of aggregates. The @@ -6674,6 +6680,12 @@ some protection against outright memory corrupting writes, but allows erroneously read data to propagate within a program. +@item -ftsan -ftsan-ignore +@opindex ftsan +@opindex ftsan-ignore +Add ThreadSanitizer instrumentation. Use @option{-ftsan-ignore} to specify +an ignore file. Refer to http://go/tsan for details. + @item -fthread-jumps @opindex fthread-jumps Perform optimizations where we check to see if a jump branches to a Index: gcc/tree-tsan.c === --- gcc/tree-tsan.c (revision 0) +++ gcc/tree-tsan.c (revision 0) @@ -0,0 +1,1125 @@ +/* ThreadSanitizer instrumentation pass. + http://code.google.com/p/data-race-test + Copyright (C) 2011 + Free Software Foundation, Inc. + Contributed by Dmitry Vyukov dvyu...@google.com + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ + +#include config.h +#include system.h +#include coretypes.h +#include tree.h +#include intl.h +#include tm.h +#include basic-block.h +#include gimple.h +#include function.h +#include tree-flow.h +#include tree-pass.h +#include cfghooks.h +#include
Re: [PATCH] Optimize in RTL vector AND { -1, -1, ... }, IOR { -1, -1, ... } and XOR { -1, -1, ... } (take 2)
2011-09-26 Jakub Jelinek ja...@redhat.com * rtl.h (const_tiny_rtx): Change into array of 4 x MAX_MACHINE_MODE from 3 x MAX_MACHINE_MODE. (CONSTM1_RTX): Define. * emit-rtl.c (const_tiny_rtx): Change into array of 4 x MAX_MACHINE_MODE from 3 x MAX_MACHINE_MODE. (gen_rtx_CONST_VECTOR): Use CONSTM1_RTX if all inner constants are CONSTM1_RTX. (init_emit_once): Initialize CONSTM1_RTX for MODE_INT and MODE_VECTOR_INT modes. * simplify-rtx.c (simplify_binary_operation_1) case IOR, XOR, AND: Optimize if one operand is CONSTM1_RTX. * config/i386/i386.c (ix86_expand_sse_movcc): Optimize mask ? -1 : x into mask | x. FYI - this patch (179238) breaks the Blackfin compiler build with an internal compiler error during configure of libgcc: conftest.c:1:0: internal compiler error: in gen_const_vector, at emit-rtl.c:5491 which is the: gcc_assert (const_tiny_rtx[constant][(int) inner]); gcc configured with: ../gcc-4.7/configure --build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --target=bfin-elf --prefix=/home/shender/gnu/toolchain/bfin-elf --disable-libstdcxx-pch --enable-languages=c,c++ --with-newlib --enable-clocale=generic --disable-symvers --disable-libssp --disable-libffi --disable-libgcj --enable-version-specific-runtime-libs --enable-__cxa_atexit Stu
Re: [RFC PATCH] update to libtool-2.4.2 and regenerate
Markus Trippelsdorf mar...@trippelsdorf.de writes: By popular demand, I've prepared a patch that updates the in-tree libtool to version 2.4.2. It is needed for lto-bootstrap with -fno-fat-lto-objects and FreeBSD10.x versions. It's a pretty big update as you can see by the following diffstat. I cannot attach the patch even as a gzip file, because of its size: 417745 Oct 28 00:47 0001-update-to-libtool-2.4.2-and-regenerate.patch.gz Bootstrapped on x86_64-pc-linux-gnu. Comments? Stage 1 will end soon and it would be nice to get this in. I've tried your patch on i386-pc-solaris2.11 this weekend in a variety of configurations: * using ld or gld 2.21.1, * with the 32-bit default configuration (i386-pc-solaris2.11) and the 64-bit default configuration (amd64-pc-solaris2.11). This revealed a couple of problems: * If Go support is included (off by default), bootstrap breaks like this while building libgo: libtool: Version mismatch error. This is libtool 2.4.2, but the libtool: definition of this LT_INIT comes from libtool 2.2.7a. libtool: You should recreate aclocal.m4 with macros from libtool 2.4.2 libtool: and run autoconf again. make[4]: *** [go-assert.lo] Error 63 To avoid this, I've run all bootstraps without Go. * When building the 64-bit default gld configuration, building the 64-bit libjava fails like this: Error libtool: compile: not configured to build any kind of library libtool: compile: See the libtool documentation for more information. make[5]: libtool: compile: Fatal configuration error. I had already patched the copy of libtool.m4 to deal with the new configuration and now also submitted it upstream: Support 64-bit default GCC on Solaris/x86 http://lists.gnu.org/archive/html/libtool-patches/2011-10/msg00021.html After applying the patch and regenerating all affected configure scripts, the bootstrap completed. With those two changes, all four bootstraps completed without regressions. 
I made a quick comparison of the libtool.m4 in libgo/config with the 2.4.2 version: the only relevant change seems to be an instance of AC_PROG_GO, which also lives in go.m4. Ian will know why that additional copy is necessary. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)
On Sat, Oct 29, 2011 at 03:53:37PM +0200, Toon Moene wrote: I wonder whether it will work with the attached Fortran routine - it sure would mean a boost to the 18%+ heaviest CPU user in our code. It didn't do anything, but only because I used a bad approach in vect_check_gather. I have been using DR_BASE_ADDRESS/DR_OFFSET/DR_INIT into which dr_analyze_innermost splits the reference, but that split is into something hopefully usable for alias analysis, variable and constant offset using split_constant_offset that would create largish expressions in your testcase, which add many values together. But for gather vectorization we'd either have to gimplify such expressions back before the load and allow them to be vectorized, or, as done in this incremental patch, I instead do something similar to what dr_analyze_innermost does, but with different goal - to split stuff off into a loop invariant that can be computed before the loop (and will be put into the scalar part of gather), and a SSA_NAME defined in the loop which contains the rest (plus optionally sign/zero extending that into a wider type and/or scaling by 2/4/8. With this incremental patch I get 4 loops in this testcase with -O3 -mavx2 vectorized (compared to 0 before), with 262 vgather* insns. 
On x86_64 unfortunately this doesn't figure out that it could do all the additions for the variable index in 32-bit type and then sign extend: idx_202 = *kp_201(D)[D.1941_200]; idy_209 = *kq_208(D)[D.1941_200]; ilev_216 = *kr_215(D)[D.1941_200]; D.1955_229 = *pgama_228(D)[D.1941_200]; D.1960_237 = *pbeta_236(D)[D.1941_200]; D.1965_245 = *palfa_244(D)[D.1941_200]; D.1966_246 = ilev_216 + -1; D.1967_247 = (integer(kind=8)) D.1966_246; D.1968_248 = D.1967_247 * stride.32_141; D.1969_249 = D.1968_248 + offset.33_155; D.1970_250 = idy_209 + -1; D.1971_251 = (integer(kind=8)) D.1970_250; D.1972_252 = D.1971_251 * stride.30_129; D.1973_253 = D.1969_249 + D.1972_252; D.1974_254 = idx_202 + -1; D.1975_255 = (integer(kind=8)) D.1974_254; D.1976_256 = D.1973_253 + D.1975_255; D.1977_258 = *parg_257(D)[D.1976_256]; so for -m64 it emits vgatherqps instructions (V4DImode indexes, loads V4SFmode values) and then merges those, while for -m32 it emits just 131 vgatherdps instructions (V8SImode indexes, V8SFmode values). Would be nice to cut down slightly this testcase into just one or two loops that are vectorized and turn it into a runtime testcase which verifies the vectorization was correct. 2011-10-31 Jakub Jelinek ja...@redhat.com * tree-vect-stmts.c (vectorizable_load): Don't add DR_INIT (dr) to ptr. * tree-vect-data-refs.c (vect_check_gather): Rewritten not to use DR_BASE_ADDRESS or DR_OFFSET, instead call get_inner_reference and try to separate in between base and off. 
--- gcc/tree-vect-stmts.c.jj2011-10-31 12:13:45.0 +0100 +++ gcc/tree-vect-stmts.c 2011-10-31 13:21:13.0 +0100 @@ -4452,8 +4452,6 @@ vectorizable_load (gimple stmt, gimple_s vec_dest = vect_create_destination_var (scalar_dest, vectype); ptr = fold_convert (ptrtype, gather_base); - ptr = fold_build2 (POINTER_PLUS_EXPR, ptrtype, ptr, -fold_convert (sizetype, DR_INIT (dr))); if (!is_gimple_min_invariant (ptr)) { ptr = force_gimple_operand (ptr, seq, true, NULL_TREE); --- gcc/tree-vect-data-refs.c.jj2011-10-31 12:13:45.0 +0100 +++ gcc/tree-vect-data-refs.c 2011-10-31 14:53:18.0 +0100 @@ -2504,109 +2504,156 @@ tree vect_check_gather (gimple stmt, loop_vec_info loop_vinfo, tree *basep, tree *offp, int *scalep) { - HOST_WIDE_INT scale = 1; + HOST_WIDE_INT scale = 1, pbitpos, pbitsize; struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); stmt_vec_info stmt_info = vinfo_for_stmt (stmt); struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); tree offtype = NULL_TREE; - tree base = DR_BASE_ADDRESS (dr); - tree off = DR_OFFSET (dr); - tree decl; - - if (TREE_CODE (base) == POINTER_PLUS_EXPR - integer_zerop (off) - TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME - !chrec_contains_symbols_defined_in_loop (TREE_OPERAND (base, 0), - loop-num)) + tree decl, base, off; + enum machine_mode pmode; + int punsignedp, pvolatilep; + + base = get_inner_reference (DR_REF (dr), pbitsize, pbitpos, off, + pmode, punsignedp, pvolatilep, false); + gcc_assert (base != NULL_TREE (pbitpos % BITS_PER_UNIT) == 0); + + if (TREE_CODE (base) == MEM_REF) { - off = TREE_OPERAND (base, 1); + if (!integer_zerop (TREE_OPERAND (base, 1))) + { + if (off == NULL_TREE) + { + double_int moff = mem_ref_offset (base); + off = double_int_to_tree (sizetype, moff); + } + else + off = size_binop (PLUS_EXPR, off, TREE_OPERAND (base, 1)); + } base =
Re: PowerPC shrink-wrap support 3 of 3
So I'm at the point where I'm reasonably happy with this work. This patch doesn't do anything particularly clever regarding our shrink-wrap implementation. We still only insert one copy of the prologue, and one of the epilogue in thread_prologue_and_epilogue. All it really does is replaces Bernd's !last_bb_active code (allowing one tail block with no active insns to be shared by paths needing a prologue and paths not needing a prologue), with what I think is conceptually simpler, duplicating a shared tail block. Then I extend this to duplicating a chain of tail blocks. That leads to some simplification as all the special cases and restrictions of !last_bb_active disappear. For example, convert_jumps_to_returns looks much like the code in gcc-4.6. We also get many more functions being shrink-wrapped. Some numbers from my latest gcc bootstraps: powerpc-linux .../gcc-virgin/gcc grep 'Performing shrink' *.pro_and_epilogue | wc -l 453 .../gcc-curr/gcc grep 'Performing shrink' *.pro_and_epilogue | wc -l 648 i686-linux .../gcc-virgin/gcc$ grep 'Performing shrink' *pro_and_epilogue | wc -l 329 .../gcc-curr/gcc$ grep 'Performing shrink' *.pro_and_epilogue | wc -l 416 Bits left to do - limit size of duplicated tails - don't duplicate sibling call blocks, but instead split the block after the sibling call epilogue has been added, redirecting non-prologue paths past the epilogue. Is this OK to apply as is? * function.c (bb_active_p): Delete. (dup_block_and_redirect, active_insn_between): New functions. (convert_jumps_to_returns, emit_return_for_exit): New functions, split out from.. (thread_prologue_and_epilogue_insns): ..here. Delete shadowing variables. Don't do prologue register clobber tests when shrink wrapping already failed. Delete all last_bb_active code. Instead compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. 
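For readers unfamiliar with the term, the sketch below shows the kind of function shape that shrink-wrapping is aimed at: an early exit that needs no stack frame, followed by a body that does. It is an assumed illustration, not code from the patch or from the GCC testsuite; `accumulate` and `checked_total` are made-up names, and whether a particular compiler actually shrink-wraps this function depends on the target and options.

```c
#include <stddef.h>

/* Made-up helper; noinline (a GCC extension) keeps the calls visible
   in the caller instead of letting them be inlined away.  */
__attribute__((noinline))
static long accumulate(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

long checked_total(const int *v, size_t n)
{
    /* Early exit: nothing on this path needs saved registers or a
       frame, so a shrink-wrapping compiler may reach the return
       without executing the prologue.  */
    if (v == NULL || n == 0)
        return 0;

    /* This path makes calls and keeps values live across them, so it
       is the part that needs the prologue and epilogue.  */
    return accumulate(v, n) + accumulate(v, n / 2);
}
```

In terms of the description above, the early return plays the role of a tail block reached from a path that needs no prologue, which is the situation the tail duplication is meant to handle.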
Index: gcc/function.c === *** gcc/function.c (revision 180588) --- gcc/function.c (working copy) *** set_return_jump_label (rtx returnjump) *** 5514,5535 JUMP_LABEL (returnjump) = ret_rtx; } ! /* Return true if BB has any active insns. */ static bool ! bb_active_p (basic_block bb) { rtx label; ! /* Test whether there are active instructions in BB. */ ! label = BB_END (bb); ! while (label !LABEL_P (label)) { ! if (active_insn_p (label)) ! break; ! label = PREV_INSN (label); } ! return BB_HEAD (bb) != label || !LABEL_P (label); } /* Generate the prologue and epilogue RTL if the machine supports it. Thread this into place with notes indicating where the prologue ends and where --- 5514,5698 JUMP_LABEL (returnjump) = ret_rtx; } ! #ifdef HAVE_simple_return ! /* Create a copy of BB instructions and insert at BEFORE. Redirect !preds of BB to COPY_BB if they don't appear in NEED_PROLOGUE. */ ! static void ! dup_block_and_redirect (basic_block bb, basic_block copy_bb, rtx before, ! bitmap_head *need_prologue) ! { ! edge_iterator ei; ! edge e; ! rtx insn = BB_END (bb); ! ! /* We know BB has a single successor, so there is no need to copy a ! simple jump at the end of BB. */ ! if (simplejump_p (insn)) ! insn = PREV_INSN (insn); ! ! start_sequence (); ! duplicate_insn_chain (BB_HEAD (bb), insn); ! if (dump_file) ! { ! unsigned count = 0; ! for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) ! if (active_insn_p (insn)) ! ++count; ! fprintf (dump_file, Duplicating bb %d to bb %d, %u active insns.\n, ! bb-index, copy_bb-index, count); ! } ! insn = get_insns (); ! end_sequence (); ! emit_insn_before (insn, before); ! ! /* Redirect all the paths that need no prologue into copy_bb. */ ! for (ei = ei_start (bb-preds); (e = ei_safe_edge (ei)); ) ! if (!bitmap_bit_p (need_prologue, e-src-index)) ! { ! redirect_edge_and_branch_force (e, copy_bb); ! continue; ! } ! else ! ei_next (ei); ! } ! #endif ! ! #if defined (HAVE_return) || defined (HAVE_simple_return) ! 
/* Return true if there are any active insns between HEAD and TAIL. */ static bool ! active_insn_between (rtx head, rtx tail) ! { ! while (tail) ! { ! if (active_insn_p (tail)) ! return true; ! if (tail == head) ! return false; ! tail = PREV_INSN (tail); ! } ! return false; ! } ! ! /* LAST_BB is a block that exits, and empty of active instructions. !Examine its predecessors for jumps that can be converted to !(conditional) returns. */ ! static VEC (edge, heap) * ! convert_jumps_to_returns (basic_block last_bb, bool simple_p, !
Re: [PATCH] Miscompilation of __attribute__((constructor)) functions.
Ok if you move the clearing to after

  /* Generate a new name for the new version. */
  DECL_NAME (new_decl) = clone_function_name (old_decl, clone_name);
  SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
  SET_DECL_RTL (new_decl, NULL);

using new_decl directly, thus add

  /* When the old decl was a con-/destructor make sure the clone isn't. */
  DECL_STATIC_CONSTRUCTOR(new_decl) = 0;
  DECL_STATIC_DESTRUCTOR(new_decl) = 0;

Done, and applied.

Paul
Re: [C++ preview patch] PR 44277
How does it work to warn in convert_like_real instead? Jason
Re: [Patch, libfortran, 3/3] Update file position lazily
On Sun, Oct 30, 2011 at 01:29, Janne Blomqvist blomqvist.ja...@gmail.com wrote:
On Sat, Oct 29, 2011 at 18:35, Mikael Morin mikael.mo...@sfr.fr wrote:
On Saturday 29 October 2011 14:43:22 Mikael Morin wrote:

FWIW, it seems ifort 12.0 uses UNDEFINED in this case; I suppose a case could be made for using the same. Comments?

Let's go for UNDEFINED then.

On second thought, UNSPECIFIED is better as UNDEFINED is for another case.

Hmm, indeed, on second thought I agree as well.

I just committed all 3 parts of this patch series. Parts 1 and 2 verbatim, and 3 also verbatim except with the following for inquire_5.f90:

Index: gcc/testsuite/gfortran.dg/inquire_5.f90
===
--- gcc/testsuite/gfortran.dg/inquire_5.f90 (revision 180700)
+++ gcc/testsuite/gfortran.dg/inquire_5.f90 (working copy)
@@ -1,11 +1,10 @@
 ! { dg-do run { target fd_truncate } }
-! { dg-options "-std=legacy" }
 !
 ! pr19314 inquire(..position=..) segfaults
 ! test by thomas.koe...@online.de
 ! bdavis9...@comcast.net
       implicit none
-      character*20 chr
+      character(len=20) chr
       open(7,STATUS='SCRATCH')
       inquire(7,position=chr)
       if (chr.NE.'ASIS') CALL ABORT
@@ -31,7 +30,7 @@
       write(7,*)'this is another record'
       backspace(7)
       inquire(7,position=chr)
-      if (chr.NE.'ASIS') CALL ABORT
+      if (chr .NE. 'UNSPECIFIED') CALL ABORT
       rewind(7)
       inquire(7,position=chr)
       if (chr.NE.'REWIND') CALL ABORT

(That is, test the returned value explicitly rather than test for standards conformance as in the original patch.)

--
Janne Blomqvist
Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)
On Mon, Oct 31, 2011 at 03:23:32PM +0100, Jakub Jelinek wrote:
Would be nice to cut down slightly this testcase into just one or two loops that are vectorized and turn it into a runtime testcase which verifies the vectorization was correct.

Here is one such testcase (though in your case there are no loads for the indexes; on the other hand, you have 3 of the IVs each multiplied by some loop invariant and all added together, and there are far more expressions in your case).

2011-10-31 Jakub Jelinek ja...@redhat.com

	* gcc.target/i386/avx2-gather-4.c: New test.

--- gcc/testsuite/gcc.target/i386/avx2-gather-4.c.jj	2011-10-31 15:58:57.0 +0100
+++ gcc/testsuite/gcc.target/i386/avx2-gather-4.c	2011-10-31 15:59:44.0 +0100
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx2 } */
+/* { dg-options "-O3 -mavx2" } */
+
+#include "avx2-check.h"
+
+#define N 1024
+int a[N], b[N], c[N], d[N];
+
+__attribute__((noinline, noclone)) void
+foo (float *__restrict p, float *__restrict q, float *__restrict r,
+     long s1, long s2, long s3)
+{
+  int i;
+  for (i = 0; i < N; i++)
+    p[i] = q[a[i] * s1 + b[i] * s2 + s3] * r[c[i] * s1 + d[i] * s2 + s3];
+}
+
+static void
+avx2_test (void)
+{
+  int i;
+  float e[N], f[N], g[N];
+  for (i = 0; i < N; i++)
+    {
+      a[i] = (i * 7) & (N / 8 - 1);
+      b[i] = (i * 13) & (N / 8 - 1);
+      c[i] = (i * 23) & (N / 8 - 1);
+      d[i] = (i * 5) & (N / 8 - 1);
+      e[i] = 16.5 + i;
+      f[i] = 127.5 - i;
+    }
+  foo (g, e, f, 3, 2, 4);
+  for (i = 0; i < N; i++)
+    if (g[i] != (float) ((20.5 + a[i] * 3 + b[i] * 2)
+                         * (123.5 - c[i] * 3 - d[i] * 2)))
+      abort ();
+}

	Jakub
Re: Go patch committed: Update Go library
Ian Lance Taylor i...@google.com writes: This patch updates the Go library to the most recent weekly release. I think the only potential portability issues here are the use of the ipv6_mreq struct. I'm not entirely sure the new exp/terminal package is portable, but it might be. I have not included the entire patch here, because it is too large and it's just copying changes anyhow. I've included all patches to files which are specific to the Go frontend version. After this change, I'm seeing another issue: most 32-bit go execution tests fail like this on Solaris 11/x86: /vol/gcc/src/hg/trunk/local/libgo/runtime/malloc.goc:366: libgo assertion failure FAIL: go.go-torture/execute/array-1.go execution, -O0 Running the test under truss, I find: 14261: mmap(0xFF00, 805306368, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0) Err#12 ENOMEM With truss -u (user function tracing), I see: 14285/1@1: - libgo:runtime_mallocinit() 14285/1@1:- libgo:runtime_InitSizes() 14285/1@1:- libgo:runtime_InitSizes() = 2 14285/1@1:- libgo:runtime_SysReserve() 14285/1:mmap(0xFF00, 805306368, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0) Err#12 ENOMEM 14285/1@1:- libgo:runtime_SysReserve() = -1 14285/1@1:- libgo:__go_assert_fail() If I remove the adjustment in runtime/malloc.goc (runtime_mallocinit), the test passes: 14445/1:mmap(0xFEF78114, 805306368, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xCE00 This stuff seems incredibly fragile, and I don't exactly understand why. Besides, the assertion failure above is strange/wrong in two ways: /vol/gcc/src/hg/trunk/local/libgo/runtime/malloc.goc:366: libgo assertion failure * I'd expect to see the message from runtime_throw() here, not just `libgo assertion failure'. 
* The message points to the wrong line due to a broken test: malloc.goc has:

    p = runtime_SysReserve((void*)(0x00f8ULL<<32), bitmap_size + arena_size);
    if(p == nil)
        runtime_throw("runtime: cannot reserve arena virtual address space");

  On failure, p will be MAP_FAILED ((void *)-1), not nil, so the wrong assertion is thrown.

	Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
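The MAP_FAILED point can be sketched in plain C. The wrapper below is a hypothetical helper written for this note (it is not libgo's runtime_SysReserve): it uses the same mmap flags that appear in the truss output and converts the failure value into a null pointer, so that a caller's comparison against nil/NULL behaves as intended.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose MAP_ANON under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical wrapper: reserve SIZE bytes of address space near HINT.
   mmap reports failure as MAP_FAILED, i.e. (void *) -1, never as NULL,
   so the result is translated here before any caller tests it.  */
void *reserve_or_null(void *hint, size_t size)
{
    void *p = mmap(hint, size, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

With a wrapper of this shape, a failed reservation takes the `p == nil` branch instead of falling through to a later, unrelated assertion.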
Re: [C++ preview patch] PR 44277
Hi,

How does it work to warn in convert_like_real instead?

The problem is that (expr, totype) can be a lot of different things for which we want to warn: it can be a zero and a pointer for assignments, but when totype is a BOOLEAN_TYPE, expr can be an EQ_EXPR or NE_EXPR, and then the operands can be various things depending on whether we are looking for a pointer, a pointer to data member, etc., on the left or on the right of the == or != sign. In other words, the pattern matching doesn't seem to be a matter of a few lines. I'm annoyed by this. Also, for the assignment case, I'm getting duplicate warnings, which can maybe be fixed.

Do you think there is no neat way to implement my idea of avoiding generating those implicit 0s in the first place? The internals of the front end seem very C++98-ish for null pointers ;)

Paolo.
Re: C++ PATCH to add -std=c++11 ??
On 10/31/2011 06:39 AM, Paolo Carlini wrote: Great. When you commit it, you can as well add 'PR c++/50920' to the ChangeLog! OK, here's what I'm checking in. There are a lot more instances of C++0x in comments and cxx_dialect checks, but I'm not going to worry about those now. Tested x86_64-pc-linux-gnu. commit 12395569015d26ee38609653bf9b589961f546e2 Author: Jason Merrill ja...@redhat.com Date: Fri Aug 12 17:09:47 2011 -0400 PR c++/50920 gcc/c-family * c-common.h (cxx_dialect): Add cxx11 and cxx03. * c.opt: Add -std=c++11, -std=gnu++11, -std=gnu++03, and -Wc++11-compat. * c-opts.c (set_std_cxx11): Rename from set_std_cxx0x. gcc/cp * class.c (check_field_decl): Change c++0x in diags to c++11. * error.c (maybe_warn_cpp0x): Likewise. * parser.c (cp_parser_diagnose_invalid_type_name): Likewise. * pt.c (check_default_tmpl_args): Likewise. libcpp * include/cpplib.h (enum c_lang): Rename CLK_CXX0X to CLK_CXX11, CLK_GNUCXX0X to CLK_GNUCXX11. libstdc++-v3 * include/bits/c++0x_warning.h: Change -std=c++0x to -std=c++11. diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index be9d729..71746a9 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -643,11 +643,12 @@ extern int flag_use_repository; /* The supported C++ dialects. */ enum cxx_dialect { - /* C++98 */ + /* C++98 with TC1 */ cxx98, - /* Experimental features that are likely to become part of - C++0x. */ - cxx0x + cxx03 = cxx98, + /* C++11 */ + cxx0x, + cxx11 = cxx0x }; /* The C++ dialect being used. C++98 is the default. 
*/ diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c index 6869d5c..b56aec7 100644 --- a/gcc/c-family/c-opts.c +++ b/gcc/c-family/c-opts.c @@ -110,7 +110,7 @@ static size_t include_cursor; static void handle_OPT_d (const char *); static void set_std_cxx98 (int); -static void set_std_cxx0x (int); +static void set_std_cxx11 (int); static void set_std_c89 (int, int); static void set_std_c99 (int); static void set_std_c1x (int); @@ -775,10 +775,10 @@ c_common_handle_option (size_t scode, const char *arg, int value, set_std_cxx98 (code == OPT_std_c__98 /* ISO */); break; -case OPT_std_c__0x: -case OPT_std_gnu__0x: +case OPT_std_c__11: +case OPT_std_gnu__11: if (!preprocessing_asm_p) - set_std_cxx0x (code == OPT_std_c__0x /* ISO */); + set_std_cxx11 (code == OPT_std_c__11 /* ISO */); break; case OPT_std_c90: @@ -1501,18 +1501,18 @@ set_std_cxx98 (int iso) cxx_dialect = cxx98; } -/* Set the C++ 0x working draft standard (without GNU extensions if ISO). */ +/* Set the C++ 2011 standard (without GNU extensions if ISO). */ static void -set_std_cxx0x (int iso) +set_std_cxx11 (int iso) { - cpp_set_lang (parse_in, iso ? CLK_CXX0X: CLK_GNUCXX0X); + cpp_set_lang (parse_in, iso ? CLK_CXX11: CLK_GNUCXX11); flag_no_gnu_keywords = iso; flag_no_nonansi_builtin = iso; flag_iso = iso; - /* C++0x includes the C99 standard library. */ + /* C++11 includes the C99 standard library. */ flag_isoc94 = 1; flag_isoc99 = 1; - cxx_dialect = cxx0x; + cxx_dialect = cxx11; } /* Args to -d specify what to dump. 
Silently ignore diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 693f191..336a75a 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -289,7 +289,11 @@ Warn about C constructs that are not in the common subset of C and C++ Wc++0x-compat C++ ObjC++ Var(warn_cxx0x_compat) Warning -Warn about C++ constructs whose meaning differs between ISO C++ 1998 and ISO C++ 200x +Deprecated in favor of -Wc++11-compat + +Wc++11-compat +C++ ObjC++ Warning Alias(Wc++0x-compat) +Warn about C++ constructs whose meaning differs between ISO C++ 1998 and ISO C++ 2011 Wcast-qual C ObjC C++ ObjC++ Var(warn_cast_qual) Warning @@ -1175,12 +1179,13 @@ std=c++03 C++ ObjC++ Alias(std=c++98) Conform to the ISO 1998 C++ standard revised by the 2003 technical corrigendum +std=c++11 +C++ ObjC++ +Conform to the ISO 2011 C++ standard (experimental and incomplete support) + std=c++0x -C++ ObjC++ -Conform to the ISO 1998 C++ standard, with extensions that are likely to -become a part of the upcoming ISO C++ standard, dubbed C++0x. Note that the -extensions enabled by this mode are experimental and may be removed in -future releases of GCC. +C++ ObjC++ Alias(std=c++11) +Deprecated in favor of -std=c++11 std=c1x C ObjC @@ -1204,14 +1209,21 @@ Deprecated in favor of -std=c99 std=gnu++98 C++ ObjC++ -Conform to the ISO 1998 C++ standard with GNU extensions +Conform to the ISO 1998 C++ standard revised by the 2003 technical +corrigendum with GNU extensions + +std=gnu++03 +C++ ObjC++ Alias(std=gnu++98) +Conform to the ISO 1998 C++ standard revised by the 2003 technical +corrigendum with GNU extensions + +std=gnu++11 +C++ ObjC++ +Conform to the ISO 2011 C++ standard with GNU extensions (experimental and incomplete support)
Re: [libcpp] Correctly define __cplusplus (PR libstdc++-v3/1773)
On 10/21/2011 03:52 PM, Jason Merrill wrote: On 10/21/2011 03:11 PM, Marc Glisse wrote: Note that at least clang now defines __cplusplus to its new C++11 value (in experimental C++0X mode only). Apparently they switched around last June and say they are not the only ones. So if you want to follow their lead... Hmm, between that and the fact that 4.7 will in fact have almost all of the C++11 features, I think changing the value makes sense. Thus: commit f6f3e056eac1f9bcdc2ba0459723665dafd57396 Author: Jason Merrill ja...@redhat.com Date: Mon Oct 31 11:26:25 2011 -0400 PR libstdc++/1773 * init.c (cpp_init_builtins): Set __cplusplus for C++11. diff --git a/libcpp/init.c b/libcpp/init.c index bbaa8ae..9101b34 100644 --- a/libcpp/init.c +++ b/libcpp/init.c @@ -461,7 +461,13 @@ cpp_init_builtins (cpp_reader *pfile, int hosted) _cpp_define_builtin (pfile, __STDC__ 1); if (CPP_OPTION (pfile, cplusplus)) -_cpp_define_builtin (pfile, __cplusplus 199711L); +{ + if (CPP_OPTION (pfile, lang) == CLK_CXX11 + || CPP_OPTION (pfile, lang) == CLK_GNUCXX11) + _cpp_define_builtin (pfile, __cplusplus 201103L); + else + _cpp_define_builtin (pfile, __cplusplus 199711L); +} else if (CPP_OPTION (pfile, lang) == CLK_ASM) _cpp_define_builtin (pfile, __ASSEMBLER__ 1); else if (CPP_OPTION (pfile, lang) == CLK_STDC94)
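From the user's side, the effect of this change is that code can test the value of __cplusplus to detect C++11 mode. The fragment below is a minimal sketch of such a check for a header shared between C and C++; the macro name BUILT_AS_CXX11 and the function built_as_cxx11 are made up for illustration, while the values 199711L and 201103L are the ones in the patch.

```c
/* __cplusplus is not defined at all when compiling as C, so test for
   its presence first; 201103L is the value the patch sets for the
   C++11 dialects, 199711L remains the value for C++98.  */
#if defined(__cplusplus) && __cplusplus >= 201103L
# define BUILT_AS_CXX11 1   /* made-up macro name */
#else
# define BUILT_AS_CXX11 0
#endif

/* Made-up accessor so the result can be inspected at run time.  */
int built_as_cxx11(void)
{
    return BUILT_AS_CXX11;
}
```

Before a change like this, such a check could not tell the two dialects apart, since the macro was defined as 199711L for every C++ mode.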
Re: [RFC PATCH] update to libtool-2.4.2 and regenerate
Rainer Orth r...@cebitec.uni-bielefeld.de writes: I made a quick comparison of the libtool.m4 in libgo/config with the 2.4.2 version: the only relevant change seems to be an instance of AC_PROG_GO, which also lives in go.m4. Ian will know why that additional copy is necessary. The version of AC_PROG_GO in libgo/config/go.m4 is there so that I can rebuild libgo/configure with a version of autoconf that does not have Go support. The version of AC_PROG_GO in libgo/config/libtool.m4 is there because all the languages work that way in libtool.m4. Ian
Re: [PR50869] don't attempt to expand CFA within cselib
On 10/28/11 15:07, Alexandre Oliva wrote: An assertion check meant to verify that var loc expansions that didn't involve VALUEs (say constants, REGs, etc) didn't push values onto the dependency stack failed in an expansion of the argp reg, because equivalences for it are preserved at cselib table resets, and cselib later tries to expand it to equivalent expressions. It's not profitable to expand it within var-tracking, and that's the only user of the CFA-base special-casing in cselib, so I arranged for argp to be preserved in expansions, just like other stack base registers. While debugging it, I noticed it was theoretically possible for the expression depth to remain uninitialized, and added an initialization and an assertion check to make sure it only remains zero when no location is found. Regstrapped on x86_64-linux-gnu and i686-linux-gnu. Ok to install? OK. jeff
[Patch,AVR]: Fix PR50910: int/2 leads to libgcc call
This is a fix for an optimization flaw when dividing an int by 2. There is really no need for a library call.

The costs of [U]DIV/[U]MOD are adjusted to take into account the costs of CONST_INT operands that must be loaded for division by means of a libgcc call.

There are some new combiner patterns suffixed .lt0 that perform the adjustment frequently seen when division by a constant is lowered to arithmetic in order to avoid a more expensive libcall.

Moreover, there are two patterns for adding sign-extended QI to HI. These patterns are shorter, faster and have lower register pressure than explicitly sign-extending the QI before adding it. Example code is:

int add (int a, char b)
{
  return a + b;
}

int sub (int a, char b)
{
  return a - b;
}

add:
        add r24,r22     ; 13 *addhi3.sign_extend1 [length = 4]
        adc r25,__zero_reg__
        sbrc r22,7
        dec r25
        ret

sub:
        sub r24,r22     ; 13 *subhi3.sign_extend2 [length = 4]
        sbc r25,__zero_reg__
        sbrc r22,7
        inc r25
        ret

The reg_overlap_mentioned case is just for pathological code like, e.g., a + (char) a, so the expected size is 4 instructions.

Since the beginning of time, BRANCH_COST has been set to 0, so some optimization passes produce code that happily jumps around. The patch introduces a new command line option for that, mainly because I don't know the rationale behind setting BRANCH_COST to 0.

Regression-tested. Ok for trunk?

Johann

        * config/avr/avr.opt (-mbranch-cost=): New option.
        * config/avr/avr.h (BRANCH_COST): Define to avr_branch_cost.
        * config/avr/avr.c (avr_rtx_costs_1): Adjust [U]DIV/[U]MOD costs.
        * config/avr/avr.md (*addqi3.lt0, *addhi3.lt0, *addsi3.lt0): New insns.
        (*addhi3_zero_extend1): Remove % in constraint of operand 1.
        (*addhi3.sign_extend1, *subhi3.sign_extend2): New insns.
Index: config/avr/avr.md === --- config/avr/avr.md (revision 180654) +++ config/avr/avr.md (working copy) @@ -776,27 +776,36 @@ (define_expand addhi3 (define_insn *addhi3_zero_extend - [(set (match_operand:HI 0 register_operand =r) - (plus:HI (zero_extend:HI - (match_operand:QI 1 register_operand r)) - (match_operand:HI 2 register_operand 0)))] + [(set (match_operand:HI 0 register_operand =r) +(plus:HI (zero_extend:HI (match_operand:QI 1 register_operand r)) + (match_operand:HI 2 register_operand 0)))] - add %A0,%1 - adc %B0,__zero_reg__ + add %A0,%1\;adc %B0,__zero_reg__ [(set_attr length 2) (set_attr cc set_n)]) (define_insn *addhi3_zero_extend1 - [(set (match_operand:HI 0 register_operand =r) - (plus:HI (match_operand:HI 1 register_operand %0) - (zero_extend:HI - (match_operand:QI 2 register_operand r] + [(set (match_operand:HI 0 register_operand =r) +(plus:HI (match_operand:HI 1 register_operand 0) + (zero_extend:HI (match_operand:QI 2 register_operand r] - add %A0,%2 - adc %B0,__zero_reg__ + add %A0,%2\;adc %B0,__zero_reg__ [(set_attr length 2) (set_attr cc set_n)]) +(define_insn *addhi3.sign_extend1 + [(set (match_operand:HI 0 register_operand =r) +(plus:HI (sign_extend:HI (match_operand:QI 1 register_operand r)) + (match_operand:HI 2 register_operand 0)))] + + { +return reg_overlap_mentioned_p (operands[0], operands[1]) + ? 
mov __tmp_reg__,%1\;add %A0,%1\;adc %B0,__zero_reg__\;sbrc __tmp_reg__,7\;dec %B0 + : add %A0,%1\;adc %B0,__zero_reg__\;sbrc %1,7\;dec %B0; + } + [(set_attr length 5) + (set_attr cc clobber)]) + (define_insn *addhi3_sp [(set (match_operand:HI 1 stack_register_operand =q) (plus:HI (match_operand:HI 2 stack_register_operand q) @@ -956,6 +965,19 @@ (define_insn *subhi3_zero_extend1 [(set_attr length 2) (set_attr cc set_czn)]) +(define_insn *subhi3.sign_extend2 + [(set (match_operand:HI 0 register_operand =r) +(minus:HI (match_operand:HI 1 register_operand 0) + (sign_extend:HI (match_operand:QI 2 register_operand r] + + { +return reg_overlap_mentioned_p (operands[0], operands[2]) + ? mov __tmp_reg__,%2\;sub %A0,%2\;sbc %B0,__zero_reg__\;sbrc __tmp_reg__,7\;inc %B0 + : sub %A0,%2\;sbc %B0,__zero_reg__\;sbrc %2,7\;inc %B0; + } + [(set_attr length 5) + (set_attr cc clobber)]) + (define_insn subsi3 [(set (match_operand:SI 0 register_operand =r) (minus:SI (match_operand:SI 1 register_operand 0) @@ -1054,6 +1076,41 @@ (define_insn *subqi3.ashiftrt7 [(set_attr length 2) (set_attr cc clobber)]) +(define_insn *addqi3.lt0 + [(set (match_operand:QI 0 register_operand =r) +(plus:QI (lt:QI (match_operand:QI 1 register_operand r) +
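For reference, the four-instruction sign-extending add and sub sequences described in the message above can be modelled byte by byte in C. This is only a sketch for checking the arithmetic and is not part of the patch; the names addhi3_sign_extend, subhi3_sign_extend and check_sign_extend_models are made up for the illustration, and the overlap case that needs __tmp_reg__ is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Model of: add lo,b ; adc hi,__zero_reg__ ; sbrc b,7 ; dec hi  */
uint16_t addhi3_sign_extend(uint16_t a, int8_t b)
{
    uint8_t lo = (uint8_t)(a & 0xff);
    uint8_t hi = (uint8_t)(a >> 8);
    uint8_t ub = (uint8_t)b;              /* raw byte held in the QI register */

    unsigned sum = (unsigned)lo + ub;     /* add: carry ends up in bit 8 */
    lo = (uint8_t)sum;
    hi = (uint8_t)(hi + (sum >> 8));      /* adc with the zero register */
    if (ub & 0x80)                        /* sbrc skips the dec if bit 7 is clear */
        hi = (uint8_t)(hi - 1);           /* dec: adds the 0xff sign-extension byte */
    return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* Model of: sub lo,b ; sbc hi,__zero_reg__ ; sbrc b,7 ; inc hi  */
uint16_t subhi3_sign_extend(uint16_t a, int8_t b)
{
    uint8_t lo = (uint8_t)(a & 0xff);
    uint8_t hi = (uint8_t)(a >> 8);
    uint8_t ub = (uint8_t)b;

    unsigned borrow = lo < ub;            /* sub: borrow out of the low byte */
    lo = (uint8_t)(lo - ub);
    hi = (uint8_t)(hi - borrow);          /* sbc with the zero register */
    if (ub & 0x80)                        /* negative b: high byte of the extension is 0xff */
        hi = (uint8_t)(hi + 1);           /* inc: subtracts that 0xff byte */
    return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* Exhaustive comparison against plain 16-bit wraparound arithmetic.  */
void check_sign_extend_models(void)
{
    for (unsigned a = 0; a <= 0xffff; a++)
        for (int b = -128; b <= 127; b++) {
            assert(addhi3_sign_extend((uint16_t)a, (int8_t)b) == (uint16_t)(a + b));
            assert(subhi3_sign_extend((uint16_t)a, (int8_t)b) == (uint16_t)(a - b));
        }
}
```

Calling check_sign_extend_models() walks all 16-bit values of a and all 8-bit values of b, which is cheap enough to run on a host machine.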
[trans-mem] Fix tm_pure not inlinable in tm_safe
This fixes the g++ pr45940-4 failure. I think it is due to the latest merge. Tested on i686. (I cannot test it on x86-64 yet; I hope to get access to a 64-bit machine soon...)

Patrick.

2011-10-31  Patrick Marlier  patrick.marl...@gmail.com

        * ipa-inline.c: Adjust how cannot_inline is set.

Index: ipa-inline.c
===
--- ipa-inline.c (revision 180705)
+++ ipa-inline.c (working copy)
@@ -285,14 +285,14 @@
       inlinable = false;
     }
   /* TM pure functions should not get inlined if the outer function is
-     a TM safe function.  */
+     a TM safe function.  ??? TM pure function could be inlined if waiver block
+     is implemented.  */
   else if (flag_tm
            && is_tm_pure (callee->decl)
            && is_tm_safe (e->caller->decl))
     {
       e->inline_failed = CIF_UNSPECIFIED;
-      gimple_call_set_cannot_inline (e->call_stmt, true);
-      return false;
+      inlinable = false;
     }
   /* Don't inline if the callee can throw non-call exceptions but the
      caller cannot.
[Patch, libfortran] PR 50016 Slow IO on Windows due to _commit()
Hi, here's an updated version of my patch that gets rid of _commit along with a section in the manual describing data consistency and durability issues. See also the thread starting at http://gcc.gnu.org/ml/fortran/2011-10/msg00079.html and the latest mail in that thread with my current thinking which perhaps explains some of the motivations behind this patch: http://gcc.gnu.org/ml/fortran/2011-10/msg00141.html Regtested on x86_64-unknown-linux-gnu, Ok for trunk? frontend ChangeLog: 2011-10-31 Janne Blomqvist j...@gcc.gnu.org PR libfortran/50016 * gfortran.texi (Data consistency and durability): New section. testsuite ChangeLog: 2011-10-31 Janne Blomqvist j...@gcc.gnu.org PR libfortran/50016 * gfortran.dg/inquire_size.f90: Don't flush the unit. libgfortran ChangeLog: 2011-10-31 Janne Blomqvist j...@gcc.gnu.org PR libfortran/50016 * io/inquire.c (inquire_via_unit): Flush the unit and use ssize. * io/unix.c (buf_flush): Don't call _commit. -- Janne Blomqvist diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi index f847df3..b45b71a 100644 --- a/gcc/fortran/gfortran.texi +++ b/gcc/fortran/gfortran.texi @@ -1090,6 +1090,7 @@ might in some way or another become visible to the programmer. * KIND Type Parameters:: * Internal representation of LOGICAL variables:: * Thread-safety of the runtime library:: +* Data consistency and durability:: @end menu @@ -1194,6 +1195,81 @@ Finally, for platforms not supporting thread-safe POSIX functions, further functionality might not be thread-safe. For details, please consult the documentation for your operating system. + +@node Data consistency and durability +@section Data consistency and durability +@cindex consistency, durability + +This section contains a brief overview of data and metadata +consistency and durability issues when doing I/O. + +With respect to durability, GNU Fortran makes no effort to ensure that +data is committed to stable storage. 
If this is required, the GNU
+Fortran programmer can use the intrinsic @code{FNUM} to retrieve the
+low level file descriptor corresponding to an open Fortran unit. Then,
+using e.g. the @code{ISO_C_BINDING} feature, one can call the
+underlying system call to flush dirty data to stable storage, such as
+@code{fsync} on POSIX, @code{_commit} on MingW, or @code{fcntl(fd,
+F_FULLSYNC, 0)} on Mac OS X. The following example shows how to call
+fsync:
+
+@smallexample
+  ! Declare the interface for POSIX fsync function
+  interface
+    function fsync (fd) bind(c,name="fsync")
+    use iso_c_binding, only: c_int
+      integer(c_int), value :: fd
+      integer(c_int) :: fsync
+    end function fsync
+  end interface
+
+  ! Variable declaration
+  integer :: ret
+
+  ! Opening unit 10
+  open (10,file="foo")
+
+  ! ...
+  ! Perform I/O on unit 10
+  ! ...
+
+  ! Flush and sync
+  flush(10)
+  ret = fsync(fnum(10))
+
+  ! Handle possible error
+  if (ret /= 0) stop "Error calling FSYNC"
+@end smallexample
+
+With respect to consistency, for regular files GNU Fortran uses
+buffered I/O in order to improve performance. This buffer is flushed
+automatically when full and in some other situations, e.g. when
+closing a unit. It can also be explicitly flushed with the
+@code{FLUSH} statement. Also, the buffering can be turned off with the
+@code{GFORTRAN_UNBUFFERED_ALL} and
+@code{GFORTRAN_UNBUFFERED_PRECONNECTED} environment variables. Special
+files, such as terminals and pipes, are always unbuffered. Sometimes,
+however, further things may need to be done in order to allow other
+processes to see data that GNU Fortran has written, as follows.
+
+The Windows platform supports a relaxed metadata consistency model,
+where file metadata is written to the directory lazily. This means
+that, for instance, the @code{dir} command can show a stale size for a
+file. One can force a directory metadata update by closing the unit,
+or by calling @code{_commit} on the file descriptor.
Note, though, +that @code{_commit} will force all dirty data to stable storage, which +is often a very slow operation. + +The Network File System (NFS) implements a relaxed consistency model +called open-to-close consistency. Closing a file forces dirty data and +metadata to be flushed to the server, and opening a file forces the +client to contact the server in order to revalidate cached +data. @code{fsync} will also force a flush of dirty data and metadata +to the server. Similar to @code{open} and @code{close}, acquiring and +releasing @code{fcntl} file locks, if the server supports them, will +also force cache validation and flushing dirty data and metadata. + + @c - @c Extensions @c - diff --git a/gcc/testsuite/gfortran.dg/inquire_size.f90 b/gcc/testsuite/gfortran.dg/inquire_size.f90 index 568c3d6..13876cf 100644 ---
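The flush-then-sync sequence from the proposed manual section has a direct C counterpart, which may help when comparing what the runtime does with what the application has to do itself. A minimal sketch, assuming a POSIX system; write_durably is a made-up helper name and is not something from libgfortran:

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Write TEXT to PATH and ask the kernel to push it to stable storage.
   Returns 0 on success and -1 on any failure.  */
int write_durably(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    int ok = fputs(text, fp) >= 0;
    /* Step 1: drain the user-space buffer, the analogue of FLUSH(10).  */
    ok = ok && fflush(fp) == 0;
    /* Step 2: flush dirty kernel buffers, the analogue of fsync(fnum(10)).  */
    ok = ok && fsync(fileno(fp)) == 0;
    /* fclose can still report a deferred write error, so check it too.  */
    ok = (fclose(fp) == 0) && ok;
    return ok ? 0 : -1;
}
```

The order matters: calling fsync before fflush would sync a file that does not yet contain the buffered data, which is the same reason the Fortran example flushes the unit first.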
Re: [trans-mem] Fix tm_pure not inlinable in tm_safe
On 10/31/11 13:54, Patrick Marlier wrote: This fixes the g++ pr45940-4 failure. I think it is due to the latest merge. Tested on i686. (I cannot test it yet on x86-64, I hope to get access to a 64 bit soon...) Patrick. 2011-10-31 Patrick Marlier patrick.marl...@gmail.com * ipa-inline.c: Adjust how cannot_inline is set. Heh, funny... I have the exact same patch on this end. But it doesn't completely fix the pr45940-4, cause now I get a segfault here: if (is_gimple_call (stmt)) { struct cgraph_edge *edge = cgraph_edge (node, stmt); struct inline_edge_summary *es = inline_edge_summary (edge); /* Special case: results of BUILT_IN_CONSTANT_P will be always resolved as constant. We however don't want to optimize out the cgraph edges. */ The edge isn't set. I don't know if this is related or not. I'm investigating. BTW, are you sure it fixes the regression? I still get this other segfault on both x86-32 and x86-64.
Re: [trans-mem] Fix tm_pure not inlinable in tm_safe
On 10/31/2011 03:21 PM, Aldy Hernandez wrote: On 10/31/11 13:54, Patrick Marlier wrote: This fixes the g++ pr45940-4 failure. I think it is due to the latest merge. Tested on i686. (I cannot test it yet on x86-64, I hope to get access to a 64 bit soon...) Patrick. 2011-10-31 Patrick Marlier patrick.marl...@gmail.com * ipa-inline.c: Adjust how cannot_inline is set. Heh, funny... I have the exact same patch on this end. But it doesn't completely fix the pr45940-4, cause now I get a segfault here: if (is_gimple_call (stmt)) { struct cgraph_edge *edge = cgraph_edge (node, stmt); struct inline_edge_summary *es = inline_edge_summary (edge); /* Special case: results of BUILT_IN_CONSTANT_P will be always resolved as constant. We however don't want to optimize out the cgraph edges. */ The edge isn't set. I don't know if this is related or not. I'm investigating. BTW, are you sure it fixes the regression? I still get this other segfault on both x86-32 and x86-64. It does on my side: === g++ Summary === # of expected passes122 I have no other change over the source. Patrick. Copy/Paste if I run the command line directly in the terminal: marlier@d01:/localdisk/gcc/tm-build-dbg$ /localdisk/gcc/tm-build-dbg/gcc/testsuite/g++/../../g++ -B/localdisk/gcc/tm-build-dbg/gcc/testsuite/g++/../../ /localdisk/gcc/tm-src/gcc/testsuite/g++.dg/tm/pr45940-4.C -nostdinc++ -I/localdisk/gcc/tm-build-dbg/i686-pc-linux-gnu/libstdc++-v3/include/i686-pc-linux-gnu -I/localdisk/gcc/tm-build-dbg/i686-pc-linux-gnu/libstdc++-v3/include -I/localdisk/gcc/tm-src/libstdc++-v3/libsupc++ -I/localdisk/gcc/tm-src/libstdc++-v3/include/backward -I/localdisk/gcc/tm-src/libstdc++-v3/testsuite/util -fmessage-length=0 -fgnu-tm -O1 -S -o pr45940-4.s marlier@d01:/localdisk/gcc/tm-build-dbg$ echo $? 0 - no segfault
RFA: libstdc++ PATCH to initializer_list to #error in C++98 mode
I have on occasion been confused by initializer_list silently becoming empty in C++98 mode. OK for trunk? Jason commit 3d7ac3e4d8bb54921eb3e1f70b1a42a165ba4f5b Author: Jason Merrill ja...@redhat.com Date: Mon Oct 31 01:21:49 2011 -0400 * libsupc++/initializer_list: Copy C++0x #error from bits/c++0x_warning.h. diff --git a/libstdc++-v3/include/bits/algorithmfwd.h b/libstdc++-v3/include/bits/algorithmfwd.h index cc0b98e..fbec55d 100644 --- a/libstdc++-v3/include/bits/algorithmfwd.h +++ b/libstdc++-v3/include/bits/algorithmfwd.h @@ -35,7 +35,9 @@ #include bits/c++config.h #include bits/stl_pair.h #include bits/stl_iterator_base_types.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index 5708194..0edb8b2 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -40,7 +40,9 @@ #include ext/atomicity.h #include debug/debug.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/forward_list.h b/libstdc++-v3/include/bits/forward_list.h index c80ee50..0fc8323 100644 --- a/libstdc++-v3/include/bits/forward_list.h +++ b/libstdc++-v3/include/bits/forward_list.h @@ -33,7 +33,9 @@ #pragma GCC system_header #include memory +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h index bddecb0..8f28640 100644 --- a/libstdc++-v3/include/bits/stl_bvector.h +++ b/libstdc++-v3/include/bits/stl_bvector.h @@ -57,7 +57,9 @@ #ifndef _STL_BVECTOR_H #define _STL_BVECTOR_H 1 +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/stl_deque.h 
b/libstdc++-v3/include/bits/stl_deque.h index 17ea01a..b924917 100644 --- a/libstdc++-v3/include/bits/stl_deque.h +++ b/libstdc++-v3/include/bits/stl_deque.h @@ -60,7 +60,9 @@ #include bits/concept_check.h #include bits/stl_iterator_base_types.h #include bits/stl_iterator_base_funcs.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/stl_list.h b/libstdc++-v3/include/bits/stl_list.h index 56ee2fb..fc1d8f8 100644 --- a/libstdc++-v3/include/bits/stl_list.h +++ b/libstdc++-v3/include/bits/stl_list.h @@ -58,7 +58,9 @@ #define _STL_LIST_H 1 #include bits/concept_check.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/stl_map.h b/libstdc++-v3/include/bits/stl_map.h index 889e52b..45824f0 100644 --- a/libstdc++-v3/include/bits/stl_map.h +++ b/libstdc++-v3/include/bits/stl_map.h @@ -59,7 +59,9 @@ #include bits/functexcept.h #include bits/concept_check.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/stl_multimap.h b/libstdc++-v3/include/bits/stl_multimap.h index 6b74558..fd5a5a8 100644 --- a/libstdc++-v3/include/bits/stl_multimap.h +++ b/libstdc++-v3/include/bits/stl_multimap.h @@ -58,7 +58,9 @@ #define _STL_MULTIMAP_H 1 #include bits/concept_check.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/stl_multiset.h b/libstdc++-v3/include/bits/stl_multiset.h index 8b25a97..ab467c8 100644 --- a/libstdc++-v3/include/bits/stl_multiset.h +++ b/libstdc++-v3/include/bits/stl_multiset.h @@ -58,7 +58,9 @@ #define _STL_MULTISET_H 1 #include bits/concept_check.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { 
diff --git a/libstdc++-v3/include/bits/stl_set.h b/libstdc++-v3/include/bits/stl_set.h index b30966a..18fd117 100644 --- a/libstdc++-v3/include/bits/stl_set.h +++ b/libstdc++-v3/include/bits/stl_set.h @@ -58,7 +58,9 @@ #define _STL_SET_H 1 #include bits/concept_check.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h index 869bcf7..9b7b698 100644 --- a/libstdc++-v3/include/bits/stl_vector.h +++ b/libstdc++-v3/include/bits/stl_vector.h @@ -60,7 +60,9 @@ #include bits/stl_iterator_base_funcs.h #include bits/functexcept.h #include bits/concept_check.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/ext/vstring.h
[i386] Remove TARGET_VECTORIZE_BUILTIN_CONVERSION
I checked in the generic portion of Dmitry Plotnikov's patch to the vectorizer and optabs that enables this patch. The ARM portion of his patch is still outstanding, awaiting approval. This allows this target hook to be removed from other targets. Can I talk you into doing a similar patch for rs6000, Mike? After that I can take care of removing the target hook entirely. Tested on x86_64-linux. r~ i386: Remove TARGET_VECTORIZE_BUILTIN_CONVERSION. Renaming all of the insn patterns as needed to the standard optab forms. Sadly, only one of the builtins is unused by the various header files, so most of them must stay around. * config/i386/sse.md (floatv8siv8sf2): Rename from avx_cvtdq2ps256. (floatv4siv4sf2): Rename from sse2_cvtdq2ps. (floatunsv4siv4sf2): Rename from sse2_cvtudq2ps. (fix_truncv8sfv8si2): Rename from avx_cvttps2dq256. (fix_truncv4sfv4si2): Rename from sse2_cvttps2dq. (floatv4siv4df2): Rename from avx_cvtdq2pd256. (fix_truncv4dfv4si2): Rename from avx_cvttpd2dq256. (vec_unpacku_float_hi_v8si): Update for insn pattern name changes. * config/i386/i386.md (splitters for int-float conversion): Likewise. * config/i386/i386.c (ix86_split_convert_uns_si_sse): Likewise. (bdesc_args): Likewise. (enum ix86_builtins) [IX86_BUILTIN_CVTUDQ2PS]: Remove. (ix86_vectorize_builtin_conversion): Remove. (TARGET_VECTORIZE_BUILTIN_CONVERSION): Remove. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 148fcfb..4e34f25 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -16857,7 +16857,7 @@ ix86_split_convert_uns_si_sse (rtx operands[]) x = gen_rtx_REG (V4SImode, REGNO (value)); if (vecmode == V4SFmode) -emit_insn (gen_sse2_cvttps2dq (x, value)); +emit_insn (gen_fix_truncv4sfv4si2 (x, value)); else emit_insn (gen_sse2_cvttpd2dq (x, value)); value = x; @@ -25077,8 +25077,6 @@ enum ix86_builtins IX86_BUILTIN_CPYSGNPS256, IX86_BUILTIN_CPYSGNPD256, - IX86_BUILTIN_CVTUDQ2PS, - /* FMA4 instructions. 
*/ IX86_BUILTIN_VFMADDSS, IX86_BUILTIN_VFMADDSD, @@ -25791,8 +25789,7 @@ static const struct builtin_description bdesc_args[] = { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_pmovmskb, __builtin_ia32_pmovmskb128, IX86_BUILTIN_PMOVMSKB128, UNKNOWN, (int) INT_FTYPE_V16QI }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sqrtv2df2, __builtin_ia32_sqrtpd, IX86_BUILTIN_SQRTPD, UNKNOWN, (int) V2DF_FTYPE_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_cvtdq2pd, __builtin_ia32_cvtdq2pd, IX86_BUILTIN_CVTDQ2PD, UNKNOWN, (int) V2DF_FTYPE_V4SI }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_cvtdq2ps, __builtin_ia32_cvtdq2ps, IX86_BUILTIN_CVTDQ2PS, UNKNOWN, (int) V4SF_FTYPE_V4SI }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_cvtudq2ps, __builtin_ia32_cvtudq2ps, IX86_BUILTIN_CVTUDQ2PS, UNKNOWN, (int) V4SF_FTYPE_V4SI }, + { OPTION_MASK_ISA_SSE2, CODE_FOR_floatv4siv4sf2, __builtin_ia32_cvtdq2ps, IX86_BUILTIN_CVTDQ2PS, UNKNOWN, (int) V4SF_FTYPE_V4SI }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_cvtpd2dq, __builtin_ia32_cvtpd2dq, IX86_BUILTIN_CVTPD2DQ, UNKNOWN, (int) V4SI_FTYPE_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_cvtpd2pi, __builtin_ia32_cvtpd2pi, IX86_BUILTIN_CVTPD2PI, UNKNOWN, (int) V2SI_FTYPE_V2DF }, @@ -25809,7 +25806,7 @@ static const struct builtin_description bdesc_args[] = { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_cvtps2dq, __builtin_ia32_cvtps2dq, IX86_BUILTIN_CVTPS2DQ, UNKNOWN, (int) V4SI_FTYPE_V4SF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_cvtps2pd, __builtin_ia32_cvtps2pd, IX86_BUILTIN_CVTPS2PD, UNKNOWN, (int) V2DF_FTYPE_V4SF }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_cvttps2dq, __builtin_ia32_cvttps2dq, IX86_BUILTIN_CVTTPS2DQ, UNKNOWN, (int) V4SI_FTYPE_V4SF }, + { OPTION_MASK_ISA_SSE2, CODE_FOR_fix_truncv4sfv4si2, __builtin_ia32_cvttps2dq, IX86_BUILTIN_CVTTPS2DQ, UNKNOWN, (int) V4SI_FTYPE_V4SF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_addv2df3, __builtin_ia32_addpd, IX86_BUILTIN_ADDPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_subv2df3, __builtin_ia32_subpd, IX86_BUILTIN_SUBPD, 
UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF }, @@ -26147,14 +26144,14 @@ static const struct builtin_description bdesc_args[] = { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vextractf128v4df, __builtin_ia32_vextractf128_pd256, IX86_BUILTIN_EXTRACTF128PD256, UNKNOWN, (int) V2DF_FTYPE_V4DF_INT }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vextractf128v8sf, __builtin_ia32_vextractf128_ps256, IX86_BUILTIN_EXTRACTF128PS256, UNKNOWN, (int) V4SF_FTYPE_V8SF_INT }, { OPTION_MASK_ISA_AVX, CODE_FOR_avx_vextractf128v8si, __builtin_ia32_vextractf128_si256, IX86_BUILTIN_EXTRACTF128SI256, UNKNOWN, (int) V4SI_FTYPE_V8SI_INT }, - { OPTION_MASK_ISA_AVX, CODE_FOR_avx_cvtdq2pd256, __builtin_ia32_cvtdq2pd256, IX86_BUILTIN_CVTDQ2PD256, UNKNOWN, (int)
Re: [trans-mem] Fix tm_pure not inlinable in tm_safe
> It does on my side:
>
> === g++ Summary ===
> # of expected passes 122
>
> I have no other change over the source.

Woah I hereby profess my love for Richard and Patrick. That didn't sound right, but whatever... During the weekend they apparently fixed the rest of the bug I've been working on all morning. Yay!!!

With this patch we have ironed out all the C++ regressions.

Thanks and sorry for the duplicate work.

Committing to branch.

        * ipa-inline.c (can_inline_edge_p): Do not inline TM safe calling TM pure functions.

Index: ipa-inline.c
===
--- ipa-inline.c (revision 180710)
+++ ipa-inline.c (working copy)
@@ -291,8 +291,7 @@ can_inline_edge_p (struct cgraph_edge *e
            && is_tm_safe (e->caller->decl))
     {
       e->inline_failed = CIF_UNSPECIFIED;
-      gimple_call_set_cannot_inline (e->call_stmt, true);
-      return false;
+      inlinable = false;
     }
   /* Don't inline if the callee can throw non-call exceptions but the
      caller cannot.
Re: [C++-11] User defined literals
On 10/31/2011 12:44 PM, 3dw...@verizon.net wrote: For string and character literals, we can still just build up a call; we only need to walk the overload list here for numeric literals. I found that if you don't walk the overload list for chars, a char could be routed to the operator taking wchar_t for example. Ah, yes, I was overlooking the bit in the standard that says S shall contain a literal operator (13.5.8) whose only parameter has the type ch. The paragraph for string literals doesn't have a similar restriction. Jason
Re: RFA: libstdc++ PATCH to initializer_list to #error in C++98 mode
On 10/31/2011 08:37 PM, Jason Merrill wrote:
> I have on occasion been confused by initializer_list silently becoming empty in C++98 mode. OK for trunk?

For c++98, I think we should use the usual:

#ifndef __GXX_EXPERIMENTAL_CXX0X__
# include <bits/c++0x_warning.h>
#else

which we have in place elsewhere. Or do we have special reasons for not doing that?

Paolo.
Re: [C++ preview patch] PR 44277
... so today I noticed the c_inhibit_evaluation_warnings use in cp_convert_and_check and occurred to me that we could use the existing mechanism for this warning too? The below still passes checking and my small set of tests... What do you think? Thanks, Paolo. // Index: c-family/c.opt === --- c-family/c.opt (revision 180705) +++ c-family/c.opt (working copy) @@ -685,6 +685,9 @@ Wpointer-sign C ObjC Var(warn_pointer_sign) Init(-1) Warning Warn when a pointer differs in signedness in an assignment +Wzero-as-null-pointer-constant +C++ ObjC++ Var(warn_zero_as_null_pointer_constant) Warning + ansi C ObjC C++ ObjC++ A synonym for -std=c89 (for C) or -std=c++98 (for C++) Index: cp/typeck.c === --- cp/typeck.c (revision 180705) +++ cp/typeck.c (working copy) @@ -4057,8 +4057,13 @@ cp_build_binary_op (location_t location, } else { + bool inhibit = NULLPTR_TYPE_P (TREE_TYPE (op1)); op0 = build_ptrmemfunc_access_expr (op0, pfn_identifier); - op1 = cp_convert (TREE_TYPE (op0), integer_zero_node); + if (inhibit) + ++c_inhibit_evaluation_warnings; + op1 = cp_convert (TREE_TYPE (op0), integer_zero_node); + if (inhibit) + --c_inhibit_evaluation_warnings; } result_type = TREE_TYPE (op0); } @@ -4666,11 +4671,25 @@ tree cp_truthvalue_conversion (tree expr) { tree type = TREE_TYPE (expr); + tree ret; + if (TYPE_PTRMEM_P (type)) -return build_binary_op (EXPR_LOCATION (expr), - NE_EXPR, expr, integer_zero_node, 1); +{ + ++c_inhibit_evaluation_warnings; + ret = build_binary_op (EXPR_LOCATION (expr), +NE_EXPR, expr, integer_zero_node, 1); + --c_inhibit_evaluation_warnings; +} + else if (TYPE_PTR_P (type) || TYPE_PTRMEMFUNC_P (type)) +{ + ++c_inhibit_evaluation_warnings; + ret = c_common_truthvalue_conversion (input_location, expr); + --c_inhibit_evaluation_warnings; +} else -return c_common_truthvalue_conversion (input_location, expr); +ret = c_common_truthvalue_conversion (input_location, expr); + + return ret; } /* Just like cp_truthvalue_conversion, but we want a CLEANUP_POINT_EXPR. 
*/ Index: cp/init.c === --- cp/init.c (revision 180705) +++ cp/init.c (working copy) @@ -176,6 +176,12 @@ build_zero_init_1 (tree type, tree nelts, bool sta items with static storage duration that are not otherwise initialized are initialized to zero. */ ; + else if (TYPE_PTR_P (type) || TYPE_PTR_TO_MEMBER_P (type)) +{ + ++c_inhibit_evaluation_warnings; + init = convert (type, integer_zero_node); + --c_inhibit_evaluation_warnings; +} else if (SCALAR_TYPE_P (type)) init = convert (type, integer_zero_node); else if (CLASS_TYPE_P (type)) Index: cp/cvt.c === --- cp/cvt.c(revision 180705) +++ cp/cvt.c(working copy) @@ -198,6 +198,11 @@ cp_convert_to_pointer (tree type, tree expr) if (null_ptr_cst_p (expr)) { + if (c_inhibit_evaluation_warnings == 0 + !NULLPTR_TYPE_P (TREE_TYPE (expr))) + warning (OPT_Wzero_as_null_pointer_constant, +zero as null pointer constant); + if (TYPE_PTRMEMFUNC_P (type)) return build_ptrmemfunc (TYPE_PTRMEMFUNC_FN_TYPE (type), expr, 0, /*c_cast_p=*/false, tf_warning_or_error);
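The patch above relies on bumping c_inhibit_evaluation_warnings around conversions that the compiler generates itself, so that only zeros written by the user trigger the new warning. A standalone sketch of that counter idiom follows. Everything in it is a hypothetical stand-in rather than GCC code: inhibit_warnings plays the role of the counter, and convert_zero_to_pointer and implicit_zero_init are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

int inhibit_warnings;    /* > 0 means "do not warn right now" */
int warnings_emitted;    /* counts diagnostics, for demonstration only */

/* Stand-in for a conversion routine: warn when a plain zero is used as a
   null pointer, unless warnings are inhibited or the operand is nullptr.  */
void *convert_zero_to_pointer(bool from_nullptr)
{
    if (inhibit_warnings == 0 && !from_nullptr) {
        fprintf(stderr, "warning: zero as null pointer constant\n");
        warnings_emitted++;
    }
    return NULL;
}

/* Stand-in for a compiler-internal caller, such as zero-initialization:
   bracket the conversion so the user never sees a warning for it.  */
void *implicit_zero_init(void)
{
    ++inhibit_warnings;
    void *p = convert_zero_to_pointer(false);
    --inhibit_warnings;
    assert(inhibit_warnings >= 0);   /* increments and decrements must pair up */
    return p;
}
```

A counter rather than a boolean lets the brackets nest: an inner caller that also inhibits warnings does not accidentally re-enable them for the outer one when it returns.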
[PATCH] Add floatunsv8siv8sf2 support
Hi! On Mon, Oct 31, 2011 at 12:43:14PM -0700, Richard Henderson wrote: Renaming all of the insn patterns as needed to the standard optab forms. Sadly, only one of the builtins is unused by the various header files, so most of them must stay around. Thanks. Here is a patch that adds floatunsv8siv8sf2 and macroizes floatv[48]siv[48]sf2. Ok if bootstrap/regtest passes? 2011-10-31 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (sseintvecmode): Remove duplicate modes. (sseintvecmodelower): New mode iterator. (floatv8siv8sf2, floatunsv4siv4sf2): Macroize into... (floatsseintvecmodelowermode2): ... this using VF1 iterator. (floatunsv4siv4sf2): Macroize into... (floatunssseintvecmodelowermode2): ... this using VF1 iterator. --- gcc/config/i386/sse.md.jj 2011-10-31 20:44:13.0 +0100 +++ gcc/config/i386/sse.md 2011-10-31 21:05:21.0 +0100 @@ -233,12 +233,19 @@ (define_mode_attr sseinsnmode (define_mode_attr sseintvecmode [(V8SF V8SI) (V4DF V4DI) (V4SF V4SI) (V2DF V2DI) - (V4DF V4DI) (V8SF V8SI) (V8SI V8SI) (V4DI V4DI) (V4SI V4SI) (V2DI V2DI) (V16HI V16HI) (V8HI V8HI) (V32QI V32QI) (V16QI V16QI)]) +(define_mode_attr sseintvecmodelower + [(V8SF v8si) (V4DF v4di) + (V4SF v4si) (V2DF v2di) + (V8SI v8si) (V4DI v4di) + (V4SI v4si) (V2DI v2di) + (V16HI v16hi) (V8HI v8hi) + (V32QI v32qi) (V16QI v16qi)]) + ;; Mapping of vector modes to a vector mode of double size (define_mode_attr ssedoublevecmode [(V32QI V64QI) (V16HI V32HI) (V8SI V16SI) (V4DI V8DI) @@ -2224,33 +2231,26 @@ (define_insn sse_cvttss2siq (set_attr prefix maybe_vex) (set_attr mode DI)]) -(define_insn floatv8siv8sf2 - [(set (match_operand:V8SF 0 register_operand =x) - (float:V8SF (match_operand:V8SI 1 nonimmediate_operand xm)))] - TARGET_AVX - vcvtdq2ps\t{%1, %0|%0, %1} - [(set_attr type ssecvt) - (set_attr prefix vex) - (set_attr mode V8SF)]) - -(define_insn floatv4siv4sf2 - [(set (match_operand:V4SF 0 register_operand =x) - (float:V4SF (match_operand:V4SI 1 nonimmediate_operand xm)))] +(define_insn 
floatsseintvecmodelowermode2 + [(set (match_operand:VF1 0 register_operand =x) + (float:VF1 + (match_operand:sseintvecmode 1 nonimmediate_operand xm)))] TARGET_SSE2 %vcvtdq2ps\t{%1, %0|%0, %1} [(set_attr type ssecvt) (set_attr prefix maybe_vex) - (set_attr mode V4SF)]) + (set_attr mode sseinsnmode)]) -(define_expand floatunsv4siv4sf2 +(define_expand floatunssseintvecmodelowermode2 [(set (match_dup 5) - (float:V4SF (match_operand:V4SI 1 nonimmediate_operand ))) + (float:VF1 + (match_operand:sseintvecmode 1 nonimmediate_operand ))) (set (match_dup 6) - (lt:V4SF (match_dup 5) (match_dup 3))) + (lt:VF1 (match_dup 5) (match_dup 3))) (set (match_dup 7) - (and:V4SF (match_dup 6) (match_dup 4))) - (set (match_operand:V4SF 0 register_operand ) - (plus:V4SF (match_dup 5) (match_dup 7)))] + (and:VF1 (match_dup 6) (match_dup 4))) + (set (match_operand:VF1 0 register_operand ) + (plus:VF1 (match_dup 5) (match_dup 7)))] TARGET_SSE2 { REAL_VALUE_TYPE TWO32r; @@ -2260,12 +2260,12 @@ (define_expand floatunsv4siv4sf2 real_ldexp (TWO32r, dconst1, 32); x = const_double_from_real_value (TWO32r, SFmode); - operands[3] = force_reg (V4SFmode, CONST0_RTX (V4SFmode)); - operands[4] = force_reg (V4SFmode, - ix86_build_const_vector (V4SFmode, 1, x)); + operands[3] = force_reg (MODEmode, CONST0_RTX (MODEmode)); + operands[4] = force_reg (MODEmode, + ix86_build_const_vector (MODEmode, 1, x)); for (i = 5; i 8; i++) -operands[i] = gen_reg_rtx (V4SFmode); +operands[i] = gen_reg_rtx (MODEmode); }) (define_insn avx_cvtps2dq256 Jakub
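The floatuns expander above converts each lane as if it were signed and then adds 2^32 to the lanes that came out negative. A scalar C model of one lane makes the arithmetic easier to follow. This is a sketch only, assuming IEEE single precision and GCC's wraparound behaviour for the unsigned-to-signed cast; floatuns_lane and floatuns_selfcheck are made-up names, and the comments map the steps to the operand numbers in the pattern.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of the unsigned int -> float expansion.  */
float floatuns_lane(uint32_t x)
{
    /* operands[5]: signed conversion of the same bits (what cvtdq2ps does).
       The cast is implementation-defined in ISO C; GCC wraps modulo 2^32.  */
    float as_signed = (float)(int32_t)x;

    /* operands[6] and operands[7]: a compare mask ANDed with the constant
       2^32, so negative lanes get 2^32 and all other lanes get 0.  */
    float fixup = (as_signed < 0.0f) ? 4294967296.0f : 0.0f;

    /* operands[0]: the corrected sum.  */
    return as_signed + fixup;
}

/* A few exactly representable cases.  */
void floatuns_selfcheck(void)
{
    assert(floatuns_lane(0u) == 0.0f);
    assert(floatuns_lane(1u) == 1.0f);
    assert(floatuns_lane(0x80000000u) == 2147483648.0f);
}
```

Because the value is rounded once in the signed conversion and again in the addition, this sequence can differ in the last place from a direct unsigned conversion for some inputs at or above 2^31, so the sketch only checks values where both steps are exact.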
Re: RFA: libstdc++ PATCH to initializer_list to #error in C++98 mode
On 10/31/2011 04:07 PM, Paolo Carlini wrote:
> On 10/31/2011 08:37 PM, Jason Merrill wrote:
>> I have on occasion been confused by initializer_list silently becoming empty in C++98 mode. OK for trunk?
>
> For c++98, I think we should use the usual:
>
> #ifndef __GXX_EXPERIMENTAL_CXX0X__
> # include <bits/c++0x_warning.h>
> #else
>
> which we have in place elsewhere. Or do we have special reasons for not doing that?

I'd rather not make a libsupc++ header dependent on a header from the main library. I guess we could move c++0x_warning.h into libsupc++, though...

Jason
Re: [PR50878, PATCH] Fix for verify_dominators in -ftree-tail-merge
On 10/30/2011 10:54 AM, Richard Guenther wrote: On Sun, Oct 30, 2011 at 9:27 AM, Tom de Vries tom_devr...@mentor.com wrote: On 10/30/2011 09:20 AM, Tom de Vries wrote: Richard, I have a fix for PR50878. Sorry, with patch this time. Ok for now, but see Davids mail and the complexity issue with iteratively updating dominators. I'm not sure which mail you mean. It seems to me that we know exactly what to update and how, and we should do that (well, if we need up-to-date dominators, re-computing them once in the pass would be ok). Indeed, in this example we know exactly what to update and how. However, PR50908 popped up, and there that's not the case anymore. Consider the following cfg, where A is the direct dominator of I: A / \ B \ / \ \ C D /| |\ E F |\ /| | x | |/ \| G H \ / I Say E and F are duplicates, and F is removed. The cfg then looks like this: A / \ B \ / \ \ C D / \ / \ E / \ G H \ / I E is now the new direct dominator of I. The patch for PR50878 did not address this example, since it uses the set of bbs directly dominated by the (single) predecessor of bb1 and bb2. The new patch calculates the updated dominator info by taking the nearest common dominator (A) of bb1 (F) and bb2 (E), and getting the set of bbs immediately dominated by it. Part of this set is now directly dominated by bb2. Ideally we would have a means to determine which bbs in the set are now directly dominated by bb2, and call set_immediate_dominator for those bbs, but we don't, so instead we let iterate_fix_dominators figure it out. Additionally, the patch makes sure it updates dominator info before updating the vuses, this fixes a latent bug. The patch fixes both PR50908 and PR50878. Bootstrapped and reg-tested on x86_64 and i686, and build and reg-tested on ARM and MIPS. Ok for trunk? Thanks, - Tom Richard. Thanks, - Tom A simplified form of the problem from the test-case of the PR is shown in this cfg. Block 12 has as direct dominator block 5. 
      5
     / \
    /   \
   *     *
   6     7
   |     |
   |     |
   *     *
   8     9
    \   /
     \ /
      *
      12

tail_merge_optimize finds that blocks 6 and 7 are duplicates. After replacing block 7 by block 6, the cfg looks like this:

      5
      |
      |
      *
      6
     / \
    /   \
   *     *
   8     9
    \   /
     \ /
      *
      12

The new direct dominator of block 12 is block 6, but the current algorithm only recalculates dominator info for blocks 6, 8 and 9. The patch fixes this by additionally recalculating the dominator info for blocks immediately dominated by bb2 (block 6 in the example), if bb2 has a single predecessor after replacement.

Bootstrapped and reg-tested on x86_64 and i686. Built and reg-tested on MIPS and ARM.

Ok for trunk?

Thanks,
- Tom

2011-10-30  Tom de Vries  t...@codesourcery.com

    PR tree-optimization/50878
    * tree-ssa-tail-merge.c (replace_block_by): Recalculate dominator info
    for blocks immediately dominated by bb2, if bb2 has a single predecessor
    after replacement.

2011-10-31  Tom de Vries  t...@codesourcery.com

    PR tree-optimization/50908
    * tree-ssa-tail-merge.c (update_vuses): Now that edges are removed before
    update_vuses, test for 1 predecessor rather than two.
    (delete_block_update_dominator_info): New function, part of it factored
    out of ...
    (replace_block_by): Use delete_block_update_dominator_info.  Call
    update_vuses after deleting bb1 and updating dominator info, instead of
    before.

Index: gcc/tree-ssa-tail-merge.c
===================================================================
--- gcc/tree-ssa-tail-merge.c (revision 180521)
+++ gcc/tree-ssa-tail-merge.c (working copy)
@@ -1458,7 +1458,7 @@ update_vuses (bool vuse1_phi_args, tree
       if (!dominated_by_p (CDI_DOMINATORS, pred, bb2))
         continue;
 
-      if (pred == bb2 && EDGE_COUNT (gimple_bb (stmt)->preds) == 2)
+      if (pred == bb2 && EDGE_COUNT (gimple_bb (stmt)->preds) == 1)
         {
           gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
           unlink_virtual_phi (stmt, lhs);
@@ -1526,6 +1526,88 @@ vop_at_entry (basic_block bb)
       : NULL_TREE);
 }
 
+/* Given that all incoming edges of BB1 have been redirected to BB2, delete BB1
+   and recompute dominator info.  */
+
+static void
+delete_block_update_dominator_info (basic_block bb1, basic_block bb2)
+{
+  VEC (basic_block,heap) *fix_dom_bb;
+  unsigned int i;
+  basic_block bb, dom;
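The dominator change in the 5/6/7/8/9/12 example can be checked with a small standalone program. This is only a sketch: it uses a textbook set-based dominator computation, not GCC's dominance.c or iterate_fix_dominators, and the names immediate_dominators, cfg_before and cfg_after are made up for the illustration. It assumes every block is reachable from the entry block.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// One CFG edge: (source block, destination block).
using Edge = std::pair<int, int>;

// Compute the immediate dominator of every non-entry block with the
// classic iterative data-flow formulation:
//   dom(entry) = {entry}
//   dom(n)     = {n} union (intersection of dom(p) over all predecessors p)
// Assumes all blocks are reachable from `entry`.
std::map<int, int> immediate_dominators(int entry, const std::vector<Edge>& edges) {
    std::set<int> nodes{entry};
    std::map<int, std::vector<int>> preds;
    for (const Edge& e : edges) {
        nodes.insert(e.first);
        nodes.insert(e.second);
        preds[e.second].push_back(e.first);
    }

    // Start from "everything dominates everything" and shrink to a fixpoint.
    std::map<int, std::set<int>> dom;
    for (int n : nodes) dom[n] = nodes;
    dom[entry] = {entry};

    bool changed = true;
    while (changed) {
        changed = false;
        for (int n : nodes) {
            if (n == entry) continue;
            std::set<int> meet = nodes;
            for (int p : preds[n]) {
                std::set<int> tmp;
                for (int d : meet)
                    if (dom[p].count(d)) tmp.insert(d);
                meet.swap(tmp);
            }
            meet.insert(n);
            if (meet != dom[n]) {
                dom[n] = meet;
                changed = true;
            }
        }
    }

    // The strict dominators of a block form a chain; the immediate dominator
    // is the one closest to the block, i.e. the one with the largest dom set.
    std::map<int, int> idom;
    for (int n : nodes) {
        if (n == entry) continue;
        int best = entry;
        for (int d : dom[n])
            if (d != n && dom[d].size() > dom[best].size()) best = d;
        idom[n] = best;
    }
    return idom;
}

// CFG from the mail before tail merging: 6 and 7 are duplicates.
std::vector<Edge> cfg_before() {
    return {{5, 6}, {5, 7}, {6, 8}, {7, 9}, {8, 12}, {9, 12}};
}

// CFG after block 7 has been replaced by block 6.
std::vector<Edge> cfg_after() {
    return {{5, 6}, {6, 8}, {6, 9}, {8, 12}, {9, 12}};
}
```

With these two edge lists, the immediate dominator of block 12 comes out as block 5 before the merge and block 6 after it, which is the update the PR50878 patch has to make.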
Re: RFA: libstdc++ PATCH to initializer_list to #error in C++98 mode
On 10/31/2011 09:17 PM, Jason Merrill wrote:
On 10/31/2011 04:07 PM, Paolo Carlini wrote:
On 10/31/2011 08:37 PM, Jason Merrill wrote:

I have on occasion been confused by initializer_list silently becoming empty in C++98 mode. OK for trunk?

For c++98, I think we should use the usual:

#ifndef __GXX_EXPERIMENTAL_CXX0X__
# include <bits/c++0x_warning.h>
#else

which we have in place elsewhere. Or we have special reasons for not doing that?

I'd rather not make a libsupc++ header dependent on a header from the main library. I guess we could move c++0x_warning.h into libsupc++, though...

Sure. Note anyway, that <bits/c++config.h> is already included, elsewhere too in libsupc++.

Paolo
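As a rough illustration of the guard being discussed, here is a sketch of a header body that fails loudly instead of silently becoming empty when compiled as C++98. It is not the actual libsupc++ change: the error text and the count_items helper are invented for the example, and the __cplusplus test is added as an assumption so the sketch also works with compilers that do not define __GXX_EXPERIMENTAL_CXX0X__.

```cpp
// Refuse to compile in C++98 mode rather than providing an empty header.
#if !defined(__GXX_EXPERIMENTAL_CXX0X__) && __cplusplus < 201103L
# error "this sketch needs C++0x support, e.g. -std=c++0x or -std=gnu++0x"
#else

#include <cstddef>
#include <initializer_list>

// Hypothetical helper, only here to show that the C++0x part of the
// header is usable once the guard has been passed.
inline std::size_t count_items(std::initializer_list<int> items) {
    return items.size();
}

#endif
```

Compiling this with -std=c++98 stops at the #error line, which is the behaviour the RFA asks for.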
Re: [PATCH] Add floatunsv8siv8sf2 support
On 10/31/2011 01:15 PM, Jakub Jelinek wrote:

Hi!

On Mon, Oct 31, 2011 at 12:43:14PM -0700, Richard Henderson wrote:

Renaming all of the insn patterns as needed to the standard optab forms. Sadly, only one of the builtins is unused by the various header files, so most of them must stay around.

Thanks. Here is a patch that adds floatunsv8siv8sf2 and macroizes floatv[48]siv[48]sf2. Ok if bootstrap/regtest passes?

2011-10-31  Jakub Jelinek  ja...@redhat.com

    * config/i386/sse.md (sseintvecmode): Remove duplicate modes.
    (sseintvecmodelower): New mode iterator.
    (floatv8siv8sf2, floatunsv4siv4sf2): Macroize into...
    (float<sseintvecmodelower><mode>2): ... this using VF1 iterator.
    (floatunsv4siv4sf2): Macroize into...
    (floatuns<sseintvecmodelower><mode>2): ... this using VF1 iterator.

Ok.

r~
Re: [C++ preview patch] PR 44277
On 10/31/2011 04:09 PM, Paolo Carlini wrote: ... so today I noticed the c_inhibit_evaluation_warnings use in cp_convert_and_check and occurred to me that we could use the existing mechanism for this warning too? The below still passes checking and my small set of tests... I notice that this patch only changes the C++ front end, and it seems like you already have special cases for pointers/pointers to members, so you might as well go ahead and use nullptr_node. Jason
Re: RFA: libstdc++ PATCH to initializer_list to #error in C++98 mode
On 10/31/2011 04:24 PM, Paolo Carlini wrote: Sure. Note anyway, that bits/c++config.h is already included, elsewhere too in libsupc++. I guess the other option would be to add it to install-freestanding-headers. Jason
RE: AVX generic mode tuning discussion.
We would like to propose changing AVX generic mode tuning to generate 128-bit AVX instead of 256-bit AVX.

You indicate a 3% reduction on bulldozer with avx256. How does avx128 compare to -mno-avx -msse4.2?

We see these % differences going from SSE42 to AVX128 to AVX256 on Bulldozer with -mtune=generic -Ofast. (Positive is improvement, negative is degradation)

Bulldozer:
                  AVX128/SSE42    AVX256/AVX-128
410.bwaves           -1.4%            -1.4%
416.gamess           -1.1%             0.0%
433.milc              0.5%            -2.4%
434.zeusmp            9.7%            -2.1%
435.gromacs           5.1%             0.5%
436.cactusADM         8.2%           -23.8%
437.leslie3d          8.1%             0.4%
444.namd              3.6%             0.0%
447.dealII           -1.4%            -0.4%
450.soplex           -0.4%            -0.4%
453.povray            0.0%            -1.5%
454.calculix         15.7%            -8.3%
459.GemsFDTD          4.9%             1.4%
465.tonto             1.3%            -0.6%
470.lbm               0.9%             0.3%
481.wrf               7.3%            -3.6%
482.sphinx3           5.0%            -9.8%
SPECFP                3.8%            -3.2%

Will the next AMD generation have a useable avx256? I'm not keen on the idea of generic mode being tuned for a single processor revision that maybe shouldn't actually be using avx at all.

We see a substantial gain in several SPECFP benchmarks going from SSE42 to AVX128 on Bulldozer. IMHO, accomplishing even a 5% gain in an individual benchmark takes a hardware company several man months. The loss with AVX256 for Bulldozer is much more significant than the gain for SandyBridge. While the general trend in the industry is a move toward AVX256, for now we would be disadvantaging Bulldozer with this choice.

We have several customers who use -mtune=generic and it is default, unless a user explicitly overrides it with -mtune=native. They are the ones who want to experiment with latest ISA using gcc, but want to keep their ISA selection and tuning agnostic on x86/64. IMHO, it is with these customers in mind that generic was introduced in the first place.

Since stage 1 closure is around the corner, just wanted to ping to see if the maintainers have made up their mind on this one. AVX-128 is an improvement over SSE42 for Bulldozer and AVX-256 wipes out pretty much all of that gain in generic mode.
Until there is a convergence on AVX-256 for x86/64, we would like to propose having generic generate avx-128 by default and have a user override to avx-256 manually when known to benefit performance. Thanks, Harsha
Re: [C++ preview patch] PR 44277
On 10/31/2011 09:29 PM, Jason Merrill wrote: On 10/31/2011 04:09 PM, Paolo Carlini wrote: ... so today I noticed the c_inhibit_evaluation_warnings use in cp_convert_and_check and occurred to me that we could use the existing mechanism for this warning too? The below still passes checking and my small set of tests... I notice that this patch only changes the C++ front end, and it seems like you already have special cases for pointers/pointers to members, so you might as well go ahead and use nullptr_node. Right. Thus essentially a mix of the two recent tries, like the below, right? Patch becomes even simpler and more importantly we rely on c_inhibit_* only for c code proper. If you think I'm on the right track, I will add the testcases, documentation, etc. Is the name of the warning ok? It's a bit long... Thanks, Paolo. Index: c-family/c.opt === --- c-family/c.opt (revision 180705) +++ c-family/c.opt (working copy) @@ -685,6 +685,9 @@ Wpointer-sign C ObjC Var(warn_pointer_sign) Init(-1) Warning Warn when a pointer differs in signedness in an assignment +Wzero-as-null-pointer-constant +C++ ObjC++ Var(warn_zero_as_null_pointer_constant) Warning + ansi C ObjC C++ ObjC++ A synonym for -std=c89 (for C) or -std=c++98 (for C++) Index: cp/typeck.c === --- cp/typeck.c (revision 180705) +++ cp/typeck.c (working copy) @@ -4058,7 +4058,9 @@ cp_build_binary_op (location_t location, else { op0 = build_ptrmemfunc_access_expr (op0, pfn_identifier); - op1 = cp_convert (TREE_TYPE (op0), integer_zero_node); + op1 = cp_convert (TREE_TYPE (op0), + NULLPTR_TYPE_P (TREE_TYPE (op1)) + ? 
nullptr_node : integer_zero_node); } result_type = TREE_TYPE (op0); } @@ -4666,11 +4668,21 @@ tree cp_truthvalue_conversion (tree expr) { tree type = TREE_TYPE (expr); + tree ret; + if (TYPE_PTRMEM_P (type)) -return build_binary_op (EXPR_LOCATION (expr), - NE_EXPR, expr, integer_zero_node, 1); +ret = build_binary_op (EXPR_LOCATION (expr), + NE_EXPR, expr, nullptr_node, 1); + else if (TYPE_PTR_P (type) || TYPE_PTRMEMFUNC_P (type)) +{ + ++c_inhibit_evaluation_warnings; + ret = c_common_truthvalue_conversion (input_location, expr); + --c_inhibit_evaluation_warnings; +} else -return c_common_truthvalue_conversion (input_location, expr); +ret = c_common_truthvalue_conversion (input_location, expr); + + return ret; } /* Just like cp_truthvalue_conversion, but we want a CLEANUP_POINT_EXPR. */ Index: cp/init.c === --- cp/init.c (revision 180705) +++ cp/init.c (working copy) @@ -176,6 +176,8 @@ build_zero_init_1 (tree type, tree nelts, bool sta items with static storage duration that are not otherwise initialized are initialized to zero. */ ; + else if (TYPE_PTR_P (type) || TYPE_PTR_TO_MEMBER_P (type)) +init = convert (type, nullptr_node); else if (SCALAR_TYPE_P (type)) init = convert (type, integer_zero_node); else if (CLASS_TYPE_P (type)) Index: cp/cvt.c === --- cp/cvt.c(revision 180705) +++ cp/cvt.c(working copy) @@ -198,6 +198,11 @@ cp_convert_to_pointer (tree type, tree expr) if (null_ptr_cst_p (expr)) { + if (c_inhibit_evaluation_warnings == 0 + !NULLPTR_TYPE_P (TREE_TYPE (expr))) + warning (OPT_Wzero_as_null_pointer_constant, +zero as null pointer constant); + if (TYPE_PTRMEMFUNC_P (type)) return build_ptrmemfunc (TYPE_PTRMEMFUNC_FN_TYPE (type), expr, 0, /*c_cast_p=*/false, tf_warning_or_error);
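For readers who want to see what the proposed option is aimed at, here is a small C++0x sketch. It assumes the option ends up spelled -Wzero-as-null-pointer-constant, as in the c.opt hunk above; the function names are invented for the example, and which exact expressions get diagnosed depends on the final form of the patch.

```cpp
// Returns true when p is a null pointer.  The comparison uses nullptr,
// which has nullptr_t type and so is not what the warning is about.
bool is_null(const int* p) {
    return p == nullptr;
}

// A literal 0 converted to a pointer type: this is the spelling the
// proposed warning is intended to flag.
const int* null_spelled_as_zero() {
    const int* p = 0;
    return p;
}

// The same null pointer value written with nullptr: expected to stay quiet.
const int* null_spelled_as_nullptr() {
    const int* p = nullptr;
    return p;
}
```

Both functions return the same null pointer value; only the spelling differs, which is why the check in cp_convert_to_pointer looks at NULLPTR_TYPE_P of the expression rather than at its value.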
Re: v2[PATCH] update to libtool-2.4.2 and regenerate
This is an updated version of the libtool update patch. It fixes the --with-sysroot clash by reverting commit 3334f7ed5851ef1 in libtools. I've also included Rainer's 64bit Solaris patch. http://trippelsdorf.de/update-to-libtool-2.4.2-and-regenerate.patch.bz2 --- boehm-gc/Makefile.in |3 + boehm-gc/configure | 1857 ++-- boehm-gc/include/Makefile.in |3 + boehm-gc/include/gc_config.h.in|6 - boehm-gc/testsuite/Makefile.in |3 + fixincludes/configure | 95 +- gcc/configure | 1383 +-- intl/config.h.in | 232 +- intl/configure | 4764 +++- libffi/Makefile.in |3 + libffi/configure | 1430 +-- libffi/include/Makefile.in |3 + libffi/man/Makefile.in |3 + libffi/testsuite/Makefile.in |3 + libgfortran/Makefile.in|3 + libgfortran/configure | 1884 ++--- libgomp/Makefile.in|3 + libgomp/configure | 1886 ++--- libgomp/testsuite/Makefile.in |3 + libjava/Makefile.in|2 + libjava/classpath/Makefile.in |3 + libjava/classpath/configure| 1869 ++--- libjava/classpath/doc/Makefile.in |3 + libjava/classpath/doc/api/Makefile.in |3 + libjava/classpath/examples/Makefile.in |3 + libjava/classpath/external/Makefile.in |3 + libjava/classpath/external/jsr166/Makefile.in |3 + .../classpath/external/relaxngDatatype/Makefile.in |3 + libjava/classpath/external/sax/Makefile.in |3 + libjava/classpath/external/w3c_dom/Makefile.in |3 + libjava/classpath/include/Makefile.in |3 + libjava/classpath/lib/Makefile.in |3 + libjava/classpath/native/Makefile.in |3 + libjava/classpath/native/fdlibm/Makefile.in|3 + libjava/classpath/native/jawt/Makefile.in |3 + libjava/classpath/native/jni/Makefile.in |3 + libjava/classpath/native/jni/classpath/Makefile.in |3 + .../classpath/native/jni/gconf-peer/Makefile.in|3 + .../native/jni/gstreamer-peer/Makefile.in |3 + libjava/classpath/native/jni/gtk-peer/Makefile.in |3 + libjava/classpath/native/jni/java-io/Makefile.in |3 + libjava/classpath/native/jni/java-lang/Makefile.in |3 + libjava/classpath/native/jni/java-math/Makefile.in |3 + 
libjava/classpath/native/jni/java-net/Makefile.in |3 + libjava/classpath/native/jni/java-nio/Makefile.in |3 + libjava/classpath/native/jni/java-util/Makefile.in |3 + libjava/classpath/native/jni/midi-alsa/Makefile.in |3 + libjava/classpath/native/jni/midi-dssi/Makefile.in |3 + .../classpath/native/jni/native-lib/Makefile.in|3 + libjava/classpath/native/jni/qt-peer/Makefile.in |3 + libjava/classpath/native/jni/xmlj/Makefile.in |3 + libjava/classpath/native/plugin/Makefile.in|3 + libjava/classpath/resource/Makefile.in |3 + libjava/classpath/scripts/Makefile.in |3 + libjava/classpath/tools/Makefile.in|3 + libjava/configure | 2164 +++--- libjava/gcj/Makefile.in|4 +- libjava/include/Makefile.in|4 +- libjava/testsuite/Makefile.in |4 +- libmudflap/Makefile.in |3 + libmudflap/configure | 1430 +-- libmudflap/testsuite/Makefile.in |3 + libobjc/configure | 1457 +-- libquadmath/Makefile.in|3 + libquadmath/configure | 1430 +-- libssp/Makefile.in |3 + libssp/configure | 1430 +-- libstdc++-v3/Makefile.in |3 + libstdc++-v3/configure | 1873 ++--- libstdc++-v3/doc/Makefile.in |3 + libstdc++-v3/include/Makefile.in |3 + libstdc++-v3/libsupc++/Makefile.in |3 + libstdc++-v3/po/Makefile.in|3 + libstdc++-v3/python/Makefile.in|3 + libstdc++-v3/src/Makefile.in |3 + libstdc++-v3/testsuite/Makefile.in
implementation of std::thread::hardware_concurrency()
Hi all.

This patch implements std::thread::hardware_concurrency(). Tested with pthreads-win32/winpthreads on Windows, and on Linux/FreeBSD.

diff --git a/libstdc++-v3/src/thread.cc b/libstdc++-v3/src/thread.cc
index 09e7fc5..3eacb06 100644
--- a/libstdc++-v3/src/thread.cc
+++ b/libstdc++-v3/src/thread.cc
@@ -112,10 +112,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   unsigned int
   thread::hardware_concurrency() noexcept
   {
-    int __n = _GLIBCXX_NPROCS;
-    if (__n < 0)
-      __n = 0;
-    return __n;
+    int count=0;
+#if defined(PTW32_VERSION) || \
+  (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
+  defined(__hpux)
+    count=pthread_num_processors_np();
+#elif defined(__APPLE__) || defined(__FreeBSD__)
+    size_t size=sizeof(count);
+    sysctlbyname("hw.ncpu", &count, &size, NULL, 0);
+#elif defined(_SC_NPROCESSORS_ONLN)
+    count=sysconf(_SC_NPROCESSORS_ONLN);
+#elif definen(_GLIBCXX_USE_GET_NPROCS)
+    count=_GLIBCXX_NPROCS;
+#endif
+    return (count>0)?count:0;
   }
 
 _GLIBCXX_END_NAMESPACE_VERSION
Re: [PATCH] Use gcc's libtool in libgo
Here is a patch that updates libgo to use gcc's internal libtool version. I've only retained config/go.m4 for now. http://trippelsdorf.de/Use-gcc-s-libtool-in-libgo.patch.bz2 --- libgo/Makefile.in |9 +- libgo/aclocal.m4| 10 +- libgo/config/libtool.m4 | 7516 - libgo/config/ltmain.sh | 8636 --- libgo/config/ltoptions.m4 | 369 -- libgo/config/ltsugar.m4 | 123 - libgo/config/ltversion.m4 | 23 - libgo/config/lt~obsolete.m4 | 98 - libgo/configure | 1747 +++--- libgo/testsuite/Makefile.in |9 +- 10 files changed, 1276 insertions(+), 17264 deletions(-) delete mode 100644 libgo/config/libtool.m4 delete mode 100644 libgo/config/ltmain.sh delete mode 100644 libgo/config/ltoptions.m4 delete mode 100644 libgo/config/ltsugar.m4 delete mode 100644 libgo/config/ltversion.m4 delete mode 100644 libgo/config/lt~obsolete.m4 diff --git a/libgo/Makefile.in b/libgo/Makefile.in index 05223a6..bad8ed3 100644 --- a/libgo/Makefile.in +++ b/libgo/Makefile.in @@ -58,11 +58,7 @@ am__aclocal_m4_deps = $(top_srcdir)/../config/depstand.m4 \ $(top_srcdir)/../config/multi.m4 \ $(top_srcdir)/../config/override.m4 \ $(top_srcdir)/../config/unwind_ipinfo.m4 \ - $(top_srcdir)/config/go.m4 $(top_srcdir)/config/libtool.m4 \ - $(top_srcdir)/config/ltoptions.m4 \ - $(top_srcdir)/config/ltsugar.m4 \ - $(top_srcdir)/config/ltversion.m4 \ - $(top_srcdir)/config/lt~obsolete.m4 $(top_srcdir)/configure.ac + $(top_srcdir)/config/go.m4 $(top_srcdir)/configure.ac am__configure_deps = $(am__aclocal_m4_deps) $(CONFIGURE_DEPENDENCIES) \ $(ACLOCAL_M4) am__CONFIG_DISTCLEAN_FILES = config.status config.cache config.log \ @@ -365,6 +361,7 @@ CPPFLAGS = @CPPFLAGS@ CYGPATH_W = @CYGPATH_W@ DEFS = @DEFS@ DEPDIR = @DEPDIR@ +DLLTOOL = @DLLTOOL@ DSYMUTIL = @DSYMUTIL@ DUMPBIN = @DUMPBIN@ ECHO_C = @ECHO_C@ @@ -399,6 +396,7 @@ LN_S = @LN_S@ LTLIBOBJS = @LTLIBOBJS@ MAINT = @MAINT@ MAKEINFO = @MAKEINFO@ +MANIFEST_TOOL = @MANIFEST_TOOL@ MATH_LIBS = @MATH_LIBS@ MKDIR_P = @MKDIR_P@ NET_LIBS = @NET_LIBS@ @@ -434,6 +432,7 @@ abs_builddir = 
@abs_builddir@ abs_srcdir = @abs_srcdir@ abs_top_builddir = @abs_top_builddir@ abs_top_srcdir = @abs_top_srcdir@ +ac_ct_AR = @ac_ct_AR@ ac_ct_CC = @ac_ct_CC@ ac_ct_DUMPBIN = @ac_ct_DUMPBIN@ am__include = @am__include@ diff --git a/libgo/aclocal.m4 b/libgo/aclocal.m4 index ca453c6..9d4f58c 100644 --- a/libgo/aclocal.m4 +++ b/libgo/aclocal.m4 @@ -973,9 +973,9 @@ m4_include([../config/lead-dot.m4]) m4_include([../config/multi.m4]) m4_include([../config/override.m4]) m4_include([../config/unwind_ipinfo.m4]) +m4_include([../config/libtool.m4]) +m4_include([../config/ltoptions.m4]) +m4_include([../config/ltsugar.m4]) +m4_include([../config/ltversion.m4]) +m4_include([../config/lt~obsolete.m4]) m4_include([config/go.m4]) -m4_include([config/libtool.m4]) -m4_include([config/ltoptions.m4]) -m4_include([config/ltsugar.m4]) -m4_include([config/ltversion.m4]) -m4_include([config/lt~obsolete.m4]) diff --git a/libgo/config/libtool.m4 b/libgo/config/libtool.m4 deleted file mode 100644 index 1a667d3..000 --- a/libgo/config/libtool.m4 +++ /dev/null ... -- Markus
Re: implementation of std::thread::hardware_concurrency()
Hi,

This is patch is implement the std::thread::hardware_concurrency(). Tested on pthreads-win32/winpthreads on windows OS, and on Linux/FreeBSD.

Please send library patches to the library mailing list too. Also, always patch mainline first: actually in the latter the function is already implemented, maybe something is missing for win32, please check, rediff, and resend.

Thanks
Paolo
Re: implementation of std::thread::hardware_concurrency()
On 10/31/2011 02:10 PM, niXman wrote: +#elif definen(_GLIBCXX_USE_GET_NPROCS) Typo. r~
Re: [PATCH, rs6000] Preserve link stack for 476 cpus
On Fri, 2011-10-28 at 15:37 -0400, David Edelsohn wrote: On Fri, Oct 28, 2011 at 12:36 PM, Peter Bergner berg...@vnet.ibm.com wrote: So David, do we even want to bother trying to support this on -m64 given the only cpu that needs this is a 32-bit only cpu? If so, I can try and work with Alan to figure out how we can merge the function descriptors for the thunk routines when using -m64. I barely want to bother with this ;-). So, no, I don't want to bother with -m64 support. Ok, attached below is the updated patch that passes bootstrap and regtesting that only enables the new link stack code for 32-bit compiles. However, talking with Alan, he mentioned we just have to mark the opd entry weak and that will fix my link problem (confirmed it does). It seems we might want to allow this on 64-bit too, since it actually makes the code cleaner wrt where we set TARGET_LINK_STACK. To get 64-bit working, we only need the following patch on top of the 32-bit only patch below: --- gcc/config/rs6000/rs6000.c.old 2011-10-31 16:16:04.0 -0500 +++ gcc/config/rs6000/rs6000.c 2011-10-31 16:16:37.0 -0500 @@ -3245,13 +3245,7 @@ /* If not explicitly specified via option, decide whether to generate the extra blr's required to preserve the link stack on some cpus (eg, 476). 
*/ - if (TARGET_POWERPC64) -{ - if (TARGET_LINK_STACK 0) - warning (0, -m64 disables -mpreserve-ppc476-link-stack); - SET_TARGET_LINK_STACK (0); -} - else if (TARGET_LINK_STACK == -1) + if (TARGET_LINK_STACK == -1) SET_TARGET_LINK_STACK (rs6000_cpu == PROCESSOR_PPC476 flag_pic); return ret; @@ -27960,6 +27954,8 @@ DECL_COMDAT_GROUP (decl) = DECL_ASSEMBLER_NAME (decl); targetm.asm_out.unique_section (decl, 0); switch_to_section (get_named_section (decl, NULL, 0)); + DECL_WEAK (decl) = 1; + ASM_WEAKEN_DECL (asm_out_file, decl, name, 0); targetm.asm_out.globalize_label (asm_out_file, name); targetm.asm_out.assemble_visibility (decl, VISIBILITY_HIDDEN); ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl); It's up to you David whether we should stick with the 32-bit only patch or go ahead and allow 64-bit too. What do you think? Peter * config.gcc (powerpc*-*-linux*): Add powerpc*-*-linux*ppc476* variant. * config/rs6000/476.h: New file. * config/rs6000/476.opt: Likewise. * config/rs6000/rs6000.h (TARGET_LINK_STACK): New define. (SET_TARGET_LINK_STACK): Likewise. (TARGET_ASM_CODE_END): Define. * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable TARGET_LINK_STACK for -mtune=476 and -mtune=476fp. (rs6000_legitimize_tls_address): Emit the link stack preserving GOT code if TARGET_LINK_STACK. (rs6000_emit_load_toc_table): Likewise. (output_function_profiler): Likewise (macho_branch_islands): Likewise (machopic_output_stub): Likewise (get_ppc476_thunk_name): New function. (rs6000_code_end): Likewise. * config/rs6000/rs6000.md (load_toc_v4_PIC_1, load_toc_v4_PIC_1b): Convert to a define_expand. (load_toc_v4_PIC_1_normal): New define_insn. (load_toc_v4_PIC_1_476): Likewise. (load_toc_v4_PIC_1b_normal): Likewise. (load_toc_v4_PIC_1b_476): Likewise. 
Index: gcc/config.gcc === --- gcc/config.gcc (revision 179091) +++ gcc/config.gcc (working copy) @@ -2133,6 +2133,9 @@ powerpc-*-linux* | powerpc64-*-linux*) esac tmake_file=${tmake_file} t-slibgcc-libgcc case ${target} in + powerpc*-*-linux*ppc476*) + tm_file=${tm_file} rs6000/476.h + extra_options=${extra_options} rs6000/476.opt ;; powerpc*-*-linux*altivec*) tm_file=${tm_file} rs6000/linuxaltivec.h ;; powerpc*-*-linux*spe*) Index: gcc/config/rs6000/476.h === --- gcc/config/rs6000/476.h (revision 0) +++ gcc/config/rs6000/476.h (revision 0) @@ -0,0 +1,32 @@ +/* Enable IBM PowerPC 476 support. + Copyright (C) 2011 Free Software Foundation, Inc. + Contributed by Peter Bergner (berg...@vnet.ibm.com) + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as
Re: RFA: libstdc++ PATCH to initializer_list to #error in C++98 mode
On 10/31/2011 09:32 PM, Jason Merrill wrote:
On 10/31/2011 04:24 PM, Paolo Carlini wrote:

Sure. Note anyway, that <bits/c++config.h> is already included, elsewhere too in libsupc++.

I guess the other option would be to add it to install-freestanding-headers.

Of course. I think Benjamin followed in better detail the issues having to do with libsupc++ vs the C++ runtime proper, freestanding, making sure there aren't overly annoying links, etc. For now you can of course commit the patch as-is, only please make sure that the error message is by and large consistent with the one we provide via c++0x_warning.h.

Paolo.
Re: Go patch committed: Implement new syscall package
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

Ian,

I committed this patch which should fix this problem. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.

thanks, but this is not enough:

nawk: syntax error at source line 173
 context is ([^ ]*)$, cparam) == 0) {
nawk: illegal statement at source line 173
nawk: syntax error at source line 179

and there is another instance on l.210. I haven't tried fixing this myself since I'm fighting with other issues.

Whoops, my patch was incomplete. Sorry about that. Fixed by this patch. Bootstrapped on x86_64-unknown-linux-gnu and saw that it did not change the code generated by the script. Committed to mainline.

Ian

diff -r a880b911554e libgo/go/syscall/mksyscall.awk
--- a/libgo/go/syscall/mksyscall.awk Fri Oct 28 15:05:12 2011 -0700
+++ b/libgo/go/syscall/mksyscall.awk Mon Oct 31 14:41:12 2011 -0700
@@ -170,7 +170,7 @@
         printf("\t}\n")
 
         ++carg
-        if (match(cargs[carg], "^([^ ]*) ([^ ]*)$", cparam) == 0) {
+        if (split(cargs[carg], cparam) != 2) {
             print loc, "bad C parameter:", cargs[carg] | "cat 1>&2"
             status = 1
             next
@@ -207,7 +207,7 @@
     }
     usedr = 0
     for (goresult = 1; goresults[goresult] != ""; goresult++) {
-        if (match(goresults[goresult], "^([^ ]*) ([^ ]*)$", goparam) == 0) {
+        if (split(goresults[goresult], goparam) != 2) {
             print loc, "bad result:", goresults[goresult] | "cat 1>&2"
             status = 1
             next
Re: implementation of std::thread::hardware_concurrency()
On Oct 31, 2011, at 2:10 PM, niXman wrote:

This is patch is implement the std::thread::hardware_concurrency().

[ general comment ]

Ick, this isn't what I'd call clean. Maybe a porting header inclusion that defines a static inline pthread_num_processors_np on those systems that don't have it. With that, this routine could just use pthread_num_processors_np after including that porting header. Having dozens of files with cascades of #if went out of fashion back in the 1990s.
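A sketch of what such a porting header might look like, under several assumptions: the names compat_num_processors and hardware_concurrency_sketch are hypothetical (the mail suggests defining pthread_num_processors_np itself; a different name is used here to avoid clashing with platforms that already provide it), only the sysconf branch is shown, and the platform test is deliberately simplified.

```cpp
// --- hypothetical porting header ------------------------------------
#if defined(__unix__) || defined(__APPLE__)
# include <unistd.h>   // sysconf and, where available, _SC_NPROCESSORS_ONLN
#endif

// Number of online processors, or 0 when it cannot be determined.
// All platform conditionals live here, in one place.
static inline int compat_num_processors() {
#if defined(_SC_NPROCESSORS_ONLN)
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? static_cast<int>(n) : 0;
#else
    return 0;   // unknown platform: report "not computable"
#endif
}

// --- user of the porting header --------------------------------------
// The routine itself no longer needs a cascade of #if/#elif branches.
static inline unsigned int hardware_concurrency_sketch() {
    int n = compat_num_processors();
    return n > 0 ? static_cast<unsigned int>(n) : 0u;
}
```

Further platforms (the pthreads-win32, HP-UX and sysctlbyname cases from the patch) would each add a branch inside the porting helper only, leaving the caller untouched.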
Re: [PATCH] Handle many consecutive location notes more efficiently in dwarf2.
From: Jakub Jelinek ja...@redhat.com Date: Mon, 31 Oct 2011 11:26:40 +0100 Or alternatively you could remove the whole if (! !next_note ...) next_note = NULL_RTX; stmt and move your cache to a global var and clear it when reaching end of function (like e.g. last_var_location_insn is cleared in dwarf2out_end_epilogue). This solution sounds the best, thanks Jakub! If I get some time I'll see if I can more strongly integrate my changes with the existing last_label sharing code there, as all of these tests are checking for essentially the same thing. Invalidate cached next real insn in dwarf2out_end_epilogue(). * dwarf2out.c (cached_next_real_insn): New. (dwarf2out_end_epilogue): Set it to NULL_RTX. (dwarf2out_var_location): Remove cached_next_real_insn local static. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180713 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog |6 ++ gcc/dwarf2out.c |3 ++- 2 files changed, 8 insertions(+), 1 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index caed12e..4848147 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2011-10-31 David S. Miller da...@davemloft.net + + * dwarf2out.c (cached_next_real_insn): New. + (dwarf2out_end_epilogue): Set it to NULL_RTX. + (dwarf2out_var_location): Remove cached_next_real_insn local static. + 2011-10-31 Richard Henderson r...@redhat.com * config/i386/sse.md (floatv8siv8sf2): Rename from avx_cvtdq2ps256. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 478952f..e6f86a4 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -98,6 +98,7 @@ along with GCC; see the file COPYING3. 
If not see static void dwarf2out_source_line (unsigned int, const char *, int, bool); static rtx last_var_location_insn; +static rtx cached_next_real_insn; #ifdef VMS_DEBUGGING_INFO int vms_file_stats_name (const char *, long long *, long *, char *, int *); @@ -1090,6 +1091,7 @@ dwarf2out_end_epilogue (unsigned int line ATTRIBUTE_UNUSED, char label[MAX_ARTIFICIAL_LABEL_BYTES]; last_var_location_insn = NULL_RTX; + cached_next_real_insn = NULL_RTX; if (dwarf2out_do_cfi_asm ()) fprintf (asm_out_file, \t.cfi_endproc\n); @@ -20132,7 +20134,6 @@ dwarf2out_var_location (rtx loc_note) static const char *last_postcall_label; static bool last_in_cold_section_p; static rtx expected_next_loc_note; - static rtx cached_next_real_insn; tree decl; bool var_loc_p; -- 1.7.6.401.g6a319
Re: Go patch committed: Implement new syscall package
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

/vol/gcc/src/hg/trunk/local/libgo/go/syscall/errstr_nor.go:22:8: error: reference to undefined name 'libc_strerror'
make[4]: *** [syscall/syscall.lo] Error 1

Sorry about that. I thought I had tested that, but evidently not. Fixed like so. Committed to mainline.

Ian

diff -r 56a1bd1d907a libgo/go/syscall/errstr_nor.go
--- a/libgo/go/syscall/errstr_nor.go Mon Oct 31 14:41:55 2011 -0700
+++ b/libgo/go/syscall/errstr_nor.go Mon Oct 31 14:53:22 2011 -0700
@@ -11,7 +11,7 @@
     "unsafe"
 )
 
-//sysnb strerror(errnum int) *byte
+//sysnb strerror(errnum int) (buf *byte)
 //strerror(errnum int) *byte
 
 var errstr_lock sync.Mutex
@@ -19,7 +19,7 @@
 func Errstr(errno int) string {
     errstr_lock.Lock()
 
-    bp := libc_strerror(errno)
+    bp := strerror(errno)
     b := (*[1000]byte)(unsafe.Pointer(bp))
     i := 0
     for b[i] != 0 {
Re: Go patch committed: Update Go library
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

the only issue I've found on Solaris is the use of pthread_yield, which doesn't exist even on Solaris 11. The following patch checks for this, and falls back to thr_yield if available.

Rather than that patch, I changed the code to use sched_yield rather than pthread_yield. I realized that libgo is already using sched_yield, in runtime/go-sched.c. There shouldn't be any portability penalty to also using it in yield.c. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline.

Ian

diff -r 7135ea46b116 libgo/runtime/yield.c
--- a/libgo/runtime/yield.c Mon Oct 31 14:53:56 2011 -0700
+++ b/libgo/runtime/yield.c Mon Oct 31 14:58:19 2011 -0700
@@ -9,7 +9,7 @@
 #include <stddef.h>
 #include <sys/types.h>
 #include <sys/time.h>
-#include <pthread.h>
+#include <sched.h>
 #include <unistd.h>
 
 #ifdef HAVE_SYS_SELECT_H
@@ -38,7 +38,7 @@
 void
 runtime_osyield (void)
 {
-  pthread_yield ();
+  sched_yield ();
 }
 
 /* Sleep for some number of microseconds.  */
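The portable form of the yield call can be sketched in isolation as follows. The wrapper name os_yield is made up and merely stands in for runtime_osyield; the only real API used is POSIX sched_yield from <sched.h>, which is specified to return 0 on success and -1 on error.

```cpp
#include <sched.h>   // sched_yield (POSIX)

// Hypothetical stand-in for runtime_osyield: give up the processor so
// another runnable thread can be scheduled.  sched_yield is part of
// POSIX, whereas pthread_yield is a non-standard extension that some
// systems (Solaris, per the report above) do not provide.
static inline int os_yield() {
    return sched_yield();
}
```

Unlike the libgo function, the sketch passes the return value through, purely so that the call can be checked in a test.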
[PATCH] Allow zero operand in sparc VIS3 cmask patterns.
I noticed this while working on vcond patterns for sparc. Committed to trunk. gcc/ * config/sparc/sparc.md (cmask patterns): Allow zero operand. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180715 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog |2 ++ gcc/config/sparc/sparc.md |6 +++--- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 4848147..ebf8cdc 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,7 @@ 2011-10-31 David S. Miller da...@davemloft.net + * config/sparc/sparc.md (cmask patterns): Allow zero operand. + * dwarf2out.c (cached_next_real_insn): New. (dwarf2out_end_epilogue): Set it to NULL_RTX. (dwarf2out_var_location): Remove cached_next_real_insn local static. diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md index 6dd3909..fbd1a87 100644 --- a/gcc/config/sparc/sparc.md +++ b/gcc/config/sparc/sparc.md @@ -8452,7 +8452,7 @@ ;; Conditional moves are possible via fcmpX -- cmaskX - bshuffle (define_insn cmask8P:mode_vis [(set (reg:DI GSR_REG) -(unspec:DI [(match_operand:P 0 register_operand r) +(unspec:DI [(match_operand:P 0 register_or_zero_operand rJ) (reg:DI GSR_REG)] UNSPEC_CMASK8))] TARGET_VIS3 @@ -8460,7 +8460,7 @@ (define_insn cmask16P:mode_vis [(set (reg:DI GSR_REG) -(unspec:DI [(match_operand:P 0 register_operand r) +(unspec:DI [(match_operand:P 0 register_or_zero_operand rJ) (reg:DI GSR_REG)] UNSPEC_CMASK16))] TARGET_VIS3 @@ -8468,7 +8468,7 @@ (define_insn cmask32P:mode_vis [(set (reg:DI GSR_REG) -(unspec:DI [(match_operand:P 0 register_operand r) +(unspec:DI [(match_operand:P 0 register_or_zero_operand rJ) (reg:DI GSR_REG)] UNSPEC_CMASK32))] TARGET_VIS3 -- 1.7.6.401.g6a319
PATCH: Move f16c intrinsics into f16cintrin.h
Hi,

This patch moves the f16c intrinsics out of immintrin.h into their own header, f16cintrin.h.

Interested parties should view these threads from three years ago:
  http://gcc.gnu.org/ml/gcc-patches/2008-11/threads.html#00145
  http://gcc.gnu.org/ml/gcc-patches/2008-12/threads.html#00174

Testing on x86_64, okay to commit if no regressions?
--
Quentin Neill

--

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index caed12e..5af1c78 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2011-10-31  Quentin Neill  quentin.ne...@amd.com
+
+        Piledriver f16cintrin.h fix.
+        * config/i386/f16cintrin.h: Contents moved from immintrin.h.
+
 2011-10-31  Richard Henderson  r...@redhat.com
 
         * config/i386/sse.md (floatv8siv8sf2): Rename from avx_cvtdq2ps256.
diff --git a/gcc/config/i386/f16cintrin.h b/gcc/config/i386/f16cintrin.h
new file mode 100644
index 000..5ff836b
--- /dev/null
+++ b/gcc/config/i386/f16cintrin.h
@@ -0,0 +1,94 @@
+/* Copyright (C) 2011
+   Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _X86INTRIN_H_INCLUDED
+#if (!defined(_X86INTRIN_H_INCLUDED) && !defined(_IMMINTRIN_H_INCLUDED))
+# error "Never use <f16intrin.h> directly; include <x86intrin.h> or <immintrin.h> instead."
+#endif
+
+#ifndef __F16C__
+# error "F16C instruction set not enabled"
+#else
+
+#ifndef _F16CINTRIN_H_INCLUDED
+#define _F16CINTRIN_H_INCLUDED
+
+extern __inline float __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_cvtsh_ss (unsigned short __S)
+{
+  __v8hi __H = __extension__ (__v8hi){ __S, 0, 0, 0, 0, 0, 0, 0 };
+  __v4sf __A = __builtin_ia32_vcvtph2ps (__H);
+  return __builtin_ia32_vec_ext_v4sf (__A, 0);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtph_ps (__m128i __A)
+{
+  return (__m128) __builtin_ia32_vcvtph2ps ((__v8hi) __A);
+}
+
+extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtph_ps (__m128i __A)
+{
+  return (__m256) __builtin_ia32_vcvtph2ps256 ((__v8hi) __A);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline unsigned short __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_cvtss_sh (float __F, const int __I)
+{
+  __v4sf __A = __extension__ (__v4sf){ __F, 0, 0, 0 };
+  __v8hi __H = __builtin_ia32_vcvtps2ph (__A, __I);
+  return (unsigned short) __builtin_ia32_vec_ext_v8hi (__H, 0);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtps_ph (__m128 __A, const int __I)
+{
+  return (__m128i) __builtin_ia32_vcvtps2ph ((__v4sf) __A, __I);
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_cvtps_ph (__m256 __A, const int __I)
+{
+  return (__m128i) __builtin_ia32_vcvtps2ph256 ((__v8sf) __A, __I);
+}
+#else
+#define _cvtss_sh(__F, __I)                                             \
+  (__extension__                                                        \
+   ({                                                                   \
+      __v4sf __A = __extension__ (__v4sf){ __F, 0, 0, 0 };              \
+      __v8hi __H = __builtin_ia32_vcvtps2ph (__A, __I);                 \
+      (unsigned short) __builtin_ia32_vec_ext_v8hi (__H, 0);            \
+    }))
+
+#define _mm_cvtps_ph(A, I) \
+  ((__m128i) __builtin_ia32_vcvtps2ph ((__v4sf)(__m128) A, (int) (I)))
+
+#define _mm256_cvtps_ph(A, I) \
+  ((__m128i) __builtin_ia32_vcvtps2ph256 ((__v8sf)(__m256) A, (int) (I)))
+#endif
+
+#endif /* __F16C__ */
+#endif
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index 102814e..986a573 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -76,6 +76,10 @@
 #include <fmaintrin.h>
 #endif
 
+#ifdef __F16C__
+#include <f16cintrin.h>
+#endif
+
 #ifdef __RDRND__
 extern __inline int
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
@@ -161,63 +165,4 @@ _rdrand64_step (unsigned long long *__P)
 #endif /* __RDRND__ */
 
 #endif /* __x86_64__ */
 
-#ifdef __F16C__
-extern __inline
[PATCH] Add fixuns_trunc<mode><sseintvecmodelower>2
Hi!

This allows vectorizing float -> uint conversion. To convert V{4,8}SFmode op0 to a V{4,8}SImode target, it emits:

  V{4,8}SFmode mask = op0 >= { INT_MAX + 1U + .0f, INT_MAX + 1U + .0f, ... } // non-signalling GE
  V{4,8}SFmode tmp1 = mask & { 2.0f * INT_MIN, 2.0f * INT_MIN, ... }
  V{4,8}SFmode tmp2 = op0 + tmp1
  V{4,8}SImode target = (V{4,8}SImode) tmp2

TARGET_AVX is needed, because pre-AVX we didn't have a non-signalling GE in cmpps, and we don't want to raise exceptions if op0 is a QNaN (scalar code uses vucomiss).

Ok for trunk?

2011-10-31  Jakub Jelinek  ja...@redhat.com

        * config/i386/sse.md (fixuns_trunc<mode><sseintvecmodelower>2): New
        expander.

--- gcc/config/i386/sse.md.jj   2011-10-31 21:05:21.0 +0100
+++ gcc/config/i386/sse.md      2011-10-31 22:53:13.0 +0100
@@ -2322,6 +2322,35 @@ (define_insn "fix_truncv4sfv4si2"
    (set_attr "prefix" "maybe_vex")
    (set_attr "mode" "TI")])
 
+(define_expand "fixuns_trunc<mode><sseintvecmodelower>2"
+  [(set (match_dup 4)
+        (unspec:VF1
+          [(match_operand:VF1 1 "register_operand" "")
+           (match_dup 2)
+           (const_int 29)] UNSPEC_PCMP))
+   (set (match_dup 5)
+        (and:VF1 (match_dup 4) (match_dup 3)))
+   (set (match_dup 6)
+        (plus:VF1 (match_dup 1) (match_dup 5)))
+   (set (match_operand:<sseintvecmode> 0 "register_operand" "")
+        (fix:<sseintvecmode> (match_dup 6)))]
+  "TARGET_AVX"
+{
+  REAL_VALUE_TYPE MTWO32r, TWO31r;
+  int i;
+
+  real_ldexp (&TWO31r, &dconst1, 31);
+  operands[2] = const_double_from_real_value (TWO31r, SFmode);
+  operands[2] = ix86_build_const_vector (<MODE>mode, 1, operands[2]);
+  operands[2] = force_reg (<MODE>mode, operands[2]);
+  real_ldexp (&MTWO32r, &dconstm1, 32);
+  operands[3] = const_double_from_real_value (MTWO32r, SFmode);
+  operands[3] = ix86_build_const_vector (<MODE>mode, 1, operands[3]);
+  operands[3] = force_reg (<MODE>mode, operands[3]);
+  for (i = 4; i < 7; i++)
+    operands[i] = gen_reg_rtx (<MODE>mode);
+})
+
 ;
 ;;
 ;; Parallel double-precision floating point conversion operations

        Jakub
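
For readers who want to check the arithmetic, here is a minimal scalar sketch of what one vector lane computes under the sequence described above. It is only a model in plain C, not the expander itself: the function name fixuns_trunc_lane and the precondition (0 <= op0 < 2^32, not NaN) are assumptions made for illustration, and the vector AND with the compare mask is modelled with a conditional.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one vector lane.  Assumes 0 <= op0 < 2^32 and not NaN;
   the name and the precondition are illustrative, not from the patch.  */
static uint32_t
fixuns_trunc_lane (float op0)
{
  const float two31 = 2147483648.0f;         /* INT_MAX + 1U + .0f */
  const float minus_two32 = -4294967296.0f;  /* 2.0f * INT_MIN */

  assert (op0 >= 0.0f && op0 < 4294967296.0f);

  /* mask & constant: the bias is selected only when op0 >= 2^31.  */
  float tmp1 = (op0 >= two31) ? minus_two32 : 0.0f;

  /* Exact for in-range inputs; the sum now fits a signed 32-bit int.  */
  float tmp2 = op0 + tmp1;

  /* Signed truncating conversion, then reinterpret modulo 2^32.  */
  return (uint32_t) (int32_t) tmp2;
}
```

Inputs below 2^31 go through the signed conversion unchanged. Inputs in [2^31, 2^32) are shifted down by 2^32 first, so the signed result has the same bit pattern as the desired unsigned value.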
Re: PATCH: Move f16c intrinsics into f16cintrin.h
On Mon, Oct 31, 2011 at 05:23:58PM -0500, Quentin Neill wrote:
> Interested parties should view these threads from three years ago:
>   http://gcc.gnu.org/ml/gcc-patches/2008-11/threads.html#00145
>   http://gcc.gnu.org/ml/gcc-patches/2008-12/threads.html#00174
>
> Testing on x86_64, okay to commit if no regressions?

You aren't installing the header, so it will cause regressions.
config.gcc needs to be adjusted for it.

> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index caed12e..5af1c78 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,8 @@
> +2011-10-31  Quentin Neill  quentin.ne...@amd.com
> +
> +        Piledriver f16cintrin.h fix.
> +        * config/i386/f16cintrin.h: Contents moved from immintrin.h.
> +
>  2011-10-31  Richard Henderson  r...@redhat.com
>
>          * config/i386/sse.md (floatv8siv8sf2): Rename from avx_cvtdq2ps256.

        Jakub
Re: [PATCH, rs6000] Preserve link stack for 476 cpus
On Mon, Oct 31, 2011 at 5:32 PM, Peter Bergner berg...@vnet.ibm.com wrote:

> Ok, attached below is the updated patch that passes bootstrap and
> regtesting that only enables the new link stack code for 32-bit compiles.
> However, talking with Alan, he mentioned we just have to mark the opd
> entry weak and that will fix my link problem (confirmed it does). It
> seems we might want to allow this on 64-bit too, since it actually makes
> the code cleaner wrt where we set TARGET_LINK_STACK. To get 64-bit
> working, we only need the following patch on top of the 32-bit only
> patch below:

Okay, go ahead with PPC64 support as well. Hopefully no one ever will have to use it. That implies the option should not explicitly reference ppc476.

- David
Re: PATCH: Move f16c intrinsics into f16cintrin.h
On Mon, Oct 31, 2011 at 5:31 PM, Jakub Jelinek ja...@redhat.com wrote:
> On Mon, Oct 31, 2011 at 05:23:58PM -0500, Quentin Neill wrote:
>> Interested parties should view these threads from three years ago:
>>   http://gcc.gnu.org/ml/gcc-patches/2008-11/threads.html#00145
>>   http://gcc.gnu.org/ml/gcc-patches/2008-12/threads.html#00174
>>
>> Testing on x86_64, okay to commit if no regressions?
>
> You aren't installing the header, so it will cause regressions.
> config.gcc needs to be adjusted for it.

Arggh. Thanks, my tests found that too.

Reposting, okay to commit after testing on x86_64 if no regressions?
--
Quentin Neill

From c0379bf7dacbe457813893cdaf381ae7206566c7 Mon Sep 17 00:00:00 2001
From: Quentin Neill quentin.ne...@amd.com
Date: Mon, 31 Oct 2011 16:54:18 -0500
Subject: [PATCH] 2011-10-31  Quentin Neill  quentin.ne...@amd.com

        Piledriver f16cintrin.h fix.
        * config/i386/f16cintrin.h: Contents moved from immintrin.h.
        * config/config.gcc: Add f16cintrin.h.
---
 gcc/ChangeLog                |    6 +++
 gcc/config.gcc               |    4 +-
 gcc/config/i386/f16cintrin.h |   94 ++
 gcc/config/i386/immintrin.h  |   63 ++--
 4 files changed, 106 insertions(+), 61 deletions(-)
 create mode 100644 gcc/config/i386/f16cintrin.h

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index caed12e..14a4392 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2011-10-31  Quentin Neill  quentin.ne...@amd.com
+
+        Piledriver f16cintrin.h fix.
+        * config/i386/f16cintrin.h: Contents moved from immintrin.h.
+        * config/config.gcc: Add f16cintrin.h.
+
 2011-10-31  Richard Henderson  r...@redhat.com
 
         * config/i386/sse.md (floatv8siv8sf2): Rename from avx_cvtdq2ps256.
diff --git a/gcc/config.gcc b/gcc/config.gcc index 2c18655..2b60e77 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -361,7 +361,7 @@ i[34567]86-*-*) immintrin.h x86intrin.h avxintrin.h xopintrin.h ia32intrin.h cross-stdarg.h lwpintrin.h popcntintrin.h lzcntintrin.h bmiintrin.h bmi2intrin.h tbmintrin.h - avx2intrin.h fmaintrin.h + avx2intrin.h fmaintrin.h f16cintrin.h ;; x86_64-*-*) cpu_type=i386 @@ -374,7 +374,7 @@ x86_64-*-*) immintrin.h x86intrin.h avxintrin.h xopintrin.h ia32intrin.h cross-stdarg.h lwpintrin.h popcntintrin.h lzcntintrin.h bmiintrin.h tbmintrin.h bmi2intrin.h - avx2intrin.h fmaintrin.h + avx2intrin.h fmaintrin.h f16cintrin.h need_64bit_hwint=yes ;; ia64-*-*) diff --git a/gcc/config/i386/f16cintrin.h b/gcc/config/i386/f16cintrin.h new file mode 100644 index 000..5ff836b --- /dev/null +++ b/gcc/config/i386/f16cintrin.h @@ -0,0 +1,94 @@ +/* Copyright (C) 2011 + Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + http://www.gnu.org/licenses/. 
*/ + +#ifndef _X86INTRIN_H_INCLUDED +#if (!defined(_X86INTRIN_H_INCLUDED) !defined(_IMMINTRIN_H_INCLUDED)) +# error Never use f16intrin.h directly; include x86intrin.h or immintrin.h instead. +#endif + +#ifndef __F16C__ +# error F16C instruction set not enabled +#else + +#ifndef _F16CINTRIN_H_INCLUDED +#define _F16CINTRIN_H_INCLUDED + +extern __inline float __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_cvtsh_ss (unsigned short __S) +{ + __v8hi __H = __extension__ (__v8hi){ __S, 0, 0, 0, 0, 0, 0, 0 }; + __v4sf __A = __builtin_ia32_vcvtph2ps (__H); + return __builtin_ia32_vec_ext_v4sf (__A, 0); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtph_ps (__m128i __A) +{ + return (__m128) __builtin_ia32_vcvtph2ps ((__v8hi) __A); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtph_ps (__m128i __A) +{ + return (__m256) __builtin_ia32_vcvtph2ps256 ((__v8hi) __A); +} + +#ifdef __OPTIMIZE__ +extern __inline unsigned
Re: [PATCH] Add fixuns_trunc<mode><sseintvecmodelower>2
On 10/31/2011 03:29 PM, Jakub Jelinek wrote:
>         * config/i386/sse.md (fixuns_trunc<mode><sseintvecmodelower>2): New
>         expander.

Ok.

r~
[google] Enable loop unroll/peel notes under -fopt-info
This patch is for google-main only. Tested with bootstrap and regression tests. Print unroll and peel factors along with loop source position under -fopt-info. Teresa 2011-10-31 Teresa Johnson tejohn...@google.com * common.opt (fopt-info): Disable -fopt-info by default. * loop-unroll.c (report_unroll_peel): New function. (unroll_and_peel_loops): Call record_loop_exits for later use. (peel_loops_completely): Print the loop source position in dump info and emit note under -fopt-info. (decide_unroll_and_peeling): Ditto. (decide_peel_once_rolling): Record peel factor for use in note emission. (decide_peel_completely): Ditto. * cfgloop.c (get_loop_location): New function. * cfgloop.h (get_loop_location): Ditto. * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Emit note under -fopt-info. Index: tree-ssa-loop-ivcanon.c === --- tree-ssa-loop-ivcanon.c (revision 180437) +++ tree-ssa-loop-ivcanon.c (working copy) @@ -52,6 +52,7 @@ #include flags.h #include tree-inline.h #include target.h +#include diagnostic.h /* Specifies types of loops that may be unrolled. */ @@ -443,6 +444,17 @@ fprintf (dump_file, Unrolled loop %d completely by factor %d.\n, loop-num, (int) n_unroll); + if (flag_opt_info = OPT_INFO_MIN) +{ + location_t locus; + locus = gimple_location (cond); + + inform (locus, Completely Unroll loop by %d (execution count %d, const iterations %d), + (int) n_unroll, + (int) loop-header-count, + (int) TREE_INT_CST_LOW(niter)); +} + return true; } Index: loop-unroll.c === --- loop-unroll.c (revision 180437) +++ loop-unroll.c (working copy) @@ -34,6 +34,7 @@ #include hashtab.h #include recog.h #include target.h +#include diagnostic.h /* This pass performs loop unrolling and peeling. 
We only perform these optimizations on innermost loops (with single exception) because @@ -152,6 +153,30 @@ basic_block); static rtx get_expansion (struct var_to_expand *); +static void +report_unroll_peel(struct loop *loop, location_t locus) +{ + struct niter_desc *desc; + int niters = 0; + + desc = get_simple_loop_desc (loop); + + if (desc-const_iter) +niters = desc-niter; + else if (loop-header-count) +niters = expected_loop_iterations (loop); + + inform (locus, %s%s loop by %d (execution count %d, %s iterations %d), + loop-lpt_decision.decision == LPT_PEEL_COMPLETELY ? +Completely : , + loop-lpt_decision.decision == LPT_PEEL_SIMPLE ? +Peel : Unroll, + loop-lpt_decision.times, + (int)loop-header-count, + desc-const_iter?const:average, + niters); +} + /* Unroll and/or peel (depending on FLAGS) LOOPS. */ void unroll_and_peel_loops (int flags) @@ -160,6 +185,8 @@ bool check; loop_iterator li; + record_loop_exits(); + /* First perform complete loop peeling (it is almost surely a win, and affects parameters for further decision a lot). */ peel_loops_completely (flags); @@ -234,16 +261,18 @@ { struct loop *loop; loop_iterator li; + location_t locus; /* Scan the loops, the inner ones first. 
*/ FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST) { loop-lpt_decision.decision = LPT_NONE; + locus = get_loop_location(loop); if (dump_file) - fprintf (dump_file, -\n;; *** Considering loop %d for complete peeling ***\n, -loop-num); + fprintf (dump_file, \n;; *** Considering loop %d for complete peeling at BB %d from %s:%d ***\n, + loop-num, loop-header-index, LOCATION_FILE(locus), + LOCATION_LINE(locus)); loop-ninsns = num_loop_insns (loop); @@ -253,6 +282,11 @@ if (loop-lpt_decision.decision == LPT_PEEL_COMPLETELY) { + if (flag_opt_info = OPT_INFO_MIN) +{ + report_unroll_peel(loop, locus); +} + peel_loop_completely (loop); #ifdef ENABLE_CHECKING verify_dominators (CDI_DOMINATORS); @@ -268,14 +302,18 @@ { struct loop *loop; loop_iterator li; + location_t locus; /* Scan the loops, inner ones first. */ FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST) { loop-lpt_decision.decision = LPT_NONE; + locus = get_loop_location(loop); if (dump_file) - fprintf (dump_file, \n;; *** Considering loop %d ***\n, loop-num); + fprintf (dump_file, \n;; *** Considering loop %d at BB %d from %s:%d ***\n, + loop-num, loop-header-index, LOCATION_FILE(locus), + LOCATION_LINE(locus)); /* Do not peel cold areas. */ if (optimize_loop_for_size_p (loop)) @@
Re: C++ PATCH to add -std=c++11 ??
On 10/31/2011 01:57 PM, Jason Merrill wrote: On 10/31/2011 06:39 AM, Paolo Carlini wrote: Great. When you commit it, you can as well add 'PR c++/50920' to the ChangeLog! OK, here's what I'm checking in. There are a lot more instances of C++0x in comments and cxx_dialect checks, but I'm not going to worry about those now. And some doc changes: commit 611f8e25ffb46a089716187f84414ed4ae56fde4 Author: Jason Merrill ja...@redhat.com Date: Mon Oct 31 22:02:42 2011 -0400 * doc/invoke.texi: Update for -std=c++11. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 1aa0541..0c97453 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -235,7 +235,7 @@ Objective-C and Objective-C++ Dialects}. -pedantic-errors @gol -w -Wextra -Wall -Waddress -Waggregate-return -Warray-bounds @gol -Wno-attributes -Wno-builtin-macro-redefined @gol --Wc++-compat -Wc++0x-compat -Wcast-align -Wcast-qual @gol +-Wc++-compat -Wc++11-compat -Wcast-align -Wcast-qual @gol -Wchar-subscripts -Wclobbered -Wcomment @gol -Wconversion -Wcoverage-mismatch -Wno-cpp -Wno-deprecated @gol -Wno-deprecated-declarations -Wdisabled-optimization @gol @@ -1574,16 +1574,13 @@ C++ code. GNU dialect of @option{-std=c++98}. This is the default for C++ code. -@item c++0x -The working draft of the upcoming ISO C++0x standard. This option -enables experimental features that are likely to be included in -C++0x. The working draft is constantly changing, and any feature that is -enabled by this flag may be removed from future versions of GCC if it is -not part of the C++0x standard. +@item c++11 +The 2011 ISO C++ standard plus amendments. Support for C++11 is still +experimental, and may change in incompatible ways in future releases. -@item gnu++0x -GNU dialect of @option{-std=c++0x}. This option enables -experimental features that may be removed in future versions of GCC. +@item gnu++11 +GNU dialect of @option{-std=c++11}. 
Support for C++11 is still +experimental, and may change in incompatible ways in future releases. @end table @item -fgnu89-inline @@ -1870,7 +1867,7 @@ Version 5 corrects the mangling of attribute const/volatile on function pointer types, decltype of a plain decl, and use of a function parameter in the declaration of another parameter. -Version 6 corrects the promotion behavior of C++0x scoped enums. +Version 6 corrects the promotion behavior of C++11 scoped enums. See also @option{-Wabi}. @@ -1905,7 +1902,7 @@ been added for putting variables into BSS without making them common. @item -fconstexpr-depth=@var{n} @opindex fconstexpr-depth -Set the maximum nested evaluation depth for C++0x constexpr functions +Set the maximum nested evaluation depth for C++11 constexpr functions to @var{n}. A limit is needed to detect endless recursion during constant expression evaluation. The minimum specified by the standard is 512. @@ -2093,7 +2090,7 @@ Set the maximum instantiation depth for template classes to @var{n}. A limit on the template instantiation depth is needed to detect endless recursions during template class instantiation. ANSI/ISO C++ conforming programs must not rely on a maximum depth greater than 17 -(changed to 1024 in C++0x). The default value is 900, as the compiler +(changed to 1024 in C++11). The default value is 900, as the compiler can run out of stack space before hitting 1024 in some situations. @item -fno-threadsafe-statics @@ -2368,14 +2365,14 @@ by @option{-Wall}. @item -Wno-narrowing @r{(C++ and Objective-C++ only)} @opindex Wnarrowing @opindex Wno-narrowing -With -std=c++0x, suppress the diagnostic required by the standard for +With -std=c++11, suppress the diagnostic required by the standard for narrowing conversions within @samp{@{ @}}, e.g. 
@smallexample int i = @{ 2.2 @}; // error: narrowing from double to int @end smallexample -This flag can be useful for compiling valid C++98 code in C++0x mode +This flag can be useful for compiling valid C++98 code in C++11 mode. @item -Wnoexcept @r{(C++ and Objective-C++ only)} @opindex Wnoexcept @@ -2993,7 +2990,7 @@ Options} and @ref{Objective-C and Objective-C++ Dialect Options}. @gccoptlist{-Waddress @gol -Warray-bounds @r{(only with} @option{-O2}@r{)} @gol --Wc++0x-compat @gol +-Wc++11-compat @gol -Wchar-subscripts @gol -Wenum-compare @r{(in C/Objc; this is on by default in C++)} @gol -Wimplicit-int @r{(C and Objective-C only)} @gol @@ -4063,10 +4060,10 @@ Warn about ISO C constructs that are outside of the common subset of ISO C and ISO C++, e.g.@: request for implicit conversion from @code{void *} to a pointer to non-@code{void} type. -@item -Wc++0x-compat @r{(C++ and Objective-C++ only)} +@item -Wc++11-compat @r{(C++ and Objective-C++ only)} Warn about C++ constructs whose meaning differs between ISO C++ 1998 and -ISO C++ 200x, e.g., identifiers in ISO C++ 1998 that will become keywords -in ISO C++ 200x. This warning is enabled by @option{-Wall}. +ISO C++
Re: RFA: libstdc++ PATCH to initializer_list to #error in C++98 mode
Here's what I'm checking in: commit 6e82dfcf49c92195b5d4bc4b522207b92bad554f Author: Jason Merrill ja...@redhat.com Date: Mon Oct 31 01:21:49 2011 -0400 * include/Makefile.am (install-freestanding-headers): Install c++0x_warning.h. * libsupc++/initializer_list: Include it. diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am index 02deefc..74acbf0 100644 --- a/libstdc++-v3/include/Makefile.am +++ b/libstdc++-v3/include/Makefile.am @@ -1201,8 +1201,10 @@ endif # are installed by libsupc++, so only the first four and the sub-includes # are copied here. install-freestanding-headers: - $(mkinstalldirs) $(DESTDIR)${gxx_include_dir} + $(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/bits $(mkinstalldirs) $(DESTDIR)${host_installdir} + $(INSTALL_DATA) ${glibcxx_srcdir}/include/bits/c++0x_warning.h \ + $(DESTDIR)${gxx_include_dir}/bits for file in ${host_srcdir}/os_defines.h ${host_builddir}/c++config.h \ ${glibcxx_srcdir}/$(ABI_TWEAKS_SRCDIR)/cxxabi_tweaks.h \ ${glibcxx_srcdir}/$(CPU_DEFINES_SRCDIR)/cpu_defines.h; do \ diff --git a/libstdc++-v3/include/Makefile.in b/libstdc++-v3/include/Makefile.in index e3e75a0..8b805ba 100644 --- a/libstdc++-v3/include/Makefile.in +++ b/libstdc++-v3/include/Makefile.in @@ -1586,8 +1586,10 @@ ${pch3_output}: ${pch3_source} ${pch2_output} # are installed by libsupc++, so only the first four and the sub-includes # are copied here. 
install-freestanding-headers: - $(mkinstalldirs) $(DESTDIR)${gxx_include_dir} + $(mkinstalldirs) $(DESTDIR)${gxx_include_dir}/bits $(mkinstalldirs) $(DESTDIR)${host_installdir} + $(INSTALL_DATA) ${glibcxx_srcdir}/include/bits/c++0x_warning.h \ + $(DESTDIR)${gxx_include_dir}/bits for file in ${host_srcdir}/os_defines.h ${host_builddir}/c++config.h \ ${glibcxx_srcdir}/$(ABI_TWEAKS_SRCDIR)/cxxabi_tweaks.h \ ${glibcxx_srcdir}/$(CPU_DEFINES_SRCDIR)/cpu_defines.h; do \ diff --git a/libstdc++-v3/include/bits/algorithmfwd.h b/libstdc++-v3/include/bits/algorithmfwd.h index cc0b98e..fbec55d 100644 --- a/libstdc++-v3/include/bits/algorithmfwd.h +++ b/libstdc++-v3/include/bits/algorithmfwd.h @@ -35,7 +35,9 @@ #include bits/c++config.h #include bits/stl_pair.h #include bits/stl_iterator_base_types.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index 5708194..0edb8b2 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -40,7 +40,9 @@ #include ext/atomicity.h #include debug/debug.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/forward_list.h b/libstdc++-v3/include/bits/forward_list.h index c80ee50..0fc8323 100644 --- a/libstdc++-v3/include/bits/forward_list.h +++ b/libstdc++-v3/include/bits/forward_list.h @@ -33,7 +33,9 @@ #pragma GCC system_header #include memory +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/stl_bvector.h b/libstdc++-v3/include/bits/stl_bvector.h index bddecb0..8f28640 100644 --- a/libstdc++-v3/include/bits/stl_bvector.h +++ b/libstdc++-v3/include/bits/stl_bvector.h @@ -57,7 +57,9 @@ #ifndef _STL_BVECTOR_H #define _STL_BVECTOR_H 1 +#ifdef 
__GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/stl_deque.h b/libstdc++-v3/include/bits/stl_deque.h index 17ea01a..b924917 100644 --- a/libstdc++-v3/include/bits/stl_deque.h +++ b/libstdc++-v3/include/bits/stl_deque.h @@ -60,7 +60,9 @@ #include bits/concept_check.h #include bits/stl_iterator_base_types.h #include bits/stl_iterator_base_funcs.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/stl_list.h b/libstdc++-v3/include/bits/stl_list.h index 56ee2fb..fc1d8f8 100644 --- a/libstdc++-v3/include/bits/stl_list.h +++ b/libstdc++-v3/include/bits/stl_list.h @@ -58,7 +58,9 @@ #define _STL_LIST_H 1 #include bits/concept_check.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/stl_map.h b/libstdc++-v3/include/bits/stl_map.h index 889e52b..45824f0 100644 --- a/libstdc++-v3/include/bits/stl_map.h +++ b/libstdc++-v3/include/bits/stl_map.h @@ -59,7 +59,9 @@ #include bits/functexcept.h #include bits/concept_check.h +#ifdef __GXX_EXPERIMENTAL_CXX0X__ #include initializer_list +#endif namespace std _GLIBCXX_VISIBILITY(default) { diff --git a/libstdc++-v3/include/bits/stl_multimap.h b/libstdc++-v3/include/bits/stl_multimap.h index 6b74558..fd5a5a8 100644 ---
Re: [go]: Port to ALPHA arch - epoll problems
Uros Bizjak ubiz...@gmail.com writes: It turned out that the EpollEvent definition in libgo/syscalls/epoll/socket_epoll.go is non-portable (if not outright dangerous...). The definition does have a FIXME comment, but does not take into account the effects of __attribute__((__packed__)) from system headers. Contrary to alpha header, x86 has __attribute__((__packed__)) added to struct epoll_event definition in sys/epoll.h header. I couldn't work out a way to handle this correctly in mksysinfo.sh or -fdump-go-spec, so I did it in configure instead. Bootstrapped and tested on x86_64-unknown-linux-gnu. Committed to mainline. Let me know if it seems to do the right sort of thing on Alpha GNU/Linux--see if the generated file TARGET/libgo/epoll.h looks OK. Ian Index: libgo/configure.ac === --- libgo/configure.ac (revision 180345) +++ libgo/configure.ac (working copy) @@ -505,6 +505,28 @@ CFLAGS=$CFLAGS -D_LARGEFILE_SOURCE -D_L AC_CHECK_TYPES(off64_t) CFLAGS=$CFLAGS_hold +dnl Work out the size of the epoll_events struct on GNU/Linux. +AC_CACHE_CHECK([epoll_event size], +[libgo_cv_c_epoll_event_size], +[AC_COMPUTE_INT(libgo_cv_c_epoll_event_size, +[sizeof (struct epoll_event)], +[#include sys/epoll.h], +[libgo_cv_c_epoll_event_size=0])]) +SIZEOF_STRUCT_EPOLL_EVENT=${libgo_cv_c_epoll_event_size} +AC_SUBST(SIZEOF_STRUCT_EPOLL_EVENT) + +dnl Work out the offset of the fd field in the epoll_events struct on +dnl GNU/Linux. 
+AC_CACHE_CHECK([epoll_event data.fd offset], +[libgo_cv_c_epoll_event_fd_offset], +[AC_COMPUTE_INT(libgo_cv_c_epoll_event_fd_offset, +[offsetof (struct epoll_event, data.fd)], +[#include stddef.h +#include sys/epoll.h], +[libgo_cv_c_epoll_event_fd_offset=0])]) +STRUCT_EPOLL_EVENT_FD_OFFSET=${libgo_cv_c_epoll_event_fd_offset} +AC_SUBST(STRUCT_EPOLL_EVENT_FD_OFFSET) + AC_CACHE_SAVE if test ${multilib} = yes; then Index: libgo/go/syscall/socket_linux.go === --- libgo/go/syscall/socket_linux.go (revision 180552) +++ libgo/go/syscall/socket_linux.go (working copy) @@ -164,15 +164,6 @@ func anyToSockaddrOS(rsa *RawSockaddrAny return nil, EAFNOSUPPORT } -// We don't take this type directly from the header file because it -// uses a union. FIXME. - -type EpollEvent struct { - Events uint32 - Fd int32 - Pad int32 -} - //sysnb EpollCreate(size int) (fd int, errno int) //epoll_create(size int) int Index: libgo/Makefile.am === --- libgo/Makefile.am (revision 180552) +++ libgo/Makefile.am (working copy) @@ -1498,7 +1498,7 @@ endif # !LIBGO_IS_LINUX # Define socket sizes and types. if LIBGO_IS_LINUX -syscall_socket_file = go/syscall/socket_linux.go +syscall_socket_file = go/syscall/socket_linux.go epoll.go else if LIBGO_IS_SOLARIS syscall_socket_file = go/syscall/socket_solaris.go @@ -1582,6 +1582,34 @@ s-sysinfo: $(srcdir)/mksysinfo.sh config $(SHELL) $(srcdir)/../move-if-change tmp-sysinfo.go sysinfo.go $(STAMP) $@ +# The epoll struct has an embedded union and is packed on x86_64, +# which is too complicated for mksysinfo.sh. We find the offset of +# the only field we care about in configure.ac, and generate the +# struct here. 
+epoll.go: s-epoll; @true +s-epoll: Makefile + rm -f epoll.go.tmp + echo 'package syscall' epoll.go.tmp + echo 'type EpollEvent struct {' epoll.go.tmp + echo ' Events uint32' epoll.go.tmp + case $(SIZEOF_STRUCT_EPOLL_EVENT),$(STRUCT_EPOLL_EVENT_FD_OFFSET) in \ + 0,0) echo 12 *** struct epoll_event data.fd offset unknown; \ + exit 1; ;; \ + 8,4) echo ' Fd int32' epoll.go.tmp; ;; \ + 12,4) echo ' Fd int32' epoll.go.tmp; \ + echo ' Pad [4]byte' epoll.go.tmp; ;; \ + 12,8) echo ' Pad [4]byte' epoll.go.tmp; \ + echo ' Fd int32' epoll.go.tmp; ;; \ + 16,8) echo ' Pad [4]byte' epoll.go.tmp; \ + echo ' Fd int32' epoll.go.tmp; \ + echo ' Pad2 [4]byte' epoll.go.tmp; ;; \ + *) echo 12 *** struct epoll_event unsupported; \ + exit 1; ;; \ + esac + echo '}' epoll.go.tmp + $(SHELL) $(srcdir)/../move-if-change epoll.go.tmp epoll.go + $(STAMP) $@ + if LIBGO_IS_LINUX # os_lib_inotify_lo = os/inotify.lo os_lib_inotify_lo =
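
To make the packing issue concrete, the following is a small C sketch of the layout question the configure checks answer. It does not include sys/epoll.h; the types demo_epoll_data, demo_event_packed and demo_event_natural are hypothetical stand-ins assumed to mirror the system definitions, and go_fields_for is an illustrative mirror of the case statement in the s-epoll rule, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the epoll_data union in <sys/epoll.h>.  */
typedef union
{
  void *ptr;
  int fd;
  uint32_t u32;
  uint64_t u64;
} demo_epoll_data;

/* Packed variant, as the x86 system header declares the struct.  */
struct demo_event_packed
{
  uint32_t events;
  demo_epoll_data data;
} __attribute__ ((__packed__));

/* Naturally aligned variant, as on targets without the attribute.  */
struct demo_event_natural
{
  uint32_t events;
  demo_epoll_data data;
};

/* Mirror of the case statement in the s-epoll rule: map (size, fd offset)
   to the Go fields emitted after "Events uint32".  Returns NULL for the
   combinations the rule rejects.  */
static const char *
go_fields_for (size_t size, size_t fd_offset)
{
  if (size == 8 && fd_offset == 4)
    return "Fd int32";
  if (size == 12 && fd_offset == 4)
    return "Fd int32; Pad [4]byte";
  if (size == 12 && fd_offset == 8)
    return "Pad [4]byte; Fd int32";
  if (size == 16 && fd_offset == 8)
    return "Pad [4]byte; Fd int32; Pad2 [4]byte";
  return NULL;
}
```

With the packed attribute the union sits directly after the 32-bit events field, so the struct is 12 bytes with fd at offset 4. Without it, the size and offset depend on the target's alignment of the 64-bit union member, which is why the patch measures both values in configure rather than hard-coding one layout.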
[PATCH] Fix errors in expand_atomic_store.
        * optabs.c (expand_atomic_store): Use create_fixed_operand for
        atomic_store optab.  Don't try to fall back to sync_lock_release.
---
The create_fixed_operand thinko is obvious.

The sync_lock_release is more subtle. The target is allowed to support only storing 0/1 with the test_and_set/lock_release pair, and it's allowed to support that in non-obvious ways. We don't want to get involved in that.

r~
---
 gcc/ChangeLog.mm |    5 +
 gcc/optabs.c     |   21 +
 2 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 1ecab53..d8ab97e 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -7118,32 +7118,13 @@ expand_atomic_store (rtx mem, rtx val, enum memmodel model)
   icode = direct_optab_handler (atomic_store_optab, mode);
   if (icode != CODE_FOR_nothing)
     {
-
-      create_output_operand (&ops[0], mem, mode);
+      create_fixed_operand (&ops[0], mem);
       create_input_operand (&ops[1], val, mode);
       create_integer_operand (&ops[2], model);
       if (maybe_expand_insn (icode, 3, ops))
         return const0_rtx;
     }
 
-  /* A store of 0 is the same as __sync_lock_release, try that.  */
-  if (CONST_INT_P (val) && INTVAL (val) == 0)
-    {
-      icode = direct_optab_handler (sync_lock_release_optab, mode);
-      if (icode != CODE_FOR_nothing)
-        {
-          create_fixed_operand (&ops[0], mem);
-          create_input_operand (&ops[1], const0_rtx, mode);
-          if (maybe_expand_insn (icode, 2, ops))
-            {
-              /* lock_release is only a release barrier.  */
-              if (model == MEMMODEL_SEQ_CST)
-                expand_builtin_mem_thread_fence (model);
-              return const0_rtx;
-            }
-        }
-    }
-
   /* If the size of the object is greater than word size on this target,
      a default store will not be atomic, Try a mem_exchange and throw away
      the result.  If that doesn't work, don't do anything.  */
-- 
1.7.6.4
[cxx-mem-model] i386 atomic load/store
I'm considering the following.  Does anyone believe this i386/i486
decision re DImode is a mistake?  Should I limit that to Pentium by
checking cmpxchg?


r~


diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 7ce57d8..7d28e43 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -248,6 +248,9 @@
   ;; For BMI2 support
   UNSPEC_PDEP
   UNSPEC_PEXT
+
+  ;; For __atomic support
+  UNSPEC_MOVA
 ])
 
 (define_c_enum "unspecv" [
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index e5579b1..da08e92 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -46,6 +46,88 @@
   "lock{%;} or{l}\t{$0, (%%esp)|DWORD PTR [esp], 0}"
   [(set_attr "memory" "unknown")])
 
+;; ??? From volume 3 section 7.1.1 Guaranteed Atomic Operations,
+;; Only beginning at Pentium family processors do we get any guarantee of
+;; atomicity in aligned 64-bit quantities.  Beginning at P6, we get a
+;; guarantee for 64-bit accesses that do not cross a cacheline boundary.
+;; This distinction is ignored below, since I *suspect* that FSTLL will
+;; appear atomic from the point of view of user-level threads even back
+;; on the 80386; I suspect that the non-atomicity can only be seen from
+;; other bus-level devices.
+;;
+;; Importantly, *no* processor makes atomicity guarantees for larger
+;; accesses.  In particular, there's no way to perform an atomic TImode
+;; move, despite the apparent applicability of MOVDQA et al.
+
+(define_mode_iterator ATOMIC
+   [QI HI SI (DI "TARGET_64BIT || TARGET_80387 || TARGET_SSE")])
+
+(define_expand "atomic_load<mode>"
+  [(set (match_operand:ATOMIC 0 "register_operand" "")
+        (unspec:ATOMIC [(match_operand:ATOMIC 1 "memory_operand" "")
+                        (match_operand:SI 2 "const_int_operand" "")]
+                       UNSPEC_MOVA))]
+  ""
+{
+  /* For DImode on 32-bit, we can use the FPU to perform the load.  */
+  if (<MODE>mode == DImode && !TARGET_64BIT)
+    emit_insn (gen_atomic_loaddi_fpu (operands[1], operands[2]));
+  else
+    emit_move_insn (operands[0], operands[1]);
+  DONE;
+})
+
+(define_insn_and_split "atomic_loaddi_fpu"
+  [(set (match_operand:DI 0 "register_operand" "=fx")
+        (unspec:DI [(match_operand:DI 1 "memory_operand" "m")]
+                   UNSPEC_MOVA))]
+  "!TARGET_64BIT && (TARGET_80387 || TARGET_SSE)"
+  "#"
+  "reload_completed"
+  [(set (match_dup 0) (match_dup 1))])
+
+(define_expand "atomic_store<mode>"
+  [(set (match_operand:ATOMIC 0 "memory_operand" "")
+        (unspec:ATOMIC [(match_operand:ATOMIC 1 "register_operand" "")
+                        (match_operand:SI 2 "const_int_operand" "")]
+                       UNSPEC_MOVA))]
+  ""
+{
+  enum memmodel model = (enum memmodel) INTVAL (operands[2]);
+
+  if (<MODE>mode == DImode && !TARGET_64BIT)
+    {
+      /* For DImode on 32-bit, we can use the FPU to perform the store.  */
+      emit_insn (gen_atomic_storedi_fpu (operands[1], operands[2]));
+      if (model == MEMMODEL_SEQ_CST)
+        emit_insn (gen_mem_thread_fence (operands[2]));
+    }
+  else
+    {
+      /* For non-seq-cst stores, we can simply just perform the store.  */
+      if (model != MEMMODEL_SEQ_CST)
+        {
+          emit_move_insn (operands[0], operands[1]);
+          DONE;
+        }
+
+      /* For sub-word-size, sequentialy-consistent stores, use xchg.  */
+      emit_insn (gen_atomic_exchange<mode> (gen_reg_rtx (<MODE>mode),
+                                            operands[0], operands[1],
+                                            operands[2]));
+    }
+  DONE;
+})
+
+(define_insn_and_split "atomic_storedi_fpu"
+  [(set (match_operand:DI 0 "memory_operand" "=m")
+        (unspec:DI [(match_operand:DI 1 "register_operand" "fx")]
+                   UNSPEC_MOVA))]
+  "!TARGET_64BIT && (TARGET_80387 || TARGET_SSE)"
+  "#"
+  "reload_completed"
+  [(set (match_dup 0) (match_dup 1))])
+
 (define_expand "atomic_compare_and_swap<mode>"
   [(match_operand:QI 0 "register_operand" "")          ;; bool success output
    (match_operand:SWI124 1 "register_operand" "")      ;; oldval output
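As a rough illustration of what these expanders serve, the following
C sketch shows user-level 64-bit atomic accesses written with the GCC
__atomic builtins.  The variable and function names are hypothetical.
Whether a particular 32-bit target expands these inline (through the
FPU path discussed above, for example) or calls a library routine
depends on the compiler version and flags, so treat this as a sketch
of the source-level view only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared 64-bit variable.  */
uint64_t counter;

/* 64-bit atomic load; corresponds to an atomic_load of DImode.  */
uint64_t
load_counter (void)
{
  return __atomic_load_n (&counter, __ATOMIC_ACQUIRE);
}

/* Release store: per the patch comment, a plain store suffices when
   the model is not sequentially consistent.  */
void
store_counter_release (uint64_t v)
{
  __atomic_store_n (&counter, v, __ATOMIC_RELEASE);
}

/* Sequentially consistent store: the case where the patch uses xchg,
   or a store followed by a fence on the 32-bit DImode path.  */
void
store_counter_seq_cst (uint64_t v)
{
  __atomic_store_n (&counter, v, __ATOMIC_SEQ_CST);
}

/* Single-threaded round-trip check, usable from a test harness.  */
void
check_roundtrip (void)
{
  store_counter_seq_cst (UINT64_C (0x0123456789abcdef));
  assert (load_counter () == UINT64_C (0x0123456789abcdef));
}
```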
Re: Go patch committed: Update Go library
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

> * The message points to the wrong line due to a broken test:
>   malloc.goc has:
>
>         p = runtime_SysReserve((void*)(0x00f8ULL<<32), bitmap_size + arena_size);
>         if(p == nil)
>                 runtime_throw("runtime: cannot reserve arena virtual address space");
>
>   On failure, p will be MAP_FAILED ((void *)-1), not nil, so the wrong
>   assertion is thrown.

I fixed this particular issue as follows, copying the code from the
other Go library.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

diff -r 1bc825e20b21 libgo/runtime/mem.c
--- a/libgo/runtime/mem.c	Mon Oct 31 21:07:36 2011 -0700
+++ b/libgo/runtime/mem.c	Mon Oct 31 21:53:12 2011 -0700
@@ -85,6 +85,7 @@
 runtime_SysReserve(void *v, uintptr n)
 {
 	int fd = -1;
+	void *p;
 
 	// On 64-bit, people with ulimit -v set complain if we reserve too
 	// much address space.  Instead, assume that the reservation is okay
@@ -103,7 +104,11 @@
 	fd = dev_zero;
 #endif
 
-	return runtime_mmap(v, n, PROT_NONE, MAP_ANON|MAP_PRIVATE, fd, 0);
+	p = runtime_mmap(v, n, PROT_NONE, MAP_ANON|MAP_PRIVATE, fd, 0);
+	if((uintptr)p < 4096 || -(uintptr)p < 4096) {
+		return nil;
+	}
+	return p;
 }
 
 void
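The underlying pitfall is that mmap reports failure as MAP_FAILED,
not as a null pointer, so a caller that tests only for nil never sees
the error.  Below is a minimal standalone sketch in C.  The function
names are hypothetical, and it compares against MAP_FAILED directly
as POSIX specifies, rather than using the range check on p that the
patch copies from the other Go library.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical wrapper: reserve n bytes of address space without
   committing memory.  Returns NULL on failure so that callers can
   use an ordinary null check.  */
void *
reserve_address_space (void *hint, size_t n)
{
  void *p = mmap (hint, n, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  if (p == MAP_FAILED)   /* (void *) -1, which is not NULL */
    return NULL;
  return p;
}

/* Hypothetical counterpart: give the reservation back.  */
int
release_address_space (void *p, size_t n)
{
  return munmap (p, n);
}
```

With the translation done in one place, a caller's "if (p == NULL)"
test reports the reservation failure at the right line.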
Re: [RFC PATCH] Gather vectorization (PR tree-optimization/50789)
On 10/31/2011 03:23 PM, Jakub Jelinek wrote:

> On Sat, Oct 29, 2011 at 03:53:37PM +0200, Toon Moene wrote:
>> I wonder whether it will work with the attached Fortran routine - it
>> sure would mean a boost to the 18%+ heaviest CPU user in our code.
>
> Would be nice to cut down slightly this testcase into just one or two
> loops that are vectorized and turn it into a runtime testcase which
> verifies the vectorization was correct.

This is not a verifiable routine yet, but as the linear interpolation
part already has all the juicy indirection necessary to test this
vectorization, most of the routine can be thrown away, to leave the
attached as essential.

-- 
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news

      SUBROUTINE VERINT (
     I   KLON   , KLAT   , KLEV   , KINT   , KHALO
     I , KLON1  , KLON2  , KLAT1  , KLAT2
     I , KP     , KQ     , KR
     R , PARG   , PRES
     R , PALFH  , PBETH
     R , PALFA  , PBETA  , PGAMA   )
C
C***
C
C   VERINT - THREE DIMENSIONAL INTERPOLATION
C
C   PURPOSE:
C
C   THREE DIMENSIONAL INTERPOLATION
C
C   INPUT PARAMETERS:
C
C   KLON      NUMBER OF GRIDPOINTS IN X-DIRECTION
C   KLAT      NUMBER OF GRIDPOINTS IN Y-DIRECTION
C   KLEV      NUMBER OF VERTICAL LEVELS
C   KINT      TYPE OF INTERPOLATION
C             = 1 - LINEAR
C             = 2 - QUADRATIC
C             = 3 - CUBIC
C             = 4 - MIXED CUBIC/LINEAR
C   KLON1     FIRST GRIDPOINT IN X-DIRECTION
C   KLON2     LAST  GRIDPOINT IN X-DIRECTION
C   KLAT1     FIRST GRIDPOINT IN Y-DIRECTION
C   KLAT2     LAST  GRIDPOINT IN Y-DIRECTION
C   KP        ARRAY OF INDEXES FOR HORIZONTAL DISPLACEMENTS
C   KQ        ARRAY OF INDEXES FOR HORIZONTAL DISPLACEMENTS
C   KR        ARRAY OF INDEXES FOR VERTICAL   DISPLACEMENTS
C   PARG      ARRAY OF ARGUMENTS
C   PALFH     ALFA HAT
C   PBETH     BETA HAT
C   PALFA     ARRAY OF WEIGHTS IN X-DIRECTION
C   PBETA     ARRAY OF WEIGHTS IN Y-DIRECTION
C   PGAMA     ARRAY OF WEIGHTS IN VERTICAL DIRECTION
C
C   OUTPUT PARAMETERS:
C
C   PRES      INTERPOLATED FIELD
C
C   HISTORY:
C
C   J.E. HAUGEN 1 1992
C
C***
C
      IMPLICIT NONE
C
      INTEGER KLON   , KLAT   , KLEV   , KINT   , KHALO,
     I        KLON1  , KLON2  , KLAT1  , KLAT2
C
      INTEGER KP(KLON,KLAT), KQ(KLON,KLAT), KR(KLON,KLAT)
      REAL PARG(2-KHALO:KLON+KHALO-1,2-KHALO:KLAT+KHALO-1,KLEV) ,
     R     PRES(KLON,KLAT) ,
     R     PALFH(KLON,KLAT) , PBETH(KLON,KLAT) ,
     R     PALFA(KLON,KLAT,4) , PBETA(KLON,KLAT,4),
     R     PGAMA(KLON,KLAT,4)
C
      INTEGER JX, JY, IDX, IDY, ILEV
      REAL Z1MAH, Z1MBH
C
C  LINEAR INTERPOLATION
C
      DO JY = KLAT1,KLAT2
      DO JX = KLON1,KLON2
         IDX  = KP(JX,JY)
         IDY  = KQ(JX,JY)
         ILEV = KR(JX,JY)
C
         PRES(JX,JY) = PGAMA(JX,JY,1)*(
C
     +      PBETA(JX,JY,1)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY-1,ILEV-1)
     +                     + PALFA(JX,JY,2)*PARG(IDX  ,IDY-1,ILEV-1) )
     +    + PBETA(JX,JY,2)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY  ,ILEV-1)
     +                     + PALFA(JX,JY,2)*PARG(IDX  ,IDY  ,ILEV-1) ) )
C+
     +                + PGAMA(JX,JY,2)*(
C+
     +      PBETA(JX,JY,1)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY-1,ILEV  )
     +                     + PALFA(JX,JY,2)*PARG(IDX  ,IDY-1,ILEV  ) )
     +    + PBETA(JX,JY,2)*( PALFA(JX,JY,1)*PARG(IDX-1,IDY  ,ILEV  )
     +                     + PALFA(JX,JY,2)*PARG(IDX  ,IDY  ,ILEV  ) ) )
      ENDDO
      ENDDO
C
      RETURN
      END
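The access pattern that makes this routine a gather candidate can be
shown in a much smaller form.  The C sketch below reduces the
three-dimensional interpolation to one dimension.  All names are
hypothetical and it is not a translation of VERINT.  The point is
only that src is indexed through values loaded from idx, so a
vectorized version needs gather loads for the src accesses.

```c
#include <assert.h>
#include <stddef.h>

/* One-dimensional linear interpolation with index indirection.
   For each output element i, idx[i] selects a pair of neighbouring
   source elements, which are blended with per-element weights.
   Every idx[i] must be at least 1 and a valid index into src.  */
void
interp_gather (size_t n, const int *idx,
               const float *w_lo, const float *w_hi,
               const float *src, float *dst)
{
  for (size_t i = 0; i < n; i++)
    {
      int k = idx[i];   /* data-dependent index: the gather */
      dst[i] = w_lo[i] * src[k - 1] + w_hi[i] * src[k];
    }
}
```

A runtime testcase of the kind requested above would fill idx with a
permutation, run the loop, and compare against a scalar reference.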
Re: Go patch committed: Update Go library
Rainer Orth r...@cebitec.uni-bielefeld.de writes:

> After this change, I'm seeing another issue: most 32-bit go execution
> tests fail like this on Solaris 11/x86:
>
> /vol/gcc/src/hg/trunk/local/libgo/runtime/malloc.goc:366: libgo assertion failure
> FAIL: go.go-torture/execute/array-1.go execution,  -O0
>
> Running the test under truss, I find:
>
> 14261:  mmap(0xFF00, 805306368, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0) Err#12 ENOMEM
>
> With truss -u (user function tracing), I see:
>
> 14285/1@1:      -> libgo:runtime_mallocinit()
> 14285/1@1:        -> libgo:runtime_InitSizes()
> 14285/1@1:        <- libgo:runtime_InitSizes() = 2
> 14285/1@1:        -> libgo:runtime_SysReserve()
> 14285/1:        mmap(0xFF00, 805306368, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0) Err#12 ENOMEM
> 14285/1@1:        <- libgo:runtime_SysReserve() = -1
> 14285/1@1:        -> libgo:__go_assert_fail()
>
> If I remove the adjustment in runtime/malloc.goc (runtime_mallocinit),
> the test passes:
>
> 14445/1:        mmap(0xFEF78114, 805306368, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xCE00
>
> This stuff seems incredibly fragile, and I don't exactly understand why.

I don't understand why one case passes and the other fails.

In an attempt to make this work better, I committed the appended patch.
It will at least avoid asking for impossible situations, such as the one
in this example.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu (including the 32-bit tests, which I always run
anyhow).  Committed to mainline.

Ian

diff -r 250b34075533 libgo/runtime/malloc.goc
--- a/libgo/runtime/malloc.goc	Mon Oct 31 21:54:06 2011 -0700
+++ b/libgo/runtime/malloc.goc	Mon Oct 31 22:18:21 2011 -0700
@@ -358,6 +358,8 @@
 		// away from the running binary image and then round up
 		// to a MB boundary.
 		want = (byte*)(((uintptr)end + (1<<18) + (1<<20) - 1)&~((1<<20)-1));
+		if(0x - (uintptr)want <= bitmap_size + arena_size)
+		  want = 0;
 		p = runtime_SysReserve(want, bitmap_size + arena_size);
 		if(p == nil)
 			runtime_throw("runtime: cannot reserve arena virtual address space");
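The arithmetic behind the hint can be checked in isolation.  The C
sketch below is an illustration, not the libgo code: the function
names are hypothetical, addresses are held in uint64_t so that the
example behaves the same on any host, and the address-space limit is
a caller-supplied parameter (0xffffffff in the usage note is an
assumption for a 32-bit process).

```c
#include <assert.h>
#include <stdint.h>

#define ONE_MB ((uint64_t) 1 << 20)

/* Step 256 KB past the end of the binary image, then round up to the
   next 1 MB boundary.  */
uint64_t
round_up_past_image (uint64_t end)
{
  return (end + ((uint64_t) 1 << 18) + ONE_MB - 1) & ~(ONE_MB - 1);
}

/* Return the address hint to pass to the reservation call, or 0 (no
   preference, let the kernel choose) when a block of `size` bytes
   starting at the hint could not fit below `limit`.  */
uint64_t
choose_hint (uint64_t end, uint64_t size, uint64_t limit)
{
  uint64_t want = round_up_past_image (end);
  if (want > limit || limit - want <= size)
    return 0;
  return want;
}
```

For example, with a hypothetical image end of 0xFEF00000, a request
of 805306368 bytes (the size in the truss output) and a limit of
0xffffffff, the rounded address is 0xFF000000 and the request cannot
fit, so the sketch returns 0 instead of an impossible hint.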