Re: [PATCH, alpha]: Introduce handle_trap_shadows and align_insns passes
On Mon, Jun 30, 2014 at 9:59 PM, Uros Bizjak <ubiz...@gmail.com> wrote:

After fixing the _.barriers and _.eh_range passes w.r.t. CALL_ARG_LOCATION notes, we can finally move the handling of trap shadows (PR 56858) and insn alignments into their own passes. Additionally, the patch skips handling of BARRIERs in alpha_pad_function_end, since CALL_ARG_LOCATION notes are not split away from their call insn anymore.

2014-06-30  Uros Bizjak  <ubiz...@gmail.com>

	PR target/56858
	* config/alpha/alpha.c: Include tree-pass.h, context.h
	and pass_manager.h.
	(pass_data_handle_trap_shadows): New pass.
	(pass_handle_trap_shadows::gate): New pass gate function.
	(make_pass_handle_trap_shadows): New function.
	(rest_of_handle_trap_shadows): Ditto.
	(alpha_align_insns_1): Rename from alpha_align_insns.
	(pass_data_align_insns): New pass.
	(pass_align_insns::gate): New pass gate function.
	(make_pass_align_insns): New function.
	(rest_of_align_insns): Ditto.
	(alpha_align_insns): Ditto.
	(alpha_option_override): Declare handle_trap_shadows_info
	and align_insns_info.  Register handle_trap_shadows and
	align_insns passes here.
	(alpha_reorg): Do not call alpha_handle_trap_shadows and
	alpha_align_insns from here.
	(alpha_pad_function_end): Do not skip BARRIERs.

I went ahead and installed the patch on mainline SVN. The patch just fixes the existing approach and doesn't introduce new functionality.

Uros.
Re: [PATCH] Generate more efficient memory barriers for LEON3
Hi,

Thank you for your comments.

> ...stbar should never be generated since #StoreStore is implied by TSO.

I missed that #StoreStore is never generated for TSO, so I'm removing that pattern.

> OK, thanks.  Does this result in a significant performance gain?

stb seems to be at least twice as fast as ldstub.

> I think that only the membar_storeload_leon3 pattern is necessary.

The full barrier pattern membar_leon3 also gets generated, so I think that one should be kept as well.

I'm changing the pattern type to store and the condition on the original patterns to !TARGET_LEON3 and resubmitting the patch.

On 2014-07-10 11:37, Eric Botcazou wrote:
>> The memory barriers generated for SPARC are targeting the weakest
>> memory model allowed for SPARC.
>
> That's not quite true, they are targeting sparc_memory_model, which is
> the memory model selected for the architecture/OS pair by default and
> which can be overridden by the user with
> -mmemory-model=[default|rmo|pso|tso|sc].
>
>> The LEON3/4 SPARC processors are using a stronger memory model and
>> thus have fewer requirements on the memory barriers.
>
> My understanding is that they use TSO, in which case...
>
>> For LEON3/4, StoreStore is compiler-only, instead of stbar,
>
> ...stbar should never be generated since #StoreStore is implied by TSO.
>
>> and StoreLoad can be achieved with a normal byte write stb, instead of
>> an atomic byte read-write ldstub.
>
> OK, thanks.  Does this result in a significant performance gain?
>
>> The provided patch changes the previously mentioned memory barriers
>> for TARGET_LEON3.
>
> I think that only the membar_storeload_leon3 pattern is necessary.
>
> A couple more nits: the new pattern is not multi, it's store, and you
> need to add !TARGET_LEON3 to the original membar_storeload since
> TARGET_LEON3 is also TARGET_V8.

--
Daniel Cederman
Software Engineer
Aeroflex Gaisler AB
Aeroflex Microelectronic Solutions – HiRel
Kungsgatan 12
SE-411 19 Gothenburg, Sweden
Phone: +46 31 7758665
ceder...@gaisler.com
www.Aeroflex.com/Gaisler
[PATCH-v2] Generate more efficient memory barriers for LEON3
ChangeLog:

2014-07-11  Daniel Cederman  <ceder...@gaisler.com>

	gcc/config/sparc/
	* sync.md: Generate more efficient memory barriers for LEON3

---
 gcc/config/sparc/sync.md | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/gcc/config/sparc/sync.md b/gcc/config/sparc/sync.md
index e6e237f..05c3277 100644
--- a/gcc/config/sparc/sync.md
+++ b/gcc/config/sparc/sync.md
@@ -64,11 +64,19 @@
   "stbar"
   [(set_attr "type" "multi")])
 
+;; For LEON3, STB has the effect of membar #StoreLoad.
+(define_insn "*membar_storeload_leon3"
+  [(set (match_operand:BLK 0 "" "")
+	(unspec:BLK [(match_dup 0) (const_int 2)] UNSPEC_MEMBAR))]
+  "TARGET_LEON3"
+  "stb\t%%g0, [%%sp-1]"
+  [(set_attr "type" "store")])
+
 ;; For V8, LDSTUB has the effect of membar #StoreLoad.
 (define_insn "*membar_storeload"
   [(set (match_operand:BLK 0 "" "")
 	(unspec:BLK [(match_dup 0) (const_int 2)] UNSPEC_MEMBAR))]
-  "TARGET_V8"
+  "TARGET_V8 && !TARGET_LEON3"
   "ldstub\t[%%sp-1], %%g0"
   [(set_attr "type" "multi")])
 
@@ -79,7 +87,7 @@
   [(set (match_operand:BLK 0 "" "")
 	(unspec:BLK [(match_dup 0) (match_operand:SI 1 "const_int_operand")]
 		    UNSPEC_MEMBAR))]
-  "TARGET_V8"
+  "TARGET_V8 && !TARGET_LEON3"
   "stbar\n\tldstub\t[%%sp-1], %%g0"
   [(set_attr "type" "multi")
    (set_attr "length" "2")])
 
@@ -93,6 +101,15 @@
   "membar\t%1"
   [(set_attr "type" "multi")])
 
+;; For LEON3, membar #StoreLoad is enough for a full barrier.
+(define_insn "*membar_leon3"
+  [(set (match_operand:BLK 0 "" "")
+	(unspec:BLK [(match_dup 0) (match_operand:SI 1 "const_int_operand")]
+		    UNSPEC_MEMBAR))]
+  "TARGET_LEON3"
+  "stb\t%%g0, [%%sp-1]"
+  [(set_attr "type" "store")])
+
 (define_peephole2
   [(set (match_operand:BLK 0 "" "")
 	(unspec:BLK [(match_dup 0) (match_operand:SI 1 "const_int_operand")]
-- 
1.7.9.5
Re: [PATCH] Fix PR61762, reads from string constants
On Thu, 10 Jul 2014, Jakub Jelinek wrote:

On Thu, Jul 10, 2014 at 05:12:53PM +0200, Richard Biener wrote:

The following makes sure we fold the reads from string constants created from folding memcpy (foo, &string_cst[0], ...) to eventually create similar code as if that memcpy folding wasn't done (move-by-pieces). Bootstrapped and tested on x86_64-unknown-linux-gnu. Comments? (yeah, those native_encode_expr improvements would be really nice, eventually with a slightly different interface, native_encode_expr_part)

Don't think you need a new function for that, just add an optional argument to native_encode_expr, and propagate it to native_encode_string, which would, if the new bool arg is true, not do:

  if (total_bytes > len)
    return 0;

but if total_bytes > len, set total_bytes to len instead. Or, isn't your code only handling STRING_CSTs and nothing else? Then perhaps just export native_encode_string too and add the optional argument to that.

I started with adding an optional arg to native_encode_expr but then figured that all the current workers return failure if the entire expr doesn't fit within 'len'. But what I want is a native_encode_expr that encodes (at most) [offset, offset + len] even when the actual object is larger. But yes, I'd share the actual workers and only add native_encode_expr_part with the extra offset argument.

For the particular case native_encode_string is enough, but I think for constructor folding and constructor output we want to support CONSTRUCTOR in native_encode_expr in the future. Thus, I'm not sure if changing the semantics of native_encode_expr is OK (or desirable with just an extra offset argument - if so I can add an overload retaining the old semantics for the old interface).

Richard.
Re: [PATCH][RFC] Fix PR61473, inline small memcpy/memmove during tree opts
On Thu, 10 Jul 2014, Jakub Jelinek wrote:

On Thu, Jul 10, 2014 at 04:30:13PM +0200, Richard Biener wrote:

Compromise hack below. It simply avoids the transform for sources that c_strlen can compute a length of. That fixes all strlenopt testcases apart from strlenopt-8.c, which does memcpy (, flag ? a : b); which then still folds during gimplification. I have XFAILed that.

I've tried to comb my way through strlenopt but failed to quickly add support for generic stores (it has very rough support for char stores, see also PR61773). Does the below look OK? I can look at the tree-ssa-strlen.c stuff and removing the !c_strlen hack incrementally.

Ok, I'll bootstrap/test the patch then and will commit if there are no issues left.

Thanks,
Richard.
Re: [PATCH] offline gcda profile processing tool
On Mon, May 5, 2014 at 7:17 PM, Rong Xu <x...@google.com> wrote:

Here is the updated patch. The difference from patch set 3 is: (1) use gcov-counter.def for the scaling operation. (2) fix the wrong scaling function for the time-profiler. Passed with bootstrap, profiledbootstrap and SPEC2006.

One of the patches breaks bootstrap for me:

/space/rguenther/src/svn/trunk/gcc/../libgcc/libgcov.h:184: error: ISO C++ forbids zero-size array ‘ctrs’
make[3]: *** [libgcov-util.o] Error 1

host compiler is gcc 4.3.4

Richard.

Thanks, -Rong

On Thu, May 1, 2014 at 3:37 PM, Rong Xu <x...@google.com> wrote:

Hi, Honza,

Please find the new patch in the attachment. This patch integrates your earlier suggestions. The noticeable changes are: (1) build specialized objects for libgcov-driver.c and libgcov-merge.c and link them into gcov-tool, rather than including the source files. (2) split some gcov counter definition code out to gcov-counter.def to avoid code duplication. (3) use a macro for gcov_read_counter(), so gcov-tool can use the existing code in libgcov-merge.c with minimal change.

This patch does not address the suggestion of creating a new directory for libgcov. I agree with you that's a much better and cleaner structure we should go for. We can do that in follow-up patches.

I tested this patch with bootstrap and profiledbootstrap. Other tests are on-going.

Thanks, -Rong

On Wed, Apr 16, 2014 at 8:34 PM, Jan Hubicka <hubi...@ucw.cz> wrote:

gcov-tool needs to use this function to read the strings in the gcda file into memory to construct gcov_info objects. As you noticed, the gcov runtime does not need this interface. But gcov-tool links with the gcov runtime and it also uses the function. We could make it available in gcov_runtime, but that will slightly increase the memory footprint.

Yep, it is not really pretty. I wrote below some plan for how things may be structured in a more convenient way. Any work in that direction would be welcome.
The planned new functions for the trunk version are: (1) overlap score b/w two profiles (2) better dumping (top hot objects/functions/counters) and statistics. Once this basic version is in, we can start to add the new functionality.

Sounds good. I assume AutoFDO does not go via gcov-tool but rather uses a custom reader of profile data on the GCC side? I wonder, are there any recent overview papers/slides/texts on the design of AutoFDO and other features to be merged? I remember the talk from the GCC Summit and I did read some of the pre-LTO LIPO sources/papers, but it would be nice to have something up to date.

libgcov-util.o is built in the gcc/ directory, rather than in libgcc. It's directly linked into gcov-tool. So libgcov-util.o is built for the HOST, not the TARGET. With makefile changes, we can build HOST versions of libgcov-driver.o and libgcov-merge.o. I also need to make some static functions/variables public. I suppose that can go with an IN_GCOV_TOOL ifdef.

So we currently have the basic gcov I/O handling in gcc/gcov-io.c, which is borrowed by the libgcc/libgcov* code. We will also get libgcov-util.c in the libgcc directory, which is actually borrowed by the gcc/gcov-tool.c code. We now have one runtime using stdio for streaming, and the kernel has a custom version that goes via the /proc interface (last time I looked). We added some abstraction into libgcov-interface, which is the part that relies on stdio, partly via the gcov-io.c code, and now we have in-memory handling code in libgcov-util.

I guess it would make most sense to put all the gcov code into a new directory (libgcov) and make it a stand-alone library that can be configured
1) for the stdio-based runtime, as we do now
2) for a runtime missing the interface and relying on the user providing it
3) for use within gcov file manipulation tools
with a reorg of GCC/gcov/gcov-dump/gcov-tool to all use the same low-level interfaces.

In the longer term, I would like to make FDO profiling instrumentation happen at link time.
For that I need to make the instrumentation code (minimal spanning tree and friends) work w/o cgraph, which would ideally be done in a shared implementation.

Won't this get the wrong answer when counters[0] is not the same? I would have expected the merging code to compare the counters first. Similarly for the delta counter.

These *_op functions are for scaling only. So there is only one profile involved (thus there is no comparison). The merge handlers are in libgcov-merge.c, which has the code to handle mismatched profile targets.

I see, OK then.

Adding correct rounding may actually make a difference for Martin's startup time work.

Do you mean to use something like the RDIV macro?

Yes.

Honza
Re: update address taken: don't drop clobbers
On Thu, Jul 10, 2014 at 8:22 PM, Jeff Law <l...@redhat.com> wrote:

On 07/10/14 09:48, Michael Matz wrote:

Hi,

On Thu, 10 Jul 2014, Richard Biener wrote:

Apart from the out-of-SSA patch you proposed elsewhere, a possibility is to simply never mark undefined SSA names as SSA_NAME_OCCURS_IN_ABNORMAL_PHI ... (or not mark those as must-coalesce).

The insight to note is that undefined SSA names should really be coalesced with something (otherwise you lose an optimization opportunity), but it doesn't matter with _what_ each use of the undefined name is coalesced; you can even identify different uses of them with different SSA names (e.g. the LHS of each using stmt). Requires some change in the order things are done in out-of-ssa.

The last part is what I hinted might be problematical. If some undefined SSA_NAME appears on the RHS of two PHIs and we want to coalesce that undefined SSA_NAME with the LHS of each of those PHIs, then the LHSs of those two PHIs must coalesce as well. At least that's my recollection of how all that stuff worked.

Yes, coalescing doesn't do live-range splitting to avoid coalescing the two PHI results. But they have to be coalesced anyway. I still think simply never recording conflicts for undefined SSA names is a proper hack to avoid this issue.

It was that realization that made me wonder if we should have a unique SSA_NAME at each undefined use point.

That would be unnecessarily expensive.

Richard.

jeff
Re: [PATCH] Fix PR61762, reads from string constants
On Fri, Jul 11, 2014 at 10:00:01AM +0200, Richard Biener wrote:

I started with adding an optional arg to native_encode_expr but then figured that all the current workers return failure if the entire expr doesn't fit within 'len'. But what I want is a native_encode_expr

That is because the functions were written for VIEW_CONVERT_EXPR folding, where it is expected that both types are the same size, and it doesn't make sense to spend too much time on large stuff that doesn't fit into fold_view_convert_expr's fixed-size buffer.

Now, for your case the sizes aren't the same and thus partial data is just fine, because what native_interpret_expr will do is small and fixed-size. So I don't see a problem with not returning NULL and instead handling the partial store into the buffer if you ask for it through an optional argument (doing it always would be fine for fold_view_convert_expr, assuming nobody feeds us a different-sized argument vs. the VCE type, but e.g. tree-loop-distribution.c certainly appreciates it if we return NULL for large things, because if it had just a partial buffer, it would not be able to verify what it is looking for).

Jakub
Re: [PATCH] offline gcda profile processing tool
On 11 July 2014 10:07, Richard Biener <richard.guent...@gmail.com> wrote:

On Mon, May 5, 2014 at 7:17 PM, Rong Xu <x...@google.com> wrote:

Here is the updated patch. The difference from patch set 3 is: (1) use gcov-counter.def for the scaling operation. (2) fix the wrong scaling function for the time-profiler. Passed with bootstrap, profiledbootstrap and SPEC2006.

One of the patches breaks bootstrap for me:

/space/rguenther/src/svn/trunk/gcc/../libgcc/libgcov.h:184: error: ISO C++ forbids zero-size array ‘ctrs’
make[3]: *** [libgcov-util.o] Error 1

host compiler is gcc 4.3.4

Richard.

On my side, commit 212448 breaks the cross-build of GCC for targets using newlib:
* arm-none-eabi
* aarch64-none-elf

/usr/include/sys/stat.h: In function ‘void gcov_output_files(const char*, gcov_info*)’:
/usr/include/sys/stat.h:317: error: too few arguments to function ‘int mkdir(const char*, __mode_t)’
/tmp/1392958_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gcov-tool.c:96: error: at this point in file
make[2]: *** [gcov-tool.o] Error 1

Christophe.

Thanks, -Rong

On Thu, May 1, 2014 at 3:37 PM, Rong Xu <x...@google.com> wrote:

Hi, Honza,

Please find the new patch in the attachment. This patch integrates your earlier suggestions. The noticeable changes are: (1) build specialized objects for libgcov-driver.c and libgcov-merge.c and link them into gcov-tool, rather than including the source files. (2) split some gcov counter definition code out to gcov-counter.def to avoid code duplication. (3) use a macro for gcov_read_counter(), so gcov-tool can use the existing code in libgcov-merge.c with minimal change.

This patch does not address the suggestion of creating a new directory for libgcov. I agree with you that's a much better and cleaner structure we should go for. We can do that in follow-up patches.

I tested this patch with bootstrap and profiledbootstrap. Other tests are on-going.
Thanks, -Rong

On Wed, Apr 16, 2014 at 8:34 PM, Jan Hubicka <hubi...@ucw.cz> wrote:

gcov-tool needs to use this function to read the strings in the gcda file into memory to construct gcov_info objects. As you noticed, the gcov runtime does not need this interface. But gcov-tool links with the gcov runtime and it also uses the function. We could make it available in gcov_runtime, but that will slightly increase the memory footprint.

Yep, it is not really pretty. I wrote below some plan for how things may be structured in a more convenient way. Any work in that direction would be welcome.

The planned new functions for the trunk version are: (1) overlap score b/w two profiles (2) better dumping (top hot objects/functions/counters) and statistics. Once this basic version is in, we can start to add the new functionality.

Sounds good. I assume AutoFDO does not go via gcov-tool but rather uses a custom reader of profile data on the GCC side? I wonder, are there any recent overview papers/slides/texts on the design of AutoFDO and other features to be merged? I remember the talk from the GCC Summit and I did read some of the pre-LTO LIPO sources/papers, but it would be nice to have something up to date.

libgcov-util.o is built in the gcc/ directory, rather than in libgcc. It's directly linked into gcov-tool. So libgcov-util.o is built for the HOST, not the TARGET. With makefile changes, we can build HOST versions of libgcov-driver.o and libgcov-merge.o. I also need to make some static functions/variables public. I suppose that can go with an IN_GCOV_TOOL ifdef.

So we currently have the basic gcov I/O handling in gcc/gcov-io.c, which is borrowed by the libgcc/libgcov* code. We will also get libgcov-util.c in the libgcc directory, which is actually borrowed by the gcc/gcov-tool.c code. We now have one runtime using stdio for streaming, and the kernel has a custom version that goes via the /proc interface (last time I looked).
We added some abstraction into libgcov-interface, which is the part that relies on stdio, partly via the gcov-io.c code, and now we have in-memory handling code in libgcov-util. I guess it would make most sense to put all the gcov code into a new directory (libgcov) and make it a stand-alone library that can be configured 1) for the stdio-based runtime, as we do now 2) for a runtime missing the interface and relying on the user providing it 3) for use within gcov file manipulation tools, with a reorg of GCC/gcov/gcov-dump/gcov-tool to all use the same low-level interfaces. In the longer term, I would like to make FDO profiling instrumentation happen at link time. For that I need to make the instrumentation code (minimal spanning tree and friends) work w/o cgraph, which would ideally be done in a shared implementation.

Won't this get the wrong answer when counters[0] is not the same? I would have expected the merging code to compare the counters first. Similarly for the delta counter.

These *_op functions are for scaling only. So there is only one profile involved (thus there is no comparison). The merge handlers are in libgcov-merge.c, which has the code to handle mismatched profile targets.
Re: update address taken: don't drop clobbers
On Fri, Jul 11, 2014 at 10:10 AM, Richard Biener <richard.guent...@gmail.com> wrote:

On Thu, Jul 10, 2014 at 8:22 PM, Jeff Law <l...@redhat.com> wrote:

On 07/10/14 09:48, Michael Matz wrote:

Hi,

On Thu, 10 Jul 2014, Richard Biener wrote:

Apart from the out-of-SSA patch you proposed elsewhere, a possibility is to simply never mark undefined SSA names as SSA_NAME_OCCURS_IN_ABNORMAL_PHI ... (or not mark those as must-coalesce).

The insight to note is that undefined SSA names should really be coalesced with something (otherwise you lose an optimization opportunity), but it doesn't matter with _what_ each use of the undefined name is coalesced; you can even identify different uses of them with different SSA names (e.g. the LHS of each using stmt). Requires some change in the order things are done in out-of-ssa.

The last part is what I hinted might be problematical. If some undefined SSA_NAME appears on the RHS of two PHIs and we want to coalesce that undefined SSA_NAME with the LHS of each of those PHIs, then the LHSs of those two PHIs must coalesce as well. At least that's my recollection of how all that stuff worked.

Yes, coalescing doesn't do live-range splitting to avoid coalescing the two PHI results. But they have to be coalesced anyway. I still think simply never recording conflicts for undefined SSA names is a proper hack to avoid this issue.

It was that realization that made me wonder if we should have a unique SSA_NAME at each undefined use point.

That would be unnecessarily expensive.

Btw, the bug must be already kind-of preexisting due to

bool
may_propagate_copy (tree dest, tree orig)
{
...
  /* If ORIG flows in from an abnormal edge, it cannot be propagated.  */
  if (TREE_CODE (orig) == SSA_NAME
      && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (orig)
      /* If it is the default definition and an automatic variable then
	 we can though and it is important that we do to avoid uninitialized
	 regular copies.  */
      && !(SSA_NAME_IS_DEFAULT_DEF (orig)
	   && (SSA_NAME_VAR (orig) == NULL_TREE
	       || TREE_CODE (SSA_NAME_VAR (orig)) == VAR_DECL)))
    return false;

but it never replaces an abnormal SSA name with its default definition due to the next check:

  /* If DEST is an SSA_NAME that flows from an abnormal edge, then it
     cannot be replaced.  */
  if (TREE_CODE (dest) == SSA_NAME
      && SSA_NAME_OCCURS_IN_ABNORMAL_PHI (dest))
    return false;

still it might create overlapping life-ranges for abnormals.

Richard.

jeff
Re: [PATCH] Fix PR61762, reads from string constants
On Fri, 11 Jul 2014, Jakub Jelinek wrote:

On Fri, Jul 11, 2014 at 10:00:01AM +0200, Richard Biener wrote:

I started with adding an optional arg to native_encode_expr but then figured that all the current workers return failure if the entire expr doesn't fit within 'len'. But what I want is a native_encode_expr

That is because the functions were written for VIEW_CONVERT_EXPR folding, where it is expected that both types are the same size, and it doesn't make sense to spend too much time on large stuff that doesn't fit into fold_view_convert_expr's fixed-size buffer. Now, for your case the sizes aren't the same and thus partial data is just fine, because what native_interpret_expr will do is small and fixed-size. So I don't see a problem with not returning NULL and instead handling the partial store into the buffer if you ask for it through an optional argument (doing it always would be fine for fold_view_convert_expr, assuming nobody feeds us a different-sized argument vs. the VCE type, but e.g. tree-loop-distribution.c certainly appreciates it if we return NULL for large things, because if it had just a partial buffer, it would not be able to verify what it is looking for).

Yeah, if you pass 'offset' then 'len' suddenly changes semantics from specifying the buffer size to specifying the desired amount of encoded data ... (which the present interface computes for you; the caller doesn't even need to know how large it is).

The question is also: if I ask native_encode_expr to produce data for [offset, offset + len] but the object only contains part of it - thus the tail is 'undefined' - do we want to fail or simply feed (undefined) garbage to the following native_interpret_expr? (yes, native_encode_expr will of course return something < len in this case)

Richard.
Re: Optimize type streaming
Are we sure it ever stabilizes? But yes, I had something like this in mind (just do one iteration always) in case we need to improve hashing.

Hi, this is a bit of a quick experiment with strengthening the hash by iteration. I don't seem to be able to measure a WPA time difference for the patch, though (but on the positive side, it also does not make streaming out slower). I did not find a clean way to hook things into the streamer cache, so I added a separate hash for that; I am sure you will know better ;)

I added dumping for duplicated trees within SCC regions, which shows some interesting info. It seems that in Firefox we do have duplicate type variants, and duplicated const decls, such as:

function_type 0x2bf2a540 type integer_type 0x2bde95e8 int sizes-gimplified public SI size integer_cst 0x2bdd4e70 constant 32 unit size integer_cst 0x2bdd4e88 constant 4 align 32 symtab 0 alias set 0 canonical type 0x2bde95e8 precision 32 min integer_cst 0x2bdd4e28 -2147483648 max integer_cst 0x2bdd4e40 2147483647 context translation_unit_decl 0x2db62a00 /aux/hubicka/firefox/db/sqlite3/src/sqlite3.c pointer_to_this pointer_type 0x2dc527e0 QI size integer_cst 0x2bdd4d20 type integer_type 0x2bde90a8 bitsizetype constant 8 unit size integer_cst 0x2bdd4d38 type integer_type 0x2bde9000 sizetype constant 1 align 8 symtab 0 alias set -1 canonical type 0x2bf2a5e8 arg-types tree_list 0x2beedd20 value pointer_type 0x2bf1e930 type record_type 0x2bf1e5e8 sqlite3_file sizes-gimplified public unsigned DI size integer_cst 0x2bdd4c30 constant 64 unit size integer_cst 0x2bdd4c48 constant 8 align 64 symtab 0 alias set -1 canonical type 0x2bf1e9d8 pointer_to_this pointer_type 0x2c5dadc8 chain tree_list 0x2beedcf8 value integer_type 0x2bde95e8 int chain tree_list 0x2bde4640 value void_type 0x2bde9f18 void pointer_to_this pointer_type 0x2bf2a690

function_type 0x2bf2adc8 type integer_type 0x2bde95e8 int sizes-gimplified public SI size integer_cst 0x2bdd4e70 constant 32 unit size integer_cst 0x2bdd4e88 constant 4 align 32
symtab 0 alias set 0 canonical type 0x2bde95e8 precision 32 min integer_cst 0x2bdd4e28 -2147483648 max integer_cst 0x2bdd4e40 2147483647 context translation_unit_decl 0x2db62a00 /aux/hubicka/firefox/db/sqlite3/src/sqlite3.c pointer_to_this pointer_type 0x2dc527e0 QI size integer_cst 0x2bdd4d20 type integer_type 0x2bde90a8 bitsizetype constant 8 unit size integer_cst 0x2bdd4d38 type integer_type 0x2bde9000 sizetype constant 1 align 8 symtab 0 alias set -1 canonical type 0x2bf2ae70 arg-types tree_list 0x2beedfa0 value pointer_type 0x2bf1e930 type record_type 0x2bf1e5e8 sqlite3_file sizes-gimplified public unsigned DI size integer_cst 0x2bdd4c30 constant 64 unit size integer_cst 0x2bdd4c48 constant 8 align 64 symtab 0 alias set -1 canonical type 0x2bf1e9d8 pointer_to_this pointer_type 0x2c5dadc8 chain tree_list 0x2beedf78 value pointer_type 0x2bdf1690 chain tree_list 0x2bde4640 value void_type 0x2bde9f18 void pointer_to_this pointer_type 0x2bf2af18

constants as:

integer_cst 0x2e8101b0 type enumeral_type 0x2e7e55e8 tag_nsresult constant 0
integer_cst 0x2e810600 type enumeral_type 0x2e7e55e8 tag_nsresult constant 0
integer_cst 0x2e810c90 type enumeral_type 0x2e7e55e8 tag_nsresult constant 0
integer_cst 0x2e810cc0 type enumeral_type 0x2e7e55e8 tag_nsresult constant 0
integer_cst 0x2e810ee8 type enumeral_type 0x2e7e55e8 tag_nsresult constant 0
integer_cst 0x2e810fd8 type enumeral_type 0x2e7e55e8 tag_nsresult constant 5242893
integer_cst 0x2e8130a8 type enumeral_type 0x2e7e55e8 tag_nsresult constant 5242893
integer_cst 0x2e814858 type enumeral_type 0x2e7e55e8 tag_nsresult constant 7864321
integer_cst 0x2e814870 type enumeral_type 0x2e7e55e8 tag_nsresult constant 7864321
integer_cst 0x2e810f78 type enumeral_type 0x2e7e55e8 tag_nsresult constant 2152726542
integer_cst 0x2e8130c0 type enumeral_type 0x2e7e55e8 tag_nsresult constant 2152726542

tree lists:

tree_list 0x2e952708 value pointer_type 0x2e954888 type record_type 0x2e9509d8 PLDHashTable sizes-gimplified type_5 type_6
BLK size integer_cst 0x2d5229a8 constant 384 unit size integer_cst 0x2d522960 constant 48 align 64 symtab 0 alias set -1 canonical type 0x2e950930 fields field_decl 0x2e955000 ops context translation_unit_decl 0x2bdec0a0 /aux/hubicka/firefox/webapprt/gtk2/webapprt.cpp
[PATCH, DOC]: Fix for Options That Control Optimization section
Hello,

I fixed the Options That Control Optimization section according to 'gcc -Q --help=optimizers' and, after consultation with Jakub, I added the missing -foptimize-strlen option. Ready for trunk?

Martin

ChangeLog:

2014-07-11  Martin Liska  <mli...@suse.cz>

	* doc/invoke.texi: Add missing options to the options that
	control optimization.  Document the missing -foptimize-strlen
	option.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a83f6c6..8fa63ff 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6921,25 +6921,31 @@ compilation time.
 @option{-O} turns on the following optimization flags:
 @gccoptlist{-fauto-inc-dec @gol
+-fbranch-count-reg @gol
+-fcombine-stack-adjustments @gol
 -fcompare-elim @gol
 -fcprop-registers @gol
 -fdce @gol
 -fdefer-pop @gol
 -fdelayed-branch @gol
 -fdse @gol
+-fforward-propagate @gol
 -fguess-branch-probability @gol
 -fif-conversion2 @gol
 -fif-conversion @gol
+-finline-functions-called-once @gol
 -fipa-pure-const @gol
 -fipa-profile @gol
 -fipa-reference @gol
--fmerge-constants
+-fmerge-constants @gol
+-fmove-loop-invariants @gol
+-fshrink-wrap @gol
 -fsplit-wide-types @gol
 -ftree-bit-ccp @gol
--ftree-builtin-call-dce @gol
 -ftree-ccp @gol
 -fssa-phiopt @gol
 -ftree-ch @gol
+-ftree-copy-prop @gol
 -ftree-copyrename @gol
 -ftree-dce @gol
 -ftree-dominator-opts @gol
@@ -6947,6 +6953,7 @@ compilation time.
 -ftree-forwprop @gol
 -ftree-fre @gol
 -ftree-phiprop @gol
+-ftree-sink @gol
 -ftree-slsr @gol
 -ftree-sra @gol
 -ftree-pta @gol
@@ -6978,19 +6985,23 @@ also turns on the following optimization flags:
 -fhoist-adjacent-loads @gol
 -finline-small-functions @gol
 -findirect-inlining @gol
+-fipa-cp @gol
 -fipa-sra @gol
 -fisolate-erroneous-paths-dereference @gol
 -foptimize-sibling-calls @gol
+-foptimize-strlen @gol
 -fpartial-inlining @gol
 -fpeephole2 @gol
--freorder-blocks -freorder-functions @gol
+-freorder-blocks -freorder-blocks-and-partition -freorder-functions @gol
 -frerun-cse-after-loop @gol
 -fsched-interblock -fsched-spec @gol
 -fschedule-insns -fschedule-insns2 @gol
 -fstrict-aliasing -fstrict-overflow @gol
+-ftree-builtin-call-dce @gol
 -ftree-switch-conversion -ftree-tail-merge @gol
 -ftree-pre @gol
--ftree-vrp}
+-ftree-vrp @gol
+-fuse-caller-save}
 
 Please note the warning under @option{-fgcse} about
 invoking @option{-O2} on programs that use computed gotos.
@@ -7000,9 +7011,10 @@
 Optimize yet more.  @option{-O3} turns on all optimizations specified
 by @option{-O2} and also turns on the @option{-finline-functions},
 @option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
-@option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
-@option{-ftree-partial-pre} and @option{-fipa-cp-clone} options.
+@option{-fgcse-after-reload}, @option{-ftree-loop-distribute-patterns},
+@option{-ftree-loop-vectorize}, @option{-ftree-slp-vectorize},
+@option{-fvect-cost-model}, @option{-ftree-partial-pre}
+and @option{-fipa-cp-clone} options.
 
 @item -O0
 @opindex O0
@@ -7113,6 +7125,14 @@
 Optimize sibling and tail recursive calls.
 
 Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}.
 
+@item -foptimize-strlen
+@opindex foptimize-strlen
+Optimize various standard C string functions (e.g. @code{strlen},
+@code{strchr} or @code{strcpy}) and
+their _FORTIFY_SOURCE counterparts into faster alternatives.
+
+Enabled at levels @option{-O2}, @option{-O3}.
+
 @item -fno-inline
 @opindex fno-inline
 Do not expand any functions inline apart from those marked with
@@ -7278,6 +7298,8 @@
 register, compare it against zero, then branch based upon the result.
 This option is only meaningful on architectures that support such
 instructions, which include x86, PowerPC, IA-64 and S/390.
 
+Enabled by default at -O1 and higher.
+
 The default is @option{-fbranch-count-reg}.
 
 @item -fno-function-cse
Re: [GSoC] generation of Gimple loops with empty bodies
I would use -x to see the headers. This should tell you if it is defined or only used there. It gives the following output: roman@roman-System-Product-Name:~/isl-0.12.2/lib$ objdump -x libisl.so.10.2.2 | grep isl_ast_expr_get_val 00060380 g F .text 0053 isl_ast_expr_get_val Sorry, but this still needs to be debugged. :-( I am too busy to do it myself, so I am afraid you may just need to dig into it yourself. :-( I've found the reason for this error. Somehow cc1 used an older version of isl provided by the package from the standard repository in spite of manually selected paths to isl 0.12.2. I think it is better to move on and not to concentrate on this. +#include "graphite-htab.h" Is this still needed? No, this is redundant. +/* IVS_PARAMS_MAP maps ISL's scattering and parameter identifiers + to corresponding trees. */ +typedef struct ivs_params { + std::map<isl_id *, tree, id_comp> ivs_params_map; What about calling it isl_id_to_tree? Maybe it would be better to call it tree_from_isl_id (because it would help to avoid confusion with definitions from isl)? + sese region; Is this region needed? No, this is redundant. + We create a new if region protecting the loop to be executed, if + the execution count is zero (lower bound > upper bound). In this case, + it is subsequently removed by dead code elimination. */ The last sentence is unclear. Are you saying the loop will be removed or that the condition will be removed in case it is always true. Also, what about cases where the condition remains? I wanted to say that the loop will be removed. Maybe it would be better to remove the last sentence entirely (because the previous sentence already explains the motivation)? Those changes are unrelated and are part of the other patch, right? Yes, these changes were made in revision 212361. + ip.region = region; Is this needed? ip.region seems unused. Yes, this is redundant. I tried to incorporate all your comments in the attached patch. -- Cheers, Roman Gareev. 
Index: gcc/graphite-isl-ast-to-gimple.c === --- gcc/graphite-isl-ast-to-gimple.c(revision 212361) +++ gcc/graphite-isl-ast-to-gimple.c(working copy) @@ -25,7 +25,14 @@ #include <isl/map.h> #include <isl/union_map.h> #include <isl/ast_build.h> +#if defined(__cplusplus) +extern "C" { +#endif +#include <isl/val_gmp.h> +#if defined(__cplusplus) +} +#endif #endif #include "system.h" #include "coretypes.h" @@ -42,6 +49,9 @@ #include "cfgloop.h" #include "tree-data-ref.h" #include "sese.h" +#include "tree-ssa-loop-manip.h" +#include "tree-scalar-evolution.h" +#include <map> #ifdef HAVE_cloog #include "graphite-poly.h" @@ -52,6 +62,517 @@ static bool graphite_regenerate_error; +/* We always use signed 128, until isl is able to give information about +types. */ + +static tree *graphite_expression_size_type = &int128_integer_type_node; + +/* Converts a GMP constant VAL to a tree and returns it. */ + +static tree +gmp_cst_to_tree (tree type, mpz_t val) +{ + tree t = type ? type : integer_type_node; + mpz_t tmp; + + mpz_init (tmp); + mpz_set (tmp, val); + wide_int wi = wi::from_mpz (t, tmp, true); + mpz_clear (tmp); + + return wide_int_to_tree (t, wi); +} + +/* Verifies properties that GRAPHITE should maintain during translation. */ + +static inline void +graphite_verify (void) +{ +#ifdef ENABLE_CHECKING + verify_loop_structure (); + verify_loop_closed_ssa (true); +#endif +} + +/* TREE_FROM_ISL_ID maps ISL's scattering and parameter identifiers + to corresponding trees. */ + +typedef struct ivs_params { + std::map<isl_id *, tree> tree_from_isl_id; +} *ivs_params_p; + +/* Free all memory allocated for ISL's identifiers. 
*/ + +void ivs_params_clear (ivs_params_p ip) +{ + std::map<isl_id *, tree>::iterator it; + for (it = ip->tree_from_isl_id.begin (); + it != ip->tree_from_isl_id.end (); it++) +{ + isl_id_free (it->first); +} +} + +static tree +gcc_expression_from_isl_expression (tree type, __isl_take isl_ast_expr *, + ivs_params_p ip); + +/* Return the tree variable that corresponds to the given isl ast identifier + expression (an isl_ast_expr of type isl_ast_expr_id). */ + +static tree +gcc_expression_from_isl_ast_expr_id (__isl_keep isl_ast_expr *expr_id, +ivs_params_p ip) +{ + gcc_assert (isl_ast_expr_get_type (expr_id) == isl_ast_expr_id); + isl_id *tmp_isl_id = isl_ast_expr_get_id (expr_id); + std::map<isl_id *, tree>::iterator res; + res = ip->tree_from_isl_id.find (tmp_isl_id); + isl_id_free (tmp_isl_id); + gcc_assert (res != ip->tree_from_isl_id.end () + && "Could not map isl_id to tree expression"); + isl_ast_expr_free (expr_id); + return res->second; +} + +/* Converts an isl_ast_expr_int expression E to a GCC expression tree of + type TYPE. */ + +static tree
Re: [patch 1/4] change specific int128 -> generic intN
PSImode is 20 bits, fits in a 20 bit register, and uses 20 bit operations. Then why do you need this change? - TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type))); + TYPE_SIZE (type) = bitsize_int (GET_MODE_PRECISION (TYPE_MODE (type))); TYPE_SIZE_UNIT (type) = size_int (GET_MODE_SIZE (TYPE_MODE (type))); break; What are GET_MODE_BITSIZE and GET_MODE_PRECISION for PSImode? If a type is 17-20 bits, PSImode is chosen. If it's 21 bits or larger, SImode is chosen. If it's 16 or fewer bits, HImode is chosen. Size or precision? That's the crux of the matter. -- Eric Botcazou
Fwd: Re: [PING][PATCH] Fix for PR 61561
Thanks to you all. Committed revision 212450. --Marat Original Message Subject:Re: [PING][PATCH] Fix for PR 61561 Date: Thu, 10 Jul 2014 14:01:24 +0100 From: Ramana Radhakrishnan ramana.radhakrish...@arm.com To: Marat Zakirov m.zaki...@samsung.com, gcc-patches@gcc.gnu.org gcc-patches@gcc.gnu.org CC: Richard Earnshaw richard.earns...@arm.com, Kyrylo Tkachov kyrylo.tkac...@arm.com, Slava Garbuzov v.garbu...@samsung.com, Yuri Gribov tetra2...@gmail.com, mara...@gmail.com mara...@gmail.com On 30/06/14 16:21, Marat Zakirov wrote: Thanks for your attention. This is OK for trunk - Sorry about the delayed response. Ramana Marat.
Re: Optimize type streaming
On Fri, 11 Jul 2014, Jan Hubicka wrote: Are we sure it ever stabilizes? But yes, I had something like this in mind (just do one iteration always) in case we need to improve hashing. Hi, this is a bit of a quick experiment with strengthening the hash by iteration. I don't seem to be able to measure WPA time difference for the patch though (but on the positive side, it also does not make streaming out slower). I did not find a clean way to hook things into the streamer cache, so I added a separate hash for that; I am sure you will know better ;) Well - as we re-use the streamer cache to store the hash value it isn't really possible to do better ... (at least I don't have a clever idea) I added dumping for duplicated trees within SCC regions that shows some interesting info. It seems that for firefox we do have duplicate type variants, duplicated const decls such as: function_type 0x2bf2a540 type integer_type 0x2bde95e8 int sizes-gimplified public SI size integer_cst 0x2bdd4e70 constant 32 unit size integer_cst 0x2bdd4e88 constant 4 align 32 symtab 0 alias set 0 canonical type 0x2bde95e8 precision 32 min integer_cst 0x2bdd4e28 -2147483648 max integer_cst 0x2bdd4e40 2147483647 context translation_unit_decl 0x2db62a00 /aux/hubicka/firefox/db/sqlite3/src/sqlite3.c pointer_to_this pointer_type 0x2dc527e0 QI size integer_cst 0x2bdd4d20 type integer_type 0x2bde90a8 bitsizetype constant 8 unit size integer_cst 0x2bdd4d38 type integer_type 0x2bde9000 sizetype constant 1 align 8 symtab 0 alias set -1 canonical type 0x2bf2a5e8 arg-types tree_list 0x2beedd20 value pointer_type 0x2bf1e930 type record_type 0x2bf1e5e8 sqlite3_file sizes-gimplified public unsigned DI size integer_cst 0x2bdd4c30 constant 64 unit size integer_cst 0x2bdd4c48 constant 8 align 64 symtab 0 alias set -1 canonical type 0x2bf1e9d8 pointer_to_this pointer_type 0x2c5dadc8 chain tree_list 0x2beedcf8 value integer_type 0x2bde95e8 int chain tree_list 0x2bde4640 value void_type 0x2bde9f18 void pointer_to_this pointer_type 0x2bf2a690 
function_type 0x2bf2adc8 type integer_type 0x2bde95e8 int sizes-gimplified public SI size integer_cst 0x2bdd4e70 constant 32 unit size integer_cst 0x2bdd4e88 constant 4 align 32 symtab 0 alias set 0 canonical type 0x2bde95e8 precision 32 min integer_cst 0x2bdd4e28 -2147483648 max integer_cst 0x2bdd4e40 2147483647 context translation_unit_decl 0x2db62a00 /aux/hubicka/firefox/db/sqlite3/src/sqlite3.c pointer_to_this pointer_type 0x2dc527e0 QI size integer_cst 0x2bdd4d20 type integer_type 0x2bde90a8 bitsizetype constant 8 unit size integer_cst 0x2bdd4d38 type integer_type 0x2bde9000 sizetype constant 1 align 8 symtab 0 alias set -1 canonical type 0x2bf2ae70 arg-types tree_list 0x2beedfa0 value pointer_type 0x2bf1e930 type record_type 0x2bf1e5e8 sqlite3_file sizes-gimplified public unsigned DI size integer_cst 0x2bdd4c30 constant 64 unit size integer_cst 0x2bdd4c48 constant 8 align 64 symtab 0 alias set -1 canonical type 0x2bf1e9d8 pointer_to_this pointer_type 0x2c5dadc8 chain tree_list 0x2beedf78 value pointer_type 0x2bdf1690 chain tree_list 0x2bde4640 value void_type 0x2bde9f18 void pointer_to_this pointer_type 0x2bf2af18 constants as: integer_cst 0x2e8101b0 type enumeral_type 0x2e7e55e8 tag_nsresult constant 0 integer_cst 0x2e810600 type enumeral_type 0x2e7e55e8 tag_nsresult constant 0 integer_cst 0x2e810c90 type enumeral_type 0x2e7e55e8 tag_nsresult constant 0 integer_cst 0x2e810cc0 type enumeral_type 0x2e7e55e8 tag_nsresult constant 0 integer_cst 0x2e810ee8 type enumeral_type 0x2e7e55e8 tag_nsresult constant 0 integer_cst 0x2e810fd8 type enumeral_type 0x2e7e55e8 tag_nsresult constant 5242893 integer_cst 0x2e8130a8 type enumeral_type 0x2e7e55e8 tag_nsresult constant 5242893 integer_cst 0x2e814858 type enumeral_type 0x2e7e55e8 tag_nsresult constant 7864321 integer_cst 0x2e814870 type enumeral_type 0x2e7e55e8 tag_nsresult constant 7864321 integer_cst 0x2e810f78 type enumeral_type 0x2e7e55e8 tag_nsresult constant 2152726542 integer_cst 0x2e8130c0 type enumeral_type 
0x2e7e55e8 tag_nsresult constant 2152726542 tree lists: tree_list 0x2e952708 value pointer_type 0x2e954888 type record_type 0x2e9509d8 PLDHashTable sizes-gimplified type_5 type_6 BLK size integer_cst 0x2d5229a8
Re: [patch,gomp-4_0-branch] misc reduction clause bug fixes
Hi Cesar! On Thu, 10 Jul 2014 11:43:11 -0700, Cesar Philippidis ce...@codesourcery.com wrote: This patch addresses two OpenACC reduction clause bugs. Thanks! OK; one question/suggestion, though: --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -9679,11 +9679,23 @@ process_reduction_data (gimple_seq *body, gimple_seq *in_stmt_seqp, gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt)); gimple_stmt_iterator gsi; + gimple_seq inner; + gimple stmt; + + /* A collapse clause may have inserted a new bind block. */ + stmt = gimple_seq_first (*body); + if (stmt && gimple_code (stmt) == GIMPLE_BIND) +{ + inner = gimple_bind_body (gimple_seq_first (*body)); + body = &inner; +} for (gsi = gsi_start (*body); !gsi_end_p (gsi); gsi_next (&gsi)) { gimple stmt = gsi_stmt (gsi); Can we get rid of this shadow variable stmt? Grüße, Thomas
Re: [PATCH] Fix PR61762, reads from string constants
On Fri, 11 Jul 2014, Richard Biener wrote: On Fri, 11 Jul 2014, Jakub Jelinek wrote: On Fri, Jul 11, 2014 at 10:00:01AM +0200, Richard Biener wrote: I started with adding an optional arg to native_encode_expr but then figured that all the current workers return failure if the entire expr doesn't fit within 'len'. But what I want is a native_encode_expr That is because the functions were written for VIEW_CONVERT_EXPR folding where it is expected that both the types are the same size, and it doesn't make sense to spend too much time on large stuff that doesn't fit into the fold_view_convert_expr's fixed size buffer. Now, for your case the sizes aren't the same and thus partial data is just fine, because what native_interpret_expr will do is small fixed size. So I don't see a problem with not returning NULL and instead handling the partial store into buffer if you ask for it through optional argument (doing it always would be fine for fold_view_convert_expr assuming nobody feeds us different sized argument vs. VCE type, but e.g. tree-loop-distribution.c certainly appreciates if we return NULL for large things, because if it had just partial buffer, it would not be able to verify what it is looking for). Yeah, if you pass 'offset' then 'len' suddenly changes semantics from specifying the buffer size to specifying the desired amount of encoded data ... (which the present interface computes for you, the caller doesn't even need to know how large it is). The question is also if I ask native_encode_expr to produce data for [offset, offset + len] but the object only contains part of it - thus the tail is 'undefined' - do we want to fail or simply feed (undefined) garbage to the following native_interpret_expr? (yes, native_encode_expr will of course return sth < len in this case) Like the following (completely untested - lacks handling of len < total_bytes - off as I just figured out). 
If off is -1 then it should behave as the current interface, else we encode the part [off, off + MIN (len, total_bytes)] and return MIN (len, total_bytes). Ah, a guard against off >= total_bytes is also missing. I wonder if it would be ok to start with gcc_assert (off == -1) in all but native_encode_string. Does the interface look sane? Thanks, Richard. Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 212449) +++ gcc/fold-const.c(working copy) @@ -7239,15 +7239,17 @@ fold_plusminus_mult_expr (location_t loc upon failure. */ static int -native_encode_int (const_tree expr, unsigned char *ptr, int len) +native_encode_int (const_tree expr, unsigned char *ptr, int len, int off) { tree type = TREE_TYPE (expr); int total_bytes = GET_MODE_SIZE (TYPE_MODE (type)); int byte, offset, word, words; unsigned char value; - if (total_bytes > len) + if (off == -1 && total_bytes > len) return 0; + if (off == -1) +off = 0; words = total_bytes / UNITS_PER_WORD; for (byte = 0; byte < total_bytes; byte++) @@ -7270,9 +7272,10 @@ native_encode_int (const_tree expr, unsi } else offset = BYTES_BIG_ENDIAN ? (total_bytes - 1) - byte : byte; - ptr[offset] = value; + if (offset >= off) + ptr[offset - off] = value; } - return total_bytes; + return total_bytes - off; } @@ -7282,7 +7285,7 @@ native_encode_int (const_tree expr, unsi upon failure. 
*/ static int -native_encode_fixed (const_tree expr, unsigned char *ptr, int len) +native_encode_fixed (const_tree expr, unsigned char *ptr, int len, int off) { tree type = TREE_TYPE (expr); enum machine_mode mode = TYPE_MODE (type); @@ -7290,7 +7293,8 @@ native_encode_fixed (const_tree expr, un FIXED_VALUE_TYPE value; tree i_value, i_type; - if (total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT) + if (off == -1 + && total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT) return 0; i_type = lang_hooks.types.type_for_size (GET_MODE_BITSIZE (mode), 1); @@ -7302,7 +7306,7 @@ native_encode_fixed (const_tree expr, un value = TREE_FIXED_CST (expr); i_value = double_int_to_tree (i_type, value.data); - return native_encode_int (i_value, ptr, len); + return native_encode_int (i_value, ptr, len, off); } @@ -7312,7 +7316,7 @@ native_encode_fixed (const_tree expr, un upon failure. */ static int -native_encode_real (const_tree expr, unsigned char *ptr, int len) +native_encode_real (const_tree expr, unsigned char *ptr, int len, int off) { tree type = TREE_TYPE (expr); int total_bytes = GET_MODE_SIZE (TYPE_MODE (type)); @@ -7324,8 +7328,11 @@ native_encode_real (const_tree expr, uns up to 192 bits. */ long tmp[6]; - if (total_bytes > len) + if (off == -1 + && total_bytes > len) return 0; + if (off == -1) +off = 0; words = (32 / BITS_PER_UNIT) / UNITS_PER_WORD; real_to_target (tmp, TREE_REAL_CST_PTR (expr),
Re: [PATCH] Generate more efficient memory barriers for LEON3
The full barrier pattern membar_leon3 also gets generated so I think that one should be kept also. Do you have a testcase? membar is generated by sparc_emit_membar_for_model and, for the TSO model of LEON3, implied = StoreStore | LoadLoad | LoadStore so mm can only be StoreLoad, which means that membar_storeload will match so the full barrier never will. -- Eric Botcazou
Use separate sections to stream non-trivial constructors
Hi, since we both agreed offlining constructors from the global decl stream is a good idea, I went ahead and implemented it. I would like to follow up with some cleanups; for example the sections are still tagged as function sections, but I would like to do it incrementally. There is quite some ugliness in the way we handle function sections and the patch started to snowball very quickly. The patch conceptually copies what we do for functions and re-uses most of the infrastructure. varpool_get_constructor is the counterpart of cgraph_get_body (i.e. the means of getting the function in) and it is used by the output machinery, by ipa-visibility while rewriting the constructor and by ctor_for_folding (which makes us load the ctor whenever it is needed by ipa-cp or ipa-devirt). I kept get_symbol_initial_value as an authority to decide if we want to encode a given constructor or not. The section itself for a trivial ctor is about 25 bytes and with header it is probably close to double that. Currently the heuristic is to offline only constructors that are CONSTRUCTOR and keep simple expressions inline. We may want to tweak it. The patch does not bring miraculous savings to firefox WPA, but it does some: GGC memory after global stream is read goes from 1376898k to 1250533k overall GGC allocations from 4156478 kB to 4012462 kB read 11006599 SCCs of average size 1.907692 -> read 9119433 SCCs of average size 2.037867 20997206 tree bodies read in total -> 18584194 tree bodies read in total Size of mmap'd section decls: 299540188 bytes -> Size of mmap'd section decls: 271557265 bytes Size of mmap'd section function_body: 5711078 bytes -> Size of mmap'd section function_body: 7548680 bytes Things would be better if ipa-visibility and ipa-devirt did not load most of the virtual tables into memory (still better than loading each into memory 20 times on average). I will work on that incrementally. We load 10311 ctors into memory at WPA time. Note that firefox seems to feature a really huge data segment these days. 
http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap in progress, OK? * varpool.c: Include tree-ssa-alias.h, gimple.h and lto-streamer.h (varpool_get_constructor): New function. (ctor_for_folding): Use it. (varpool_assemble_decl): Likewise. * lto-streamer.h (struct output_block): Turn cgraph_node into symbol field. (lto_input_variable_constructor): Declare. * ipa-visibility.c (function_and_variable_visibility): Use varpool_get_constructor. * cgraph.h (varpool_get_constructor): Declare. * lto-streamer-out.c (get_symbol_initial_value): Take encoder parameter; return error_mark_node for non-trivial constructors. (lto_write_tree_1, DFS_write_tree): Update use of get_symbol_initial_value. (output_function): Update initialization of symbol. (output_constructor): New function. (copy_function): Rename to ... (copy_function_or_variable): ... this one; handle vars too. (lto_output): Output variable sections. * lto-streamer-in.c (input_constructor): New function. (lto_read_body): Rename to ... (lto_read_body_or_constructor): ... this one; handle vars too. (lto_input_variable_constructor): New function. * ipa-prop.c (ipa_prop_write_jump_functions, ipa_prop_write_all_agg_replacement): Update. Index: varpool.c === --- varpool.c (revision 212426) +++ varpool.c (working copy) @@ -35,6 +35,9 @@ along with GCC; see the file COPYING3. #include "gimple-expr.h" #include "flags.h" #include "pointer-set.h" +#include "tree-ssa-alias.h" +#include "gimple.h" +#include "lto-streamer.h" const char * const tls_model_names[]={"none", "tls-emulated", "tls-real", "tls-global-dynamic", "tls-local-dynamic", @@ -253,6 +256,41 @@ varpool_node_for_asm (tree asmname) return NULL; } +/* When doing LTO, read NODE's constructor from disk if it is not already present. 
*/ + +tree +varpool_get_constructor (struct varpool_node *node) +{ + struct lto_file_decl_data *file_data; + const char *data, *name; + size_t len; + tree decl = node->decl; + + if (DECL_INITIAL (node->decl) != error_mark_node + || !in_lto_p) +return DECL_INITIAL (node->decl); + + file_data = node->lto_file_data; + name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)); + + /* We may have renamed the declaration, e.g., a static function. */ + name = lto_get_decl_name_mapping (file_data, name); + + data = lto_get_section_data (file_data, LTO_section_function_body, + name, &len); + if (!data) +fatal_error ("%s: section %s is missing", +file_data->file_name, +name); + + lto_input_variable_constructor (file_data, node, data); +
Re: [GSoC] generation of Gimple loops with empty bodies
I tried to incorporate all your comments in the attached patch. Nice. It now looks very good to me. One final detail: +/* TREE_FROM_ISL_ID maps ISL's scattering and parameter identifiers + to corresponding trees. */ + +typedef struct ivs_params { + std::map<isl_id *, tree> tree_from_isl_id; +} *ivs_params_p; The struct now contains only a single element such that there seems to be no need for it anymore. Should we remove it? (We could still use a typedef if you believe the datatype is too long). Cheers, Tobias
Re: [patch,gomp-4_0-branch] openacc collapse clause
Hi Cesar! On Thu, 10 Jul 2014 11:47:42 -0700, Cesar Philippidis ce...@codesourcery.com wrote: This patch enables the collapse clause with a value greater than one. Thanks! Is this patch OK for gomp-4_0-branch? OK with the following addressed: --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -11260,6 +11260,10 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask, switch (c_kind) { + case PRAGMA_OMP_CLAUSE_COLLAPSE: + clauses = c_parser_omp_clause_collapse (parser, clauses); + c_name = "collapse"; + break; Annotate c_parser_omp_clause_collapse that it is now used for OpenACC, too. #define OACC_LOOP_CLAUSE_MASK \ - (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_REDUCTION) + ( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_COLLAPSE) \ + | (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_REDUCTION)) Given the syntax generally used, add a space between the two closing parens. Even though we're re-using OpenMP's tested implementation, please add some basic front end test coverage for OpenACC: reject invalid, accept valid syntax. Grüße, Thomas
Re: Optimize type streaming
Well - as we re-use the streamer cache to store the hash value it isn't really possible to do better ... (at least I don't have a clever idea) OK, no cleverness on my side either. Yeah (though you wouldn't need the extra hashing - we only need to know the hash of the SCC). The current iterating to iterate until no duplicate hashes are found is bogus IMHO - better do the right thing from the start or only iterate until at least _one_ hash is unique (you get rid of the scc_entry_len > 1 case which can be quite expensive). agreed. I just got this implemented before I got into enough of an understanding of the lto/lto.c side of the machinery - I assumed that identical nodes are tested deeply but that would of course be expensive. Of course the question is whether we can always guarantee this property ... if we can then the offline compare can be expanded to all trees (it requires a stable sort of the SCCs independent of the SCC entry). I do not think we need stable sorting here. This is what I think the extra DFS walk should ensure - as soon as we have a unique entry (by identifying the first node w/o conflict), the DFS walks in all units will give the same order. I do not think we can 100% guarantee it, but we can get very close to 100%. If we don't find a unique entry, I suppose we can stick with the current scheme or just drop the complicated path with multiple entries and not merge in this case (or just merge when the entry happens to match by accident). This is because those otherwise equivalent trees are not really equivalent from a merging perspective. You test strict pointer equivalence outside SCC regions, so it matters whether two same trees are referred or two equivalent copies. Hmm? So you say you have two identical trees in the _same_ SCC? How can that be? Answer: not possible - they at least differ in the edges forming the SCC (unless we have a cyclic list but I don't think we (should) have that). We hash only on outgoing SCC edges. 
You can easily have main variant type T and variants T1,T2 that are the same and all used by type T again. T1 and T2 differ by references in T, but they will always have the same hash and in theory they can be merged if your code worked recursively on nested SCC regions (as given by Tarjan's algorithm). Because you work on outermost SCC regions only, we never merge T1 and T2. The congruence you compute is not the best possible, but the best possible assuming no duplicates within compilation units. I think this is sane, we should look towards eliminating such duplicates rather than working harder in the streamer. Btw, if you re-hash only nodes with duplicate hashes might there be a higher probability of ending up with a hash that collides with one that was not a duplicate one? That is, why not simply re-hash all nodes? (premature optimization...?) So - can you iterate on this, re-doing the DFS walk with the found entry node? Yep, that is the plan. Iterate long enough until I get a unique hash. Can I easily re-do the DFS walk? Honza Thanks, Richard. Honza * lto-streamer-out.c (hash_tree): Add map argument; remove special purpose hack for pointer types. (visit): Update it. (hash_scc): Add iteration until equivalence classes stabilize. Index: lto-streamer-out.c === --- lto-streamer-out.c (revision 212426) +++ lto-streamer-out.c (working copy) @@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block /* Return a hash value for the tree T. 
*/ static hashval_t -hash_tree (struct streamer_tree_cache_d *cache, tree t) +hash_tree (struct streamer_tree_cache_d *cache, hash_map<tree, hashval_t> *map, tree t) { #define visit(SIBLING) \ do { \ unsigned ix; \ -if (SIBLING && streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ +if (!SIBLING) \ + v = iterative_hash_hashval_t (0, v); \ +else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ v = iterative_hash_hashval_t (streamer_tree_cache_get_hash (cache, ix), v); \ +else if (map) \ + v = iterative_hash_hashval_t (*map->get (SIBLING), v); \ +else \ + v = iterative_hash_hashval_t (1, v); \ } while (0) /* Hash TS_BASE. */ @@ -896,23 +924,7 @@ hash_tree (struct streamer_tree_cache_d if (CODE_CONTAINS_STRUCT (code, TS_TYPED)) { - if (POINTER_TYPE_P (t)) - { - /* For pointers factor in the pointed-to type recursively as -we cannot recurse through only pointers. -??? We can generalize this by keeping track of the -in-SCC edges for each tree (or arbitrarily the first -such edge) and hashing that in in a second stage -(instead of the quadratic mixing of the SCC we do now). */ - hashval_t x; - unsigned ix; - if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), &ix)) - x =
Re: [PATCH] Fix PR61762, reads from string constants
On Fri, Jul 11, 2014 at 11:13:12AM +0200, Richard Biener wrote: + if (offset >= off) + ptr[offset - off] = value; For original off != 0, you aren't checking whether offset - off < len though (various places), you don't want to write beyond end of buffer. - return total_bytes; + return total_bytes - off; What will happen when originally off is bigger than total_bytes? Returning a negative value sounds wrong, IMHO you should in that case return early 0. So like: if ((off == -1 && total_bytes > len) || off >= total_bytes) return 0; @@ -7290,7 +7293,8 @@ native_encode_fixed (const_tree expr, un FIXED_VALUE_TYPE value; tree i_value, i_type; - if (total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT) + if (off == -1 + && total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT) return 0; This isn't comparing total_bytes to len, so IMHO shouldn't be changed. @@ -7324,8 +7328,11 @@ native_encode_real (const_tree expr, uns up to 192 bits. */ long tmp[6]; - if (total_bytes > len) + if (off == -1 + && total_bytes > len) This can fit onto one line. - rsize = native_encode_expr (part, ptr, len); + rsize = native_encode_expr (part, ptr, len, off); if (rsize == 0) return 0; If off is not -1 and len is too short, the above will do a partial store. But: a) if it is a partial store, because some bytes didn't fit, then the second native_encode_expr should probably not be invoked b) what about the case when the first one returns 0 because you are asking for a few bytes from the imag part? 
part = TREE_IMAGPART (expr); - isize = native_encode_expr (part, ptr+rsize, len-rsize); - if (isize != rsize) + if (off != -1) +off = MAX (0, off - rsize); + isize = native_encode_expr (part, ptr+rsize, len-rsize, off); + if (off == -1 + && isize != rsize) return 0; return rsize + isize; } @@ -7396,9 +7408,13 @@ native_encode_vector (const_tree expr, u for (i = 0; i < count; i++) { elem = VECTOR_CST_ELT (expr, i); - if (native_encode_expr (elem, ptr+offset, len-offset) != size) + int res = native_encode_expr (elem, ptr+offset, len-offset, off); + if (off == -1 + && res != size) return 0; I don't think this will work correctly if off is not -1. if (TREE_STRING_LENGTH (expr) < total_bytes) No verification that you are not accessing beyond end of string here. { - memcpy (ptr, TREE_STRING_POINTER (expr), TREE_STRING_LENGTH (expr)); - memset (ptr + TREE_STRING_LENGTH (expr), 0, - total_bytes - TREE_STRING_LENGTH (expr)); + memcpy (ptr, TREE_STRING_POINTER (expr) + off, + TREE_STRING_LENGTH (expr) - off); + memset (ptr + TREE_STRING_LENGTH (expr) - off, 0, + MIN (total_bytes, len) - TREE_STRING_LENGTH (expr) + off); } else -memcpy (ptr, TREE_STRING_POINTER (expr), total_bytes); - return total_bytes; +memcpy (ptr, TREE_STRING_POINTER (expr) + off, + MIN (total_bytes, len)); + return MIN (total_bytes - off, len); } Jakub
Re: Optimize type streaming
On Fri, 11 Jul 2014, Jan Hubicka wrote: Well - as we re-use the streamer cache to store the hash value it isn't really possible to do better ... (at least I don't have a clever idea) OK, no cleverness on my side either. Yeah (though you wouldn't need the extra hashing - we only need to know the hash of the SCC). The current iterating to iterate until no duplicate hashes are found is bogus IMHO - better do the right thing from the start or only iterate until at least _one_ hash is unique (you get rid of the scc_entry_len > 1 case which can be quite expensive). agreed. I just got this implemented before I got into enough of an understanding of the lto/lto.c side of the machinery - I assumed that identical nodes are tested deeply but that would of course be expensive. Of course the question is whether we can always guarantee this property ... if we can then the offline compare can be expanded to all trees (it requires a stable sort of the SCCs independent of the SCC entry). I do not think we need stable sorting here. This is what I think the extra DFS walk should ensure - as soon as we have a unique entry (by identifying the first node w/o conflict), the DFS walks in all units will give the same order. Yes, that's what I call stable sorting of the SCC. I do not think we can 100% guarantee it, but we can get very close to 100%. If we don't find a unique entry, I suppose we can stick with the current scheme or just drop the complicated path with multiple entries and not merge in this case (or just merge when the entry happens to match by accident). Yeah, I suppose we want some statistics here (how many times we have to iterate - I've once placed an assert to see whether we have SCCs with non-unique entry and there was none during LTO bootstrap at least, with no iteration). This is because those otherwise equivalent trees are not really equivalent from a merging perspective.
You test strict pointer equivalence outside SCC regions, so it matters whether two same trees are referred or two equivalent copies. Hmm? So you say you have two identical trees in the _same_ SCC? How can that be? Answer: not possible - they at least differ in the edges forming the SCC (unless we have a cyclic list but I don't think we (should) have that). We hash only on outgoing SCC edges. You can easily have main variant type T and variants T1,T2 that are the same and all used by type T again. Ok, but then the two variants shouldn't be in the same SCC or at least not in the same SCC as the main variant. T1 and T2 differ by references in T, but they will always have the same hash and in theory they can be merged if your code worked recursively on nested SCC regions (as given by Tarjan's algorithm). Because you work on outermost SCC regions only, we never merge T1 and T2. I wonder what ties them into a SCC though. The congruence you compute is not the best possible, but the best possible assuming no duplicates within compilation units. I think this is sane, we should look towards eliminating such duplicates rather than working harder in the streamer. Yeah. Btw, if you re-hash only nodes with duplicate hashes might there be a higher probability of ending up with a hash that collides with one that was not a duplicate one? That is, why not simply re-hash all nodes? (premature optimization...?) So - can you iterate on this, re-doing the DFS walk with the found entry node? Yep, that is the plan. Iterate long enough until I get a unique hash. Can I easily re-do the DFS walk? Not that easily... you have to refactor things a bit I fear ;) DFS_write_tree is the main worker and the init is embedded in lto_output_tree. The main issue is that you can end up with multiple SCCs from the start of the DFS walk. So you'd need to first collect all SCCs (but not write anything yet) and then factor out a worker that processes them - you can then iterate the DFS walk on the SCCs. 
Quite some work eventually. Not sure ;) Richard. Honza Thanks, Richard. Honza * lto-streamer-out.c (hash_tree): Add map argument; remove special purpose hack for pointer types. (visit): Update it. (hash_scc): Add iteration until equivalence classes stabilize. Index: lto-streamer-out.c === --- lto-streamer-out.c(revision 212426) +++ lto-streamer-out.c(working copy) @@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block /* Return a hash value for the tree T. */ static hashval_t -hash_tree (struct streamer_tree_cache_d *cache, tree t) +hash_tree (struct streamer_tree_cache_d *cache, hash_map<tree, hashval_t> *map, tree t) { #define visit(SIBLING) \ do { \ unsigned ix; \ -if (SIBLING && streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ +if (!SIBLING) \ + v = iterative_hash_hashval_t (0, v); \ +
Re: [RFC, PATCH 1/n] IPA C++ refactoring
Hi, this first patch continues with refactoring of IPA infrastructure so that we will have C++ API. In the patch, I transformed many global functions to members of symtab_node and cgraph_node. Example: cgraph_remove_node (struct cgraph_node *node) -> cgraph_node::remove (void) symtab_unregister_node (symtab_node *node) -> symtab_node::unregister (void) The patch is being consulted with Honza and will iterate. We want to inform folks that we plan to do the following changes. After the patch is applied, I would like to transform varpool_node and cgraph_edge in the following patch. Thank you for your comments, Martin /* Remove the node from cgraph. */ Perhaps Remove function from symbol table. (similarly for varpool, perhaps a few other block comments need revisiting. We may do that incrementally.) + /* Add node into symbol table. This function is not used directly, but via + cgraph/varpool node creation routines. */ + void register_symbol (void); + + /* Remove symtab node from the symbol table. */ + void remove (void); + + /* Dump symtab node to F. */ + void dump (FILE *f); + + /* Dump symtab node to stderr. */ + void DEBUG_FUNCTION debug (void); + + /* Verify consistency of node. */ + void DEBUG_FUNCTION verify (void); + + /* Return ipa reference from this symtab_node to + REFERED_NODE or REFERED_VARPOOL_NODE. USE_TYPE specify type + of the use and STMT the statement (if it exists). */ + struct ipa_ref *add_reference (symtab_node *referred_node, + enum ipa_ref_use use_type); + + /* Return ipa reference from this symtab_node to + REFERED_NODE or REFERED_VARPOOL_NODE. USE_TYPE specify type + of the use and STMT the statement (if it exists). */ + struct ipa_ref *add_reference (symtab_node *referred_node, +enum ipa_ref_use use_type, gimple stmt); + + /* If VAL is a reference to a function or a variable, add a reference from + this symtab_node to the corresponding symbol table node. USE_TYPE specify + type of the use and STMT the statement (if it exists). 
Return the new + reference or NULL if none was created. */ + struct ipa_ref *maybe_add_reference (tree val, enum ipa_ref_use use_type, + gimple stmt); + + /* Clone all references from symtab NODE to this symtab_node. */ + void clone_references (symtab_node *node); + + /* Remove all stmt references in non-speculative references. + Those are not maintained during inlining clonning. + The exception are speculative references that are updated along + with callgraph edges associated with them. */ + void clone_referring (symtab_node *node); + + /* Clone reference REF to this symtab_node and set its stmt to STMT. */ + struct ipa_ref *clone_reference (struct ipa_ref *ref, gimple stmt); + + /* Find the structure describing a reference to REFERRED_NODE + and associated with statement STMT. */ + struct ipa_ref *find_reference (symtab_node *, gimple, unsigned int); + + /* Remove all references that are associated with statement STMT. */ + void remove_stmt_references (gimple stmt); + + /* Remove all stmt references in non-speculative references. + Those are not maintained during inlining clonning. + The exception are speculative references that are updated along + with callgraph edges associated with them. */ + void clear_stmts_in_references (void); + + /* Remove all references in ref list. */ + void remove_all_references (void); + + /* Remove all referring items in ref list. */ + void remove_all_referring (void); + + /* Dump references in ref list to FILE. */ + void dump_references (FILE *file); + + /* Dump referring in list to FILE. */ + void dump_referring (FILE *); + + /* Return true if symtab node and TARGET represents + semantically equivalent symbols. */ + bool semantically_equivalent_p (symtab_node *target); + + /* Classify symbol symtab node for partitioning. */ + enum symbol_partitioning_class get_partitioning_class (void); + + /* Return comdat group. */ + tree get_comdat_group () +{ + return x_comdat_group; +} + + /* Return comdat group as identifier_node. 
*/ + tree get_comdat_group_id () +{ + if (x_comdat_group && TREE_CODE (x_comdat_group) != IDENTIFIER_NODE) + x_comdat_group = DECL_ASSEMBLER_NAME (x_comdat_group); + return x_comdat_group; +} + + /* Set comdat group. */ + void set_comdat_group (tree group) +{ + gcc_checking_assert (!group || TREE_CODE (group) == IDENTIFIER_NODE + || DECL_P (group)); + x_comdat_group = group; +} + + /* Return section as string. */ + const char * get_section () +{ + if (!x_section) + return NULL; + return x_section->name; +} + + /* Remove node from same comdat group. */ + void remove_from_same_comdat_group (void); + + /* Add this symtab_node to the same comdat
Re: Optimize type streaming
We hash only on outgoing SCC edges. You can easily have main variant type T and variants T1,T2 that are the same, all used by type T again. Ok, but then the two variants shouldn't be in the same SCC or at least not in the same SCC as the main variant. They will because T refers to them (since it contains e.g. pointers to itself) T1 and T2 differ by references in T, but they will always have the same hash and in theory they can be merged if your code worked recursively on nested SCC regions (as given by Tarjan's algorithm). Because you work on outermost SCC regions only, we never merge T1 and T2. I wonder what ties them into a SCC though. The congruence you compute is not the best possible, but best possible assuming no duplicates within compilation units. I think this is sane, we should look towards eliminating such duplicates rather than working harder in the streamer. Yeah. Btw, if you re-hash only nodes with duplicate hashes might there be a higher probability of ending up with a hash that collides with one that was not a duplicate one? That is, why not simply re-hash all nodes? (premature optimization...?) So - can you iterate on this re-doing the DFS walk with the found entry node? Yep, that is the plan. Iterate long enough until I get a unique hash. Can I easily re-do the DFS walk? Not that easily... you have to refactor things a bit I fear ;) DFS_write_tree is the main worker and the init is embedded in lto_output_tree. The main issue is that you can end up with multiple SCCs from the start of the DFS walk. So you'd need to first collect all SCCs (but not write anything yet) and then factor out a worker that processes them - you can then iterate the DFS walk on the SCCs. Quite some work eventually. Not sure ;) Hmm, walking everything first and then starting streaming sounds like a bad idea for locality. Can't I just re-do the walk since I know what the SCC is? I will read the code more after lunch. Honza Richard. Honza Thanks, Richard. 
Honza * lto-streamer-out.c (hash_tree): Add map argument; remove special purpose hack for pointer types. (visit): Update it. (hash_scc): Add iteration until equivalence classes stabilize. Index: lto-streamer-out.c === --- lto-streamer-out.c (revision 212426) +++ lto-streamer-out.c (working copy) @@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block /* Return a hash value for the tree T. */ static hashval_t -hash_tree (struct streamer_tree_cache_d *cache, tree t) +hash_tree (struct streamer_tree_cache_d *cache, hash_map<tree, hashval_t> *map, tree t) { #define visit(SIBLING) \ do { \ unsigned ix; \ -if (SIBLING && streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ +if (!SIBLING) \ + v = iterative_hash_hashval_t (0, v); \ +else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ v = iterative_hash_hashval_t (streamer_tree_cache_get_hash (cache, ix), v); \ +else if (map) \ + v = iterative_hash_hashval_t (*map->get (SIBLING), v); \ +else \ + v = iterative_hash_hashval_t (1, v); \ } while (0) /* Hash TS_BASE. */ @@ -896,23 +924,7 @@ hash_tree (struct streamer_tree_cache_d if (CODE_CONTAINS_STRUCT (code, TS_TYPED)) { - if (POINTER_TYPE_P (t)) - { - /* For pointers factor in the pointed-to type recursively as -we cannot recurse through only pointers. -??? We can generalize this by keeping track of the -in-SCC edges for each tree (or arbitrarily the first -such edge) and hashing that in in a second stage -(instead of the quadratic mixing of the SCC we do now). 
*/ - hashval_t x; - unsigned ix; - if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), &ix)) - x = streamer_tree_cache_get_hash (cache, ix); - else - x = hash_tree (cache, TREE_TYPE (t)); - v = iterative_hash_hashval_t (x, v); - } - else if (code != IDENTIFIER_NODE) + if (code != IDENTIFIER_NODE) visit (TREE_TYPE (t)); } @@ -1122,28 +1156,78 @@ scc_entry_compare (const void *p1_, cons static hashval_t hash_scc (struct streamer_tree_cache_d *cache, unsigned first, unsigned size) { + unsigned int last_classes = 0, iterations = 0; + /* Compute hash values for the SCC members. */ for (unsigned i = 0; i < size; ++i) -sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t); +sccstack[first+i].hash = hash_tree (cache, NULL, sccstack[first+i].t); if (size
Re: [PATCH] Fix PR61762, reads from string constants
On Fri, 11 Jul 2014, Jakub Jelinek wrote: On Fri, Jul 11, 2014 at 11:13:12AM +0200, Richard Biener wrote: + if (offset >= off) + ptr[offset - off] = value; For original off != 0, you aren't checking whether offset - off < len though (various places), you don't want to write beyond end of buffer. Indeed. Fixed. - return total_bytes; + return total_bytes - off; What will happen when originally off is bigger than total_bytes? Returning a negative value sounds wrong, IMHO you should in that case return early 0. So like: if ((off == -1 && total_bytes > len) || off >= total_bytes) return 0; Fixed. @@ -7290,7 +7293,8 @@ native_encode_fixed (const_tree expr, un FIXED_VALUE_TYPE value; tree i_value, i_type; - if (total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT) + if (off == -1 + && total_bytes * BITS_PER_UNIT > HOST_BITS_PER_DOUBLE_INT) return 0; This isn't comparing total_bytes to len, so IMHO shouldn't be changed. Oh, indeed. (reading that and native_encode_int I think we want a native_encode_wide_int and avoid creating an integer cst for encoding fixed_csts) @@ -7324,8 +7328,11 @@ native_encode_real (const_tree expr, uns up to 192 bits. */ long tmp[6]; - if (total_bytes > len) + if (off == -1 + && total_bytes > len) This can fit onto one line. - rsize = native_encode_expr (part, ptr, len); + rsize = native_encode_expr (part, ptr, len, off); if (rsize == 0) return 0; If off is not -1 and len is too short, the above will do a partial store. But: a) if it is a partial store, because some bytes didn't fit, then the second native_encode_expr should probably not be invoked; len will get negative so that might just be handled fine (fingers crossing). I suppose I need to come up with a testcase that ends up native-encoding a vector partially (SCCVN also benefits from this work). b) what about the case when the first one returns 0 because you are asking for a few bytes from the imag part? Yeah. 
Sth like part = TREE_IMAGPART (expr); - isize = native_encode_expr (part, ptr+rsize, len-rsize); - if (isize != rsize) + if (off != -1) +off = MAX (0, off - rsize); if (off != -1) off = MAX (0, off - GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (part)))); instead. Same mistake in vector encoding - we need to adjust off by the size of the sub-object not by what was encoded. + isize = native_encode_expr (part, ptr+rsize, len-rsize, off); + if (off == -1 + && isize != rsize) return 0; return rsize + isize; } @@ -7396,9 +7408,13 @@ native_encode_vector (const_tree expr, u for (i = 0; i < count; i++) { elem = VECTOR_CST_ELT (expr, i); - if (native_encode_expr (elem, ptr+offset, len-offset) != size) + int res = native_encode_expr (elem, ptr+offset, len-offset, off); + if (off == -1 + && res != size) return 0; I don't think this will work correctly if off is not -1. I have elem = VECTOR_CST_ELT (expr, i); int res = native_encode_expr (elem, ptr+offset, len-offset, off); if (off == -1 && res != size) return 0; offset += res; if (off != -1) off = MAX (0, off - size); now, so 'offset' tracks the number of encoded bytes and off is decremented properly. Now it will of course break if we fail to encode one vector element but not another. But I hate complicating things here ... (maybe guard all recursive calls with off < size and check for a 0 return). Like for (i = 0; i < count; i++) { if (off >= size) { off -= size; continue; } elem = VECTOR_CST_ELT (expr, i); int res = native_encode_expr (elem, ptr+offset, len-offset, off); if ((off == -1 && res != size) || res == 0) return 0; offset += res; if (off != -1) off = 0; } ? if (TREE_STRING_LENGTH (expr) < total_bytes) No verification that you are not accessing beyond end of string here. 
Changed to if (TREE_STRING_LENGTH (expr) - off < MIN (total_bytes, len)) { int written = 0; if (off < TREE_STRING_LENGTH (expr)) { written = MIN (len, TREE_STRING_LENGTH (expr) - off); memcpy (ptr, TREE_STRING_POINTER (expr) + off, written); } memset (ptr + written, 0, MIN (total_bytes - written, len - written)); } else memcpy (ptr, TREE_STRING_POINTER (expr) + off, MIN (total_bytes, len)); ugh. Now I have to think about getting at least some testing coverage ... Richard. Index: gcc/fold-const.c === *** gcc/fold-const.c.orig 2014-07-11 11:13:22.380607780 +0200 --- gcc/fold-const.c2014-07-11 12:05:02.352394351 +0200 *** fold_plusminus_mult_expr (location_t loc *** 7239,7253 upon failure. */ static int !
Re: FWD: Re: OpenACC subarray specifications in the GCC Fortran front end
Hi! On Thu, 22 May 2014 11:31:25 +0400, Ilmir Usmanov i.usma...@samsung.com wrote: On 16.05.2014 19:44, Ilmir Usmanov wrote: On 16.05.2014 19:12, Thomas Schwinge wrote: You recently indicated that you have already begun implementing OpenACC subarray specifications in the GCC Fortran front end, but have not been/are not currently able to complete that. Would you be willing to share your WIP patch with Cesar, who is now working on this, so that he doesn't have to duplicate your work? Sure! I'm glad to know that my work won't go directly to trash. :-) You can find the patch in the attachment. I started to implement sub-arrays in gfortran by implementing the OpenMP 4.0 target map clause. This clause was already implemented in the C/C++ FEs, so I could check the behavior. I don't know whether it's already implemented in gfortran or not. To avoid duplication of work: with Jakub's Fortran OpenMP 4 target changes recently committed to trunk, and now merged into gomp-4_0-branch, I have trimmed down Ilmir's patch to just the OpenACC bits, OpenMP 4 target changes removed, and TODO markers added to integrate into that. Jakub, before your Fortran OpenMP 4 target changes, Ilmir had written the test case gcc/testsuite/gfortran.dg/gomp/map-1.f90 (based on his interpretation and implementation of OpenMP 4 target), which I have now amended with XFAILs and changed error messages -- anything in there that you'd like to see addressed for Fortran OpenMP 4 target? To represent OpenMP array sections (or OpenACC subarrays) I used gfc_expr. After implementing OpenMP target map clauses I was going to use it to represent OpenACC data clauses, just as Thomas recommended in his mail: http://gcc.gnu.org/ml/gcc-patches/2014-01/msg02040.html I hope this will be useful for you. If you have any questions, feel free to ask. 
gcc/fortran/dump-parse-tree.c | 47 +-- gcc/fortran/gfortran.h| 18 +++ gcc/fortran/openmp.c | 182 ++ gcc/fortran/trans-openmp.c| 145 +++- gcc/testsuite/gfortran.dg/goacc/subarrays.f95 | 36 + gcc/testsuite/gfortran.dg/gomp/map-1.f90 | 109 +++ 6 files changed, 520 insertions(+), 17 deletions(-) diff --git gcc/fortran/dump-parse-tree.c gcc/fortran/dump-parse-tree.c index c3671395..8d7c38c 100644 --- gcc/fortran/dump-parse-tree.c +++ gcc/fortran/dump-parse-tree.c @@ -1072,6 +1072,18 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n) } } +/* TODO: remove; use show_omp_namelist. */ +static void +show_expr_list (gfc_expr_list *el) +{ + for (; el->next; el = el->next) +{ + show_expr (el->expr); + fputc (',', dumpfile); +} + show_expr (el->expr); +} + /* Show OpenMP or OpenACC clauses. */ @@ -1214,28 +1226,35 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses) } fprintf (dumpfile, " DEFAULT(%s)", type); } + for (int kind = 0; kind < OMP_MAP_LIST_LAST; kind++) +{ + const char *type; + if (omp_clauses->map_lists[kind] == NULL) + continue; + + switch (kind) + { + case OMP_MAP_LIST_ALLOC: type = "ALLOC"; break; + case OMP_MAP_LIST_TO: type = "TO"; break; + case OMP_MAP_LIST_FROM: type = "FROM"; break; + case OMP_MAP_LIST_TOFROM: type = "TOFROM"; break; + default: + gcc_unreachable (); + } + fprintf (dumpfile, " MAP(%s:", type); + show_expr_list (omp_clauses->map_lists[kind]); + fputc (')', dumpfile); +} if (omp_clauses->tile_list) { - gfc_expr_list *list; fputs (" TILE(", dumpfile); - for (list = omp_clauses->tile_list; list; list = list->next) - { - show_expr (list->expr); - if (list->next) - fputs (", ", dumpfile); - } + show_expr_list (omp_clauses->tile_list); fputc (')', dumpfile); } if (omp_clauses->wait_list) { - gfc_expr_list *list; fputs (" WAIT(", dumpfile); - for (list = omp_clauses->wait_list; list; list = list->next) - { - show_expr (list->expr); - if (list->next) - fputs (", ", dumpfile); - } + show_expr_list (omp_clauses->wait_list); fputc (')', dumpfile); } if (omp_clauses->seq) diff --git 
gcc/fortran/gfortran.h gcc/fortran/gfortran.h index cc445e6..09da2d1 100644 --- gcc/fortran/gfortran.h +++ gcc/fortran/gfortran.h @@ -1172,6 +1172,22 @@ enum OMP_LIST_NUM }; +/* OpenACC 2.0: data clauses kind. */ +/* TODO: remove; use OpenMP 4 target infrastructure. */ +enum gfc_omp_clause_map_kind +{ + /* If not already present, allocate. */ + OMP_MAP_LIST_ALLOC, + /* ..., and copy to device. */ + OMP_MAP_LIST_TO, + /* ..., and copy from device. */ + OMP_MAP_LIST_FROM, + /* ..., and copy to and from device. */ + OMP_MAP_LIST_TOFROM, + /* End marker. */ + OMP_MAP_LIST_LAST +}; + /* Because a symbol can belong to multiple
Re: Optimize type streaming
On Fri, 11 Jul 2014, Jan Hubicka wrote: We hash only on outgoing SCC edges. You can easily have main variant type T and variants T1,T2 that are the same, all used by type T again. Ok, but then the two variants shouldn't be in the same SCC or at least not in the same SCC as the main variant. They will because T refers to them (since it contains e.g. pointers to itself) Ah, yes. Bah ;) [handle TYPE_FIELDs like variants - build them up late via adjusting DECL_CONTEXTs and TYPE_FIELDs ... 8)] That is, ideally stream a tree, not a graph ... T1 and T2 differ by references in T, but they will always have the same hash and in theory they can be merged if your code worked recursively on nested SCC regions (as given by Tarjan's algorithm). Because you work on outermost SCC regions only, we never merge T1 and T2. I wonder what ties them into a SCC though. The congruence you compute is not the best possible, but best possible assuming no duplicates within compilation units. I think this is sane, we should look towards eliminating such duplicates rather than working harder in the streamer. Yeah. Btw, if you re-hash only nodes with duplicate hashes might there be a higher probability of ending up with a hash that collides with one that was not a duplicate one? That is, why not simply re-hash all nodes? (premature optimization...?) So - can you iterate on this re-doing the DFS walk with the found entry node? Yep, that is the plan. Iterate long enough until I get a unique hash. Can I easily re-do the DFS walk? Not that easily... you have to refactor things a bit I fear ;) DFS_write_tree is the main worker and the init is embedded in lto_output_tree. The main issue is that you can end up with multiple SCCs from the start of the DFS walk. So you'd need to first collect all SCCs (but not write anything yet) and then factor out a worker that processes them - you can then iterate the DFS walk on the SCCs. Quite some work eventually. 
Not sure ;) Hmm, walking everything first and then starting streaming sounds like a bad idea for locality. Can't I just re-do the walk since I know what the SCC is? I will read the code more after lunch. Might be possible with a 2nd SCC stack set up, yes. Richard. Honza Richard. Honza Thanks, Richard. Honza * lto-streamer-out.c (hash_tree): Add map argument; remove special purpose hack for pointer types. (visit): Update it. (hash_scc): Add iteration until equivalence classes stabilize. Index: lto-streamer-out.c === --- lto-streamer-out.c(revision 212426) +++ lto-streamer-out.c(working copy) @@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block /* Return a hash value for the tree T. */ static hashval_t -hash_tree (struct streamer_tree_cache_d *cache, tree t) +hash_tree (struct streamer_tree_cache_d *cache, hash_map<tree, hashval_t> *map, tree t) { #define visit(SIBLING) \ do { \ unsigned ix; \ -if (SIBLING && streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ +if (!SIBLING) \ + v = iterative_hash_hashval_t (0, v); \ +else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ v = iterative_hash_hashval_t (streamer_tree_cache_get_hash (cache, ix), v); \ +else if (map) \ + v = iterative_hash_hashval_t (*map->get (SIBLING), v); \ +else \ + v = iterative_hash_hashval_t (1, v); \ } while (0) /* Hash TS_BASE. */ @@ -896,23 +924,7 @@ hash_tree (struct streamer_tree_cache_d if (CODE_CONTAINS_STRUCT (code, TS_TYPED)) { - if (POINTER_TYPE_P (t)) - { - /* For pointers factor in the pointed-to type recursively as - we cannot recurse through only pointers. - ??? We can generalize this by keeping track of the - in-SCC edges for each tree (or arbitrarily the first - such edge) and hashing that in in a second stage - (instead of the quadratic mixing of the SCC we do now). 
*/ - hashval_t x; - unsigned ix; - if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), &ix)) - x = streamer_tree_cache_get_hash (cache, ix); - else - x = hash_tree (cache, TREE_TYPE (t)); - v = iterative_hash_hashval_t (x, v); - } - else if (code != IDENTIFIER_NODE) + if (code != IDENTIFIER_NODE) visit (TREE_TYPE (t)); } @@ -1122,28 +1156,78 @@ scc_entry_compare (const void *p1_, cons static hashval_t hash_scc (struct streamer_tree_cache_d,
[C++ Patch] PR 53159 (Take 2)
Hi, only today it occurred to me that we can as well delay all the diagnostic at issue to the check_narrowing at the end of convert_like_real and avoid at once possible issues with duplicate warnings. Tested x86_64-linux. Thanks, Paolo. // /cp 2014-07-11 Paolo Carlini paolo.carl...@oracle.com PR c++/53159 * call.c (build_user_type_conversion_1): Copy LOOKUP_NO_NARROWING into convflags. * decl.c (check_initializer): Don't call check_narrowing here, set LOOKUP_NO_NARROWING. * typeck2.c (digest_init_r): Likewise. /testsuite 2014-07-11 Paolo Carlini paolo.carl...@oracle.com PR c++/53159 * g++.dg/cpp0x/Wnarrowing1.C: New. Index: cp/call.c === --- cp/call.c (revision 212449) +++ cp/call.c (working copy) @@ -3586,7 +3586,8 @@ build_user_type_conversion_1 (tree totype, tree ex /* It's OK to bind a temporary for converting constructor arguments, but not in converting the return value of a conversion operator. */ - convflags = ((flags & LOOKUP_NO_TEMP_BIND) | LOOKUP_NO_CONVERSION); + convflags = ((flags & LOOKUP_NO_TEMP_BIND) | LOOKUP_NO_CONVERSION + | (flags & LOOKUP_NO_NARROWING)); flags &= ~LOOKUP_NO_TEMP_BIND; if (ctors) Index: cp/decl.c === --- cp/decl.c (revision 212449) +++ cp/decl.c (working copy) @@ -5757,7 +5757,7 @@ check_initializer (tree decl, tree init, int flags { init = reshape_init (type, init, tf_warning_or_error); if (SCALAR_TYPE_P (type)) - check_narrowing (type, init); + flags |= LOOKUP_NO_NARROWING; } } else if (TREE_CODE (init) == TREE_LIST Index: cp/typeck2.c === --- cp/typeck2.c(revision 212449) +++ cp/typeck2.c(working copy) @@ -1027,7 +1027,7 @@ digest_init_r (tree type, tree init, bool nested, tree *exp; if (nested) - check_narrowing (type, init); + flags |= LOOKUP_NO_NARROWING; init = convert_for_initialization (0, type, init, flags, ICR_INIT, NULL_TREE, 0, complain); Index: testsuite/g++.dg/cpp0x/Wnarrowing1.C === --- testsuite/g++.dg/cpp0x/Wnarrowing1.C(revision 0) +++ testsuite/g++.dg/cpp0x/Wnarrowing1.C(working copy) @@ -0,0 +1,18 @@ +// PR 
c++/53159 +// { dg-do compile { target c++11 } } +// { dg-options "-Wnarrowing -Wno-overflow" } + +struct X +{ + constexpr operator int() { return __INT_MAX__; } +}; + +int f() { return __INT_MAX__; } + +signed char a { __INT_MAX__ }; // { dg-warning "narrowing conversion" } +signed char b { f() }; // { dg-warning "narrowing conversion" } +signed char c { X{} }; // { dg-warning "narrowing conversion" } + +signed char ar[] { __INT_MAX__ }; // { dg-warning "narrowing conversion" } +signed char br[] { f() }; // { dg-warning "narrowing conversion" } +signed char cr[] { X{} }; // { dg-warning "narrowing conversion" }
Re: [PATCH] Generate more efficient memory barriers for LEON3
That was an error on my side. The wrong memory model had gotten cached in a generated make script. Let's drop membar_leon3 also then :) On 2014-07-11 11:15, Eric Botcazou wrote: The full barrier pattern membar_leon3 also gets generated so I think that one should be kept also. Do you have a testcase? membar is generated by sparc_emit_membar_for_model and, for the TSO model of LEON3, implied = StoreStore | LoadLoad | LoadStore so mm can only be StoreLoad, which means that membar_storeload will match so the full barrier never will.
[PATCH-v3] Generate more efficient memory barriers for LEON3
ChangeLog: 2014-07-11 Daniel Cederman ceder...@gaisler.com gcc/config/sparc/ * sync.md: Generate more efficient memory barriers for LEON3 --- gcc/config/sparc/sync.md | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/gcc/config/sparc/sync.md b/gcc/config/sparc/sync.md index e6e237f..98ac0d3 100644 --- a/gcc/config/sparc/sync.md +++ b/gcc/config/sparc/sync.md @@ -64,11 +64,19 @@ stbar [(set_attr type multi)]) +;; For LEON3, STB has the effect of membar #StoreLoad. +(define_insn *membar_storeload_leon3 + [(set (match_operand:BLK 0 ) + (unspec:BLK [(match_dup 0) (const_int 2)] UNSPEC_MEMBAR))] + TARGET_LEON3 + stb\t%%g0, [%%sp-1] + [(set_attr type store)]) + ;; For V8, LDSTUB has the effect of membar #StoreLoad. (define_insn *membar_storeload [(set (match_operand:BLK 0 ) (unspec:BLK [(match_dup 0) (const_int 2)] UNSPEC_MEMBAR))] - TARGET_V8 + TARGET_V8 && !TARGET_LEON3 ldstub\t[%%sp-1], %%g0 [(set_attr type multi)]) -- 1.7.9.5
Re: [Patch 1/2][ARM]Split insn type alu_reg into alu_sreg and alu_dsp_reg
On 10/07/14 08:46, Terry Guo wrote: Hi there, Currently the insn type of DSP-kind instructions like QSUB8 is alu_reg which is same as other normal instructions like SUB. In order to distinguish those DSP-kind instructions, this patch intends to replace current alu_reg with two sub categories alu_sreg and alu_dsp_reg. Meanwhile the alus_reg is renamed to alus_sreg in terms of consistence. This is the first patch of this series and intends to cover the files under gcc/config/arm. Tested with gcc regression, no new regressions. Is it ok to trunk? BR, Terry This is OK. But please wait for the review of the AArch64 code before committing (both patches need to be in one commit for consistency). R. 2014-07-10 Terry Guo terry@arm.com * config/arm/types.md (alu_reg): Replaced by alu_sreg and alu_dsp_reg. (alus_reg): Renamed to alus_sreg. * config/arm/arm-fixed.md: Change type of non-dsp instructions from alu_reg to alu_sreg. Change type of dsp instructions from alu_reg to alu_dsp_reg. * config/arm/thumb1.md: Likewise. * config/arm/thumb2.md: Likewise. * config/arm/arm.c (cortexa7_older_only): Use new ALU type names. * config/arm/arm1020e.md (1020alu_op): Replace alu_reg and alus_reg with alu_sreg and alus_sreg. * config/arm/arm1026ejs.md (alu_op): Likewise. * config/arm/arm1136jfs.md (11_alu_op): Likewise. * config/arm/arm926ejs.md (9_alu_op): Likewise. * config/arm/fa526.md (526_alu_op): Likewise. * config/arm/fa606te.md (606te_alu_op): Likewise. * config/arm/fa626te.md (626te_alu_op): Likewise. * config/arm/fa726te.md (726te_alu_op): Likewise. * config/arm/fmp626.md (mp626_alu_op): Likewise. * config/arm/arm.md (core_cycles): Replace alu_reg and alus_reg with alu_sreg, alu_dsp_reg and alus_sreg. * config/arm/cortex-a15.md (cortex_a15_alu): Likewise. * config/arm/cortex-a5.md (cortex_a5_alu): Likewise. * config/arm/cortex-a53.md (cortex_a53_alu): Likewise. * config/arm/cortex-a7.md (cortex_a7_alu_sreg): Likewise. * config/arm/cortex-a8.md (cortex_a8_alu): Likewise. 
* config/arm/cortex-a9.md (cortex_a9_dp): Likewise. * config/arm/cortex-m4.md (cortex_m4_alu): Likewise. * config/arm/cortex-r4.md (cortex_r4_alu): Likewise. * config/arm/marvell-pj4.md (pj4_alu, pj4_alu_conds): Likewise. arm-new-insn-type-v3.txt diff --git a/gcc/config/arm/arm-fixed.md b/gcc/config/arm/arm-fixed.md index 4ab9d35..5611ad1 100644 --- a/gcc/config/arm/arm-fixed.md +++ b/gcc/config/arm/arm-fixed.md @@ -26,7 +26,7 @@ add%?\\t%0, %1, %2 [(set_attr predicable yes) (set_attr predicable_short_it yes,no) - (set_attr type alu_reg)]) + (set_attr type alu_sreg)]) (define_insn addmode3 [(set (match_operand:ADDSUB 0 s_register_operand =r) @@ -36,7 +36,7 @@ saddqaddsub_suf%?\\t%0, %1, %2 [(set_attr predicable yes) (set_attr predicable_short_it no) - (set_attr type alu_reg)]) + (set_attr type alu_dsp_reg)]) (define_insn usaddmode3 [(set (match_operand:UQADDSUB 0 s_register_operand =r) @@ -46,7 +46,7 @@ uqaddqaddsub_suf%?\\t%0, %1, %2 [(set_attr predicable yes) (set_attr predicable_short_it no) - (set_attr type alu_reg)]) + (set_attr type alu_dsp_reg)]) (define_insn ssaddmode3 [(set (match_operand:QADDSUB 0 s_register_operand =r) @@ -56,7 +56,7 @@ qaddqaddsub_suf%?\\t%0, %1, %2 [(set_attr predicable yes) (set_attr predicable_short_it no) - (set_attr type alu_reg)]) + (set_attr type alu_dsp_reg)]) (define_insn submode3 [(set (match_operand:FIXED 0 s_register_operand =l,r) @@ -66,7 +66,7 @@ sub%?\\t%0, %1, %2 [(set_attr predicable yes) (set_attr predicable_short_it yes,no) - (set_attr type alu_reg)]) + (set_attr type alu_sreg)]) (define_insn submode3 [(set (match_operand:ADDSUB 0 s_register_operand =r) @@ -76,7 +76,7 @@ ssubqaddsub_suf%?\\t%0, %1, %2 [(set_attr predicable yes) (set_attr predicable_short_it no) - (set_attr type alu_reg)]) + (set_attr type alu_dsp_reg)]) (define_insn ussubmode3 [(set (match_operand:UQADDSUB 0 s_register_operand =r) @@ -87,7 +87,7 @@ uqsubqaddsub_suf%?\\t%0, %1, %2 [(set_attr predicable yes) (set_attr predicable_short_it no) - (set_attr 
type alu_reg)]) + (set_attr type alu_dsp_reg)]) (define_insn sssubmode3 [(set (match_operand:QADDSUB 0 s_register_operand =r) @@ -97,7 +97,7 @@ qsubqaddsub_suf%?\\t%0, %1, %2 [(set_attr predicable yes) (set_attr predicable_short_it no) - (set_attr type alu_reg)]) + (set_attr type alu_dsp_reg)]) ;; Fractional multiplies. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 78cae73..942df7d 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -11710,8 +11710,9 @@ cortexa7_older_only (rtx insn) switch (get_attr_type
Re: [Patch 2/2][AArch64]Split insn type alu_reg into alu_sreg and alu_dsp_reg
On 10/07/14 09:06, Terry Guo wrote: Hi there, As the second and final patch in this series, it intends to update alu_reg and alus_reg types for AArch64 port. With this change, the gcc can be successfully built for AArch64. Is it OK to trunk? BR, Terry 2014-07-10 Terry Guo terry@arm.com * config/aarch64/aarch64.md (*addsi3_aarch64, *addsi3_aarch64_uxtw, subsi3, *adddi3_aarch64, *subsi3_uxtw, subdi3, absdi2, negmode2, add_losym_mode, *negsi2_uxtw, tlsle_small_mode): Rename type alu_reg to alu_sreg. (addmode3_compare0, *addsi3_compare0_uxtw, *addmode3nr_compare0, submode3_compare0, *compare_negmode, *negmode2_compare0, subsi3_compare0_uxtw, *negsi2_compare0_uxtw, *cmpmode): Rename type alus_reg to alus_sreg. OK. As mentioned in the other thread, please commit both patches in a single commit. R. aarch64-new-insn-type-v1.txt diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 3eb783c..76d8cd3 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1167,7 +1167,7 @@ add\\t%w0, %w1, %w2 add\\t%0.2s, %1.2s, %2.2s sub\\t%w0, %w1, #%n2 - [(set_attr type alu_imm,alu_reg,neon_add,alu_imm) + [(set_attr type alu_imm,alu_sreg,neon_add,alu_imm) (set_attr simd *,*,yes,*)] ) @@ -1183,7 +1183,7 @@ add\\t%w0, %w1, %2 add\\t%w0, %w1, %w2 sub\\t%w0, %w1, #%n2 - [(set_attr type alu_imm,alu_reg,alu_imm)] + [(set_attr type alu_imm,alu_sreg,alu_imm)] ) (define_insn *adddi3_aarch64 @@ -1198,7 +1198,7 @@ add\\t%x0, %x1, %x2 sub\\t%x0, %x1, #%n2 add\\t%d0, %d1, %d2 - [(set_attr type alu_imm,alu_reg,alu_imm,alu_reg) + [(set_attr type alu_imm,alu_sreg,alu_imm,alu_sreg) (set_attr simd *,*,*,yes)] ) @@ -1234,7 +1234,7 @@ adds\\t%w0, %w1, %w2 adds\\t%w0, %w1, %w2 subs\\t%w0, %w1, #%n2 - [(set_attr type alus_reg,alus_imm,alus_imm)] + [(set_attr type alus_sreg,alus_imm,alus_imm)] ) ;; zero_extend version of above @@ -1251,7 +1251,7 @@ adds\\t%w0, %w1, %w2 adds\\t%w0, %w1, %w2 subs\\t%w0, %w1, #%n2 - [(set_attr type alus_reg,alus_imm,alus_imm)] + 
[(set_attr type alus_sreg,alus_imm,alus_imm)] ) (define_insn *adds_mul_imm_mode @@ -1365,7 +1365,7 @@ cmn\\t%w0, %w1 cmn\\t%w0, %w1 cmp\\t%w0, #%n1 - [(set_attr type alus_reg,alus_imm,alus_imm)] + [(set_attr type alus_sreg,alus_imm,alus_imm)] ) (define_insn *compare_negmode @@ -1375,7 +1375,7 @@ (match_operand:GPI 1 register_operand r)))] cmn\\t%w1, %w0 - [(set_attr type alus_reg)] + [(set_attr type alus_sreg)] ) (define_insn *add_shift_mode @@ -1647,7 +1647,7 @@ (match_operand:SI 2 register_operand r)))] sub\\t%w0, %w1, %w2 - [(set_attr type alu_reg)] + [(set_attr type alu_sreg)] ) ;; zero_extend version of above @@ -1658,7 +1658,7 @@ (match_operand:SI 2 register_operand r] sub\\t%w0, %w1, %w2 - [(set_attr type alu_reg)] + [(set_attr type alu_sreg)] ) (define_insn subdi3 @@ -1669,7 +1669,7 @@ @ sub\\t%x0, %x1, %x2 sub\\t%d0, %d1, %d2 - [(set_attr type alu_reg, neon_sub) + [(set_attr type alu_sreg, neon_sub) (set_attr simd *,yes)] ) @@ -1701,7 +1701,7 @@ (minus:GPI (match_dup 1) (match_dup 2)))] subs\\t%w0, %w1, %w2 - [(set_attr type alus_reg)] + [(set_attr type alus_sreg)] ) ;; zero_extend version of above @@ -1714,7 +1714,7 @@ (zero_extend:DI (minus:SI (match_dup 1) (match_dup 2] subs\\t%w0, %w1, %w2 - [(set_attr type alus_reg)] + [(set_attr type alus_sreg)] ) (define_insn *sub_shift_mode @@ -1925,7 +1925,7 @@ GEN_INT (63); DONE; } - [(set_attr type alu_reg)] + [(set_attr type alu_sreg)] ) (define_insn negmode2 @@ -1935,7 +1935,7 @@ @ neg\\t%w0, %w1 neg\\t%rtn0vas, %rtn1vas - [(set_attr type alu_reg, neon_negq) + [(set_attr type alu_sreg, neon_negq) (set_attr simd *,yes)] ) @@ -1945,7 +1945,7 @@ (zero_extend:DI (neg:SI (match_operand:SI 1 register_operand r] neg\\t%w0, %w1 - [(set_attr type alu_reg)] + [(set_attr type alu_sreg)] ) (define_insn *ngcmode @@ -1975,7 +1975,7 @@ (neg:GPI (match_dup 1)))] negs\\t%w0, %w1 - [(set_attr type alus_reg)] + [(set_attr type alus_sreg)] ) ;; zero_extend version of above @@ -1987,7 +1987,7 @@ (zero_extend:DI (neg:SI (match_dup 
1] negs\\t%w0, %w1 - [(set_attr type alus_reg)] + [(set_attr type alus_sreg)] ) (define_insn *neg_shiftmode3_compare0 @@ -2266,7 +2266,7 @@ cmp\\t%w0, %w1 cmp\\t%w0, %w1 cmn\\t%w0, #%n1 - [(set_attr type alus_reg,alus_imm,alus_imm)] + [(set_attr type alus_sreg,alus_imm,alus_imm)] ) (define_insn *cmpmode @@ -3790,7
Re: FWD: Re: OpenACC subarray specifications in the GCC Fortran front end
On Fri, Jul 11, 2014 at 12:11:10PM +0200, Thomas Schwinge wrote: To avoid duplication of work: with Jakub's Fortran OpenMP 4 target changes recently committed to trunk, and now merged into gomp-4_0-branch, I have trimmed down Ilmir's patch to just the OpenACC bits, OpenMP 4 target changes removed, and TODO markers added to integrate into that. Resolving the TODO markers would be nice, indeed. Jakub, before your Fortran OpenMP 4 target changes, Ilmir had written the test case gcc/testsuite/gfortran.dg/gomp/map-1.f90 (based on his interpretation and implementation of OpenMP 4 target), which I have now amended with XFAILs and changed error messages -- anything in there that you'd like to see addressed for Fortran OpenMP 4 target? + !$omp target map(j(5:4)) ! { dg-error Lower bound of OpenMP array section in greater than upper { xfail *-*-* } } + !$omp end target I think this isn't an error in Fortran, if low bound is above upper bound, then it is considered a zero size array section. Though supposedly for depend clause we might want to diagnose that. + !$omp target map(aas) ! { dg-error The upper bound in the last dimension must appear { xfail *-*-* } } + !$omp end target Assumed-size in map without array section would be indeed nice thing to diagnose. + !$omp target map(tt%i) ! { dg-error Syntax error in OpenMP variable list } + !$omp end target ! { dg-bogus Unexpected !\\\$OMP END TARGET statement { xfail *-*-* } } Right now the parsing of !$omp directives in case of parsing error rejects the whole directive, perhaps it should be reconsidered unless it is a fatal error from which there is no easy way out. + !$omp target map(tt%j(1)) ! { dg-bogus Syntax error in OpenMP variable list { xfail *-*-* } } + !$omp end target ! { dg-bogus Unexpected !\\\$OMP END TARGET statement { xfail *-*-* } } + + !$omp target map(tt%j(1:)) ! { dg-bogus Syntax error in OpenMP variable list { xfail *-*-* } } + !$omp end target ! 
{ dg-bogus Unexpected !\\\$OMP END TARGET statement { xfail *-*-* } } These two are pending resolution on omp-lang, I had exchanged a few mails about it, I think we shouldn't support those for consistency with the C/C++ support, where tt.j[1] or tt.j[1:] and similar is explicitly invalid. Jakub
Re: [Patch ARM-AArch64/testsuite v2 01/21] Neon intrinsics execution tests initial framework.
On 10/07/14 11:12, Marcus Shawcroft wrote: On 1 July 2014 11:05, Christophe Lyon christophe.l...@linaro.org wrote: * documentation (README) * dejagnu driver (neon-intrinsics.exp) * support macros (arm-neon-ref.h, compute-ref-data.h) * Tests for 3 intrinsics: vaba, vld1, vshl Hi, The terminology in armv8 is advsimd rather than neon. Can we rename neon-intrinsics to advsimd-intrinsics or simd-intrinsics throughout please. The existing gcc.target/aarch64/simd directory of tests will presumably be superseded by this more comprehensive set of tests, so I suggest these tests go in gcc.target/aarch64/advsimd and we eventually remove the gcc.target/aarch64/simd/ directory. GNU style should apply throughout this patch series, notably double space after period in comments and README text. Space before left parenthesis in function/macro call and function declaration. The function name in a declaration goes on a new line. The GCC wiki notes on test case writing state that individual tests should have file names ending in _<number>, see here https://gcc.gnu.org/wiki/TestCaseWriting I'm OK with the execute-only, no-scan nature of the tests. diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README new file mode 100644 index 000..232bb1d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README @@ -0,0 +1,132 @@ +This directory contains executable tests for ARM/AArch64 Neon +intrinsics. Neon -> Advanced SIMD as below. On first use, I think Advanced SIMD (Neon) is even better. R. + +It is meant to cover execution cases of all the Advanced SIMD +intrinsics, but does not scan the generated assembler code. +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ + +typedef union { + struct { GNU style: { on a new line. +#define Neon_Cumulative_Sat __read_neon_cumulative_sat() +#define Set_Neon_Cumulative_Sat(x) __set_neon_cumulative_sat((x)) Upper-case the macros rather than camel case. 
+# Copyright (C) 2013 Free Software Foundation, Inc. s/13/14/ Cheers /Marcus
Re: [GSoC] generation of Gimple loops with empty bodies
The struct now contains only a single element such that there seems to be no need for it anymore. Should we remove it? (We could still use a typedef if you believe the datatype is too long). I don't mind removing it. However, I think we should decide how to transfer the sese region, which is used for translating an isl_ast_node_user to Gimple code. Should we transfer it separately or restore ivs_params later? What do you think? -- Cheers, Roman Gareev.
Re: [GSoC] generation of Gimple loops with empty bodies
On 11/07/2014 13:11, Roman Gareev wrote: The struct now contains only a single element such that there seems to be no need for it anymore. Should we remove it? (We could still use a typedef if you believe the datatype is too long). I don't mind removing it. However, I think we should decide how to transfer the sese region, which is used for translating an isl_ast_node_user to Gimple code. Should we transfer it separately or restore ivs_params later? What do you think? Sorry, I currently don't see why the sese region is needed later. In case it is, we may keep it, but that would require pulling more code into the review. I personally would just remove it to finish this review. Feel free to commit after applying this change. Cheers, Tobias
Re: Use separate sections to stream non-trivial constructors
On Fri, 11 Jul 2014, Jan Hubicka wrote: Hi, since we both agreed that offlining constructors from the global decl stream is a good idea, I went ahead and implemented it. I would like to follow up with some cleanups; for example the sections are still tagged as function sections, but I would like to do it incrementally. There is quite some ugliness in the way we handle function sections and the patch started to snowball very quickly. The patch conceptually copies what we do for functions and re-uses most of the infrastructure. varpool_get_constructor is to variables what cgraph_get_body is to functions (i.e. the means of getting the body in) and it is used by the output machinery, by ipa-visibility while rewriting the constructor, and by ctor_for_folding (which makes us load the ctor whenever it is needed by ipa-cp or ipa-devirt). I kept get_symbol_initial_value as the authority that decides whether we want to encode a given constructor or not. The section itself for a trivial ctor is about 25 bytes, and with the header it is probably close to double that. Currently the heuristic is to offline only constructors that are CONSTRUCTOR and keep simple expressions inline. We may want to tweak it. Hmm, so what about an artificial testcase with gazillions of struct X { int i; }; struct X a0001 = { 1 }; struct X a0002 = { 2 }; how does it explode LTO IL size and streaming time (compile-out and LTRANS in)? I suppose it still helps the WPA stage. Also what we desperately miss is to put CONST_DECLs into the symbol table (and thus eventually move the constant pool to symtab). That and no longer allowing STRING_CSTs in the IL but only CONST_DECLs with STRING_CST initializers (to fix PR50199). 
The patch does not bring miraculous savings to firefox WPA, but it does some: GGC memory after global stream is read goes from 1376898k to 1250533k overall GGC allocations from 4156478 kB to 4012462 kB read 11006599 SCCs of average size 1.907692 - read 9119433 SCCs of average size 2.037867 20997206 tree bodies read in total - 18584194 tree bodies read in total Size of mmap'd section decls: 299540188 bytes - Size of mmap'd section decls: 271557265 bytes Size of mmap'd section function_body: 5711078 bytes - Size of mmap'd section function_body: 7548680 bytes Things would be better if ipa-visibility and ipa-devirt did not load most of the virtual tables into memory (still better than loading each into memory 20 times on average). I will work on that incrementally. We load 10311 ctors into memory at WPA time. Note that firefox seems to feature a really huge data segment these days. http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap in progress, OK? The patch looks ok to me. How about simply doing s/LTO_section_function_body/LTO_section_symbol_content/ instead of adding LTO_section_variable_initializer? Thanks, Richard. * varpool.c: Include tree-ssa-alias.h, gimple.h and lto-streamer.h. (varpool_get_constructor): New function. (ctor_for_folding): Use it. (varpool_assemble_decl): Likewise. * lto-streamer.h (struct output_block): Turn cgraph_node into symbol field. (lto_input_variable_constructor): Declare. * ipa-visibility.c (function_and_variable_visibility): Use varpool_get_constructor. * cgraph.h (varpool_get_constructor): Declare. * lto-streamer-out.c (get_symbol_initial_value): Take encoder parameter; return error_mark_node for non-trivial constructors. (lto_write_tree_1, DFS_write_tree): Update use of get_symbol_initial_value. (output_function): Update initialization of symbol. (output_constructor): New function. (copy_function): Rename to ... 
(copy_function_or_variable): ... this one; handle vars too. (lto_output): Output variable sections. * lto-streamer-in.c (input_constructor): New function. (lto_read_body): Rename from ... (lto_read_body_or_constructor): ... this one; handle vars too. (lto_input_variable_constructor): New function. * ipa-prop.c (ipa_prop_write_jump_functions, ipa_prop_write_all_agg_replacement): Update. Index: varpool.c === --- varpool.c (revision 212426) +++ varpool.c (working copy) @@ -35,6 +35,9 @@ along with GCC; see the file COPYING3. #include gimple-expr.h #include flags.h #include pointer-set.h +#include tree-ssa-alias.h +#include gimple.h +#include lto-streamer.h const char * const tls_model_names[]={none, tls-emulated, tls-real, tls-global-dynamic, tls-local-dynamic, @@ -253,6 +256,41 @@ varpool_node_for_asm (tree asmname) return NULL; } +/* When doing LTO, read NODE's constructor from disk if it is not already present. */ + +tree +varpool_get_constructor (struct varpool_node *node) +{ +
Re: [PATCH 2/2] Enable elimination of zext/sext
Thanks for the review and suggestions. On 10/07/14 22:15, Richard Biener wrote: On Mon, Jul 7, 2014 at 8:55 AM, Kugan kugan.vivekanandara...@linaro.org wrote: [...] For -fwrapv, it is due to how PROMOTE_MODE is defined in the arm back-end. In the test-case, a function (which has signed char return type) returns -1 in one of the paths. ARM PROMOTE_MODE changes that to 255 and relies on the zero/sign extension generated by RTL again for the correct value. I saw some other targets also defining a similar thing. I am therefore skipping removing zero/sign extension if the ssa variable can be set to negative integer constants. Hm? I think you should rather check that you are removing a sign-/zero-extension - PROMOTE_MODE tells you if it will sign- or zero-extend. Definitely + /* In some architectures, negative integer constants are truncated and + sign changed with target defined PROMOTE_MODE macro. This will impact + the value range seen here and produce wrong code if zero/sign extensions + are eliminated. Therefore, return false if this SSA can have negative + integers. */ + if (is_gimple_assign (stmt) + && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_unary)) +{ + tree rhs1 = gimple_assign_rhs1 (stmt); + if (TREE_CODE (rhs1) == INTEGER_CST + && !TYPE_UNSIGNED (TREE_TYPE (ssa)) + && tree_int_cst_compare (rhs1, integer_zero_node) == -1) + return false; looks completely bogus ... (a unary op with a constant operand?) instead you want to do sth like I see that a unary op with a constant operand is not possible in gimple. What I wanted to check here is any sort of constant loads; but it seems that will not happen in gimple. Are PHI statements the only possible statements where we will end up with such constants? mode = TYPE_MODE (TREE_TYPE (ssa)); rhs_uns = TYPE_UNSIGNED (TREE_TYPE (ssa)); PROMOTE_MODE (mode, rhs_uns, TREE_TYPE (ssa)); instead of initializing rhs_uns from ssa's type. That is, if PROMOTE_MODE tells you to promote _not_ according to ssa's sign then honor that. 
This is triggered in pr43017.c in function foo for arm-none-linux-gnueabi, where the gimple statement that causes this looks like: # _3 = PHI <_17(7), -1(2)> <bb 43>: return _3; ARM PROMOTE_MODE changes the sign for integer constants only, and hence looking at the variable with PROMOTE_MODE does not change the sign in this case. #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE) \ if (GET_MODE_CLASS (MODE) == MODE_INT \ && GET_MODE_SIZE (MODE) < 4) \ { \ if (MODE == QImode) \ UNSIGNEDP = 1; \ else if (MODE == HImode) \ UNSIGNEDP = 1; \ (MODE) = SImode; \ } As for the -fno-strict-overflow case, if the variable overflows, I see +INF(OVF) in the VRP dumps, but the value range stored in the ssa has TYPE_MAX. We therefore should limit the comparison to (TYPE_MIN < VR_MIN && VR_MAX < TYPE_MAX) instead of (TYPE_MIN <= VR_MIN && VR_MAX <= TYPE_MAX) when checking to be sure that this is not the overflowing case. The attached patch changes this. I don't think that's necessary - the overflow cases happen only when that overflow has undefined behavior, thus any valid program will have values <= MAX. I see that you have now removed +INF(OVF). I will change it this way. Thanks again, Kugan
Re: Use separate sections to stream non-trivial constructors
On Fri, 11 Jul 2014, Jan Hubicka wrote: Hi, since we both agreed that offlining constructors from the global decl stream is a good idea, I went ahead and implemented it. I would like to follow up with some cleanups; for example the sections are still tagged as function sections, but I would like to do it incrementally. There is quite some ugliness in the way we handle function sections and the patch started to snowball very quickly. The patch conceptually copies what we do for functions and re-uses most of the infrastructure. varpool_get_constructor is to variables what cgraph_get_body is to functions (i.e. the means of getting the body in) and it is used by the output machinery, by ipa-visibility while rewriting the constructor, and by ctor_for_folding (which makes us load the ctor whenever it is needed by ipa-cp or ipa-devirt). I kept get_symbol_initial_value as the authority that decides whether we want to encode a given constructor or not. The section itself for a trivial ctor is about 25 bytes, and with the header it is probably close to double that. Currently the heuristic is to offline only constructors that are CONSTRUCTOR and keep simple expressions inline. We may want to tweak it. Hmm, so what about an artificial testcase with gazillions of struct X { int i; }; struct X a0001 = { 1 }; struct X a0002 = { 2 }; how does it explode LTO IL size and streaming time (compile-out and LTRANS in)? I suppose it still helps the WPA stage. Well, nothing really artificial, except that gazillions of static variables called a0001 to a000gazillion are ugly :)) I just put the CONSTRUCTOR bits in the initial version to not have the path unused at all. Either we can base our decision on the size of the variable or do a simple walk to see if it needs more than, say, 8 trees. I will play with this incrementally after cleaning up the headers (as those account for the overhead). Also what we desperately miss is to put CONST_DECLs into the symbol table (and thus eventually move the constant pool to symtab). 
That and no longer allowing STRING_CSTs in the IL but only CONST_DECLs with STRING_CST initializers (to fix PR50199). Yep, I have a patch for putting CONST_DECLs into the symbol table. It however does not help partitionability, because at the moment the output machinery does not expect const decls to have visibilities. I will push out that change (and LABEL_DECL, too) after Martin's renaming patches land on mainline. The patch does not bring miraculous savings to firefox WPA, but it does some: GGC memory after global stream is read goes from 1376898k to 1250533k overall GGC allocations from 4156478 kB to 4012462 kB read 11006599 SCCs of average size 1.907692 - read 9119433 SCCs of average size 2.037867 20997206 tree bodies read in total - 18584194 tree bodies read in total Size of mmap'd section decls: 299540188 bytes - Size of mmap'd section decls: 271557265 bytes Size of mmap'd section function_body: 5711078 bytes - Size of mmap'd section function_body: 7548680 bytes Things would be better if ipa-visibility and ipa-devirt did not load most of the virtual tables into memory (still better than loading each into memory 20 times on average). I will work on that incrementally. We load 10311 ctors into memory at WPA time. Note that firefox seems to feature a really huge data segment these days. http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap in progress, OK? The patch looks ok to me. How about simply doing s/LTO_section_function_body/LTO_section_symbol_content/ instead of adding LTO_section_variable_initializer? Yeah, I was thinking about it, too. I think variable and constructor sections may differ in their headers, however, since we do not need the CFG stream for variables. Thanks! Honza Thanks, Richard.
Re: Optimize type streaming
Hmm, walking everything first and then starting streaming sounds like a bad idea for locality. Can't I just re-do the walk since I know what the SCC is? I will read the code more after lunch. Might be possible with a 2nd SCC stack set up, yes. I set up a hash_map to store the hash values anyway. What about sorting using walk_tree with a simple callback pushing the parameter on the stack if it is in the hash map and removing it from the hash map? That way I do not need to duplicate the tricky SCC walking and it probably won't be more terrible in overhead than qsort. Seems like a good idea at least for a prototype, will try it out. honza Richard. Honza Richard. Honza Thanks, Richard. Honza * lto-streamer-out.c (hash_tree): Add map argument; remove special purpose hack for pointer types. (visit): Update it. (hash_scc): Add iteration until equivalence classes stabilize. Index: lto-streamer-out.c === --- lto-streamer-out.c (revision 212426) +++ lto-streamer-out.c (working copy) @@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block /* Return a hash value for the tree T. */ static hashval_t -hash_tree (struct streamer_tree_cache_d *cache, tree t) +hash_tree (struct streamer_tree_cache_d *cache, hash_map<tree, hashval_t> *map, tree t) { #define visit(SIBLING) \ do { \ unsigned ix; \ -if (SIBLING && streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ +if (!SIBLING) \ + v = iterative_hash_hashval_t (0, v); \ +else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ v = iterative_hash_hashval_t (streamer_tree_cache_get_hash (cache, ix), v); \ +else if (map) \ + v = iterative_hash_hashval_t (*map->get (SIBLING), v); \ +else \ + v = iterative_hash_hashval_t (1, v); \ } while (0) /* Hash TS_BASE. */ @@ -896,23 +924,7 @@ hash_tree (struct streamer_tree_cache_d if (CODE_CONTAINS_STRUCT (code, TS_TYPED)) { - if (POINTER_TYPE_P (t)) - { - /* For pointers factor in the pointed-to type recursively as -we cannot recurse through only pointers. -??? We can generalize this by keeping track of the -in-SCC edges for each tree (or arbitrarily the first -such edge) and hashing that in in a second stage -(instead of the quadratic mixing of the SCC we do now). */ - hashval_t x; - unsigned ix; - if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), &ix)) - x = streamer_tree_cache_get_hash (cache, ix); - else - x = hash_tree (cache, TREE_TYPE (t)); - v = iterative_hash_hashval_t (x, v); - } - else if (code != IDENTIFIER_NODE) + if (code != IDENTIFIER_NODE) visit (TREE_TYPE (t)); } @@ -1122,28 +1156,78 @@ scc_entry_compare (const void *p1_, cons static hashval_t hash_scc (struct streamer_tree_cache_d *cache, unsigned first, unsigned size) { + unsigned int last_classes = 0, iterations = 0; + /* Compute hash values for the SCC members. */ for (unsigned i = 0; i < size; ++i) -sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t); +sccstack[first+i].hash = hash_tree (cache, NULL, sccstack[first+i].t); if (size == 1) return sccstack[first].hash; - /* Sort the SCC of <type, hash> pairs so that when we mix in - all members of the SCC the hash value becomes independent on - the order we visited the SCC. Produce hash of the whole SCC as - combination of hashes of individual elements. Then combine that hash into - hash of each element, so othewise identically looking elements from two - different SCCs are distinguished. */ - qsort (&sccstack[first], size, sizeof (scc_entry), scc_entry_compare); - - hashval_t scc_hash = sccstack[first].hash; - for (unsigned i = 1; i < size; ++i) -scc_hash = iterative_hash_hashval_t (scc_hash, -sccstack[first+i].hash); - for (unsigned i = 0; i < size; ++i) -sccstack[first+i].hash = iterative_hash_hashval_t (sccstack[first+i].hash, scc_hash); - return scc_hash; + /* We may want to iterate multiple times across the SCC, propagating across the internal + edges, in order to get different hash values for individual trees. */ + do +{ + /* Sort the SCC of <type, hash> pairs
Re: Use separate sections to stream non-trivial constructors
On Fri, 11 Jul 2014, Jan Hubicka wrote: On Fri, 11 Jul 2014, Jan Hubicka wrote: Hi, since we both agreed offlining constructors from global decl stream is a good idea, I went ahead and implemented it. I would like to followup by an cleanups; for example the sections are still tagged as function sections, but I would like to do it incrementally. There is quite some uglyness in the way we handle function sections and the patch started to snowball very quickly. The patch conceptually copies what we do for functions and re-uses most of infrastructure. varpool_get_constructor is cgraph_get_body (i.e. mean of getting function in) and it is used by output machinery, by ipa-visibility while rewritting the constructor and by ctor_for_folding (which makes us to load the ctor whenever it is needed by ipa-cp or ipa-devirt). I kept get_symbol_initial_value as an authority to decide if we want to encode given constructor or not. The section itself for trivial ctor is about 25 bytes and with header it is probably close to double of it. Currently the heuristic is to offline only constructors that are CONSTRUCTOR and keep simple expressions inline. We may want to tweak it. Hmm, so what about artificial testcase with gazillions of struct X { int i; }; struct X a0001 = { 1 }; struct X a0002 = { 2 }; how does it explode LTO IL size and streaming time (compile-out and LTRANS in)? I suppose it still helps WPA stage. Well, nothing really artificial, except that gazzilions of static variables called a0001 to a000gazzilion are ugly :)) I just put the CONSRUCTOR bits in the initial varsion to not have the path unused at all. Either we can base our decision on size of the variable or do simple walk to see if it needs more than, say 8 trees. Hum, probably not worth special-casing. 
I will play with this incrementally after cleaning up the headers (as those accounts for the overhead) Also what we desparately miss is to put CONST_DECLs into the symbol table (and thus eventually move the constant pool to symtab). That and no longer allowing STRING_CSTs in the IL but only CONST_DECLs with STRING_CST initializers (to fix PR50199). Yep, I have patch for putting CONST_DECLs into symbol table. It however does not help partitionability because at the moment output machinery do not expect const decls to have visibilities. Well, just make them regular (anonymous) VAR_DECLs then ... (the fact that a CONST_DECL is anonymous is probably the only real difference - and that they are mergeable by content). I will push out that change (and LABEL_DECL, too) after Martin's renaming patches lands to mainline. Thanks. The patch does not bring miraculous savings to firefox WPA, but it does some: GGC memory after global stream is read goes from 1376898k to 1250533k overall GGC allocations from 4156478 kB to 4012462 kB read 11006599 SCCs of average size 1.907692 - read 9119433 SCCs of average size 2.037867 20997206 tree bodies read in total - 18584194 tree bodies read in total Size of mmap'd section decls: 299540188 bytes - Size of mmap'd section decls: 271557265 bytes Size of mmap'd section function_body: 5711078 bytes - Size of mmap'd section function_body: 7548680 bytes Things would be better if ipa-visibility and ipa-devirt did not load most of the virtual tables into memory (still better than loading each into memory 20 times at average). I will work on that incrementally. We load 10311 ctors into memory at WPA time. Note that firefox seems to feature really huge data segment these days. http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap in progress, OK? The patch looks ok to me. 
How about simply doing s/LTO_section_function_body/LTO_section_symbol_content/ instead of adding LTO_section_variable_initializer? Yeah, I was thinking about it, too. I think variable and constructor sections may differ in its header however, since we do not need CFG stream for variables. Thanks! Honza Thanks, Richard. -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: Optimize type streaming
On Fri, 11 Jul 2014, Jan Hubicka wrote: Hmm, walking everything first and then starting streaming sounds like a bad idea for locality. Can't I just re-do the walk since I know what the SCC is? I will read the code more after lunch. Might be possible with a 2nd SCC stack set up, yes. I set up a hash_map to store the hash values anyway. What about sorting using walk_tree with a simple callback pushing the parameter on the stack if it is in the hash map and removing it from the hash map? That way I do not need to duplicate the tricky SCC walking and it probably won't be more terrible in overhead than qsort. Seems like a good idea at least for a prototype, will try it out. Awww, but you get edges walked that are not walked in the DFS walk ... (basically you can end up walking the whole program IL if you are unlucky). Richard. [...]
Re: Optimize type streaming
On Fri, 11 Jul 2014, Jan Hubicka wrote: Hmm, walking everything first and then starting streaming sounds bad idea for locality. Can't I just re-do the walk since I know what the SCC is? I will read the code more after lunch. Might be possible with a 2nd SCC stack set up, yes. I set up hash_map to store the hash values anyway. What about sorting using walk_tree with simple callback pushing the parameter on stack if it is in the hash map and removing it from hash map? That way I do not need to duplicate the tricky SCC walking and probably won't be more terrible in overhead than qsort. Seems like a good idea at least for a prototype, will try it out. Awww, but you get edges walked that are not walked in the DFS walk ... (basically you can end up walking the whole program IL if you are unlucky). Why? I just terminate the walk on every node that is not withing the SCC region. Yes, I will unnecesarily walk all edges out of my SCC region, but I think we can live with that (SCC walk would visit them anyway, too) Honza RIchard. honza Richard. Honza Richard. Honza Thanks, Richard. Honza * lto-streamer-out.c (hash_tree): Add map argument; remove special purpose hack for pointer types. (visit): Update it. (hash_scc): Add iteration until equivalence classes stabilize. Index: lto-streamer-out.c === --- lto-streamer-out.c (revision 212426) +++ lto-streamer-out.c (working copy) @@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block /* Return a hash value for the tree T. 
*/ static hashval_t -hash_tree (struct streamer_tree_cache_d *cache, tree t) +hash_tree (struct streamer_tree_cache_d *cache, hash_map<tree, hashval_t> *map, tree t) { #define visit(SIBLING) \ do { \ unsigned ix; \ -if (SIBLING && streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ +if (!SIBLING) \ + v = iterative_hash_hashval_t (0, v); \ +else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ v = iterative_hash_hashval_t (streamer_tree_cache_get_hash (cache, ix), v); \ +else if (map) \ + v = iterative_hash_hashval_t (*map->get (SIBLING), v); \ +else \ + v = iterative_hash_hashval_t (1, v); \ } while (0) /* Hash TS_BASE. */ @@ -896,23 +924,7 @@ hash_tree (struct streamer_tree_cache_d if (CODE_CONTAINS_STRUCT (code, TS_TYPED)) { - if (POINTER_TYPE_P (t)) - { - /* For pointers factor in the pointed-to type recursively as -we cannot recurse through only pointers. -??? We can generalize this by keeping track of the -in-SCC edges for each tree (or arbitrarily the first -such edge) and hashing that in in a second stage -(instead of the quadratic mixing of the SCC we do now). */ - hashval_t x; - unsigned ix; - if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), &ix)) - x = streamer_tree_cache_get_hash (cache, ix); - else - x = hash_tree (cache, TREE_TYPE (t)); - v = iterative_hash_hashval_t (x, v); - } - else if (code != IDENTIFIER_NODE) + if (code != IDENTIFIER_NODE) visit (TREE_TYPE (t)); } @@ -1122,28 +1156,78 @@ scc_entry_compare (const void *p1_, cons static hashval_t hash_scc (struct streamer_tree_cache_d *cache, unsigned first, unsigned size) { + unsigned int last_classes = 0, iterations = 0; + /* Compute hash values for the SCC members.
*/ for (unsigned i = 0; i < size; ++i) -sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t); +sccstack[first+i].hash = hash_tree (cache, NULL, sccstack[first+i].t); if (size == 1) return sccstack[first].hash; - /* Sort the SCC of type, hash pairs so that when we mix in - all members of the SCC the hash value becomes independent on - the order we visited the SCC. Produce hash of the whole SCC as - combination of hashes of individual elements. Then combine that hash into - hash of each element, so othewise identically looking elements from
Re: update address taken: don't drop clobbers
Hi, On Thu, 10 Jul 2014, Jeff Law wrote: The insight to note is, that undefined SSA names should really be coalesced with something (otherwise you lost an optimization opportunity), but it doesn't matter with _what_ each use of the undefined name is coalesced, you can even identify different uses of them with different SSA names (e.g. the LHS of each using stmt). Requires some change in the order things are done in out-of-ssa. The last part is what I hinted might be problematical. If some undefined SSA_NAME appears on the RHS of two PHIs and we want to coalesce that undefined SSA_NAME with the LHS of each of those PHIs, then the LHS of those two PHIs must coalesce as well. At least that's my recollection of how all that stuff worked. Only with the usual definition of coalescing (being transitive). For undefined SSA names the definition can be mended. It was that realization that made me wonder if we should have a unique SSA_NAME at each undefined use point. It's easier to implicitly regard every individual use of an undefined SSA name as a unique name in coalescing, I think (instead of having it be a unique name explicitly). That is, given: bb1: x_1 = PHI <a_2, b_3(UND)> ... bb2: x_4 = PHI <y_5, b_3(UND)> ... There is no reason not to regard the two uses of b_3 as separate and identify the first with x_1 and the second with x_4, _without_ coalescing x_1 and x_4. But yes, this doesn't fit readily into the normal coalescing merge-find framework, but rather would have to be something after coalescing when writing out-of-ssa (whenever an undefined use is rewritten just take a random other fitting variable). Ciao, Michael.
Re: Optimize type streaming
On Fri, 11 Jul 2014, Jan Hubicka wrote: On Fri, 11 Jul 2014, Jan Hubicka wrote: Hmm, walking everything first and then starting streaming sounds like a bad idea for locality. Can't I just re-do the walk since I know what the SCC is? I will read the code more after lunch. Might be possible with a 2nd SCC stack set up, yes. I set up a hash_map to store the hash values anyway. What about sorting using walk_tree with a simple callback pushing the parameter on the stack if it is in the hash map and removing it from the hash map? That way I do not need to duplicate the tricky SCC walking and it probably won't be more terrible in overhead than qsort. Seems like a good idea at least for a prototype, will try it out. Awww, but you get edges walked that are not walked in the DFS walk ... (basically you can end up walking the whole program IL if you are unlucky). Why? I just terminate the walk on every node that is not within the SCC region. Yes, I will unnecessarily walk all edges out of my SCC region, but I think we can live with that (the SCC walk would visit them anyway, too) Ah, ok. Well, let's hope walk_tree walks all edges the DFS walk walks ;) A quick look tells me it doesn't walk DECL_VINDEX or DECL_FUNCTION_PERSONALITY for example. Richard. * lto-streamer-out.c (hash_tree): Add map argument; remove special purpose hack for pointer types. (visit): Update it. (hash_scc): Add iteration until equivalence classes stabilize. Index: lto-streamer-out.c === --- lto-streamer-out.c (revision 212426) +++ lto-streamer-out.c (working copy) @@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block /* Return a hash value for the tree T.
*/ static hashval_t -hash_tree (struct streamer_tree_cache_d *cache, tree t) +hash_tree (struct streamer_tree_cache_d *cache, hash_map<tree, hashval_t> *map, tree t) { #define visit(SIBLING) \ do { \ unsigned ix; \ -if (SIBLING && streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ +if (!SIBLING) \ + v = iterative_hash_hashval_t (0, v); \ +else if (streamer_tree_cache_lookup (cache, SIBLING, &ix)) \ v = iterative_hash_hashval_t (streamer_tree_cache_get_hash (cache, ix), v); \ +else if (map) \ + v = iterative_hash_hashval_t (*map->get (SIBLING), v); \ +else \ + v = iterative_hash_hashval_t (1, v); \ } while (0) /* Hash TS_BASE. */ @@ -896,23 +924,7 @@ hash_tree (struct streamer_tree_cache_d if (CODE_CONTAINS_STRUCT (code, TS_TYPED)) { - if (POINTER_TYPE_P (t)) - { - /* For pointers factor in the pointed-to type recursively as - we cannot recurse through only pointers. - ??? We can generalize this by keeping track of the - in-SCC edges for each tree (or arbitrarily the first - such edge) and hashing that in in a second stage - (instead of the quadratic mixing of the SCC we do now). */ - hashval_t x; - unsigned ix; - if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), &ix)) - x = streamer_tree_cache_get_hash (cache, ix); - else - x = hash_tree (cache, TREE_TYPE (t)); - v = iterative_hash_hashval_t (x, v); - } - else if (code != IDENTIFIER_NODE) + if (code != IDENTIFIER_NODE) visit (TREE_TYPE (t)); } @@ -1122,28 +1156,78 @@ scc_entry_compare (const void *p1_, cons static hashval_t hash_scc (struct streamer_tree_cache_d *cache, unsigned first, unsigned size) { + unsigned int last_classes = 0, iterations = 0; + /* Compute hash values for the SCC members. */ for (unsigned i = 0; i < size; ++i) -sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t); +sccstack[first+i].hash = hash_tree (cache, NULL, sccstack[first+i].t); if (size == 1) return sccstack[first].hash; - /* Sort the SCC of type, hash pairs so that when we mix in -
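The "iterate until equivalence classes stabilize" idea in hash_scc can be sketched outside GCC. The toy below (invented types and mixing function, not GCC's) refines per-node hashes over a small cyclic graph by mixing in each node's successors' hashes, and stops once the number of distinct hash values stops growing:

```c
/* Sketch of iterated hash refinement over an SCC.  All names, the
   graph shape and the mixing function are illustrative assumptions,
   not GCC internals.  */
#include <assert.h>
#include <stddef.h>

#define N 3  /* size of the toy SCC (a 3-cycle) */

static unsigned mix (unsigned a, unsigned b) { return a * 1000003u ^ b; }

/* succ[i][0] is the single in-SCC successor of node i.  */
static const int succ[N][1] = { {1}, {2}, {0} };

/* One refinement step: each node's new hash mixes in its successors'
   current hashes, like re-running hash_tree with the map argument.  */
static void refine (unsigned h[N])
{
  unsigned next[N];
  for (int i = 0; i < N; i++)
    next[i] = mix (h[i], h[succ[i][0]]);
  for (int i = 0; i < N; i++)
    h[i] = next[i];
}

/* Number of equivalence classes = number of distinct hash values.  */
static int classes (const unsigned h[N])
{
  int c = 0;
  for (int i = 0; i < N; i++)
    {
      int dup = 0;
      for (int j = 0; j < i; j++)
        if (h[j] == h[i])
          dup = 1;
      c += !dup;
    }
  return c;
}

/* Iterate until the class count stops growing (with an upper bound,
   as in the patch's iterations counter).  */
static int stable_classes (unsigned h[N])
{
  int last = classes (h);
  for (int iter = 0; iter < 16; iter++)
    {
      refine (h);
      int cur = classes (h);
      if (cur == last)
        break;
      last = cur;
    }
  return last;
}
```

Nodes that are genuinely symmetric (here: a cycle with identical local hashes) correctly stay in one class, while distinct local hashes stay separated.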
Re: Use separate sections to stream non-trivial constructors
Well, just make them regular (anonymous) VAR_DECLs then ... (the fact that a CONST_DECL is anonymous is probably the only real difference - and that they are mergeable by content). Something like that, perhaps. Plan to do that incrementally - having them in the symbol table first is an important step. There is also an option to update a CONST_DECL into a VAR_DECL when it is being turned into a hidden one. Currently things are a bit inflexible because we still make a difference between const decl and var decl in the tree representation. Once I finish my transition to push out the DECL_WITH_VIS and DECL_NON_COMMON fields, we can turn CONST_DECL into VAR_DECL with a special flag saying that the symbol name/address value doesn't matter. This and, of course, cleaning up the constpool mess can keep me occupied for months ;) Honza
Re: Optimize type streaming
Ah, ok. Well, let's hope walk_tree walks all edges the DFS walk walks ;) A quick look tells me it doesn't walk DECL_VINDEX or DECL_FUNCTION_PERSONALITY for example. I guess it should - will try a patch adding that :) Both seem like just omissions - these fields were added quite recently (moved from elsewhere) and probably walk_tree was not updated and no current user cares. DECL_VINDEX is a strange animal, since it always points to an INTEGER_CST in well formed IL. I wanted to turn it into an integer, but of course both C++ and Java temporarily set it to a TREE_LIST. We probably could stream it as an integer, saving the walk. Those two should not be in-SCC edges. I can easily match the two stacks for size and see if I find something catastrophic. Honza
Fix PR61772: ifcvt removing asm volatile gotos
Hi, our kernel guys saw this problem when they used an asm goto to implement a conditional where the then-block was empty. RTL ifcvt happily goes on and removes the whole block and the jump instruction, even though it has side-effects (as marked with the volatility). Patch below fixes it by checking that it's really just a jump. Regstrapped on trunk. Okay for trunk, 4.9, 4.8? Ciao, Michael. PR rtl-optimization/61772 * ifcvt.c (dead_or_predicable): Check jump to be free of side effects. testsuite/ PR rtl-optimization/61772 * gcc.dg/torture/pr61772.c: New test. Index: ifcvt.c === --- ifcvt.c (revision 212430) +++ ifcvt.c (working copy) @@ -4138,6 +4138,8 @@ dead_or_predicable (basic_block test_bb, if (JUMP_P (end)) { + if (!onlyjump_p (end)) + return FALSE; if (head == end) { head = end = NULL_RTX; Index: testsuite/gcc.dg/torture/pr61772.c === --- testsuite/gcc.dg/torture/pr61772.c (revision 0) +++ testsuite/gcc.dg/torture/pr61772.c (working copy) @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-final { scan-assembler-times "XXX" 2 } } */ + +static inline __attribute__((always_inline)) int dec_and_test (int *i) +{ +asm volatile goto ("XXX %0, %l[cc_label]" + : : "m" (*i) : "memory" : cc_label); +return 0; +cc_label: +return 1; +} +extern int getit (int *); +int f (int *i, int cond) +{ + if (cond) { + getit (0); + if (dec_and_test (i)) + getit (i); + return 42; + } + if (dec_and_test (i)) +(void)1; + return getit (i); +}
Re: [PATCH][sched-deps] Generalise usage of macro fusion to work on any two insns
On 10/07/14 22:53, Maxim Kuvyrkov wrote: On Jul 10, 2014, at 8:00 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: On 30/06/14 21:39, Jeff Law wrote: On 06/27/14 02:29, Kyrill Tkachov wrote: Hi all, This patch generalises the TARGET_MACRO_FUSION_PAIR_P hook usage to work on more than just compares and conditional branches for which it was initially designed for (for x86). There are some instructions in arm and aarch64 that can be fused together when they're back to back in the instruction stream and I'd like to use this hook to keep them together. I'll post an implementation of TARGET_MACRO_FUSION_PAIR_P for arm and aarch64 shortly... Bootstrapped and tested on x86, aarch64-none-linux-gnu and arm-none-linux-gnueabihf. Ok for trunk? The patch looks good to me, but some cleanup suggestions below. Thanks, here's an updated patch. How's this? 2014-07-11 Ramana Radhakrishnan ramana.radhakrish...@arm.com Kyrylo Tkachov kyrylo.tkac...@arm.com * sched-deps.c (try_group_insn): Generalise macro fusion hook usage to any two insns. Update comment. Rename to sched_macro_fuse_insns. (sched_analyze_insn): Update use of try_group_insn to sched_macro_fuse_insns. * config/i386/i386.c (ix86_macro_fusion_pair_p): Reject 2nd arguments that are not conditional jumps. 
commit e36b8977738dbe3f63445199710ca627ab37e243 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Fri Jun 13 11:41:41 2014 +0100 [sched-deps] Generalise macro fusion hook usage diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 8046c67..7dd2ce5 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -25817,6 +25817,9 @@ ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp) rtx compare_set = NULL_RTX, test_if, cond; rtx alu_set = NULL_RTX, addr = NULL_RTX; + if (!any_condjump_p (condjmp)) +return false; + if (get_attr_type (condgen) != TYPE_TEST && get_attr_type (condgen) != TYPE_ICMP && get_attr_type (condgen) != TYPE_INCDEC diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c index 7cafc8b..c01a8a6 100644 --- a/gcc/sched-deps.c +++ b/gcc/sched-deps.c @@ -2820,35 +2820,48 @@ sched_analyze_2 (struct deps_desc *deps, rtx x, rtx insn) sched_deps_info->finish_rhs (); } -/* Try to group comparison and the following conditional jump INSN if - they're already adjacent. This is to prevent scheduler from scheduling - them apart. */ +/* Try to group two fuseable insns together to prevent scheduler + from scheduling them apart. */ static void try_group_insn (rtx insn) Please rename try_group_insn to sched_macro_fuse_insns. The call to try_group_insn is predicated on targetm.sched.macro_fusion_p, so this code will not be used for any other kinds of fusion -- might as well just state that in the name. { - unsigned int condreg1, condreg2; - rtx cc_reg_1; rtx prev; - if (!any_condjump_p (insn)) + if (!targetm.sched.macro_fusion_p ()) return; This is a no-op since there is a check on the upper level. Please remove.
- targetm.fixed_condition_code_regs (&condreg1, &condreg2); - cc_reg_1 = gen_rtx_REG (CCmode, condreg1); - prev = prev_nonnote_nondebug_insn (insn); - if (!reg_referenced_p (cc_reg_1, PATTERN (insn)) - || !prev - || !modified_in_p (cc_reg_1, prev)) -return; + if (any_condjump_p (insn)) +{ + unsigned int condreg1, condreg2; + rtx cc_reg_1; + targetm.fixed_condition_code_regs (&condreg1, &condreg2); + cc_reg_1 = gen_rtx_REG (CCmode, condreg1); + prev = prev_nonnote_nondebug_insn (insn); + if (!reg_referenced_p (cc_reg_1, PATTERN (insn)) + || !prev + || !modified_in_p (cc_reg_1, prev)) + return; - /* Different microarchitectures support macro fusions for different - combinations of insn pairs. */ - if (!targetm.sched.macro_fusion_pair_p - || !targetm.sched.macro_fusion_pair_p (prev, insn)) -return; + if (targetm.sched.macro_fusion_pair_p (prev, insn)) +SCHED_GROUP_P (insn) = 1; +} + else +{ + rtx insn_set = single_set (insn); + + prev = prev_nonnote_nondebug_insn (insn); + if (prev + && insn_set + && single_set (prev) + && modified_in_p (SET_DEST (insn_set), prev) Invert the check (as done in the upper if-clause) and cut it here. Then you can use a single unified if (targetm.sched.macro_fusion_pair_p (prev, insn)) SCHED_GROUP_P (insn) = 1; as the final statement of the function. Thank you, -- Maxim Kuvyrkov www.linaro.org commit cb0584229d9247df805df35dc4c5bffbb839d59f Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Fri Jun 13 11:41:41 2014 +0100 [sched-deps] Generalise macro fusion hook usage diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 1b5cbeb..6951ddd 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -25820,6 +25820,9 @@ ix86_macro_fusion_pair_p (rtx condgen,
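The grouping mechanism itself is simple to model: scan adjacent instruction pairs and set a keep-together flag on the second one whenever a target hook approves the pair. The toy below uses invented insn and hook types to mirror SCHED_GROUP_P marking - it is a sketch of the idea, not GCC's scheduler:

```c
/* Toy model of SCHED_GROUP_P-style macro-fusion grouping.
   The insn representation and the hook are invented for illustration. */
#include <assert.h>
#include <stddef.h>

enum kind { CMP, JCC, ADD, LOAD };

struct insn {
  enum kind k;
  int sched_group_p;  /* 1: keep glued to the previous insn */
};

/* Stand-in for targetm.sched.macro_fusion_pair_p.  */
static int fusion_pair_p (const struct insn *prev, const struct insn *in)
{
  return prev->k == CMP && in->k == JCC;  /* e.g. cmp+jcc fuse on x86 */
}

/* Walk the stream; mark the second insn of each fuseable adjacent
   pair so the scheduler never separates the two.  */
static void mark_fusable_pairs (struct insn *insns, size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (fusion_pair_p (&insns[i - 1], &insns[i]))
      insns[i].sched_group_p = 1;
}
```

The generalisation in the patch is exactly that the pair predicate is no longer limited to compare/conditional-jump pairs.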
Re: [PATCH 2/2] Enable elimination of zext/sext
On Fri, Jul 11, 2014 at 1:52 PM, Kugan kugan.vivekanandara...@linaro.org wrote: Thanks for the review and suggestions. On 10/07/14 22:15, Richard Biener wrote: On Mon, Jul 7, 2014 at 8:55 AM, Kugan kugan.vivekanandara...@linaro.org wrote: [...] For -fwrapv, it is due to how PROMOTE_MODE is defined in the arm back-end. In the test-case, a function (which has signed char return type) returns -1 in one of the paths. ARM PROMOTE_MODE changes that to 255 and relies on zero/sign extension generated by RTL again for the correct value. I saw some other targets also defining a similar thing. I am therefore skipping removing zero/sign extension if the ssa variable can be set to negative integer constants. Hm? I think you should rather check that you are removing a sign-/zero-extension - PROMOTE_MODE tells you if it will sign- or zero-extend. Definitely + /* In some architectures, negative integer constants are truncated and + sign changed with target defined PROMOTE_MODE macro. This will impact + the value range seen here and produce wrong code if zero/sign extensions + are eliminated. Therefore, return false if this SSA can have negative + integers. */ + if (is_gimple_assign (stmt) + && (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_unary)) +{ + tree rhs1 = gimple_assign_rhs1 (stmt); + if (TREE_CODE (rhs1) == INTEGER_CST + && !TYPE_UNSIGNED (TREE_TYPE (ssa)) + && tree_int_cst_compare (rhs1, integer_zero_node) == -1) + return false; looks completely bogus ... (a unary op with a constant operand?) instead you want to do sth like I see that a unary op with a constant operand is not possible in gimple. What I wanted to check here is any sort of constant loads; but it seems that will not happen in gimple. Are PHI statements the only possible statements where we will end up with such constants? No, in theory you can have ssa_1 = -1; but that's not unary but a GIMPLE_SINGLE_RHS and thus gimple_assign_rhs_code (stmt) == INTEGER_CST.
mode = TYPE_MODE (TREE_TYPE (ssa)); rhs_uns = TYPE_UNSIGNED (TREE_TYPE (ssa)); PROMOTE_MODE (mode, rhs_uns, TREE_TYPE (ssa)); instead of initializing rhs_uns from ssa's type. That is, if PROMOTE_MODE tells you to promote _not_ according to ssa's sign then honor that. This is triggered in pr43017.c in function foo for arm-none-linux-gnueabi, where the gimple statement that causes this looks like: . # _3 = PHI <_17(7), -1(2)> <bb 43>: return _3; ARM PROMOTE_MODE changes the sign for integer constants only and hence looking at the variable with PROMOTE_MODE is not changing the sign in this case. #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE) \ if (GET_MODE_CLASS (MODE) == MODE_INT \ && GET_MODE_SIZE (MODE) < 4) \ { \ if (MODE == QImode) \ UNSIGNEDP = 1; \ else if (MODE == HImode) \ UNSIGNEDP = 1; \ (MODE) = SImode; \ } Where does it only apply for constants? It applies to all QImode and HImode entities. As for the -fno-strict-overflow case, if the variable overflows, in VRP dumps, I see +INF(OVF), but the value range stored in the ssa has TYPE_MAX. We therefore should limit the comparison to (TYPE_MIN < VR_MIN && VR_MAX < TYPE_MAX) instead of (TYPE_MIN <= VR_MIN && VR_MAX <= TYPE_MAX) when checking to be sure that this is not the overflowing case. The attached patch changes this. I don't think that's necessary - the overflow cases happen only when that overflow has undefined behavior, thus any valid program will have values <= MAX. I see that you have now removed +INF(OVF). I will change it this way. I have not removed anything, I just fixed a bug. Richard. Thanks again, Kugan
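A rough model of the range test under discussion, with the strict bounds that exclude a clipped +INF(OVF) range. Plain integers stand in for GCC's trees and value ranges, and the helper name is invented:

```c
/* Sketch of the value-range check behind zext/sext elimination: an
   extension of a narrow value is redundant when its recorded range
   already fits the narrow type.  Using strict inequalities means a
   range clipped to TYPE_MIN/TYPE_MAX (the +INF(OVF) case) is never
   treated as known-in-range, matching the conservative proposal.  */
#include <assert.h>

struct range { long long min, max; };

static int extension_redundant_p (struct range vr,
                                  long long type_min, long long type_max)
{
  return type_min < vr.min && vr.max < type_max;
}
```

Richard's point in the thread is that the strictness is not actually needed when the overflow is undefined behavior; the sketch keeps the stricter variant from the patch under review.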
Change an assignment to an assert in varpool
I noticed that we set node->definition = true in varpool_assemble_decl. The surrounding code suggests that we should only ever get there if definition is already true, so I changed it to an assert. The question is interesting for some modifications I'm making for ptx (which requires declarations for external variables in the assembly). The only thing that was tripped by the assert was a variable created by asan. AFAICT the problem there is that asan calls varpool_assemble_decl when it should be calling varpool_finalize_decl. Bootstrapped and tested on x86_64-linux, ok? Bernd * asan.c (asan_finish_file): Use varpool_finalize_decl instead of varpool_assemble_decl. * varpool.c (varpool_assemble_decl): Assert that node->definition is true. diff --git a/gcc/asan.c b/gcc/asan.c index b9a4a91..0d78634 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -2595,7 +2595,7 @@ asan_finish_file (void) TREE_CONSTANT (ctor) = 1; TREE_STATIC (ctor) = 1; DECL_INITIAL (var) = ctor; - varpool_assemble_decl (varpool_node_for_decl (var)); + varpool_finalize_decl (var); fn = builtin_decl_implicit (BUILT_IN_ASAN_REGISTER_GLOBALS); tree gcount_tree = build_int_cst (pointer_sized_int_node, gcount); diff --git a/gcc/varpool.c b/gcc/varpool.c index 79f07bf..a72fb22 100644 --- a/gcc/varpool.c +++ b/gcc/varpool.c @@ -473,7 +473,7 @@ varpool_assemble_decl (varpool_node *node) { assemble_variable (decl, 0, 1, 0); gcc_assert (TREE_ASM_WRITTEN (decl)); - node->definition = true; + gcc_assert (node->definition); assemble_aliases (node); return true; }
[BUILDROBOT] gcov patch (was: r212448 - in /trunk: gcc/ChangeLog gcc/Makefile...)
On Fri, 2014-07-11 05:48:08 -, x...@gcc.gnu.org x...@gcc.gnu.org wrote: Author: xur Date: Fri Jul 11 05:48:07 2014 New Revision: 212448 URL: https://gcc.gnu.org/viewcvs?rev=212448root=gccview=rev Log: 2014-07-10 Rong Xu x...@google.com Add gcov-tool: an offline gcda profile processing tool Support. [...] See eg. http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=289639, it breaks like this: [...] g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. -I/home/jbglaw/repos/gcc/gcc/../include -I/home/jbglaw/repos/gcc/gcc/../libcpp/include -I/home/jbglaw/repos/gcc/gcc/../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libbacktrace -o gcov-tool.o -MT gcov-tool.o -MMD -MP -MF ./.deps/gcov-tool.TPo /home/jbglaw/repos/gcc/gcc/gcov-tool.c /usr/include/sys/stat.h: In function ‘void gcov_output_files(const char*, gcov_info*)’: /usr/include/sys/stat.h:323: error: too few arguments to function ‘int mkdir(const char*, __mode_t)’ /home/jbglaw/repos/gcc/gcc/gcov-tool.c:96: error: at this point in file make[1]: *** [gcov-tool.o] Error 1 make[1]: Leaving directory `/home/jbglaw/build/rl78-elf/build-gcc/gcc' make: *** [all-gcc] Error 2 MfG, JBG -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of the second: Don't run away from your luck: it could be standing behind you!
Re: [BUILDROBOT] gcov patch
On Fri, 2014-07-11 15:03:06 +0200, Jan-Benedict Glaw jbg...@lug-owl.de wrote: See eg. http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=289639, it breaks like this: [...] g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. -I/home/jbglaw/repos/gcc/gcc/../include -I/home/jbglaw/repos/gcc/gcc/../libcpp/include -I/home/jbglaw/repos/gcc/gcc/../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libbacktrace -o gcov-tool.o -MT gcov-tool.o -MMD -MP -MF ./.deps/gcov-tool.TPo /home/jbglaw/repos/gcc/gcc/gcov-tool.c /usr/include/sys/stat.h: In function ‘void gcov_output_files(const char*, gcov_info*)’: /usr/include/sys/stat.h:323: error: too few arguments to function ‘int mkdir(const char*, __mode_t)’ /home/jbglaw/repos/gcc/gcc/gcov-tool.c:96: error: at this point in file make[1]: *** [gcov-tool.o] Error 1 make[1]: Leaving directory `/home/jbglaw/build/rl78-elf/build-gcc/gcc' make: *** [all-gcc] Error 2 [...] +#ifdef TARGET_POSIX_IO + if (mkdir (out, 0755) == -1 && errno != EEXIST) +#else + if (mkdir (out) == -1 && errno != EEXIST) +#endif And with POSIX I/O, this should probably use the S_I* macros for the mode bits? MfG, JBG -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of the second: Arrogance shortens fruitless conversations. -- Jan-Benedict Glaw
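A hedged sketch of the spelling being suggested: the POSIX S_I* mode macros instead of a raw 0755, plus a fallback for targets whose mkdir takes only one argument (mkdir_if_needed is an invented wrapper name, not the gcov-tool code):

```c
/* Portable "create directory unless it already exists" wrapper.
   S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH spells 0755 with
   the POSIX mode macros.  The _WIN32 branch models the one-argument
   mkdir case; gcov-tool keys this off TARGET_POSIX_IO instead.  */
#include <errno.h>
#include <assert.h>
#ifdef _WIN32
#include <direct.h>
#else
#include <sys/stat.h>
#include <sys/types.h>
#endif

static int mkdir_if_needed (const char *path)
{
#ifdef _WIN32
  if (_mkdir (path) == -1 && errno != EEXIST)
    return -1;
#else
  if (mkdir (path, S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH) == -1
      && errno != EEXIST)
    return -1;
#endif
  return 0;
}
```

Tolerating EEXIST keeps the call idempotent, which is what the quoted patch does with its errno check.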
Re: [PATCH, AArch64, Testsuite] Specify -fno-use-caller-save for func-ret* tests
Hi Tom, On 8 July 2014 20:45, Tom de Vries tom_devr...@mentor.com wrote: On 01-07-14 19:26, Jeff Law wrote: On 07/01/14 09:51, Yufeng Zhang wrote: Hi, This patch resolves a conflict between the aapcs64 test framework for func-ret tests and the optimization option -fuse-caller-save, which was enabled by default at -O1 or above recently. Minor detail: it's enabled by default at -O2 or above: ... { OPT_LEVELS_2_PLUS, OPT_fuse_caller_save, NULL, 1 }, ... I see. Thanks for correcting me. Basically, the test framework has an inline-assembly based mechanism in place which invokes the test facility function right on the return of a tested function. The compiler with -fuse-caller-save is unable to identify the unconventional call graph and carries out the optimization regardless. AFAIU, we're overwriting the return register to implement a function call at return in order to see the exact state of registers at return: Yes, exactly. ... __attribute__ ((noinline)) unsigned char func_return_val_0 (int i, double d, unsigned char t) { asm ("" :: "r" (i), "r" (d)); asm volatile ("mov %0, x30" : "=r" (saved_return_address)); asm volatile ("mov x30, %0" : : "r" ((unsigned long long) myfunc)); return t; } ... But we're not informing the compiler that a hidden function call takes place. This patch fixes that, and there's no need to disable fuse-caller-save. Tested with aarch64 build. OK for trunk? Thanks, - Tom 2014-07-08 Tom de Vries t...@codesourcery.com * gcc.target/aarch64/aapcs64/aapcs64.exp (additional_flags_for_func_ret): Remove. (func-ret-*.c): Use additional_flags. * gcc.target/aarch64/aapcs64/abitest-2.h (FUNC_VAL_CHECK): Add missing call_used_regs clobbers. Index: gcc/testsuite/gcc.target/aarch64/aapcs64/aapcs64.exp === --- gcc/testsuite/gcc.target/aarch64/aapcs64/aapcs64.exp (revision 212294) +++ gcc/testsuite/gcc.target/aarch64/aapcs64/aapcs64.exp (working copy) @@ -48,15 +48,11 @@ foreach src [lsort [glob -nocomplain $sr } # Test function return value.
-# Disable -fuse-caller-save to prevent the compiler from generating -# conflicting code. -set additional_flags_for_func_ret $additional_flags -append additional_flags_for_func_ret -fno-use-caller-save foreach src [lsort [glob -nocomplain $srcdir/$subdir/func-ret-*.c]] { if {[runtest_file_p $runtests $src]} { c-torture-execute [list $src \ $srcdir/$subdir/abitest.S] \ -$additional_flags_for_func_ret +$additional_flags } } Index: gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h === --- gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h (revision 212294) +++ gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h (working copy) @@ -80,10 +80,18 @@ __attribute__ ((noinline)) type FUNC_NAM The previous approach of sequentially calling myfunc right after \ this function does not guarantee myfunc see the exact register \ content, as compiler may emit code in between the two calls, \ - especially during the -O0 codegen. */ \ + especially during the -O0 codegen. \ + However, since we're doing a call, we need to clobber all call \ + used regs. */ \ asm volatile ("mov %0, x30" : "=r" (saved_return_address)); \ -asm volatile ("mov x30, %0" : : "r" ((unsigned long long) myfunc)); \ -return t; \ +asm volatile ("mov x30, %0" : : "r" ((unsigned long long) myfunc) : \ + "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7", \ + "x8", "x9", "x10", "x11", "x12", "x13", "x14", "x15", \ + "x16", "x17", "x18", \ + "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", \ + "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", \ + "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31"); \ +return t; \ } #include TESTFILE Your patch is probably OK (although I'm no longer in a position to verify the code-gen by myself easily). I still prefer to have -fuse-caller-save disabled for these tests in order to have a strictly-conformed ABI environment for these ABI tests. I'll leave the AArch64 maintainers to comment. Thanks, Yufeng
Re: [PATCH][sched-deps] Generalise usage of macro fusion to work on any two insns
On Fri, 11 Jul 2014, Kyrill Tkachov wrote: On 10/07/14 22:53, Maxim Kuvyrkov wrote: The patch looks good to me, but some cleanup suggestions below. Thanks, here's an updated patch. How's this? You need to remove 'if (targetm. ...) SCHED_GROUP_P (insn) = 1;' from the first if branch, keeping only one SCHED_GROUP_P assignment at the end of the function. Alexander
Re: Change an assignment to an assert in varpool
I noticed that we set node->definition = true in varpool_assemble_decl. The surrounding code suggests that we should only ever get there if definition is already true, so I changed it to an assert. The question is interesting for some modifications I'm making for ptx (which requires declarations for external variables in the assembly). The only thing that was tripped by the assert was a variable created by asan. AFAICT the problem there is that asan calls varpool_assemble_decl when it should be calling varpool_finalize_decl. Bootstrapped and tested on x86_64-linux, ok? Bernd * asan.c (asan_finish_file): Use varpool_finalize_decl instead of varpool_assemble_decl. * varpool.c (varpool_assemble_decl): Assert that node->definition is true. OK, we used to call assemble_decl directly before the transition to varpool was finished; seems I never got around to adding an assert. Thanks, Honza diff --git a/gcc/asan.c b/gcc/asan.c index b9a4a91..0d78634 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -2595,7 +2595,7 @@ asan_finish_file (void) TREE_CONSTANT (ctor) = 1; TREE_STATIC (ctor) = 1; DECL_INITIAL (var) = ctor; - varpool_assemble_decl (varpool_node_for_decl (var)); + varpool_finalize_decl (var); fn = builtin_decl_implicit (BUILT_IN_ASAN_REGISTER_GLOBALS); tree gcount_tree = build_int_cst (pointer_sized_int_node, gcount); diff --git a/gcc/varpool.c b/gcc/varpool.c index 79f07bf..a72fb22 100644 --- a/gcc/varpool.c +++ b/gcc/varpool.c @@ -473,7 +473,7 @@ varpool_assemble_decl (varpool_node *node) { assemble_variable (decl, 0, 1, 0); gcc_assert (TREE_ASM_WRITTEN (decl)); - node->definition = true; + gcc_assert (node->definition); assemble_aliases (node); return true; }
Re: [PATCH][sched-deps] Generalise usage of macro fusion to work on any two insns
On 11/07/14 14:20, Alexander Monakov wrote: On Fri, 11 Jul 2014, Kyrill Tkachov wrote: On 10/07/14 22:53, Maxim Kuvyrkov wrote: The patch looks good to me, but some cleanup suggestions below. Thanks, here's an updated patch. How's this? You need to remove 'if (targetm. ...) SCHED_GROUP_P (insn) = 1;' from the first if branch, keeping only one SCHED_GROUP_P assignment at the end of the function. Yes, you're right, I had missed that. Will do. Thanks, Kyrill Alexander
Re: [PATCH][RFC] Fix PR61473, inline small memcpy/memmove during tree opts
On Fri, 11 Jul 2014, Richard Biener wrote: On Thu, 10 Jul 2014, Jakub Jelinek wrote: On Thu, Jul 10, 2014 at 04:30:13PM +0200, Richard Biener wrote: Compromise hack below. It simply avoids the transform for sources that c_strlen can compute a length of. That fixes all strlenopt testcase apart from strlenopt-8.c which does memcpy (, flag ? a : b); which then still folds during gimplification. I have XFAILed that. I've tried to comb my way through strlenopt but failed to quickly add support for generic stores (it has very rough support for char stores, see also PR61773). Does the below look ok? I can look at the tree-ssa-strlen.c stuff and removing the !c_strlen hack incrementally. Ok, I'll bootstrap/test the patch then and will commit if there are no issues left. This is what I have applied. Bootstrapped and tested on x86_64-unknown-linux-gnu. Richard. 2014-07-11 Richard Biener rguent...@suse.de PR middle-end/61473 * builtins.c (fold_builtin_memory_op): Inline memory moves that can be implemented with a single load followed by a single store. (c_strlen): Only warn when only_value is not 2. * gcc.dg/memmove-4.c: New testcase. * gcc.dg/strlenopt-8.c: XFAIL. * gfortran.dg/coarray_lib_realloc_1.f90: Adjust. Index: gcc/builtins.c === *** gcc/builtins.c.orig 2014-07-11 09:54:49.844932232 +0200 --- gcc/builtins.c 2014-07-11 13:15:35.297102917 +0200 *** get_pointer_alignment (tree exp) *** 535,540 --- 535,544 len = c_strlen (src, 1); if (len) expand_expr (len, ...); would not evaluate the side-effects. +If ONLY_VALUE is two then we do not emit warnings about out-of-bound +accesses. Note that this implies the result is not going to be emitted +into the instruction stream. + The value returned is of type `ssizetype'. Unfortunately, string_constant can't access the values of const char *** c_strlen (tree src, int only_value) *** 606,612 /* If the offset is known to be out of bounds, warn, and call strlen at runtime. */ ! 
if (offset < 0 || offset > max) { /* Suppress multiple warnings for propagated constant strings. */ if (! TREE_NO_WARNING (src)) --- 610,617 /* If the offset is known to be out of bounds, warn, and call strlen at runtime. */ ! if (only_value != 2 ! && (offset < 0 || offset > max)) { /* Suppress multiple warnings for propagated constant strings. */ if (! TREE_NO_WARNING (src)) *** fold_builtin_memory_op (location_t loc, *** 8637,8647 unsigned int src_align, dest_align; tree off0; ! if (endp == 3) { ! src_align = get_pointer_alignment (src); ! dest_align = get_pointer_alignment (dest); /* Both DEST and SRC must be pointer types. ??? This is what old code did. Is the testing for pointer types really mandatory? --- 8642,8698 unsigned int src_align, dest_align; tree off0; ! /* Build accesses at offset zero with a ref-all character type. */ ! off0 = build_int_cst (build_pointer_type_for_mode (char_type_node, ! ptr_mode, true), 0); ! ! /* If we can perform the copy efficiently with first doing all loads ! and then all stores inline it that way. Currently efficiently ! means that we can load all the memory into a single integer ! register which is what MOVE_MAX gives us. */ ! src_align = get_pointer_alignment (src); ! dest_align = get_pointer_alignment (dest); ! if (tree_fits_uhwi_p (len) ! && compare_tree_int (len, MOVE_MAX) <= 0 ! /* ??? Don't transform copies from strings with known length this ! confuses the tree-ssa-strlen.c. This doesn't handle ! the case in gcc.dg/strlenopt-8.c which is XFAILed for that ! reason. */ ! && !c_strlen (src, 2)) { ! unsigned ilen = tree_to_uhwi (len); ! if (exact_log2 (ilen) != -1) ! { ! tree type = lang_hooks.types.type_for_size (ilen * 8, 1); ! if (type ! && TYPE_MODE (type) != BLKmode ! && (GET_MODE_SIZE (TYPE_MODE (type)) * BITS_PER_UNIT ! == ilen * 8) ! /* If the pointers are not aligned we must be able to ! emit an unaligned load. */ ! && ((src_align >= GET_MODE_ALIGNMENT (TYPE_MODE (type)) ! && dest_align >= GET_MODE_ALIGNMENT (TYPE_MODE (type))) ! || !SLOW_UNALIGNED_ACCESS (TYPE_MODE (type), ! MIN (src_align, dest_align)))) ! { ! tree srctype = type;
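The transform in this patch folds a small, power-of-two-sized memcpy into one integer load followed by one integer store. A minimal C sketch of the equivalence (illustrative only, not the code GCC emits):

```c
#include <stdint.h>
#include <string.h>

/* A 4-byte copy whose length fits one integer register can be done as
   a single 32-bit load followed by a single 32-bit store.  Routing the
   bytes through a uint32_t temporary models exactly that; the two
   memcpy calls below are themselves folded to plain moves by any
   optimizing compiler, so no libcall remains.  */
static void copy4 (void *dst, const void *src)
{
  uint32_t tmp;
  memcpy (&tmp, src, sizeof tmp);  /* single load  */
  memcpy (dst, &tmp, sizeof tmp);  /* single store */
}
```

Using memcpy for the temporary keeps the sketch free of strict-aliasing and alignment assumptions, which is also why the real transform checks GET_MODE_ALIGNMENT and SLOW_UNALIGNED_ACCESS first.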
Re: [GSoC] generation of Gimple loops with empty bodies
I've attached an improved version of the patch and the ChangeLog entry. Are they fine for trunk? -- Cheers, Roman Gareev. 2014-07-11 Roman Gareev gareevro...@gmail.com gcc/ * graphite-isl-ast-to-gimple.c (gmp_cst_to_tree): New function. (graphite_verify): New function. (ivs_params_clear): New function. (gcc_expression_from_isl_ast_expr_id): New function. (gcc_expression_from_isl_expr_int): New function. (binary_op_to_tree): New function. (ternary_op_to_tree): New function. (unary_op_to_tree): New function. (nary_op_to_tree): New function. (gcc_expression_from_isl_expr_op): New function. (gcc_expression_from_isl_expression): New function. (graphite_create_new_loop): New function. (translate_isl_ast_for_loop): New function. (get_upper_bound): New function. (graphite_create_new_loop_guard): New function. (translate_isl_ast_node_for): New function. (translate_isl_ast): New function. (add_parameters_to_ivs_params): New function. (scop_to_isl_ast): New parameter ip. (graphite_regenerate_ast_isl): Add generation of GIMPLE code. Index: gcc/graphite-isl-ast-to-gimple.c === --- gcc/graphite-isl-ast-to-gimple.c (revision 212450) +++ gcc/graphite-isl-ast-to-gimple.c (working copy) @@ -25,7 +25,14 @@ #include <isl/map.h> #include <isl/union_map.h> #include <isl/ast_build.h> +#if defined(__cplusplus) +extern "C" { +#endif +#include <isl/val_gmp.h> +#if defined(__cplusplus) +} +#endif #endif #include "system.h" #include "coretypes.h" @@ -42,6 +49,9 @@ #include "cfgloop.h" #include "tree-data-ref.h" #include "sese.h" +#include "tree-ssa-loop-manip.h" +#include "tree-scalar-evolution.h" +#include <map> #ifdef HAVE_cloog #include "graphite-poly.h" @@ -52,6 +62,515 @@ static bool graphite_regenerate_error; +/* We always use signed 128, until isl is able to give information about +types */ + +static tree *graphite_expression_size_type = &int128_integer_type_node; + +/* Converts a GMP constant VAL to a tree and returns it. */ + +static tree +gmp_cst_to_tree (tree type, mpz_t val) +{ + tree t = type ? 
type : integer_type_node; + mpz_t tmp; + + mpz_init (tmp); + mpz_set (tmp, val); + wide_int wi = wi::from_mpz (t, tmp, true); + mpz_clear (tmp); + + return wide_int_to_tree (t, wi); +} + +/* Verifies properties that GRAPHITE should maintain during translation. */ + +static inline void +graphite_verify (void) +{ +#ifdef ENABLE_CHECKING + verify_loop_structure (); + verify_loop_closed_ssa (true); +#endif +} + +/* IVS_PARAMS maps ISL's scattering and parameter identifiers + to corresponding trees. */ + +typedef std::map<isl_id *, tree> ivs_params; + +/* Free all memory allocated for ISL's identifiers. */ + +void ivs_params_clear (ivs_params &ip) +{ + std::map<isl_id *, tree>::iterator it; + for (it = ip.begin (); + it != ip.end (); it++) +{ + isl_id_free (it->first); +} +} + +static tree +gcc_expression_from_isl_expression (tree type, __isl_take isl_ast_expr *, + ivs_params &ip); + +/* Return the tree variable that corresponds to the given isl ast identifier + expression (an isl_ast_expr of type isl_ast_expr_id). */ + +static tree +gcc_expression_from_isl_ast_expr_id (__isl_keep isl_ast_expr *expr_id, +ivs_params &ip) +{ + gcc_assert (isl_ast_expr_get_type (expr_id) == isl_ast_expr_id); + isl_id *tmp_isl_id = isl_ast_expr_get_id (expr_id); + std::map<isl_id *, tree>::iterator res; + res = ip.find (tmp_isl_id); + isl_id_free (tmp_isl_id); + gcc_assert (res != ip.end () + && "Could not map isl_id to tree expression"); + isl_ast_expr_free (expr_id); + return res->second; +} + +/* Converts an isl_ast_expr_int expression E to a GCC expression tree of + type TYPE. 
*/ + +static tree +gcc_expression_from_isl_expr_int (tree type, __isl_take isl_ast_expr *expr) +{ + gcc_assert (isl_ast_expr_get_type (expr) == isl_ast_expr_int); + isl_val *val = isl_ast_expr_get_val (expr); + mpz_t val_mpz_t; + mpz_init (val_mpz_t); + tree res; + if (isl_val_get_num_gmp (val, val_mpz_t) == -1) +res = NULL_TREE; + else +res = gmp_cst_to_tree (type, val_mpz_t); + isl_val_free (val); + isl_ast_expr_free (expr); + mpz_clear (val_mpz_t); + return res; +} + +/* Converts a binary isl_ast_expr_op expression E to a GCC expression tree of + type TYPE. */ + +static tree +binary_op_to_tree (tree type, __isl_take isl_ast_expr *expr, ivs_params ip) +{ + isl_ast_expr *arg_expr = isl_ast_expr_get_op_arg (expr, 0); + tree tree_lhs_expr = gcc_expression_from_isl_expression (type, arg_expr, ip); + arg_expr = isl_ast_expr_get_op_arg (expr, 1); + tree tree_rhs_expr = gcc_expression_from_isl_expression (type, arg_expr, ip); + enum isl_ast_op_type expr_type =
Re: [PATCH][RFC] Fix PR61473, inline small memcpy/memmove during tree opts
On Fri, Jul 11, 2014 at 03:36:15PM +0200, Richard Biener wrote: *** c_strlen (tree src, int only_value) *** 606,612 /* If the offset is known to be out of bounds, warn, and call strlen at runtime. */ ! if (offset < 0 || offset > max) { /* Suppress multiple warnings for propagated constant strings. */ if (! TREE_NO_WARNING (src)) --- 610,617 /* If the offset is known to be out of bounds, warn, and call strlen at runtime. */ ! if (only_value != 2 ! && (offset < 0 || offset > max)) { /* Suppress multiple warnings for propagated constant strings. */ if (! TREE_NO_WARNING (src)) This looks wrong. I'd say you only want to disable the warning for only_value != 2, but still return NULL_TREE, otherwise the strlen call will be out of bounds in the compiler. So move the only_value != 2 check down to the second if? Jakub
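Jakub's point is that the out-of-bounds test must keep bailing out; only the warning should be conditional. A standalone toy modeling the suggested control flow (hypothetical shape and names, not the actual builtins.c code):

```c
#include <stdio.h>

/* Toy model of the suggested c_strlen shape: the bounds check always
   bails out (-1 stands in for NULL_TREE, i.e. "compute at runtime");
   only the warning is suppressed when only_value == 2.  */
static long
toy_strlen (long offset, long max, int only_value)
{
  if (offset < 0 || offset > max)
    {
      if (only_value != 2)
        fprintf (stderr, "warning: offset outside string bounds\n");
      return -1;          /* always give up on a constant length */
    }
  return max - offset;    /* length known at compile time */
}
```

With the check hoisted out of the warning condition, an out-of-bounds offset can never produce a bogus constant length, which is exactly the bug Jakub is pointing at.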
[PATCH][Ping v4] Add patch for debugging compiler ICEs
Ping. Added small changes due to previous discussion in the community. Original Message Subject: [PATCH][Ping v3] Add patch for debugging compiler ICEs Date: Fri, 04 Jul 2014 18:32:44 +0400 From: Maxim Ostapenko m.ostape...@partner.samsung.com To: GCC Patches gcc-patches@gcc.gnu.org CC: Yury Gribov y.gri...@samsung.com, Slava Garbuzov v.garbu...@samsung.com, Jakub Jelinek ja...@redhat.com, tsaund...@mozilla.com, Maxim Ostapenko chefm...@gmail.com Ping. Original Message Subject: [PATCH][Ping v2] Add patch for debugging compiler ICEs Date: Thu, 26 Jun 2014 19:46:08 +0400 From: Maxim Ostapenko m.ostape...@partner.samsung.com To: GCC Patches gcc-patches@gcc.gnu.org CC: Yury Gribov y.gri...@samsung.com, Slava Garbuzov v.garbu...@samsung.com, Jakub Jelinek ja...@redhat.com, tsaund...@mozilla.com, Maxim Ostapenko chefm...@gmail.com Ping. Original Message Subject: [PATCH][Ping] Add patch for debugging compiler ICEs Date: Wed, 11 Jun 2014 18:15:27 +0400 From: Maxim Ostapenko m.ostape...@partner.samsung.com To: GCC Patches gcc-patches@gcc.gnu.org CC: Yury Gribov y.gri...@samsung.com, Slava Garbuzov v.garbu...@samsung.com, Jakub Jelinek ja...@redhat.com, tsaund...@mozilla.com, chefm...@gmail.com Ping. Original Message Subject: [PATCH] Add patch for debugging compiler ICEs Date: Mon, 02 Jun 2014 19:21:14 +0400 From: Maxim Ostapenko m.ostape...@partner.samsung.com To: GCC Patches gcc-patches@gcc.gnu.org CC: Yury Gribov y.gri...@samsung.com, Slava Garbuzov v.garbu...@samsung.com, Jakub Jelinek ja...@redhat.com, tsaund...@mozilla.com, chefm...@gmail.com Hi, Years ago there was a discussion (https://gcc.gnu.org/ml/gcc-patches/2004-01/msg02437.html) about debugging compiler ICEs that resulted in a patch from Jakub, which dumps useful information into a temporary file, but for some reason this patch wasn't applied to trunk. This is the resurrected patch, with GCC version information added to the generated repro file. 
-Maxim 2014-06-02 Jakub Jelinek ja...@redhat.com Max Ostapenko m.ostape...@partner.samsung.com * diagnostic.c (diagnostic_action_after_output): Exit with ICE_EXIT_CODE instead of FATAL_EXIT_CODE. * gcc.c (execute): Don't free first string early, but at the end of the function. Call retry_ice if compiler exited with ICE_EXIT_CODE. (main): Factor out common code. (print_configuration): New function. (try_fork): Likewise. (redirect_stdout_stderr): Likewise. (files_equal_p): Likewise. (check_repro): Likewise. (run_attempt): Likewise. (generate_preprocessed_code): Likewise. (append_text): Likewise. (try_generate_repro): Likewise. diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c index 0cc7593..67b8c5b 100644 --- a/gcc/diagnostic.c +++ b/gcc/diagnostic.c @@ -492,7 +492,7 @@ diagnostic_action_after_output (diagnostic_context *context, real_abort (); diagnostic_finish (context); fnotice (stderr, "compilation terminated.\n"); - exit (FATAL_EXIT_CODE); + exit (ICE_EXIT_CODE); default: gcc_unreachable (); diff --git a/gcc/gcc.c b/gcc/gcc.c index 6cd08ea..045363c 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -43,6 +43,13 @@ compilation is specified by a string called a spec. */ #include "params.h" #include "vec.h" #include "filenames.h" +#ifdef HAVE_UNISTD_H +#include <unistd.h> +#endif + +#if !(defined (__MSDOS__) || defined (OS2) || defined (VMS)) +#define RETRY_ICE_SUPPORTED +#endif /* By default there is no special suffix for target executables. 
*/ /* FIXME: when autoconf is fixed, remove the host check - dj */ @@ -253,6 +260,9 @@ static void init_gcc_specs (struct obstack *, const char *, const char *, static const char *convert_filename (const char *, int, int); #endif +#ifdef RETRY_ICE_SUPPORTED +static void try_generate_repro (const char *prog, const char **argv); +#endif static const char *getenv_spec_function (int, const char **); static const char *if_exists_spec_function (int, const char **); static const char *if_exists_else_spec_function (int, const char **); @@ -2850,7 +2860,7 @@ execute (void) } } - if (string != commands[i].prog) + if (i && string != commands[i].prog) free (CONST_CAST (char *, string)); } @@ -2903,6 +2913,16 @@ execute (void) else if (WIFEXITED (status) && WEXITSTATUS (status) >= MIN_FATAL_STATUS) { +#ifdef RETRY_ICE_SUPPORTED + /* For ICEs in cc1, cc1obj, cc1plus see if it is + reproducible or not. */ + const char *p; + if (WEXITSTATUS (status) == ICE_EXIT_CODE + && i == 0 + && (p = strrchr (commands[0].argv[0], DIR_SEPARATOR)) + && ! strncmp (p + 1, "cc1", 3)) + try_generate_repro (commands[0].prog, commands[0].argv); +#endif if (WEXITSTATUS (status) > greatest_status) greatest_status = WEXITSTATUS (status); ret_code = -1; @@ -2960,6 +2980,9 @@ execute (void) }
Re: [GSoC] generation of Gimple loops with empty bodies
On 11/07/2014 15:41, Roman Gareev wrote: I've attached an improved version of the patch and the ChangeLog entry. Are they fine for trunk? LGTM. Thank you! Tobias
Re: [PATCH v3 3/3] Port libstdc++ pretty-printers to Python 2 + Python 3
On 10/07/14 22:48 -0400, Samuel Bronson wrote: PR libstdc++/58962 * python/libstdcxx/v6/printers.py: Port to Python 2+3 (imap): New compat function. (izip): Likewise. (Iterator): New mixin to allow writing iterators in Python 3 style regardless of which version we're running on. [Python3] (long) New compat alias for int. * testsuite/lib/gdb-test.exp: Port to Python 2+3 (print syntax) Thanks, I've committed this with a few tiny changes to the comments (s/compatability/compatibility/ and the comment about your GDB PR). I'm not planning to backport this myself, but if the patch applies cleanly or if someone else does the work then I have no objections to it going on the release branches (after the branch reopens post-4.9.1 of course). Thanks very much for doing this.
Re: [PATCH preprocessor/61389] - libcpp diagnostics shouldn't talk about ISO C99 for C++ input files
Here is a preprocessor patch to make error messages show C++11 and other relevant C++ language instead of C99. Built and tested on x86_64-linux. This caused FAIL: gcc.dg/cpp/macsyntx.c (test for excess errors) FAIL: gcc.dg/cpp/macsyntx.c (test for excess errors) FAIL: gcc.dg/cpp/macsyntx.c (test for warnings, line 54) FAIL: gcc.dg/cpp/macsyntx.c (test for warnings, line 54) FAIL: gcc.dg/cpp/macsyntx.c (test for warnings, line 55) FAIL: gcc.dg/cpp/macsyntx.c (test for warnings, line 55) FAIL: gcc.dg/cpp/macsyntx.c (test for warnings, line 62) FAIL: gcc.dg/cpp/macsyntx.c (test for warnings, line 62) FAIL: gcc.dg/cpp/sysmac1.c (test for excess errors) FAIL: gcc.dg/cpp/sysmac1.c (test for excess errors) FAIL: gcc.dg/cpp/sysmac1.c (test for warnings, line 25) FAIL: gcc.dg/cpp/sysmac1.c (test for warnings, line 25) (see https://gcc.gnu.org/ml/gcc-regression/2014-07/msg00162.html). The warnings have to be adjusted as in --- ../_clean/gcc/testsuite/gcc.dg/cpp/macsyntx.c 2014-05-10 23:17:13.0 +0200 +++ gcc/testsuite/gcc.dg/cpp/macsyntx.c 2014-07-11 15:57:33.0 +0200 @@ -51,15 +51,15 @@ one(ichi\ two(ichi) /* { dg-error "requires 2" } */ var0() /* OK. */ var0(ichi) /* OK. */ -var1() /* { dg-warning "rest arguments to be used" } */ -var1(ichi) /* { dg-warning "rest arguments to be used" } */ +var1() /* { dg-warning "ISO C99 requires at least one argument" } */ +var1(ichi) /* { dg-warning "ISO C99 requires at least one argument" } */ var1(ichi, ni) /* OK. */ /* This tests two oddities of GNU rest args - omitting a comma is OK, and backtracking a token on pasting an empty rest args. */ #define rest(x, y...) x ## y /* { dg-warning "ISO C" } */ rest(ichi,) /* OK. */ -rest(ichi) /* { dg-warning "rest arguments to be used" } */ +rest(ichi) /* { dg-warning "ISO C99 requires at least one argument" } */ #if 23 != rest(2, 3) /* OK, no warning. */ #error 23 != 23 !! 
#endif --- ../_clean/gcc/testsuite/gcc.dg/cpp/sysmac1.c 2014-05-10 23:17:12.0 +0200 +++ gcc/testsuite/gcc.dg/cpp/sysmac1.c 2014-07-11 15:59:18.0 +0200 @@ -22,5 +22,5 @@ (str); /* { dg-warning "used with arguments" } */ (sys_str); /* { dg-bogus "used with arguments" } */ -foo (one_arg); /* { dg-warning "requires rest arguments" } */ +foo (one_arg); /* { dg-warning "ISO C99 requires at least one argument" } */ sys_foo (one_arg); /* { dg-bogus "requires rest arguments" } */ Dominique
Re: [BUILDROBOT] gcov patch
Sorry. This code was meant to work with the different mkdir API on Windows. I used the wrong ifdef. Here is the patch. OK for checkin? Thanks, -Rong 2014-07-11 Rong Xu x...@google.com * gcov-tool.c (gcov_output_files): Fix build error. Index: gcov-tool.c === --- gcov-tool.c (revision 212448) +++ gcov-tool.c (working copy) @@ -90,8 +90,8 @@ gcov_output_files (const char *out, struct gcov_in /* Try to make directory if it doesn't already exist. */ if (access (out, F_OK) == -1) { -#ifdef TARGET_POSIX_IO - if (mkdir (out, 0755) == -1 && errno != EEXIST) +#if !defined(_WIN32) + if (mkdir (out, S_IRWXU | S_IRWXG | S_IRWXO) == -1 && errno != EEXIST) #else if (mkdir (out) == -1 && errno != EEXIST) #endif On Fri, Jul 11, 2014 at 6:08 AM, Jan-Benedict Glaw jbg...@lug-owl.de wrote: On Fri, 2014-07-11 15:03:06 +0200, Jan-Benedict Glaw jbg...@lug-owl.de wrote: See eg. http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=289639, it breaks like this: [...] g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. 
-I/home/jbglaw/repos/gcc/gcc/../include -I/home/jbglaw/repos/gcc/gcc/../libcpp/include -I/home/jbglaw/repos/gcc/gcc/../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libbacktrace -o gcov-tool.o -MT gcov-tool.o -MMD -MP -MF ./.deps/gcov-tool.TPo /home/jbglaw/repos/gcc/gcc/gcov-tool.c /usr/include/sys/stat.h: In function ‘void gcov_output_files(const char*, gcov_info*)’: /usr/include/sys/stat.h:323: error: too few arguments to function ‘int mkdir(const char*, __mode_t)’ /home/jbglaw/repos/gcc/gcc/gcov-tool.c:96: error: at this point in file make[1]: *** [gcov-tool.o] Error 1 make[1]: Leaving directory `/home/jbglaw/build/rl78-elf/build-gcc/gcc' make: *** [all-gcc] Error 2 [...] +#ifdef TARGET_POSIX_IO + if (mkdir (out, 0755) == -1 && errno != EEXIST) +#else + if (mkdir (out) == -1 && errno != EEXIST) +#endif And with POSIX I/O, this should probably use the S_I* macros for the mode bits? MfG, JBG -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of the second: Arroganz verkürzt fruchtlose Gespräche. (Arrogance cuts fruitless conversations short.) -- Jan-Benedict Glaw
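The portability wart here is that the Windows CRT mkdir takes no mode argument. A sketch of a wrapper in the spirit of the patch (hypothetical helper name, not a libiberty or gcov-tool API):

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#ifdef _WIN32
#include <direct.h>
#endif

/* Create DIR unless it already exists; return 0 on success, -1 on
   failure.  EEXIST is treated as success, mirroring the gcov-tool
   logic, so the call is idempotent.  */
static int
mkdir_if_missing (const char *dir)
{
#ifdef _WIN32
  if (_mkdir (dir) == -1 && errno != EEXIST)   /* no mode argument */
    return -1;
#else
  if (mkdir (dir, S_IRWXU | S_IRWXG | S_IRWXO) == -1 && errno != EEXIST)
    return -1;
#endif
  return 0;
}
```

Centralizing the #ifdef in one helper, as Jan-Benedict hints, would keep callers like gcov_output_files free of platform conditionals.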
Re: Fix PR61772: ifcvt removing asm volatile gotos
On Fri, Jul 11, 2014 at 2:34 PM, Michael Matz wrote: PR rtl-optimization/61772 * ifcvt.c (dead_or_predicable): Check jump to be free of side effects. This is OK. Ciao! Steven
Re: [PATCH preprocessor/61389] - libcpp diagnostics shouldn't talk about ISO C99 for C++ input files
Tested on x86_64-suse-linux and installed as obvious. Andreas. PR preprocessor/61389 * gcc.dg/cpp/macsyntx.c: Update expected warnings. * gcc.dg/cpp/sysmac1.c: Likewise. diff --git a/gcc/testsuite/gcc.dg/cpp/macsyntx.c b/gcc/testsuite/gcc.dg/cpp/macsyntx.c index 495921e..146dced 100644 --- a/gcc/testsuite/gcc.dg/cpp/macsyntx.c +++ b/gcc/testsuite/gcc.dg/cpp/macsyntx.c @@ -51,15 +51,15 @@ one(ichi\ two(ichi) /* { dg-error "requires 2" } */ var0() /* OK. */ var0(ichi) /* OK. */ -var1() /* { dg-warning "rest arguments to be used" } */ -var1(ichi) /* { dg-warning "rest arguments to be used" } */ +var1() /* { dg-warning "requires at least one" } */ +var1(ichi) /* { dg-warning "requires at least one" } */ var1(ichi, ni) /* OK. */ /* This tests two oddities of GNU rest args - omitting a comma is OK, and backtracking a token on pasting an empty rest args. */ #define rest(x, y...) x ## y /* { dg-warning "ISO C" } */ rest(ichi,) /* OK. */ -rest(ichi) /* { dg-warning "rest arguments to be used" } */ +rest(ichi) /* { dg-warning "requires at least one" } */ #if 23 != rest(2, 3) /* OK, no warning. */ #error 23 != 23 !! #endif diff --git a/gcc/testsuite/gcc.dg/cpp/sysmac1.c b/gcc/testsuite/gcc.dg/cpp/sysmac1.c index cc8469e..54f161e 100644 --- a/gcc/testsuite/gcc.dg/cpp/sysmac1.c +++ b/gcc/testsuite/gcc.dg/cpp/sysmac1.c @@ -22,5 +22,5 @@ (str); /* { dg-warning "used with arguments" } */ (sys_str); /* { dg-bogus "used with arguments" } */ -foo (one_arg); /* { dg-warning "requires rest arguments" } */ -sys_foo (one_arg); /* { dg-bogus "requires rest arguments" } */ +foo (one_arg); /* { dg-warning "requires at least one" } */ +sys_foo (one_arg); /* { dg-bogus "requires at least one" } */ -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: [PATCH] Fix for PR 61561
On 19/06/14 21:19, Yuri Gribov wrote: Thirdly, we also need to fix movhi_bytes (for pre-v4), thumb2_movhi_insn (for thumb2) and, quite possibly, thumb1_movhi_insn (for thumb1). There may well be additional changes for movqi variants as well. A general question: how should one test ARM backend patches? Is it enough to regtest ARM and Thumb2 on some modern Cortex? If not - what other variants should be tested? -Y The answer to this question is that you have to be realistic. It's clearly impossible to test every permutation, but you should, ideally, test a reasonable selection of permutations that might be affected by a change. I.e., if you make a change that is likely to affect thumb1 code generation, it doesn't make sense to only test the arm or thumb2 code generation (and vice versa). The tricky thing here is that although thumb1 and thumb2 sound quite similar in name, in practice thumb2 code generation is closer to ARM than to thumb1. R.
[PATCH] AIX COMDAT
This patch enables COMDAT functionality on AIX. It defines MAKE_DECL_ONE_ONLY and removes the historical legacy link-line option -bnodelcsect. -bnodelcsect instructs the AIX (garbage collecting) linker to preserve un-referenced CSECTs (like ELF sections). In the distant past, GCC emitted code that produced necessary sections that appeared not to be referenced, and the linker deleted them. GCC bootstraps and shows similar testsuite results with the patch. At this point, the patch needs wider use to find any problems. I plan to install the patch shortly. Thanks, David Bootstrapped on powerpc-ibm-aix7.1.0.0 * config/rs6000/aix51.h (LINK_SPEC): Remove -bnodelcsect. * config/rs6000/aix52.h (LINK_SPEC): Same. * config/rs6000/aix53.h (LINK_SPEC): Same. * config/rs6000/aix61.h (LINK_SPEC): Same. * config/rs6000/xcoff.h (MAKE_DECL_ONE_ONLY): Define. Index: aix53.h === --- aix53.h (revision 212454) +++ aix53.h (working copy) @@ -133,7 +133,7 @@ %{pthread:-lpthreads} -lc" #undef LINK_SPEC -#define LINK_SPEC "-bpT:0x1000 -bpD:0x2000 %{!r:-btextro} -bnodelcsect\ +#define LINK_SPEC "-bpT:0x1000 -bpD:0x2000 %{!r:-btextro}\ %{static:-bnso %(link_syscalls) } %{shared:-bM:SRE %{!e:-bnoentry}}\ %{!maix64:%{!shared:%{g*: %(link_libg) }}} %{maix64:-b64}\ %{mpe:-binitfini:poe_remote_main}" Index: aix51.h === --- aix51.h (revision 212454) +++ aix51.h (working copy) @@ -121,7 +121,7 @@ %{pthread:-lpthreads} -lc" #undef LINK_SPEC -#define LINK_SPEC "-bpT:0x1000 -bpD:0x2000 %{!r:-btextro} -bnodelcsect\ +#define LINK_SPEC "-bpT:0x1000 -bpD:0x2000 %{!r:-btextro}\ %{static:-bnso %(link_syscalls) } %{shared:-bM:SRE %{!e:-bnoentry}}\ %{!maix64:%{!shared:%{g*: %(link_libg) }}} %{maix64:-b64}\ %{mpe:-binitfini:poe_remote_main}" Index: xcoff.h === --- xcoff.h (revision 212454) +++ xcoff.h (working copy) @@ -309,3 +309,6 @@ than in the .eh_frame section. We do this because the AIX linker would otherwise garbage collect these sections. */ #define EH_FRAME_IN_DATA_SECTION 1 + +#define MAKE_DECL_ONE_ONLY(DECL) (DECL_WEAK (DECL) = 1) + Index: aix52.h === --- aix52.h (revision 212454) +++ aix52.h (working copy) @@ -133,7 +133,7 @@ %{pthread:-lpthreads} -lc" #undef LINK_SPEC -#define LINK_SPEC "-bpT:0x1000 -bpD:0x2000 %{!r:-btextro} -bnodelcsect\ +#define LINK_SPEC "-bpT:0x1000 -bpD:0x2000 %{!r:-btextro}\ %{static:-bnso %(link_syscalls) } %{shared:-bM:SRE %{!e:-bnoentry}}\ %{!maix64:%{!shared:%{g*: %(link_libg) }}} %{maix64:-b64}\ %{mpe:-binitfini:poe_remote_main}" Index: aix61.h === --- aix61.h (revision 212454) +++ aix61.h (working copy) @@ -156,7 +156,7 @@ %{pthread:-lpthreads} -lc" #undef LINK_SPEC -#define LINK_SPEC "-bpT:0x1000 -bpD:0x2000 %{!r:-btextro} -bnodelcsect\ +#define LINK_SPEC "-bpT:0x1000 -bpD:0x2000 %{!r:-btextro}\ %{static:-bnso %(link_syscalls) } %{shared:-bM:SRE %{!e:-bnoentry}}\ %{!maix64:%{!shared:%{g*: %(link_libg) }}} %{maix64:-b64}\ %{mpe:-binitfini:poe_remote_main}"
Re: [BUILDROBOT] gcov patch
Sorry. This code was meant to work with the different mkdir API on Windows. I used the wrong ifdef. Here is the patch. OK for checkin? OK. I also see the following with LTO bootstrap: ../../gcc/../libgcc/libgcov-util.c:41:24: error: type of ‘gcov_max_filename’ does not match original declaration [-Werror] extern gcov_unsigned_t gcov_max_filename; ^ ../../gcc/../libgcc/libgcov-driver.c:88:8: note: previously declared here size_t gcov_max_filename = 0; Probably both can be size_t? Honza Thanks, -Rong 2014-07-11 Rong Xu x...@google.com * gcov-tool.c (gcov_output_files): Fix build error. Index: gcov-tool.c === --- gcov-tool.c (revision 212448) +++ gcov-tool.c (working copy) @@ -90,8 +90,8 @@ gcov_output_files (const char *out, struct gcov_in /* Try to make directory if it doesn't already exist. */ if (access (out, F_OK) == -1) { -#ifdef TARGET_POSIX_IO - if (mkdir (out, 0755) == -1 && errno != EEXIST) +#if !defined(_WIN32) + if (mkdir (out, S_IRWXU | S_IRWXG | S_IRWXO) == -1 && errno != EEXIST) Sounds almost like something we could have libiberty glue for... #else if (mkdir (out) == -1 && errno != EEXIST) #endif
[c++-concepts] fixes
Fix a couple of issues causing a test regression and bootstrap build errors. 2014-07-11 Andrew Sutton andrew.n.sut...@gmail.com * gcc/cp/typeck.c (cp_build_function_call_vec): Emit diagnostic at the input location. * gcc/cp/error.c (dump_template_decl): Constraints are never invalid in this way. Also fixes brace warning. Andrew Sutton Index: gcc/cp/typeck.c === --- gcc/cp/typeck.c (revision 212456) +++ gcc/cp/typeck.c (working copy) @@ -3463,8 +3463,8 @@ cp_build_function_call_vec (tree functio if (tree ci = get_constraints (function)) if (!check_constraints (ci)) { + error ("cannot call function %qD", function); location_t loc = DECL_SOURCE_LOCATION (function); - error_at (loc, "cannot call function %qD", function); diagnose_constraints (loc, function, NULL_TREE); return error_mark_node; } Index: gcc/cp/error.c === --- gcc/cp/error.c (revision 212456) +++ gcc/cp/error.c (working copy) @@ -1280,10 +1280,7 @@ dump_template_decl (cxx_pretty_printer * if (flag_concepts) if (tree ci = get_constraints (t)) -if (ci != error_mark_node) - pp_cxx_requires_clause (pp, CI_LEADING_REQS (ci)); -else - pp_cxx_ws_string (pp, "invalid-constraints"); + pp_cxx_requires_clause (pp, CI_LEADING_REQS (ci)); pp_cxx_whitespace (pp); }
Re: [PATCH v2 1/3] Make libstdc++ testsuite work with pre-color GCC versions again
On 10/07/14 22:48 -0400, Samuel Bronson wrote: When I try to build & test just libstdc++, or to run the testsuite from trunk against my installed libstdc++, the testsuite tries to pass -fdiagnostics-color=never to the system GCC, which is too old to know what that is. Since I really just want to test a patch for the gdb pretty-printers, and since evidently my machine is too puny to actually build GCC, this is a bit problematic. According to the documentation, setting GCC_COLORS to "" in the environment should be just as effective, while it clearly can't cause older GCCs to freak out, so that's just what I've done. (I've also taken the liberty of swapping the "set ccflags" and "set cxxflags" lines here so that ccflags doesn't end up with two -DLOCALEDIR flags.) libstdc++-v3/ * testsuite/lib/libstdc++.exp (libstdc++_init): Set $GCC_COLORS= instead of insisting that GCC understand -fdiagnostics-color=never Committed to trunk - thanks.
Re: [PATCH v2 2/3] libstdc++ testsuite: Turn off GDB's auto-load, list loaded libs
On 10/07/14 22:48 -0400, Samuel Bronson wrote: We load our pretty-printers explicitly, and we shouldn't need any other random -gdb.gdb or -gdb.py files from anywhere, so in this patch we turn that off by running "set auto-load no". Also, run "info share" so that the list of loaded libraries ends up in the logs for the GDB tests. libstdc++-v3/ * testsuite/lib/gdb-test.exp (gdb-test): Turn off GDB's auto-load, list loaded libs Committed to trunk - thanks.
Re: [PATCH, libstdc++] Add the logistic distribution as an extension
On 07/10/2014 06:16 AM, Paolo Carlini wrote: .. I have another comment: are we sure the usual strategy: template<typename _UniformRandomNumberGenerator> result_type operator()(_UniformRandomNumberGenerator& __urng) { return this->operator()(__urng, this->_M_param); } doesn't make sense here too? Paolo. Looks OK to me too, but I would move both operator() out of line and definitely operator()(_UniformRandomNumberGenerator&, const param_type&) of von_mises_distribution. Ouch - that one was huge! So moved. The logistic operator() bodies were small looking to me but they aren't one or two lines either. And DRY. Rebuilt and retested on x86_64-linux. OK? 2014-07-11 Edward Smith-Rowland 3dw...@verizon.net Add the logistic_distribution as an extension. * include/ext/random: Add the logistic_distribution. * include/ext/random.tcc: Add the logistic_distribution. * testsuite/ext/random/logistic_distribution/cons/parms.cc: New. * testsuite/ext/random/logistic_distribution/cons/default.cc: New. * testsuite/ext/random/logistic_distribution/requirements/typedefs.cc: New. * testsuite/ext/random/logistic_distribution/operators/inequal.cc: New. * testsuite/ext/random/logistic_distribution/operators/equal.cc: New. * testsuite/ext/random/logistic_distribution/operators/serialize.cc: New. 
Index: include/ext/random === --- include/ext/random (revision 212442) +++ include/ext/random (working copy) @@ -2728,42 +2728,8 @@ template<typename _UniformRandomNumberGenerator> result_type operator()(_UniformRandomNumberGenerator& __urng, - const param_type& __p) - { - const result_type __pi - = __gnu_cxx::__math_constants<result_type>::__pi; - std::__detail::_Adaptor<_UniformRandomNumberGenerator, result_type> - __aurng(__urng); + const param_type& __p); - result_type __f; - while (1) - { - result_type __rnd = std::cos(__pi * __aurng()); - __f = (result_type(1) + __p._M_r * __rnd) / (__p._M_r + __rnd); - result_type __c = __p._M_kappa * (__p._M_r - __f); - - result_type __rnd2 = __aurng(); - if (__c * (result_type(2) - __c) > __rnd2) - break; - if (std::log(__c / __rnd2) >= __c - result_type(1)) - break; - } - - result_type __res = std::acos(__f); -#if _GLIBCXX_USE_C99_MATH_TR1 - __res = std::copysign(__res, __aurng() - result_type(0.5)); -#else - if (__aurng() < result_type(0.5)) - __res = -__res; -#endif - __res += __p._M_mu; - if (__res > __pi) - __res -= result_type(2) * __pi; - else if (__res < -__pi) - __res += result_type(2) * __pi; - return __res; - } - template<typename _ForwardIterator, typename _UniformRandomNumberGenerator> void @@ -3106,6 +3072,227 @@ const __gnu_cxx::hypergeometric_distribution<_UIntType>& __d2) { return !(__d1 == __d2); } + /** + * @brief A logistic continuous distribution for random numbers. + * + * The formula for the logistic probability density function is + * @f[ + * p(x|a,b) = \frac{e^{(x - a)/b}}{b[1 + e^{(x - a)/b}]^2} + * @f] + * where @f$b > 0@f$. + * + * The formula for the logistic probability function is + * @f[ + * cdf(x|a,b) = \frac{e^{(x - a)/b}}{1 + e^{(x - a)/b}} + * @f] + * where @f$b > 0@f$. 
+ * + * <table border=1 cellpadding=10 cellspacing=0> + * <caption align=top>Distribution Statistics</caption> + * <tr><td>Mean</td><td>@f$a@f$</td></tr> + * <tr><td>Variance</td><td>@f$b^2\pi^2/3@f$</td></tr> + * <tr><td>Range</td><td>@f$[0, \infty)@f$</td></tr> + * </table> + */ + template<typename _RealType = double> +class +logistic_distribution +{ + static_assert(std::is_floating_point<_RealType>::value, + "template argument not a floating point type"); + +public: + /** The type of the range of the distribution. */ + typedef _RealType result_type; + /** Parameter type. */ + struct param_type + { + typedef logistic_distribution<result_type> distribution_type; + + param_type(result_type __a = result_type(0), + result_type __b = result_type(1)) + : _M_a(__a), _M_b(__b) + { + _GLIBCXX_DEBUG_ASSERT(_M_b > result_type(0)); + } + + result_type + a() const + { return _M_a; } + + result_type + b() const + { return _M_b; } + + friend bool + operator==(const param_type& __p1, const param_type& __p2) + { return __p1._M_a == __p2._M_a + && __p1._M_b == __p2._M_b; } + + private: + void _M_initialize(); + + result_type _M_a; + result_type _M_b; + }; +
Re: [PATCH] offline gcda profile processing tool
I wonder why. The struct definition for gcov_fn_info has not changed in this patch.

David

On Fri, Jul 11, 2014 at 1:07 AM, Richard Biener richard.guent...@gmail.com wrote:

On Mon, May 5, 2014 at 7:17 PM, Rong Xu x...@google.com wrote: Here is the updated patch. The difference with patch set 3 is (1) use the gcov-counter.def for the scaling operation. (2) fix the wrong scaling function for the time-profiler. Passed with bootstrap, profiledbootstrap and SPEC2006.

One of the patches breaks bootstrap for me:

/space/rguenther/src/svn/trunk/gcc/../libgcc/libgcov.h:184: error: ISO C++ forbids zero-size array ‘ctrs’
make[3]: *** [libgcov-util.o] Error 1

host compiler is gcc 4.3.4

Richard.

Thanks, -Rong

On Thu, May 1, 2014 at 3:37 PM, Rong Xu x...@google.com wrote: Hi, Honza, Please find the new patch in the attachment. This patch integrated your earlier suggestions. The noticeable changes are: (1) build specialized objects for libgcov-driver.c and libgcov-merge.c and link them into gcov-tool, rather than including the source files. (2) split some gcov counter definition code out to gcov-counter.def to avoid code duplication. (3) use a macro for gcov_read_counter(), so gcov-tool can use existing code in libgcov-merge.c with minimal change. This patch does not address the suggestion of creating a new directory for libgcov. I agree with you that's a much better and cleaner structure we should go for. We can do that in follow-up patches. I tested this patch with bootstrap and profiledbootstrap. Other tests are on-going. Thanks, -Rong

On Wed, Apr 16, 2014 at 8:34 PM, Jan Hubicka hubi...@ucw.cz wrote: GCOV_TOOL needs to use this function to read the string in the gcda file into memory to construct gcov_info objects. As you noticed, the gcov runtime does not need this interface. But gcov-tool links with the gcov runtime and it also uses the function. We could make it available in gcov_runtime, but that would slightly increase the memory footprint.

Yep, it is not really pretty.
I wrote below some plan for how things may be structured in a more convenient way. Any work in that direction would be welcome.

The planned new functions for the trunk version are: (1) overlap score b/w two profiles (2) better dumping (top hot objects/functions/counters) and statistics. Once this basic version is in, we can start to add the new functionality.

Sounds good. I assume AutoFDO does not go via gcov-tool but rather uses a custom reader of profile data on the GCC side? I wonder, are there any recent overview papers/slides/text on the design of AutoFDO and the other features to be merged? I remember the talk from the GCC Summit and I did read some of the pre-LTO LIPO sources/papers, but it would be nice to have something up to date.

libgcov-util.o is built in the gcc/ directory, rather than in libgcc. It's directly linked into gcov-tool. So libgcov-util.o is built for HOST, not TARGET. With makefile changes, we can build HOST versions of libgcov-driver.o and libgcov-merge.o. I also need to make some static functions/variables public. I suppose that can go with an IN_GCOV_TOOL ifdef.

So we currently have the basic gcov io handling in gcc/gcov-io.c that is borrowed by the libgcc/libgcov* code. We will also get libgcov-util.c in the libgcc directory that is actually borrowed by the gcc/gcov-tool.c code. We now have one runtime using STDIO for streaming, and the kernel has a custom version that goes via the /proc interface (last time I looked). We added some abstraction into libgcov-interface, which is the part that relies on STDIO, partly via the gcov-io.c code, and now we have in-memory handling code in libgcov-util. I guess it would make most sense to put all the gcov code into a new directory (libgcov) and make it a stand-alone library that can be configured 1) for the stdio based runtime as we do now 2) for a runtime missing the interface and relying on the user providing it 3) for use within gcov file manipulation tools, with a reorg of GCC/gcov/gcov-dump/gcov-tool to all use the same low-level interfaces.
In the longer term, I would like to make FDO profiling instrumentation happen at link time. For that I need to make the instrumentation code (minimal spanning tree & friends) work w/o cgraph; that would ideally be done in a shared implementation.

Won't this get a wrong answer when counters[0] is not the same? I would have expected the merging code to compare the counters first. Similarly for the delta counter.

These *_op functions are for scaling only. So there is only one profile involved (thus there is no comparison). The merge handlers are in libgcov-merge.c, which has the code to handle mismatched profile targets.

I see, OK then. Adding correct rounding may actually make a difference for Martin's startup time work.

Do you mean to use something like in the RDIV macro?

Yes.

Honza
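For reference, an RDIV-style macro implements round-to-nearest integer division by biasing the numerator with half the divisor. A minimal sketch, assuming non-negative operands (check GCC's headers for the exact definition used there):

```cpp
// Rounded integer division: plain X/Y truncates toward zero; adding Y/2
// to the numerator first rounds the quotient to the nearest integer.
#define RDIV(X, Y) (((X) + (Y) / 2) / (Y))
```

So RDIV (5, 2) yields 3 where plain 5 / 2 truncates to 2 — exactly the kind of systematic downward bias that matters when repeatedly scaling small counter values.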
Re: [PATCH, libstdc++] Add the logistic distribution as an extension
Hi, On 07/11/2014 05:38 PM, Ed Smith-Rowland wrote: OK? Ok, thanks, but please adjust the dates you have on the testcases to the date of the actual commit (I suppose today or tomorrow) Thanks again! Paolo.
Re: [PATCH] offline gcda profile processing tool
What is the macro to test POSIX IO on the host? The guard uses TARGET_POSIX_IO, which is not right.

David

On Fri, Jul 11, 2014 at 1:12 AM, Christophe Lyon christophe.l...@linaro.org wrote:

On 11 July 2014 10:07, Richard Biener richard.guent...@gmail.com wrote: On Mon, May 5, 2014 at 7:17 PM, Rong Xu x...@google.com wrote: Here is the updated patch. The difference with patch set 3 is (1) use the gcov-counter.def for the scaling operation. (2) fix the wrong scaling function for the time-profiler. Passed with bootstrap, profiledbootstrap and SPEC2006.

One of the patches breaks bootstrap for me:

/space/rguenther/src/svn/trunk/gcc/../libgcc/libgcov.h:184: error: ISO C++ forbids zero-size array ‘ctrs’
make[3]: *** [libgcov-util.o] Error 1

host compiler is gcc 4.3.4

Richard.

On my side, commit 212448 breaks the cross-build of GCC for targets using newlib:
* arm-none-eabi
* aarch64-none-elf

/usr/include/sys/stat.h: In function ‘void gcov_output_files(const char*, gcov_info*)’:
/usr/include/sys/stat.h:317: error: too few arguments to function ‘int mkdir(const char*, __mode_t)’
/tmp/1392958_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gcov-tool.c:96: error: at this point in file
make[2]: *** [gcov-tool.o] Error 1

Christophe.

Thanks, -Rong

On Thu, May 1, 2014 at 3:37 PM, Rong Xu x...@google.com wrote: Hi, Honza, Please find the new patch in the attachment. This patch integrated your earlier suggestions. The noticeable changes are: (1) build specialized objects for libgcov-driver.c and libgcov-merge.c and link them into gcov-tool, rather than including the source files. (2) split some gcov counter definition code out to gcov-counter.def to avoid code duplication. (3) use a macro for gcov_read_counter(), so gcov-tool can use existing code in libgcov-merge.c with minimal change. This patch does not address the suggestion of creating a new directory for libgcov. I agree with you that's a much better and cleaner structure we should go for. We can do that in follow-up patches.
I tested this patch with bootstrap and profiledbootstrap. Other tests are on-going. Thanks, -Rong

On Wed, Apr 16, 2014 at 8:34 PM, Jan Hubicka hubi...@ucw.cz wrote: GCOV_TOOL needs to use this function to read the string in the gcda file into memory to construct gcov_info objects. As you noticed, the gcov runtime does not need this interface. But gcov-tool links with the gcov runtime and it also uses the function. We could make it available in gcov_runtime, but that would slightly increase the memory footprint.

Yep, it is not really pretty. I wrote below some plan for how things may be structured in a more convenient way. Any work in that direction would be welcome.

The planned new functions for the trunk version are: (1) overlap score b/w two profiles (2) better dumping (top hot objects/functions/counters) and statistics. Once this basic version is in, we can start to add the new functionality.

Sounds good. I assume AutoFDO does not go via gcov-tool but rather uses a custom reader of profile data on the GCC side? I wonder, are there any recent overview papers/slides/text on the design of AutoFDO and the other features to be merged? I remember the talk from the GCC Summit and I did read some of the pre-LTO LIPO sources/papers, but it would be nice to have something up to date.

libgcov-util.o is built in the gcc/ directory, rather than in libgcc. It's directly linked into gcov-tool. So libgcov-util.o is built for HOST, not TARGET. With makefile changes, we can build HOST versions of libgcov-driver.o and libgcov-merge.o. I also need to make some static functions/variables public. I suppose that can go with an IN_GCOV_TOOL ifdef.

So we currently have the basic gcov io handling in gcc/gcov-io.c that is borrowed by the libgcc/libgcov* code. We will also get libgcov-util.c in the libgcc directory that is actually borrowed by the gcc/gcov-tool.c code. We now have one runtime using STDIO for streaming, and the kernel has a custom version that goes via the /proc interface (last time I looked).
We added some abstraction into libgcov-interface, which is the part that relies on STDIO, partly via the gcov-io.c code, and now we have in-memory handling code in libgcov-util. I guess it would make most sense to put all the gcov code into a new directory (libgcov) and make it a stand-alone library that can be configured 1) for the stdio based runtime as we do now 2) for a runtime missing the interface and relying on the user providing it 3) for use within gcov file manipulation tools, with a reorg of GCC/gcov/gcov-dump/gcov-tool to all use the same low-level interfaces.

In the longer term, I would like to make FDO profiling instrumentation happen at link time. For that I need to make the instrumentation code (minimal spanning tree & friends) work w/o cgraph; that would ideally be done in a shared implementation.

Won't this get a wrong answer when counters[0] is not the same? I would have expected the merging code to compare the counters first.
Re: [PATCH] offline gcda profile processing tool
On Fri, Jul 11, 2014 at 08:42:27AM -0700, Xinliang David Li wrote: I wonder why. The struct definition for gcov_fn_info has not changed in this patch. Perhaps it has been used only in C until now? Jakub
Re: [PATCH] offline gcda profile processing tool
right. Rong, the fix would be just to change the ctr array size to 1. For each function, there should be at least one kind of counter -- see the assertion in build_fn_info_type. There is some code that does 'sizeof(gcov_fn_info)' when computing heap sizes -- it can be adjusted or left as it is (if not doing memcpy for the whole array).

David

On Fri, Jul 11, 2014 at 8:44 AM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Jul 11, 2014 at 08:42:27AM -0700, Xinliang David Li wrote: I wonder why. The struct definition for gcov_fn_info has not changed in this patch. Perhaps it has been used only in C until now? Jakub
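The fix discussed here swaps the GNU zero-length trailing array for the portable size-1 spelling. A simplified sketch of the idiom and the matching heap-size adjustment (ctr_info/fn_info here are stand-ins, not the real libgcov structs):

```cpp
#include <cstdlib>

// ISO C++98 rejects a zero-size member like "ctrs[0]", so the portable
// spelling declares one element and over-allocates for the rest.
struct ctr_info { unsigned num; };
struct fn_info
{
  unsigned ident;
  ctr_info ctrs[1];   // really n entries, n >= 1
};

// Heap size: sizeof(fn_info) already includes one ctr_info, so code
// doing plain sizeof must now add n - 1 further elements.
fn_info *alloc_fn_info(unsigned n)
{
  return static_cast<fn_info *>(
    std::calloc(1, sizeof(fn_info) + (n - 1) * sizeof(ctr_info)));
}
```

Indexing past ctrs[0] is formally out of bounds per the strict standard; accepting that in exchange for C++98 portability is precisely the trade-off being made here.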
Re: [Patch, avr] Add atmel ata5782 and ata5831 devices
2014-07-11 12:50 GMT+04:00 S, Pitchumani pitchuman...@atmel.com:

This patch adds support for Atmel's ata5782 and ata5831 devices. These devices' ISA matches the AVR5 architecture, except that the text section start for these two devices is 0x8000. By default 0x0 is used as the text start address. This patch adds a device attribute for the text section start, and that is passed to the linker via the -Ttext option.

If OK, could someone commit please?

Regards,
Pitchumani

2014-07-11  Pitchumani Sivanupandi  pitchuman...@atmel.com

	* config/avr/avr-arch.h (avr_mcu_t): Add text section start attribute.
	* config/avr/avr-devices.c (AVR_MCU): Same.
	(avr_mcu_types): Add text start value to end of device list.
	* config/avr/avr-mcus.def: Add text section start for all devices.
	(ata5782): Add new avr5 device.
	(ata5831): Same.
	* config/avr/avr-tables.opt: Regenerate.
	* config/avr/avr.h: Add declaration for text section start handler.
	(EXTRA_SPEC_FUNCTIONS): Add text section start handler to SPEC
	functions.
	(LINK_SPEC): Include text section start handler in linker spec.
	* config/avr/driver-avr.c (avr_device_to_text_start): New function
	to pass the -Ttext option to the linker if the text section start
	for the device is not zero.
	* config/avr/t-multilib: Regenerate.
	* doc/avr-mmcu.texi: Regenerate.

Committed.

Denis.
Re: [patch 1/4] change specific int128 - generic intN
PSImode is 20 bits, fits in a 20-bit register, and uses 20-bit operations.

Then why do you need this change?

Because parts of the gcc code use the byte size instead of the bit size, or round up, or assume powers-of-two sizes.

- TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)));
+ TYPE_SIZE (type) = bitsize_int (GET_MODE_PRECISION (TYPE_MODE (type)));
  TYPE_SIZE_UNIT (type) = size_int (GET_MODE_SIZE (TYPE_MODE (type)));
  break;

What are GET_MODE_BITSIZE and GET_MODE_PRECISION for PSImode?

It *should* be 20 and 20 for msp430. But GET_MODE_BITSIZE returns 32, because it's a macro that does GET_MODE_SIZE * BITS_PER_UNIT, so it cannot return 20. If a type is 17-20 bits, PSImode is chosen. If it's 21 bits or larger, SImode is chosen. If it's 16 or fewer bits, HImode is chosen.

Size or precision? That's the crux of the matter. GCC typically uses size for "fits in a" tests.
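The distinction above can be mimicked with the shape of the macros involved. The constants below are illustrative stand-ins for msp430's PSImode, not GCC's actual mode tables:

```cpp
// GET_MODE_BITSIZE is derived from the storage size in bytes, so it can
// only ever be a multiple of BITS_PER_UNIT; GET_MODE_PRECISION records
// the number of meaningful bits independently of the storage size.
#define BITS_PER_UNIT 8
#define PSI_SIZE_BYTES 4        // PSImode occupies 4 bytes of storage
#define PSI_PRECISION 20        // but only 20 bits are significant

#define GET_MODE_SIZE(m)      PSI_SIZE_BYTES
#define GET_MODE_BITSIZE(m)   (GET_MODE_SIZE (m) * BITS_PER_UNIT)
#define GET_MODE_PRECISION(m) PSI_PRECISION
```

With these definitions GET_MODE_BITSIZE yields 32 while GET_MODE_PRECISION yields 20, which is why the hunk above switches TYPE_SIZE from the bitsize to the precision.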
Re: [BUILDROBOT] gcov patch
On Fri, Jul 11, 2014 at 8:06 AM, Jan Hubicka hubi...@ucw.cz wrote:

Sorry. This code was meant to work with the different mkdir API on Windows. I used the wrong ifdef. Here is the patch. OK for checkin?

OK. I also see the following with LTO bootstrap:

../../gcc/../libgcc/libgcov-util.c:41:24: error: type of ‘gcov_max_filename’ does not match original declaration [-Werror]
 extern gcov_unsigned_t gcov_max_filename;
                        ^
../../gcc/../libgcc/libgcov-driver.c:88:8: note: previously declared here
 size_t gcov_max_filename = 0;

Probably both can be size_t?

OK. I will change this too and submit. Thanks for the quick review.

-Rong

Honza

Thanks, -Rong

2014-07-11  Rong Xu  x...@google.com

	* gcov-tool.c (gcov_output_files): Fix build error.

Index: gcov-tool.c
===
--- gcov-tool.c	(revision 212448)
+++ gcov-tool.c	(working copy)
@@ -90,8 +90,8 @@ gcov_output_files (const char *out, struct gcov_in
   /* Try to make directory if it doesn't already exist.  */
   if (access (out, F_OK) == -1)
     {
-#ifdef TARGET_POSIX_IO
-      if (mkdir (out, 0755) == -1 && errno != EEXIST)
+#if !defined(_WIN32)
+      if (mkdir (out, S_IRWXU | S_IRWXG | S_IRWXO) == -1 && errno != EEXIST)

Sounds almost like something we could have libiberty glue for...

 #else
       if (mkdir (out) == -1 && errno != EEXIST)
 #endif
Re: [PATCH] offline gcda profile processing tool
I should use the _WIN32 macro. This code is for the Windows mkdir API. I'll commit a patch for this shortly (approved by Honza).

-Rong

On Fri, Jul 11, 2014 at 8:49 AM, Xinliang David Li davi...@google.com wrote: What is the macro to test POSIX IO on the host? The guard uses TARGET_POSIX_IO, which is not right. David

On Fri, Jul 11, 2014 at 1:12 AM, Christophe Lyon christophe.l...@linaro.org wrote: On 11 July 2014 10:07, Richard Biener richard.guent...@gmail.com wrote: On Mon, May 5, 2014 at 7:17 PM, Rong Xu x...@google.com wrote: Here is the updated patch. The difference with patch set 3 is (1) use the gcov-counter.def for the scaling operation. (2) fix the wrong scaling function for the time-profiler. Passed with bootstrap, profiledbootstrap and SPEC2006.

One of the patches breaks bootstrap for me:

/space/rguenther/src/svn/trunk/gcc/../libgcc/libgcov.h:184: error: ISO C++ forbids zero-size array ‘ctrs’
make[3]: *** [libgcov-util.o] Error 1

host compiler is gcc 4.3.4

Richard.

On my side, commit 212448 breaks the cross-build of GCC for targets using newlib:
* arm-none-eabi
* aarch64-none-elf

/usr/include/sys/stat.h: In function ‘void gcov_output_files(const char*, gcov_info*)’:
/usr/include/sys/stat.h:317: error: too few arguments to function ‘int mkdir(const char*, __mode_t)’
/tmp/1392958_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gcov-tool.c:96: error: at this point in file
make[2]: *** [gcov-tool.o] Error 1

Christophe.

Thanks, -Rong

On Thu, May 1, 2014 at 3:37 PM, Rong Xu x...@google.com wrote: Hi, Honza, Please find the new patch in the attachment. This patch integrated your earlier suggestions. The noticeable changes are: (1) build specialized objects for libgcov-driver.c and libgcov-merge.c and link them into gcov-tool, rather than including the source files. (2) split some gcov counter definition code out to gcov-counter.def to avoid code duplication. (3) use a macro for gcov_read_counter(), so gcov-tool can use existing code in libgcov-merge.c with minimal change.
This patch does not address the suggestion of creating a new directory for libgcov. I agree with you that's a much better and cleaner structure we should go for. We can do that in follow-up patches. I tested this patch with bootstrap and profiledbootstrap. Other tests are on-going. Thanks, -Rong

On Wed, Apr 16, 2014 at 8:34 PM, Jan Hubicka hubi...@ucw.cz wrote: GCOV_TOOL needs to use this function to read the string in the gcda file into memory to construct gcov_info objects. As you noticed, the gcov runtime does not need this interface. But gcov-tool links with the gcov runtime and it also uses the function. We could make it available in gcov_runtime, but that would slightly increase the memory footprint.

Yep, it is not really pretty. I wrote below some plan for how things may be structured in a more convenient way. Any work in that direction would be welcome.

The planned new functions for the trunk version are: (1) overlap score b/w two profiles (2) better dumping (top hot objects/functions/counters) and statistics. Once this basic version is in, we can start to add the new functionality.

Sounds good. I assume AutoFDO does not go via gcov-tool but rather uses a custom reader of profile data on the GCC side? I wonder, are there any recent overview papers/slides/text on the design of AutoFDO and the other features to be merged? I remember the talk from the GCC Summit and I did read some of the pre-LTO LIPO sources/papers, but it would be nice to have something up to date.

libgcov-util.o is built in the gcc/ directory, rather than in libgcc. It's directly linked into gcov-tool. So libgcov-util.o is built for HOST, not TARGET. With makefile changes, we can build HOST versions of libgcov-driver.o and libgcov-merge.o. I also need to make some static functions/variables public. I suppose that can go with an IN_GCOV_TOOL ifdef.

So we currently have the basic gcov io handling in gcc/gcov-io.c that is borrowed by the libgcc/libgcov* code.
We will also get libgcov-util.c in the libgcc directory that is actually borrowed by the gcc/gcov-tool.c code. We now have one runtime using STDIO for streaming, and the kernel has a custom version that goes via the /proc interface (last time I looked). We added some abstraction into libgcov-interface, which is the part that relies on STDIO, partly via the gcov-io.c code, and now we have in-memory handling code in libgcov-util. I guess it would make most sense to put all the gcov code into a new directory (libgcov) and make it a stand-alone library that can be configured 1) for the stdio based runtime as we do now 2) for a runtime missing the interface and relying on the user providing it 3) for use within gcov file manipulation tools, with a reorg of GCC/gcov/gcov-dump/gcov-tool to all use the same low-level interfaces.

In the longer term, I would like to make FDO profiling instrumentation happen at link time. For that I need to make the instrumentation code (minimal spanning tree & friends) to
Re: [PATCH] offline gcda profile processing tool
I did see the warning in the bootstrap, but it did not stop the build. I thought it was ok. I'll have a patch for this and send it for review.

-Rong

On Fri, Jul 11, 2014 at 9:13 AM, Xinliang David Li davi...@google.com wrote: right. Rong, the fix would be just to change the ctr array size to 1. For each function, there should be at least one kind of counter -- see the assertion in build_fn_info_type. There is some code that does 'sizeof(gcov_fn_info)' when computing heap sizes -- it can be adjusted or left as it is (if not doing memcpy for the whole array). David

On Fri, Jul 11, 2014 at 8:44 AM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Jul 11, 2014 at 08:42:27AM -0700, Xinliang David Li wrote: I wonder why. The struct definition for gcov_fn_info has not changed in this patch. Perhaps it has been used only in C until now? Jakub
possible negative patch for gcc/tree-ssa-loop-im.c (or removal of an assert)
Hello,

I've been looking through tree-ssa-loop-im.c (while hunting down a bug in the modula-2 front end) and found a curiosity in gcc/tree-ssa-loop-im.c. It seems that there is dead code in function determine_max_movement, as mem_ref_in_stmt can never return NULL:

static mem_ref_p
mem_ref_in_stmt (gimple stmt)
{
  ...
  gcc_assert (ref != NULL);
  return ref;
}

so the patch below could logically be applied, as the else statement is currently unreachable.

--- tree-ssa-loop-im.c.orig	2014-07-11 16:54:41.0 +0100
+++ tree-ssa-loop-im.c	2014-07-11 16:55:38.0 +0100
@@ -798,21 +798,11 @@
 	{
 	  mem_ref_p ref = mem_ref_in_stmt (stmt);
 
-	  if (ref)
-	    {
-	      lim_data->max_loop
-		= outermost_indep_loop (lim_data->max_loop, loop, ref);
-	      if (!lim_data->max_loop)
-		return false;
-	    }
-	  else
-	    {
-	      if ((val = gimple_vuse (stmt)) != NULL_TREE)
-		{
-		  if (!add_dependency (val, lim_data, loop, false))
-		    return false;
-		}
-	    }
+	  gcc_assert (ref != NULL);
+	  lim_data->max_loop
+	    = outermost_indep_loop (lim_data->max_loop, loop, ref);
+	  if (!lim_data->max_loop)
+	    return false;
 	}
 
       lim_data->cost += stmt_cost (stmt);

However my question is whether the assert in mem_ref_in_stmt is correct? Since the author of determine_max_movement must have thought ref could be NULL. Anyhow, it seems that either the above patch should be applied or the 'gcc_assert (ref != NULL);' from mem_ref_p should be removed.

regards,
Gaius
Re: update address taken: don't drop clobbers
On 07/11/14 06:05, Michael Matz wrote: Hi, On Thu, 10 Jul 2014, Jeff Law wrote: The insight to note is, that undefined SSA names should really be coalesced with something (otherwise you lost an optimization opportunity), but it doesn't matter with _what_ each use of the undefined name is coalesced, you can even identify different uses of them with different SSA names (e.g. the LHS of each using stmt). Requires some change in the order things are done in out-of-ssa. The last part is what I hinted might be problematical. If some undefined SSA_NAME appears on the RHS of two PHIs and we want to coalesce that undefined SSA_NAME with the LHS of each of those PHIs, then the LHS of those two PHIs must coalesce as well. At least that's my recollection of how all that stuff worked. Only with the usual definition of coalescing (being transitive). For undefined SSA names the definition can be mended. My recollection is the transitive nature of coalescing is baked into our implementation. I'd be happy to be proved wrong :-) Jeff
Patch to fix a bug in LRA inheritance
The following patch fixes a bug in LRA inheritance found while compiling a big file on s390x. Unfortunately, the bug is very hard to reproduce and the test is too big and cannot be included. But even if it were included, checking correct code generation would be a problem too.

The bug was in undoing an inheritance transformation. Not all inheritance pseudo occurrences were removed, which resulted in reading garbage from memory.

The patch was successfully bootstrapped and tested on x86-64. Committed to the trunk as rev. 212464.

2014-07-11  Vladimir Makarov  vmaka...@redhat.com

	* lra-constraints.c (remove_inheritance_pseudos): Process
	destination pseudo too.

Index: lra-constraints.c
===
--- lra-constraints.c	(revision 212460)
+++ lra-constraints.c	(working copy)
@@ -5697,6 +5697,20 @@ remove_inheritance_pseudos (bitmap remov
 		    SUBREG_REG (SET_SRC (set)) = SET_SRC (prev_set);
 		  else
 		    SET_SRC (set) = SET_SRC (prev_set);
+		  /* As we are finishing with processing the insn
+		     here, check the destination too as it might be an
+		     inheritance pseudo for another pseudo.  */
+		  if (bitmap_bit_p (remove_pseudos, dregno)
+		      && bitmap_bit_p (lra_inheritance_pseudos, dregno)
+		      && (restore_regno
+			  = lra_reg_info[dregno].restore_regno) >= 0)
+		    {
+		      if (GET_CODE (SET_DEST (set)) == SUBREG)
+			SUBREG_REG (SET_DEST (set))
+			  = regno_reg_rtx[restore_regno];
+		      else
+			SET_DEST (set) = regno_reg_rtx[restore_regno];
+		    }
 		  lra_push_insn_and_update_insn_regno_info (curr_insn);
 		  lra_set_used_insn_alternative_by_uid
 		    (INSN_UID (curr_insn), -1);
Re: possible negative patch for gcc/tree-ssa-loop-im.c (or removal of an assert)
Gaius Mulley gaius.mul...@southwales.ac.uk writes:

Hello, I've been looking through tree-ssa-loop-im.c (while hunting down a bug in the modula-2 front end) and found a curiosity in gcc/tree-ssa-loop-im.c. It seems that there is dead code in function determine_max_movement, as mem_ref_in_stmt can never return NULL:

static mem_ref_p
mem_ref_in_stmt (gimple stmt)
{
  ...
  gcc_assert (ref != NULL);
  return ref;
}

so the patch below could logically be applied, as the else statement is currently unreachable.

--- tree-ssa-loop-im.c.orig	2014-07-11 16:54:41.0 +0100
+++ tree-ssa-loop-im.c	2014-07-11 16:55:38.0 +0100
@@ -798,21 +798,11 @@
 	{
 	  mem_ref_p ref = mem_ref_in_stmt (stmt);
 
-	  if (ref)
-	    {
-	      lim_data->max_loop
-		= outermost_indep_loop (lim_data->max_loop, loop, ref);
-	      if (!lim_data->max_loop)
-		return false;
-	    }
-	  else
-	    {
-	      if ((val = gimple_vuse (stmt)) != NULL_TREE)
-		{
-		  if (!add_dependency (val, lim_data, loop, false))
-		    return false;
-		}
-	    }
+	  gcc_assert (ref != NULL);
+	  lim_data->max_loop
+	    = outermost_indep_loop (lim_data->max_loop, loop, ref);
+	  if (!lim_data->max_loop)
+	    return false;
 	}
 
       lim_data->cost += stmt_cost (stmt);

However my question is whether the assert in mem_ref_in_stmt is correct? Since the author of determine_max_movement must have thought ref could be NULL. Anyhow, it seems that either the above patch should be applied or the 'gcc_assert (ref != NULL);' from mem_ref_p should be removed.

apologies, this should read:

Anyhow, it seems that either the above patch should be applied or the 'gcc_assert (ref != NULL);' from mem_ref_in_stmt should be removed.

regards,
Gaius
Fix ICE in ipa-devirt while building firefox
Hi,

The Firefox build ICEs in ipa-devirt (types_same_for_odr) because it is not able to establish ODR equivalency for non-polymorphic types. This is because ipa-prop and ipa-cp do call get_binfo_at_offset where they really want to propagate on types and offsets. Until this is fixed, this patch avoids the ICE by not doing the propagation when we don't know what to do.

Bootstrapped/regtested x86_64-linux, committed.

Honza

	* ipa-prop.c (ipa_binfo_from_known_type_jfunc): In LTO do not
	walk non-polymorphic types.
	* ipa-cp.c (ipa_get_jf_ancestor_result): Likewise.
	* ipa-devirt.c (types_same_for_odr): Do not explode when one
	of the types is not polymorphic.

Index: ipa-prop.c
===
--- ipa-prop.c	(revision 212457)
+++ ipa-prop.c	(working copy)
@@ -560,6 +560,19 @@ ipa_binfo_from_known_type_jfunc (struct
   if (!base_binfo)
     return NULL_TREE;
+  /* FIXME: At LTO we can't propagate to non-polymorphic type, because
+     we have no ODR equivalency on those.  This should be fixed by
+     propagating on types rather than binfos that would make type
+     matching here unnecessary.  */
+  if (in_lto_p
+      && (TREE_CODE (jfunc->value.known_type.component_type) != RECORD_TYPE
+	  || !TYPE_BINFO (jfunc->value.known_type.component_type)
+	  || !BINFO_VTABLE (TYPE_BINFO (jfunc->value.known_type.component_type))))
+    {
+      if (!jfunc->value.known_type.offset)
+	return base_binfo;
+      return NULL;
+    }
   return get_binfo_at_offset (base_binfo,
			      jfunc->value.known_type.offset,
			      jfunc->value.known_type.component_type);

Index: ipa-cp.c
===
--- ipa-cp.c	(revision 212457)
+++ ipa-cp.c	(working copy)
@@ -789,6 +789,19 @@ ipa_get_jf_ancestor_result (struct ipa_j
 {
   if (!ipa_get_jf_ancestor_type_preserved (jfunc))
     return NULL;
+  /* FIXME: At LTO we can't propagate to non-polymorphic type, because
+     we have no ODR equivalency on those.  This should be fixed by
+     propagating on types rather than binfos that would make type
+     matching here unnecessary.
+  */
+  if (in_lto_p
+      && (TREE_CODE (ipa_get_jf_ancestor_type (jfunc)) != RECORD_TYPE
+	  || !TYPE_BINFO (ipa_get_jf_ancestor_type (jfunc))
+	  || !BINFO_VTABLE (TYPE_BINFO (ipa_get_jf_ancestor_type (jfunc)))))
+    {
+      if (!ipa_get_jf_ancestor_offset (jfunc))
+	return input;
+      return NULL;
+    }
   return get_binfo_at_offset (input,
			      ipa_get_jf_ancestor_offset (jfunc),
			      ipa_get_jf_ancestor_type (jfunc));

Index: ipa-devirt.c
===
--- ipa-devirt.c	(revision 212457)
+++ ipa-devirt.c	(working copy)
@@ -341,6 +341,20 @@ types_same_for_odr (const_tree type1, co
       || type_in_anonymous_namespace_p (type2))
     return false;
+  /* See if types are obviously different (i.e. different codes
+     or polymorphic wrt non-polymorphic).  This is not strictly correct
+     for ODR violating programs, but we can't do better without streaming
+     ODR names.  */
+  if (TREE_CODE (type1) != TREE_CODE (type2))
+    return false;
+  if (TREE_CODE (type1) == RECORD_TYPE
+      && (TYPE_BINFO (type1) == NULL_TREE) != (TYPE_BINFO (type2) == NULL_TREE))
+    return false;
+  if (TREE_CODE (type1) == RECORD_TYPE && TYPE_BINFO (type1)
+      && (BINFO_VTABLE (TYPE_BINFO (type1)) == NULL_TREE)
+	 != (BINFO_VTABLE (TYPE_BINFO (type2)) == NULL_TREE))
+    return false;
+
   /* At the moment we have no way to establish ODR equivalence at LTO
      other than comparing virtual table pointers of polymorphic types.
      Eventually we should start saving mangled names in TYPE_NAME.
Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations
Ping.

On Thu, Jun 26, 2014 at 10:54 AM, Sriraman Tallam tmsri...@google.com wrote: Hi Uros, Could you please review this patch? Thanks Sri

On Fri, Jun 20, 2014 at 5:17 PM, Sriraman Tallam tmsri...@google.com wrote: Patch Updated. Sri

On Mon, Jun 9, 2014 at 3:55 PM, Sriraman Tallam tmsri...@google.com wrote: Ping.

On Mon, May 19, 2014 at 11:11 AM, Sriraman Tallam tmsri...@google.com wrote: Ping.

On Thu, May 15, 2014 at 11:34 AM, Sriraman Tallam tmsri...@google.com wrote:

Optimize access to globals with -fpie, x86_64 only:

Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the module using the GOT. This takes two instructions, one to get the address of the global from the GOT and the other to get the value. If it turns out that the global gets defined in the executable at link-time, it still needs to go through the GOT, as it is too late then to generate a direct access.

Examples:

foo.cc
--
int a_glob;
int main () {
  return a_glob; // defined in this file
}

With -O2 -fpie -pie, the generated code directly accesses the global via a PC-relative insn:

5e0 main:  mov 0x165a(%rip),%eax  # 1c40 <a_glob>

foo.cc
--
extern int a_glob; // defined in another file
int main () {
  return a_glob;
}

With -O2 -fpie -pie, the generated code accesses the global via the GOT, using two memory loads:

6f0 main:  mov 0x1609(%rip),%rax  # 1d00 <_DYNAMIC+0x230>
           mov (%rax),%eax

This is true even if, in the latter case, the global was defined in the executable through a different file. Some experiments on google benchmarks show that the extra memory loads affect performance by 1% to 5%.

Solution - Copy Relocations:

When the linker supports copy relocations, GCC can always assume that the global will be defined in the executable. For globals that are truly extern (they come from shared objects), the linker will create copy relocations and have them defined in the executable. The result is that no global access needs to go through the GOT, which improves performance.
This patch to the gold linker: https://sourceware.org/ml/binutils/2014-05/msg00092.html, submitted recently, allows gold to generate copy relocations for -pie mode when necessary. I have added the option -mld-pie-copyrelocs which, when combined with -fpie, would do this. Note that the BFD linker does not support pie copy relocations yet, and this option cannot be used there.

Please review.

ChangeLog:

	* config/i386/i386.opt (mld-pie-copyrelocs): New option.
	* config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
	address is still legitimate in the presence of copy relocations
	and -fpie.
	* testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
	* testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.

Patch attached.

Thanks
Sri
Re: Use separate sections to stream non-trivial constructors
Hi,

this is the variant of the patch I committed. I noticed that partitioning actually calls ctor_for_folding just to figure out whether the constant value may be used, which drags every read-only variable ctor into memory at WPA time. So now we have a separate predicate, varpool_ctor_useable_for_folding_p, to check whether a ctor can be used, and ctor_for_folding to drag in the actual value. This actually happens too late to show in the LTO report. In a follow-up patch I will add a timevar for the delayed streaming so we have some idea how often it triggers and how much memory it takes.

Bootstrapped/regtested x86_64-linux and committed.

Honza

	* varpool.c: Include tree-ssa-alias.h, gimple.h and lto-streamer.h.
	(varpool_get_constructor): New function.
	(varpool_ctor_useable_for_folding_p): Break out from ...
	(ctor_for_folding): ... here; use varpool_get_constructor.
	(varpool_assemble_decl): Likewise.
	* lto-streamer.h (struct output_block): Turn cgraph_node into
	symbol field.
	(lto_input_variable_constructor): Declare.
	* ipa-visibility.c (function_and_variable_visibility): Use
	varpool_get_constructor.
	* cgraph.h (varpool_get_constructor): Declare.
	(varpool_ctor_useable_for_folding_p): New function.
	* lto-streamer-out.c (get_symbol_initial_value): Take encoder
	parameter; return error_mark_node for non-trivial constructors.
	(lto_write_tree_1, DFS_write_tree): Update use of
	get_symbol_initial_value.
	(output_function): Update initialization of symbol.
	(output_constructor): New function.
	(copy_function): Rename to ...
	(copy_function_or_variable): ... this one; handle vars too.
	(lto_output): Output variable sections.
	* lto-streamer-in.c (input_constructor): New function.
	(lto_read_body): Rename to ...
	(lto_read_body_or_constructor): ... this one; handle vars too.
	(lto_input_variable_constructor): New function.
	* ipa-prop.c (ipa_prop_write_jump_functions,
	ipa_prop_write_all_agg_replacement): Update.
	* lto-cgraph.c (compute_ltrans_boundary): Use it.
	(output_cgraph_opt_summary): Set symbol to NULL.
	* lto-partition.c (add_references_to_partition): Use
	varpool_ctor_useable_for_folding_p.
	* lto.c (lto_read_in_decl_state): Update sanity check.

Index: ipa-visibility.c
===================================================================
--- ipa-visibility.c	(revision 212457)
+++ ipa-visibility.c	(working copy)
@@ -686,6 +686,8 @@ function_and_variable_visibility (bool w
 	  if (found)
 	    {
 	      struct pointer_set_t *visited_nodes = pointer_set_create ();
+
+	      varpool_get_constructor (vnode);
 	      walk_tree (DECL_INITIAL (vnode->decl),
 			 update_vtable_references, NULL, visited_nodes);
 	      pointer_set_destroy (visited_nodes);

Index: lto/lto.c
===================================================================
--- lto/lto.c	(revision 212457)
+++ lto/lto.c	(working copy)
@@ -236,7 +236,7 @@ lto_read_in_decl_state (struct data_in *
       ix = *data++;
       decl = streamer_tree_cache_get_tree (data_in->reader_cache, ix);
-      if (TREE_CODE (decl) != FUNCTION_DECL)
+      if (!VAR_OR_FUNCTION_DECL_P (decl))
 	{
 	  gcc_assert (decl == void_type_node);
 	  decl = NULL_TREE;

Index: lto/lto-partition.c
===================================================================
--- lto/lto-partition.c	(revision 212457)
+++ lto/lto-partition.c	(working copy)
@@ -96,7 +96,7 @@ add_references_to_partition (ltrans_part
 	 Recursively look into the initializers of the constant variable
	 and add references, too.
	 */
      else if (is_a <varpool_node *> (ref->referred)
-	       && ctor_for_folding (ref->referred->decl) != error_mark_node
+	       && varpool_ctor_useable_for_folding_p (varpool (ref->referred))
	       && !lto_symtab_encoder_in_partition_p (part->encoder,
						      ref->referred))
	{
	  if (!part->initializers_visited)

Index: lto-cgraph.c
===================================================================
--- lto-cgraph.c	(revision 212457)
+++ lto-cgraph.c	(working copy)
@@ -867,7 +867,7 @@ compute_ltrans_boundary (lto_symtab_enco
	{
	  if (!lto_symtab_encoder_encode_initializer_p (encoder, vnode)
-	      && ctor_for_folding (vnode->decl) != error_mark_node)
+	      && varpool_ctor_useable_for_folding_p (vnode))
	    {
	      lto_set_symtab_encoder_encode_initializer (encoder, vnode);
	      add_references (encoder, vnode);
@@ -1808,7 +1808,7 @@ output_cgraph_opt_summary (void)
   struct output_block *ob = create_output_block (LTO_section_cgraph_opt_sum);
   unsigned count = 0;
-  ob->cgraph_node = NULL;
+  ob->symbol = NULL;
   encoder = ob->decl_state->symtab_node_encoder;
   n_nodes =
Re: [C++ Patch/RFC] Back to PR 53159
On 07/10/2014 06:38 PM, Paolo Carlini wrote:
>    if (SCALAR_TYPE_P (type))
> +    if (!CLASS_TYPE_P (TREE_TYPE (init)))

Why this change? check_narrowing only deals with scalar types.

Jason
Re: [C++ Patch] PR 53159 (Take 2)
Oops, this patch didn't thread with the earlier one...

On 07/11/2014 06:15 AM, Paolo Carlini wrote:
>    if (SCALAR_TYPE_P (type))

Is the condition still necessary?

> -      check_narrowing (type, init);
> +      flags |= LOOKUP_NO_NARROWING;

Jason
Re: [C++ Patch] PR 53159 (Take 2)
Hi,

On 07/11/2014 07:58 PM, Jason Merrill wrote:
> Oops, this patch didn't thread with the earlier one...
Yeah, sorry, I considered my drafts of yesterday evening a distraction and wanted to start a fresh thread with a complete (more meaningful) new proposal. Probably not a good idea, after all, confusing...

> On 07/11/2014 06:15 AM, Paolo Carlini wrote:
>>    if (SCALAR_TYPE_P (type))
> Is the condition still necessary?
I'm removing it and retesting. In any case, (the eventual) check_narrowing will return immediately if !ARITHMETIC_TYPE_P (type) is true...

Would the amended patch be OK? Thanks!

Paolo.
Re: [PATCH] offline gcda profile processing tool
Richard,

I looked at my patch again. I already add -Wno-error to the libgcov-util.o compilation: line 200 of gcc/Makefile.in has

  libgcov-util.o-warn = -Wno-error

In my test, I used gcc-4.6 as the host compiler. I got a warning like this:

  In file included from ../../gcc/gcc/../libgcc/libgcov-util.c:30:0:
  ../../gcc/gcc/../libgcc/libgcov.h:184:30: warning: ISO C++ forbids zero-size array 'ctrs' [-pedantic]

Can you check your build log to see if -Wno-error is added to the command line?

Thanks,
-Rong

On Fri, Jul 11, 2014 at 9:47 AM, Rong Xu x...@google.com wrote:
I did see the warning in the bootstrap, but it did not exit the build. I thought it was OK. I'll have a patch for this and send it for review.
-Rong

On Fri, Jul 11, 2014 at 9:13 AM, Xinliang David Li davi...@google.com wrote:
Right. Rong, the fix would be to just change the ctr array size to 1. For each function, there should be at least one kind of counter -- see the assertion in build_fn_info_type. There is some code that does 'sizeof(gcov_fn_info)' when computing heap size -- it can be adjusted or left as-is (if not doing memcpy for the whole array).
David

On Fri, Jul 11, 2014 at 8:44 AM, Jakub Jelinek ja...@redhat.com wrote:
On Fri, Jul 11, 2014 at 08:42:27AM -0700, Xinliang David Li wrote:
I wonder why. The struct definition for gcov_fn_info has not changed in this patch.
Perhaps it has been used only in C until now?
Jakub
Re: Use separate sections to stream non-trivial constructors
Hi,

this is the patch I am going to commit after testing. It removes the DECL_INIT_IO timevar, which guards only one variable set (so it hardly measures anything), and moves GIMPLE_IN to the proper place. It also adds CTORS_IN and CTORS_OUT. I get:

 ipa lto gimple out       :   0.37 ( 0%) usr   0.21 ( 3%) sys   0.64 ( 1%) wall       0 kB ( 0%) ggc
 ipa lto decl in          :  23.56 (26%) usr   1.24 (15%) sys  24.81 (23%) wall 2429174 kB (60%) ggc
 ipa lto decl out         :   5.58 ( 6%) usr   0.34 ( 4%) sys   5.94 ( 5%) wall       0 kB ( 0%) ggc
 ipa lto constructors in  :   0.34 ( 0%) usr   0.10 ( 1%) sys   0.47 ( 0%) wall   14864 kB ( 0%) ggc
 ipa lto constructors out :   0.06 ( 0%) usr   0.01 ( 0%) sys   0.01 ( 0%) wall       0 kB ( 0%) ggc
 ipa lto cgraph I/O       :   1.20 ( 1%) usr   0.25 ( 3%) sys   1.45 ( 1%) wall  437317 kB (11%) ggc

For Firefox WPA that is surprisingly good. We traded about 400MB for 14MB. So perhaps I do not need to care about not bringing in all vtables needed for the devirtualization machinery.

Honza

	* lto.c (read_cgraph_and_symbols): Do not push DECL_INIT_IO timevar.
	(materialize_cgraph): Do not push GIMPLE_IN timevar.
	* timevar.def (TV_IPA_LTO_DECL_INIT_IO): Remove.
	(TV_IPA_LTO_CTORS_IN, TV_IPA_LTO_CTORS_OUT): New timevars.
	* cgraph.c (cgraph_get_body): Push GIMPLE_IN timevar.
	* varpool.c (varpool_get_constructor): Push CTORS_IN timevar.
	* lto-streamer-out.c (lto_output): Push TV_IPA_LTO_CTORS_OUT timevar.

Index: lto/lto.c
===================================================================
--- lto/lto.c	(revision 212467)
+++ lto/lto.c	(working copy)
@@ -3094,12 +3094,9 @@ read_cgraph_and_symbols (unsigned nfiles
   timevar_pop (TV_IPA_LTO_CGRAPH_MERGE);
-  timevar_push (TV_IPA_LTO_DECL_INIT_IO);
-
   /* Indicate that the cgraph is built and ready.  */
   cgraph_function_flags_ready = true;
-  timevar_pop (TV_IPA_LTO_DECL_INIT_IO);
-
   ggc_free (all_file_decl_data);
   all_file_decl_data = NULL;
 }
@@ -3117,9 +3114,6 @@ materialize_cgraph (void)
     fprintf (stderr, flag_wpa ?
	     "Materializing decls:" : "Reading function bodies:");
-  /* Now that we have input the cgraph, we need to clear all of the aux
-     nodes and read the functions if we are not running in WPA mode.  */
-  timevar_push (TV_IPA_LTO_GIMPLE_IN);
   FOR_EACH_FUNCTION (node)
     {
@@ -3130,7 +3124,6 @@ materialize_cgraph (void)
	}
     }
-  timevar_pop (TV_IPA_LTO_GIMPLE_IN);
   /* Start the appropriate timer depending on the mode that
      we are operating in.  */

Index: timevar.def
===================================================================
--- timevar.def	(revision 212457)
+++ timevar.def	(working copy)
@@ -77,7 +77,8 @@ DEFTIMEVAR (TV_IPA_LTO_GIMPLE_IN ,
 DEFTIMEVAR (TV_IPA_LTO_GIMPLE_OUT    , "ipa lto gimple out")
 DEFTIMEVAR (TV_IPA_LTO_DECL_IN       , "ipa lto decl in")
 DEFTIMEVAR (TV_IPA_LTO_DECL_OUT      , "ipa lto decl out")
-DEFTIMEVAR (TV_IPA_LTO_DECL_INIT_IO  , "ipa lto decl init I/O")
+DEFTIMEVAR (TV_IPA_LTO_CTORS_IN      , "ipa lto constructors in")
+DEFTIMEVAR (TV_IPA_LTO_CTORS_OUT     , "ipa lto constructors out")
 DEFTIMEVAR (TV_IPA_LTO_CGRAPH_IO     , "ipa lto cgraph I/O")
 DEFTIMEVAR (TV_IPA_LTO_DECL_MERGE    , "ipa lto decl merge")
 DEFTIMEVAR (TV_IPA_LTO_CGRAPH_MERGE  , "ipa lto cgraph merge")

Index: cgraph.c
===================================================================
--- cgraph.c	(revision 212457)
+++ cgraph.c	(working copy)
@@ -3053,6 +3053,8 @@ cgraph_get_body (struct cgraph_node *nod
   gcc_assert (in_lto_p);
+  timevar_push (TV_IPA_LTO_GIMPLE_IN);
+
   file_data = node->lto_file_data;
   name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
@@ -3076,6 +3078,9 @@ cgraph_get_body (struct cgraph_node *nod
   lto_free_section_data (file_data, LTO_section_function_body, name,
			 data, len);
   lto_free_function_in_decl_state_for_node (node);
+
+  timevar_pop (TV_IPA_LTO_GIMPLE_IN);
+
   return true;
 }

Index: varpool.c
===================================================================
--- varpool.c	(revision 212467)
+++ varpool.c	(working copy)
@@ -268,6 +268,8 @@ varpool_get_constructor (struct varpool_
       || !in_lto_p)
     return DECL_INITIAL (node->decl);
+  timevar_push (TV_IPA_LTO_CTORS_IN);
+
   file_data = node->lto_file_data;
   name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
@@ -286,6 +288,7 @@ varpool_get_constructor (struct varpool_
   lto_free_section_data (file_data, LTO_section_function_body, name,
			 data, len);
   lto_free_function_in_decl_state_for_node (node);
+  timevar_pop (TV_IPA_LTO_CTORS_IN);
   return DECL_INITIAL (node->decl);
 }

Index: lto-streamer-out.c
===================================================================
--- lto-streamer-out.c	(revision 212467)
+++ lto-streamer-out.c	(working copy)
@@ -2115,6 +2115,7 @@ lto_output (void)
	      lto_symtab_encoder_encode_initializer_p (encoder, node)