Re: [google][4.6] Bug fixes to function reordering linker plugin to handle local and comdat functions. (issue 5851044)
Committed to google/gcc-4_6 after validation. On 2012/03/21 05:07:33, davidxl wrote: ok for google branches after checkin validation. David http://codereview.appspot.com/5851044/
Re: [google][4.6] Bug fixes to function reordering linker plugin to handle local and comdat functions. (issue 5851044)
Uploaded new patch. On 2012/03/20 19:25:38, davidxl wrote: It would be nice to add some unit/regression test cases of some sort. Made the existing unit test case check the final layout. David http://codereview.appspot.com/5851044/diff/1/callgraph.c File callgraph.c (right): http://codereview.appspot.com/5851044/diff/1/callgraph.c#newcode309 callgraph.c:309: if (!is_prefix_of (_ZL, name)) How about static functions in namespace? How about functions in anonymous namespace? Thanks for pointing this out. One solution is to add more plugin interfaces to plugin-api.h to find the section flags and the section group. This way, comdats can be detected. All other duplicates can be treated as file static functions. I marked this as TODO for now. I am also thinking of sending a patch to generate unique section names for file static functions. This will help --section-ordering-file in gold too. http://codereview.appspot.com/5851044/diff/1/callgraph.c#newcode511 callgraph.c:511: .text. }; How are the sections ordered in the array? Keep it in mind that it is possible to encode the actual profile count of the function in the section name in the future. Right, for now this parsing will work. The parsing needs to be updated once the section names change. http://codereview.appspot.com/5851044/
Re: [google][4.6] Bug fixes to function reordering linker plugin to handle local and comdat functions. (issue 5851044)
On 2012/03/20 19:26:26, eraman wrote: http://codereview.appspot.com/5851044/diff/1/callgraph.c File callgraph.c (right): http://codereview.appspot.com/5851044/diff/1/callgraph.c#newcode513 callgraph.c:513: const int section_priority[] = {0, 3, 4, 2, 1}; Add a comment about section_priority Done. http://codereview.appspot.com/5851044/diff/1/callgraph.c#newcode571 callgraph.c:571: if (section_priority[kept-section_type] Add an example that shows why we want to do that I added a comment. http://codereview.appspot.com/5851044/diff/1/callgraph.c#newcode655 callgraph.c:655: write_out_node (n_it-name, section_start[0], section_end[0]); In write_out_node, why take the function name and do a hash table lookup to get the section, instead of directly passing Section_id * in the caller. In all calls to write_out_node, you are in fact getting the name from the Section_id *. That is not true, I pass the Node * in the first two calls. What you said holds for the other calls. So, I made this more efficient. http://codereview.appspot.com/5851044/diff/1/callgraph.c#newcode674 callgraph.c:674: s_it-processed = 1; setting processed to 1 is redundant as it is already done in write_out_node. Done. http://codereview.appspot.com/5851044/diff/1/callgraph.h File callgraph.h (right): http://codereview.appspot.com/5851044/diff/1/callgraph.h#newcode31 callgraph.h:31: void push_mm_ptr (void *ptr); push_allocated_ptr or save_allocated_ptr would be a better name. Done. http://codereview.appspot.com/5851044/diff/1/callgraph.h#newcode48 callgraph.h:48: push_mm_ptr (list); It might be cleaner to create a wrapper around XNEW that calls push_mm_ptr after XNEW. Similar for malloc as well. Done. http://codereview.appspot.com/5851044/
Re: [google][4.6]Add new target builtin to check for amdfam15h processors (issue 5535046)
New patch uploaded with the changes. * i386.c (IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1): New enum value. (IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2): New enum value. (IX86_BUILTIN_CPU_IS_AMDFAM10H): Rename IX86_BUILTIN_CPU_IS_AMDFAM10. (IX86_BUILTIN_CPU_IS_AMDFAM10H_BARCELONA): Rename IX86_BUILTIN_CPU_IS_AMDFAM10_BARCELONA. (IX86_BUILTIN_CPU_IS_AMDFAM10H_SHANGHAI): Rename IX86_BUILTIN_CPU_IS_AMDFAM10_SHANGHAI. (IX86_BUILTIN_CPU_IS_AMDFAM10H_ISTANBUL): Rename IX86_BUILTIN_CPU_IS_AMDFAM10_ISTANBUL. (fold_builtin_cpu): Process IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER1 and IX86_BUILTIN_CPU_IS_AMDFAM15H_BDVER2. (M_AMDFAM10H): Rename M_AMDFAM10. (M_AMDFAM10H_BARCELONA): Rename M_AMDFAM10_BARCELONA. (M_AMDFAM10H_SHANGHAI): Rename M_AMDFAM10_SHANGHAI. (M_AMDFAM10H_ISTANBUL): Rename M_AMDFAM10_ISTANBUL. (ix86_init_platform_type_builtins): Make new builtins __builtin_cpu_is_amdfam15_bdver1 and __builtin_cpu_is_amdfam15_bdver2. (ix86_expand_builtin): Expand IX86_BUILTIN_CPU_IS_AMDFAM15_BDVER1 and IX86_BUILTIN_CPU_IS_AMDFAM15_BDVER2. * testsuite/gcc.target/builtin_target.c (fn1): Call new builtins. * i386-cpuinfo.c (__cpu_is_amdfam15h_bdver1): New member in __cpu_model struct. (__cpu_is_amdfam15h_bdver2): New member in __cpu_model struct. (__cpu_model): Rename __cpu_is_amdfam10 to __cpu_is_amdfam10h. Rename __cpu_is_amdfam10_barcelona to __cpu_is_amdfam10h_barcelona. Rename __cpu_is_amdfam10_shanghai to __cpu_is_amdfam10h_shanghai. Rename __cpu_is_amdfam10_istanbul to __cpu_is_amdfam10h_istanbul. (get_amd_cpu): Check for family 15h processors. (cpu_indicator_init): Adjust model and family for AMD processors. Refactor code. Thanks, -Sri. On 2012/01/11 00:05:01, davidxl wrote: http://codereview.appspot.com/5535046/diff/1/gcc/config/i386/i386.c File gcc/config/i386/i386.c (right): http://codereview.appspot.com/5535046/diff/1/gcc/config/i386/i386.c#newcode26032 gcc/config/i386/i386.c:26032: +M_AMDFAM15, Maybe better to change 10 to 10H, and 15 to 15H in the name as the number is hex. Why not split the enum for family 15h into M_AMDFAM15H_BDVER1 and .._BVER2? http://codereview.appspot.com/5535046/diff/1/libgcc/config/i386/i386-cpuinfo.c File libgcc/config/i386/i386-cpuinfo.c (right): http://codereview.appspot.com/5535046/diff/1/libgcc/config/i386/i386-cpuinfo.c#newcode102 libgcc/config/i386/i386-cpuinfo.c:102: break; No family15h model encoding? http://codereview.appspot.com/5535046/
Re: [google][4.6]Compiler Directed Multiversioning with new -mvarch option (issue 5490054)
I have uploaded a new patch set with all the mentioned changes made. If a function has the target attribute it will not be touched by the autoclone pass. Also fixed some test cases which were broken because the clone names used '_' instead of '.' for suffixing. On 2011/12/16 19:39:47, davidxl wrote: http://codereview.appspot.com/5490054/diff/1011/config/i386/i386.c File config/i386/i386.c (right): http://codereview.appspot.com/5490054/diff/1011/config/i386/i386.c#newcode26569 config/i386/i386.c:26569: +mversion_for_core2 (tree *optimization_node, - mversionable_for_core2_p ? http://codereview.appspot.com/5490054/diff/1011/mversn-dispatch.c File mversn-dispatch.c (right): http://codereview.appspot.com/5490054/diff/1011/mversn-dispatch.c#newcode931 mversn-dispatch.c:931: DECL_STATIC_DESTRUCTOR (new_decl) = 0; Should you assert it instead? Should not clone ctor/dtors. http://codereview.appspot.com/5490054/diff/1011/mversn-dispatch.c#newcode2221 mversn-dispatch.c:2221: VEC_truncate (edge, EXIT_BLOCK_PTR-preds, 0); {} -- remove http://codereview.appspot.com/5490054/diff/1011/mversn-dispatch.c#newcode2389 mversn-dispatch.c:2389: How does it interact with manual multi-versioning from user? You probably don't want to clone functions that are marked with target attributes (explicitly -- not implied from command line). http://codereview.appspot.com/5490054/
Re: [4.6][google]Compiler Directed Function Multiversioning (issue 5477048)
Reviewers: davidxl, Message: I have uploaded a new patch. I did not split the patch into two because I felt the ifunc dispatch patch does nothing as a standalone. The framework now will do aggressive cloning and it will mark the clones specialized for core2 with -mtune=core2. I have already submitted another patch that will optimize vectorization for core2. I also tested Teresa's RAT stalls avoidance in my MV framework and it works great. I removed that from the current patch as I thought that should go in as a separate patch. Pending: Test case. On 2011/12/09 07:49:57, davidxl wrote: It might be better to split this into two patches: 1) ifunc dispatch patch 2) auto cloning patch The auto cloning framework is good in general, the main problem potential disconnection of early cloning decisions and later transformations. the logic of deciding if loops are vectorizable is not the same as that used in the actual vectorization pass which can lead to either false positives (unnecessary clones) and false negatives (missing clones of vectorizable loops). The false negatives are entirely possible because later transformations such as if-cvt can expose more opportunities. To alleviate the problem, it would be more powerful if the framework allows more aggressive (speculative) cloning, and undo some of the cloning later when they turn out to be not needed. This is possible with some book keeping. David http://codereview.appspot.com/5477048/diff/1/config/i386/i386.c File config/i386/i386.c (right): http://codereview.appspot.com/5477048/diff/1/config/i386/i386.c#newcode26331 config/i386/i386.c:26331: + THe -- The. http://codereview.appspot.com/5477048/diff/1/config/i386/i386.c#newcode26365 config/i386/i386.c:26365: + tree-vect-stmts.cc. The loop is not vectorizable even if a single stmt is not even if? http://codereview.appspot.com/5477048/diff/1/config/i386/i386.c#newcode26367 config/i386/i386.c:26367: + You need a good explanation here why a common utility function can not be used, but 'duplicate' the check here. http://codereview.appspot.com/5477048/diff/1/config/i386/i386.c#newcode26524 config/i386/i386.c:26524: + if (ix86_mv_arch_string Using hardcoded target name here is not ideal -- better encode this attribute in the target/processor model array. For now better add a comment here. http://codereview.appspot.com/5477048/diff/1/config/i386/i386.c#newcode26544 config/i386/i386.c:26544: + if (*cond_func_decl == NULL) Add a small example in the comment here. Also better use a helper function. http://codereview.appspot.com/5477048/diff/1/mversn-dispatch.c File mversn-dispatch.c (right): http://codereview.appspot.com/5477048/diff/1/mversn-dispatch.c#newcode2330 mversn-dispatch.c:2330: static unsigned int do_auto_clone (void) Start a new line http://codereview.appspot.com/5477048/diff/1/mversn-dispatch.c#newcode2371 mversn-dispatch.c:2371: return 0; Fix indentation http://codereview.appspot.com/5477048/diff/1/mversn-dispatch.c#newcode2404 mversn-dispatch.c:2404: NULL, fn_ver_addr_chain); indentation http://codereview.appspot.com/5477048/diff/1/mversn-dispatch.c#newcode2412 mversn-dispatch.c:2412: /* The current function is replaced by a ifunc call to the right version. s/a/an/ http://codereview.appspot.com/5477048/diff/1/mversn-dispatch.c#newcode2424 mversn-dispatch.c:2424: cond_func_addr, fn_ver_addr_chain, Indentation http://codereview.appspot.com/5477048/diff/1/mversn-dispatch.c#newcode2431 mversn-dispatch.c:2431: gimple_set_bb (gsi_stmt (gsi), empty_bb); Fix indentation. Please review this at http://codereview.appspot.com/5477048/ Affected files: M config/i386/i386.c M config/i386/i386.opt M doc/tm.texi M doc/tm.texi.in M mversn-dispatch.c M params.def M passes.c M target.def M tree-pass.h
Re: [PATCH i386][google]With -mtune=core2, avoid generating the slow unaligned vector load/store (issue 5488054)
On 2011/12/14 00:04:10, davidxl wrote: http://codereview.appspot.com/5488054/diff/4002/tree-vect-stmts.c File tree-vect-stmts.c (right): http://codereview.appspot.com/5488054/diff/4002/tree-vect-stmts.c#newcode3712 tree-vect-stmts.c:3712: } The check can be put into a helper function. Fixed. http://codereview.appspot.com/5488054/