Re: [PATCH GCC][4/6]Support regional coalesce and live range computation
On Thu, May 17, 2018 at 5:49 PM Bin.Cheng wrote: > On Thu, May 17, 2018 at 3:04 PM, Richard Biener > wrote: > > On Tue, May 15, 2018 at 5:44 PM Bin.Cheng wrote: > > > >> On Fri, May 11, 2018 at 1:53 PM, Richard Biener > >> wrote: > >> > On Fri, May 4, 2018 at 6:23 PM, Bin Cheng wrote: > >> >> Hi, > >> >> Following Jeff's suggestion, I am now using existing tree-ssa-live.c > > and > >> >> tree-ssa-coalesce.c to compute register pressure, rather than inventing > >> >> another live range solver. > >> >> > >> >> The major change is to record region's basic blocks in var_map and use > > that > >> >> information in computation, rather than FOR_EACH_BB_FN. For now only > > loop > >> >> and function type regions are supported. The default one is function > > type > >> >> region which is used in out-of-ssa. Loop type region will be used in > > next > >> >> patch to compute information for a loop. > >> >> > >> >> Bootstrap and test on x86_64 and AArch64 ongoing. Any comments? > >> > > >> > I believe your changes to create_outofssa_var_map should be done > > differently > >> > by simply only calling it from the coalescing context and passing in the > >> > var_map rather than initializing it therein and returning it. > >> > > >> > This also means the coalesce_vars_p flag in the var_map structure looks > >> > somewhat out-of-place. That is, it looks you could do with many less > >> > changes if you refactored what calls what slightly? For example > >> > the extra arg to gimple_can_coalesce_p looks unneeded. > >> > > >> > Just as a note I do have a CFG helper pending that computes RPO order > >> > for SEME regions (attached). loops are SEME regions, so your RTYPE_SESE > >> > is somewhat odd - I guess RTYPE_LOOP exists only because of the > >> > convenience of passing in a loop * to the "constructor". I'd rather > >> > drop this region_type thing and always assume a SEME region - at least > >> > I didn't see anything in the patch that depends on any of the forms > >> > apart from the initial BB gathering. > > > >> Hi Richard, > > > >> Thanks for reviewing. I refactored tree-ssa-live.c and > >> tree-ssa-coalesce.c following your comments. > >> Basically I did following changes: > >> 1) Remove region_type and only support loop region live range computation. > >> Also I added one boolean field in var_map indicating whether we > >> are computing > >> loop region live range or out-of-ssa. > >> 2) Refactored create_outofssa_var_map into > > create_coalesce_list_for_region and > >> populate_coalesce_list_for_outofssa. Actually the original > >> function name doesn't > >> quite make sense because it has nothing to do with var_map. > >> 3) Hoist init_var_map up in call stack. Now it's called by caller > >> (out-of-ssa or live range) > >> and the returned var_map is passed to coalesce_* stuff. > >> 4) Move global functions to tree-outof-ssa.c and make them static. > >> 5) Save/restore flag_tree_coalesce_vars in order to avoid updating > >> checks on the flag. > > > >> So how is this one? Patch attached. > > > > A lot better. Few things I noticed: > > > > + map->bmp_bbs = BITMAP_ALLOC (NULL); > > > > use a bitmap_head member and bitmap_initialize (). > > > > + map->vec_bbs = new vec (); > > > > use a vec<> member and map->vec_bbs = vNULL; > > > > Both changes remove an unnecessary indirection. > > > > + map->outofssa_p = true; > > + basic_block bb; > > + FOR_EACH_BB_FN (bb, cfun) > > + { > > + bitmap_set_bit (map->bmp_bbs, bb->index); > > + map->vec_bbs->safe_push (bb); > > + } > > > > I think you can avoid populating the bitmap and return > > true unconditionally for outofssa_p in the contains function? > > Ah, you already do - so why populate the bitmap? > > > > +/* Return TRUE if region of the MAP contains basic block BB. */ > > + > > +inline bool > > +region_contains_p (var_map map, basic_block bb) > > +{ > > + if (bb == ENTRY_BLOCK_PTR_FOR_FN (cfun) > > + || bb == EXIT_BLOCK_PTR_FOR_FN (cfun)) > > +return false; > > + > > + if (map->outofssa_p) > > +return true; > > + > > + return bitmap_bit_p (map->bmp_bbs, bb->index); > > > > the entry/exit block check should be conditional in map->outofssa_p > > but I think we should never get the function called with those args > > so we can as well use a gcc_checking_assert ()? > > > > I think as followup we should try to get a BB order that > > is more suited for the dataflow problem. Btw, I was > > thinking about adding anoter "visited" BB flag that is guaranteed to > > be unset and free to be used by infrastructure. So the bitmap > > could be elided for a bb flag check (but we need to clear that flag > > at the end of processing). Not sure if it's worth to add a machinery > > to dynamically assign pass-specific flags... it would at least be > > less error prone. Sth to think about. > > > > So -- I think the patch is ok with the two indirections removed, > > the rest can be opt
Re: [PATCH GCC][4/6]Support regional coalesce and live range computation
On Thu, May 17, 2018 at 3:04 PM, Richard Biener wrote: > On Tue, May 15, 2018 at 5:44 PM Bin.Cheng wrote: > >> On Fri, May 11, 2018 at 1:53 PM, Richard Biener >> wrote: >> > On Fri, May 4, 2018 at 6:23 PM, Bin Cheng wrote: >> >> Hi, >> >> Following Jeff's suggestion, I am now using existing tree-ssa-live.c > and >> >> tree-ssa-coalesce.c to compute register pressure, rather than inventing >> >> another live range solver. >> >> >> >> The major change is to record region's basic blocks in var_map and use > that >> >> information in computation, rather than FOR_EACH_BB_FN. For now only > loop >> >> and function type regions are supported. The default one is function > type >> >> region which is used in out-of-ssa. Loop type region will be used in > next >> >> patch to compute information for a loop. >> >> >> >> Bootstrap and test on x86_64 and AArch64 ongoing. Any comments? >> > >> > I believe your changes to create_outofssa_var_map should be done > differently >> > by simply only calling it from the coalescing context and passing in the >> > var_map rather than initializing it therein and returning it. >> > >> > This also means the coalesce_vars_p flag in the var_map structure looks >> > somewhat out-of-place. That is, it looks you could do with many less >> > changes if you refactored what calls what slightly? For example >> > the extra arg to gimple_can_coalesce_p looks unneeded. >> > >> > Just as a note I do have a CFG helper pending that computes RPO order >> > for SEME regions (attached). loops are SEME regions, so your RTYPE_SESE >> > is somewhat odd - I guess RTYPE_LOOP exists only because of the >> > convenience of passing in a loop * to the "constructor". I'd rather >> > drop this region_type thing and always assume a SEME region - at least >> > I didn't see anything in the patch that depends on any of the forms >> > apart from the initial BB gathering. > >> Hi Richard, > >> Thanks for reviewing. I refactored tree-ssa-live.c and >> tree-ssa-coalesce.c following your comments. >> Basically I did following changes: >> 1) Remove region_type and only support loop region live range computation. >> Also I added one boolean field in var_map indicating whether we >> are computing >> loop region live range or out-of-ssa. >> 2) Refactored create_outofssa_var_map into > create_coalesce_list_for_region and >> populate_coalesce_list_for_outofssa. Actually the original >> function name doesn't >> quite make sense because it has nothing to do with var_map. >> 3) Hoist init_var_map up in call stack. Now it's called by caller >> (out-of-ssa or live range) >> and the returned var_map is passed to coalesce_* stuff. >> 4) Move global functions to tree-outof-ssa.c and make them static. >> 5) Save/restore flag_tree_coalesce_vars in order to avoid updating >> checks on the flag. > >> So how is this one? Patch attached. > > A lot better. Few things I noticed: > > + map->bmp_bbs = BITMAP_ALLOC (NULL); > > use a bitmap_head member and bitmap_initialize (). > > + map->vec_bbs = new vec (); > > use a vec<> member and map->vec_bbs = vNULL; > > Both changes remove an unnecessary indirection. > > + map->outofssa_p = true; > + basic_block bb; > + FOR_EACH_BB_FN (bb, cfun) > + { > + bitmap_set_bit (map->bmp_bbs, bb->index); > + map->vec_bbs->safe_push (bb); > + } > > I think you can avoid populating the bitmap and return > true unconditionally for outofssa_p in the contains function? > Ah, you already do - so why populate the bitmap? > > +/* Return TRUE if region of the MAP contains basic block BB. */ > + > +inline bool > +region_contains_p (var_map map, basic_block bb) > +{ > + if (bb == ENTRY_BLOCK_PTR_FOR_FN (cfun) > + || bb == EXIT_BLOCK_PTR_FOR_FN (cfun)) > +return false; > + > + if (map->outofssa_p) > +return true; > + > + return bitmap_bit_p (map->bmp_bbs, bb->index); > > the entry/exit block check should be conditional in map->outofssa_p > but I think we should never get the function called with those args > so we can as well use a gcc_checking_assert ()? > > I think as followup we should try to get a BB order that > is more suited for the dataflow problem. Btw, I was > thinking about adding anoter "visited" BB flag that is guaranteed to > be unset and free to be used by infrastructure. So the bitmap > could be elided for a bb flag check (but we need to clear that flag > at the end of processing). Not sure if it's worth to add a machinery > to dynamically assign pass-specific flags... it would at least be > less error prone. Sth to think about. > > So -- I think the patch is ok with the two indirections removed, > the rest can be optimized as followup. Hi, This is the updated patch. I moved checks on ENTRY/EXIT blocks under outofssa_p, as well as changed vec_bbs into object. Note bmp_bbs is kept in pointer so that we can avoid allocating memory in out-of-ssa case. Bootstrap and test ongoing. Is it OK? Thanks, bin
Re: [PATCH GCC][4/6]Support regional coalesce and live range computation
On Tue, May 15, 2018 at 5:44 PM Bin.Cheng wrote: > On Fri, May 11, 2018 at 1:53 PM, Richard Biener > wrote: > > On Fri, May 4, 2018 at 6:23 PM, Bin Cheng wrote: > >> Hi, > >> Following Jeff's suggestion, I am now using existing tree-ssa-live.c and > >> tree-ssa-coalesce.c to compute register pressure, rather than inventing > >> another live range solver. > >> > >> The major change is to record region's basic blocks in var_map and use that > >> information in computation, rather than FOR_EACH_BB_FN. For now only loop > >> and function type regions are supported. The default one is function type > >> region which is used in out-of-ssa. Loop type region will be used in next > >> patch to compute information for a loop. > >> > >> Bootstrap and test on x86_64 and AArch64 ongoing. Any comments? > > > > I believe your changes to create_outofssa_var_map should be done differently > > by simply only calling it from the coalescing context and passing in the > > var_map rather than initializing it therein and returning it. > > > > This also means the coalesce_vars_p flag in the var_map structure looks > > somewhat out-of-place. That is, it looks you could do with many less > > changes if you refactored what calls what slightly? For example > > the extra arg to gimple_can_coalesce_p looks unneeded. > > > > Just as a note I do have a CFG helper pending that computes RPO order > > for SEME regions (attached). loops are SEME regions, so your RTYPE_SESE > > is somewhat odd - I guess RTYPE_LOOP exists only because of the > > convenience of passing in a loop * to the "constructor". I'd rather > > drop this region_type thing and always assume a SEME region - at least > > I didn't see anything in the patch that depends on any of the forms > > apart from the initial BB gathering. > Hi Richard, > Thanks for reviewing. I refactored tree-ssa-live.c and > tree-ssa-coalesce.c following your comments. > Basically I did following changes: > 1) Remove region_type and only support loop region live range computation. > Also I added one boolean field in var_map indicating whether we > are computing > loop region live range or out-of-ssa. > 2) Refactored create_outofssa_var_map into create_coalesce_list_for_region and > populate_coalesce_list_for_outofssa. Actually the original > function name doesn't > quite make sense because it has nothing to do with var_map. > 3) Hoist init_var_map up in call stack. Now it's called by caller > (out-of-ssa or live range) > and the returned var_map is passed to coalesce_* stuff. > 4) Move global functions to tree-outof-ssa.c and make them static. > 5) Save/restore flag_tree_coalesce_vars in order to avoid updating > checks on the flag. > So how is this one? Patch attached. A lot better. Few things I noticed: + map->bmp_bbs = BITMAP_ALLOC (NULL); use a bitmap_head member and bitmap_initialize (). + map->vec_bbs = new vec (); use a vec<> member and map->vec_bbs = vNULL; Both changes remove an unnecessary indirection. + map->outofssa_p = true; + basic_block bb; + FOR_EACH_BB_FN (bb, cfun) + { + bitmap_set_bit (map->bmp_bbs, bb->index); + map->vec_bbs->safe_push (bb); + } I think you can avoid populating the bitmap and return true unconditionally for outofssa_p in the contains function? Ah, you already do - so why populate the bitmap? +/* Return TRUE if region of the MAP contains basic block BB. */ + +inline bool +region_contains_p (var_map map, basic_block bb) +{ + if (bb == ENTRY_BLOCK_PTR_FOR_FN (cfun) + || bb == EXIT_BLOCK_PTR_FOR_FN (cfun)) +return false; + + if (map->outofssa_p) +return true; + + return bitmap_bit_p (map->bmp_bbs, bb->index); the entry/exit block check should be conditional in map->outofssa_p but I think we should never get the function called with those args so we can as well use a gcc_checking_assert ()? I think as followup we should try to get a BB order that is more suited for the dataflow problem. Btw, I was thinking about adding anoter "visited" BB flag that is guaranteed to be unset and free to be used by infrastructure. So the bitmap could be elided for a bb flag check (but we need to clear that flag at the end of processing). Not sure if it's worth to add a machinery to dynamically assign pass-specific flags... it would at least be less error prone. Sth to think about. So -- I think the patch is ok with the two indirections removed, the rest can be optimized as followup. Thanks, Richard. > Thanks, > bin > 2018-05-15 Bin Cheng > * tree-outof-ssa.c (tree-ssa.h, tree-dfa.h): Include header files. > (create_default_def, for_all_parms): Moved from tree-ssa-coalesce.c. > (parm_default_def_partition_arg): Ditto. > (set_parm_default_def_partition): Ditto. > (get_parm_default_def_partitions): Ditto and make it static. > (get_undefined_value_partitions): Ditto and make it static. > (remove_ssa_form): Refactor call to init_
Re: [PATCH GCC][4/6]Support regional coalesce and live range computation
On Fri, May 11, 2018 at 1:53 PM, Richard Biener wrote: > On Fri, May 4, 2018 at 6:23 PM, Bin Cheng wrote: >> Hi, >> Following Jeff's suggestion, I am now using existing tree-ssa-live.c and >> tree-ssa-coalesce.c to compute register pressure, rather than inventing >> another live range solver. >> >> The major change is to record region's basic blocks in var_map and use that >> information in computation, rather than FOR_EACH_BB_FN. For now only loop >> and function type regions are supported. The default one is function type >> region which is used in out-of-ssa. Loop type region will be used in next >> patch to compute information for a loop. >> >> Bootstrap and test on x86_64 and AArch64 ongoing. Any comments? > > I believe your changes to create_outofssa_var_map should be done differently > by simply only calling it from the coalescing context and passing in the > var_map rather than initializing it therein and returning it. > > This also means the coalesce_vars_p flag in the var_map structure looks > somewhat out-of-place. That is, it looks you could do with many less > changes if you refactored what calls what slightly? For example > the extra arg to gimple_can_coalesce_p looks unneeded. > > Just as a note I do have a CFG helper pending that computes RPO order > for SEME regions (attached). loops are SEME regions, so your RTYPE_SESE > is somewhat odd - I guess RTYPE_LOOP exists only because of the > convenience of passing in a loop * to the "constructor". I'd rather > drop this region_type thing and always assume a SEME region - at least > I didn't see anything in the patch that depends on any of the forms > apart from the initial BB gathering. Hi Richard, Thanks for reviewing. I refactored tree-ssa-live.c and tree-ssa-coalesce.c following your comments. Basically I did following changes: 1) Remove region_type and only support loop region live range computation. Also I added one boolean field in var_map indicating whether we are computing loop region live range or out-of-ssa. 2) Refactored create_outofssa_var_map into create_coalesce_list_for_region and populate_coalesce_list_for_outofssa. Actually the original function name doesn't quite make sense because it has nothing to do with var_map. 3) Hoist init_var_map up in call stack. Now it's called by caller (out-of-ssa or live range) and the returned var_map is passed to coalesce_* stuff. 4) Move global functions to tree-outof-ssa.c and make them static. 5) Save/restore flag_tree_coalesce_vars in order to avoid updating checks on the flag. So how is this one? Patch attached. Thanks, bin 2018-05-15 Bin Cheng * tree-outof-ssa.c (tree-ssa.h, tree-dfa.h): Include header files. (create_default_def, for_all_parms): Moved from tree-ssa-coalesce.c. (parm_default_def_partition_arg): Ditto. (set_parm_default_def_partition): Ditto. (get_parm_default_def_partitions): Ditto and make it static. (get_undefined_value_partitions): Ditto and make it static. (remove_ssa_form): Refactor call to init_var_map here. * tree-ssa-coalesce.c (build_ssa_conflict_graph): Support live range computation for loop region. (coalesce_partitions, compute_optimized_partition_bases): Ditto. (register_default_def): Delete. (for_all_parms, create_default_def): Move to tree-outof-ssa.c. (parm_default_def_partition_arg): Ditto. (set_parm_default_def_partition): Ditto. (get_parm_default_def_partitions): Ditto and make it static. (get_undefined_value_partitions): Ditto and make it static. (coalesce_with_default, coalesce_with_default): Update comment. (create_coalesce_list_for_region): New func factored out from create_outofssa_var_map. (populate_coalesce_list_for_outofssa): New func factored out from create_outofssa_var_map and coalesce_ssa_name. (create_outofssa_var_map): Delete. (coalesce_ssa_name): Refactor to support live range computation. * tree-ssa-coalesce.h (coalesce_ssa_name): Change decl. (get_parm_default_def_partitions): Delete. (get_undefined_value_partitions): Ditto. * tree-ssa-live.c (init_var_map, delete_var_map): Support live range computation for loop region. (new_tree_live_info, loe_visit_block): Ditto. (live_worklist, set_var_live_on_entry): Ditto. (calculate_live_on_exit, verify_live_on_entry): Ditto. * tree-ssa-live.h (struct _var_map): New fields. (init_var_map): Change decl. (region_contains_p): New. From 1043540cd5325c65e60df25cad22c343e4312fd4 Mon Sep 17 00:00:00 2001 From: Bin Cheng Date: Fri, 4 May 2018 09:39:17 +0100 Subject: [PATCH 4/6] liverange-support-region-20180504 --- gcc/tree-outof-ssa.c| 102 +++- gcc/tree-ssa-coalesce.c | 317 +--- gcc/tree-ssa-coalesce.h | 4 +- gcc/tree-ssa-live.c | 78 gcc/tree-ssa-live.h | 30 - 5 files changed, 298 insertions(+), 233 deletions(-) diff --git a/gcc/tree-
Re: [PATCH GCC][4/6]Support regional coalesce and live range computation
On Fri, May 4, 2018 at 6:23 PM, Bin Cheng wrote: > Hi, > Following Jeff's suggestion, I am now using existing tree-ssa-live.c and > tree-ssa-coalesce.c to compute register pressure, rather than inventing > another live range solver. > > The major change is to record region's basic blocks in var_map and use that > information in computation, rather than FOR_EACH_BB_FN. For now only loop > and function type regions are supported. The default one is function type > region which is used in out-of-ssa. Loop type region will be used in next > patch to compute information for a loop. > > Bootstrap and test on x86_64 and AArch64 ongoing. Any comments? I believe your changes to create_outofssa_var_map should be done differently by simply only calling it from the coalescing context and passing in the var_map rather than initializing it therein and returning it. This also means the coalesce_vars_p flag in the var_map structure looks somewhat out-of-place. That is, it looks you could do with many less changes if you refactored what calls what slightly? For example the extra arg to gimple_can_coalesce_p looks unneeded. Just as a note I do have a CFG helper pending that computes RPO order for SEME regions (attached). loops are SEME regions, so your RTYPE_SESE is somewhat odd - I guess RTYPE_LOOP exists only because of the convenience of passing in a loop * to the "constructor". I'd rather drop this region_type thing and always assume a SEME region - at least I didn't see anything in the patch that depends on any of the forms apart from the initial BB gathering. Thanks, Richard. > Thanks, > bin > 2018-04-27 Bin Cheng > > * tree-outof-ssa.c (remove_ssa_form): Update use. > * tree-ssa-coalesce.c (build_ssa_conflict_graph): Support regional > coalesce. > (coalesce_with_default): Update comment. > (create_outofssa_var_map): Support regional coalesce. Rename to... > (create_var_map): ...this. > (coalesce_partitions): Support regional coalesce. > (gimple_can_coalesce_p, compute_optimized_partition_bases): Ditto. > (coalesce_ssa_name): Ditto. > * tree-ssa-coalesce.h (coalesce_ssa_name, gimple_can_coalesce_p): > Add parameter in declarations. > * tree-ssa-live.c (init_var_map, delete_var_map): Support regional > coalesce. > (new_tree_live_info, loe_visit_block, set_var_live_on_entry): Ditto. > (calculate_live_on_exit, verify_live_on_entry): Ditto. > * tree-ssa-live.h (enum region_type): New. > (struct _var_map): New fields. > (init_var_map): Add parameter in declaration. > (function_region_p, region_contains_p): New. > * tree-ssa-uncprop.c (uncprop_into_successor_phis): Update uses. p Description: Binary data