Re: [PATCH 16/30] merge-recursive: Introduce new functions to handle rename logic
On Mon, Nov 13, 2017 at 9:14 PM, Junio C Hamanowrote: > Junio C Hamano writes: > >> Elijah Newren writes: >> >>> +struct rename_info { >>> +struct string_list *head_renames; >>> +struct string_list *merge_renames; >>> +}; >> >> This type is added in order to allow the caller and the helper to >> communicate the findings in a single logical structure, instead of >> having to pass them as separate parameters, etc. If we anticipate >> that the information that needs to be passed will grow richer in >> later steps (or a follow-up series), such encapsulation makes a lot > > Hmph, I actually am quite confused with the existing code. > > The caller (originally in merge_trees(), now in handle_renames()) > calls get_renames() twice and have the list of renamed paths in > these two string lists. get_renames() mostly works with the > elements in the "entries" list and adds the "struct rename" to the > string list that is to be returned. And the caller uses these two > string lists get_renames() returns when calling process_renames(), > but once process_renames() is done with them, these two string lists > are never looked at by anybody. Actually, if I remember correctly, my first stab was to do all the cleanup at the end of handle_renames(), but then I ran into use-after-free errors. I'm not sure if I remember all the details, but I'll try to lay out the path: process_renames() can't handle conflicts immediately because of D/F concerns (if all entries in the competing directory resolve away, then there's no more D/F conflict, but we have to wait until each of those entries is processed to find out if that happens or if a D/F conflict remains). Because of that, process_renames() needs to store information into a rename_conflict_info struct for process_entry() to look at later. Included in rename_conflict_info are things like diff_filepair and stage_data entries, both taken from the rename lists. If the rename lists are freed at the end of handle_renames(), then this information is freed before process_entry() runs and thus we get a use-after-free error. Since both you and I thought to push this cleanup to the end of handle_renames(), though, I should probably add that explanation to the commit message. Granted, it isn't actually needed for this particular commit, because up to this point all the information used in rename_conflict_info was leaked anyway. But it becomes an issue with patch 17 when we start freeing that info.
Re: [PATCH 16/30] merge-recursive: Introduce new functions to handle rename logic
Junio C Hamanowrites: > Elijah Newren writes: > >> +struct rename_info { >> +struct string_list *head_renames; >> +struct string_list *merge_renames; >> +}; > > This type is added in order to allow the caller and the helper to > communicate the findings in a single logical structure, instead of > having to pass them as separate parameters, etc. If we anticipate > that the information that needs to be passed will grow richer in > later steps (or a follow-up series), such encapsulation makes a lot Hmph, I actually am quite confused with the existing code. The caller (originally in merge_trees(), now in handle_renames()) calls get_renames() twice and have the list of renamed paths in these two string lists. get_renames() mostly works with the elements in the "entries" list and adds the "struct rename" to the string list that is to be returned. And the caller uses these two string lists get_renames() returns when calling process_renames(), but once process_renames() is done with them, these two string lists are never looked at by anybody. So do we really need to pass this structure around in the first place? I am wondering if we can do the cleanup_rename() on both of these lists after handle_renames() calls process_renames() before returning, which will eliminate the need for this structure and a separate cleanup_renames() helper that clears the structure and the two string lists in it.
Re: [PATCH 16/30] merge-recursive: Introduce new functions to handle rename logic
Elijah Newrenwrites: > +struct rename_info { > + struct string_list *head_renames; > + struct string_list *merge_renames; > +}; This type is added in order to allow the caller and the helper to communicate the findings in a single logical structure, instead of having to pass them as separate parameters, etc. If we anticipate that the information that needs to be passed will grow richer in later steps (or a follow-up series), such encapsulation makes a lot of sence. > +static struct rename_info *handle_renames(struct merge_options *o, > + struct tree *common, > + struct tree *head, > + struct tree *merge, > + struct string_list *entries, > + int *clean) > +{ > + struct rename_info *rei = xcalloc(1, sizeof(struct rename_info)); I however notice that there is only one caller of this helper at this step, and also at the end of this series. I suspect that it would probably be a better design to make "clean" the return value of this helper, and instead have the caller pass an uninitialised rename_info structure on its stack by address to be filled by the helper. > + rei->head_renames = get_renames(o, head, common, head, merge, entries); > + rei->merge_renames = get_renames(o, merge, common, head, merge, > entries); > + *clean = process_renames(o, rei->head_renames, rei->merge_renames); > + > + return rei; > +} > + > +static void cleanup_renames(struct rename_info *re_info) > +{ > + string_list_clear(re_info->head_renames, 0); > + string_list_clear(re_info->merge_renames, 0); > + > + free(re_info->head_renames); > + free(re_info->merge_renames); > + > + free(re_info); > +} > static struct object_id *stage_oid(const struct object_id *oid, unsigned > mode) > { > return (is_null_oid(oid) || mode == 0) ? NULL: (struct object_id *)oid; > @@ -1989,7 +2021,8 @@ int merge_trees(struct merge_options *o, > } > > if (unmerged_cache()) { > - struct string_list *entries, *re_head, *re_merge; > + struct string_list *entries; > + struct rename_info *re_info; > int i; > /* >* Only need the hashmap while processing entries, so > @@ -2003,9 +2036,7 @@ int merge_trees(struct merge_options *o, > get_files_dirs(o, merge); > > entries = get_unmerged(); > - re_head = get_renames(o, head, common, head, merge, entries); > - re_merge = get_renames(o, merge, common, head, merge, entries); > - clean = process_renames(o, re_head, re_merge); > + re_info = handle_renames(o, common, head, merge, entries, > ); > record_df_conflict_files(o, entries); > if (clean < 0) > goto cleanup; > @@ -2030,16 +2061,13 @@ int merge_trees(struct merge_options *o, > } > > cleanup: > - string_list_clear(re_merge, 0); > - string_list_clear(re_head, 0); > + cleanup_renames(re_info); > + > string_list_clear(entries, 1); > + free(entries); > > hashmap_free(>current_file_dir_set, 1); > > - free(re_merge); > - free(re_head); > - free(entries); > - > if (clean < 0) > return clean; > }
[PATCH 16/30] merge-recursive: Introduce new functions to handle rename logic
The amount of logic in merge_trees() relative to renames was just a few lines, but split it out into new handle_renames() and cleanup_renames() functions to prepare for additional logic to be added to each. No code or logic changes, just a new place to put stuff for when the rename detection gains additional checks. Signed-off-by: Elijah Newren--- merge-recursive.c | 48 ++-- 1 file changed, 38 insertions(+), 10 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 382016508b..49710c0964 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -1638,6 +1638,38 @@ static int process_renames(struct merge_options *o, return clean_merge; } +struct rename_info { + struct string_list *head_renames; + struct string_list *merge_renames; +}; + +static struct rename_info *handle_renames(struct merge_options *o, + struct tree *common, + struct tree *head, + struct tree *merge, + struct string_list *entries, + int *clean) +{ + struct rename_info *rei = xcalloc(1, sizeof(struct rename_info)); + + rei->head_renames = get_renames(o, head, common, head, merge, entries); + rei->merge_renames = get_renames(o, merge, common, head, merge, entries); + *clean = process_renames(o, rei->head_renames, rei->merge_renames); + + return rei; +} + +static void cleanup_renames(struct rename_info *re_info) +{ + string_list_clear(re_info->head_renames, 0); + string_list_clear(re_info->merge_renames, 0); + + free(re_info->head_renames); + free(re_info->merge_renames); + + free(re_info); +} + static struct object_id *stage_oid(const struct object_id *oid, unsigned mode) { return (is_null_oid(oid) || mode == 0) ? NULL: (struct object_id *)oid; @@ -1989,7 +2021,8 @@ int merge_trees(struct merge_options *o, } if (unmerged_cache()) { - struct string_list *entries, *re_head, *re_merge; + struct string_list *entries; + struct rename_info *re_info; int i; /* * Only need the hashmap while processing entries, so @@ -2003,9 +2036,7 @@ int merge_trees(struct merge_options *o, get_files_dirs(o, merge); entries = get_unmerged(); - re_head = get_renames(o, head, common, head, merge, entries); - re_merge = get_renames(o, merge, common, head, merge, entries); - clean = process_renames(o, re_head, re_merge); + re_info = handle_renames(o, common, head, merge, entries, ); record_df_conflict_files(o, entries); if (clean < 0) goto cleanup; @@ -2030,16 +2061,13 @@ int merge_trees(struct merge_options *o, } cleanup: - string_list_clear(re_merge, 0); - string_list_clear(re_head, 0); + cleanup_renames(re_info); + string_list_clear(entries, 1); + free(entries); hashmap_free(>current_file_dir_set, 1); - free(re_merge); - free(re_head); - free(entries); - if (clean < 0) return clean; } -- 2.15.0.5.g9567be9905