Re: [PATCH 16/30] merge-recursive: Introduce new functions to handle rename logic

2017-11-14 Thread Elijah Newren
On Mon, Nov 13, 2017 at 9:14 PM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Elijah Newren  writes:
>>
>>> +struct rename_info {
>>> +struct string_list *head_renames;
>>> +struct string_list *merge_renames;
>>> +};
>>
>> This type is added in order to allow the caller and the helper to
>> communicate the findings in a single logical structure, instead of
>> having to pass them as separate parameters, etc.  If we anticipate
>> that the information that needs to be passed will grow richer in
>> later steps (or a follow-up series), such encapsulation makes a lot
>
> Hmph, I actually am quite confused with the existing code.
>
> The caller (originally in merge_trees(), now in handle_renames())
> calls get_renames() twice and have the list of renamed paths in
> these two string lists.  get_renames() mostly works with the
> elements in the "entries" list and adds the "struct rename" to the
> string list that is to be returned.  And the caller uses these two
> string lists get_renames() returns when calling process_renames(),
> but once process_renames() is done with them, these two string lists
> are never looked at by anybody.

Actually, if I remember correctly, my first stab was to do all the
cleanup at the end of handle_renames(), but then I ran into
use-after-free errors.  I'm not sure if I remember all the details,
but I'll try to lay out the path:

process_renames() can't handle conflicts immediately because of D/F
concerns (if all entries in the competing directory resolve away, then
there's no more D/F conflict, but we have to wait until each of those
entries is processed to find out if that happens or if a D/F conflict
remains).  Because of that, process_renames() needs to store
information into a rename_conflict_info struct for process_entry() to
look at later.  Included in rename_conflict_info are things like
diff_filepair and stage_data entries, both taken from the rename
lists.  If the rename lists are freed at the end of handle_renames(),
then this information is freed before process_entry() runs and thus we
get a use-after-free error.

Since both you and I thought to push this cleanup to the end of
handle_renames(), though, I should probably add that explanation to
the commit message.  Granted, it isn't actually needed for this
particular commit, because up to this point all the information used
in rename_conflict_info was leaked anyway.  But it becomes an issue
with patch 17 when we start freeing that info.


Re: [PATCH 16/30] merge-recursive: Introduce new functions to handle rename logic

2017-11-13 Thread Junio C Hamano
Junio C Hamano  writes:

> Elijah Newren  writes:
>
>> +struct rename_info {
>> +struct string_list *head_renames;
>> +struct string_list *merge_renames;
>> +};
>
> This type is added in order to allow the caller and the helper to
> communicate the findings in a single logical structure, instead of
> having to pass them as separate parameters, etc.  If we anticipate
> that the information that needs to be passed will grow richer in
> later steps (or a follow-up series), such encapsulation makes a lot

Hmph, I actually am quite confused with the existing code.

The caller (originally in merge_trees(), now in handle_renames())
calls get_renames() twice and have the list of renamed paths in
these two string lists.  get_renames() mostly works with the
elements in the "entries" list and adds the "struct rename" to the
string list that is to be returned.  And the caller uses these two
string lists get_renames() returns when calling process_renames(),
but once process_renames() is done with them, these two string lists
are never looked at by anybody.

So do we really need to pass this structure around in the first
place?  I am wondering if we can do the cleanup_rename() on both of
these lists after handle_renames() calls process_renames() before
returning, which will eliminate the need for this structure and a
separate cleanup_renames() helper that clears the structure and the
two string lists in it.



Re: [PATCH 16/30] merge-recursive: Introduce new functions to handle rename logic

2017-11-13 Thread Junio C Hamano
Elijah Newren  writes:

> +struct rename_info {
> + struct string_list *head_renames;
> + struct string_list *merge_renames;
> +};

This type is added in order to allow the caller and the helper to
communicate the findings in a single logical structure, instead of
having to pass them as separate parameters, etc.  If we anticipate
that the information that needs to be passed will grow richer in
later steps (or a follow-up series), such encapsulation makes a lot
of sence.

> +static struct rename_info *handle_renames(struct merge_options *o,
> +   struct tree *common,
> +   struct tree *head,
> +   struct tree *merge,
> +   struct string_list *entries,
> +   int *clean)
> +{
> + struct rename_info *rei = xcalloc(1, sizeof(struct rename_info));

I however notice that there is only one caller of this helper at
this step, and also at the end of this series.  I suspect that it
would probably be a better design to make "clean" the return value
of this helper, and instead have the caller pass an uninitialised
rename_info structure on its stack by address to be filled by the
helper.

> + rei->head_renames  = get_renames(o, head, common, head, merge, entries);
> + rei->merge_renames = get_renames(o, merge, common, head, merge, 
> entries);
> + *clean = process_renames(o, rei->head_renames, rei->merge_renames);
> +
> + return rei;
> +}
> +
> +static void cleanup_renames(struct rename_info *re_info)
> +{
> + string_list_clear(re_info->head_renames, 0);
> + string_list_clear(re_info->merge_renames, 0);
> +
> + free(re_info->head_renames);
> + free(re_info->merge_renames);
> +
> + free(re_info);
> +}
>  static struct object_id *stage_oid(const struct object_id *oid, unsigned 
> mode)
>  {
>   return (is_null_oid(oid) || mode == 0) ? NULL: (struct object_id *)oid;
> @@ -1989,7 +2021,8 @@ int merge_trees(struct merge_options *o,
>   }
>  
>   if (unmerged_cache()) {
> - struct string_list *entries, *re_head, *re_merge;
> + struct string_list *entries;
> + struct rename_info *re_info;
>   int i;
>   /*
>* Only need the hashmap while processing entries, so
> @@ -2003,9 +2036,7 @@ int merge_trees(struct merge_options *o,
>   get_files_dirs(o, merge);
>  
>   entries = get_unmerged();
> - re_head  = get_renames(o, head, common, head, merge, entries);
> - re_merge = get_renames(o, merge, common, head, merge, entries);
> - clean = process_renames(o, re_head, re_merge);
> + re_info = handle_renames(o, common, head, merge, entries, 
> );
>   record_df_conflict_files(o, entries);
>   if (clean < 0)
>   goto cleanup;
> @@ -2030,16 +2061,13 @@ int merge_trees(struct merge_options *o,
>   }
>  
>  cleanup:
> - string_list_clear(re_merge, 0);
> - string_list_clear(re_head, 0);
> + cleanup_renames(re_info);
> +
>   string_list_clear(entries, 1);
> + free(entries);
>  
>   hashmap_free(>current_file_dir_set, 1);
>  
> - free(re_merge);
> - free(re_head);
> - free(entries);
> -
>   if (clean < 0)
>   return clean;
>   }


[PATCH 16/30] merge-recursive: Introduce new functions to handle rename logic

2017-11-10 Thread Elijah Newren
The amount of logic in merge_trees() relative to renames was just a few
lines, but split it out into new handle_renames() and cleanup_renames()
functions to prepare for additional logic to be added to each.  No code
or logic changes, just a new place to put stuff for when the rename
detection gains additional checks.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 48 ++--
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 382016508b..49710c0964 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1638,6 +1638,38 @@ static int process_renames(struct merge_options *o,
return clean_merge;
 }
 
+struct rename_info {
+   struct string_list *head_renames;
+   struct string_list *merge_renames;
+};
+
+static struct rename_info *handle_renames(struct merge_options *o,
+ struct tree *common,
+ struct tree *head,
+ struct tree *merge,
+ struct string_list *entries,
+ int *clean)
+{
+   struct rename_info *rei = xcalloc(1, sizeof(struct rename_info));
+
+   rei->head_renames  = get_renames(o, head, common, head, merge, entries);
+   rei->merge_renames = get_renames(o, merge, common, head, merge, 
entries);
+   *clean = process_renames(o, rei->head_renames, rei->merge_renames);
+
+   return rei;
+}
+
+static void cleanup_renames(struct rename_info *re_info)
+{
+   string_list_clear(re_info->head_renames, 0);
+   string_list_clear(re_info->merge_renames, 0);
+
+   free(re_info->head_renames);
+   free(re_info->merge_renames);
+
+   free(re_info);
+}
+
 static struct object_id *stage_oid(const struct object_id *oid, unsigned mode)
 {
return (is_null_oid(oid) || mode == 0) ? NULL: (struct object_id *)oid;
@@ -1989,7 +2021,8 @@ int merge_trees(struct merge_options *o,
}
 
if (unmerged_cache()) {
-   struct string_list *entries, *re_head, *re_merge;
+   struct string_list *entries;
+   struct rename_info *re_info;
int i;
/*
 * Only need the hashmap while processing entries, so
@@ -2003,9 +2036,7 @@ int merge_trees(struct merge_options *o,
get_files_dirs(o, merge);
 
entries = get_unmerged();
-   re_head  = get_renames(o, head, common, head, merge, entries);
-   re_merge = get_renames(o, merge, common, head, merge, entries);
-   clean = process_renames(o, re_head, re_merge);
+   re_info = handle_renames(o, common, head, merge, entries, 
);
record_df_conflict_files(o, entries);
if (clean < 0)
goto cleanup;
@@ -2030,16 +2061,13 @@ int merge_trees(struct merge_options *o,
}
 
 cleanup:
-   string_list_clear(re_merge, 0);
-   string_list_clear(re_head, 0);
+   cleanup_renames(re_info);
+
string_list_clear(entries, 1);
+   free(entries);
 
hashmap_free(>current_file_dir_set, 1);
 
-   free(re_merge);
-   free(re_head);
-   free(entries);
-
if (clean < 0)
return clean;
}
-- 
2.15.0.5.g9567be9905