Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field

2017-06-30 Thread Stefan Beller
On Fri, Jun 30, 2017 at 11:47 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> Ok, let me redo the patch to have fndata at the front.
>>
>> Looking at other places (that have a similar mechanism mechanically,
>> but are semantically different), such as the callback functions for
>> the diff machinery, we have the user provided pointer at the end
>> of the list. But that is because the diff_options are the objects that
>> should be in front row (they are bound to the callback more than
>> some caller needed data).
>
> Quite honestly, I do not care too deeply about an API specific to a
> particular area like "diff" that passes its "configuration" data
> that everybody in the API knows, i.e. diff_options.  If the
> convention for ordinary functions in the API is to pass that in a
> particular location in the parameter list, I would think it is good
> for a application-supplied callback function to follow that pattern.
> After all, it is to configure the behaviour of the "diff" and the
> caller-supplied callback could have been part of a (hypothetically
> richer) API implementation.
>
> But I view a comparison function that is given to hashmap that is
> supplied by the caller a bit differently.  It is not about
> "hashing", so the reason to have the data close to function pointer
> is stronger there.

Yes I agree with that, I forgot to say so.

I just cited the example to see if we have a preference in the project
already and that's about it, but as it is different, we can put the fndata
first.

Thanks,
Stefan


Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field

2017-06-30 Thread Junio C Hamano
Stefan Beller  writes:

> Ok, let me redo the patch to have fndata at the front.
>
> Looking at other places (that have a similar mechanism mechanically,
> but are semantically different), such as the callback functions for
> the diff machinery, we have the user provided pointer at the end
> of the list. But that is because the diff_options are the objects that
> should be in front row (they are bound to the callback more than
> some caller needed data).

Quite honestly, I do not care too deeply about an API specific to a
particular area like "diff" that passes its "configuration" data
that everybody in the API knows, i.e. diff_options.  If the
convention for ordinary functions in the API is to pass that in a
particular location in the parameter list, I would think it is good
for a application-supplied callback function to follow that pattern.
After all, it is to configure the behaviour of the "diff" and the
caller-supplied callback could have been part of a (hypothetically
richer) API implementation.

But I view a comparison function that is given to hashmap that is
supplied by the caller a bit differently.  It is not about
"hashing", so the reason to have the data close to function pointer
is stronger there.


Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field

2017-06-30 Thread Junio C Hamano
Stefan Beller  writes:

> I am at a loss here after re-reading your answer over and over,
> but I think you are asking if patch_id_cmp can break, as
> we have a callchain like
>
>   patch_id_cmp
> commit_patch_id
>   (diff_root_tree_oid)
> diff_tree_oid
>   ll_diff_tree_oid
>
> passing diff_options down there, and patch_id_cmp may have
> gotten NULL.
>
> The answer is no, it was safe. (by accident?)
>
> That is because we never use hashmap_get_next
> on the hashmap that uses patch_id_cmp as the compare
> function.
>
> hashmap_get_next is the only function that does not pass
> on a keydata, any other has valid caller provided keydata.

Ah, thanks for clarifying.  I think I misread the earlier
discussion.  So unless somebody goes in to patch-ids.c and adds a
call to do get_next, we won't see a breakage, and it is not worth to
do a test-patch-ids.c that peeks into the hashmap patch-ids.c uses
and does get_next() only to demonstrate a potential future breakage.

OK.


Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field

2017-06-30 Thread Stefan Beller
On Fri, Jun 30, 2017 at 10:39 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> diff --git a/patch-ids.c b/patch-ids.c
>> index 9c0ab9e67a..b9b2ebbad0 100644
>> --- a/patch-ids.c
>> +++ b/patch-ids.c
>> @@ -37,6 +37,7 @@ int commit_patch_id(struct commit *commit, struct 
>> diff_options *options,
>>   */
>>  static int patch_id_cmp(struct patch_id *a,
>>   struct patch_id *b,
>> + const void *unused_keydata,
>>   struct diff_options *opt)
>>  {
>>   if (is_null_oid(>patch_id) &&
>> @@ -57,7 +58,8 @@ int init_patch_ids(struct patch_ids *ids)
>>   ids->diffopts.detect_rename = 0;
>>   DIFF_OPT_SET(>diffopts, RECURSIVE);
>>   diff_setup_done(>diffopts);
>> - hashmap_init(>patches, (hashmap_cmp_fn)patch_id_cmp, 256);
>> + hashmap_init(>patches, (hashmap_cmp_fn)patch_id_cmp,
>> +  >diffopts, 256);
>>   return 0;
>>  }
>>
>> @@ -93,7 +95,7 @@ struct patch_id *has_commit_patch_id(struct commit *commit,
>>   if (init_patch_id_entry(, commit, ids))
>>   return NULL;
>>
>> - return hashmap_get(>patches, , >diffopts);
>> + return hashmap_get(>patches, , NULL);
>>  }

+cc Peff

>
> This actually makes me wonder if we can demonstrate an existing
> breakage in tests.  The old code used to pass NULL to the diffopts,
> causing it to be passed down through commit_patch_id() function down
> to diff_tree_oid() or diff_root_tree_oid().  When the codepath
> triggers the issue that Peff warned about in his old article (was it
> about rehashing or something?)  that makes two entries compared
> (i.e. not using keydata, because we are not comparing an existing
> entry with a key and data only to see if that already exists in the
> hashmap), wouldn't that cause ll_diff_tree_oid() that is called from
> diff_tree_oid() to dereference NULL?
>
> Thanks.

I am at a loss here after re-reading your answer over and over,
but I think you are asking if patch_id_cmp can break, as
we have a callchain like

  patch_id_cmp
commit_patch_id
  (diff_root_tree_oid)
diff_tree_oid
  ll_diff_tree_oid

passing diff_options down there, and patch_id_cmp may have
gotten NULL.

The answer is no, it was safe. (by accident?)

That is because we never use hashmap_get_next
on the hashmap that uses patch_id_cmp as the compare
function.

hashmap_get_next is the only function that does not pass
on a keydata, any other has valid caller provided keydata.

Thanks,
Stefan


Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field

2017-06-30 Thread Stefan Beller
On Fri, Jun 30, 2017 at 10:34 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> When using the hashmap a common need is to have access to arbitrary data
>> in the compare function. A couple of times we abuse the keydata field
>> to pass in the data needed. This happens for example in patch-ids.c.
>
> It is not "arbitrary data"; it is very important to streess that we
> are not just passing random crud, but adding a mechanism to
> tailor/curry the function in a way that is fixed throughout the
> lifetime of a hashmap.

Ah yes, I forgot to fix patch1, when spending all time on the docs in p2.

>
>> diff --git a/hashmap.h b/hashmap.h
>> index de6022a3a9..1c26bbad5b 100644
>> --- a/hashmap.h
>> +++ b/hashmap.h
>> @@ -33,11 +33,12 @@ struct hashmap_entry {
>>  };
>>
>>  typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
>> - const void *keydata);
>> + const void *keydata, const void *cbdata);
>
> As I view the new "data" thing the C's (read: poor-man's) way to
> cutomize the function, I would have tweaked the function signature
> by giving the customization data at the front, i.e.
>
> fn(fndata, entry, entry_or_key, keydata);
>
> That would hopefully make it more obvious that the new thing is
> pairs with fn, not with other arguments (and entry-or-key and
> keydata pairs, instead of three old arguments standing as equals).

Ok, let me redo the patch to have fndata at the front.

Looking at other places (that have a similar mechanism mechanically,
but are semantically different), such as the callback functions for
the diff machinery, we have the user provided pointer at the end
of the list. But that is because the diff_options are the objects that
should be in front row (they are bound to the callback more than
some caller needed data).

typedef void (*diff_format_fn_t)(struct diff_queue_struct *q,
  struct diff_options *options, void *data);

typedef struct strbuf *(*diff_prefix_fn_t)(struct diff_options
*opt, void *data);

Thanks!

> As I think the way we wish to be able to express it in a better
> language would have been something like:
>
> (partial_apply(fn, fndata))(entry, entry_or_key, keydata)
>
> that order would match what is going on better.


Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field

2017-06-30 Thread Junio C Hamano
Stefan Beller  writes:

> diff --git a/patch-ids.c b/patch-ids.c
> index 9c0ab9e67a..b9b2ebbad0 100644
> --- a/patch-ids.c
> +++ b/patch-ids.c
> @@ -37,6 +37,7 @@ int commit_patch_id(struct commit *commit, struct 
> diff_options *options,
>   */
>  static int patch_id_cmp(struct patch_id *a,
>   struct patch_id *b,
> + const void *unused_keydata,
>   struct diff_options *opt)
>  {
>   if (is_null_oid(>patch_id) &&
> @@ -57,7 +58,8 @@ int init_patch_ids(struct patch_ids *ids)
>   ids->diffopts.detect_rename = 0;
>   DIFF_OPT_SET(>diffopts, RECURSIVE);
>   diff_setup_done(>diffopts);
> - hashmap_init(>patches, (hashmap_cmp_fn)patch_id_cmp, 256);
> + hashmap_init(>patches, (hashmap_cmp_fn)patch_id_cmp,
> +  >diffopts, 256);
>   return 0;
>  }
>  
> @@ -93,7 +95,7 @@ struct patch_id *has_commit_patch_id(struct commit *commit,
>   if (init_patch_id_entry(, commit, ids))
>   return NULL;
>  
> - return hashmap_get(>patches, , >diffopts);
> + return hashmap_get(>patches, , NULL);
>  }

This actually makes me wonder if we can demonstrate an existing
breakage in tests.  The old code used to pass NULL to the diffopts,
causing it to be passed down through commit_patch_id() function down
to diff_tree_oid() or diff_root_tree_oid().  When the codepath
triggers the issue that Peff warned about in his old article (was it
about rehashing or something?) that makes two entries compared
(i.e. not using keydata, because we are not comparing an existing
entry with a key and data only to see if that already exists in the
hashmap), wouldn't that cause ll_diff_tree_oid() that is called from
diff_tree_oid() to dereference NULL?

Thanks.


Re: [PATCHv2 1/2] hashmap.h: compare function has access to a data field

2017-06-30 Thread Junio C Hamano
Stefan Beller  writes:

> When using the hashmap a common need is to have access to arbitrary data
> in the compare function. A couple of times we abuse the keydata field
> to pass in the data needed. This happens for example in patch-ids.c.

It is not "arbitrary data"; it is very important to streess that we
are not just passing random crud, but adding a mechanism to
tailor/curry the function in a way that is fixed throughout the
lifetime of a hashmap.

> diff --git a/hashmap.h b/hashmap.h
> index de6022a3a9..1c26bbad5b 100644
> --- a/hashmap.h
> +++ b/hashmap.h
> @@ -33,11 +33,12 @@ struct hashmap_entry {
>  };
>  
>  typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
> - const void *keydata);
> + const void *keydata, const void *cbdata);

As I view the new "data" thing the C's (read: poor-man's) way to
cutomize the function, I would have tweaked the function signature
by giving the customization data at the front, i.e.

fn(fndata, entry, entry_or_key, keydata);

That would hopefully make it more obvious that the new thing is
pairs with fn, not with other arguments (and entry-or-key and
keydata pairs, instead of three old arguments standing as equals).

As I think the way we wish to be able to express it in a better
language would have been something like:

(partial_apply(fn, fndata))(entry, entry_or_key, keydata)

that order would match what is going on better.


[PATCHv2 1/2] hashmap.h: compare function has access to a data field

2017-06-29 Thread Stefan Beller
When using the hashmap a common need is to have access to arbitrary data
in the compare function. A couple of times we abuse the keydata field
to pass in the data needed. This happens for example in patch-ids.c.

This patch changes the function signature of the compare function
to have one more void pointer available. The pointer given for each
invocation of the compare function must be defined in the init function
of the hashmap and is just passed through.

Documentation of this new feature is deferred to a later patch.

While at it improve the naming of the fields of all compare functions used
by hashmaps by ensuring unused parameters are prefixed with 'unused_' and
naming the parameters what they are (instead of 'unused' make it
'unused_keydata').

Signed-off-by: Stefan Beller 
---
 attr.c  |  4 ++--
 builtin/describe.c  |  6 --
 builtin/difftool.c  | 20 
 builtin/fast-export.c   |  5 +++--
 config.c|  7 +--
 convert.c   |  3 ++-
 diffcore-rename.c   |  2 +-
 hashmap.c   | 17 -
 hashmap.h   |  9 ++---
 name-hash.c | 12 
 oidset.c|  5 +++--
 patch-ids.c |  6 --
 refs.c  |  4 ++--
 remote.c|  7 +--
 sha1_file.c |  5 +++--
 sub-process.c   |  5 +++--
 sub-process.h   |  6 --
 submodule-config.c  | 10 ++
 t/helper/test-hashmap.c | 15 ++-
 19 files changed, 95 insertions(+), 53 deletions(-)

diff --git a/attr.c b/attr.c
index 37454999d2..2f51417675 100644
--- a/attr.c
+++ b/attr.c
@@ -78,7 +78,7 @@ struct attr_hash_entry {
 /* attr_hashmap comparison function */
 static int attr_hash_entry_cmp(const struct attr_hash_entry *a,
   const struct attr_hash_entry *b,
-  void *unused)
+  void *unused_keydata, void *unused_data)
 {
return (a->keylen != b->keylen) || strncmp(a->key, b->key, a->keylen);
 }
@@ -86,7 +86,7 @@ static int attr_hash_entry_cmp(const struct attr_hash_entry 
*a,
 /* Initialize an 'attr_hashmap' object */
 static void attr_hashmap_init(struct attr_hashmap *map)
 {
-   hashmap_init(>map, (hashmap_cmp_fn) attr_hash_entry_cmp, 0);
+   hashmap_init(>map, (hashmap_cmp_fn) attr_hash_entry_cmp, NULL, 0);
 }
 
 /*
diff --git a/builtin/describe.c b/builtin/describe.c
index 70eb144608..a6c5a969a0 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -55,7 +55,9 @@ static const char *prio_names[] = {
 };
 
 static int commit_name_cmp(const struct commit_name *cn1,
-   const struct commit_name *cn2, const void *peeled)
+  const struct commit_name *cn2,
+  const void *peeled,
+  const void *unused_data)
 {
return oidcmp(>peeled, peeled ? peeled : >peeled);
 }
@@ -501,7 +503,7 @@ int cmd_describe(int argc, const char **argv, const char 
*prefix)
return cmd_name_rev(args.argc, args.argv, prefix);
}
 
-   hashmap_init(, (hashmap_cmp_fn) commit_name_cmp, 0);
+   hashmap_init(, (hashmap_cmp_fn) commit_name_cmp, NULL, 0);
for_each_rawref(get_name, NULL);
if (!names.size && !always)
die(_("No names found, cannot describe anything."));
diff --git a/builtin/difftool.c b/builtin/difftool.c
index 9199227f6e..80786a95ab 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -131,7 +131,9 @@ struct working_tree_entry {
 };
 
 static int working_tree_entry_cmp(struct working_tree_entry *a,
- struct working_tree_entry *b, void *keydata)
+ struct working_tree_entry *b,
+ void *unused_keydata,
+ void *unused_data)
 {
return strcmp(a->path, b->path);
 }
@@ -146,7 +148,8 @@ struct pair_entry {
const char path[FLEX_ARRAY];
 };
 
-static int pair_cmp(struct pair_entry *a, struct pair_entry *b, void *keydata)
+static int pair_cmp(struct pair_entry *a, struct pair_entry *b,
+   void *unused_keydata, void *unused_data)
 {
return strcmp(a->path, b->path);
 }
@@ -174,7 +177,8 @@ struct path_entry {
char path[FLEX_ARRAY];
 };
 
-static int path_entry_cmp(struct path_entry *a, struct path_entry *b, void 
*key)
+static int path_entry_cmp(struct path_entry *a, struct path_entry *b,
+ void *key, void *unused_data)
 {
return strcmp(a->path, key ? key : b->path);
 }
@@ -367,9 +371,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, 
const char *prefix,
wtdir_len = wtdir.len;
 
hashmap_init(_tree_dups,
-(hashmap_cmp_fn)working_tree_entry_cmp, 0);
-   hashmap_init(, (hashmap_cmp_fn)pair_cmp, 0);
-   hashmap_init(,