Re: [PATCH v1 3/4] hashmap: add simplified hashmap_get_from_hash() API

2014-07-11 Thread Junio C Hamano
Karsten Blees  writes:

>> In other words, why isn't hashmap_get() more like this:
>>  ...
>> with hashmap_entry_init() purely a static helper in hashmap.c?
>> 
> 1. Performance

OK.

> 2. Simplicity
>
> Hashmap clients will typically provide small, type safe wrappers around the
> hashmap API.

OK.

> 3. Compound keys
>
> The key doesn't always consist of just a single word. E.g. for struct
> dir_entry, the key is [char *name, int len]. So an API like this:
>
>   void *hashmap_get(const struct hashmap *map, const void *key)
>
> won't do in the general case (unless you require clients to define their
> own key structure in addition to the entry structure...).

Yeah, I was being silly.  Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 3/4] hashmap: add simplified hashmap_get_from_hash() API

2014-07-11 Thread Karsten Blees
Am 07.07.2014 19:43, schrieb Junio C Hamano:
> Karsten Blees  writes:
> 
>> Hashmap entries are typically looked up by just a key. The hashmap_get()
>> API expects an initialized entry structure instead, to support compound
>> keys. This flexibility is currently only needed by find_dir_entry() in
>> name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other
>> (currently five) call sites of hashmap_get() have to set up a near emtpy
> 
> s/emtpy/empty/;
> 
>> entry structure, resulting in duplicate code like this:
>>
>>   struct hashmap_entry keyentry;
>>   hashmap_entry_init(&keyentry, hash(key));
>>   return hashmap_get(map, &keyentry, key);
>>
>> Add a hashmap_get_from_hash() API that allows hashmap lookups by just
>> specifying the key and its hash code, i.e.:
>>
>>   return hashmap_get_from_hash(map, hash(key), key);
> 
> While I think the *_get_from_hash() is an improvement over the
> three-line sequence, I find it somewhat strange that callers of it
> still must compute the hash code themselves, instead of letting the
> hashmap itself to apply the user supplied hash function to the key
> to derive it.  After all, the map must know how to compare two
> entries via a user-supplied cmpfn given at the map initialization
> time, and the algorithm to derive the hash code falls into the same
> category, in the sense that both must stay the same during the
> lifetime of a hashmap, no?  Unless there is a situation where you
> need to be able to initialize a hashmap_entry without knowing which
> hashmap the entry will eventually be stored, it feels dubious API
> that exposes to the outside callers hashmap_entry_init() that takes
> the hash code without taking the hashmap itself.
> 
> In other words, why isn't hashmap_get() more like this:
> 
> void *hashmap_get(const struct hashmap *map, const void *key)
> {
> struct hashmap_entry keyentry;
> hashmap_entry_init(&keyentry, map->hash(key));
> return *find_entry_ptr(map, &keyentry, key);
> }
> 
> with hashmap_entry_init() purely a static helper in hashmap.c?
> 

1. Performance

Calculating hash codes is the most expensive operation when working with
hash tables (unless you already have a good hash such as sha1). And using
hash tables as a cache of some sort is probably the most common use case.
So you'll often have code like this:

1  unsigned int h = hash(key);
2  struct my_entry *e = hashmap_get_from_hash(&map, hash, key);
3  if (!e) {
4e = malloc(sizeof(*e));
5hashmap_entry_init(e, h);
6e->key = key;
6hashmap_add(&map, e);
7  }

Note that the hash code from line 1 can be reused in line 5. You cannot do
that if calculating the hash code is buried in the implementation.

Callbacks cannot be inlined either...


2. Simplicity

Most APIs take a user defined hashmap_entry structure, so you'd either need
two hash functions, or a hash function that can distinguish between the
'entry' and 'key-only' case, e.g.

  unsigned int my_hash(const struct my_entry *entry, const void *keydata)
  {
if (keydata)
  return strhash(keydata);
else
  return strhash(entry->key);
  }

Hashmap clients will typically provide small, type safe wrappers around the
hashmap API. That's perfect for calculating the hash code without breaking
encapsulation. IMO there's no need to make things more complex by requiring
another callback.


3. Compound keys

The key doesn't always consist of just a single word. E.g. for struct
dir_entry, the key is [char *name, int len]. So an API like this:

  void *hashmap_get(const struct hashmap *map, const void *key)

won't do in the general case (unless you require clients to define their
own key structure in addition to the entry structure...).

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1 3/4] hashmap: add simplified hashmap_get_from_hash() API

2014-07-07 Thread Junio C Hamano
Karsten Blees  writes:

> Hashmap entries are typically looked up by just a key. The hashmap_get()
> API expects an initialized entry structure instead, to support compound
> keys. This flexibility is currently only needed by find_dir_entry() in
> name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other
> (currently five) call sites of hashmap_get() have to set up a near emtpy

s/emtpy/empty/;

> entry structure, resulting in duplicate code like this:
>
>   struct hashmap_entry keyentry;
>   hashmap_entry_init(&keyentry, hash(key));
>   return hashmap_get(map, &keyentry, key);
>
> Add a hashmap_get_from_hash() API that allows hashmap lookups by just
> specifying the key and its hash code, i.e.:
>
>   return hashmap_get_from_hash(map, hash(key), key);

While I think the *_get_from_hash() is an improvement over the
three-line sequence, I find it somewhat strange that callers of it
still must compute the hash code themselves, instead of letting the
hashmap itself to apply the user supplied hash function to the key
to derive it.  After all, the map must know how to compare two
entries via a user-supplied cmpfn given at the map initialization
time, and the algorithm to derive the hash code falls into the same
category, in the sense that both must stay the same during the
lifetime of a hashmap, no?  Unless there is a situation where you
need to be able to initialize a hashmap_entry without knowing which
hashmap the entry will eventually be stored, it feels dubious API
that exposes to the outside callers hashmap_entry_init() that takes
the hash code without taking the hashmap itself.

In other words, why isn't hashmap_get() more like this:

void *hashmap_get(const struct hashmap *map, const void *key)
{
struct hashmap_entry keyentry;
hashmap_entry_init(&keyentry, map->hash(key));
return *find_entry_ptr(map, &keyentry, key);
}

with hashmap_entry_init() purely a static helper in hashmap.c?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 3/4] hashmap: add simplified hashmap_get_from_hash() API

2014-07-02 Thread Karsten Blees
Hashmap entries are typically looked up by just a key. The hashmap_get()
API expects an initialized entry structure instead, to support compound
keys. This flexibility is currently only needed by find_dir_entry() in
name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other
(currently five) call sites of hashmap_get() have to set up a near emtpy
entry structure, resulting in duplicate code like this:

  struct hashmap_entry keyentry;
  hashmap_entry_init(&keyentry, hash(key));
  return hashmap_get(map, &keyentry, key);

Add a hashmap_get_from_hash() API that allows hashmap lookups by just
specifying the key and its hash code, i.e.:

  return hashmap_get_from_hash(map, hash(key), key);

Signed-off-by: Karsten Blees 
---
 Documentation/technical/api-hashmap.txt | 14 ++
 builtin/describe.c  |  4 +---
 diffcore-rename.c   |  7 +++
 hashmap.h   |  8 
 name-hash.c |  5 ++---
 test-hashmap.c  | 11 +++
 6 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/Documentation/technical/api-hashmap.txt 
b/Documentation/technical/api-hashmap.txt
index dc21a7c..f9215d6 100644
--- a/Documentation/technical/api-hashmap.txt
+++ b/Documentation/technical/api-hashmap.txt
@@ -118,6 +118,20 @@ hashmap_entry) that has at least been initialized with the 
proper hash code
 If an entry with matching hash code is found, `key` and `keydata` are passed
 to `hashmap_cmp_fn` to decide whether the entry matches the key.
 
+`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, 
const void *keydata)`::
+
+   Returns the hashmap entry for the specified hash code and key data,
+   or NULL if not found.
++
+`map` is the hashmap structure.
++
+`hash` is the hash code of the entry to look up.
++
+If an entry with matching hash code is found, `keydata` is passed to
+`hashmap_cmp_fn` to decide whether the entry matches the key. The
+`entry_or_key` parameter points to a bogus hashmap_entry structure that
+should not be used in the comparison.
+
 `void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
 
Returns the next equal hashmap entry, or NULL if not found. This can be
diff --git a/builtin/describe.c b/builtin/describe.c
index 57e84c8..ee6a3b9 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -58,9 +58,7 @@ static int commit_name_cmp(const struct commit_name *cn1,
 
 static inline struct commit_name *find_commit_name(const unsigned char *peeled)
 {
-   struct commit_name key;
-   hashmap_entry_init(&key, sha1hash(peeled));
-   return hashmap_get(&names, &key, peeled);
+   return hashmap_get_from_hash(&names, sha1hash(peeled), peeled);
 }
 
 static int replace_name(struct commit_name *e,
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6fa97d4..2e44a37 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -257,15 +257,14 @@ static int find_identical_files(struct hashmap *srcs,
int renames = 0;
 
struct diff_filespec *target = rename_dst[dst_index].two;
-   struct file_similarity *p, *best, dst;
+   struct file_similarity *p, *best = NULL;
int i = 100, best_score = -1;
 
/*
 * Find the best source match for specified destination.
 */
-   best = NULL;
-   hashmap_entry_init(&dst, hash_filespec(target));
-   for (p = hashmap_get(srcs, &dst, NULL); p; p = hashmap_get_next(srcs, 
p)) {
+   p = hashmap_get_from_hash(srcs, hash_filespec(target), NULL);
+   for (; p; p = hashmap_get_next(srcs, p)) {
int score;
struct diff_filespec *source = p->filespec;
 
diff --git a/hashmap.h b/hashmap.h
index ed5425a..12f0668 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -68,6 +68,14 @@ extern void *hashmap_put(struct hashmap *map, void *entry);
 extern void *hashmap_remove(struct hashmap *map, const void *key,
const void *keydata);
 
+static inline void *hashmap_get_from_hash(const struct hashmap *map,
+   unsigned int hash, const void *keydata)
+{
+   struct hashmap_entry key;
+   hashmap_entry_init(&key, hash);
+   return hashmap_get(map, &key, keydata);
+}
+
 /* hashmap_iter functions */
 
 extern void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter);
diff --git a/name-hash.c b/name-hash.c
index 49fd508..702cd05 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -213,12 +213,11 @@ struct cache_entry *index_dir_exists(struct index_state 
*istate, const char *nam
 struct cache_entry *index_file_exists(struct index_state *istate, const char 
*name, int namelen, int icase)
 {
struct cache_entry *ce;
-   struct hashmap_entry key;
 
lazy_init_name_hash(istate);
 
-   hashmap_entry_init(&key, memihash(name, namelen));
-   ce = hashmap_get(&istate->name_hash, &key, NULL);
+   ce = hashmap_get_from_hash(&istate->name_hash,
+