Re: [PATCH v1 3/4] hashmap: add simplified hashmap_get_from_hash() API
Karsten Blees writes: >> In other words, why isn't hashmap_get() more like this: >> ... >> with hashmap_entry_init() purely a static helper in hashmap.c? >> > 1. Performance OK. > 2. Simplicity > > Hashmap clients will typically provide small, type safe wrappers around the > hashmap API. OK. > 3. Compound keys > > The key doesn't always consist of just a single word. E.g. for struct > dir_entry, the key is [char *name, int len]. So an API like this: > > void *hashmap_get(const struct hashmap *map, const void *key) > > won't do in the general case (unless you require clients to define their > own key structure in addition to the entry structure...). Yeah, I was being silly. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 3/4] hashmap: add simplified hashmap_get_from_hash() API
Am 07.07.2014 19:43, schrieb Junio C Hamano: > Karsten Blees writes: > >> Hashmap entries are typically looked up by just a key. The hashmap_get() >> API expects an initialized entry structure instead, to support compound >> keys. This flexibility is currently only needed by find_dir_entry() in >> name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other >> (currently five) call sites of hashmap_get() have to set up a near emtpy > > s/emtpy/empty/; > >> entry structure, resulting in duplicate code like this: >> >> struct hashmap_entry keyentry; >> hashmap_entry_init(&keyentry, hash(key)); >> return hashmap_get(map, &keyentry, key); >> >> Add a hashmap_get_from_hash() API that allows hashmap lookups by just >> specifying the key and its hash code, i.e.: >> >> return hashmap_get_from_hash(map, hash(key), key); > > While I think the *_get_from_hash() is an improvement over the > three-line sequence, I find it somewhat strange that callers of it > still must compute the hash code themselves, instead of letting the > hashmap itself to apply the user supplied hash function to the key > to derive it. After all, the map must know how to compare two > entries via a user-supplied cmpfn given at the map initialization > time, and the algorithm to derive the hash code falls into the same > category, in the sense that both must stay the same during the > lifetime of a hashmap, no? Unless there is a situation where you > need to be able to initialize a hashmap_entry without knowing which > hashmap the entry will eventually be stored, it feels dubious API > that exposes to the outside callers hashmap_entry_init() that takes > the hash code without taking the hashmap itself. > > In other words, why isn't hashmap_get() more like this: > > void *hashmap_get(const struct hashmap *map, const void *key) > { > struct hashmap_entry keyentry; > hashmap_entry_init(&keyentry, map->hash(key)); > return *find_entry_ptr(map, &keyentry, key); > } > > with hashmap_entry_init() purely a static helper in hashmap.c? > 1. Performance Calculating hash codes is the most expensive operation when working with hash tables (unless you already have a good hash such as sha1). And using hash tables as a cache of some sort is probably the most common use case. So you'll often have code like this: 1 unsigned int h = hash(key); 2 struct my_entry *e = hashmap_get_from_hash(&map, hash, key); 3 if (!e) { 4e = malloc(sizeof(*e)); 5hashmap_entry_init(e, h); 6e->key = key; 6hashmap_add(&map, e); 7 } Note that the hash code from line 1 can be reused in line 5. You cannot do that if calculating the hash code is buried in the implementation. Callbacks cannot be inlined either... 2. Simplicity Most APIs take a user defined hashmap_entry structure, so you'd either need two hash functions, or a hash function that can distinguish between the 'entry' and 'key-only' case, e.g. unsigned int my_hash(const struct my_entry *entry, const void *keydata) { if (keydata) return strhash(keydata); else return strhash(entry->key); } Hashmap clients will typically provide small, type safe wrappers around the hashmap API. That's perfect for calculating the hash code without breaking encapsulation. IMO there's no need to make things more complex by requiring another callback. 3. Compound keys The key doesn't always consist of just a single word. E.g. for struct dir_entry, the key is [char *name, int len]. So an API like this: void *hashmap_get(const struct hashmap *map, const void *key) won't do in the general case (unless you require clients to define their own key structure in addition to the entry structure...). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 3/4] hashmap: add simplified hashmap_get_from_hash() API
Karsten Blees writes: > Hashmap entries are typically looked up by just a key. The hashmap_get() > API expects an initialized entry structure instead, to support compound > keys. This flexibility is currently only needed by find_dir_entry() in > name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other > (currently five) call sites of hashmap_get() have to set up a near emtpy s/emtpy/empty/; > entry structure, resulting in duplicate code like this: > > struct hashmap_entry keyentry; > hashmap_entry_init(&keyentry, hash(key)); > return hashmap_get(map, &keyentry, key); > > Add a hashmap_get_from_hash() API that allows hashmap lookups by just > specifying the key and its hash code, i.e.: > > return hashmap_get_from_hash(map, hash(key), key); While I think the *_get_from_hash() is an improvement over the three-line sequence, I find it somewhat strange that callers of it still must compute the hash code themselves, instead of letting the hashmap itself to apply the user supplied hash function to the key to derive it. After all, the map must know how to compare two entries via a user-supplied cmpfn given at the map initialization time, and the algorithm to derive the hash code falls into the same category, in the sense that both must stay the same during the lifetime of a hashmap, no? Unless there is a situation where you need to be able to initialize a hashmap_entry without knowing which hashmap the entry will eventually be stored, it feels dubious API that exposes to the outside callers hashmap_entry_init() that takes the hash code without taking the hashmap itself. In other words, why isn't hashmap_get() more like this: void *hashmap_get(const struct hashmap *map, const void *key) { struct hashmap_entry keyentry; hashmap_entry_init(&keyentry, map->hash(key)); return *find_entry_ptr(map, &keyentry, key); } with hashmap_entry_init() purely a static helper in hashmap.c? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v1 3/4] hashmap: add simplified hashmap_get_from_hash() API
Hashmap entries are typically looked up by just a key. The hashmap_get() API expects an initialized entry structure instead, to support compound keys. This flexibility is currently only needed by find_dir_entry() in name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other (currently five) call sites of hashmap_get() have to set up a near emtpy entry structure, resulting in duplicate code like this: struct hashmap_entry keyentry; hashmap_entry_init(&keyentry, hash(key)); return hashmap_get(map, &keyentry, key); Add a hashmap_get_from_hash() API that allows hashmap lookups by just specifying the key and its hash code, i.e.: return hashmap_get_from_hash(map, hash(key), key); Signed-off-by: Karsten Blees --- Documentation/technical/api-hashmap.txt | 14 ++ builtin/describe.c | 4 +--- diffcore-rename.c | 7 +++ hashmap.h | 8 name-hash.c | 5 ++--- test-hashmap.c | 11 +++ 6 files changed, 31 insertions(+), 18 deletions(-) diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt index dc21a7c..f9215d6 100644 --- a/Documentation/technical/api-hashmap.txt +++ b/Documentation/technical/api-hashmap.txt @@ -118,6 +118,20 @@ hashmap_entry) that has at least been initialized with the proper hash code If an entry with matching hash code is found, `key` and `keydata` are passed to `hashmap_cmp_fn` to decide whether the entry matches the key. +`void *hashmap_get_from_hash(const struct hashmap *map, unsigned int hash, const void *keydata)`:: + + Returns the hashmap entry for the specified hash code and key data, + or NULL if not found. ++ +`map` is the hashmap structure. ++ +`hash` is the hash code of the entry to look up. ++ +If an entry with matching hash code is found, `keydata` is passed to +`hashmap_cmp_fn` to decide whether the entry matches the key. The +`entry_or_key` parameter points to a bogus hashmap_entry structure that +should not be used in the comparison. + `void *hashmap_get_next(const struct hashmap *map, const void *entry)`:: Returns the next equal hashmap entry, or NULL if not found. This can be diff --git a/builtin/describe.c b/builtin/describe.c index 57e84c8..ee6a3b9 100644 --- a/builtin/describe.c +++ b/builtin/describe.c @@ -58,9 +58,7 @@ static int commit_name_cmp(const struct commit_name *cn1, static inline struct commit_name *find_commit_name(const unsigned char *peeled) { - struct commit_name key; - hashmap_entry_init(&key, sha1hash(peeled)); - return hashmap_get(&names, &key, peeled); + return hashmap_get_from_hash(&names, sha1hash(peeled), peeled); } static int replace_name(struct commit_name *e, diff --git a/diffcore-rename.c b/diffcore-rename.c index 6fa97d4..2e44a37 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -257,15 +257,14 @@ static int find_identical_files(struct hashmap *srcs, int renames = 0; struct diff_filespec *target = rename_dst[dst_index].two; - struct file_similarity *p, *best, dst; + struct file_similarity *p, *best = NULL; int i = 100, best_score = -1; /* * Find the best source match for specified destination. */ - best = NULL; - hashmap_entry_init(&dst, hash_filespec(target)); - for (p = hashmap_get(srcs, &dst, NULL); p; p = hashmap_get_next(srcs, p)) { + p = hashmap_get_from_hash(srcs, hash_filespec(target), NULL); + for (; p; p = hashmap_get_next(srcs, p)) { int score; struct diff_filespec *source = p->filespec; diff --git a/hashmap.h b/hashmap.h index ed5425a..12f0668 100644 --- a/hashmap.h +++ b/hashmap.h @@ -68,6 +68,14 @@ extern void *hashmap_put(struct hashmap *map, void *entry); extern void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata); +static inline void *hashmap_get_from_hash(const struct hashmap *map, + unsigned int hash, const void *keydata) +{ + struct hashmap_entry key; + hashmap_entry_init(&key, hash); + return hashmap_get(map, &key, keydata); +} + /* hashmap_iter functions */ extern void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter); diff --git a/name-hash.c b/name-hash.c index 49fd508..702cd05 100644 --- a/name-hash.c +++ b/name-hash.c @@ -213,12 +213,11 @@ struct cache_entry *index_dir_exists(struct index_state *istate, const char *nam struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int icase) { struct cache_entry *ce; - struct hashmap_entry key; lazy_init_name_hash(istate); - hashmap_entry_init(&key, memihash(name, namelen)); - ce = hashmap_get(&istate->name_hash, &key, NULL); + ce = hashmap_get_from_hash(&istate->name_hash, +