Re: [PATCH v1] config: Add hashtable for config parsing & retrival
One other minor point... On Mon, Jun 9, 2014 at 8:49 AM, Tanay Abhra wrote: > Subject: config: Add hashtable for config parsing & retrival s/retrival/retrieval/ > Add a hash table to cache all key-value pairs read from config files > (repo specific .git/config, user wide ~/.gitconfig and the global > /etc/gitconfig). Add two external functions `git_config_get_string` and > `git_config_get_string_multi` for querying in a non-callback manner from the > hash table. > > Signed-off-by: Tanay Abhra -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1] config: Add hashtable for config parsing & retrival
On Tue, Jun 10, 2014 at 8:35 AM, Tanay Abhra wrote: > Thanks for the review, Eric. I have replied to your comments below, > I will try to reply early and more promptly now. Thanks for responding. More below. > On 06/10/2014 04:27 AM, Eric Sunshine wrote: >>> --- >>> diff --git a/config.c b/config.c >>> index a30cb5c..6f6b04e 100644 >>> --- a/config.c >>> +++ b/config.c >>> @@ -9,6 +9,8 @@ >>> #include "exec_cmd.h" >>> #include "strbuf.h" >>> #include "quote.h" >>> +#include "hashmap.h" >>> +#include "string-list.h" >>> >>> struct config_source { >>> struct config_source *prev; >>> @@ -37,6 +39,120 @@ static struct config_source *cf; >>> >>> static int zlib_compression_seen; >>> >>> +struct config_cache_entry { >>> + struct hashmap_entry ent; >>> + char *key; >>> + struct string_list *value_list; >> >> Same question as in my v1 and v2 reviews [1][2], and reiterated by >> Matthieu [3]: Why is 'value_list' a pointer to a structure rather than >> just a structure? > > Sorry for the lack of response on this part. I thought choosing a pointer to a > structure or just the structure itself was a stylistic choice. Each point or question raised in a review is meaningful to the reviewer and deserves some sort of response or consideration, even if you don't agree with the reviewer's opinion. It's okay to have stylistic or other preferences (and to argue for them), but unless you voice them, the reviewer is likely to conclude that his review effort is wasted. In this case, it's not a stylistic consideration which prompted the question, but practical concern. (More on that below.) > Since most of the > functions just pass the pointer to this struct, I thought it would be more > natural to > use it. But I have changed my mind on this one. In the next iteration I will > be using > a plane old struct. There are practical reasons for choosing a structure rather than a pointer to a structure. Problems with a pointer include: - Expense of an extra (unnecessary) heap allocation. You allocate both config_cache_entry and string_list on the heap for each entry being inserted into the hash, rather than allocating only a config_cache_entry. - Potential additional heap fragmentation. - Increased possibility of leaking memory. The current implementation of config_cache_free() drives this point home: although it correctly clears the string_list, it leaks the heap-allocated string_list itself. - Potentially misleading readers of the code into thinking that there is an important reason for the string_list to be allocated separately from its containing object (for instance, "is the string list shared?" or "does the string list need to exist beyond the lifetime of the config_cache_entry?"). >>> +} >>> + >>> +static void config_cache_init(struct hashmap *config_cache) >>> +{ >>> + hashmap_init(config_cache, (hashmap_cmp_fn) >>> config_cache_entry_cmp_case, 0); >> >> In his review [4], Peff suggested giving config_cache_entry_cmp_case() >> the correct hashmap_cmp_fn signature so that this cast can be dropped. >> It's not clear whether you disagree with his advice or overlooked or >> forgot about it. If you disagree with his suggestion, it's okay to say >> so and explain why you feel the way you do, but without feedback from >> you, he or another reviewer is going to continue suggesting the same >> change. > > Now on this one, I checked the code thoroughly. Every single hashmap_init() > incantation in git code has a hashmap_cmp_fn cast. That's a good argument for explaining to Peff why you did it this way, but unless you voice this, he will likely feel that you simply ignored his review comment. > I don't think that it is necessary to do so. Is it? There are good reasons for avoiding the cast. Jonathan gives one such reason here [1]. In that case, he let it slide for the reason you mention: existing hashmap clients use the cast. (However, the implication is that it would be nice for someone -- not necessarily you -- to submit a patch series eliminating the casts.) Whether Peff will be swayed by the same argument, only he can answer. (If he doesn't answer, taking [1] into consideration, you'd probably be on safe ground to continue using the cast.) [1]: http://thread.gmane.org/gmane.comp.version-control.git/243818/focus=243917 (see his comments about hashmap_init()). >>> + if (ret) >>> + return NULL; >>> + >>> + hashmap_entry_init(&k, strhash(normalized_key)); >>> + k.key = (char*) key; >> >> This looks fishy. You're hashing based upon 'normalized_key' but then >> comparing against the unnormalized 'key'. Peff's suggestion [4] was to >> store the normalized key in the hash, which means that you should use >> 'normalized_key' for both hashing and comparing. (See additional >> commentary about this issue below in config_cache_set_value().) > > Ouch, this I had corrected in a future commit. But forgot to include in > this patch. [Leaving unsnipped since it
Re: [PATCH v1] config: Add hashtable for config parsing & retrival
On 06/10/2014 04:45 AM, Eric Sunshine wrote: > One additional comment... > > On Mon, Jun 9, 2014 at 8:49 AM, Tanay Abhra wrote: >> +static int config_cache_set_value(const char *key, const char *value) >> +{ >> + struct hashmap *config_cache; >> + struct config_cache_entry *e; >> + >> + config_cache = get_config_cache(); >> + e = config_cache_find_entry(key); >> + if (!e) { >> + e = xmalloc(sizeof(*e)); >> + hashmap_entry_init(e, strhash(key)); >> + e->key = xstrdup(key); >> + e->value_list = xcalloc(sizeof(struct string_list), 1); > > Order of xcalloc() arguments is incorrect [1]. > Noted. Thanks for pointing it out. > [1]: > http://git.661346.n2.nabble.com/PATCH-00-15-Rearrange-xcalloc-arguments-td7611675.html > >> + e->value_list->strdup_strings = 1; >> + string_list_append(e->value_list, value); >> + hashmap_add(config_cache, e); >> + } else { >> + string_list_append(e->value_list, value); >> + } >> + return 0; >> +} -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1] config: Add hashtable for config parsing & retrival
Hi, Thanks for the review, Eric. I have replied to your comments below, I will try to reply early and more promptly now. On 06/10/2014 04:27 AM, Eric Sunshine wrote: >> --- >> diff --git a/Documentation/technical/api-config.txt >> b/Documentation/technical/api-config.txt >> index 230b3a0..5b6e376 100644 >> --- a/Documentation/technical/api-config.txt >> +++ b/Documentation/technical/api-config.txt >> @@ -77,6 +77,24 @@ To read a specific file in git-config format, use >> `git_config_from_file`. This takes the same callback and data parameters >> as `git_config`. >> >> +Querying For Specific Variables >> +--- >> + >> +For programs wanting to query for specific variables in a non-callback >> +manner, the config API provides two functions `git_config_get_string` >> +and `git_config_get_string_multi`. They both take a single parameter, >> + >> +- a key string in canonical flat form for which the corresponding values >> + will be retrieved and returned. > > It's strange to have a bulleted list for a single item. It can be > stated more naturally as a full sentence, without the list. Point Noted. >> +They both read values from an internal cache generated previously from >> +reading the config files. `git_config_get_string` returns the value with >> +the highest priority(i.e. for the same variable value in the repo config >> +will be preferred over value in user wide config). >> + >> +`git_config_get_string_multi` returns a `string_list` containing all the >> +values for that particular key, sorted in order of increasing priority. >> + >> Value Parsing Helpers >> - >> >> diff --git a/config.c b/config.c >> index a30cb5c..6f6b04e 100644 >> --- a/config.c >> +++ b/config.c >> @@ -9,6 +9,8 @@ >> #include "exec_cmd.h" >> #include "strbuf.h" >> #include "quote.h" >> +#include "hashmap.h" >> +#include "string-list.h" >> >> struct config_source { >> struct config_source *prev; >> @@ -37,6 +39,120 @@ static struct config_source *cf; >> >> static int zlib_compression_seen; >> >> +struct config_cache_entry { >> + struct hashmap_entry ent; >> + char *key; >> + struct string_list *value_list; > > Same question as in my v1 and v2 reviews [1][2], and reiterated by > Matthieu [3]: Why is 'value_list' a pointer to a structure rather than > just a structure? > Sorry for the lack of response on this part. I thought choosing a pointer to a structure or just the structure itself was a stylistic choice. Since most of the functions just pass the pointer to this struct, I thought it would be more natural to use it. But I have changed my mind on this one. In the next iteration I will be using a plane old struct. > >> +}; >> + >> +static int hashmap_is_init; >> + >> +static int config_cache_set_value(const char *key, const char *value); >> + >> +static int config_cache_entry_cmp_case(const struct config_cache_entry *e1, >> +const struct config_cache_entry *e2, const >> void *unused) >> +{ >> + return strcmp(e1->key, e2->key); > > As suggested by Peff [4], this comparison is now case-sensitive, > instead of case-insensitive as in the previous version. Rather than > changing the appended '_icase' to '_case', a more typical function > name would be simply config_cache_entry_cmp(). Noted. >> +} >> + >> +static void config_cache_init(struct hashmap *config_cache) >> +{ >> + hashmap_init(config_cache, (hashmap_cmp_fn) >> config_cache_entry_cmp_case, 0); > > In his review [4], Peff suggested giving config_cache_entry_cmp_case() > the correct hashmap_cmp_fn signature so that this cast can be dropped. > It's not clear whether you disagree with his advice or overlooked or > forgot about it. If you disagree with his suggestion, it's okay to say > so and explain why you feel the way you do, but without feedback from > you, he or another reviewer is going to continue suggesting the same > change. Now on this one, I checked the code thoroughly. Every single hashmap_init() incantation in git code has a hashmap_cmp_fn cast. I don't think that it is necessary to do so. Is it? >> +} >> + >> +static int config_cache_callback(const char *key, const char *value, void >> *unused) >> +{ >> + config_cache_set_value(key, value); >> + return 0; >> +} >> + >> +static struct hashmap *get_config_cache(void) >> +{ >> + static struct hashmap config_cache; >> + if (!hashmap_is_init) { >> + config_cache_init(&config_cache); >> + hashmap_is_init = 1; >> + git_config(config_cache_callback, NULL); >> + return &config_cache; > > Why do you 'return' here rather than just falling through to the > 'return' outside the conditional? Noted. >> +static struct config_cache_entry *config_cache_find_entry(const char *key) >> +{ >> + struct hashmap *config_cache; >> + struct config_cache_entry k; >> + char *normalized_key; >> + int
Re: [PATCH v1] config: Add hashtable for config parsing & retrival
On Mon, Jun 9, 2014 at 10:24 AM, Matthieu Moy wrote: > Tanay Abhra writes: >> +struct config_cache_entry { >> + struct hashmap_entry ent; >> + char *key; >> + struct string_list *value_list; >> +}; > > I guess this crossed Eric's remark about the fact that this is a > pointer. > >> +static void config_cache_free(void) > > I didn't look closely enough to make sure there were no memory leak > remaining, but did you check with valgrind --leak-check that it is the > case in practice? It does leak the config_cache_entry::value_list member which was xcalloc()'d in config_cache_set_value(). (I didn't mention it in my reviews because I was expecting 'value_list' to be changed to a plain structure rather than a pointer to a structure.) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1] config: Add hashtable for config parsing & retrival
One additional comment... On Mon, Jun 9, 2014 at 8:49 AM, Tanay Abhra wrote: > +static int config_cache_set_value(const char *key, const char *value) > +{ > + struct hashmap *config_cache; > + struct config_cache_entry *e; > + > + config_cache = get_config_cache(); > + e = config_cache_find_entry(key); > + if (!e) { > + e = xmalloc(sizeof(*e)); > + hashmap_entry_init(e, strhash(key)); > + e->key = xstrdup(key); > + e->value_list = xcalloc(sizeof(struct string_list), 1); Order of xcalloc() arguments is incorrect [1]. [1]: http://git.661346.n2.nabble.com/PATCH-00-15-Rearrange-xcalloc-arguments-td7611675.html > + e->value_list->strdup_strings = 1; > + string_list_append(e->value_list, value); > + hashmap_add(config_cache, e); > + } else { > + string_list_append(e->value_list, value); > + } > + return 0; > +} -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1] config: Add hashtable for config parsing & retrival
On Mon, Jun 9, 2014 at 8:49 AM, Tanay Abhra wrote: > Add a hash table to cache all key-value pairs read from config files > (repo specific .git/config, user wide ~/.gitconfig and the global > /etc/gitconfig). Add two external functions `git_config_get_string` and > `git_config_get_string_multi` for querying in a non-callback manner from the > hash table. > > Signed-off-by: Tanay Abhra > --- > diff --git a/Documentation/technical/api-config.txt > b/Documentation/technical/api-config.txt > index 230b3a0..5b6e376 100644 > --- a/Documentation/technical/api-config.txt > +++ b/Documentation/technical/api-config.txt > @@ -77,6 +77,24 @@ To read a specific file in git-config format, use > `git_config_from_file`. This takes the same callback and data parameters > as `git_config`. > > +Querying For Specific Variables > +--- > + > +For programs wanting to query for specific variables in a non-callback > +manner, the config API provides two functions `git_config_get_string` > +and `git_config_get_string_multi`. They both take a single parameter, > + > +- a key string in canonical flat form for which the corresponding values > + will be retrieved and returned. It's strange to have a bulleted list for a single item. It can be stated more naturally as a full sentence, without the list. More below. > +They both read values from an internal cache generated previously from > +reading the config files. `git_config_get_string` returns the value with > +the highest priority(i.e. for the same variable value in the repo config > +will be preferred over value in user wide config). > + > +`git_config_get_string_multi` returns a `string_list` containing all the > +values for that particular key, sorted in order of increasing priority. > + > Value Parsing Helpers > - > > diff --git a/config.c b/config.c > index a30cb5c..6f6b04e 100644 > --- a/config.c > +++ b/config.c > @@ -9,6 +9,8 @@ > #include "exec_cmd.h" > #include "strbuf.h" > #include "quote.h" > +#include "hashmap.h" > +#include "string-list.h" > > struct config_source { > struct config_source *prev; > @@ -37,6 +39,120 @@ static struct config_source *cf; > > static int zlib_compression_seen; > > +struct config_cache_entry { > + struct hashmap_entry ent; > + char *key; > + struct string_list *value_list; Same question as in my v1 and v2 reviews [1][2], and reiterated by Matthieu [3]: Why is 'value_list' a pointer to a structure rather than just a structure? This is a genuine question. As a reviewer, I'm not sure how to interpret your lack of response. I can't tell if you merely overlook or forget the question multiple times; if you don't understand the inquiry; or if you have a strong reason to prefer making this a pointer. If you don't understand the question, it's okay to ask for clarification. If there is a strong reason for this to be a pointer, it should be explained. Without feedback from you, reviewers will continue asking the same question(s) upon each submission. > +}; > + > +static int hashmap_is_init; > + > +static int config_cache_set_value(const char *key, const char *value); > + > +static int config_cache_entry_cmp_case(const struct config_cache_entry *e1, > +const struct config_cache_entry *e2, const > void *unused) > +{ > + return strcmp(e1->key, e2->key); As suggested by Peff [4], this comparison is now case-sensitive, instead of case-insensitive as in the previous version. Rather than changing the appended '_icase' to '_case', a more typical function name would be simply config_cache_entry_cmp(). > +} > + > +static void config_cache_init(struct hashmap *config_cache) > +{ > + hashmap_init(config_cache, (hashmap_cmp_fn) > config_cache_entry_cmp_case, 0); In his review [4], Peff suggested giving config_cache_entry_cmp_case() the correct hashmap_cmp_fn signature so that this cast can be dropped. It's not clear whether you disagree with his advice or overlooked or forgot about it. If you disagree with his suggestion, it's okay to say so and explain why you feel the way you do, but without feedback from you, he or another reviewer is going to continue suggesting the same change. > +} > + > +static int config_cache_callback(const char *key, const char *value, void > *unused) > +{ > + config_cache_set_value(key, value); > + return 0; > +} > + > +static struct hashmap *get_config_cache(void) > +{ > + static struct hashmap config_cache; > + if (!hashmap_is_init) { > + config_cache_init(&config_cache); > + hashmap_is_init = 1; > + git_config(config_cache_callback, NULL); > + return &config_cache; Why do you 'return' here rather than just falling through to the 'return' outside the conditional? > + } > + return &config_cache; > +} > + > +static void config_cache_free(void) > +{ > + struct hashmap *config_cache; > + struct c
Re: [PATCH v1] config: Add hashtable for config parsing & retrival
Tanay Abhra writes: > +the highest priority(i.e. for the same variable value in the repo config ^ missing space. > +struct config_cache_entry { > + struct hashmap_entry ent; > + char *key; > + struct string_list *value_list; > +}; I guess this crossed Eric's remark about the fact that this is a pointer. > +static int hashmap_is_init; I'd call it hashmap_initialized, but that's a matter of taste. > +static void config_cache_free(void) I didn't look closely enough to make sure there were no memory leak remaining, but did you check with valgrind --leak-check that it is the case in practice? > + /* contents of config file has changed, so invalidate the > + * config cache used by non-callback based query functions. > + */ Style: Git usually writes multi-line comments /* * like * this */ (not always applied, but documented in Documentation/CodingGuidelines) (no time for a more detailed review, sorry) -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html