Re: [PATCH v1] config: Add hashtable for config parsing & retrival

2014-06-10 Thread Eric Sunshine
One other minor point...

On Mon, Jun 9, 2014 at 8:49 AM, Tanay Abhra  wrote:
> Subject: config: Add hashtable for config parsing & retrival

s/retrival/retrieval/

> Add a hash table to cache all key-value pairs read from config files
> (repo specific .git/config, user wide ~/.gitconfig and the global
> /etc/gitconfig). Add two external functions `git_config_get_string` and
> `git_config_get_string_multi` for querying in a non-callback manner from the
> hash table.
>
> Signed-off-by: Tanay Abhra 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] config: Add hashtable for config parsing & retrival

2014-06-10 Thread Eric Sunshine
On Tue, Jun 10, 2014 at 8:35 AM, Tanay Abhra  wrote:
> Thanks for the review, Eric. I have replied to your comments below,
> I will try to reply early and more promptly now.

Thanks for responding. More below.

> On 06/10/2014 04:27 AM, Eric Sunshine wrote:
>>> ---
>>> diff --git a/config.c b/config.c
>>> index a30cb5c..6f6b04e 100644
>>> --- a/config.c
>>> +++ b/config.c
>>> @@ -9,6 +9,8 @@
>>>  #include "exec_cmd.h"
>>>  #include "strbuf.h"
>>>  #include "quote.h"
>>> +#include "hashmap.h"
>>> +#include "string-list.h"
>>>
>>>  struct config_source {
>>> struct config_source *prev;
>>> @@ -37,6 +39,120 @@ static struct config_source *cf;
>>>
>>>  static int zlib_compression_seen;
>>>
>>> +struct config_cache_entry {
>>> +   struct hashmap_entry ent;
>>> +   char *key;
>>> +   struct string_list *value_list;
>>
>> Same question as in my v1 and v2 reviews [1][2], and reiterated by
>> Matthieu [3]: Why is 'value_list' a pointer to a structure rather than
>> just a structure?
>
> Sorry for the lack of response on this part. I thought choosing a pointer to a
> structure or just the structure itself was a stylistic choice.

Each point or question raised in a review is meaningful to the
reviewer and deserves some sort of response or consideration, even if
you don't agree with the reviewer's opinion. It's okay to have
stylistic or other preferences (and to argue for them), but unless you
voice them, the reviewer is likely to conclude that his review effort
is wasted.

In this case, it's not a stylistic consideration which prompted the
question, but practical concern. (More on that below.)

> Since most of the
> functions just pass the pointer to this struct, I thought it would be more 
> natural to
> use it. But I have changed my mind on this one. In the next iteration I will 
> be using
> a plane old struct.

There are practical reasons for choosing a structure rather than a
pointer to a structure. Problems with a pointer include:

- Expense of an extra (unnecessary) heap allocation. You allocate both
config_cache_entry and string_list on the heap for each entry being
inserted into the hash, rather than allocating only a
config_cache_entry.

- Potential additional heap fragmentation.

- Increased possibility of leaking memory. The current implementation
of config_cache_free() drives this point home: although it correctly
clears the string_list, it leaks the heap-allocated string_list
itself.

- Potentially misleading readers of the code into thinking that there
is an important reason for the string_list to be allocated separately
from its containing object (for instance, "is the string list shared?"
or "does the string list need to exist beyond the lifetime of the
config_cache_entry?").

>>> +}
>>> +
>>> +static void config_cache_init(struct hashmap *config_cache)
>>> +{
>>> +   hashmap_init(config_cache, (hashmap_cmp_fn) 
>>> config_cache_entry_cmp_case, 0);
>>
>> In his review [4], Peff suggested giving config_cache_entry_cmp_case()
>> the correct hashmap_cmp_fn signature so that this cast can be dropped.
>> It's not clear whether you disagree with his advice or overlooked or
>> forgot about it. If you disagree with his suggestion, it's okay to say
>> so and explain why you feel the way you do, but without feedback from
>> you, he or another reviewer is going to continue suggesting the same
>> change.
>
> Now on this one, I checked the code thoroughly. Every single hashmap_init()
> incantation in git code has a hashmap_cmp_fn cast.

That's a good argument for explaining to Peff why you did it this way,
but unless you voice this, he will likely feel that you simply ignored
his review comment.

> I don't think that it is necessary to do so. Is it?

There are good reasons for avoiding the cast. Jonathan gives one such
reason here [1]. In that case, he let it slide for the reason you
mention: existing hashmap clients use the cast. (However, the
implication is that it would be nice for someone -- not necessarily
you -- to submit a patch series eliminating the casts.)

Whether Peff will be swayed by the same argument, only he can answer.
(If he doesn't answer, taking [1] into consideration, you'd probably
be on safe ground to continue using the cast.)

[1]: http://thread.gmane.org/gmane.comp.version-control.git/243818/focus=243917
(see his comments about hashmap_init()).

>>> +   if (ret)
>>> +   return NULL;
>>> +
>>> +   hashmap_entry_init(&k, strhash(normalized_key));
>>> +   k.key = (char*) key;
>>
>> This looks fishy. You're hashing based upon 'normalized_key' but then
>> comparing against the unnormalized 'key'. Peff's suggestion [4] was to
>> store the normalized key in the hash, which means that you should use
>> 'normalized_key' for both hashing and comparing. (See additional
>> commentary about this issue below in config_cache_set_value().)
>
> Ouch, this I had corrected in a future commit. But forgot to include in
> this patch.

[Leaving unsnipped since it 

Re: [PATCH v1] config: Add hashtable for config parsing & retrival

2014-06-10 Thread Tanay Abhra
On 06/10/2014 04:45 AM, Eric Sunshine wrote:
> One additional comment...
> 
> On Mon, Jun 9, 2014 at 8:49 AM, Tanay Abhra  wrote:
>> +static int config_cache_set_value(const char *key, const char *value)
>> +{
>> +   struct hashmap *config_cache;
>> +   struct config_cache_entry *e;
>> +
>> +   config_cache = get_config_cache();
>> +   e = config_cache_find_entry(key);
>> +   if (!e) {
>> +   e = xmalloc(sizeof(*e));
>> +   hashmap_entry_init(e, strhash(key));
>> +   e->key = xstrdup(key);
>> +   e->value_list = xcalloc(sizeof(struct string_list), 1);
> 
> Order of xcalloc() arguments is incorrect [1].
> 

Noted. Thanks for pointing it out.

> [1]: 
> http://git.661346.n2.nabble.com/PATCH-00-15-Rearrange-xcalloc-arguments-td7611675.html
> 
>> +   e->value_list->strdup_strings = 1;
>> +   string_list_append(e->value_list, value);
>> +   hashmap_add(config_cache, e);
>> +   } else {
>> +   string_list_append(e->value_list, value);
>> +   }
>> +   return 0;
>> +}

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] config: Add hashtable for config parsing & retrival

2014-06-10 Thread Tanay Abhra
Hi,

Thanks for the review, Eric. I have replied to your comments below,
I will try to reply early and more promptly now.

On 06/10/2014 04:27 AM, Eric Sunshine wrote:
>> ---
>> diff --git a/Documentation/technical/api-config.txt 
>> b/Documentation/technical/api-config.txt
>> index 230b3a0..5b6e376 100644
>> --- a/Documentation/technical/api-config.txt
>> +++ b/Documentation/technical/api-config.txt
>> @@ -77,6 +77,24 @@ To read a specific file in git-config format, use
>>  `git_config_from_file`. This takes the same callback and data parameters
>>  as `git_config`.
>>
>> +Querying For Specific Variables
>> +---
>> +
>> +For programs wanting to query for specific variables in a non-callback
>> +manner, the config API provides two functions `git_config_get_string`
>> +and `git_config_get_string_multi`. They both take a single parameter,
>> +
>> +- a key string in canonical flat form for which the corresponding values
>> +  will be retrieved and returned.
> 
> It's strange to have a bulleted list for a single item. It can be
> stated more naturally as a full sentence, without the list.

Point Noted.

>> +They both read values from an internal cache generated previously from
>> +reading the config files. `git_config_get_string` returns the value with
>> +the highest priority(i.e. for the same variable value in the repo config
>> +will be preferred over value in user wide config).
>> +
>> +`git_config_get_string_multi` returns a `string_list` containing all the
>> +values for that particular key, sorted in order of increasing priority.
>> +
>>  Value Parsing Helpers
>>  -
>>
>> diff --git a/config.c b/config.c
>> index a30cb5c..6f6b04e 100644
>> --- a/config.c
>> +++ b/config.c
>> @@ -9,6 +9,8 @@
>>  #include "exec_cmd.h"
>>  #include "strbuf.h"
>>  #include "quote.h"
>> +#include "hashmap.h"
>> +#include "string-list.h"
>>
>>  struct config_source {
>> struct config_source *prev;
>> @@ -37,6 +39,120 @@ static struct config_source *cf;
>>
>>  static int zlib_compression_seen;
>>
>> +struct config_cache_entry {
>> +   struct hashmap_entry ent;
>> +   char *key;
>> +   struct string_list *value_list;
> 
> Same question as in my v1 and v2 reviews [1][2], and reiterated by
> Matthieu [3]: Why is 'value_list' a pointer to a structure rather than
> just a structure?
> 

Sorry for the lack of response on this part. I thought choosing a pointer to a
structure or just the structure itself was a stylistic choice. Since most of the
functions just pass the pointer to this struct, I thought it would be more 
natural to
use it. But I have changed my mind on this one. In the next iteration I will be 
using
a plane old struct.

> 
>> +};
>> +
>> +static int hashmap_is_init;
>> +
>> +static int config_cache_set_value(const char *key, const char *value);
>> +
>> +static int config_cache_entry_cmp_case(const struct config_cache_entry *e1,
>> +const struct config_cache_entry *e2, const 
>> void *unused)
>> +{
>> +   return strcmp(e1->key, e2->key);
> 
> As suggested by Peff [4], this comparison is now case-sensitive,
> instead of case-insensitive as in the previous version. Rather than
> changing the appended '_icase' to '_case', a more typical function
> name would be simply config_cache_entry_cmp().

Noted.

>> +}
>> +
>> +static void config_cache_init(struct hashmap *config_cache)
>> +{
>> +   hashmap_init(config_cache, (hashmap_cmp_fn) 
>> config_cache_entry_cmp_case, 0);
> 
> In his review [4], Peff suggested giving config_cache_entry_cmp_case()
> the correct hashmap_cmp_fn signature so that this cast can be dropped.
> It's not clear whether you disagree with his advice or overlooked or
> forgot about it. If you disagree with his suggestion, it's okay to say
> so and explain why you feel the way you do, but without feedback from
> you, he or another reviewer is going to continue suggesting the same
> change.

Now on this one, I checked the code thoroughly. Every single hashmap_init()
incantation in git code has a hashmap_cmp_fn cast. I don't think that it is
necessary to do so. Is it?

>> +}
>> +
>> +static int config_cache_callback(const char *key, const char *value, void 
>> *unused)
>> +{
>> +   config_cache_set_value(key, value);
>> +   return 0;
>> +}
>> +
>> +static struct hashmap *get_config_cache(void)
>> +{
>> +   static struct hashmap config_cache;
>> +   if (!hashmap_is_init) {
>> +   config_cache_init(&config_cache);
>> +   hashmap_is_init = 1;
>> +   git_config(config_cache_callback, NULL);
>> +   return &config_cache;
> 
> Why do you 'return' here rather than just falling through to the
> 'return' outside the conditional?

Noted.

>> +static struct config_cache_entry *config_cache_find_entry(const char *key)
>> +{
>> +   struct hashmap *config_cache;
>> +   struct config_cache_entry k;
>> +   char *normalized_key;
>> +   int 

Re: [PATCH v1] config: Add hashtable for config parsing & retrival

2014-06-10 Thread Eric Sunshine
On Mon, Jun 9, 2014 at 10:24 AM, Matthieu Moy
 wrote:
> Tanay Abhra  writes:
>> +struct config_cache_entry {
>> + struct hashmap_entry ent;
>> + char *key;
>> + struct string_list *value_list;
>> +};
>
> I guess this crossed Eric's remark about the fact that this is a
> pointer.
>
>> +static void config_cache_free(void)
>
> I didn't look closely enough to make sure there were no memory leak
> remaining, but did you check with valgrind --leak-check that it is the
> case in practice?

It does leak the config_cache_entry::value_list member which was
xcalloc()'d in config_cache_set_value().  (I didn't mention it in my
reviews because I was expecting 'value_list' to be changed to a plain
structure rather than a pointer to a structure.)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] config: Add hashtable for config parsing & retrival

2014-06-10 Thread Eric Sunshine
One additional comment...

On Mon, Jun 9, 2014 at 8:49 AM, Tanay Abhra  wrote:
> +static int config_cache_set_value(const char *key, const char *value)
> +{
> +   struct hashmap *config_cache;
> +   struct config_cache_entry *e;
> +
> +   config_cache = get_config_cache();
> +   e = config_cache_find_entry(key);
> +   if (!e) {
> +   e = xmalloc(sizeof(*e));
> +   hashmap_entry_init(e, strhash(key));
> +   e->key = xstrdup(key);
> +   e->value_list = xcalloc(sizeof(struct string_list), 1);

Order of xcalloc() arguments is incorrect [1].

[1]: 
http://git.661346.n2.nabble.com/PATCH-00-15-Rearrange-xcalloc-arguments-td7611675.html

> +   e->value_list->strdup_strings = 1;
> +   string_list_append(e->value_list, value);
> +   hashmap_add(config_cache, e);
> +   } else {
> +   string_list_append(e->value_list, value);
> +   }
> +   return 0;
> +}
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v1] config: Add hashtable for config parsing & retrival

2014-06-10 Thread Eric Sunshine
On Mon, Jun 9, 2014 at 8:49 AM, Tanay Abhra  wrote:
> Add a hash table to cache all key-value pairs read from config files
> (repo specific .git/config, user wide ~/.gitconfig and the global
> /etc/gitconfig). Add two external functions `git_config_get_string` and
> `git_config_get_string_multi` for querying in a non-callback manner from the
> hash table.
>
> Signed-off-by: Tanay Abhra 
> ---
> diff --git a/Documentation/technical/api-config.txt 
> b/Documentation/technical/api-config.txt
> index 230b3a0..5b6e376 100644
> --- a/Documentation/technical/api-config.txt
> +++ b/Documentation/technical/api-config.txt
> @@ -77,6 +77,24 @@ To read a specific file in git-config format, use
>  `git_config_from_file`. This takes the same callback and data parameters
>  as `git_config`.
>
> +Querying For Specific Variables
> +---
> +
> +For programs wanting to query for specific variables in a non-callback
> +manner, the config API provides two functions `git_config_get_string`
> +and `git_config_get_string_multi`. They both take a single parameter,
> +
> +- a key string in canonical flat form for which the corresponding values
> +  will be retrieved and returned.

It's strange to have a bulleted list for a single item. It can be
stated more naturally as a full sentence, without the list.

More below.

> +They both read values from an internal cache generated previously from
> +reading the config files. `git_config_get_string` returns the value with
> +the highest priority(i.e. for the same variable value in the repo config
> +will be preferred over value in user wide config).
> +
> +`git_config_get_string_multi` returns a `string_list` containing all the
> +values for that particular key, sorted in order of increasing priority.
> +
>  Value Parsing Helpers
>  -
>
> diff --git a/config.c b/config.c
> index a30cb5c..6f6b04e 100644
> --- a/config.c
> +++ b/config.c
> @@ -9,6 +9,8 @@
>  #include "exec_cmd.h"
>  #include "strbuf.h"
>  #include "quote.h"
> +#include "hashmap.h"
> +#include "string-list.h"
>
>  struct config_source {
> struct config_source *prev;
> @@ -37,6 +39,120 @@ static struct config_source *cf;
>
>  static int zlib_compression_seen;
>
> +struct config_cache_entry {
> +   struct hashmap_entry ent;
> +   char *key;
> +   struct string_list *value_list;

Same question as in my v1 and v2 reviews [1][2], and reiterated by
Matthieu [3]: Why is 'value_list' a pointer to a structure rather than
just a structure?

This is a genuine question. As a reviewer, I'm not sure how to
interpret your lack of response. I can't tell if you merely overlook
or forget the question multiple times; if you don't understand the
inquiry; or if you have a strong reason to prefer making this a
pointer. If you don't understand the question, it's okay to ask for
clarification. If there is a strong reason for this to be a pointer,
it should be explained. Without feedback from you, reviewers will
continue asking the same question(s) upon each submission.

> +};
> +
> +static int hashmap_is_init;
> +
> +static int config_cache_set_value(const char *key, const char *value);
> +
> +static int config_cache_entry_cmp_case(const struct config_cache_entry *e1,
> +const struct config_cache_entry *e2, const 
> void *unused)
> +{
> +   return strcmp(e1->key, e2->key);

As suggested by Peff [4], this comparison is now case-sensitive,
instead of case-insensitive as in the previous version. Rather than
changing the appended '_icase' to '_case', a more typical function
name would be simply config_cache_entry_cmp().

> +}
> +
> +static void config_cache_init(struct hashmap *config_cache)
> +{
> +   hashmap_init(config_cache, (hashmap_cmp_fn) 
> config_cache_entry_cmp_case, 0);

In his review [4], Peff suggested giving config_cache_entry_cmp_case()
the correct hashmap_cmp_fn signature so that this cast can be dropped.
It's not clear whether you disagree with his advice or overlooked or
forgot about it. If you disagree with his suggestion, it's okay to say
so and explain why you feel the way you do, but without feedback from
you, he or another reviewer is going to continue suggesting the same
change.

> +}
> +
> +static int config_cache_callback(const char *key, const char *value, void 
> *unused)
> +{
> +   config_cache_set_value(key, value);
> +   return 0;
> +}
> +
> +static struct hashmap *get_config_cache(void)
> +{
> +   static struct hashmap config_cache;
> +   if (!hashmap_is_init) {
> +   config_cache_init(&config_cache);
> +   hashmap_is_init = 1;
> +   git_config(config_cache_callback, NULL);
> +   return &config_cache;

Why do you 'return' here rather than just falling through to the
'return' outside the conditional?

> +   }
> +   return &config_cache;
> +}
> +
> +static void config_cache_free(void)
> +{
> +   struct hashmap *config_cache;
> +   struct c

Re: [PATCH v1] config: Add hashtable for config parsing & retrival

2014-06-09 Thread Matthieu Moy
Tanay Abhra  writes:

> +the highest priority(i.e. for the same variable value in the repo config
   ^
missing space.

> +struct config_cache_entry {
> + struct hashmap_entry ent;
> + char *key;
> + struct string_list *value_list;
> +};

I guess this crossed Eric's remark about the fact that this is a
pointer.

> +static int hashmap_is_init;

I'd call it hashmap_initialized, but that's a matter of taste.

> +static void config_cache_free(void)

I didn't look closely enough to make sure there were no memory leak
remaining, but did you check with valgrind --leak-check that it is the
case in practice?

> + /* contents of config file has changed, so invalidate the
> +  * config cache used by non-callback based query functions.
> +  */

Style: Git usually writes multi-line comments

/*
 * like
 * this
 */

(not always applied, but documented in Documentation/CodingGuidelines)

(no time for a more detailed review, sorry)

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html