Re: [PATCH v4 1/2] add `config_set` API for caching config files

2014-07-07 Thread Junio C Hamano
Tanay Abhra  writes:

> On 7/4/2014 2:47 PM, Matthieu Moy wrote:
>> Tanay Abhra  writes:
>> 
>>> Hi,
>>>
>>> I have cooked up a single hashmap implementation. What are your
>>> thoughts about it?
>> 
>> I had a quick look, and it looks good to me. I'll make a more detailed
>> review when you send the next series.
>>
>
> One more doubt, does  for every value has any use other than
> raising semantic error in typespecific API functions.
>
> For example, if we call git_config_get_int(foo.bar), we can show to the user
> "value not a int at ". Other than that I cannot think of
> any other use of it. Currently `git_config_int` dies if value put for
> parsing is not an int.
>
> Junio and Karsten, both raised the point for saving , but I 
> can't
> find any use cases for it other than what I mentioned above.

Yes, error reporting is what the pair needs to be kept for.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 1/2] add `config_set` API for caching config files

2014-07-04 Thread Matthieu Moy
Tanay Abhra  writes:

> One more doubt, does  for every value has any use other than
> raising semantic error in typespecific API functions.

I don't see any other. My suggestion would be: ignore this for now, it's
not needed to get a new API that has at least as many features as the
existing one.

It's easy to add to the data-structure afterwards (although a bit less
easy to add to the code that fills-in the data structure).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 1/2] add `config_set` API for caching config files

2014-07-04 Thread Tanay Abhra


On 7/4/2014 2:47 PM, Matthieu Moy wrote:
> Tanay Abhra  writes:
> 
>> Hi,
>>
>> I have cooked up a single hashmap implementation. What are your
>> thoughts about it?
> 
> I had a quick look, and it looks good to me. I'll make a more detailed
> review when you send the next series.
>

One more doubt, does  for every value has any use other than
raising semantic error in typespecific API functions.

For example, if we call git_config_get_int(foo.bar), we can show to the user
"value not a int at ". Other than that I cannot think of
any other use of it. Currently `git_config_int` dies if value put for
parsing is not an int.

Junio and Karsten, both raised the point for saving , but I 
can't
find any use cases for it other than what I mentioned above.

Thanks.
Tanay.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 1/2] add `config_set` API for caching config files

2014-07-04 Thread Matthieu Moy
Tanay Abhra  writes:

> Hi,
>
> I have cooked up a single hashmap implementation. What are your
> thoughts about it?

I had a quick look, and it looks good to me. I'll make a more detailed
review when you send the next series.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 1/2] add `config_set` API for caching config files

2014-07-03 Thread Tanay Abhra
Hi,

I have cooked up a single hashmap implementation. What are your
thoughts about it? I must say it is much cleaner than the
previous attempt and it passes all the tests.
I will send the revised patch tomorrow with the corrected
documentation, till then please say so if you prefer this one
or the multi hashmap one.

Cheers,
Tanay Abhra.

-- >8 --
diff --git a/Makefile b/Makefile
index 07ea105..d503f78 100644
--- a/Makefile
+++ b/Makefile
@@ -777,6 +777,7 @@ LIB_OBJS += commit.o
 LIB_OBJS += compat/obstack.o
 LIB_OBJS += compat/terminal.o
 LIB_OBJS += config.o
+LIB_OBJS += config-hash.o
 LIB_OBJS += connect.o
 LIB_OBJS += connected.o
 LIB_OBJS += convert.o
diff --git a/cache.h b/cache.h
index df65231..2a675f6 100644
--- a/cache.h
+++ b/cache.h
@@ -7,6 +7,7 @@
 #include "advice.h"
 #include "gettext.h"
 #include "convert.h"
+#include "string-list.h"

 #include SHA1_HEADER
 #ifndef git_SHA_CTX
@@ -1325,6 +1326,41 @@ extern int parse_config_key(const char *var,
const char **subsection, int *subsection_len,
const char **key);

+/* config-hash.c */
+
+struct config_set {
+   struct hashmap config_hash;
+   int hash_initialized;
+};
+
+extern struct config_set *git_configset_init(void);
+extern int git_configset_add_file(struct config_set *cs, const char *filename);
+extern int git_configset_get_value(struct config_set *cs, const char *key, 
const char **value);
+extern const struct string_list *git_configset_get_value_multi(struct 
config_set *cs, const char *key);
+extern void git_configset_clear(struct config_set *cs);
+extern void git_configset_iter(struct config_set *cs, config_fn_t fn, void 
*data);
+extern int git_configset_get_string(struct config_set *cs, const char *key, 
const char **dest);
+extern int git_configset_get_int(struct config_set *cs, const char *key, int 
*dest);
+extern int git_configset_get_ulong(struct config_set *cs, const char *key, 
unsigned long *dest);
+extern int git_configset_get_bool(struct config_set *cs, const char *key, int 
*dest);
+extern int git_configset_get_bool_or_int(struct config_set *cs, const char 
*key, int *is_bool, int *dest);
+extern int git_configset_get_maybe_bool(struct config_set *cs, const char 
*key, int *dest);
+extern int git_configset_get_pathname(struct config_set *cs, const char *key, 
const char **dest);
+
+extern int git_config_get_value(const char *key, const char **value);
+extern const struct string_list *git_config_get_value_multi(const char *key);
+extern void git_config_clear(void);
+extern void git_config_iter(config_fn_t fn, void *data);
+extern int git_config_get_string(const char *key, const char **dest);
+extern int git_config_get_int(const char *key, int *dest);
+extern int git_config_get_ulong(const char *key, unsigned long *dest);
+extern int git_config_get_bool(const char *key, int *dest);
+extern int git_config_get_bool_or_int(const char *key, int *is_bool, int 
*dest);
+extern int git_config_get_maybe_bool(const char *key, int *dest);
+extern int git_config_get_pathname(const char *key, const char **dest);
+
+
 extern int committer_ident_sufficiently_given(void);
 extern int author_ident_sufficiently_given(void);

diff --git a/config-hash.c b/config-hash.c
new file mode 100644
index 000..e12bf57
--- /dev/null
+++ b/config-hash.c
@@ -0,0 +1,325 @@
+#include "cache.h"
+#include "hashmap.h"
+#include "string-list.h"
+
+
+/*
+ * Default config_set that contains key-value pairs from the usual set of 
config
+ * config files (i.e repo specific .git/config, user wide ~/.gitconfig and the
+ * global /etc/gitconfig)
+ */
+static struct config_set the_config_set;
+
+struct config_hash_entry {
+   struct hashmap_entry ent;
+   char *key;
+   struct string_list value_list;
+};
+
+static int config_hash_add_value(const char *, const char *, struct hashmap *);
+
+static int config_hash_entry_cmp(const struct config_hash_entry *e1,
+const struct config_hash_entry *e2, const void 
*unused)
+{
+   return strcmp(e1->key, e2->key);
+}
+
+static void config_hash_init(struct hashmap *config_hash)
+{
+   hashmap_init(config_hash, (hashmap_cmp_fn)config_hash_entry_cmp, 0);
+}
+
+static int config_hash_callback(const char *key, const char *value, void *cb)
+{
+   struct config_set *cs = cb;
+   config_hash_add_value(key, value, &cs->config_hash);
+   return 0;
+}
+
+static int add_configset_hash(const char *filename, struct config_set *cs)
+{
+   int ret = 0;
+   if (!cs->hash_initialized)
+   config_hash_init(&cs->config_hash);
+   cs->hash_initialized = 1;
+   ret = git_config_from_file(config_hash_callback, filename, cs);
+   if (!ret)
+   return 0;
+   else {
+   hashmap_free(&cs->config_hash, 1);
+   cs->hash_initialized = 0;
+   return -1;
+   }
+}
+
+static struct config_hash_entry *config_hash_find_entry(const char *key,
+

Re: [PATCH v4 1/2] add `config_set` API for caching config files

2014-07-03 Thread Tanay Abhra


On 7/2/2014 10:30 PM, Junio C Hamano wrote:
> Tanay Abhra  writes:
> 
>> diff --git a/Documentation/technical/api-config.txt 
>> b/Documentation/technical/api-config.txt
>> index 230b3a0..2c02fee 100644
>> --- a/Documentation/technical/api-config.txt
>> +++ b/Documentation/technical/api-config.txt
>> @@ -77,6 +77,75 @@ To read a specific file in git-config format, use
>> +`git_config_get_value` returns 0 on success, or -1 for no value found.
>> +
>> +`git_config_get_value_multi` allocates and returns a `string_list`
>> +containing all the values for the key passed as parameter, sorted in order
>> +of increasing priority (Note: caller has to call `string_list_clear` on
>> +the returned pointer and then free it).
>> +
>> +`git_config_get_value_multi` returns NULL for no value found.
>> +
>> +`git_config_clear` is provided for resetting and invalidating the cache.
>> +
>> +`git_config_iter` gives a way to iterate over the keys in cache. Existing
>> +`git_config` callback function signature is used for iterating.
> 
> Instead of doing prose above and then enumeration below,
> consistently using the enumeration-style would make the descriptions
> of API functions easier to find and read.  Also show what parameters
> each function takes.  E.g.
> 

Noted.

> 
> A few random thoughts.
> 
>  - Are "a value for the variable is found" and "no value for the
>variable is found" the only possible outcome?  Should somebody
>(not necessarily the calling code) be notified of an error---for
>example, if you opened a config file successfully but later found
>that the file could not be parsed due to a syntax error or an I/O
>error, is it possible the caller of this function to tell, or are
>these just some special cases of "variable not found"?

The syntactical errors when parsing the file are shown when it fails to
construct the hashmap. Whenever a caller calls for a value for the first
time, the file is read and parsed and if it fails during parsing it dies
raising a error specifying the lineno and filename.

Variable not found means that the variable is not present in the file,
nothing more. Note: the hashmap code uses git_config() parsing stack
so whatever error it raises in normal git_config() invocation, it
would be raised here too.

>  - The same goes for the "multi" variant but it is a bit more grave,
>as a function that returns an "int" can later be updated to
>return different negative values to signal different kinds of
>errors, but a function that returns a pointer to string-list can
>only return one kind of NULL ;-)
>

Null pointer just means no variable found in the files. What other kind
of errors may arise?

>  - For the purpose of "git_config_get_value()", what is the value
>returned for core.filemode when the configuration file says
>"[core] filemode\n", i.e. when git_config() callback would get a
>NULL for value to signal a boolean true?

NULL in value pointer with 0 return value denoting variable found.

>  - Is there a reason why you need a separate and new "config_iter"?
>What does it do differently from the good-old git_config() and
>how?  It does not belong to "Querying for Specific Variables"
>anyway.
>

You mentioned previously in the list for a analogue to git_config()
which supports iterating over the keys.
The link is [1] I think, gmane is not working for me currently.

http://thread.gmane.org/gmane.comp.version-control.git/252567

>> +The config API also provides type specific API functions which do conversion
>> +as well as retrieval for the queried variable, including:
>> +
>> +`git_config_get_int`::
>> +Parse the retrieved value to an integer, including unit factors. Dies on
>> +error; otherwise, allocates and copies the integer into the dest parameter.
>> +Returns 0 on success, or 1 if no value was found.
> 
> "allocates and copies"  Is a caller forced to do something like
> this?
> 
>   int *val;
> 
>   if (!git_config_get_int("int.one", &val)) {
>   use(*val);
> free(val);
>   }
> 
> I'd expect it to be more like:
> 
>   int val;
> if (!git_config_get_int("int.one", &val))
>   use(val);
>

Yup, you are right, my fault.

> Also, is there a reason why the "not found" signal is "1" (as
> opposed to "-1" for the lower-level get_value() API)?
> 

Many of the type specific functions return -1 for wrongful parsing
like git_config_get_string and git_config_get_maybe_bool, that is
why I changed the return value for such functions.

>> +Custom Configsets
>> +-
>> +
>> +A `config_set` points at an ordered set of config files, each of
>> +which represents what was read and cached in-core from a file.
> 
> This over-specifies the implementation.  Judging from the list of
> API functions below, an implementation is possible without having to
> keep track of what source files were fed in what order (i.e. it can
> just keep a single hash-table

Re: [PATCH v4 1/2] add `config_set` API for caching config files

2014-07-02 Thread Matthieu Moy
Junio C Hamano  writes:

> Tanay Abhra  writes:
>
>> diff --git a/Documentation/technical/api-config.txt 
>> b/Documentation/technical/api-config.txt
>> index 230b3a0..2c02fee 100644
>> --- a/Documentation/technical/api-config.txt
>> +++ b/Documentation/technical/api-config.txt
>> @@ -77,6 +77,75 @@ To read a specific file in git-config format, use
>>  `git_config_from_file`. This takes the same callback and data parameters
>>  as `git_config`.
>>  
>> +Querying For Specific Variables
>> +---
>> +
>> +For programs wanting to query for specific variables in a non-callback
>> +manner, the config API provides two functions `git_config_get_value`
>> +and `git_config_get_value_multi`.They both read values from an internal
>> +cache generated previously from reading the config files.
>> +
>> +`git_config_get_value` takes two parameters,
>> +
>> +- a key string in canonical flat form for which the corresponding value
>
> What is "canonical flat form"?

Actually, it's defined above in the same file (was here before the
patch):

  - the name of the parsed variable. This is in canonical "flat" form: the
section, subsection, and variable segments will be separated by dots,
and the section and variable segments will be all lowercase. E.g.,

But it might make sense to reword the doc, e.g. introduce a glossary
section at the beginning.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 1/2] add `config_set` API for caching config files

2014-07-02 Thread Junio C Hamano
Matthieu Moy  writes:

> I don't like the name "the_config_set". It's not the only one. Perhaps
> repo_config_set? (not totally satisfactory as it does not contain only
> the repo, but the repo+user+system)
>
> What do others think?

I actually do like "the_configset", which goes nicely parallel to
"the_index".  Neither is the only one, but most of the time our code
operates on the primary one and "the_frotz" refers to it (and
convenience wrappers that does not specify which set defaults to
it).

> What is the reason to deal with `the_config_set` and other config sets
> differently? You're giving arguments to store `the_config_set` as a
> single hashmap, but what is the reason to store others as multiple
> hashmaps?

Confusion, probably.  "the_configset" should be just a singleton
instance used to store the values we use for the default config
system, but its shape should be exactly the same as the other ones
users can use to read .gitmodules and friends with.

> And actually, I don't completely buy your arguments: having 3 or 4
> hashmaps (.git/config, ~/.gitconfig, ~/.config/git/config and
> /etc/gitconfig) would be a O(4) lookup, which is still O(1), and you
> could deal with include directives by having ".git/config and included
> files" in a hashmap, "~/.gitconfig and included files" in a second
> hashmap, ...

OK.

> I would personally find it much simpler to have a single hashmap. We'd
> lose the ability to invalidate the cache for only a single file, but I'm
> not sure it's worth the code complexity.

OK, too.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 1/2] add `config_set` API for caching config files

2014-07-02 Thread Junio C Hamano
Tanay Abhra  writes:

> diff --git a/Documentation/technical/api-config.txt 
> b/Documentation/technical/api-config.txt
> index 230b3a0..2c02fee 100644
> --- a/Documentation/technical/api-config.txt
> +++ b/Documentation/technical/api-config.txt
> @@ -77,6 +77,75 @@ To read a specific file in git-config format, use
>  `git_config_from_file`. This takes the same callback and data parameters
>  as `git_config`.
>  
> +Querying For Specific Variables
> +---
> +
> +For programs wanting to query for specific variables in a non-callback
> +manner, the config API provides two functions `git_config_get_value`
> +and `git_config_get_value_multi`.They both read values from an internal
> +cache generated previously from reading the config files.
> +
> +`git_config_get_value` takes two parameters,
> +
> +- a key string in canonical flat form for which the corresponding value

What is "canonical flat form"?

> +  with the highest priority (i.e. value in the repo config will be
> +  preferred over value in user wide config for the same variable) will
> +  be retrieved.
> +
> +- a pointer to a string which will point to the retrieved value. The caller
> +  should not free or modify the value returned as it is owned by the cache.
> +
> +`git_config_get_value` returns 0 on success, or -1 for no value found.
> +
> +`git_config_get_value_multi` allocates and returns a `string_list`
> +containing all the values for the key passed as parameter, sorted in order
> +of increasing priority (Note: caller has to call `string_list_clear` on
> +the returned pointer and then free it).
> +
> +`git_config_get_value_multi` returns NULL for no value found.
> +
> +`git_config_clear` is provided for resetting and invalidating the cache.
> +
> +`git_config_iter` gives a way to iterate over the keys in cache. Existing
> +`git_config` callback function signature is used for iterating.

Instead of doing prose above and then enumeration below,
consistently using the enumeration-style would make the descriptions
of API functions easier to find and read.  Also show what parameters
each function takes.  E.g.

`git_config_get_value(const char *key, const char **value)`::
Find the highest-priority value for the
configuration variable `key`, store the pointer to
it in `value` and return 0.  When the configuration
variable `key` is not found, return -1 without
touching `value`.

A few random thoughts.

 - Are "a value for the variable is found" and "no value for the
   variable is found" the only possible outcome?  Should somebody
   (not necessarily the calling code) be notified of an error---for
   example, if you opened a config file successfully but later found
   that the file could not be parsed due to a syntax error or an I/O
   error, is it possible the caller of this function to tell, or are
   these just some special cases of "variable not found"?

 - The same goes for the "multi" variant but it is a bit more grave,
   as a function that returns an "int" can later be updated to
   return different negative values to signal different kinds of
   errors, but a function that returns a pointer to string-list can
   only return one kind of NULL ;-)

 - For the purpose of "git_config_get_value()", what is the value
   returned for core.filemode when the configuration file says
   "[core] filemode\n", i.e. when git_config() callback would get a
   NULL for value to signal a boolean true?

 - Is there a reason why you need a separate and new "config_iter"?
   What does it do differently from the good-old git_config() and
   how?  It does not belong to "Querying for Specific Variables"
   anyway.

> +The config API also provides type specific API functions which do conversion
> +as well as retrieval for the queried variable, including:
> +
> +`git_config_get_int`::
> +Parse the retrieved value to an integer, including unit factors. Dies on
> +error; otherwise, allocates and copies the integer into the dest parameter.
> +Returns 0 on success, or 1 if no value was found.

"allocates and copies"  Is a caller forced to do something like
this?

int *val;

if (!git_config_get_int("int.one", &val)) {
use(*val);
free(val);
}

I'd expect it to be more like:

int val;
if (!git_config_get_int("int.one", &val))
use(val);

Also, is there a reason why the "not found" signal is "1" (as
opposed to "-1" for the lower-level get_value() API)?

> +`git_config_get_ulong`::
> +Identical to `git_config_get_int` but for unsigned longs.

Obviously this is not identical but merely "Similar" to.

> +`git_config_bool`::

git_config_get_bool, perhaps?

> +Custom Configsets
> +-
> +
> +A `config_set` points at an ordered set of config files, each of
> +which represents what was read and cached in-core from a file.

This over-specifies the impleme

Re: [PATCH v4 1/2] add `config_set` API for caching config files

2014-07-02 Thread Matthieu Moy
Tanay Abhra  writes:
> On 7/2/2014 2:44 PM, Matthieu Moy wrote:
>> Tanay Abhra  writes:

> Maybe a reworded sentence may work,
> `git_configset_get_value_multi` returns a list of values sorted in order of
> increasing priority (i.e. last match will be at the end of the list.)

OK.

>>> read from usual config files(repo specific .git/config, user wide
>>> ~/.gitconfig and the global /etc/gitconfig). `the_config_set` uses a
>>> single hashmap populated using git_config(). The reason for doing so is
>>> twofold, one is to honour include directives, another is to guarantee O(1)
>>> lookups for usual config values, as values are queried for hundred of
>>> times during a run of a git program.
>> 
>> What is the reason to deal with `the_config_set` and other config sets
>> differently? You're giving arguments to store `the_config_set` as a
>> single hashmap, but what is the reason to store others as multiple
>> hashmaps?
>> 
>> And actually, I don't completely buy your arguments: having 3 or 4
>> hashmaps (.git/config, ~/.gitconfig, ~/.config/git/config and
>> /etc/gitconfig) would be a O(4) lookup, which is still O(1), and you
>> could deal with include directives by having ".git/config and included
>> files" in a hashmap, "~/.gitconfig and included files" in a second
>> hashmap, ...
>> 
>> My guess is that the real argument is "git_config already does what I
>> want and I'm too lazy to change it". And I do consider lazyness as a
>> quality for a computer-scientist ;-).
>> 
>> I would personally find it much simpler to have a single hashmap. We'd
>> lose the ability to invalidate the cache for only a single file, but I'm
>> not sure it's worth the code complexity.
>> 
>
> Point noted. I can take the multiple hashmap route for `the_config_set`.
> Still, since it will be the most used config set in the code and for each
> query it would have to look through n hashmaps hampering performance. I
> can change it if you want but like you, I don't think it is worth the code
> complexity.

That's why my suggestion is to use a single hashmap everywhere.

I don't have strong opinion either way, but whichever way you go, you
should justify the choice better in the commit message.

>>> +The config API also provides type specific API functions which do 
>>> conversion
>>> +as well as retrieval for the queried variable, including:
>>> +
>>> +`git_config_get_int`::
>>> +Parse the retrieved value to an integer, including unit factors. Dies on
>>> +error; otherwise, allocates and copies the integer into the dest parameter.
>>> +Returns 0 on success, or 1 if no value was found.
>> 
>> It seems strange to me to return 1 here, and -1 in git_config_get_value
>> to mean the same thing.
>> 
>
> Noted. Some of the type specific function return -1 on wrong parsing, I will
> put return value 1 for no value found in all of the cases.

I'm not sure I fully get the existing convention. My understanding is
that when the extracted value is returned, -1 is used as a special value
to mean "no value" (e.g. git_config_maybe_bool can return 1, 0 or -1),
but when the extracted value is written to a by-address parameter, then
the return value is 1 or 0.

>>> +static struct hashmap *get_config_hash(const char *filename, struct 
>>> config_set *cs)
>>> +{
>>> +   int i;
>>> +   for(i = 0; i < cs->nr; i++) {
>>> +   if (!strcmp(cs->item[i].filename, filename))
>>> +   return &cs->item[i].config_hash;
>>> +   }
>>> +   return add_config_hash(filename, cs);
>>> +}
>>> +
>>> +static struct config_hash_entry *config_hash_find_entry(const char *key, 
>>> const char* filename,
>>> +   struct config_set *cs)
>> 
>> I don't get the logic here.
>> 
>> Either the caller explicitly manages a config_set variable like
>> 
>>   config_set my_set = git_configset_init();
>>   git_configset_add_file(my_set, "my-file");
>>   git_configset_get_value(my_set, "some.variable.name");
>> 
>> Or there's an implicit singleton mapping files to hashmaps to allow the
>> user to say
>> 
>>   git_configset_get_value("my-file", "some.variable.name");
>> 
>> without asking the user to explicitly manipulate config_set variables.
>> 
>> It seems to me that your solution is a bad compromise between both
>> solutions: you do require the user to manipulate config_set variables,
>> but you also require a filename to look for the right entry in the list.
>> 
>> Did you miss the end of Junio's message:
>> 
>>   http://thread.gmane.org/gmane.comp.version-control.git/252567
>> 
>> ?
>>
>
> This is an internal function which is used to iterate over every `config_set`
> item which contains a hashmap and filename as an identifier.
> The exposed API which I documented above doesn't require the user to pass the
> filename when querying for a value.
>
> The API works like this, suppose there are two files,
> A.config  B.config
> [foo] [foo]
> a = b a = d
> a = c a = e
>
> config_set *my_set = git_config

Re: [PATCH v4 1/2] add `config_set` API for caching config files

2014-07-02 Thread Tanay Abhra


On 7/2/2014 2:44 PM, Matthieu Moy wrote:
> Tanay Abhra  writes:
> 
>> Add a `config_set` construct that points to an ordered set of config files
>> specified by the user, each of which represents what was read and cached
>> in core as hashmaps. Add two external functions `git_configset_get_value`
>> and `git_configset_get_value_multi` for querying from the config sets,
>> which follow `last one wins` semantic(i.e. if there are multiple matches
> 
> Space before (
> 

Noted. Sorry about that.

>> for the queried key in the files of the configset the value returned will
>> the last value in the last matching file of the configset) Add type

> Does this remark also apply to git_configset_get_value_multi? My
> understanding was that "last one wins" applied only to
> git_configset_get_value.
> 

Maybe a reworded sentence may work,
`git_configset_get_value_multi` returns a list of values sorted in order of
increasing priority (i.e. last match will be at the end of the list.)

>> Add a default `config_set`, `the_config_set` to cache all key-value pairs
> 
> I don't like the name "the_config_set". It's not the only one. Perhaps
> repo_config_set? (not totally satisfactory as it does not contain only
> the repo, but the repo+user+system)
> 
> What do others think?
> 
>> read from usual config files(repo specific .git/config, user wide
>> ~/.gitconfig and the global /etc/gitconfig). `the_config_set` uses a
>> single hashmap populated using git_config(). The reason for doing so is
>> twofold, one is to honour include directives, another is to guarantee O(1)
>> lookups for usual config values, as values are queried for hundred of
>> times during a run of a git program.
> 
> What is the reason to deal with `the_config_set` and other config sets
> differently? You're giving arguments to store `the_config_set` as a
> single hashmap, but what is the reason to store others as multiple
> hashmaps?
> 
> And actually, I don't completely buy your arguments: having 3 or 4
> hashmaps (.git/config, ~/.gitconfig, ~/.config/git/config and
> /etc/gitconfig) would be a O(4) lookup, which is still O(1), and you
> could deal with include directives by having ".git/config and included
> files" in a hashmap, "~/.gitconfig and included files" in a second
> hashmap, ...
> 
> My guess is that the real argument is "git_config already does what I
> want and I'm too lazy to change it". And I do consider lazyness as a
> quality for a computer-scientist ;-).
> 
> I would personally find it much simpler to have a single hashmap. We'd
> lose the ability to invalidate the cache for only a single file, but I'm
> not sure it's worth the code complexity.
> 

Point noted. I can take the multiple hashmap route for `the_config_set`.
Still, since it will be the most used config set in the code and for each
query it would have to look through n hashmaps hampering performance. I
can change it if you want but like you, I don't think it is worth the code
complexity.

>> +Querying For Specific Variables
>> +---
>> +
>> +For programs wanting to query for specific variables in a non-callback
>> +manner, the config API provides two functions `git_config_get_value`
>> +and `git_config_get_value_multi`. They both read values from an internal
>> +`git_config_iter` gives a way to iterate over the keys in cache. Existing
>> +`git_config` callback function signature is used for iterating.
> 
> "Existing" refers to the old state. It's OK in a commit message, but
> won't make sense in the future, when your non-callback API and the old
> callback API will live side by side.
>

Noted.

>> +The config API also provides type specific API functions which do conversion
>> +as well as retrieval for the queried variable, including:
>> +
>> +`git_config_get_int`::
>> +Parse the retrieved value to an integer, including unit factors. Dies on
>> +error; otherwise, allocates and copies the integer into the dest parameter.
>> +Returns 0 on success, or 1 if no value was found.
> 
> It seems strange to me to return 1 here, and -1 in git_config_get_value
> to mean the same thing.
> 

Noted. Some of the type specific function return -1 on wrong parsing, I will
put return value 1 for no value found in all of the cases.

>> +`git_config_bool`::
>> +/* config-hash.c */

>> +#define CONFIG_SET_INIT { NULL, 0, 0 }
>> +
>> +struct config_set {
>> +struct config_set_item *item;
>> +unsigned int nr, alloc;
>> +};
>> +
>> +struct config_set_item {
>> +struct hashmap config_hash;
>> +char *filename;
>> +static struct hashmap *add_config_hash(const char *filename, struct 
>> config_set *cs)
>> +{
>> +int ret = 0;
>> +struct config_set_item *ct;
>> +struct config_set_cb cb;
>> +ALLOC_GROW(cs->item, cs->nr + 1, cs->alloc);
>> +ct = &cs->item[cs->nr++];
>> +ct->filename = xstrdup(filename);
>> +config_hash_init(&ct->config_hash);
>> +cb.cs = cs;
>> +cb.ct = ct;
>> +/*
>> + * When filename is "default", hashmap is 

Re: [PATCH v4 1/2] add `config_set` API for caching config files

2014-07-02 Thread Matthieu Moy
Tanay Abhra  writes:

> Add a `config_set` construct that points to an ordered set of config files
> specified by the user, each of which represents what was read and cached
> in core as hashmaps. Add two external functions `git_configset_get_value`
> and `git_configset_get_value_multi` for querying from the config sets,
> which follow `last one wins` semantic(i.e. if there are multiple matches

Space before (

> for the queried key in the files of the configset the value returned will
> the last value in the last matching file of the configset) Add type

will _be_ ?

Does this remark also apply to git_configset_get_value_multi? My
understanding was that "last one wins" applied only to
git_configset_get_value.

> Add a default `config_set`, `the_config_set` to cache all key-value pairs

I don't like the name "the_config_set". It's not the only one. Perhaps
repo_config_set? (not totally satisfactory as it does not contain only
the repo, but the repo+user+system)

What do others think?

> read from usual config files(repo specific .git/config, user wide

(You always forget space before '('...)

> ~/.gitconfig and the global /etc/gitconfig). `the_config_set` uses a
> single hashmap populated using git_config(). The reason for doing so is
> twofold, one is to honour include directives, another is to guarantee O(1)
> lookups for usual config values, as values are queried for hundred of
> times during a run of a git program.

What is the reason to deal with `the_config_set` and other config sets
differently? You're giving arguments to store `the_config_set` as a
single hashmap, but what is the reason to store others as multiple
hashmaps?

And actually, I don't completely buy your arguments: having 3 or 4
hashmaps (.git/config, ~/.gitconfig, ~/.config/git/config and
/etc/gitconfig) would be a O(4) lookup, which is still O(1), and you
could deal with include directives by having ".git/config and included
files" in a hashmap, "~/.gitconfig and included files" in a second
hashmap, ...

My guess is that the real argument is "git_config already does what I
want and I'm too lazy to change it". And I do consider lazyness as a
quality for a computer-scientist ;-).

I would personally find it much simpler to have a single hashmap. We'd
lose the ability to invalidate the cache for only a single file, but I'm
not sure it's worth the code complexity.

> +Querying For Specific Variables
> +---
> +
> +For programs wanting to query for specific variables in a non-callback
> +manner, the config API provides two functions `git_config_get_value`
> +and `git_config_get_value_multi`.They both read values from an internal

Space after .

> +`git_config_iter` gives a way to iterate over the keys in cache. Existing
> +`git_config` callback function signature is used for iterating.

"Existing" refers to the old state. It's OK in a commit message, but
won't make sense in the future, when your non-callback API and the old
callback API will live side by side.

> +The config API also provides type specific API functions which do conversion
> +as well as retrieval for the queried variable, including:
> +
> +`git_config_get_int`::
> +Parse the retrieved value to an integer, including unit factors. Dies on
> +error; otherwise, allocates and copies the integer into the dest parameter.
> +Returns 0 on success, or 1 if no value was found.

It seems strange to me to return 1 here, and -1 in git_config_get_value
to mean the same thing.

> +`git_config_bool`::

Isn't it git_config_get_bool?

> +/* config-hash.c */

You may point to the documentation in the comment too.

> +#define CONFIG_SET_INIT { NULL, 0, 0 }
> +
> +struct config_set {
> + struct config_set_item *item;
> + unsigned int nr, alloc;
> +};
> +
> +struct config_set_item {
> + struct hashmap config_hash;
> + char *filename;

Perhaps const char *?

> +static struct hashmap *add_config_hash(const char *filename, struct 
> config_set *cs)
> +{
> + int ret = 0;
> + struct config_set_item *ct;
> + struct config_set_cb cb;
> + ALLOC_GROW(cs->item, cs->nr + 1, cs->alloc);
> + ct = &cs->item[cs->nr++];
> + ct->filename = xstrdup(filename);
> + config_hash_init(&ct->config_hash);
> + cb.cs = cs;
> + cb.ct = ct;
> + /*
> +  * When filename is "default", hashmap is constructed from the usual 
> set of
> +  * config files (i.e repo specific .git/config, user wide ~/.gitconfig 
> and the
> +  * global /etc/gitconfig), used in `the_config_set`
> +  */
> + if (!strcmp(filename, "default"))

How does one read a file actually called "default" with this API?

More generally, why do you need a special-case here? I think it would be
much better to leave this part unaware of the default, and have a
separate function like create_repo_config_hash (or
create_the_config_hash) that would call git_config(...). There actually
isn't much shared code between the "default" case and the others in your
function.

> +

[PATCH v4 1/2] add `config_set` API for caching config files

2014-07-01 Thread Tanay Abhra
Currently `git_config()` uses a callback mechanism and file rereads for
config values. Due to this approach, it is not uncommon for the config
files to be parsed several times during the run of a git program, with
different callbacks picking out different variables useful to themselves.

Add a `config_set` construct that points to an ordered set of config files
specified by the user, each of which represents what was read and cached
in core as hashmaps. Add two external functions `git_configset_get_value`
and `git_configset_get_value_multi` for querying from the config sets,
which follow `last one wins` semantic(i.e. if there are multiple matches
for the queried key in the files of the configset the value returned will
the last value in the last matching file of the configset) Add type
specific query functions like `git_configset_get_bool` and similar.

Add a default `config_set`, `the_config_set` to cache all key-value pairs
read from usual config files(repo specific .git/config, user wide
~/.gitconfig and the global /etc/gitconfig). `the_config_set` uses a
single hashmap populated using git_config(). The reason for doing so is
twofold, one is to honour include directives, another is to guarantee O(1)
lookups for usual config values, as values are queried for hundred of
times during a run of a git program.
Add two external functions `git_config_get_value` and 
`git_config_get_value_multi` for querying in a non-callback manner from
`the_config_set`. Also, add type specific query functions that are
implemented as a thin wrapper around the `config_set` API.

Signed-off-by: Tanay Abhra 
---
 Documentation/technical/api-config.txt | 144 
 Makefile   |   1 +
 cache.h|  41 
 config-hash.c  | 405 +
 config.c   |   3 +
 5 files changed, 594 insertions(+)
 create mode 100644 config-hash.c

diff --git a/Documentation/technical/api-config.txt 
b/Documentation/technical/api-config.txt
index 230b3a0..2c02fee 100644
--- a/Documentation/technical/api-config.txt
+++ b/Documentation/technical/api-config.txt
@@ -77,6 +77,75 @@ To read a specific file in git-config format, use
 `git_config_from_file`. This takes the same callback and data parameters
 as `git_config`.
 
+Querying For Specific Variables
+---
+
+For programs wanting to query for specific variables in a non-callback
+manner, the config API provides two functions `git_config_get_value`
+and `git_config_get_value_multi`.They both read values from an internal
+cache generated previously from reading the config files.
+
+`git_config_get_value` takes two parameters,
+
+- a key string in canonical flat form for which the corresponding value
+  with the highest priority (i.e. value in the repo config will be
+  preferred over value in user wide config for the same variable) will
+  be retrieved.
+
+- a pointer to a string which will point to the retrieved value. The caller
+  should not free or modify the value returned as it is owned by the cache.
+
+`git_config_get_value` returns 0 on success, or -1 for no value found.
+
+`git_config_get_value_multi` allocates and returns a `string_list`
+containing all the values for the key passed as parameter, sorted in order
+of increasing priority (Note: caller has to call `string_list_clear` on
+the returned pointer and then free it).
+
+`git_config_get_value_multi` returns NULL for no value found.
+
+`git_config_clear` is provided for resetting and invalidating the cache.
+
+`git_config_iter` gives a way to iterate over the keys in cache. Existing
+`git_config` callback function signature is used for iterating.
+
+The config API also provides type specific API functions which do conversion
+as well as retrieval for the queried variable, including:
+
+`git_config_get_int`::
+Parse the retrieved value to an integer, including unit factors. Dies on
+error; otherwise, allocates and copies the integer into the dest parameter.
+Returns 0 on success, or 1 if no value was found.
+
+`git_config_get_ulong`::
+Identical to `git_config_get_int` but for unsigned longs.
+
+`git_config_bool`::
+Parse the retrieved value into a boolean value, respecting keywords like
+"true" and "false". Integer values are converted into true/false values
+(when they are non-zero or zero, respectively). Other values cause a die().
+If parsing is successful, allocates and copies the result into the dest
+parameter. Returns 0 on success, or 1 if no value is found.
+
+`git_config_get_bool_or_int`::
+Same as `git_config_get_bool`, except that integers are copied as-is, and
+an `is_bool` flag is unset.
+
+`git_config_get_maybe_bool`::
+Same as `git_config_get_bool`, except that it returns -1 on error rather
+than dying.
+
+`git_config_get_string`::
+Allocates and copies the retrieved value string into the `dest` parameter;
+if NULL string is given, prints an error message and ret