Re: [PATCH v8 1/2] add `config_set` API for caching config-like files

2014-07-15 Thread Tanay Abhra


On 7/11/2014 7:51 PM, Matthieu Moy wrote:
 +`int git_configset_add_file(struct config_set *cs, const char *filename)`::
 +
 +Parses the file and adds the variable-value pairs to the `config_set`,
 +dies if there is an error in parsing the file.
 
 The return value is undocumented.
 
 If I read correctly, the only codepath from this to a syntax error sets
 die_on_error, hence dies if there is an error in parsing the file is
 correct.
 
 Still, there are errors like unreadable file or no such file that do
 not die (nor emit any error message, which is not very good for the
 user), and lead to returning -1 here.
 
 I'm not sure this distinction is right (why die on syntax error and
 continue running on unreadable file?).
 

I checked the whole codebase and in all of the cases if they cannot read a file
they return -1 and continue running. So, I have updated the API to document the
return value.

I think if the file is unreadable. we must continue running as no harm has been
done yet, worse is parsing a file with wrong syntax which can cause reading
wrong config values. So the decision to die on syntax error sounds alright
to me.

Cheers,
Tanay Abhra.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/2] add `config_set` API for caching config-like files

2014-07-15 Thread Matthieu Moy
Tanay Abhra tanay...@gmail.com writes:

 I checked the whole codebase and in all of the cases if they cannot read a 
 file
 they return -1 and continue running.

More precisely: in git_config_from_file, any fopen failure results in
return -1. But in git_config_early (one caller of
git_config_from_file()), the file is checked before access, e.g.:

if (git_config_system()  !access_or_die(git_etc_gitconfig(), R_OK, 
0)) {
ret += git_config_from_file(fn, git_etc_gitconfig(),
data);
found += 1;
}

Essentially, if a config file does not exist, it's skipped (obviously
desirable), but if some really weird error occur (if err == ENOENT ||
err == ENOTDIR || ((flag  ACCESS_EACCES_OK)  err == EACCES is false,
from access_eacces_ok() in wrapper.c), then the process dies.

Permission denied errors are allowed for user-wide config, but not for
others. Read the log for commit 4698c8feb for more details.

Anyway, this is the part of the code you're not touching.

(I actually consider it as a bug that git config --file no-such-file
foo.bar and git config --file unreadable-file foo.bar fail without
error message, but your commit does not change this).

 I think if the file is unreadable. we must continue running as no harm has 
 been
 done yet, worse is parsing a file with wrong syntax which can cause reading
 wrong config values. So the decision to die on syntax error sounds alright
 to me.

In the case of git_config_check_init(), I think it makes sense to die,
because file accesses are protected with access_or_die(), so the return
value can be negative only if something really went wrong.

If you chose not to die, then you should check the return value in the
callers of git_config_check_init().

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/2] add `config_set` API for caching config-like files

2014-07-11 Thread Matthieu Moy
Hi,

I had a closer look at error management (once more, sorry: I should have
done this earlier...), and it seems to me that:

* Not all errors are managed properly

* Most error cases are untested

Among the cases I can think of:

* Syntax error when parsing the file

* Non-existant file

* Unreadable file (chmod -r)

Tanay Abhra tanay...@gmail.com writes:

 +`int git_configset_add_file(struct config_set *cs, const char *filename)`::
 +
 + Parses the file and adds the variable-value pairs to the `config_set`,
 + dies if there is an error in parsing the file.

The return value is undocumented.

If I read correctly, the only codepath from this to a syntax error sets
die_on_error, hence dies if there is an error in parsing the file is
correct.

Still, there are errors like unreadable file or no such file that do
not die (nor emit any error message, which is not very good for the
user), and lead to returning -1 here.

I'm not sure this distinction is right (why die on syntax error and
continue running on unreadable file?).

In any case, it should be documented and tested. I'll send a fixup patch
with a few more example tests (probably insufficient).

 +static int git_config_check_init(void)
 +{
 + int ret = 0;
 + if (the_config_set.hash_initialized)
 + return 0;
 + configset_init(the_config_set);
 + ret = git_config(config_hash_callback, the_config_set);
 + if (ret = 0)
 + return 0;
 + else {
 + hashmap_free(the_config_set.config_hash, 1);
 + the_config_set.hash_initialized = 0;
 + return -1;
 + }
 +}

We have the same convention for errors here. But a more serious issue is
that the return value of this function is ignored most of the time.

It seems git_config should never return a negative value, as it calls
git_config_with_options - git_config_early, which checks for file
existance and permission before calling git_config_from_file. Indeed,
Git's tests still pass after this:

--- a/config.c
+++ b/config.c
@@ -1225,7 +1225,10 @@ int git_config_with_options(config_fn_t fn, void *data,
 
 int git_config(config_fn_t fn, void *data)
 {
-   return git_config_with_options(fn, data, NULL, 1);
+   int ret = git_config_with_options(fn, data, NULL, 1);
+   if (ret  0)
+   die(Negative return value in git_config);
+   return ret;
 }

Still, we can imagine cases like race condition between access_or_die()
and git_config_from_file() in

if (git_config_system()  !access_or_die(git_etc_gitconfig(), R_OK, 
0)) {
ret += git_config_from_file(fn, git_etc_gitconfig(),
data);
found += 1;
}

where the function would indeed return -1. In any case, either we
consider that git_config should never return -1, and we should die in
this case, or we consider that it may happen, and that the else branch
of the if/else above is not dead code, and then we can't just ignore the
return value.

I think we should just do something like this:

diff --git a/config.c b/config.c
index 74adbbd..5c023e8 100644
--- a/config.c
+++ b/config.c
@@ -1428,7 +1428,7 @@ int git_configset_get_pathname(struct config_set *cs, 
const char *key, const cha
return 1;
 }
 
-static int git_config_check_init(void)
+static void git_config_check_init(void)
 {
int ret = 0;
if (the_config_set.hash_initialized)
@@ -1437,11 +1437,8 @@ static int git_config_check_init(void)
ret = git_config(config_hash_callback, the_config_set);
if (ret = 0)
return 0;
-   else {
-   hashmap_free(the_config_set.config_hash, 1);
-   the_config_set.hash_initialized = 0;
-   return -1;
-   }
+   else
+   die(Unknown error when parsing one of the configuration 
files.);
 }
 
If not, a comment should explain what the else branch corresponds to,
and why/when the return value can be safely ignored.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v8 1/2] add `config_set` API for caching config-like files

2014-07-11 Thread Junio C Hamano
Tanay Abhra tanay...@gmail.com writes:

 diff --git a/config.c b/config.c
 index ba882a1..aa58275 100644
 --- a/config.c
 +++ b/config.c
 @@ -9,6 +9,8 @@
  #include exec_cmd.h
  #include strbuf.h
  #include quote.h
 +#include hashmap.h
 +#include string-list.h
  
  struct config_source {
   struct config_source *prev;
 @@ -33,10 +35,23 @@ struct config_source {
   long (*do_ftell)(struct config_source *c);
  };
  
 +struct config_hash_entry {
 + struct hashmap_entry ent;
 + char *key;
 + struct string_list value_list;
 +};
 +
  static struct config_source *cf;
  
  static int zlib_compression_seen;
  
 +/*
 + * Default config_set that contains key-value pairs from the usual set of 
 config
 + * config files (i.e repo specific .git/config, user wide ~/.gitconfig, XDG
 + * config file and the global /etc/gitconfig)
 + */
 +static struct config_set the_config_set;
 +
  static int config_file_fgetc(struct config_source *conf)
  {
   return fgetc(conf-u.file);
 @@ -1212,6 +1227,284 @@ int git_config(config_fn_t fn, void *data)
   return git_config_with_options(fn, data, NULL, 1);
  }
  
 +static int config_hash_add_value(const char *, const char *, struct hashmap 
 *);

This is a funny forward declaration.  If you define add-file and
friends that modify the config-set _after_ you define find-entry and
friends that are used to look-up, would you still need to do a
forward declaration?

In any case, please give names to these first two parameters as what
they are for can be unclear; because they are of the same type,
there is one less clue than there usually are.

The function signature makes it sound as if this is not specific to
config; what takes a hashmap, a key and a value and is called add
is a function to add key, value pair to a hashmap.  Why doesn't it
take struct config_set?  In other words, I would have expected

static int config_set_add(struct config_set *, const char *key, const char 
*value)

instead.  Not a complaint, but is puzzled why you chose not to do
so.

I suspect almost everywhere in this patch, you would want to do
s/config_hash/config_set/.  s/config_hash_entry/config_set_element/
might be a good idea, too.  You have the concept of the config
set, and each element of that set is a config-set element, not a
config-hash entry, isn't it?

 +static int config_hash_entry_cmp(const struct config_hash_entry *e1,
 +  const struct config_hash_entry *e2, const void 
 *unused)
 +{
 + return strcmp(e1-key, e2-key);
 +}
 +
 +static void configset_init(struct config_set *cs)
 +{
 + if (!cs-hash_initialized) {
 + hashmap_init(cs-config_hash, 
 (hashmap_cmp_fn)config_hash_entry_cmp, 0);
 + cs-hash_initialized = 1;
 + }
 +}

Have uninitializer here, immediately after you defined the initializer.

 +static int config_hash_callback(const char *key, const char *value, void *cb)
 +{
 + struct config_set *cs = cb;
 + config_hash_add_value(key, value, cs-config_hash);
 + return 0;
 +}
 +
 +int git_configset_add_file(struct config_set *cs, const char *filename)
 +{
 + int ret = 0;
 + configset_init(cs);
 + ret = git_config_from_file(config_hash_callback, filename, cs);
 + if (!ret)
 + return 0;
 + else {
 + hashmap_free(cs-config_hash, 1);
 + cs-hash_initialized = 0;
 + return -1;

Does calling configset_clear() do too much for the purpose of this
error path?  In other words, is it deliberate that you do not touch
any string-list items you may have accumulated by calling the
callback?

 +static struct config_hash_entry *config_hash_find_entry(const char *key,
 + struct hashmap 
 *config_hash)
 +{
 + struct config_hash_entry k;
 + struct config_hash_entry *found_entry;
 + char *normalized_key;
 + int ret;
 + /*
 +  * `key` may come from the user, so normalize it before using it
 +  * for querying entries from the hashmap.
 +  */
 + ret = git_config_parse_key(key, normalized_key, NULL);
 +
 + if (ret)
 + return NULL;
 +
 + hashmap_entry_init(k, strhash(normalized_key));
 + k.key = normalized_key;
 + found_entry = hashmap_get(config_hash, k, NULL);
 + free(normalized_key);
 + return found_entry;
 +}
 +
 +static struct string_list *configset_get_list(const char *key, struct 
 config_set *cs)
 +{
 + struct config_hash_entry *e = config_hash_find_entry(key, 
 cs-config_hash);
 + return e ? e-value_list : NULL;
 +}
 +
 +static int config_hash_add_value(const char *key, const char *value, struct 
 hashmap *config_hash)
 +{
 + struct config_hash_entry *e;
 + e = config_hash_find_entry(key, config_hash);
 + /*
 +  * Since the keys are being fed by git_config*() callback mechanism, 
 they
 +  * are already normalized. So simply add them without any further 
 munging.
 +  */
 + if (!e) {
 +   

[PATCH v8 1/2] add `config_set` API for caching config-like files

2014-07-10 Thread Tanay Abhra
Currently `git_config()` uses a callback mechanism and file rereads for
config values. Due to this approach, it is not uncommon for the config
files to be parsed several times during the run of a git program, with
different callbacks picking out different variables useful to themselves.

Add a `config_set`, that can be used to construct an in-memory cache for
config-like files that the caller specifies (i.e., files like `.gitmodules`,
`~/.gitconfig` etc.). Add two external functions `git_configset_get_value`
and `git_configset_get_value_multi` for querying from the config sets.
`git_configset_get_value` follows `last one wins` semantic (i.e. if there
are multiple matches for the queried key in the files of the configset the
value returned will be the last entry in `value_list`).
`git_configset_get_value_multi` returns a list of values sorted in order of
increasing priority (i.e. last match will be at the end of the list). Add
type specific query functions like `git_configset_get_bool` and similar.

Add a default `config_set`, `the_config_set` to cache all key-value pairs
read from usual config files (repo specific .git/config, user wide
~/.gitconfig, XDG config and the global /etc/gitconfig). `the_config_set`
is populated using `git_config()`.

Add two external functions `git_config_get_value` and 
`git_config_get_value_multi` for querying in a non-callback manner from
`the_config_set`. Also, add type specific query functions that are
implemented as a thin wrapper around the `config_set` API.

Signed-off-by: Tanay Abhra tanay...@gmail.com
---
 Documentation/technical/api-config.txt | 134 +++
 cache.h|  31 
 config.c   | 296 +
 3 files changed, 461 insertions(+)

diff --git a/Documentation/technical/api-config.txt 
b/Documentation/technical/api-config.txt
index 230b3a0..bb43830 100644
--- a/Documentation/technical/api-config.txt
+++ b/Documentation/technical/api-config.txt
@@ -77,6 +77,81 @@ To read a specific file in git-config format, use
 `git_config_from_file`. This takes the same callback and data parameters
 as `git_config`.
 
+Querying For Specific Variables
+---
+
+For programs wanting to query for specific variables in a non-callback
+manner, the config API provides two functions `git_config_get_value`
+and `git_config_get_value_multi`. They both read values from an internal
+cache generated previously from reading the config files.
+
+`int git_config_get_value(const char *key, const char **value)`::
+
+   Finds the highest-priority value for the configuration variable `key`,
+   stores the pointer to it in `value` and returns 0. When the
+   configuration variable `key` is not found, returns 1 without touching
+   `value`. The caller should not free or modify `value`, as it is owned
+   by the cache.
+
+`const struct string_list *git_config_get_value_multi(const char *key)`::
+
+   Finds and returns the value list, sorted in order of increasing priority
+   for the configuration variable `key`. When the configuration variable
+   `key` is not found, returns NULL. The caller should not free or modify
+   the returned pointer, as it is owned by the cache.
+
+`void git_config_clear(void)`::
+
+   Resets and invalidates the config cache.
+
+The config API also provides type specific API functions which do conversion
+as well as retrieval for the queried variable, including:
+
+`int git_config_get_int(const char *key, int *dest)`::
+
+   Finds and parses the value to an integer for the configuration variable
+   `key`. Dies on error; otherwise, stores the value of the parsed integer 
in
+   `dest` and returns 0. When the configuration variable `key` is not 
found,
+   returns 1 without touching `dest`.
+
+`int git_config_get_ulong(const char *key, unsigned long *dest)`::
+
+   Similar to `git_config_get_int` but for unsigned longs.
+
+`int git_config_get_bool(const char *key, int *dest)`::
+
+   Finds and parses the value into a boolean value, for the configuration
+   variable `key` respecting keywords like true and false. Integer
+   values are converted into true/false values (when they are non-zero or
+   zero, respectively). Other values cause a die(). If parsing is 
successful,
+   stores the value of the parsed result in `dest` and returns 0. When the
+   configuration variable `key` is not found, returns 1 without touching
+   `dest`.
+
+`int git_config_get_bool_or_int(const char *key, int *is_bool, int *dest)`::
+
+   Similar to `git_config_get_bool`, except that integers are copied as-is,
+   and `is_bool` flag is unset.
+
+`int git_config_get_maybe_bool(const char *key, int *dest)`::
+
+   Similar to `git_config_get_bool`, except that it returns -1 on error
+   rather than dying.
+
+`int git_config_get_string(const char *key, const char **dest)`::
+
+