Re: [RFC/PATCH] pager.c: replace git_config with git_config_get_string
Karsten Blees writes: > Am 28.06.2014 08:01, schrieb Matthieu Moy: >> Karsten Blees writes: >> >>> I still don't like that the invalidation is done in git_config_set, though, >>> as >>> this is also used to write completely unrelated files. >> >> I don't get it. It is used to write the config files. Yes, we trigger a >> complete reload instead of just changing this particular value in the >> hashmap, but I do not see "unrelated files" in the picture. >> > > git-cherry-pick, revert: writes '.git/sequencer/opts' (sequencer.c:save_opts) > git-mv : writes '.gitmodules' > (submodule.c:update_path_in_gitmodules) > ...and the submodule's '.git/config' > (submodule.c:connect_work_tree_and_git_dir) In the previous patch series, the cache was only about the system, user and repo config files. So, .git/sequencer/opts, .gitmodules are totally independant. In the latest series, there's a new notion of config_set. .gitmodules, .git/sequencer/opts and the usual config files are in different config sets, and their cache is invalidated independently. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] pager.c: replace git_config with git_config_get_string
Am 28.06.2014 08:01, schrieb Matthieu Moy: > Karsten Blees writes: > >> I still don't like that the invalidation is done in git_config_set, though, >> as >> this is also used to write completely unrelated files. > > I don't get it. It is used to write the config files. Yes, we trigger a > complete reload instead of just changing this particular value in the > hashmap, but I do not see "unrelated files" in the picture. > git-cherry-pick, revert: writes '.git/sequencer/opts' (sequencer.c:save_opts) git-mv : writes '.gitmodules' (submodule.c:update_path_in_gitmodules) ...and the submodule's '.git/config' (submodule.c:connect_work_tree_and_git_dir) Note that the typical configuration commands git-config/remote/branch seem to call git_config_set* immediately before exit, so no need to invalidate the config cache here either. Which leaves clone, init and push (transport.c:set_upstreams, seems to be just before exit as well). I didn't look further after finding the example in cmd_clone, so with the statistics above, it may well be the only instance of {git_config; git_config_set; git_config}, I don't know. I've attached the reverse call hierarchy as generated by Eclipse - quite useful for things like this. >> Wouldn't it be better to have a 'git_config_refresh()' that could be >> used in place of (or before) current 'git_config(callback)' calls? The >> initial implementation could just invalidate the config cache. If >> there's time and energy to spare, a more advanced version could first >> check if any of the involved config files has changed. > > That would not change the "xstrdup" vs "no xstrdup" issue, right? > Right. (Unless it turns out invalidation isn't needed after all. But even then using interned strings for the config would be a Good Thing IMO.) >> The xstrdup() problem could be solved by interning strings (see the >> attached patch for a trivial implementation). I.e. allocate each distinct >> string only once (and keep it allocated). > > That's an option. We need to be carefull not to modify any string > in-place, Hopefully an illegal non-const cast will be caught in the review process. git_config_set_multivar_in_file(const char *, const char *, const char *, const char *, int) : int - /git/config.c cmd_config(int, const char * *, const char *) : int - /git/builtin/config.c (5 matches) git_config_set_in_file(const char *, const char *, const char *) : int - /git/config.c cmd_config(int, const char * *, const char *) : int - /git/builtin/config.c (2 matches) connect_work_tree_and_git_dir(const char *, const char *) : void - /git/submodule.c cmd_mv(int, const char * *, const char *) : int - /git/builtin/mv.c save_opts(replay_opts *) : void - /git/sequencer.c (8 matches) sequencer_pick_revisions(replay_opts *) : int - /git/sequencer.c cmd_cherry_pick(int, const char * *, const char *) : int - /git/builtin/revert.c cmd_revert(int, const char * *, const char *) : int - /git/builtin/revert.c update_path_in_gitmodules(const char *, const char *) : int - /git/submodule.c cmd_mv(int, const char * *, const char *) : int - /git/builtin/mv.c git_config_set_multivar(const char *, const char *, const char *, int) : int - /git/config.c add_branch(const char *, const char *, const char *, int, strbuf *) : int - /git/builtin/remote.c add_branches(remote *, const char * *, const char *) : int - /git/builtin/remote.c set_remote_branches(const char *, const char * *, int) : int - /git/builtin/remote.c set_branches(int, const char * *) : int - /git/builtin/remote.c cmd_remote(int, const char * *, const char *) : int - /git/builtin/remote.c add(int, const char * *) : int - /git/builtin/remote.c cmd_remote(int, const char * *, const char *) : int - /git/builtin/remote.c cmd_branch(int, const char * *, const char *) : int - /git/builtin/branch.c (2 matches) git_config_set(const char *, const char *) : int - /git/config.c add(int, const char * *) : int - /git/builtin/remote.c (3 matches) cmd_remote(int, const char * *, const char *) : int - /git/builtin/remote.c cmd_clone(int, const char * *, const char *) : int - /git/builtin/clone.c (2 matches) create_default_files(const char *) : int - /git/builtin/init-db.c (8 matches) init_db(const char *, unsigned int) : int - /git/builtin/init-db.c cmd_clone(int, const char * *, const char *) : int -
Re: [RFC/PATCH] pager.c: replace git_config with git_config_get_string
Karsten Blees writes: > I still don't like that the invalidation is done in git_config_set, though, as > this is also used to write completely unrelated files. I don't get it. It is used to write the config files. Yes, we trigger a complete reload instead of just changing this particular value in the hashmap, but I do not see "unrelated files" in the picture. > Wouldn't it be better to have a 'git_config_refresh()' that could be > used in place of (or before) current 'git_config(callback)' calls? The > initial implementation could just invalidate the config cache. If > there's time and energy to spare, a more advanced version could first > check if any of the involved config files has changed. That would not change the "xstrdup" vs "no xstrdup" issue, right? > The xstrdup() problem could be solved by interning strings (see the > attached patch for a trivial implementation). I.e. allocate each distinct > string only once (and keep it allocated). That's an option. We need to be carefull not to modify any string in-place, but I guess that would considerably simplify memory management for strings. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] pager.c: replace git_config with git_config_get_string
Am 27.06.2014 21:19, schrieb Matthieu Moy: > > The reason for which setting any config value invalidates the cache is > that it is _much_ simpler to implement. Less complexity, less > corner-cases, less bugs. > I think I see what you mean. E.g. in cmd_clone we have: write_config(&option_config); git_config(git_default_config, NULL); ... git_config_set("remote.Foo.url", repo); ... remote = remote_get(option_origin); /* which does 'git_config(handle_config)' */ So if we load the config cache at 'git_config(git_default_config)', then 'remote_get' won't see "remote.Foo.url" unless we invalidate the cache first. I still don't like that the invalidation is done in git_config_set, though, as this is also used to write completely unrelated files. Wouldn't it be better to have a 'git_config_refresh()' that could be used in place of (or before) current 'git_config(callback)' calls? The initial implementation could just invalidate the config cache. If there's time and energy to spare, a more advanced version could first check if any of the involved config files has changed. The xstrdup() problem could be solved by interning strings (see the attached patch for a trivial implementation). I.e. allocate each distinct string only once (and keep it allocated). So if there are 100 instances of "true" in the config file, the interned string pool would contain only one instance (i.e. 5 bytes + hashmap_entry + malloc overhead, vs. 100 * (5 bytes + malloc overhead) in case of xstrdup()). If the config cache is cleared, the interned string in the pool would still remain valid. If the config cache is reloaded, unmodified values would reuse the existing strings from the pool. Side note: interning getenv() results would fix the non-POSIX-compliant getenv()-usage in git [1]. [1] https://groups.google.com/d/msg/msysgit/4cVWWJkN4to/DR8FGTQ09Q0J 8< Subject: [PATCH] hashmap: add string interning API Interning short strings that are likely to have multiple copies may reduce memory footprint and speed up string comparisons. Add a strintern() API that uses a hashmap to manage the pool of interned strings. Signed-off-by: Karsten Blees --- Documentation/technical/api-hashmap.txt | 11 +++ hashmap.c | 34 + hashmap.h | 4 t/t0011-hashmap.sh | 13 + test-hashmap.c | 14 ++ 5 files changed, 76 insertions(+) diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt index b977ae8..db7c955 100644 --- a/Documentation/technical/api-hashmap.txt +++ b/Documentation/technical/api-hashmap.txt @@ -162,6 +162,17 @@ more entries. `hashmap_iter_first` is a combination of both (i.e. initializes the iterator and returns the first entry, if any). +`const char *strintern(const char *string)`:: + + Returns the unique, interned version of the specified string, similar + to the `String.intern` API in Java and .NET. Interned strings remain + valid for the entire lifetime of the process. ++ +Can be used as `[x]strdup()` replacement, except that the interned string must +not be modified or freed. ++ +Uses a hashmap to store the pool of interned strings. + Usage example - diff --git a/hashmap.c b/hashmap.c index d1b8056..03cd8f3 100644 --- a/hashmap.c +++ b/hashmap.c @@ -226,3 +226,37 @@ void *hashmap_iter_next(struct hashmap_iter *iter) current = iter->map->table[iter->tablepos++]; } } + +struct string_pool_entry { + struct hashmap_entry ent; + char key[FLEX_ARRAY]; +}; + +static int string_pool_cmp(const struct string_pool_entry *e1, + const struct string_pool_entry *e2, + const char *key) +{ + return strcmp(e1->key, key ? key : e2->key); +} + +const char *strintern(const char *string) +{ + static struct hashmap map; + struct string_pool_entry key, *e; + /* initialize string pool hashmap */ + if (!map.tablesize) + hashmap_init(&map, (hashmap_cmp_fn) string_pool_cmp, 0); + + /* lookup interned string in pool */ + hashmap_entry_init(&key, strhash(string)); + e = hashmap_get(&map, &key, string); + if (!e) { + /* not found: create it */ + int len = strlen(string); + e = xmalloc(sizeof(struct string_pool_entry) + len + 1); + hashmap_entry_init(e, key.ent.hash); + memcpy(e->key, string, len + 1); + hashmap_add(&map, e); + } + return e->key; +} diff --git a/hashmap.h b/hashmap.h index a816ad4..b428677 100644 --- a/hashmap.h +++ b/hashmap.h @@ -68,4 +68,8 @@ static inline void *hashmap_iter_first(struct hashmap *map, return hashmap_iter_next(iter); } +/* string interning */ + +extern const char *strintern(const char *string); +
Re: [RFC/PATCH] pager.c: replace git_config with git_config_get_string
Karsten Blees writes: > Am 27.06.2014 13:55, schrieb Matthieu Moy: >> Karsten Blees writes: >> >>> If for some reason a config string is accessed after config_cache_free() >>> (which would be a bug), you won't notice if strings are xstrdup()ed (i.e. >>> git >>> will continue to run with some invalid configuration). This is IMO much >>> worse >>> than failing with segfault. >> >> I disagree. In most case, continuing to use the old config value is the >> right thing. >> >> First, config_cache_free() is called whenever _any_ configuration >> variable is set. > > This is illogical. An API should do the right thing, and if the > right thing is to keep the old values, as you pointed out, then > setting a config value shouldn't invalidate the cache (at least > not automatically). The reason for which setting any config value invalidates the cache is that it is _much_ simpler to implement. Less complexity, less corner-cases, less bugs. >> So, setting "core.foo" also invalidates "core.bar" in >> the cache, while a user of "core.bar" could continue working with the >> old, unchanged value. > > But a user of an xstrdup()ed copy of "core.foo" would also continue > to work with the old value. Exactly like the current code does. Has it ever been even close to being a problem? What you're suggesting brings a lot more complexity in the code, and I can't imagine a case where it would have a real benefit. So, why bother? -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] pager.c: replace git_config with git_config_get_string
Am 27.06.2014 13:55, schrieb Matthieu Moy: > Karsten Blees writes: > >> If for some reason a config string is accessed after config_cache_free() >> (which would be a bug), you won't notice if strings are xstrdup()ed (i.e. git >> will continue to run with some invalid configuration). This is IMO much worse >> than failing with segfault. > > I disagree. In most case, continuing to use the old config value is the > right thing. > > First, config_cache_free() is called whenever _any_ configuration > variable is set. This is illogical. An API should do the right thing, and if the right thing is to keep the old values, as you pointed out, then setting a config value shouldn't invalidate the cache (at least not automatically). I can think of only git-clone and git-init that may want to refresh the global config cache, otherwise, config_cache_free() should probably _never_ be called. Least of all as a side effect of git_config_set_multivar_in_file(), which is used to write all kinds of unrelated files that just happen to have config-file format (e.g. '.gitmodules', '.git/sequencer/opts'). > So, setting "core.foo" also invalidates "core.bar" in > the cache, while a user of "core.bar" could continue working with the > old, unchanged value. But a user of an xstrdup()ed copy of "core.foo" would also continue to work with the old value. So if everyone keeps copies of config strings, invalidating the cache when setting a value has no effect. We could just as well not invalidate the cache and not xstrdup() strings. Perhaps we should see this as an opportunity to get rid of all the memory leaks in current config code (each string value defined in system / global config and overridden locally will currently be leaked with the 'standard' xstrdup() pattern). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] pager.c: replace git_config with git_config_get_string
Karsten Blees writes: > If for some reason a config string is accessed after config_cache_free() > (which would be a bug), you won't notice if strings are xstrdup()ed (i.e. git > will continue to run with some invalid configuration). This is IMO much worse > than failing with segfault. I disagree. In most case, continuing to use the old config value is the right thing. First, config_cache_free() is called whenever _any_ configuration variable is set. So, setting "core.foo" also invalidates "core.bar" in the cache, while a user of "core.bar" could continue working with the old, unchanged value. Then, allowing the invalidation of a config variable at any time raises a lot of tricky cases (if a command uses a configuration variable twice, do we really want to implement and test the behavior for all combination of old/new values for both usages?). More tricky cases usually means more bugs on the user-side ... When the code really want the freshest value, it's cheap to re-query the config cache anyway. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] pager.c: replace git_config with git_config_get_string
Am 25.06.2014 05:59, schrieb Eric Sunshine: > On Mon, Jun 23, 2014 at 6:41 AM, Tanay Abhra wrote: [...] >> /* returns 0 for "no pager", 1 for "use pager", and -1 for "not specified" >> */ >> int check_pager_config(const char *cmd) >> { >> - struct pager_config c; >> - c.cmd = cmd; >> - c.want = -1; >> - c.value = NULL; >> - git_config(pager_command_config, &c); >> - if (c.value) >> - pager_program = c.value; >> - return c.want; >> + struct strbuf key = STRBUF_INIT; >> + int want = -1; >> + const char *value = NULL; >> + strbuf_addf(&key, "pager.%s", cmd); >> + if (!git_config_get_string(key.buf, &value)) { >> + int b = git_config_maybe_bool(key.buf, value); >> + if (b >= 0) >> + want = b; >> + else >> + want = 1; >> + } >> + if (value) >> + pager_program = value; [...] > > Second, don't you want to xstrdup(value) when assigning to > 'pager_program'? If you don't, then 'pager_program' will become a > dangling pointer when config_cache_free() is invoked. > I don't think that values from the global config cache should be xstrdup()ed. After all, caching the values during the lifetime of the git process is the entire point of the config cache, isn't it? The only reason to call config_cache_free() is to load a _different_ configuration. In this case, however, you would also need to call the relevant config functions again, leaking all xstrdup()ed strings. If for some reason a config string is accessed after config_cache_free() (which would be a bug), you won't notice if strings are xstrdup()ed (i.e. git will continue to run with some invalid configuration). This is IMO much worse than failing with segfault. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] pager.c: replace git_config with git_config_get_string
On 6/25/2014 9:29 AM, Eric Sunshine wrote: > On Mon, Jun 23, 2014 at 6:41 AM, Tanay Abhra wrote: >> Use git_config_get_string instead of git_config to take advantage of >> the config hash-table api which provides a cleaner control flow. >> >> Signed-off-by: Tanay Abhra >> --- >> pager.c | 44 +++- >> 1 file changed, 15 insertions(+), 29 deletions(-) >> >> diff --git a/pager.c b/pager.c >> index 8b5cbc5..96abe6d 100644 >> --- a/pager.c >> +++ b/pager.c >> @@ -6,12 +6,6 @@ >> #define DEFAULT_PAGER "less" >> #endif >> >> -struct pager_config { >> - const char *cmd; >> - int want; >> - char *value; >> -}; >> - >> /* >> * This is split up from the rest of git so that we can do >> * something different on Windows. >> @@ -155,30 +149,22 @@ int decimal_width(int number) >> return width; >> } >> >> -static int pager_command_config(const char *var, const char *value, void >> *data) >> -{ >> - struct pager_config *c = data; >> - if (starts_with(var, "pager.") && !strcmp(var + 6, c->cmd)) { >> - int b = git_config_maybe_bool(var, value); >> - if (b >= 0) >> - c->want = b; >> - else { >> - c->want = 1; >> - c->value = xstrdup(value); >> - } >> - } >> - return 0; >> -} >> - >> /* returns 0 for "no pager", 1 for "use pager", and -1 for "not specified" >> */ >> int check_pager_config(const char *cmd) >> { >> - struct pager_config c; >> - c.cmd = cmd; >> - c.want = -1; >> - c.value = NULL; >> - git_config(pager_command_config, &c); >> - if (c.value) >> - pager_program = c.value; >> - return c.want; >> + struct strbuf key = STRBUF_INIT; >> + int want = -1; >> + const char *value = NULL; >> + strbuf_addf(&key, "pager.%s", cmd); >> + if (!git_config_get_string(key.buf, &value)) { >> + int b = git_config_maybe_bool(key.buf, value); >> + if (b >= 0) >> + want = b; >> + else >> + want = 1; >> + } >> + if (value) >> + pager_program = value; > > Two issues: > > First, why is 'if(value)' standing by itself? Although this works, it > seems to imply that 'value' might be able to become non-NULL by some > mechanism other than the get_config_maybe_bool() call, which means > that people reading this code have to spend extra time trying to > understand the overall logic. If you follow the example of the > original code, where 'value' is only ever set when 'b < 0', then it is > obvious even to the most casual reader that 'pager_program' is > assigned only for that one condition. > Noted. > Second, don't you want to xstrdup(value) when assigning to > 'pager_program'? If you don't, then 'pager_program' will become a > dangling pointer when config_cache_free() is invoked. > Noted. Thanks. >> + strbuf_release(&key); >> + return want; >> } >> -- >> 1.9.0.GIT > -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] pager.c: replace git_config with git_config_get_string
On Mon, Jun 23, 2014 at 6:41 AM, Tanay Abhra wrote: > Use git_config_get_string instead of git_config to take advantage of > the config hash-table api which provides a cleaner control flow. > > Signed-off-by: Tanay Abhra > --- > pager.c | 44 +++- > 1 file changed, 15 insertions(+), 29 deletions(-) > > diff --git a/pager.c b/pager.c > index 8b5cbc5..96abe6d 100644 > --- a/pager.c > +++ b/pager.c > @@ -6,12 +6,6 @@ > #define DEFAULT_PAGER "less" > #endif > > -struct pager_config { > - const char *cmd; > - int want; > - char *value; > -}; > - > /* > * This is split up from the rest of git so that we can do > * something different on Windows. > @@ -155,30 +149,22 @@ int decimal_width(int number) > return width; > } > > -static int pager_command_config(const char *var, const char *value, void > *data) > -{ > - struct pager_config *c = data; > - if (starts_with(var, "pager.") && !strcmp(var + 6, c->cmd)) { > - int b = git_config_maybe_bool(var, value); > - if (b >= 0) > - c->want = b; > - else { > - c->want = 1; > - c->value = xstrdup(value); > - } > - } > - return 0; > -} > - > /* returns 0 for "no pager", 1 for "use pager", and -1 for "not specified" */ > int check_pager_config(const char *cmd) > { > - struct pager_config c; > - c.cmd = cmd; > - c.want = -1; > - c.value = NULL; > - git_config(pager_command_config, &c); > - if (c.value) > - pager_program = c.value; > - return c.want; > + struct strbuf key = STRBUF_INIT; > + int want = -1; > + const char *value = NULL; > + strbuf_addf(&key, "pager.%s", cmd); > + if (!git_config_get_string(key.buf, &value)) { > + int b = git_config_maybe_bool(key.buf, value); > + if (b >= 0) > + want = b; > + else > + want = 1; > + } > + if (value) > + pager_program = value; Two issues: First, why is 'if(value)' standing by itself? Although this works, it seems to imply that 'value' might be able to become non-NULL by some mechanism other than the get_config_maybe_bool() call, which means that people reading this code have to spend extra time trying to understand the overall logic. If you follow the example of the original code, where 'value' is only ever set when 'b < 0', then it is obvious even to the most casual reader that 'pager_program' is assigned only for that one condition. Second, don't you want to xstrdup(value) when assigning to 'pager_program'? If you don't, then 'pager_program' will become a dangling pointer when config_cache_free() is invoked. > + strbuf_release(&key); > + return want; > } > -- > 1.9.0.GIT -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC/PATCH] pager.c: replace git_config with git_config_get_string
Use git_config_get_string instead of git_config to take advantage of the config hash-table api which provides a cleaner control flow. Signed-off-by: Tanay Abhra --- pager.c | 44 +++- 1 file changed, 15 insertions(+), 29 deletions(-) diff --git a/pager.c b/pager.c index 8b5cbc5..96abe6d 100644 --- a/pager.c +++ b/pager.c @@ -6,12 +6,6 @@ #define DEFAULT_PAGER "less" #endif -struct pager_config { - const char *cmd; - int want; - char *value; -}; - /* * This is split up from the rest of git so that we can do * something different on Windows. @@ -155,30 +149,22 @@ int decimal_width(int number) return width; } -static int pager_command_config(const char *var, const char *value, void *data) -{ - struct pager_config *c = data; - if (starts_with(var, "pager.") && !strcmp(var + 6, c->cmd)) { - int b = git_config_maybe_bool(var, value); - if (b >= 0) - c->want = b; - else { - c->want = 1; - c->value = xstrdup(value); - } - } - return 0; -} - /* returns 0 for "no pager", 1 for "use pager", and -1 for "not specified" */ int check_pager_config(const char *cmd) { - struct pager_config c; - c.cmd = cmd; - c.want = -1; - c.value = NULL; - git_config(pager_command_config, &c); - if (c.value) - pager_program = c.value; - return c.want; + struct strbuf key = STRBUF_INIT; + int want = -1; + const char *value = NULL; + strbuf_addf(&key, "pager.%s", cmd); + if (!git_config_get_string(key.buf, &value)) { + int b = git_config_maybe_bool(key.buf, value); + if (b >= 0) + want = b; + else + want = 1; + } + if (value) + pager_program = value; + strbuf_release(&key); + return want; } -- 1.9.0.GIT -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html