Re: RFC GSoC idea: new "git config" features
Jeff King writes:

> On Sat, Mar 01, 2014 at 12:01:44PM +0100, Matthieu Moy wrote:
>
>> Jeff King writes:
>>
>> > If we had the keys in-memory, we could reverse this: config code asks
>> > for keys it cares about, and we can do an optimized lookup (binary
>> > search, hash, etc).
>>
>> I'm actually dreaming of a system where a configuration variable could
>> be "declared" in Git's source code, with associated type (list/single
>> value, boolean/string/path/...), default value and documentation (and
>> then Documentation/config.txt could become a generated file). One could
>> imagine a lot of possibilities like
>
> Yes, I think something like that would be very nice. ...
> ...
>> Migrating the whole code to such a system would take time, but creating
>> the system and applying it to a few examples might be feasible as a GSoC
>> project.
>
> Agreed, as long as we have enough examples to feel confident that the
> infrastructure is sufficient.

I agree that it would give us a lot of enhancement opportunities if
we had a central catalog of what the supported configuration
variables are and what semantics (e.g. type, multi-value-ness, etc.)
they have.

One thing we need to be careful about is that we still must support
random configuration items that git-core does not care about at all
but scripts (and future versions of git-core) read off of, though.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RFC GSoC idea: new "git config" features
On Sat, Mar 01, 2014 at 12:01:44PM +0100, Matthieu Moy wrote:

> Jeff King writes:
>
> > If we had the keys in-memory, we could reverse this: config code asks
> > for keys it cares about, and we can do an optimized lookup (binary
> > search, hash, etc).
>
> I'm actually dreaming of a system where a configuration variable could
> be "declared" in Git's source code, with associated type (list/single
> value, boolean/string/path/...), default value and documentation (and
> then Documentation/config.txt could become a generated file). One could
> imagine a lot of possibilities like

Yes, I think something like that would be very nice. I am not a big fan
of code generation, but if we had config queries like "config_get_bool",
then I think it would be reasonably pleasant to take a spec like:

    Key: help.browser
    Type: string
    Description: Specify the browser for help...

and turn it into:

    const char *config_get_help_browser(void)
    {
            return config_get_string("help.browser");
    }

So technically code generation, but all the heavy lifting is done behind
the scenes. We're not saving lines in the result so much as avoiding
repeating ourselves (that is, the generated code is only mapping the
config-type from the spec into a C type and function name that gives us
extra compile-time safety).

However, I skimmed through config.txt looking for a key to use in my
example above, and there are a surprising number of one-off semantics
(e.g., things that are mostly bool, but can be "auto" or take some other
special value). We may find that the "Type" field has a surprising
number of variants that makes a technique like this annoying. But I'd
reserve judgement until somebody actually tries encoding a significant
chunk of the config keys and we see what it looks like.

> Migrating the whole code to such a system would take time, but creating
> the system and applying it to a few examples might be feasible as a GSoC
> project.

Agreed, as long as we have enough examples to feel confident that the
infrastructure is sufficient.

-Peff
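
For illustration only, here is a minimal sketch of the spec-to-accessor
idea using C X-macros instead of an external generator. Everything in it
is an assumption made for the example: the tiny in-memory store, its
values, the simplified boolean parsing, and the helpers
config_get_string()/config_get_bool(), which borrow the names used in
the message above but are not existing Git functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an in-memory config store. */
struct config_entry {
    const char *key;
    const char *value;
};

static const struct config_entry store[] = {
    { "help.browser", "firefox" },  /* illustrative value only */
    { "status.branch", "true" },    /* illustrative value only */
};

/* Hypothetical query helper: linear lookup, NULL if the key is absent. */
static const char *config_get_string(const char *key)
{
    size_t i;

    assert(key != NULL);
    for (i = 0; i < sizeof(store) / sizeof(store[0]); i++)
        if (!strcmp(store[i].key, key))
            return store[i].value;
    return NULL;
}

/* Hypothetical, simplified bool parsing (not Git's full rules). */
static int config_get_bool(const char *key)
{
    const char *v = config_get_string(key);

    return v && (!strcmp(v, "true") || !strcmp(v, "yes") ||
                 !strcmp(v, "on") || !strcmp(v, "1"));
}

/* The "spec": X(accessor suffix, key, C type, generic getter). */
#define CONFIG_SPEC(X) \
    X(help_browser, "help.browser", const char *, config_get_string) \
    X(status_branch, "status.branch", int, config_get_bool)

/* Expand each spec line into a typed accessor function. */
#define DEFINE_ACCESSOR(name, key, ctype, getter) \
    static ctype config_get_##name(void) { return getter(key); }

CONFIG_SPEC(DEFINE_ACCESSOR)
```

A caller would write config_get_help_browser() or
config_get_status_branch() and get the C type from the spec checked at
compile time. The "mostly bool, but can be auto" keys mentioned above
would each need their own entry type, which is exactly where a scheme
like this could become annoying.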
Re: RFC GSoC idea: new "git config" features
Jeff King writes:

> Most callbacks would convert to a query system in a pretty
> straightforward way, but some that have side effects might be tricky.
> Converting them all may be too large for a GSoC project, but I think you
> could do it gradually:
>
>   1. Convert the parser to read into an in-memory representation, but
>      leave git_config() as a wrapper which iterates over it.
>
>   2. Add query functions like config_string_get() above.
>
>   3. Convert callbacks to query functions one by one.
>
>   4. Eventually drop git_config().
>
> A GSoC project could take us partway through (3).

I actually discarded the "read from these config files to preparsed
structure to memory, later to be consumed by repeated calls to the
git_config() callback functions, making the only difference from the
current scheme that the preparsed structure will be reset when there
is the new 'reset to the original' definition" as obvious and
uninteresting.

This is one of these times that I find myself blessed with capable
others that can go beyond, building on top of such an idea that I
may have discarded without thinking it through, around me ;-)

Yes, the new abstraction like config__get() that can live alongside
the existing "git_config() feeds callback chain everything" and
gradually replace the latter, would be a good way forward.

Given that we read configuration multiple times anyway for different
purposes, even without the new abstraction, the end result might
perform better if we read the files once and reused in later calls
to git_config().

Thanks.
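
As a rough sketch of "read the files once and reuse in later calls to
git_config()", the following keeps a parsed cache and replays it to
each callback. The names cached_git_config(), parse_config_files() and
count_keys(), the fixed-size cache, and the two records are all
hypothetical; only the callback signature follows the one used
elsewhere in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*config_fn_t)(const char *key, const char *value, void *cb);

struct config_record {
    const char *key;
    const char *value;
};

static struct config_record cache[16];
static size_t cache_nr;
static int cache_loaded;
static int parse_count;  /* how many times the "files" were parsed */

/* Hypothetical stand-in for reading and parsing the real config files. */
static void parse_config_files(void)
{
    parse_count++;
    cache_nr = 0;
    cache[cache_nr].key = "core.frotz";
    cache[cache_nr].value = "3";
    cache_nr++;
    cache[cache_nr].key = "status.branch";
    cache[cache_nr].value = "true";
    cache_nr++;
}

/* Wrapper with the old calling convention: parse at most once, then
 * feed every cached record to the callback in the order it was read. */
static int cached_git_config(config_fn_t fn, void *cb)
{
    size_t i;

    if (!cache_loaded) {
        parse_config_files();
        cache_loaded = 1;
    }
    for (i = 0; i < cache_nr; i++) {
        int ret = fn(cache[i].key, cache[i].value, cb);
        if (ret < 0)
            return ret;
    }
    return 0;
}

/* Example callback: counts the records it is shown. */
static int count_keys(const char *key, const char *value, void *cb)
{
    assert(key != NULL && value != NULL);
    ++*(int *)cb;
    return 0;
}
```

Existing callbacks would not need to change under such a wrapper. The
sketch does not handle invalidating the cache when a config file
changes on disk, which a real implementation would have to address.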
Re: RFC GSoC idea: new "git config" features
Jeff King writes:

> If we had the keys in-memory, we could reverse this: config code asks
> for keys it cares about, and we can do an optimized lookup (binary
> search, hash, etc).

I'm actually dreaming of a system where a configuration variable could
be "declared" in Git's source code, with associated type (list/single
value, boolean/string/path/...), default value and documentation (and
then Documentation/config.txt could become a generated file). One could
imagine a lot of possibilities like

    $ git config --describe
    Type: boolean
    Default value: true
    Description: ...

Somehow, do for config variables what has been done for command-line
option parsing.

Migrating the whole code to such a system would take time, but creating
the system and applying it to a few examples might be feasible as a GSoC
project.

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
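
To make the "declared variable" idea concrete, a declaration table and
a describe helper could look roughly like the sketch below. The struct
layout, the function names, and the keys "example.flag" and
"example.tool" are invented for illustration; they are not existing Git
code or real configuration variables.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum config_type { CONFIG_BOOL, CONFIG_STRING, CONFIG_PATH };

/* One declaration per known variable: type, default, documentation. */
struct config_decl {
    const char *key;
    enum config_type type;
    const char *default_value;
    const char *description;
};

/* Hypothetical catalog; keys and texts are placeholders. */
static const struct config_decl config_decls[] = {
    { "example.flag", CONFIG_BOOL, "true",
      "Hypothetical boolean used for illustration." },
    { "example.tool", CONFIG_STRING, "",
      "Hypothetical string used for illustration." },
};

static const char *config_type_name(enum config_type type)
{
    switch (type) {
    case CONFIG_BOOL:   return "boolean";
    case CONFIG_STRING: return "string";
    case CONFIG_PATH:   return "path";
    }
    return "unknown";
}

static const struct config_decl *config_find_decl(const char *key)
{
    size_t i;

    for (i = 0; i < sizeof(config_decls) / sizeof(config_decls[0]); i++)
        if (!strcmp(config_decls[i].key, key))
            return &config_decls[i];
    return NULL;
}

/* Format a description into buf; returns -1 for undeclared keys. */
static int config_describe(const char *key, char *buf, size_t len)
{
    const struct config_decl *d = config_find_decl(key);

    if (!d)
        return -1;
    snprintf(buf, len, "Type: %s\nDefault value: %s\nDescription: %s\n",
             config_type_name(d->type), d->default_value, d->description);
    return 0;
}
```

The same table could in principle drive a generated
Documentation/config.txt. A key missing from the table is only
"not describable" here; nothing in the sketch forbids reading or
writing undeclared keys.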
Re: RFC GSoC idea: new "git config" features
On Sat, Mar 01, 2014 at 01:19:32AM +0100, Michael Haggerty wrote:

> I absolutely understand that changing all of the config parsers is not
> feasible. But I had imagined a third route:
>
> (3) parse the config once, storing the raw values to records in
>     memory. When an "unset" is seen, delete any previous records that
>     have accumulated for that key. After the whole config has been
>     read, iterate through the records, feeding the surviving values
>     into the callback in the order they were originally read (minus
>     deletions).
>
> Do you see any problems with this way of implementing the functionality
> (aside from slightly increased overhead)?

Yeah, this is something I have considered many times. It does have some
overhead, but the existing system is not great either. As you noted, we
often read the config several times for a given program invocation. But
moreover, we linearly strcmp each config key we find against each one we
know about. In some cases we return early if a sub-function is looking
for `starts_with(key, "foo.")`, but in most cases we just look for
"foo.bar", "foo.baz", and so on.

If we had the keys in-memory, we could reverse this: config code asks
for keys it cares about, and we can do an optimized lookup (binary
search, hash, etc).

That also makes many constructs easier to express. Recently we had a
problem where the parsing order of "remote.pushdefault" and
"branch.*.pushremote" mattered, because they were read into the same
variable. The solution is to use two variables and reconcile them after
all config is read. But if you can just query the config subsystem
directly, the boilerplate of reading them into strings goes away, and
you can just do:

    x = config_string_getf("branch.%s.pushremote", current_branch);
    if (!x)
            x = config_string_get("remote.pushdefault");
    if (!x)
            x = config_string_getf("branch.%s.remote", current_branch);
    if (!x)
            x = "origin";

As it is now, the code that does this has a lot more boilerplate, and is
split across several functions.

Another example is the way we have to manage "deferred" config in
git-status (see 84b4202). This might be more clear if we could simply
`config_get_bool("status.branch")` at the right moment.

The talk of efficiency is probably a red herring here. I do not think
config-reading is a significant source of slow-down in the current code.
But I'd be in favor of something that reduced boilerplate and made the
code easier to read.

> > But the side effects these callbacks may cause are not limited to
> > setting a simple scalar variable (like 'frotz' in the illustration)
> > but would include things that are hard to undo once done
> > (e.g. calling a set-up function with a lot of side effects).

Most callbacks would convert to a query system in a pretty
straightforward way, but some that have side effects might be tricky.
Converting them all may be too large for a GSoC project, but I think you
could do it gradually:

  1. Convert the parser to read into an in-memory representation, but
     leave git_config() as a wrapper which iterates over it.

  2. Add query functions like config_string_get() above.

  3. Convert callbacks to query functions one by one.

  4. Eventually drop git_config().

A GSoC project could take us partway through (3).

-Peff
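
The query side of step 2 could be sketched as follows, with a sorted
in-memory table searched by bsearch(). The functions
config_string_get() and config_string_getf() take their names from the
message above but are hypothetical implementations; the table contents
and the helper push_remote_for() are invented for the example. The
sketch assumes keys are already normalized and ignores multi-valued
keys.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct config_entry {
    const char *key;
    const char *value;
};

/* Must stay sorted by key (strcmp order) for bsearch() to work.
 * Values are illustrative only. */
static const struct config_entry entries[] = {
    { "branch.master.remote", "upstream" },
    { "branch.topic.pushremote", "myfork" },
    { "remote.pushdefault", "publish" },
};

/* bsearch() comparator: first argument is the key being looked up. */
static int entry_cmp(const void *key, const void *elem)
{
    const struct config_entry *e = elem;

    return strcmp((const char *)key, e->key);
}

/* Binary-search lookup; returns NULL when the key is not set. */
static const char *config_string_get(const char *key)
{
    const struct config_entry *e;

    e = bsearch(key, entries, sizeof(entries) / sizeof(entries[0]),
                sizeof(entries[0]), entry_cmp);
    return e ? e->value : NULL;
}

/* printf-style variant: build the key, then look it up. */
static const char *config_string_getf(const char *fmt, ...)
{
    char key[256];
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(key, sizeof(key), fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= sizeof(key))
        return NULL;  /* formatting error or key too long */
    return config_string_get(key);
}

/* The fallback chain from the message, wrapped in a helper. */
static const char *push_remote_for(const char *current_branch)
{
    const char *x;

    x = config_string_getf("branch.%s.pushremote", current_branch);
    if (!x)
        x = config_string_get("remote.pushdefault");
    if (!x)
        x = config_string_getf("branch.%s.remote", current_branch);
    if (!x)
        x = "origin";
    return x;
}
```

With this shape the precedence between "branch.*.pushremote" and
"remote.pushdefault" is spelled out at the call site, so it no longer
depends on the order in which the keys appear in the config files.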
Re: RFC GSoC idea: new "git config" features
On 02/28/2014 09:00 PM, Junio C Hamano wrote:

> Michael Haggerty writes:
>
>> I just wrote up another double-idea that has been stewing in my head for
>> a while:
>>
>> * Allow configuration values to be unset via a config file
>> * Fix "git config --unset" to clean up detritus from sections that are
>>   left empty.
>
> The former is *way* too large for a GSoC project. Most
> configuration variables are meant to be read sequentially and affect
> in-core variables directly, like
>
>     /* file-scope global */
>     static int frotz = -1; /* unset */
>
>     static int parse_config_frotz(const char *key, const char *value,
>                                   void *cb)
>     {
>             if (!strcmp(key, "core.frotz"))
>                     frotz = git_config_int(key, value);
>             return 0;
>     }
>
>     ... and somewhere ...
>     git_config(parse_config_frotz, NULL);
>
> The config parsers are distributed and there is no single registry
> that knows how in-core variables owned by each subsystem represent
> an "unset" value. In the above example, -1 is such a sentinel
> value, but in some other contexts, the subsystem may choose to use
> INT_MAX. The only way to allow "resetting to previous" is to
>
>  (1) come up with a way to pass "this key is being reset to
>      'unspecified'" to existing git_config() callback functions
>      (like parse_config_frotz() in the above illustration), which
>      may or may not involve changing the function signature of the
>      callbacks;
>
>  (2) go through all the git_config() callback functions and make
>      them understand the new "reset to 'unspecified'" convention.

I absolutely understand that changing all of the config parsers is not
feasible. But I had imagined a third route:

 (3) parse the config once, storing the raw values to records in
     memory. When an "unset" is seen, delete any previous records that
     have accumulated for that key. After the whole config has been
     read, iterate through the records, feeding the surviving values
     into the callback in the order they were originally read (minus
     deletions).

Do you see any problems with this way of implementing the functionality
(aside from slightly increased overhead)?

And once we have a way to store config records in memory, it might also
make sense to reuse the parsed values for later config inquiries (after
checking that the files have not changed since the last read). After
this second step the net performance change might even be advantageous.

> which may not sound too bad at first glance (especially, the
> first one is almost trivial).
>
> But the side effects these callbacks may cause are not limited to
> setting a simple scalar variable (like 'frotz' in the illustration)
> but would include things that are hard to undo once done
> (e.g. calling a set-up function with a lot of side effects).
>
> The latter, on the other hand, should be a change that is of a
> fairly limited scope, and would be a good fit for a GSoC project
> (incidentally, it has been one of the items on my leftover-bits list
> http://git-blame.blogspot.com/p/leftover-bits.html for quite some
> time).

But only the latter part would be a bit meager as a GSoC project, don't
you think?

Thanks for the feedback.
Michael

--
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
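
Route (3) can be sketched in a few dozen lines. The code below assumes
the parser has already turned the files into "set" and "unset" events;
how an unset would be spelled in a config file is left open, as it is
in this thread. The struct and function names and the fixed-size array
are hypothetical, and atoi() stands in for Git's integer parsing so the
example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RECORDS 64

typedef int (*config_fn_t)(const char *key, const char *value, void *cb);

struct config_record {
    const char *key;
    const char *value;
    int deleted;  /* set when a later "unset" cancels this record */
};

struct config_set {
    struct config_record records[MAX_RECORDS];
    size_t nr;
};

/* Record one key/value pair in the order it was read. */
static int config_set_add(struct config_set *cs, const char *key,
                          const char *value)
{
    if (cs->nr == MAX_RECORDS)
        return -1;
    cs->records[cs->nr].key = key;
    cs->records[cs->nr].value = value;
    cs->records[cs->nr].deleted = 0;
    cs->nr++;
    return 0;
}

/* An "unset" drops every record accumulated so far for that key. */
static void config_set_unset(struct config_set *cs, const char *key)
{
    size_t i;

    for (i = 0; i < cs->nr; i++)
        if (!strcmp(cs->records[i].key, key))
            cs->records[i].deleted = 1;
}

/* Feed the surviving records to an unmodified callback, in order. */
static int config_set_replay(const struct config_set *cs, config_fn_t fn,
                             void *cb)
{
    size_t i;

    for (i = 0; i < cs->nr; i++) {
        int ret;

        if (cs->records[i].deleted)
            continue;
        ret = fn(cs->records[i].key, cs->records[i].value, cb);
        if (ret < 0)
            return ret;
    }
    return 0;
}

/* Callback in the style of the quoted illustration. */
static int frotz = -1; /* unset */

static int parse_config_frotz(const char *key, const char *value, void *cb)
{
    (void)cb;
    if (!strcmp(key, "core.frotz"))
        frotz = atoi(value);
    return 0;
}
```

Because the callback never sees the cancelled value, it does not need
to know what "unset" means for its own variable; frotz simply keeps its
sentinel. Callbacks with side effects are likewise only invoked for
values that survive.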
Re: RFC GSoC idea: new "git config" features
Michael Haggerty writes:

> I just wrote up another double-idea that has been stewing in my head for
> a while:
>
> * Allow configuration values to be unset via a config file
> * Fix "git config --unset" to clean up detritus from sections that are
>   left empty.

The former is *way* too large for a GSoC project. Most
configuration variables are meant to be read sequentially and affect
in-core variables directly, like

    /* file-scope global */
    static int frotz = -1; /* unset */

    static int parse_config_frotz(const char *key, const char *value,
                                  void *cb)
    {
            if (!strcmp(key, "core.frotz"))
                    frotz = git_config_int(key, value);
            return 0;
    }

    ... and somewhere ...
    git_config(parse_config_frotz, NULL);

The config parsers are distributed and there is no single registry
that knows how in-core variables owned by each subsystem represent
an "unset" value. In the above example, -1 is such a sentinel
value, but in some other contexts, the subsystem may choose to use
INT_MAX. The only way to allow "resetting to previous" is to

 (1) come up with a way to pass "this key is being reset to
     'unspecified'" to existing git_config() callback functions
     (like parse_config_frotz() in the above illustration), which
     may or may not involve changing the function signature of the
     callbacks;

 (2) go through all the git_config() callback functions and make
     them understand the new "reset to 'unspecified'" convention.

which may not sound too bad at first glance (especially, the
first one is almost trivial).

But the side effects these callbacks may cause are not limited to
setting a simple scalar variable (like 'frotz' in the illustration)
but would include things that are hard to undo once done
(e.g. calling a set-up function with a lot of side effects).

The latter, on the other hand, should be a change that is of a
fairly limited scope, and would be a good fit for a GSoC project
(incidentally, it has been one of the items on my leftover-bits list
http://git-blame.blogspot.com/p/leftover-bits.html for quite some
time).

Thanks.