Re: RFC GSoC idea: new "git config" features

2014-03-14 Thread Junio C Hamano
Jeff King  writes:

> On Sat, Mar 01, 2014 at 12:01:44PM +0100, Matthieu Moy wrote:
>
>> Jeff King  writes:
>> 
>> > If we had the keys in-memory, we could reverse this: config code asks
>> > for keys it cares about, and we can do an optimized lookup (binary
>> > search, hash, etc).
>> 
>> I'm actually dreaming of a system where a configuration variable could
>> be "declared" in Git's source code, with associated type (list/single
>> value, boolean/string/path/...), default value and documentation (and
>> then Documentation/config.txt could become a generated file). One could
>> imagine a lot of possibilities like
>
> Yes, I think something like that would be very nice. ...
> ...
>> Migrating the whole code to such system would take time, but creating
>> the system and applying it to a few examples might be feasible as a GSoC
>> project.
>
> Agreed, as long as we have enough examples to feel confident that the
> infrastructure is sufficient.

I agree that it would give us a lot of enhancement opportunities if
we had a central catalog of what the supported configuration
variables are and what semantics (e.g. type, multi-value-ness, etc.)
they have.

One thing we need to be careful about is that we still must support
random configuration items that git-core does not care about at all
but scripts (and future versions of git-core) read off of, though.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC GSoC idea: new "git config" features

2014-03-13 Thread Jeff King
On Sat, Mar 01, 2014 at 12:01:44PM +0100, Matthieu Moy wrote:

> Jeff King  writes:
> 
> > If we had the keys in-memory, we could reverse this: config code asks
> > for keys it cares about, and we can do an optimized lookup (binary
> > search, hash, etc).
> 
> I'm actually dreaming of a system where a configuration variable could
> be "declared" in Git's source code, with associated type (list/single
> value, boolean/string/path/...), default value and documentation (and
> then Documentation/config.txt could become a generated file). One could
> imagine a lot of possibilities like

Yes, I think something like that would be very nice. I am not a big
fan of code generation, but if we had config queries like
"config_get_bool", then I think it would be reasonably pleasant to take
a spec like:

  Key: help.browser
  Type: string
  Description: Specify the browser for help...

and turn it into:

  const char *config_get_help_browser(void)
  {
  return config_get_string("help.browser");
  }

So technically code generation, but all the heavy lifting is done behind
the scenes. We're not saving lines in the result so much as avoiding
repeating ourselves (that is, the generated code is only mapping the
config-type from the spec into a C type and function name that gives us
extra compile-time safety).

However, I skimmed through config.txt looking for a key to use in my
example above, and there are a surprising number of one-off semantics
(e.g., things that are mostly bool, but can be "auto" or take some other
special value). We may find that the "Type" field has a surprising
number of variants that makes a technique like this annoying. But I'd
reserve judgement until somebody actually tries encoding a significant
chunk of the config keys and we see what it looks like.

> Migrating the whole code to such system would take time, but creating
> the system and applying it to a few examples might be feasible as a GSoC
> project.

Agreed, as long as we have enough examples to feel confident that the
infrastructure is sufficient.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC GSoC idea: new "git config" features

2014-03-03 Thread Junio C Hamano
Jeff King  writes:

> Most callbacks would convert to a query system in a pretty
> straightforward way, but some that have side effects might be tricky.
> Converting them all may be too large for a GSoC project, but I think you
> could do it gradually:
>
>   1. Convert the parser to read into an in-memory representation, but
>  leave git_config() as a wrapper which iterates over it.
>
>   2. Add query functions like config_string_get() above.
>
>   3. Convert callbacks to query functions one by one.
>
>   4. Eventually drop git_config().
>
> A GSoC project could take us partway through (3).

I actually discarded the "read from these config files to preparsed
structure to memory, later to be consumed by repeated calls to the
git_config() callback functions, making the only difference from the
current scheme that the preparsed structure will be reset when there
is the new 'reset to the original' definition" as obvious and
uninteresting.

This is one of these times that I find myself blessed with capable
others that can go beyond, building on top of such an idea that I
may have discarded without thinking it through, around me ;-)

Yes, the new abstraction like config__get() that can live
alongside the existing "git_config() feeds callback chain
everything" and gradually replace the latter, would be a good way
forward.  Given that we read configuration multiple times anyway for
different purposes, even without the new abstraction, the end result
might perform better if we read the files once and reused in later
calls to git_config().

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC GSoC idea: new "git config" features

2014-03-01 Thread Matthieu Moy
Jeff King  writes:

> If we had the keys in-memory, we could reverse this: config code asks
> for keys it cares about, and we can do an optimized lookup (binary
> search, hash, etc).

I'm actually dreaming of a system where a configuration variable could
be "declared" in Git's source code, with associated type (list/single
value, boolean/string/path/...), default value and documentation (and
then Documentation/config.txt could become a generated file). One could
imagine a lot of possibilities like

$ git config --describe 
Type: boolean
Default value: true
Description: ...

Somehow, do for config variables what has been done for command-line
option parsing.

Migrating the whole code to such system would take time, but creating
the system and applying it to a few examples might be feasible as a GSoC
project.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC GSoC idea: new "git config" features

2014-02-28 Thread Jeff King
On Sat, Mar 01, 2014 at 01:19:32AM +0100, Michael Haggerty wrote:

> I absolutely understand that changing all of the config parsers is not
> feasible.  But I had imagined a third route:
> 
> (3) parse the config once, storing the raw values to records in
> memory.  When an "unset" is seen, delete any previous records that
> have accumulated for that key.  After the whole config has been
> read, iterate through the records, feeding the surviving values
> into the callback in the order they were originally read (minus
> deletions).
> 
> Do you see any problems with this way of implementing the functionality
> (aside from slightly increased overhead)?

Yeah, this is something I have considered many times. It does have some
overhead, but the existing system is not great either. As you noted, we
often read the config several times for a given program invocation.

But moreover, we linearly strcmp each config key we find against each
one we know about. In some cases we return early if a sub-function is
looking for `starts_with(key, "foo.")`, but in most cases we just look
for "foo.bar", "foo.baz", and so on.

If we had the keys in-memory, we could reverse this: config code asks
for keys it cares about, and we can do an optimized lookup (binary
search, hash, etc).

That also makes many constructs easier to express. Recently we had a
problem where the parsing order of "remote.pushdefault" and
"branch.*.pushremote" mattered, because they were read into the same
variable. The solution is to use two variables and reconcile them after
all config is read. But if you can just query the config subsystem
directly, the boilerplate of reading them into strings goes away, and
you can just do:

  x = config_string_getf("branch.%s.pushremote", current_branch);
  if (!x)
  x = config_string_get("remote.pushdefault");
  if (!x)
  x = config_string_getf("branch.%s.remote", current_branch);
  if (!x)
  x = "origin";

As it is now, the code that does this has a lot more boilerplate, and is
split across several functions.

Another example is the way we have to manage "deferred" config in
git-status (see 84b4202). This might be more clear if we could simply
`config_get_bool("status.branch")` at the right moment.

The talk of efficiency is probably a red-herring here. I do not think
config-reading is a significant source of slow-down in the current code.
But I'd be in favor of something that reduced boilerplate and made the
code easier to read.

> > But the side effects these callbacks may cause are not limited to
> > setting a simple scaler variable (like 'frotz' in the illustration)
> > but would include things that are hard to undo once done
> > (e.g. calling a set-up function with a lot of side effects).

Most callbacks would convert to a query system in a pretty
straightforward way, but some that have side effects might be tricky.
Converting them all may be too large for a GSoC project, but I think you
could do it gradually:

  1. Convert the parser to read into an in-memory representation, but
 leave git_config() as a wrapper which iterates over it.

  2. Add query functions like config_string_get() above.

  3. Convert callbacks to query functions one by one.

  4. Eventually drop git_config().

A GSoC project could take us partway through (3).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC GSoC idea: new "git config" features

2014-02-28 Thread Michael Haggerty
On 02/28/2014 09:00 PM, Junio C Hamano wrote:
> Michael Haggerty  writes:
> 
>> I just wrote up another double-idea that has been stewing in my head for
>> a while:
>>
>> * Allow configuration values to be unset via a config file
>> * Fix "git config --unset" to clean up detritus from sections that are
>> left empty.
> 
> The former is *way* too large for a GSoC project.  Most
> configuration variables are meant to be read sequencially and affect
> in-core variables directly, like
> 
> /* file-scope global */
>   static int frotz = -1;  /* unset */
> 
> static int parse_config_frotz(const char *key, const char *value, 
> void *cb)
>   {
>   if (!strcmp(key, "core.frotz"))
>   frotz = git_config_int(value);
>   return 0;
>   }
> 
>   ... and somewhere ...
>   git_config(parse_config_frotz, NULL);
> 
> The config parsers are distributed and there is no single registry
> that knows how in-core variables owned by each subsystem represent
> an "unset" value.  In the above example, -1 is such a sentinel
> value, but in some other contexts, the subsystem may choose to use
> INT_MAX.  The only way to allow "resetting to previous" is to
> 
>  (1) come up with a way to pass "this key is being reset to
>  'unspecified'" to existing git_config() callback functions
>  (like parse_config_frotz() in the above illustration), which
>  may or may not involve changing the function signature of the
>  callbacks;
> 
>  (2) go through all the git_config() callback functions and make
>  them understand the new "reset to 'unspecified'" convention.

I absolutely understand that changing all of the config parsers is not
feasible.  But I had imagined a third route:

(3) parse the config once, storing the raw values to records in
memory.  When an "unset" is seen, delete any previous records that
have accumulated for that key.  After the whole config has been
read, iterate through the records, feeding the surviving values
into the callback in the order they were originally read (minus
deletions).

Do you see any problems with this way of implementing the functionality
(aside from slightly increased overhead)?

And once we have a way to store config records in memory, it might also
make sense to reuse the parsed values for later config inquiries (after
checking that the files have not changed since the last read).  After
this second step the net performance change might even be advantageous.

> which may not sound too bad at the first glance (especially, the
> first one is almost trivial).
> 
> But the side effects these callbacks may cause are not limited to
> setting a simple scaler variable (like 'frotz' in the illustration)
> but would include things that are hard to undo once done
> (e.g. calling a set-up function with a lot of side effects).
> 
> The latter, on the other hand, should be a change that is of a
> fairly limited scope, and would be a good fit for a GSoC project
> (incidentally, it has been one of the items on my leftover-bits list
> http://git-blame.blogspot.com/p/leftover-bits.html for quite some
> time).

But only the latter part would be a bit meager as a GSoC project, don't
you think?

Thanks for the feedback.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: RFC GSoC idea: new "git config" features

2014-02-28 Thread Junio C Hamano
Michael Haggerty  writes:

> I just wrote up another double-idea that has been stewing in my head for
> a while:
>
> * Allow configuration values to be unset via a config file
> * Fix "git config --unset" to clean up detritus from sections that are
> left empty.

The former is *way* too large for a GSoC project.  Most
configuration variables are meant to be read sequencially and affect
in-core variables directly, like

/* file-scope global */
static int frotz = -1;  /* unset */

static int parse_config_frotz(const char *key, const char *value, void 
*cb)
{
if (!strcmp(key, "core.frotz"))
frotz = git_config_int(value);
return 0;
}

... and somewhere ...
git_config(parse_config_frotz, NULL);

The config parsers are distributed and there is no single registry
that knows how in-core variables owned by each subsystem represent
an "unset" value.  In the above example, -1 is such a sentinel
value, but in some other contexts, the subsystem may choose to use
INT_MAX.  The only way to allow "resetting to previous" is to

 (1) come up with a way to pass "this key is being reset to
 'unspecified'" to existing git_config() callback functions
 (like parse_config_frotz() in the above illustration), which
 may or may not involve changing the function signature of the
 callbacks;

 (2) go through all the git_config() callback functions and make
 them understand the new "reset to 'unspecified'" convention.

which may not sound too bad at the first glance (especially, the
first one is almost trivial).

But the side effects these callbacks may cause are not limited to
setting a simple scaler variable (like 'frotz' in the illustration)
but would include things that are hard to undo once done
(e.g. calling a set-up function with a lot of side effects).

The latter, on the other hand, should be a change that is of a
fairly limited scope, and would be a good fit for a GSoC project
(incidentally, it has been one of the items on my leftover-bits list
http://git-blame.blogspot.com/p/leftover-bits.html for quite some
time).

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html