Re: [PATCH 0/3] Convert grep to recurse in-process

2017-07-12 Thread Jeff King
On Wed, Jul 12, 2017 at 11:24:47AM -0700, Jonathan Nieder wrote:

> Jeff King wrote:
> > On Wed, Jul 12, 2017 at 11:06:03AM -0700, Brandon Williams wrote:
> 
> >> Each 'struct repository' does have its own config so we could
> >> potentially want a config in a submodule to override some config in the
> >> superproject.  Though for right now it may be simpler to not worry about
> >> doing this overriding, mostly because you would only want to allow
> >> overriding of some configuration and not all configuration.  One example
> >> would be the number of threads allowed in grep,
> [...]
> > I think that's probably one of the more complicated cases, and I don't
> > think it really needs to be done on day one.
> 
> That's fair.  Could you give an example of a simpler case?

I think core.quotepath that I gave is one example that I'd expect to
work differently than it does in 'next'. Though I really think that's a
fairly uninteresting one, as it's extremely unlikely to be set on a
per-repo basis.

Something like color.grep.* is in the same boat (I think it ought to
respect submodule config, but I'd be surprised if anybody cares).

I know those are kind of lame examples, and if they remain broken
forever, I don't know if anybody would care. I'm more concerned about us
missing a case that causes a regression (and unlike some regressions,
where you say "oops, a bug" and fix it, I'd worry that we'll have made a
big jump to an in-process model, and the only fixes are "implement a
complete split-config solution" or "revert back to multiple processes").

I do agree that not having concrete examples makes it harder to reason
about the desired semantics.

-Peff


Re: [PATCH 0/3] Convert grep to recurse in-process

2017-07-12 Thread Jeff King
On Wed, Jul 12, 2017 at 11:09:23AM -0700, Jonathan Nieder wrote:

> > I didn't follow the rest of the "struct repository" series closely, but
> > I don't feel like we ever reached a resolution on how config would be
> > handled. I notice that the in-process "ls-files" behaves differently
> > than the old one when config differs between the submodule and the
> > parent repository. As we convert more commands (that use more config)
> > this will become more likely to be noticed by somebody.
> >
> > Do we have a plan for dealing with this? Is our solution just "recursed
> > operations always respect the parent config, deal with it"?
> 
> For settings like branch..remote, I don't think anyone would
> disagree that the right thing to do is to use the per-repository
> config of the submodule.  The repository object is already able to
> handle per-repository config, so this just involves callers being
> careful not to cache values locally in a way that conflates
> repositories.  It should be pretty straightforward (for commands like
> "git fetch --recurse-submodules", for example).

I agree that's the right approach. What I'm worried about is that I see
in-process work proceeding without the "callers being careful" part
being audited, which can lead to regressions (e.g., ls-files with
core.quotepath is "broken" in next right now). Though at least the
regression would be limited to people using submodules.

> For settings like grep.patternType, on the other hand, it would be
> very strange for the behavior to change when grep crosses the
> submodule boundary.  So I think using the parent project config is the
> right thing to do and the old behavior was simply wrong.  In other
> words, I don't think this is so much a case of "deal with it" as
> "sorry we got the behavior so wrong before --- we've finally fixed it
> now".

I think that's not actually about the parent project's config, as much
as it is about the parameters of the current operation. I.e., the
argument is that this particular grep operation is using a particular
pattern type, no matter how we arrived at that decision, and it should
be used for all of the recursive bits. So whether the superproject has
its own grep.patternType set, or whether the user used "-P" on the
command line, the result is the same: we need to tell the submodule to
ignore any config and use the parameters we feed it.

In a multi-process model, that should happen by converting all of the
bits in "struct grep_opt" back into command-line parameters and feeding
them to the recursive processes (which would then give them precedence
over any config). But I'm pretty sure we don't do that.

In the in-process model, that would hopefully be a bit simpler, as we'd
just pass in a pre-made grep_opt.

-Peff


Re: [PATCH 0/3] Convert grep to recurse in-process

2017-07-12 Thread Jonathan Nieder
Jeff King wrote:
> On Wed, Jul 12, 2017 at 11:06:03AM -0700, Brandon Williams wrote:

>> Each 'struct repository' does have its own config so we could
>> potentially want a config in a submodule to override some config in the
>> superproject.  Though for right now it may be simpler to not worry about
>> doing this overriding, mostly because you would only want to allow
>> overriding of some configuration and not all configuration.  One example
>> would be the number of threads allowed in grep,
[...]
> I think that's probably one of the more complicated cases, and I don't
> think it really needs to be done on day one.

That's fair.  Could you give an example of a simpler case?

Thanks,
Jonathan


Re: [PATCH 0/3] Convert grep to recurse in-process

2017-07-12 Thread Jeff King
On Wed, Jul 12, 2017 at 11:06:03AM -0700, Brandon Williams wrote:

> > I didn't follow the rest of the "struct repository" series closely, but
> > I don't feel like we ever reached a resolution on how config would be
> > handled. I notice that the in-process "ls-files" behaves differently
> > than the old one when config differs between the submodule and the
> > parent repository. As we convert more commands (that use more config)
> > this will become more likely to be noticed by somebody.
> > 
> > Do we have a plan for dealing with this? Is our solution just "recursed
> > operations always respect the parent config, deal with it"?
> 
> Each 'struct repository' does have its own config so we could
> potentially want a config in a submodule to override some config in the
> superproject.  Though for right now it may be simpler to not worry about
> doing this overriding, mostly because you would only want to allow
> overriding of some configuration and not all configuration.  One example
> would be the number of threads allowed in grep, it doesn't make much
> sense to let a submodule's configuration of this to trump the
> superproject's since the command was invoked in the context of the
> superproject.

I'm not sure I agree 100% with that example. What makes threads special,
I think, is not the config but the total count spread across all of the
recursive processes. So it's not that we don't want to respect submodule
config so much as we want to take the submodule config into account, but
throttle it based on what other threads are running.

So if your superproject says "1" and the submodule says "8", I'd expect
"8" threads to run in the submodule. If you're already running another 3
threads on behalf of another submodule, I think it would be reasonable
to do some job control and only give the submodule 5 slots. But I don't
think that should happen at the config layer. It should probably happen
when the submodule decides to spawn threads, and it should ask of the
superproject "how many slots am I allowed?".

I think that's probably one of the more complicated cases, and I don't
think it really needs to be done on day one. Setting differing thread
counts is even more unlikely than the rest of the config. I suspect
doing job management in general would come up first, because people
don't want to fork-bomb themselves.

Anyway, that got pretty far afield. What I was trying to say is that I
think you can treat the config uniformly, without making special
exceptions for things like grep.threads.

-Peff


Re: [PATCH 0/3] Convert grep to recurse in-process

2017-07-12 Thread Stefan Beller
On Wed, Jul 12, 2017 at 11:09 AM, Jonathan Nieder  wrote:
> Hi,
>
> Jeff King wrote:
>
>> I didn't follow the rest of the "struct repository" series closely, but
>> I don't feel like we ever reached a resolution on how config would be
>> handled. I notice that the in-process "ls-files" behaves differently
>> than the old one when config differs between the submodule and the
>> parent repository. As we convert more commands (that use more config)
>> this will become more likely to be noticed by somebody.
>>
>> Do we have a plan for dealing with this? Is our solution just "recursed
>> operations always respect the parent config, deal with it"?
>
> For settings like branch..remote, I don't think anyone would
> disagree that the right thing to do is to use the per-repository
> config of the submodule.  The repository object is already able to
> handle per-repository config, so this just involves callers being
> careful not to cache values locally in a way that conflates
> repositories.  It should be pretty straightforward (for commands like
> "git fetch --recurse-submodules", for example).
>
> For settings like grep.patternType, on the other hand, it would be
> very strange for the behavior to change when grep crosses the
> submodule boundary.

That is because this option relates to the input given.
Other options relate to the items to be processed, such as grep.color.*
which I would not find strange if they were respected in the submodule.

> So I think using the parent project config is the
> right thing to do and the old behavior was simply wrong.  In other
> words, I don't think this is so much a case of "deal with it" as
> "sorry we got the behavior so wrong before --- we've finally fixed it
> now".

In an ideal world we should pay partial attention to the config
in the submodule, and for each config key we'd have to determine
if it rather relates to the general runtime (grep.threads), the command
line options (grep.patternType) or to the specific content inside the repo
(coloring, whether we show the line number).

For now and in reality we can ignore the content specific settings
and claim that any setting here is related to runtime and command line
options, both of which a user may expect the superproject to win in
case of differing configurations.

Thanks,
Stefan


Re: [PATCH 0/3] Convert grep to recurse in-process

2017-07-12 Thread Jonathan Nieder
Hi,

Jeff King wrote:

> I didn't follow the rest of the "struct repository" series closely, but
> I don't feel like we ever reached a resolution on how config would be
> handled. I notice that the in-process "ls-files" behaves differently
> than the old one when config differs between the submodule and the
> parent repository. As we convert more commands (that use more config)
> this will become more likely to be noticed by somebody.
>
> Do we have a plan for dealing with this? Is our solution just "recursed
> operations always respect the parent config, deal with it"?

For settings like branch..remote, I don't think anyone would
disagree that the right thing to do is to use the per-repository
config of the submodule.  The repository object is already able to
handle per-repository config, so this just involves callers being
careful not to cache values locally in a way that conflates
repositories.  It should be pretty straightforward (for commands like
"git fetch --recurse-submodules", for example).

For settings like grep.patternType, on the other hand, it would be
very strange for the behavior to change when grep crosses the
submodule boundary.  So I think using the parent project config is the
right thing to do and the old behavior was simply wrong.  In other
words, I don't think this is so much a case of "deal with it" as
"sorry we got the behavior so wrong before --- we've finally fixed it
now".

But this is subtle.  Maybe some notes in the config documentation for
relevant settings would help.  That would make the intended behavior
clearer and make debugging easier for users.

Thanks,
Jonathan


Re: [PATCH 0/3] Convert grep to recurse in-process

2017-07-12 Thread Brandon Williams
On 07/12, Jeff King wrote:
> On Tue, Jul 11, 2017 at 03:04:05PM -0700, Brandon Williams wrote:
> 
> > This series utilizes the new 'struct repository' in order to convert grep 
> > to be
> > able to recurse into submodules in-process much like how ls-files was 
> > converted
> > to recuse in-process.  The result is a much smaller code footprint due to 
> > not
> > needing to compile an argv array of options to be used when launched a 
> > process
> > for operating on a submodule.
> 
> I didn't follow the rest of the "struct repository" series closely, but
> I don't feel like we ever reached a resolution on how config would be
> handled. I notice that the in-process "ls-files" behaves differently
> than the old one when config differs between the submodule and the
> parent repository. As we convert more commands (that use more config)
> this will become more likely to be noticed by somebody.
> 
> Do we have a plan for dealing with this? Is our solution just "recursed
> operations always respect the parent config, deal with it"?

Each 'struct repository' does have its own config so we could
potentially want a config in a submodule to override some config in the
superproject.  Though for right now it may be simpler to not worry about
doing this overriding, mostly because you would only want to allow
overriding of some configuration and not all configuration.  One example
would be the number of threads allowed in grep, it doesn't make much
sense to let a submodule's configuration of this to trump the
superproject's since the command was invoked in the context of the
superproject.

-- 
Brandon Williams


Re: [PATCH 0/3] Convert grep to recurse in-process

2017-07-12 Thread Jeff King
On Tue, Jul 11, 2017 at 03:04:05PM -0700, Brandon Williams wrote:

> This series utilizes the new 'struct repository' in order to convert grep to 
> be
> able to recurse into submodules in-process much like how ls-files was 
> converted
> to recuse in-process.  The result is a much smaller code footprint due to not
> needing to compile an argv array of options to be used when launched a process
> for operating on a submodule.

I didn't follow the rest of the "struct repository" series closely, but
I don't feel like we ever reached a resolution on how config would be
handled. I notice that the in-process "ls-files" behaves differently
than the old one when config differs between the submodule and the
parent repository. As we convert more commands (that use more config)
this will become more likely to be noticed by somebody.

Do we have a plan for dealing with this? Is our solution just "recursed
operations always respect the parent config, deal with it"?

-Peff