Re: [PATCH v3 4/4] trace.c: be smart about what env to print in trace_run_command()

2018-01-12 Thread Duy Nguyen
On Sat, Jan 13, 2018 at 11:54 AM, Duy Nguyen  wrote:
> On Sat, Jan 13, 2018 at 5:54 AM, Junio C Hamano  wrote:
>> Jeff King  writes:
>>
>>> I also think this is a special case of a more general problem. FOO could
>>> appear any number of times in the "env" array, as a deletion or with
>>> multiple values. Our prep_childenv() would treat that as "last one
>>> wins", I think. Could we just do the same here?
>>
>> Perhaps this should be squashed into the original 4/4 instead of
>> being a separate patch.  We'd probably want some sort of test, I
>> wonder?  Not tested at all beyond compiling...
>>
>> -- >8 --
>> Subject: [PATCH 7/4] run-command.c: don't be too cute in concatenate_env()
>>
>> Instead of relying on "sort" being stable to sort "unset VAR"
>> immediately before "VAR=VAL" to remove the former, just pick the
>> last manipulation for each VAR from the list of environment tweaks
>> and show them in the output.
>
> This is not enough.

No it is. I misunderstood string_list_insert() and my tests proved me
wrong. Sorry for the noise.
-- 
Duy


Re: [PATCH v3 4/4] trace.c: be smart about what env to print in trace_run_command()

2018-01-12 Thread Duy Nguyen
On Sat, Jan 13, 2018 at 5:54 AM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> I also think this is a special case of a more general problem. FOO could
>> appear any number of times in the "env" array, as a deletion or with
>> multiple values. Our prep_childenv() would treat that as "last one
>> wins", I think. Could we just do the same here?
>
> Perhaps this should be squashed into the original 4/4 instead of
> being a separate patch.  We'd probably want some sort of test, I
> wonder?  Not tested at all beyond compiling...
>
> -- >8 --
> Subject: [PATCH 7/4] run-command.c: don't be too cute in concatenate_env()
>
> Instead of relying on "sort" being stable to sort "unset VAR"
> immediately before "VAR=VAL" to remove the former, just pick the
> last manipulation for each VAR from the list of environment tweaks
> and show them in the output.

This is not enough. Imagine we have GIT_DIR=foo in parent env, then a
sequence of "GIT_DIR", "GIT_DIR=foo" in deltaenv. Because we process
set/unset in two separate loops, the "last one wins" rule does not see
that "GIT_DIR=foo" wins over "unset GIT_DIR;". So we might print
"unset GIT_DIR; GIT_DIR=foo", which is fine even if it's redundant.
Except that we don't print that.

The problem comes from comparing with parent env. The new var has the
same value as parent env so we won't print "GIT_DIR=foo", just "unset
GIT_DIR;". This is wrong.

I'm tempted to just get the final child env from prep_childenv() then
compare with parent env and print the differences. It will not work
with Windows though, so Windows gets the old trace line without env
delta. I hope some Windows contributor will jump in at some point if
they want env tracing works for them too.
-- 
Duy


Re: [PATCH v3 4/4] trace.c: be smart about what env to print in trace_run_command()

2018-01-12 Thread Junio C Hamano
Jeff King  writes:

> I also think this is a special case of a more general problem. FOO could
> appear any number of times in the "env" array, as a deletion or with
> multiple values. Our prep_childenv() would treat that as "last one
> wins", I think. Could we just do the same here?

Perhaps this should be squashed into the original 4/4 instead of
being a separate patch.  We'd probably want some sort of test, I
wonder?  Not tested at all beyond compiling...

-- >8 --
Subject: [PATCH 7/4] run-command.c: don't be too cute in concatenate_env()

Instead of relying on "sort" being stable to sort "unset VAR"
immediately before "VAR=VAL" to remove the former, just pick the
last manipulation for each VAR from the list of environment tweaks
and show them in the output.

Signed-off-by: Junio C Hamano 
---
 trace.c | 68 -
 1 file changed, 21 insertions(+), 47 deletions(-)

diff --git a/trace.c b/trace.c
index aba2825044..9f49bcdd03 100644
--- a/trace.c
+++ b/trace.c
@@ -272,76 +272,50 @@ void trace_performance_fl(const char *file, int line, 
uint64_t nanos,
 
 #endif /* HAVE_VARIADIC_MACROS */
 
-static void sort_deltaenv(struct string_list *envs,
- const char *const *deltaenv)
+static void concatenate_env(struct strbuf *dst, const char *const *deltaenv)
 {
-   struct strbuf key = STRBUF_INIT;
+   struct string_list envs = STRING_LIST_INIT_DUP;
const char *const *e;
+   int i;
 
+   /* Last one wins... */
for (e = deltaenv; e && *e; e++) {
+   struct strbuf key = STRBUF_INIT;
char *equals = strchr(*e, '=');
 
if (equals) {
strbuf_reset();
strbuf_add(, *e, equals - *e);
-   string_list_append(envs, key.buf)->util = equals + 1;
+   string_list_insert(, key.buf)->util = equals + 1;
} else {
-   string_list_append(envs, *e)->util = NULL;
+   string_list_insert(, *e)->util = NULL;
}
}
-   string_list_sort(envs);
-   strbuf_release();
-}
-
-
-static void concatenate_env(struct strbuf *dst, const char *const *deltaenv)
-{
-   struct string_list envs = STRING_LIST_INIT_DUP;
-   int i;
-
-   /*
-* Construct a sorted string list consisting of the delta
-* env. We need this to detect the case when the same var is
-* deleted first, then added again.
-*/
-   sort_deltaenv(, deltaenv);
 
-   /*
-* variable deletion first because it's printed like separate
-* shell commands
-*/
+   /* series of "unset X; unset Y;..." */
for (i = 0; i < envs.nr; i++) {
-   const char *env = envs.items[i].string;
-   const char *p = envs.items[i].util;
+   const char *var = envs.items[i].string;
+   const char *val = envs.items[i].util;
 
-   if (p || !getenv(env))
+   if (val)
continue;
-
-   /*
-* Do we have a sequence of "unset GIT_DIR; GIT_DIR=foo"?
-* Then don't bother with the unset thing.
-*/
-   if (i + 1 < envs.nr &&
-   !strcmp(env, envs.items[i + 1].string))
-   continue;
-
-   strbuf_addf(dst, " unset %s;", env);
+   if (getenv(var))
+   strbuf_addf(dst, " unset %s;", var);
}
 
+   /* ... followed by "A=B C=D ..." */
for (i = 0; i < envs.nr; i++) {
-   const char *env = envs.items[i].string;
-   const char *p = envs.items[i].util;
+   const char *var = envs.items[i].string;
+   const char *val = envs.items[i].util;
const char *old_value;
 
-   if (!p)
+   if (!val)
continue;
-
-   old_value = getenv(env);
-   if (old_value && !strcmp(old_value, p))
+   old_value = getenv(var);
+   if (old_value && !strcmp(old_value, val))
continue;
-
-   strbuf_addf(dst, " %s=", env);
-   sq_quote_buf_pretty(dst, p);
+   strbuf_addf(dst, " %s=", var);
+   sq_quote_buf_pretty(dst, val);
}
string_list_clear(, 0);
 }


Re: [PATCH v3 4/4] trace.c: be smart about what env to print in trace_run_command()

2018-01-12 Thread Brandon Williams
On 01/12, Jeff King wrote:
> On Fri, Jan 12, 2018 at 11:19:44AM -0800, Junio C Hamano wrote:
> 
> > Jeff King  writes:
> > 
> > > The actual prep_childenv() code looks like it would always do
> > > last-one-wins, so we should treat FOO as unset in that final case. But
> > > that only kicks in on non-Windows.
> > >
> > > On Windows we feed cmd->env straight to mingw_spawnvpe().  That ends up
> > > in make_environment_block(), which looks like it does something similar.
> > >
> > > It's too bad the prep code is not shared, since then we could probably
> > > just ask _it_ which deltas it applied.
> > 
> > Yeah, but that function does a lot more than computing delta.  
> > 
> > It's primary point, which comes from ae25394b ("run-command: prepare
> > child environment before forking", 2017-04-19), is to create a full
> > copy of the environment, not just a series of putenv/unsetenv that
> > describes what gets changed, and that is done to avoid any
> > allocation after fork before exec in the child process.
> > 
> > I guess prep_childenv() could take an extra and optional string-list
> > to report what "deltas" it applied to the tracing machinery.  I am
> > not sure if that is worth it, though.
> 
> Yes, that's exactly what I meant. I think if it covered all platforms it
> might be worth it (not so much for code de-duping, but because then we
> would know our display logic always matched what we exec'd). But unless
> somebody wants to refactor all of the Windows spawn code, it's
> definitely not a good idea.

And not having access to a windows box I didn't want to try and refactor
the windows-only code when I introduced prep_childenv() :)

-- 
Brandon Williams


Re: [PATCH v3 4/4] trace.c: be smart about what env to print in trace_run_command()

2018-01-12 Thread Jeff King
On Fri, Jan 12, 2018 at 11:19:44AM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > The actual prep_childenv() code looks like it would always do
> > last-one-wins, so we should treat FOO as unset in that final case. But
> > that only kicks in on non-Windows.
> >
> > On Windows we feed cmd->env straight to mingw_spawnvpe().  That ends up
> > in make_environment_block(), which looks like it does something similar.
> >
> > It's too bad the prep code is not shared, since then we could probably
> > just ask _it_ which deltas it applied.
> 
> Yeah, but that function does a lot more than computing delta.  
> 
> It's primary point, which comes from ae25394b ("run-command: prepare
> child environment before forking", 2017-04-19), is to create a full
> copy of the environment, not just a series of putenv/unsetenv that
> describes what gets changed, and that is done to avoid any
> allocation after fork before exec in the child process.
> 
> I guess prep_childenv() could take an extra and optional string-list
> to report what "deltas" it applied to the tracing machinery.  I am
> not sure if that is worth it, though.

Yes, that's exactly what I meant. I think if it covered all platforms it
might be worth it (not so much for code de-duping, but because then we
would know our display logic always matched what we exec'd). But unless
somebody wants to refactor all of the Windows spawn code, it's
definitely not a good idea.

-Peff


Re: [PATCH v3 4/4] trace.c: be smart about what env to print in trace_run_command()

2018-01-12 Thread Junio C Hamano
Jeff King  writes:

> The actual prep_childenv() code looks like it would always do
> last-one-wins, so we should treat FOO as unset in that final case. But
> that only kicks in on non-Windows.
>
> On Windows we feed cmd->env straight to mingw_spawnvpe().  That ends up
> in make_environment_block(), which looks like it does something similar.
>
> It's too bad the prep code is not shared, since then we could probably
> just ask _it_ which deltas it applied.

Yeah, but that function does a lot more than computing delta.  

It's primary point, which comes from ae25394b ("run-command: prepare
child environment before forking", 2017-04-19), is to create a full
copy of the environment, not just a series of putenv/unsetenv that
describes what gets changed, and that is done to avoid any
allocation after fork before exec in the child process.

I guess prep_childenv() could take an extra and optional string-list
to report what "deltas" it applied to the tracing machinery.  I am
not sure if that is worth it, though.





Re: [PATCH v3 4/4] trace.c: be smart about what env to print in trace_run_command()

2018-01-12 Thread Jeff King
On Fri, Jan 12, 2018 at 10:24:28AM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >> +  /*
> >> +   * Do we have a sequence of "unset GIT_DIR; GIT_DIR=foo"?
> >> +   * Then don't bother with the unset thing.
> >> +   */
> >> +  if (i + 1 < envs.nr &&
> >> +  !strcmp(env, envs.items[i + 1].string))
> >>continue;
> >
> > Are we guaranteed that "FOO" and "FOO=bar" appear next to each other in the
> > sorted list? I think "FOO123=bar" could come between.
> 
> At this point, envs is a string list whose key is FOO for both "FOO"
> (unset) and "FOO=bar" (set); "FOO123=bar" would sort after these two.

Ah, right, I didn't notice that we took just the key name.

> But I did not see anything that attempts to guarantee that "FOO"
> sorts before "FOO=bar" with string_list_sort().  If the sort used in
> the function is stable, and if the case we care about is unset
> followed by set, then the above would catch the case, but even if
> that were the case, it is unclear what we want to do when a set of
> FOO is followed by an unset of FOO.

The actual prep_childenv() code looks like it would always do
last-one-wins, so we should treat FOO as unset in that final case. But
that only kicks in on non-Windows.

On Windows we feed cmd->env straight to mingw_spawnvpe().  That ends up
in make_environment_block(), which looks like it does something similar.

It's too bad the prep code is not shared, since then we could probably
just ask _it_ which deltas it applied. I suspect it would be possible to
share it, since mingw_spawnvpe is our own compat thing (so we could
change its interface to take the whole prepared environment block). But
I won't blame Duy if he doesn't want to refactor all of the
cross-platform exec code. ;)

> And if the sort is not stable, then the above code is just simply
> broken.

Agreed.

> 
> > I also think this is a special case of a more general problem. FOO could
> > appear any number of times in the "env" array, as a deletion or with
> > multiple values. Our prep_childenv() would treat that as "last one
> > wins", I think. Could we just do the same here?
> 
> Yeah, if the last one is to set FOO=bar, then it feels sufficient to
> just check if FOO is set to bar in the original and deciding to show
> or hide; similarly if the last one is to unset FOO, it does not matter
> if the caller sets it to some other value before unsetting, so it
> feels sufficient to just check if FOO is set to anything in the
> original environment.

Yep, agreed again.

-Peff


Re: [PATCH v3 4/4] trace.c: be smart about what env to print in trace_run_command()

2018-01-12 Thread Junio C Hamano
Jeff King  writes:

>> +/*
>> + * Do we have a sequence of "unset GIT_DIR; GIT_DIR=foo"?
>> + * Then don't bother with the unset thing.
>> + */
>> +if (i + 1 < envs.nr &&
>> +!strcmp(env, envs.items[i + 1].string))
>>  continue;
>
> Are we guaranteed that "FOO" and "FOO=bar" appear next to each other in the
> sorted list? I think "FOO123=bar" could come between.

At this point, envs is a string list whose key is FOO for both "FOO"
(unset) and "FOO=bar" (set); "FOO123=bar" would sort after these two.

But I did not see anything that attempts to guarantee that "FOO"
sorts before "FOO=bar" with string_list_sort().  If the sort used in
the function is stable, and if the case we care about is unset
followed by set, then the above would catch the case, but even if
that were the case, it is unclear what we want to do when a set of
FOO is followed by an unset of FOO.

And if the sort is not stable, then the above code is just simply
broken.

> I also think this is a special case of a more general problem. FOO could
> appear any number of times in the "env" array, as a deletion or with
> multiple values. Our prep_childenv() would treat that as "last one
> wins", I think. Could we just do the same here?

Yeah, if the last one is to set FOO=bar, then it feels sufficient to
just check if FOO is set to bar in the original and deciding to show
or hide; similarly if the last one is to unset FOO, it does not matter
if the caller sets it to some other value before unsetting, so it
feels sufficient to just check if FOO is set to anything in the
original environment.





Re: [PATCH v3 4/4] trace.c: be smart about what env to print in trace_run_command()

2018-01-12 Thread Jeff King
On Fri, Jan 12, 2018 at 04:56:07PM +0700, Nguyễn Thái Ngọc Duy wrote:

> The previous concatenate_env() is kinda dumb. It receives a env delta
> in child_process and blindly follows it. If the run_command() user
> "adds" more vars of the same value, or deletes vars that do not exist
> in parent env in the first place (*), then why bother to print them?
> 
> This patch checks child_process.env against parent environment and
> only print the actual differences. The unset syntax (and later on cwd)
> follows closely shell syntax for easy reading/guessing and copy/paste.

I like it.

> There is an interesting thing with this change. In the previous patch,
> we unconditionally print new vars. With submodule code we may have
> something like this
> 
> trace: ... GIT_DIR='.git' git 'status' '--porcelain=2'
> 
> but since parent's GIT_DIR usually has the same value '.git' too, this
> change suppress that, now we can't see that the above command runs in
> a separate repo. This is the run for printing cwd. Now we have
> 
> trace: ... cd foo; git 'status' '--porcelain=2'

Makes sense (though s/run/reason/ in the last paragraph?).

> Another equally interesting thing is, some caller can do "unset GIT_DIR;
> set GIT_DIR=.git". Since parent env can have the same value '.git', we
> don't re-print GIT_DIR=.git. In that case must NOT print "unset GIT_DIR"
> or the user will be misled to think the new command has no GIT_DIR.

Interesting. I wonder if any callers actually do that. It seems like
kind of an odd thing, but maybe a caller sets GIT_DIR on top of the
clearing of local_repo_env.

> A note about performance. Yes we're burning a lot more cycles for
> displaying something this nice. But because it's protected by
> $GIT_TRACE, and run_command() should not happen often anyway, it should
> not feel any slower than before.

I'd agree that performance probably doesn't matter much here.

> + /*
> +  * Do we have a sequence of "unset GIT_DIR; GIT_DIR=foo"?
> +  * Then don't bother with the unset thing.
> +  */
> + if (i + 1 < envs.nr &&
> + !strcmp(env, envs.items[i + 1].string))
>   continue;

Are we guaranteed that "FOO" and "FOO=bar" appear next to each other in the
sorted list? I think "FOO123=bar" could come between.

I also think this is a special case of a more general problem. FOO could
appear any number of times in the "env" array, as a deletion or with
multiple values. Our prep_childenv() would treat that as "last one
wins", I think. Could we just do the same here?

-Peff