Re: [PATCH 7/6] run-command: block signals between fork and execve

2017-04-13 Thread Eric Wong
Eric Wong  wrote:
> Brandon Williams  wrote:
> > On 04/13, Eric Wong wrote:
> > > @@ -277,6 +278,8 @@ static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
> > >   error_errno("exec '%s': cd to '%s' failed",
> > >   cmd->argv[0], cmd->dir);
> > >   break;
> > > + case CHILD_ERR_SIGPROCMASK:
> > > + error_errno("sigprocmask failed restoring signals");
> > 
> > missing a break statement here; I'll add it in the re-roll.
> 
> Good catch, thanks!

Actually, I now wonder if that should be die_errno instead.
sigprocmask failures (EFAULT/EINVAL) would only be due
to programmer error.
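
For concreteness, something like this (an untested sketch, with the
missing break added):

	case CHILD_ERR_SIGPROCMASK:
		die_errno("BUG: sigprocmask failed restoring signals");
		break;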

In one of my minor projects(*), I do something like this:

# define CHECK(type, expect, expr) do { \
type checkvar = (expr); \
assert(checkvar == (expect) && "BUG" && __FILE__ && __LINE__); \
} while (0)

CHECK(int, 0, sigfillset(&set));
CHECK(int, 0, sigemptyset(&set));
CHECK(int, 0, pthread_sigmask(SIG_SETMASK, &set, NULL));

Dunno if it's considered good style or not, here.

(*) git clone git://bogomips.org/cmogstored


Re: [PATCH 7/6] run-command: block signals between fork and execve

2017-04-13 Thread Eric Wong
Brandon Williams  wrote:
> On 04/13, Eric Wong wrote:
> > @@ -277,6 +278,8 @@ static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
> > error_errno("exec '%s': cd to '%s' failed",
> > cmd->argv[0], cmd->dir);
> > break;
> > +   case CHILD_ERR_SIGPROCMASK:
> > +   error_errno("sigprocmask failed restoring signals");
> 
> missing a break statement here; I'll add it in the re-roll.

Good catch, thanks!

> > case CHILD_ERR_ENOENT:
> > error_errno("cannot run %s", cmd->argv[0]);
> > break;


Re: [PATCH 7/6] run-command: block signals between fork and execve

2017-04-13 Thread Brandon Williams
On 04/13, Eric Wong wrote:
> @@ -277,6 +278,8 @@ static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
>   error_errno("exec '%s': cd to '%s' failed",
>   cmd->argv[0], cmd->dir);
>   break;
> + case CHILD_ERR_SIGPROCMASK:
> + error_errno("sigprocmask failed restoring signals");

missing a break statement here; I'll add it in the re-roll.

>   case CHILD_ERR_ENOENT:
>   error_errno("cannot run %s", cmd->argv[0]);
>   break;

-- 
Brandon Williams


Re: [PATCH v4 0/5] Kill manual ref parsing code in worktree.c

2017-04-13 Thread Junio C Hamano
Junio C Hamano  writes:

> Nguyễn Thái Ngọc Duy   writes:
>
>> v4 adds a new patch, 2/5, which makes the hashmap related functions in
>> refs.c generic, so I could add a new map for worktree ref stores and
>> not abuse submodule hashmap.
>>
>> I mentioned about moving these hashmap back to submodule.c and
>> worktree.c where it can map more than just ref stores (in worktree
>> case, we can cache 'struct worktree', for example). But I'm not doing
>> it now to keep things small.
>>
>> The commit message in 1/5 is rephrased a bit, hopefully clearer.
>
> Michael, does this look good to replace what has been queued?
>
> I did not spot any discussion on this topic after v4 when I skimmed
> the backlog messages, so I am hoping that I can pick this up without
> having to worry about seeing another reroll immediately after that
> (in which case I'd have been better off if I tended other topics in
> flight in the meantime).
>
> Thanks, both.

Oops, I shouldn't have done that.  When applied on top of the
files-backend thing (or have you updated that one and is my tree
lacking it???), this breaks quite a few tests.

t0001#41 dumps core from "git worktree add ../linked-worktree" for
example.





Re: [PATCH v4 0/5] Kill manual ref parsing code in worktree.c

2017-04-13 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> v4 adds a new patch, 2/5, which makes the hashmap related functions in
> refs.c generic, so I could add a new map for worktree ref stores and
> not abuse submodule hashmap.
>
> I mentioned about moving these hashmap back to submodule.c and
> worktree.c where it can map more than just ref stores (in worktree
> case, we can cache 'struct worktree', for example). But I'm not doing
> it now to keep things small.
>
> The commit message in 1/5 is rephrased a bit, hopefully clearer.

Michael, does this look good to replace what has been queued?

I did not spot any discussion on this topic after v4 when I skimmed
the backlog messages, so I am hoping that I can pick this up without
having to worry about seeing another reroll immediately after that
(in which case I'd have been better off if I tended other topics in
flight in the meantime).

Thanks, both.




Re: [PATCH v3] unpack-trees: avoid duplicate ODB lookups during checkout

2017-04-13 Thread Jeff King
On Thu, Apr 13, 2017 at 04:11:31PM -0700, Junio C Hamano wrote:

> g...@jeffhostetler.com writes:
> 
> > +   /*
> > +* Fetch the tree from the ODB for each peer directory in the
> > +* n commits.
> > +*
> > +* For 2- and 3-way traversals, we try to avoid hitting the
> > +* ODB twice for the same OID.  This should yield a nice speed
> > +* up in checkouts and merges when the commits are similar.
> > +*
> > +* We don't bother doing the full O(n^2) search for larger n,
> > +* because wider traversals don't happen that often and we
> > +* avoid the search setup.
> > +*
> > +* When 2 peer OIDs are the same, we just copy the tree
> > +* descriptor data.  This implicitly borrows the buffer
> > +* data from the earlier cell.
> 
> This "borrowing" made me worried, but it turns out that this is
> perfectly OK.
> 
> fill_tree_descriptor() uses read_sha1_file() to give a tree_desc its
> own copy of the tree object data, so the code that calls into the
> tree traversal API is responsible for freeing the buffer returned
> from fill_tree_descriptor().  The updated code avoids double freeing
> by setting buf[i] to NULL for a borrowing t[i].

Yeah, I think that logic is fine. We could also keep a separate counter
for the size of buf, like:

  buf[nr_buf++] = fill_tree_descriptor(t+i, sha1);

and then just count from zero to nr_buf in the freeing loop. I don't
think it matters much either way (the free(NULL) calls are presumably
cheap).
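
Spelled out a bit more (an untested sketch; the same-OID borrowing
branches from the patch are elided to "..."):

	int nr_buf = 0;

	for (i = 0; i < n; i++, dirmask >>= 1) {
		...
		buf[nr_buf++] = fill_tree_descriptor(t+i, sha1);
	}
	...
	/* count only the cells we actually filled; no free(NULL) needed */
	for (i = 0; i < nr_buf; i++)
		free(buf[i]);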

-Peff


Re: [PATCH] repack: respect gc.pid lock

2017-04-13 Thread Jacob Keller
On Thu, Apr 13, 2017 at 1:27 PM, David Turner  wrote:
> Git gc locks the repository (using a gc.pid file) so that other gcs
> don't run concurrently. Make git repack respect this lock.
>
> Now repack, by default, will refuse to run at the same time as a gc.
> This fixes a concurrency issue: a repack which deleted packs would
> make a concurrent gc sad when its packs were deleted out from under
> it.  The gc would fail with: "fatal: ./objects/pack/pack-$sha.pack
> cannot be accessed".  Then it would die, probably leaving a large temp
> pack hanging around.
>
> Git repack learns --no-lock, so that when run under git gc, it doesn't
> attempt to manage the lock itself.
>
> Martin Fick suggested just moving the lock into git repack, but this
> would leave parts of git gc (e.g. git prune) protected by only local
> locks.  I worried that a prune (part of git gc) concurrent with a
> repack could confuse the repack, so I decided to go with this
> solution.
>

The last paragraph could be reworded to be a bit less personal and
more as a direct statement of why moving the lock entirely to repack
is a bad idea.

> Signed-off-by: David Turner 
> ---
>  Documentation/git-repack.txt |  5 +++
>  Makefile |  1 +
>  builtin/gc.c | 72 ++--
>  builtin/repack.c | 13 +++
>  repack.c | 88 
> 
>  repack.h |  8 
>  t/t7700-repack.sh|  8 
>  7 files changed, 127 insertions(+), 68 deletions(-)
>  create mode 100644 repack.c
>  create mode 100644 repack.h
>
> diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
> index 26afe6ed54..b347ff5c62 100644
> --- a/Documentation/git-repack.txt
> +++ b/Documentation/git-repack.txt
> @@ -143,6 +143,11 @@ other objects in that pack they already have locally.
> being removed. In addition, any unreachable loose objects will
> be packed (and their loose counterparts removed).
>
> +--no-lock::
> +   Do not lock the repository, and do not respect any existing lock.
> +   Mostly useful for running repack within git gc.  Do not use this
> +   unless you know what you are doing.
> +

I would have phrased this more like:

  Used internally by git gc to call git repack while already holding
  the lock. Do not use unless you know what you're doing.


Re: [PATCH v2 0/6] forking and threading

2017-04-13 Thread Brandon Williams
On 04/13, Jonathan Nieder wrote:
> Brandon Williams wrote:
> 
> > From what I can see, there are now no calls in the child process (after fork
> > and before exec/_exit) which are not Async-Signal-Safe.  This means that
> > fork/exec in a threaded context should work without deadlock
> 
> I don't see why the former implies the latter.  Can you explain
> further?
> 
> You already know my opinions about fork+threads by now.  I continue to
> think this is heading in a direction of decreased maintainability that
> I dread.

I disagree here.  No one thought it was a bad idea back when I was
implementing grep to fork while running with threads.  I'd rather fix the
problem in run_command so that we don't ever have to worry about this
again, especially for contributors who are unaware of this issue while
threading.

> 
> That's not to say that this is wasted work.  I would prefer an approach
> like the following:
> 
>  1. First, make grep work without running fork() from threads.
> Jonathan Tan's approach would be one way to do this.  Another way
> would be to simply disable threads in --recurse-submodules mode.
> 
> This would be the first thing to do because it would make tests
> reliable again, without having to wait for deeper changes.

I'm not much of a fan of Jonathan Tan's suggestion; I'd rather just fix
the problem at its root instead of adding in an additional hack.  If
this series crashes and burns then yes, let's just shut off threading in
grep when --recurse-submodules is used; otherwise this series will fix
that case.

> 
>  2. Then, teaching run_command to prepare the environment and do $PATH
> lookup before forking.  This might make it possible for run_command
> to use vfork or might not.
> 
>  3. Teaching run_command to delegate chdir to the child, using -C for
> git commands and 'cd' for shell commands, and using a shell where
> necessary where it didn't before.
> 
>  4. Switching run_command to use posix_spawn on platforms where it is
> available, which would make it possible to use in a threaded
> context on those platforms.

After this series it should be easy enough to convert to posix_spawn if
someone wants to follow up with a patch to do that.

-- 
Brandon Williams


Re: [PATCH 7/6] run-command: block signals between fork and execve

2017-04-13 Thread Brandon Williams
On 04/13, Eric Wong wrote:
> Brandon Williams  wrote:
> > v2 does a bit of restructuring based on comments from reviewers.  I took
> > the patch by Eric and broke it up and tweaked it a bit to flow better
> > with v2.  I left out the part of Eric's patch which did signal
> > manipulation as I wasn't experienced enough to know what it was doing or
> > why it was necessary.  Though I believe the code is structured in such a
> > way that Eric could make another patch on top of this series with just
> > the signal changes.
> 
> Yeah, I think a separate commit message might be necessary to
> explain the signal changes.

Perfect!  I'll carry the changes along in the reroll.

> 
> ---8<-
> Subject: [PATCH] run-command: block signals between fork and execve
> 
> Signal handlers of the parent firing in the forked child may
> have unintended side effects.  Rather than auditing every signal
> handler we have and will ever have, block signals while forking
> and restore default signal handlers in the child before execve.
> 
> Restoring default signal handlers is required because
> execve does not unblock signals, it only restores default
> signal handlers.  So we must restore them with sigprocmask
> before execve, leaving a window when signal handlers
> we control can fire in the child.  Continue ignoring
> ignored signals, but reset the rest to defaults.
> 
> Similarly, disable pthread cancellation to future-proof our code
> in case we start using cancellation; as cancellation is
> implemented with signals in glibc.
> 
> Signed-off-by: Eric Wong 
> ---
>   Changes from my original in <20170411070534.GA10552@whir>:
> 
>   - fixed typo in NO_PTHREADS code
> 
>   - dropped fflush(NULL) before fork, consider us screwed anyways
> if the child uses stdio
> 
>   - respect SIG_IGN in child; that seems to be the prevailing
> wisdom from reading https://ewontfix.com/7/ and process.c
> in ruby (git clone https://github.com/ruby/ruby.git)
> 
>  run-command.c | 69 +++
>  1 file changed, 69 insertions(+)
> 
> diff --git a/run-command.c b/run-command.c
> index 1c36e692d..59a8b4806 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -213,6 +213,7 @@ static int child_notifier = -1;
>  
>  enum child_errcode {
>   CHILD_ERR_CHDIR,
> + CHILD_ERR_SIGPROCMASK,
>   CHILD_ERR_ENOENT,
>   CHILD_ERR_SILENT,
>   CHILD_ERR_ERRNO,
> @@ -277,6 +278,8 @@ static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
>   error_errno("exec '%s': cd to '%s' failed",
>   cmd->argv[0], cmd->dir);
>   break;
> + case CHILD_ERR_SIGPROCMASK:
> + error_errno("sigprocmask failed restoring signals");
>   case CHILD_ERR_ENOENT:
>   error_errno("cannot run %s", cmd->argv[0]);
>   break;
> @@ -388,6 +391,53 @@ static char **prep_childenv(const char *const *deltaenv)
>  }
>  #endif
>  
> +struct atfork_state {
> +#ifndef NO_PTHREADS
> + int cs;
> +#endif
> + sigset_t old;
> +};
> +
> +#ifndef NO_PTHREADS
> +static void bug_die(int err, const char *msg)
> +{
> + if (err) {
> + errno = err;
> + die_errno("BUG: %s", msg);
> + }
> +}
> +#endif
> +
> +static void atfork_prepare(struct atfork_state *as)
> +{
> + sigset_t all;
> +
> > +	if (sigfillset(&all))
> + die_errno("sigfillset");
> +#ifdef NO_PTHREADS
> > +	if (sigprocmask(SIG_SETMASK, &all, &as->old))
> + die_errno("sigprocmask");
> +#else
> > +	bug_die(pthread_sigmask(SIG_SETMASK, &all, &as->old),
> + "blocking all signals");
> > +	bug_die(pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &as->cs),
> + "disabling cancellation");
> +#endif
> +}
> +
> +static void atfork_parent(struct atfork_state *as)
> +{
> +#ifdef NO_PTHREADS
> > +	if (sigprocmask(SIG_SETMASK, &as->old, NULL))
> + die_errno("sigprocmask");
> +#else
> + bug_die(pthread_setcancelstate(as->cs, NULL),
> + "re-enabling cancellation");
> > +	bug_die(pthread_sigmask(SIG_SETMASK, &as->old, NULL),
> + "restoring signal mask");
> +#endif
> +}
> +
>  static inline void set_cloexec(int fd)
>  {
>   int flags = fcntl(fd, F_GETFD);
> @@ -511,6 +561,7 @@ int start_command(struct child_process *cmd)
>   char **childenv;
>   struct argv_array argv = ARGV_ARRAY_INIT;
>   struct child_err cerr;
> + struct atfork_state as;
>  
>   if (pipe(notify_pipe))
>   notify_pipe[0] = notify_pipe[1] = -1;
> @@ -524,6 +575,7 @@ int start_command(struct child_process *cmd)
>  
>   prepare_cmd(, cmd);
>   childenv = prep_childenv(cmd->env);
> > +	atfork_prepare(&as);
>  
>   /*
>* NOTE: In order to prevent deadlocking when using threads special
> @@ -537,6 +589,7 @@ int start_command(struct child_process *cmd)
>   cmd->pid = fork();
> >   failed_errno = errno;

Re: [PATCH v3] unpack-trees: avoid duplicate ODB lookups during checkout

2017-04-13 Thread Junio C Hamano
g...@jeffhostetler.com writes:

> + /*
> +  * Fetch the tree from the ODB for each peer directory in the
> +  * n commits.
> +  *
> +  * For 2- and 3-way traversals, we try to avoid hitting the
> +  * ODB twice for the same OID.  This should yield a nice speed
> +  * up in checkouts and merges when the commits are similar.
> +  *
> +  * We don't bother doing the full O(n^2) search for larger n,
> +  * because wider traversals don't happen that often and we
> +  * avoid the search setup.
> +  *
> +  * When 2 peer OIDs are the same, we just copy the tree
> +  * descriptor data.  This implicitly borrows the buffer
> +  * data from the earlier cell.

This "borrowing" made me worried, but it turns out that this is
perfectly OK.

fill_tree_descriptor() uses read_sha1_file() to give a tree_desc its
own copy of the tree object data, so the code that calls into the
tree traversal API is responsible for freeing the buffer returned
from fill_tree_descriptor().  The updated code avoids double freeing
by setting buf[i] to NULL for a borrowing t[i].

> +  */
>   for (i = 0; i < n; i++, dirmask >>= 1) {
> - const unsigned char *sha1 = NULL;
> - if (dirmask & 1)
> - sha1 = names[i].oid->hash;
> - buf[i] = fill_tree_descriptor(t+i, sha1);
> +	if (i > 0 && are_same_oid(&names[i], &names[i - 1])) {
> + t[i] = t[i - 1];
> + buf[i] = NULL;
> +	} else if (i > 1 && are_same_oid(&names[i], &names[i - 2])) {
> + t[i] = t[i - 2];
> + buf[i] = NULL;
> + } else {
> + const unsigned char *sha1 = NULL;
> + if (dirmask & 1)
> + sha1 = names[i].oid->hash;
> + buf[i] = fill_tree_descriptor(t+i, sha1);
> + }
>   }
>  
>   bottom = switch_cache_bottom();


Re: [PATCH] submodule--helper: fix typo in is_active error message

2017-04-13 Thread Brandon Williams
On 04/13, Stefan Beller wrote:
> Signed-off-by: Stefan Beller 
> ---
>  builtin/submodule--helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index e42e671014..b1d4269e10 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -1145,7 +1145,7 @@ static int absorb_git_dirs(int argc, const char **argv, 
> const char *prefix)
>  static int is_active(int argc, const char **argv, const char *prefix)
>  {
>   if (argc != 2)
> - die("submodule--helper is-active takes exactly 1 arguments");
> + die("submodule--helper is-active takes exactly 1 argument");

Obvious fix.  Thanks!

>  
>   gitmodules_config();
>  
> -- 
> 2.12.2.603.g7b28dc31ba
> 

-- 
Brandon Williams


Re: [PATCH] worktree add: add --lock option

2017-04-13 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> As explained in the document, this option has an advantage over the
> command sequence "git worktree add && git worktree lock": there will be
> no gap that somebody can accidentally "prune" the new worktree (or soon,
> explicitly "worktree remove" it).
>
> "worktree add" does keep a lock on while it's preparing the worktree.
> If --lock is specified, this lock remains after the worktree is created.
>
> Suggested-by: David Taylor 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  A patch that adds --lock may look like this.

This looks more like "I do believe the idea by David is a useful
addition and here is how I did it to the best of my ability---let's
make sure we polish it for eventual inclusion" than a mere "it may
look like so---do whatever you want with it" patch.

To me "git worktree add --lock" somehow sounds less correct than
"git worktree add --locked", but I'd appreciate if natives can
correct me.

Thanks.


Re: [PATCH v2 2/6] run-command: prepare command before forking

2017-04-13 Thread Brandon Williams
On 04/13, Jonathan Nieder wrote:
> Hi,
> 
> Brandon Williams wrote:
> 
> > In order to avoid allocation between 'fork()' and 'exec()' the argv
> > array used in the exec call is prepared prior to forking the process.
> 
> nit: s/(the argv array.*) is prepared/prepare \1/
> 
> Git's commit messages are in the imperative mood, as if they are
> ordering the code or the computer to do something.
> 
> More importantly, the commit message is a good place to explain some
> of the motivation behind the patch so that people can understand what
> the patch is for by reading it without having to dig into mailing list
> archives and get the discussion there.
> 
> E.g. this could say
> 
> - that grep tests are intermittently failing in configurations using
>   some versions of tcmalloc
> 
> - that the cause is interaction between fork and threads: malloc holds
>   a lock that those versions of tcmalloc doesn't release in a
>   pthread_atfork handler
> 
> - that according to [1] we need to only call async-signal-safe
>   operations between fork and exec.  Using malloc to build the argv
>   array isn't async-signal-safe
> 
> [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html
> 
> > In addition to this, the function used to exec is changed from
> > 'execvp()' to 'execv()' as the (p) variant of exec has the potential to
> > call malloc during the path resolution it performs.
> 
> *puzzled* is execvp actually allowed to call malloc?

It possibly could, as it isn't async-signal-safe.

> 
> Could this part go in a separate patch?  That would make it easier to
> review.

I'll break this conversion out to a different patch.

> 
> [...]
> > +++ b/run-command.c
> [...]
> > +   /*
> > +* If there are no '/' characters in the command then perform a path
> > +* lookup and use the resolved path as the command to exec.  If there
> > +* are no '/' characters or if the command wasn't found in the path,
> > +* have exec attempt to invoke the command directly.
> > +*/
> > +   if (!strchr(out->argv[0], '/')) {
> > +   char *program = locate_in_PATH(out->argv[0]);
> > +   if (program) {
> > +   free((char *)out->argv[0]);
> > +   out->argv[0] = program;
> > +   }
> > +   }
> 
> This does one half of what execvp does but leaves out the other half.
> http://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html
> explains:
> 
>   There are two distinct ways in which the contents of the process
>   image file may cause the execution to fail, distinguished by the
>   setting of errno to either [ENOEXEC] or [EINVAL] (see the ERRORS
>   section). In the cases where the other members of the exec family of
>   functions would fail and set errno to [ENOEXEC], the execlp() and
>   execvp() functions shall execute a command interpreter and the
>   environment of the executed command shall be as if the process
>   invoked the sh utility using execl() as follows:
> 
>   execl(<shell path>, arg0, file, arg1, ..., (char *)0);
> 
> I think this is what the first patch in the series was about.  Do we
> want to drop that support?
> 
> I think we need to keep it, since it is easy for authors of e.g.
> credential helpers to accidentally rely on it.
> 
> [...]
> > @@ -437,12 +458,9 @@ int start_command(struct child_process *cmd)
> > unsetenv(*cmd->env);
> > }
> > }
> > -   if (cmd->git_cmd)
> > -   execv_git_cmd(cmd->argv);
> > -   else if (cmd->use_shell)
> > -   execv_shell_cmd(cmd->argv);
> > -   else
> > -   sane_execvp(cmd->argv[0], (char *const*) cmd->argv);
> > +
> > +   execv(argv.argv[0], (char *const *) argv.argv);
> 
> What happens in the case sane_execvp was trying to handle?  Does
> prepare_cmd need error handling for when the command isn't found?
> 
> Sorry this got so fussy.
> 
> Thanks and hope that helps,
> Jonathan

-- 
Brandon Williams


Re: [PATCH v1 0/3] travis-ci: build docs with asciidoctor

2017-04-13 Thread Junio C Hamano
Lars Schneider  writes:

> this is a mini series to build the documentation with asciidoctor in
> addition to asciidoc on Travis-CI.

Overall, this looks sensible.  I didn't spot anything questionable
other than a minor style nit, i.e. write these

make doc USE_ASCIIDOCTOR=1
make -j2 doc USE_ASCIIDOCTOR=1

more like this

make USE_ASCIIDOCTOR=1 doc
make -j2 USE_ASCIIDOCTOR=1 doc

Having said that, I wonder if we get some interesting results out of
building the documentation twice, though.  By looking at the Travis
log with timestamps, we probably can see how long each build takes,
but that is much less interesting than learning if new versions of
text used mark-up that does not format correctly on one or the other
(i.e. catch documentation breakage early in each CI run), for
example.  I have an impression that neither AsciiDoc nor AsciiDoctor
"fails" in an obvious way that "make" can notice (i.e. they often
just silently produce nonsense output when fed a malformed input
instead).

Thanks.


Re: [PATCH] xgethostname: handle long hostnames

2017-04-13 Thread René Scharfe
On 13.04.2017 at 21:23, David Turner wrote:
> If the full hostname doesn't fit in the buffer supplied to
> gethostname, POSIX does not specify whether the buffer will be
> null-terminated, so to be safe, we should do it ourselves.  Introduce
> new function, xgethostname, which ensures that there is always a \0
> at the end of the buffer.
> 
> Signed-off-by: David Turner 
> ---

> diff --git a/wrapper.c b/wrapper.c
> index 0542fc7582..d837417709 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -655,3 +655,16 @@ void sleep_millisec(int millisec)
>   {
>   poll(NULL, 0, millisec);
>   }
> +
> +int xgethostname(char *buf, size_t len)
> +{
> + /*
> +  * If the full hostname doesn't fit in buf, POSIX does not
> +  * specify whether the buffer will be null-terminated, so to
> +  * be safe, do it ourselves.
> +  */
> + int ret = gethostname(buf, len);
> + if (!ret)
> + buf[len - 1] = 0;
> + return ret;
> +}

Silent truncation is not ideal, no matter if it's done by the wrapper or
the original function.  It would be better to use a properly sized
buffer.

POSIX requires hostnames to have a maximum length of HOST_NAME_MAX.  So
how about just adding an assert to make sure len is big enough?  Or
evaluate the condition at compile time with BUILD_ASSERT_OR_ZERO?

Downside: Not all platforms define HOST_NAME_MAX.  daemon.c uses 256 as
a fallback.  On Windows a buffer size of 256 is documented to suffice
in all cases [1].  The Linux manpage [2] mentions a hostname length
limit of 255 (plus NUL) as well, even though HOST_NAME_MAX is 64 there.

Another possibility: Die (or at least warn) if the buffer doesn't
contain a NUL byte after calling gethostname().  That only works for
platforms that don't NUL-terminate on truncation, though, so silent
truncation would still go unnoticed.

Anyway, the buffer in builtin/gc.c with its 128 bytes seems to be too
short; the others are at least 256 bytes long.  Replacing the magic
buffer size number with HOST_NAME_MAX + 1 might be a good idea (after
moving the fallback definition to git-compat-util.h).
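
Something like this, perhaps (an untested sketch; it assumes the fallback
moves to git-compat-util.h as suggested above):

	#ifndef HOST_NAME_MAX
	#define HOST_NAME_MAX 255	/* POSIX minimum */
	#endif

and then at call sites such as builtin/gc.c:

	char my_host[HOST_NAME_MAX + 1];

	if (xgethostname(my_host, sizeof(my_host)))
		xsnprintf(my_host, sizeof(my_host), "unknown");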

René


[1] https://msdn.microsoft.com/en-us/library/windows/desktop/ms738527(v=vs.85).aspx
[2] http://man7.org/linux/man-pages/man2/gethostname.2.html


RE: [PATCH] xgethostname: handle long hostnames

2017-04-13 Thread David Turner
> -Original Message-
> From: Jonathan Nieder [mailto:jrnie...@gmail.com]
> Sent: Thursday, April 13, 2017 6:05 PM
> To: David Turner 
> Cc: git@vger.kernel.org
> Subject: Re: [PATCH] xgethostname: handle long hostnames
> 
> Hi,
> 
> David Turner wrote:
> 
> > If the full hostname doesn't fit in the buffer supplied to
> > gethostname, POSIX does not specify whether the buffer will be
> > null-terminated, so to be safe, we should do it ourselves.
> [...]
> > +++ b/wrapper.c
> > @@ -655,3 +655,16 @@ void sleep_millisec(int millisec)  {
> > poll(NULL, 0, millisec);
> >  }
> > +
> > +int xgethostname(char *buf, size_t len) {
> > +   /*
> > +* If the full hostname doesn't fit in buf, POSIX does not
> > +* specify whether the buffer will be null-terminated, so to
> > +* be safe, do it ourselves.
> > +*/
> > +   int ret = gethostname(buf, len);
> > +   if (!ret)
> > +   buf[len - 1] = 0;
> > +   return ret;
> 
> I wonder if after null-terminating we would want to report this as an error,
> instead of silently using a truncated result.  I.e. something like
> 
> > +   if (!ret)
> > +   buf[len - 1] = 0;
> > +   if (strlen(buf) >= len - 1) {
> > +   errno = ENAMETOOLONG;
> > +   return -1;
> > +   }
>
> (or EINVAL --- either is equally descriptive).

Looking at the users of this function, I think most would be happier with
a truncated buffer than an error:

gc.c: used to see if we are the same machine as the machine that locked
the repo.  It is unlikely that two machines have hostnames that differ
only in the 256th-or-above character.

fetch-pack.c, receive-pack.c: similar to gc.c; the hostname is a note in
the .keep file.

ident.c: used to make up a fake email address.  On my laptop, gethostname
returns "corey" (no domain part), so the email address is not likely to
be valid anyway.

> Also POSIX requires that hostnames are <= 255 bytes.  Maybe we can force the
> buffer to be large enough.

That is not how I read it.  I read the limit as HOST_NAME_MAX, which has
a *minimum* value of 255, but which might be larger.

The existing hostname buffers are 128, 256, and 1024 bytes, so they're
pretty arbitrary.



[PATCH] submodule--helper: fix typo in is_active error message

2017-04-13 Thread Stefan Beller
Signed-off-by: Stefan Beller 
---
 builtin/submodule--helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index e42e671014..b1d4269e10 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1145,7 +1145,7 @@ static int absorb_git_dirs(int argc, const char **argv, 
const char *prefix)
 static int is_active(int argc, const char **argv, const char *prefix)
 {
if (argc != 2)
-   die("submodule--helper is-active takes exactly 1 arguments");
+   die("submodule--helper is-active takes exactly 1 argument");
 
gitmodules_config();
 
-- 
2.12.2.603.g7b28dc31ba



Re: [PATCH] xgethostname: handle long hostnames

2017-04-13 Thread Jonathan Nieder
Hi,

David Turner wrote:

> If the full hostname doesn't fit in the buffer supplied to
> gethostname, POSIX does not specify whether the buffer will be
> null-terminated, so to be safe, we should do it ourselves.
[...]
> +++ b/wrapper.c
> @@ -655,3 +655,16 @@ void sleep_millisec(int millisec)
>  {
>   poll(NULL, 0, millisec);
>  }
> +
> +int xgethostname(char *buf, size_t len)
> +{
> + /*
> +  * If the full hostname doesn't fit in buf, POSIX does not
> +  * specify whether the buffer will be null-terminated, so to
> +  * be safe, do it ourselves.
> +  */
> + int ret = gethostname(buf, len);
> + if (!ret)
> + buf[len - 1] = 0;
> + return ret;

I wonder if after null-terminating we would want to report this as
an error, instead of silently using a truncated result.  I.e. something
like

> + if (!ret)
> + buf[len - 1] = 0;
> + if (strlen(buf) >= len - 1) {
> + errno = ENAMETOOLONG;
> + return -1;
> + }

(or EINVAL --- either is equally descriptive).

Also POSIX requires that hostnames are <= 255 bytes.  Maybe we can
force the buffer to be large enough.

Thoughts?
Jonathan


Re: [PATCH 1/1] difftool: fix use-after-free

2017-04-13 Thread Jonathan Nieder
Johannes Schindelin wrote:

> Signed-off-by: Johannes Schindelin 
> ---
>  builtin/difftool.c  |  7 +--
>  t/t7800-difftool.sh | 19 +++
>  2 files changed, 24 insertions(+), 2 deletions(-)

Reviewed-by: Jonathan Nieder 

Thank you.


Re: [PATCH v2 1/6] t5550: use write_script to generate post-update hook

2017-04-13 Thread Eric Wong
Brandon Williams  wrote:
> On 04/13, Eric Wong wrote:
> > Jonathan Nieder  wrote:
> > > Brandon Williams wrote:
> > > > The post-update hook created in t5550-http-fetch-dumb.sh is missing the
> > > > "#!/bin/sh" line, which can cause issues with portability.  Instead
> > > > create the hook using the 'write_script' function, which includes the
> > > > proper "#!" line.
> > 
> > > This would allow later patches to regress a previously supported
> > > behavior.
> > > 
> > > I agree that it's silly to test that behavior as a side-effect of this
> > > unrelated test, but I don't think we want to lose the test coverage.
> > 
> > I was about to write something similar about this regression.
> > The new execve-using code should handle ENOEXEC as execvpe does
> > and probably a new test for it needs to be written.
> 
> Would it be enough to upon seeing a failed exec call and ENOEXEC to
> retry a single time, invoking the shell to attempt to interpret the
> command?

Yes, that's exactly what glibc does.
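
Roughly, in the child after the exec attempt (an untested sketch;
"sh_argv" is hypothetical and would have to be built before fork() so the
child stays async-signal-safe):

	execv(argv.argv[0], (char *const *) argv.argv);
	if (errno == ENOEXEC)
		/* mimic execvp(): hand the file to the shell */
		execv("/bin/sh", (char *const *) sh_argv);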


Re: [PATCH v2 1/6] t5550: use write_script to generate post-update hook

2017-04-13 Thread Brandon Williams
On 04/13, Eric Wong wrote:
> Jonathan Nieder  wrote:
> > Brandon Williams wrote:
> > > The post-update hook created in t5550-http-fetch-dumb.sh is missing the
> > > "#!/bin/sh" line, which can cause issues with portability.  Instead
> > > create the hook using the 'write_script' function, which includes the
> > > proper "#!" line.
> 
> > This would allow later patches to regress a previously supported
> > behavior.
> > 
> > I agree that it's silly to test that behavior as a side-effect of this
> > unrelated test, but I don't think we want to lose the test coverage.
> 
> I was about to write something similar about this regression.
> The new execve-using code should handle ENOEXEC as execvpe does
> and probably a new test for it needs to be written.

Would it be enough to upon seeing a failed exec call and ENOEXEC to
retry a single time, invoking the shell to attempt to interpret the
command?

-- 
Brandon Williams


Re: [PATCH] Make git log work for git CWD outside of work tree

2017-04-13 Thread Jeff King
On Wed, Apr 12, 2017 at 08:11:22PM +0700, Duy Nguyen wrote:

> On Wed, Apr 12, 2017 at 8:01 PM, Jeff King  wrote:
> > I dunno. Maybe I am missing some subtle case, but it's not clear to me
> > what the user would be trying to do by having git stay in the original
> > directory.
> 
> Not sure if it really happens. But if the worktree is inside the
> original cwd and we have to jump to the worktree, we set an empty
> prefix, not a "../../../" one. So we can't refer back to files relative
> to the original cwd with the prefix. I was kinda hoping "super prefix"
> would solve it, but that one seems designed specifically for submodules.

Ah, right. I think the issue is that "prefix" really serves two uses.
For things like command-line arguments, we use it to find the original
path after we've moved. But we also use it for "where in the working
tree are we".

So with an out-of-tree cwd, we'd want to set the first one to the real
path ("../../whatever" or even just an absolute path), but the second
one would probably be empty (i.e., just pretend that we started from the
top-level).

But that would require first refactoring all of the cmd_* functions to
handle those prefixes separately.
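
In other words, something like (purely illustrative; both names are
made up):

	struct cmd_prefix {
		const char *arg_prefix;	 /* maps argv paths back to the original cwd */
		const char *tree_prefix; /* where in the working tree we are */
	};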

-Peff


Re: [PATCH] xgethostname: handle long hostnames

2017-04-13 Thread Jeff King
On Thu, Apr 13, 2017 at 03:23:35PM -0400, David Turner wrote:

> If the full hostname doesn't fit in the buffer supplied to
> gethostname, POSIX does not specify whether the buffer will be
> null-terminated

Wow, TIL. What an utterly terrible and error-prone interface (I always
just assumed we'd get ENAMETOOLONG, which is what glibc does).

> so to be safe, we should do it ourselves.  Introduce
> new function, xgethostname, which ensures that there is always a \0
> at the end of the buffer.

Your patch looks good to me.

-Peff


Re: [PATCH v2 2/6] run-command: prepare command before forking

2017-04-13 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> In order to avoid allocation between 'fork()' and 'exec()' the argv
> array used in the exec call is prepared prior to forking the process.

nit: s/(the argv array.*) is prepared/prepare \1/

Git's commit messages are in the imperative mood, as if they are
ordering the code or the computer to do something.

More importantly, the commit message is a good place to explain some
of the motivation behind the patch so that people can understand what
the patch is for by reading it without having to dig into mailing list
archives and get the discussion there.

E.g. this could say

- that grep tests are intermittently failing in configurations using
  some versions of tcmalloc

- that the cause is interaction between fork and threads: malloc holds
  a lock that those versions of tcmalloc doesn't release in a
  pthread_atfork handler

- that according to [1] we need to only call async-signal-safe
  operations between fork and exec.  Using malloc to build the argv
  array isn't async-signal-safe

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html

> In addition to this, the function used to exec is changed from
> 'execvp()' to 'execv()' as the (p) variant of exec has the potential to
> call malloc during the path resolution it performs.

*puzzled* is execvp actually allowed to call malloc?

Could this part go in a separate patch?  That would make it easier to
review.

[...]
> +++ b/run-command.c
[...]
> + /*
> +  * If there are no '/' characters in the command then perform a path
> +  * lookup and use the resolved path as the command to exec.  If there
> +  * are no '/' characters or if the command wasn't found in the path,
> +  * have exec attempt to invoke the command directly.
> +  */
> + if (!strchr(out->argv[0], '/')) {
> + char *program = locate_in_PATH(out->argv[0]);
> + if (program) {
> + free((char *)out->argv[0]);
> + out->argv[0] = program;
> + }
> + }

This does one half of what execvp does but leaves out the other half.
http://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html
explains:

  There are two distinct ways in which the contents of the process
  image file may cause the execution to fail, distinguished by the
  setting of errno to either [ENOEXEC] or [EINVAL] (see the ERRORS
  section). In the cases where the other members of the exec family of
  functions would fail and set errno to [ENOEXEC], the execlp() and
  execvp() functions shall execute a command interpreter and the
  environment of the executed command shall be as if the process
  invoked the sh utility using execl() as follows:

  execl(<shell path>, arg0, file, arg1, ..., (char *)0);

I think this is what the first patch in the series was about.  Do we
want to drop that support?

I think we need to keep it, since it is easy for authors of e.g.
credential helpers to accidentally rely on it.

[...]
> @@ -437,12 +458,9 @@ int start_command(struct child_process *cmd)
>   unsetenv(*cmd->env);
>   }
>   }
> - if (cmd->git_cmd)
> - execv_git_cmd(cmd->argv);
> - else if (cmd->use_shell)
> - execv_shell_cmd(cmd->argv);
> - else
> - sane_execvp(cmd->argv[0], (char *const*) cmd->argv);
> +
> + execv(argv.argv[0], (char *const *) argv.argv);

What happens in the case sane_execvp was trying to handle?  Does
prepare_cmd need error handling for when the command isn't found?

Sorry this got so fussy.

Thanks and hope that helps,
Jonathan


[PATCH 7/6] run-command: block signals between fork and execve

2017-04-13 Thread Eric Wong
Brandon Williams  wrote:
> v2 does a bit of restructuring based on comments from reviewers.  I took the
> patch by Eric and broke it up and tweaked it a bit to flow better with v2.  I
> left out the part of Eric's patch which did signal manipulation as I wasn't
> experienced enough to know what it was doing or why it was necessary.  Though
> I believe the code is structured in such a way that Eric could make another
> patch on top of this series with just the signal changes.

Yeah, I think a separate commit message might be necessary to
explain the signal changes.

---8<-
Subject: [PATCH] run-command: block signals between fork and execve

Signal handlers of the parent firing in the forked child may
have unintended side effects.  Rather than auditing every signal
handler we have and will ever have, block signals while forking
and restore default signal handlers in the child before execve.

Restoring default signal handlers is required because
execve does not unblock signals, it only restores default
signal handlers.  So we must restore them with sigprocmask
before execve, leaving a window when signal handlers
we control can fire in the child.  Continue ignoring
ignored signals, but reset the rest to defaults.

Similarly, disable pthread cancellation to future-proof our code
in case we start using cancellation; as cancellation is
implemented with signals in glibc.

Signed-off-by: Eric Wong 
---
  Changes from my original in <20170411070534.GA10552@whir>:

  - fixed typo in NO_PTHREADS code

  - dropped fflush(NULL) before fork, consider us screwed anyways
if the child uses stdio

  - respect SIG_IGN in child; that seems to be the prevailing
wisdom from reading https://ewontfix.com/7/ and process.c
in ruby (git clone https://github.com/ruby/ruby.git)

 run-command.c | 69 +++
 1 file changed, 69 insertions(+)

diff --git a/run-command.c b/run-command.c
index 1c36e692d..59a8b4806 100644
--- a/run-command.c
+++ b/run-command.c
@@ -213,6 +213,7 @@ static int child_notifier = -1;
 
 enum child_errcode {
CHILD_ERR_CHDIR,
+   CHILD_ERR_SIGPROCMASK,
CHILD_ERR_ENOENT,
CHILD_ERR_SILENT,
CHILD_ERR_ERRNO,
@@ -277,6 +278,8 @@ static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
error_errno("exec '%s': cd to '%s' failed",
cmd->argv[0], cmd->dir);
break;
+   case CHILD_ERR_SIGPROCMASK:
+   error_errno("sigprocmask failed restoring signals");
case CHILD_ERR_ENOENT:
error_errno("cannot run %s", cmd->argv[0]);
break;
@@ -388,6 +391,53 @@ static char **prep_childenv(const char *const *deltaenv)
 }
 #endif
 
+struct atfork_state {
+#ifndef NO_PTHREADS
+   int cs;
+#endif
+   sigset_t old;
+};
+
+#ifndef NO_PTHREADS
+static void bug_die(int err, const char *msg)
+{
+   if (err) {
+   errno = err;
+   die_errno("BUG: %s", msg);
+   }
+}
+#endif
+
+static void atfork_prepare(struct atfork_state *as)
+{
+   sigset_t all;
+
+	if (sigfillset(&all))
+   die_errno("sigfillset");
+#ifdef NO_PTHREADS
+	if (sigprocmask(SIG_SETMASK, &all, &as->old))
+   die_errno("sigprocmask");
+#else
+	bug_die(pthread_sigmask(SIG_SETMASK, &all, &as->old),
+   "blocking all signals");
+	bug_die(pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &as->cs),
+   "disabling cancellation");
+#endif
+}
+
+static void atfork_parent(struct atfork_state *as)
+{
+#ifdef NO_PTHREADS
+	if (sigprocmask(SIG_SETMASK, &as->old, NULL))
+   die_errno("sigprocmask");
+#else
+   bug_die(pthread_setcancelstate(as->cs, NULL),
+   "re-enabling cancellation");
+	bug_die(pthread_sigmask(SIG_SETMASK, &as->old, NULL),
+   "restoring signal mask");
+#endif
+}
+
 static inline void set_cloexec(int fd)
 {
int flags = fcntl(fd, F_GETFD);
@@ -511,6 +561,7 @@ int start_command(struct child_process *cmd)
char **childenv;
struct argv_array argv = ARGV_ARRAY_INIT;
struct child_err cerr;
+   struct atfork_state as;
 
if (pipe(notify_pipe))
notify_pipe[0] = notify_pipe[1] = -1;
@@ -524,6 +575,7 @@ int start_command(struct child_process *cmd)
 
prepare_cmd(, cmd);
childenv = prep_childenv(cmd->env);
+	atfork_prepare(&as);
 
/*
 * NOTE: In order to prevent deadlocking when using threads special
@@ -537,6 +589,7 @@ int start_command(struct child_process *cmd)
cmd->pid = fork();
failed_errno = errno;
if (!cmd->pid) {
+   int sig;
/*
 * Ensure the default die/error/warn routines do not get
 * called, they can take stdio locks and malloc.
@@ -584,6 +637,21 @@ int start_command(struct child_process *cmd)
 

Re: [PATCH v2 1/6] t5550: use write_script to generate post-update hook

2017-04-13 Thread Eric Wong
Jonathan Nieder  wrote:
> Brandon Williams wrote:
> > The post-update hook created in t5550-http-fetch-dumb.sh is missing the
> > "#!/bin/sh" line, which can cause issues with portability.  Instead
> > create the hook using the 'write_script' function, which includes the
> > proper "#!" line.

> This would allow later patches to regress a previously supported
> behavior.
> 
> I agree that it's silly to test that behavior as a side-effect of this
> unrelated test, but I don't think we want to lose the test coverage.

I was about to write something similar about this regression.
The new execve-using code should handle ENOEXEC as execvpe does
and probably a new test for it needs to be written.


Re: [PATCH v2 0/6] forking and threading

2017-04-13 Thread Jonathan Nieder
Brandon Williams wrote:

> From what I can see, there are now no calls in the child process (after fork
> and before exec/_exit) which are not Async-Signal-Safe.  This means that
> fork/exec in a threaded context should work without deadlock

I don't see why the former implies the latter.  Can you explain
further?

You already know my opinions about fork+threads by now.  I continue to
think this is heading in a direction of decreased maintainability that
I dread.

That's not to say that this is wasted work.  I would prefer an approach
like the following:

 1. First, make grep work without running fork() from threads.
Jonathan Tan's approach would be one way to do this.  Another way
would be to simply disable threads in --recurse-submodules mode.

This would be the first thing to do because it would make tests
reliable again, without having to wait for deeper changes.

 2. Then, teaching run_command to prepare the environment and do $PATH
lookup before forking.  This might make it possible for run_command
to use vfork or might not.

 3. Teaching run_command to delegate chdir to the child, using -C for
git commands and 'cd' for shell commands, and using a shell where
necessary where it didn't before.

 4. Switching run_command to use posix_spawn on platforms where it is
available, which would make it possible to use in a threaded
context on those platforms.
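
For step 4, the rough shape might be (an untested sketch; file actions,
spawn attributes, and the chdir handling from step 3 are omitted, and
"spawn_cmd" is a made-up name):

	#include <errno.h>
	#include <spawn.h>

	static int spawn_cmd(pid_t *pid, const char *path,
			     char *const argv[], char *const envp[])
	{
		int err = posix_spawn(pid, path, NULL, NULL, argv, envp);

		if (err) {
			errno = err;	/* posix_spawn returns an error number */
			return -1;
		}
		return 0;
	}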

Thoughts?
Jonathan


Re: [PATCH v2 1/6] t5550: use write_script to generate post-update hook

2017-04-13 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> The post-update hook created in t5550-http-fetch-dumb.sh is missing the
> "#!/bin/sh" line, which can cause issues with portability.  Instead
> create the hook using the 'write_script' function, which includes the
> proper "#!" line.
>
> Signed-off-by: Brandon Williams 
> ---
>  t/t5550-http-fetch-dumb.sh | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

This would allow later patches to regress a previously supported
behavior.

I agree that it's silly to test that behavior as a side-effect of this
unrelated test, but I don't think we want to lose the test coverage.

Thanks for tracking this down and hope that helps,
Jonathan


"up-to-date" vs. "up to date"

2017-04-13 Thread Jeffrey Manian
Hello git community,

This is about an issue of language style and punctuation, not anything
functional. Apologies in advance if I've brought this to the wrong
place.

There are a bunch of situations in which git will print a message like
"Your branch is up-to-date with 'origin/master'" or "Already
up-to-date."

In many of these cases, including the two examples I just gave, "up to
date" should not be hyphenated --- at least according to most (if not
all) English-language style guides.

Here are a couple posts in support of that, which also explain when it
should and should not be hyphenated:
https://english.stackexchange.com/questions/180611/do-i-keep-myself-up-to-date-or-up-to-date-on-something
http://grammarist.com/usage/up-to-date/

And the Chromium community dealing with the same issue:
https://groups.google.com/a/chromium.org/forum/#!topic/chromium-reviews/edodON6G2oY

I thought about submitting a patch, but I started looking through the
source code and found that "up-to-date" appears 61 times. So before I
get into that I thought I would check with the community to see if
this is something that's already been debated and decided.

Awaiting your thoughts,
Jeff Manian


[PATCH] repack: respect gc.pid lock

2017-04-13 Thread David Turner
Git gc locks the repository (using a gc.pid file) so that other gcs
don't run concurrently. Make git repack respect this lock.

Now repack, by default, will refuse to run at the same time as a gc.
This fixes a concurrency issue: a repack which deleted packs would
make a concurrent gc sad when its packs were deleted out from under
it.  The gc would fail with: "fatal: ./objects/pack/pack-$sha.pack
cannot be accessed".  Then it would die, probably leaving a large temp
pack hanging around.

Git repack learns --no-lock, so that when run under git gc, it doesn't
attempt to manage the lock itself.

Martin Fick suggested just moving the lock into git repack, but this
would leave parts of git gc (e.g. git prune) protected by only local
locks.  I worried that a prune (part of git gc) concurrent with a
repack could confuse the repack, so I decided to go with this
solution.

Signed-off-by: David Turner 
---
 Documentation/git-repack.txt |  5 +++
 Makefile |  1 +
 builtin/gc.c | 72 ++--
 builtin/repack.c | 13 +++
 repack.c | 88 
 repack.h |  8 
 t/t7700-repack.sh|  8 
 7 files changed, 127 insertions(+), 68 deletions(-)
 create mode 100644 repack.c
 create mode 100644 repack.h

diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index 26afe6ed54..b347ff5c62 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -143,6 +143,11 @@ other objects in that pack they already have locally.
being removed. In addition, any unreachable loose objects will
be packed (and their loose counterparts removed).
 
+--no-lock::
+   Do not lock the repository, and do not respect any existing lock.
+   Mostly useful for running repack within git gc.  Do not use this
+   unless you know what you are doing.
+
 Configuration
 -
 
diff --git a/Makefile b/Makefile
index 9b36068ac5..7095f03959 100644
--- a/Makefile
+++ b/Makefile
@@ -816,6 +816,7 @@ LIB_OBJS += refs/files-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += remote.o
+LIB_OBJS += repack.o
 LIB_OBJS += replace_object.o
 LIB_OBJS += rerere.o
 LIB_OBJS += resolve-undo.o
diff --git a/builtin/gc.c b/builtin/gc.c
index c2c61a57bb..9b9c27020b 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -18,6 +18,7 @@
 #include "sigchain.h"
 #include "argv-array.h"
 #include "commit.h"
+#include "repack.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -45,7 +46,6 @@ static struct argv_array prune = ARGV_ARRAY_INIT;
 static struct argv_array prune_worktrees = ARGV_ARRAY_INIT;
 static struct argv_array rerere = ARGV_ARRAY_INIT;
 
-static struct tempfile pidfile;
 static struct lock_file log_lock;
 
 static struct string_list pack_garbage = STRING_LIST_INIT_DUP;
@@ -234,70 +234,6 @@ static int need_to_gc(void)
return 1;
 }
 
-/* return NULL on success, else hostname running the gc */
-static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
-{
-   static struct lock_file lock;
-   char my_host[128];
-   struct strbuf sb = STRBUF_INIT;
-   struct stat st;
-   uintmax_t pid;
-   FILE *fp;
-   int fd;
-   char *pidfile_path;
-
-   if (is_tempfile_active(&pidfile))
-   /* already locked */
-   return NULL;
-
-   if (gethostname(my_host, sizeof(my_host)))
-   xsnprintf(my_host, sizeof(my_host), "unknown");
-
-   pidfile_path = git_pathdup("gc.pid");
-   fd = hold_lock_file_for_update(&lock, pidfile_path,
-  LOCK_DIE_ON_ERROR);
-   if (!force) {
-   static char locking_host[128];
-   int should_exit;
-   fp = fopen(pidfile_path, "r");
-   memset(locking_host, 0, sizeof(locking_host));
-   should_exit =
-   fp != NULL &&
-   !fstat(fileno(fp), &st) &&
-   /*
-* 12 hour limit is very generous as gc should
-* never take that long. On the other hand we
-* don't really need a strict limit here,
-* running gc --auto one day late is not a big
-* problem. --force can be used in manual gc
-* after the user verifies that no gc is
-* running.
-*/
-   time(NULL) - st.st_mtime <= 12 * 3600 &&
-   fscanf(fp, "%"SCNuMAX" %127c", &pid, locking_host) == 2 &&
-   /* be gentle to concurrent "gc" on remote hosts */
-   (strcmp(locking_host, my_host) || !kill(pid, 0) || errno == EPERM);
-   if (fp != NULL)
-   fclose(fp);
-   if (should_exit) {
-   if 

Re: Proposal for "fetch-any-blob Git protocol" and server design

2017-04-13 Thread Jonathan Tan

On 04/12/2017 03:02 PM, Kevin David wrote:

Hi Jonathan,

I work on the network protocols for the GVFS project at Microsoft.
I shared a couple thoughts and questions below.


Thanks for your reply!


I know we're considering server behavior here, but how large do you
generally expect these blob-want requests to be? I ask because we took an
initial approach very similar to this; however, we had a hard time being
clever about figuring out what set of blobs to request for those clients
that didn't want the entire set, and ended up falling back to single-blob
requests.

Obviously, this could be due to the nature of our
filesystem-virtualization-based client, but I also suspect that the teams
attacking this problem are more often than not dealing with very large
blob objects, so the cost of a round-trip becomes lower relative to
sending the object content itself.


I am envisioning (1a) as described in Jeff Hostetler's e-mail [1] ("a 
pre-command or hook to identify needed blobs and pre-fetch them before 
allowing the actual command to start"), so a Git command would typically 
make a single request that contains all the blobs required, but my 
proposal can also handle (1c) ('"fault" them in as necessary in 
read_object() while the command is running and without any pre-fetch 
(either synchronously or asynchronously and with/without a helper 
process)').


Even if we decided to go with single-blob requests and responses, it is 
still important to send them as packfiles, so that the server can serve 
them directly from its compressed storage without first having to 
uncompress them.


[1] https://public-inbox.org/git/1488999039-37631-1-git-send-email-...@jeffhostetler.com/



Along the same lines as above, this is where we started and it worked
well for low-volume requests. However, when we started ramping up the
load, `pack-objects` operating on a very large packed repository
(~150 GiB) became very computationally expensive, even with
`--compression=1 --depth=0 --window=0`.

Being a bit more clever about packing objects (e.g. splitting blobs out
from commits and trees) improved this a bit, but we still hit bottlenecks
from what appeared to be a large number of memory-mapping operations on a
~140 GiB packfile of blobs.

Each `pack-objects` process would consume approximately one CPU core for
the duration of the request. It's possible that further splitting of
these large blob packs would have improved performance in some scenarios,
but that would increase the amount of pack-index lookups necessary to
find a single object.


I'm not very experienced with mmap, but I thought that memory-mapping a
large file in itself does not incur much of a performance penalty (if
any) - it is the accesses that count. I experimented with mmap on 15,000
and 150,000 MiB files and they seem to be handled quite well. Also, how
many objects is "pack-objects" packing here?



=== Endpoint support for forward compatibility

This "server" endpoint requires that the first line be understood, but
will ignore any other lines starting with words that it does not
understand. This allows new "commands" to be added (distinguished by
their first lines) and existing commands to be "upgraded" with backwards 
compatibility.


This seems like a clever way to avoid the canonical
`/info/refs?service=git-upload-pack` capability negotiation on every
call. However, using error handling to fall back seems slightly wonky to
me. Hopefully users are incentivized to upgrade their clients.


By "error handling to fallback", do you mean in my proposal or in a 
possible future one (assuming my proposal is implemented)? I don't think 
my proposal requires any error handling to fallback (since only new 
clients can clone partially - old clients will just clone totally and 
obliviously), but I acknowledge that this proposal does not mean that 
any future proposal can be done without requiring error handling to 
fallback.



=== Indication to use the proposed endpoint

The client will probably already record that at least one of its
remotes (the one that it successfully performed a "partial clone"
from) supports this new endpoint (if not, it can’t determine whether a
missing blob was caused by repo corruption or by the "partial clone").
This knowledge can be used both to know that the server supports
"fetch-blob-pack" and "fetch-commit-pack" (for the latter, the client
can fall back to "fetch-pack"/"upload-pack" when fetching from other servers).


This makes a lot of sense to me. When we built our caching proxy, we had
to be careful when designing how we'd handle clients requesting objects
missing from the proxy.

For example, a client requests a single blob and the proxy doesn't have
it - we can't simply download that object from the "authoritative" remote
and stick it in the `.git\objects\xx\yyy...` directory, because the
repository would be made corrupt.


By proxy, do you mean a Git repository? Sorry, I don't really understand 
this part.

Re: [PATCH v2 4/6] run-command: don't die in child when duping /dev/null

2017-04-13 Thread Brandon Williams
On 04/13, Eric Wong wrote:
> Brandon Williams  wrote:
> > @@ -487,7 +483,7 @@ int start_command(struct child_process *cmd)
> > atexit(notify_parent);
> >  
> > if (cmd->no_stdin)
> > -   dup_devnull(0);
> > +   dup2(null_fd, 0);
> 
> I prefer we keep error checking for dup2 failures,
> and also add more error checking for unchecked dup2 calls.
> Can be a separate patch, I suppose.
> 
> Ditto for other dup2 changes

I simply figured that since we weren't doing the checks on the other
dup2 calls that I would keep it consistent.  But if we wanted to add in
more error checking then we can in another patch.

> 
> > @@ -558,6 +554,8 @@ int start_command(struct child_process *cmd)
> > }
> > close(notify_pipe[0]);
> >  
> > +   if (null_fd > 0)
> > +   close(null_fd);
> 
> I would prefer:
> 
>   if (null_fd >= 0)
> 
> here, even if we currently do not release stdin.

K will do.

> 
> > argv_array_clear();
> > free(childenv);

-- 
Brandon Williams


Re: [PATCH] t6500: don't run detached auto gc at the end of the test script

2017-04-13 Thread SZEDER Gábor
On Thu, Apr 13, 2017 at 9:12 PM, Jeff King  wrote:
> On Thu, Apr 13, 2017 at 09:03:26PM +0200, SZEDER Gábor wrote:
>
>> > Yeah, I had a similar thought. I can't think of any reason why it would
>> > trigger a false positive, as long as we account for test-failure and the
>> > --debug flag properly. I don't think we can just drop the "-f" from the
>> > final "rm", because it may be overriding funny permissions left by
>> > tests.
>>
>> FWIW, I used the following rather simple change during stress-testing
>> these patches (barring white-space damage):
>>
>> diff --git a/t/test-lib.sh b/t/test-lib.sh
>> index 13b569682..d7fa15a69 100644
>> --- a/t/test-lib.sh
>> +++ b/t/test-lib.sh
>> @@ -763,7 +763,7 @@ test_done () {
>>
>> test -d "$remove_trash" &&

I'm not sure under what circumstances the trash dir could be missing at
this point...

>> cd "$(dirname "$remove_trash")" &&
>> -   rm -rf "$(basename "$remove_trash")"
>> +   rm -rf "$(basename "$remove_trash")" || exit 1

... but when it is already removed, then I think we should not exit
with error here.
Nothing that a pair of {} wouldn't handle.

> Oh, right. I thought "-f" would cause it to exit 0 even if some items
> failed to be removed, but that only applies to ENOENT. So I think that
> is probably a good change. I agree it's not a true error, but just a
> sign of something fishy, but I don't think it hurts to have extra lint
> checks.
>
> Replacing the "exit 1" with a "die" that has a message is probably a
> good idea, though.

If we can't 'cd' or 'rm -rf', then they will tell us why they failed
anyway, most likely including the name of the trash directory.
Do we really need more error messages than that?


Re: [PATCH 3/2] ls-files: only recurse on active submodules

2017-04-13 Thread Jacob Keller
On Thu, Apr 13, 2017 at 12:25 PM, Stefan Beller  wrote:
> On Thu, Apr 13, 2017 at 12:12 PM, Jacob Keller  wrote:
>> On Thu, Apr 13, 2017 at 12:05 PM, Stefan Beller  wrote:
>>> On Thu, Apr 13, 2017 at 11:57 AM, Brandon Williams wrote:
>>>> Add in a check to see if a submodule is active before attempting to
>>>> recurse.  This prevents 'ls-files' from trying to operate on a submodule
>>>> which may not exist in the working directory.
>>>
>>> What would currently happen on recursing into non-active submodules?
>>> Can we have a test for this?
>>>
>>> Thanks,
>>> Stefan
>>
>> We should be able to test for this. Is it possible that we can recurse
>> into a submodule as long as we have the clone in .git/modules/
>> even if we don't have it checked out currently?
>
> Conceptually that should be possible, e.g.
>
> git ls-files --recurse-submodules <ancient-ref>
>
> where the ancient ref contained submodules that are not present any more.
> In that case we would need to do
>
> struct strbuf sb = STRBUF_INIT;
> struct child_process cp = CHILD_PROCESS_INIT;
> struct submodule *sub = submodule_from_path(null_sha1, path);
> strbuf_git_path(&sb, "modules/%s", sub->name);
>
> argv_array_pushl(&cp.args, "git", "ls-files", "--recurse-submodules", ...);
> cp.dir = sb.buf;
> run_command(&cp);
>
> Stefan

This is probably the right flow to use then.

Thanks,
Jake


Re: [PATCH v2 4/6] run-command: don't die in child when duping /dev/null

2017-04-13 Thread Eric Wong
Brandon Williams  wrote:
> @@ -487,7 +483,7 @@ int start_command(struct child_process *cmd)
>   atexit(notify_parent);
>  
>   if (cmd->no_stdin)
> - dup_devnull(0);
> + dup2(null_fd, 0);

I prefer we keep error checking for dup2 failures,
and also add more error checking for unchecked dup2 calls.
Can be a separate patch, I suppose.

Ditto for other dup2 changes

> @@ -558,6 +554,8 @@ int start_command(struct child_process *cmd)
>   }
>   close(notify_pipe[0]);
>  
> + if (null_fd > 0)
> + close(null_fd);

I would prefer:

if (null_fd >= 0)

here, even if we currently do not release stdin.

>   argv_array_clear();
>   free(childenv);


Re: [PATCH 3/2] ls-files: only recurse on active submodules

2017-04-13 Thread Stefan Beller
On Thu, Apr 13, 2017 at 12:12 PM, Jacob Keller  wrote:
> On Thu, Apr 13, 2017 at 12:05 PM, Stefan Beller  wrote:
>> On Thu, Apr 13, 2017 at 11:57 AM, Brandon Williams  wrote:
>>> Add in a check to see if a submodule is active before attempting to
>>> recurse.  This prevents 'ls-files' from trying to operate on a submodule
>>> which may not exist in the working directory.
>>
>> What would currently happen on recursing into non-active submodules?
>> Can we have a test for this?
>>
>> Thanks,
>> Stefan
>
> We should be able to test for this. Is it possible that we can recurse
> into a submodule as long as we have the clone in .git/modules/
> even if we don't have it checked out currently?

Conceptually that should be possible, e.g.

git ls-files --recurse-submodules <ancient-ref>

where the ancient ref contained submodules that are not present any more.
In that case we would need to do

struct strbuf sb = STRBUF_INIT;
struct child_process cp = CHILD_PROCESS_INIT;
struct submodule *sub = submodule_from_path(null_sha1, path);
strbuf_git_path(&sb, "modules/%s", sub->name);

argv_array_pushl(&cp.args, "git", "ls-files", "--recurse-submodules", ...);
cp.dir = sb.buf;
run_command(&cp);

Stefan


[PATCH] xgethostname: handle long hostnames

2017-04-13 Thread David Turner
If the full hostname doesn't fit in the buffer supplied to
gethostname, POSIX does not specify whether the buffer will be
null-terminated, so to be safe, we should do it ourselves.  Introduce a
new function, xgethostname, which ensures that there is always a \0
at the end of the buffer.

Signed-off-by: David Turner 
---
 builtin/gc.c   |  2 +-
 builtin/receive-pack.c |  2 +-
 fetch-pack.c   |  2 +-
 git-compat-util.h  |  2 ++
 ident.c|  2 +-
 wrapper.c  | 13 +
 6 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index c2c61a57bb..5633483f56 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -250,7 +250,7 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
/* already locked */
return NULL;
 
-   if (gethostname(my_host, sizeof(my_host)))
+   if (xgethostname(my_host, sizeof(my_host)))
xsnprintf(my_host, sizeof(my_host), "unknown");
 
pidfile_path = git_pathdup("gc.pid");
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index aca9c33d8d..fb62a94bc3 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1700,7 +1700,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
	argv_array_pushl(&child.args, "index-pack",
			 "--stdin", hdr_arg, NULL);
 
-   if (gethostname(hostname, sizeof(hostname)))
+   if (xgethostname(hostname, sizeof(hostname)))
xsnprintf(hostname, sizeof(hostname), "localhost");
	argv_array_pushf(&child.args,
 "--keep=receive-pack %"PRIuMAX" on %s",
diff --git a/fetch-pack.c b/fetch-pack.c
index d07d85ce30..a899441c34 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -803,7 +803,7 @@ static int get_pack(struct fetch_pack_args *args,
argv_array_push(, "--fix-thin");
if (args->lock_pack || unpack_limit) {
char hostname[256];
-   if (gethostname(hostname, sizeof(hostname)))
+   if (xgethostname(hostname, sizeof(hostname)))
			xsnprintf(hostname, sizeof(hostname), "localhost");
		argv_array_pushf(&cmd.args,
"--keep=fetch-pack %"PRIuMAX " on %s",
diff --git a/git-compat-util.h b/git-compat-util.h
index 8a4a3f85e7..e49b65c235 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -884,6 +884,8 @@ static inline size_t xsize_t(off_t len)
 __attribute__((format (printf, 3, 4)))
 extern int xsnprintf(char *dst, size_t max, const char *fmt, ...);
 
+extern int xgethostname(char *buf, size_t len);
+
 /* in ctype.c, for kwset users */
 extern const unsigned char tolower_trans_tbl[256];
 
diff --git a/ident.c b/ident.c
index c0364fe3a1..7de9f47c41 100644
--- a/ident.c
+++ b/ident.c
@@ -122,7 +122,7 @@ static void add_domainname(struct strbuf *out, int *is_bogus)
 {
char buf[1024];
 
-   if (gethostname(buf, sizeof(buf))) {
+   if (xgethostname(buf, sizeof(buf))) {
warning_errno("cannot get host name");
strbuf_addstr(out, "(none)");
*is_bogus = 1;
diff --git a/wrapper.c b/wrapper.c
index 0542fc7582..d837417709 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -655,3 +655,16 @@ void sleep_millisec(int millisec)
 {
poll(NULL, 0, millisec);
 }
+
+int xgethostname(char *buf, size_t len)
+{
+   /*
+* If the full hostname doesn't fit in buf, POSIX does not
+* specify whether the buffer will be null-terminated, so to
+* be safe, do it ourselves.
+*/
+   int ret = gethostname(buf, len);
+   if (!ret)
+   buf[len - 1] = 0;
+   return ret;
+}
-- 
2.11.GIT
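
To see why the final NUL matters, here is a small standalone demo
(assuming a host whose name is longer than the buffer): POSIX leaves it
unspecified whether plain gethostname() NUL-terminates on truncation, so
the printf below could otherwise read past the end of 'small'.

	#include <stdio.h>
	#include <unistd.h>

	/* same helper as the patch introduces */
	static int xgethostname(char *buf, size_t len)
	{
		int ret = gethostname(buf, len);
		if (!ret)
			buf[len - 1] = 0;
		return ret;
	}

	int main(void)
	{
		char small[8];

		if (!xgethostname(small, sizeof(small)))
			/* guaranteed NUL-terminated, even if truncated */
			printf("host: %s\n", small);
		return 0;
	}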



[PATCH 1/1] difftool: fix use-after-free

2017-04-13 Thread Johannes Schindelin
The left and right base directories were pointed at the buf fields of
two strbufs, which were subject to change.

A contrived test case shows the problem where a file with a long enough
name to force the strbuf to grow is up-to-date (hence the code path is
used where the work tree's version of the file is reused), and then a
file that is not up-to-date needs to be written (hence the code path is
used where checkout_entry() uses the previously recorded base_dir that
is invalid by now).

Let's just copy the base_dir strings for use with checkout_entry(),
never touch them until the end, and release them then. This is an easily
verifiable fix (as opposed to the next-obvious alternative: to re-set
base_dir after every loop iteration).

This fixes https://github.com/git-for-windows/git/issues/1124

Signed-off-by: Johannes Schindelin 
---
 builtin/difftool.c  |  7 +--
 t/t7800-difftool.sh | 19 +++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/builtin/difftool.c b/builtin/difftool.c
index 25e54ad3edb..568294aac01 100644
--- a/builtin/difftool.c
+++ b/builtin/difftool.c
@@ -305,6 +305,7 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
struct strbuf rpath = STRBUF_INIT, buf = STRBUF_INIT;
struct strbuf ldir = STRBUF_INIT, rdir = STRBUF_INIT;
struct strbuf wtdir = STRBUF_INIT;
+   char *lbase_dir, *rbase_dir;
size_t ldir_len, rdir_len, wtdir_len;
struct cache_entry *ce = xcalloc(1, sizeof(ce) + PATH_MAX + 1);
const char *workdir, *tmp;
@@ -339,11 +340,11 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
	memset(&wtindex, 0, sizeof(wtindex));
 
	memset(&lstate, 0, sizeof(lstate));
-   lstate.base_dir = ldir.buf;
+   lstate.base_dir = lbase_dir = xstrdup(ldir.buf);
lstate.base_dir_len = ldir.len;
lstate.force = 1;
	memset(&rstate, 0, sizeof(rstate));
-   rstate.base_dir = rdir.buf;
+   rstate.base_dir = rbase_dir = xstrdup(rdir.buf);
rstate.base_dir_len = rdir.len;
rstate.force = 1;
 
@@ -626,6 +627,8 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix,
 
 finish:
free(ce);
+   free(lbase_dir);
+   free(rbase_dir);
	strbuf_release(&ldir);
	strbuf_release(&rdir);
	strbuf_release(&wtdir);
diff --git a/t/t7800-difftool.sh b/t/t7800-difftool.sh
index 0e7f30db2d9..7f09867478c 100755
--- a/t/t7800-difftool.sh
+++ b/t/t7800-difftool.sh
@@ -393,6 +393,25 @@ test_expect_success 'setup change in subdirectory' '
git commit -m "modified both"
 '
 
+test_expect_success 'difftool -d with growing paths' '
+   a=aaa &&
+   git init growing &&
+   (
+   cd growing &&
+   echo "test -f \"\$2/b\"" | write_script .git/test-for-b.sh &&
+   one=$(printf 1 | git hash-object -w --stdin) &&
+   two=$(printf 2 | git hash-object -w --stdin) &&
+   git update-index --add \
+   --cacheinfo 100644,$one,$a --cacheinfo 100644,$two,b &&
+   tree1=$(git write-tree) &&
+   git update-index --add \
+   --cacheinfo 100644,$two,$a --cacheinfo 100644,$one,b &&
+   tree2=$(git write-tree) &&
+   git checkout -- $a &&
+   git difftool -d --extcmd .git/test-for-b.sh $tree1 $tree2
+   )
+'
+
 run_dir_diff_test () {
test_expect_success "$1 --no-symlinks" "
symlinks=--no-symlinks &&
-- 
2.12.2.windows.2.1.g7df5db8d31e
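
The class of bug, boiled down to a standalone illustration (a simplified
growable buffer standing in for git's strbuf; not the difftool code):

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	struct buf { char *ptr; size_t len, alloc; };

	static void buf_add(struct buf *b, const char *s)
	{
		size_t n = strlen(s);
		if (b->len + n + 1 > b->alloc) {
			b->alloc = (b->len + n + 1) * 2;
			b->ptr = realloc(b->ptr, b->alloc); /* may move! */
		}
		memcpy(b->ptr + b->len, s, n + 1);
		b->len += n;
	}

	int main(void)
	{
		struct buf b = { NULL, 0, 0 };
		const char *base_dir;

		buf_add(&b, "/tmp/difftool-left/");
		base_dir = b.ptr;	/* like lstate.base_dir = ldir.buf */
		buf_add(&b, "a/path/long/enough/to/force/a/realloc");
		/* base_dir may now dangle; the fix is to xstrdup() it */
		printf("saved %p vs live %p\n",
		       (const void *)base_dir, (const void *)b.ptr);
		return 0;
	}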


[PATCH 0/1] difftool: fix a use-after-free bug

2017-04-13 Thread Johannes Schindelin
It has been reported previously that the base_dir recorded at the
beginning of run_dir_diff() may go stale due to the buffer to which it
points potentially being realloc()ed.

This bug has been fixed in Git for Windows 2.12.2(2) already. It took me
this long (!!!) to come up with a reliable test case... But now that I
have it, it can be easily verified.


Johannes Schindelin (1):
  difftool: fix use-after-free

 builtin/difftool.c  |  7 +--
 t/t7800-difftool.sh | 19 +++
 2 files changed, 24 insertions(+), 2 deletions(-)


base-commit: cf11a67975b057a144618badf16dc4e3d25b9407
Published-As: https://github.com/dscho/git/releases/tag/fix-difftool-d-crash-v1
Fetch-It-Via: git fetch https://github.com/dscho/git fix-difftool-d-crash-v1

-- 
2.12.2.windows.2.1.g7df5db8d31e



Re: [PATCH 3/4] submodule.c: harden submodule_move_head against broken submodules

2017-04-13 Thread Stefan Beller
On Thu, Apr 13, 2017 at 12:08 PM, Brandon Williams  wrote:
> On 04/11, Stefan Beller wrote:
>> Early on in submodule_move_head just after the check if the submodule is
>> initialized, we need to check if the submodule is populated correctly.
>>
>> If the submodule is initialized but doesn't look populated, this
>> is a red flag and can indicate multiple sorts of failures:
>> (1) The submodule may be recorded at an object name, that is missing.
>> (2) The submodule '.git' file link may be broken and it is not pointing
>> at a repository.
>>
>> In both cases we want to complain to the user in the non-forced mode,
>> and in the forced mode we ignore the old state and just move the
>> submodule into its new state with a fixed '.git' file link.
>
> What about the case where you have marked a submodule as active but
> don't have its respective .gitdir yet?  For now I think it would be
> acceptable to complain and do nothing/ignore it, in the future we may
> want to actually clone and then check it out.

I agree. With this patch we'd complain in non-forced mode, and in
forced mode we'd also complain as we lack the object.

In both cases in the future we may want to fetch the contents instead.


Re: [PATCH 2/4] submodule.c: uninitialized submodules are ignored in recursive commands

2017-04-13 Thread Brandon Williams
On 04/13, Stefan Beller wrote:
> On Thu, Apr 13, 2017 at 12:05 PM, Brandon Williams  wrote:
> > On 04/11, Stefan Beller wrote:
> >> This was an oversight when working on the working tree modifying commands
> >> recursing into submodules.
> >>
> >> To test for uninitialized submodules, introduce another submodule, that is
> >> uninitialized in the actual tests. By adding it to the branch "add_sub1",
> >> which is the starting point of all other branches, we have wide coverage.
> >>
> 
> ...
> 
> >
> > The 'submodule add' command will make the submodule active, so you'll
> > need to add in a line to subsequently make the submodule inactive for
> > this to work, unless you do it at a later point in time.
> 
> Yes, it will make it active, but that doesn't matter here, because at this
> point (in create_lib_submodule_repo) we prepare an upstream
> in submodule_update_repo
> 
> Any later test follows the structure of
> 
> prolog &&
> reset_work_tree_to no_submodule &&
> (
> cd submodule_update &&
> # do actual test here, in submodule_update
> )
> 
> Note that 'prolog' performs a clone of submodule_update_repo
> to submodule_update, manually setting 'sub1' to active.
> 
> 'uninitialized_sub' is not active.
> 
> I tried to explain it via
> To test for uninitialized submodules, introduce another submodule,
> that is uninitialized in the actual tests.
> in the commit message, but that is too concise apparently.
> So the resend will explain that a bit more.

Thanks!  I just wanted to be sure as you're more familiar with these
tests.

-- 
Brandon Williams


Re: [PATCH] t6500: don't run detached auto gc at the end of the test script

2017-04-13 Thread Jeff King
On Thu, Apr 13, 2017 at 09:03:26PM +0200, SZEDER Gábor wrote:

> > Yeah, I had a similar thought. I can't think of any reason why it would
> > trigger a false positive, as long as we account for test-failure and the
> > --debug flag properly. I don't think we can just drop the "-f" from the
> > final "rm", because it may be overriding funny permissions left by
> > tests.
> 
> FWIW, I used the following rather simple change during stress-testing
> these patches (barring white-space damage):
> 
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 13b569682..d7fa15a69 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -763,7 +763,7 @@ test_done () {
> 
> test -d "$remove_trash" &&
> cd "$(dirname "$remove_trash")" &&
> -   rm -rf "$(basename "$remove_trash")"
> +   rm -rf "$(basename "$remove_trash")" || exit 1

Oh, right. I thought "-f" would cause it to exit 0 even if some items
failed to be removed, but that only applies to ENOENT. So I think that
is probably a good change. I agree it's not a true error, but just a
sign of something fishy, but I don't think it hurts to have extra lint
checks.

Replacing the "exit 1" with a "die" that has a message is probably a
good idea, though.

-Peff


Re: [PATCH 3/2] ls-files: only recurse on active submodules

2017-04-13 Thread Jacob Keller
On Thu, Apr 13, 2017 at 12:05 PM, Stefan Beller  wrote:
> On Thu, Apr 13, 2017 at 11:57 AM, Brandon Williams  wrote:
>> Add in a check to see if a submodule is active before attempting to
>> recurse.  This prevents 'ls-files' from trying to operate on a submodule
>> which may not exist in the working directory.
>
> What would currently happen on recursing into non-active submodules?
> Can we have a test for this?
>
> Thanks,
> Stefan

We should be able to test for this. Is it possible that we can recurse
into a submodule as long as we have the clone in .git/modules/
even if we don't have it checked out currently?

Thanks,
Jake


Re: [PATCH 2/4] submodule.c: uninitialized submodules are ignored in recursive commands

2017-04-13 Thread Stefan Beller
On Thu, Apr 13, 2017 at 12:05 PM, Brandon Williams  wrote:
> On 04/11, Stefan Beller wrote:
>> This was an oversight when working on the working tree modifying commands
>> recursing into submodules.
>>
>> To test for uninitialized submodules, introduce another submodule, that is
>> uninitialized in the actual tests. By adding it to the branch "add_sub1",
>> which is the starting point of all other branches, we have wide coverage.
>>

...

>
> The 'submodule add' command will make the submodule active, so you'll
> need to add in a line to subsequently make the submodule inactive for
> this to work, unless you do it at a later point in time.

Yes, it will make it active, but that doesn't matter here, because at this
point (in create_lib_submodule_repo) we prepare an upstream
in submodule_update_repo

Any later test follows the structure of

prolog &&
reset_work_tree_to no_submodule &&
(
cd submodule_update &&
# do actual test here, in submodule_update
)

Note that 'prolog' performs a clone of submodule_update_repo
to submodule_update, manually setting 'sub1' to active.

'uninitialized_sub' is not active.

I tried to explain it via
To test for uninitialized submodules, introduce another submodule,
that is uninitialized in the actual tests.
in the commit message, but that is too concise apparently.
So the resend will explain that a bit more.

Thanks,
Stefan


Re: [PATCH 3/4] submodule.c: harden submodule_move_head against broken submodules

2017-04-13 Thread Brandon Williams
On 04/11, Stefan Beller wrote:
> Early on in submodule_move_head just after the check if the submodule is
> initialized, we need to check if the submodule is populated correctly.
> 
> If the submodule is initialized but doesn't look populated, this
> is a red flag and can indicate multiple sorts of failures:
> (1) The submodule may be recorded at an object name, that is missing.
> (2) The submodule '.git' file link may be broken and it is not pointing
> at a repository.
> 
> In both cases we want to complain to the user in the non-forced mode,
> and in the forced mode we ignore the old state and just move the
> submodule into its new state with a fixed '.git' file link.

What about the case where you have marked a submodule as active but
don't have its respective .gitdir yet?  For now I think it would be
acceptable to complain and do nothing/ignore it, in the future we may
want to actually clone and then check it out.

-- 
Brandon Williams


Re: [PATCH 2/4] submodule.c: uninitialized submodules are ignored in recursive commands

2017-04-13 Thread Brandon Williams
On 04/11, Stefan Beller wrote:
> This was an oversight when working on the working tree modifying commands
> recursing into submodules.
> 
> To test for uninitialized submodules, introduce another submodule, that is
> uninitialized in the actual tests. By adding it to the branch "add_sub1",
> which is the starting point of all other branches, we have wide coverage.
> 
> Signed-off-by: Stefan Beller 
> ---
>  submodule.c   | 3 +++
>  t/lib-submodule-update.sh | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/submodule.c b/submodule.c
> index c52d6634c5..2fa42519a4 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1332,6 +1332,9 @@ int submodule_move_head(const char *path,
>   struct child_process cp = CHILD_PROCESS_INIT;
>   const struct submodule *sub;
>  
> + if (!is_submodule_initialized(path))
> + return 0;
> +
>   sub = submodule_from_path(null_sha1, path);
>  
>   if (!sub)
> diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
> index fb4f7b014e..22dd9e060c 100755
> --- a/t/lib-submodule-update.sh
> +++ b/t/lib-submodule-update.sh
> @@ -73,6 +73,7 @@ create_lib_submodule_repo () {
>  
>   git checkout -b "add_sub1" &&
>   git submodule add ../submodule_update_sub1 sub1 &&
> + git submodule add ../submodule_update_sub1 uninitialized_sub &&

The 'submodule add' command will make the submodule active, so you'll
need to add in a line to subsequently make the submodule inactive for
this to work, unless you do it at a later point in time.

>   git config -f .gitmodules submodule.sub1.ignore all &&
>   git config submodule.sub1.ignore all &&
>   git add .gitmodules &&
> -- 
> 2.12.2.603.g7b28dc31ba
> 

-- 
Brandon Williams


Re: [PATCH 3/2] ls-files: only recurse on active submodules

2017-04-13 Thread Stefan Beller
On Thu, Apr 13, 2017 at 11:57 AM, Brandon Williams  wrote:
> Add in a check to see if a submodule is active before attempting to
> recurse.  This prevents 'ls-files' from trying to operate on a submodule
> which may not exist in the working directory.

What would currently happen on recursing into non-active submodules?
Can we have a test for this?

Thanks,
Stefan


Re: Simultaneous gc and repack

2017-04-13 Thread David Turner
On Thu, 2017-04-13 at 12:36 -0600, Martin Fick wrote:
> On Thursday, April 13, 2017 02:28:07 PM David Turner wrote:
> > On Thu, 2017-04-13 at 12:08 -0600, Martin Fick wrote:
> > > On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote:
> > > > On Thu, Apr 13, 2017 at 10:31 AM, David Turner wrote:
> > > > > Git gc locks the repository (using a gc.pid file) so
> > > > > that other gcs don't run concurrently. But git
> > > > > repack
> > > > > doesn't respect this lock, so it's possible to have
> > > > > a
> > > > > repack running at the same time as a gc.  This makes
> > > > > the gc sad when its packs are deleted out from under
> > > > > it
> > > > > with: "fatal: ./objects/pack/pack-$sha.pack cannot
> > > > > be
> > > > > accessed".  Then it dies, leaving a large temp file
> > > > > hanging around.
> > > > > 
> > > > > Does the following seem reasonable?
> > > > > 
> > > > > 1. Make git repack, by default, check for a gc.pid
> > > > > file
> > > > > (using the same logic as git gc itself does).
> > > > > 2. Provide a --force option to git repack to ignore
> > > > > said
> > > > > check. 3. Make git gc provide that --force option
> > > > > when
> > > > > it calls repack under its own lock.
> > > > 
> > > > What about just making the code that calls repack
> > > > today
> > > > just call gc instead? I guess it's more work if you
> > > > don't
> > > > strictly need it but..?
> > > 
> > There are many scenarios where this does not achieve
> > > the 
> > > same thing.  On the obvious side, gc does more than 
> > > repacking, but on the other side, repacking has many 
> > > switches that are not available via gc.
> > > 
> > > Would it make more sense to move the lock to repack
> > > instead  of to gc?
> > 
> > Other gc operations might step on each other too (e.g.
> > packing refs). That would be less bad (and less common),
> > but it still seems worth avoiding.
> 
> Yes, but all of those operations need to be self-protected
> already, or they risk the same issue.

They are individually protected.  But I would rather fail fast.  And
I'm not sure what happens if git prune happens during a git repack -a;
might the repack fail if an object that it expects to pack is pruned
out from under it?
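
For reference, the kind of check repack would need to borrow is roughly
the following sketch. The "<pid> <hostname>" gc.pid format and the
12-hour staleness window mirror what gc does, but the details here are
assumptions, not the actual gc code:

	#include <inttypes.h>
	#include <signal.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/stat.h>
	#include <time.h>
	#include <unistd.h>

	static int gc_seems_to_be_running(const char *pidfile)
	{
		FILE *fp = fopen(pidfile, "r");
		struct stat st;
		uintmax_t pid;
		char host[128] = "", myhost[128] = "";
		int running = 0;

		if (!fp)
			return 0;	/* no lock file, nothing to respect */
		if (fscanf(fp, "%" SCNuMAX " %127s", &pid, host) == 2 &&
		    !fstat(fileno(fp), &st) &&
		    time(NULL) - st.st_mtime <= 12 * 3600) {
			if (gethostname(myhost, sizeof(myhost)))
				myhost[0] = 0;
			myhost[sizeof(myhost) - 1] = 0;
			/* the pid is only verifiable on the same host */
			if (strcmp(host, myhost) || !kill((pid_t)pid, 0))
				running = 1;
		}
		fclose(fp);
		return running;
	}

repack would then refuse to start (absent --force) while this reports
the lock as live.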




Re: [PATCH] t6500: don't run detached auto gc at the end of the test script

2017-04-13 Thread SZEDER Gábor
On Thu, Apr 13, 2017 at 7:57 PM, Jeff King  wrote:
> On Thu, Apr 13, 2017 at 10:55:08AM -0700, Stefan Beller wrote:
>
>> On Thu, Apr 13, 2017 at 9:37 AM, Jeff King  wrote:
>> > Ah, OK, that makes more sense. I can detect it reliably by just checking
>> >
>> >   ! test -d $root/trash*
>> >
>> > in my stress-test after the test successfully completes.
>>
>> Would we want to report such an error from the test suite itself?
>> (My line of thinking is that this is a similar issue to broken && chains,
>> which we do report).

A broken &&-chain can harm the test's correctness by hiding errors.
The failing 'rm -rf $trash' during housekeeping, after all the tests
in the test script have been run and their results reported... not
really, I would think, though arguably it's a sign that something is
fishy.

> Yeah, I had a similar thought. I can't think of any reason why it would
> trigger a false positive, as long as we account for test-failure and the
> --debug flag properly. I don't think we can just drop the "-f" from the
> final "rm", because it may be overriding funny permissions left by
> tests.

FWIW, I used the following rather simple change during stress-testing
these patches (barring white-space damage):

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 13b569682..d7fa15a69 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -763,7 +763,7 @@ test_done () {

test -d "$remove_trash" &&
cd "$(dirname "$remove_trash")" &&
-   rm -rf "$(basename "$remove_trash")"
+   rm -rf "$(basename "$remove_trash")" || exit 1

test_at_end_hook_


[PATCH 3/2] ls-files: only recurse on active submodules

2017-04-13 Thread Brandon Williams
Add in a check to see if a submodule is active before attempting to
recurse.  This prevents 'ls-files' from trying to operate on a submodule
which may not exist in the working directory.

Signed-off-by: Brandon Williams 
---

After you mentioned possibly needing to check if a submodule is initialized, I
went back and noticed that there was indeed no check for it... So this patch
adds in the necessary check to see if a submodule is active or not before
attempting to recurse.

 builtin/ls-files.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index d449e46db..10ddc0306 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -15,6 +15,7 @@
 #include "string-list.h"
 #include "pathspec.h"
 #include "run-command.h"
+#include "submodule.h"
 
 static int abbrev;
 static int show_deleted;
@@ -235,7 +236,8 @@ static void show_ce_entry(const char *tag, const struct cache_entry *ce)
		die("git ls-files: internal error - cache entry not superset of prefix");
 
if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
-	    submodule_path_match(&pathspec, name.buf, ps_matched)) {
+	    submodule_path_match(&pathspec, name.buf, ps_matched) &&
+   is_submodule_initialized(ce->name)) {
show_gitlink(ce);
	} else if (match_pathspec(&pathspec, name.buf, name.len,
  len, ps_matched,
@@ -604,8 +606,10 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix)
if (require_work_tree && !is_inside_work_tree())
setup_work_tree();
 
-   if (recurse_submodules)
+   if (recurse_submodules) {
+   gitmodules_config();
		compile_submodule_options(argv, &dir, show_tag);
+   }
 
if (recurse_submodules &&
(show_stage || show_deleted || show_others || show_unmerged ||
-- 
2.12.2.762.g0e3151a226-goog



Re: Simultaneous gc and repack

2017-04-13 Thread Martin Fick
On Thursday, April 13, 2017 02:28:07 PM David Turner wrote:
> On Thu, 2017-04-13 at 12:08 -0600, Martin Fick wrote:
> > On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote:
> > > On Thu, Apr 13, 2017 at 10:31 AM, David Turner wrote:
> > > > Git gc locks the repository (using a gc.pid file) so
> > > > that other gcs don't run concurrently. But git
> > > > repack
> > > > doesn't respect this lock, so it's possible to have
> > > > a
> > > > repack running at the same time as a gc.  This makes
> > > > the gc sad when its packs are deleted out from under
> > > > it
> > > > with: "fatal: ./objects/pack/pack-$sha.pack cannot
> > > > be
> > > > accessed".  Then it dies, leaving a large temp file
> > > > hanging around.
> > > > 
> > > > Does the following seem reasonable?
> > > > 
> > > > 1. Make git repack, by default, check for a gc.pid
> > > > file
> > > > (using the same logic as git gc itself does).
> > > > 2. Provide a --force option to git repack to ignore
> > > > said
> > > > check. 3. Make git gc provide that --force option
> > > > when
> > > > it calls repack under its own lock.
> > > 
> > > What about just making the code that calls repack
> > > today
> > > just call gc instead? I guess it's more work if you
> > > don't
> > > strictly need it but..?
> > 
> > There are many scenarios where this does not achieve
> > the 
> > same thing.  On the obvious side, gc does more than 
> > repacking, but on the other side, repacking has many 
> > switches that are not available via gc.
> > 
> > Would it make more sense to move the lock to repack
> > instead  of to gc?
> 
> Other gc operations might step on each other too (e.g.
> packing refs). That would be less bad (and less common),
> but it still seems worth avoiding.

Yes, but all of those operations need to be self-protected
already, or they risk the same issue.

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation



Re: [PATCH v2 1/2] ls-files: fix recurse-submodules with nested submodules

2017-04-13 Thread Brandon Williams
On 04/13, Stefan Beller wrote:
> On Thu, Apr 13, 2017 at 11:31 AM, Jacob Keller  wrote:
> > Spinning out a process is one of the big downsides of working with
> > submodules in our code. Unfortunately, spinning out a process is also
> > one of the biggest ways we isolate submodules, and if we wanted to do
> > this "in-process" we would need to add an abstraction layer that lets
> > us handle submodules in-process in some clean way.
> 
> Yeah if we had less globals and a repo struct (class) which we could
> operate on, that would bring in the abstractions that we'd need.

Agreed, though we're probably pretty far from that becoming a reality.
One day though!

-- 
Brandon Williams


Re: Simultaneous gc and repack

2017-04-13 Thread Jacob Keller
On Thu, Apr 13, 2017 at 11:28 AM, David Turner  wrote:
> On Thu, 2017-04-13 at 12:08 -0600, Martin Fick wrote:
>> On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote:
>> > On Thu, Apr 13, 2017 at 10:31 AM, David Turner wrote:
>> > > Git gc locks the repository (using a gc.pid file) so
>> > > that other gcs don't run concurrently. But git repack
>> > > doesn't respect this lock, so it's possible to have a
>> > > repack running at the same time as a gc.  This makes
>> > > the gc sad when its packs are deleted out from under it
>> > > with: "fatal: ./objects/pack/pack-$sha.pack cannot be
>> > > accessed".  Then it dies, leaving a large temp file
>> > > hanging around.
>> > >
>> > > Does the following seem reasonable?
>> > >
>> > > 1. Make git repack, by default, check for a gc.pid file
>> > > (using the same logic as git gc itself does).
>> > > 2. Provide a --force option to git repack to ignore said
>> > > check. 3. Make git gc provide that --force option when
>> > > it calls repack under its own lock.
>> >
>> > What about just making the code that calls repack today
>> > just call gc instead? I guess it's more work if you don't
>> > strictly need it but..?
>>
>> There are many scenarios where this does not achieve the
>> same thing.  On the obvious side, gc does more than
>> repacking, but on the other side, repacking has many
>> switches that are not available via gc.
>>
>> Would it make more sense to move the lock to repack instead
>> of to gc?
>
> Other gc operations might step on each other too (e.g. packing refs).
> That would be less bad (and less common), but it still seems worth
> avoiding.

It sounds like your original solution would work, though I wouldn't
use "force" and I would either not document it or document it with "this
is only meant to be used by git-gc internally".

Thanks,
Jake


[PATCH v2 2/6] run-command: prepare command before forking

2017-04-13 Thread Brandon Williams
In order to avoid allocation between 'fork()' and 'exec()' the argv
array used in the exec call is prepared prior to forking the process.

In addition to this, the function used to exec is changed from
'execvp()' to 'execv()' as the (p) variant of exec has the potential to
call malloc during the path resolution it performs.  Instead we simply
do the path resolution ourselves during the preparation stage prior to
forking.

Signed-off-by: Brandon Williams 
---
 run-command.c | 60 +++
 1 file changed, 40 insertions(+), 20 deletions(-)

diff --git a/run-command.c b/run-command.c
index 574b81d3e..9ee9fde97 100644
--- a/run-command.c
+++ b/run-command.c
@@ -221,18 +221,6 @@ static const char **prepare_shell_cmd(struct argv_array *out, const char **argv)
 }
 
 #ifndef GIT_WINDOWS_NATIVE
-static int execv_shell_cmd(const char **argv)
-{
-   struct argv_array nargv = ARGV_ARRAY_INIT;
-	prepare_shell_cmd(&nargv, argv);
-   trace_argv_printf(nargv.argv, "trace: exec:");
-   sane_execvp(nargv.argv[0], (char **)nargv.argv);
-	argv_array_clear(&nargv);
-   return -1;
-}
-#endif
-
-#ifndef GIT_WINDOWS_NATIVE
 static int child_notifier = -1;
 
 static void notify_parent(void)
@@ -244,6 +232,35 @@ static void notify_parent(void)
 */
xwrite(child_notifier, "", 1);
 }
+
+static void prepare_cmd(struct argv_array *out, const struct child_process *cmd)
+{
+   if (!cmd->argv[0])
+   die("BUG: command is empty");
+
+   if (cmd->git_cmd) {
+   argv_array_push(out, "git");
+   argv_array_pushv(out, cmd->argv);
+   } else if (cmd->use_shell) {
+   prepare_shell_cmd(out, cmd->argv);
+   } else {
+   argv_array_pushv(out, cmd->argv);
+   }
+
+   /*
+* If there are no '/' characters in the command then perform a path
+* lookup and use the resolved path as the command to exec.  If there
+* are '/' characters or if the command wasn't found in the path,
+* have exec attempt to invoke the command directly.
+*/
+   if (!strchr(out->argv[0], '/')) {
+   char *program = locate_in_PATH(out->argv[0]);
+   if (program) {
+   free((char *)out->argv[0]);
+   out->argv[0] = program;
+   }
+   }
+}
 #endif
 
 static inline void set_cloexec(int fd)
@@ -372,9 +389,13 @@ int start_command(struct child_process *cmd)
 #ifndef GIT_WINDOWS_NATIVE
 {
int notify_pipe[2];
+   struct argv_array argv = ARGV_ARRAY_INIT;
+
if (pipe(notify_pipe))
notify_pipe[0] = notify_pipe[1] = -1;
 
	prepare_cmd(&argv, cmd);
+
cmd->pid = fork();
failed_errno = errno;
if (!cmd->pid) {
@@ -437,12 +458,9 @@ int start_command(struct child_process *cmd)
unsetenv(*cmd->env);
}
}
-   if (cmd->git_cmd)
-   execv_git_cmd(cmd->argv);
-   else if (cmd->use_shell)
-   execv_shell_cmd(cmd->argv);
-   else
-   sane_execvp(cmd->argv[0], (char *const*) cmd->argv);
+
+   execv(argv.argv[0], (char *const *) argv.argv);
+
if (errno == ENOENT) {
if (!cmd->silent_exec_failure)
error("cannot run %s: %s", cmd->argv[0],
@@ -458,7 +476,7 @@ int start_command(struct child_process *cmd)
mark_child_for_cleanup(cmd->pid, cmd);
 
/*
-* Wait for child's execvp. If the execvp succeeds (or if fork()
+* Wait for child's exec. If the exec succeeds (or if fork()
 * failed), EOF is seen immediately by the parent. Otherwise, the
 * child process sends a single byte.
 * Note that use of this infrastructure is completely advisory,
@@ -467,7 +485,7 @@ int start_command(struct child_process *cmd)
close(notify_pipe[1]);
	if (read(notify_pipe[0], &notify_pipe[1], 1) == 1) {
/*
-* At this point we know that fork() succeeded, but execvp()
+* At this point we know that fork() succeeded, but exec()
 * failed. Errors have been reported to our stderr.
 */
wait_or_whine(cmd->pid, cmd->argv[0], 0);
@@ -475,6 +493,8 @@ int start_command(struct child_process *cmd)
cmd->pid = -1;
}
close(notify_pipe[0]);
+
	argv_array_clear(&argv);
 }
 #else
 {
-- 
2.12.2.762.g0e3151a226-goog



Re: [PATCH v2 1/2] ls-files: fix recurse-submodules with nested submodules

2017-04-13 Thread Stefan Beller
On Thu, Apr 13, 2017 at 11:31 AM, Jacob Keller  wrote:
> Spinning out a process is one of the big downsides of working with
> submodules in our code. Unfortunately, spinning out a process is also
> one of the biggest ways we isolate submodules, and if we wanted to do
> this "in-process" we would need to add an abstraction layer that lets
> us handle submodules in-process in some clean way.

Yeah if we had less globals and a repo struct (class) which we could
operate on, that would bring in the abstractions that we'd need.

Stefan


[PATCH v2 6/6] run-command: add note about forking and threading

2017-04-13 Thread Brandon Williams
All non-Async-Signal-Safe functions (e.g. malloc and die) were removed
between 'fork' and 'exec' in start_command in order to avoid potential
deadlocking when forking while multiple threads are running.  This
deadlocking is possible when a thread (other than the one forking) has
acquired a lock and didn't get around to releasing it before the fork.
This leaves the lock in a locked state in the resulting process with no
hope of it ever being released.

Add a note describing this potential pitfall before the call to 'fork()'
so people working in this section of the code know to only use
Async-Signal-Safe functions in the child process.

Signed-off-by: Brandon Williams 
---
 run-command.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/run-command.c b/run-command.c
index 4230c4933..1c36e692d 100644
--- a/run-command.c
+++ b/run-command.c
@@ -525,6 +525,15 @@ int start_command(struct child_process *cmd)
prepare_cmd(, cmd);
childenv = prep_childenv(cmd->env);
 
+   /*
+* NOTE: In order to prevent deadlocking when using threads special
+* care should be taken with the function calls made in between the
+* fork() and exec() calls.  No calls should be made to functions which
+* require acquiring a lock (e.g. malloc) as the lock could have been
+* held by another thread at the time of forking, causing the lock to
+* never be released in the child process.  This means only
+* Async-Signal-Safe functions are permitted in the child.
+*/
cmd->pid = fork();
failed_errno = errno;
if (!cmd->pid) {
-- 
2.12.2.762.g0e3151a226-goog
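
The hazard described above, reduced to a standalone (and intentionally
non-deterministic) illustration; this is not git code:

	#include <pthread.h>
	#include <stdlib.h>
	#include <sys/wait.h>
	#include <unistd.h>

	static void *churn(void *arg)
	{
		for (;;)
			free(malloc(64));	/* keeps taking allocator locks */
		return NULL;
	}

	int main(void)
	{
		pthread_t t;
		int i;

		pthread_create(&t, NULL, churn, NULL);
		for (i = 0; i < 1000; i++) {
			pid_t pid = fork();
			if (!pid) {
				/*
				 * Unsafe: if 'churn' held malloc's lock at
				 * fork() time, this malloc can block forever
				 * in the child. Only async-signal-safe calls
				 * belong between fork() and exec()/_exit().
				 */
				free(malloc(64));
				_exit(0);
			}
			waitpid(pid, NULL, 0);
		}
		return 0;
	}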



[PATCH v2 3/6] run-command: prepare child environment before forking

2017-04-13 Thread Brandon Williams
In order to avoid allocation between 'fork()' and 'exec()' prepare the
environment to be used in the child process prior to forking.

Switch to using 'execve()' so that the constructed child environment can
be used in the exec'd process.

Signed-off-by: Brandon Williams 
---
 run-command.c | 83 ---
 1 file changed, 74 insertions(+), 9 deletions(-)

diff --git a/run-command.c b/run-command.c
index 9ee9fde97..5e2a03145 100644
--- a/run-command.c
+++ b/run-command.c
@@ -261,6 +261,75 @@ static void prepare_cmd(struct argv_array *out, const struct child_process *cmd)
}
}
 }
+
+static int env_isequal(const char *e1, const char *e2)
+{
+   for (;;) {
+   char c1 = *e1++;
+   char c2 = *e2++;
+   c1 = (c1 == '=') ? '\0' : tolower(c1);
+   c2 = (c2 == '=') ? '\0' : tolower(c2);
+
+   if (c1 != c2)
+   return 0;
+   if (c1 == '\0')
+   return 1;
+   }
+}
+
+static int searchenv(char **env, const char *name)
+{
+   int pos = 0;
+
+   for (; env[pos]; pos++)
+   if (env_isequal(env[pos], name))
+   break;
+
+   return pos;
+}
+
+static int do_putenv(char **env, int env_nr, const char *name)
+{
+   int pos = searchenv(env, name);
+
+   if (strchr(name, '=')) {
+   /* ('key=value'), insert or replace entry */
+   if (pos >= env_nr)
+   env_nr++;
+   env[pos] = (char *) name;
+   } else if (pos < env_nr) {
+   /* otherwise ('key') remove existing entry */
+   env_nr--;
+   memmove([pos], [pos + 1],
+   (env_nr - pos) * sizeof(char *));
+   env[env_nr] = NULL;
+   }
+
+   return env_nr;
+}
+
+static char **prep_childenv(const char *const *deltaenv)
+{
+   char **childenv;
+   int childenv_nr = 0, childenv_alloc = 0;
+   int i;
+
+   for (i = 0; environ[i]; i++)
+   childenv_nr++;
+   for (i = 0; deltaenv && deltaenv[i]; i++)
+   childenv_alloc++;
+   /* Add one for the NULL termination */
+   childenv_alloc += childenv_nr + 1;
+
+   childenv = xcalloc(childenv_alloc, sizeof(char *));
+   memcpy(childenv, environ, childenv_nr * sizeof(char *));
+
+   /* merge in deltaenv */
+   for (i = 0; deltaenv && deltaenv[i]; i++)
+   childenv_nr = do_putenv(childenv, childenv_nr, deltaenv[i]);
+
+   return childenv;
+}
 #endif
 
 static inline void set_cloexec(int fd)
@@ -389,12 +458,14 @@ int start_command(struct child_process *cmd)
 #ifndef GIT_WINDOWS_NATIVE
 {
int notify_pipe[2];
+   char **childenv;
struct argv_array argv = ARGV_ARRAY_INIT;
 
if (pipe(notify_pipe))
notify_pipe[0] = notify_pipe[1] = -1;
 
	prepare_cmd(&argv, cmd);
+   childenv = prep_childenv(cmd->env);
 
cmd->pid = fork();
failed_errno = errno;
@@ -450,16 +521,9 @@ int start_command(struct child_process *cmd)
if (cmd->dir && chdir(cmd->dir))
die_errno("exec '%s': cd to '%s' failed", cmd->argv[0],
cmd->dir);
-   if (cmd->env) {
-   for (; *cmd->env; cmd->env++) {
-   if (strchr(*cmd->env, '='))
-   putenv((char *)*cmd->env);
-   else
-   unsetenv(*cmd->env);
-   }
-   }
 
-   execv(argv.argv[0], (char *const *) argv.argv);
+   execve(argv.argv[0], (char *const *) argv.argv,
+  (char *const *) childenv);
 
if (errno == ENOENT) {
if (!cmd->silent_exec_failure)
@@ -495,6 +559,7 @@ int start_command(struct child_process *cmd)
close(notify_pipe[0]);
 
	argv_array_clear(&argv);
+   free(childenv);
 }
 #else
 {
-- 
2.12.2.762.g0e3151a226-goog



Re: [PATCH v2 2/2] ls-files: fix path used when recursing into submodules

2017-04-13 Thread Jacob Keller
On Thu, Apr 13, 2017 at 11:15 AM, Stefan Beller  wrote:
> On Thu, Apr 13, 2017 at 10:12 AM, Jacob Keller wrote:
>> From: Jacob Keller 
>>
>> Don't assume that the current working directory is the root of the
>> repository.
>
> 1)  Oh! This bug might be hidden in other commands, too.
> ($ git grep cp.dir -- submodule.c)
>

Almost certainly. I'm not sure how best to audit this.

> 2) But why?
> Isn't that what most of setup.c is all about? (discovery of the root of the
> repository, staying there, and invoking the correct subcommand with a prefix)
>
>> Correctly generate the path for the recursing child
>> processes by building it from the work_tree() root instead. Otherwise if
>> we run ls-files using --git-dir or --work-tree it will not work
>> correctly as it attempts to change directory into a potentially invalid
>> location.
>
> Oh, I see. In that case the setup doesn't cd into the worktree.
>

Yea we aren't in the worktree when we thought we were.

>> Best case, it doesn't exist and we produce an error. Worst
>> case we cd into the wrong location and unknown behavior occurs.
>>
>> Add a new test which highlights this possibility.
>>
>> Signed-off-by: Jacob Keller 
>> ---
>> I'm not sure that I'm convinced by this method of solving the problem as
>> I suspect it has some corner cases (what about when run inside a
>> subdirectory? It seems to work for me but I'm not sure...) Additionally,
>> it felt weird that there's no helper function for creating a toplevel
>> relative path.
>
> Do we want to run ls-files from the working tree or from the git dir?
> For the git dir there would be git_pathdup_submodule.
>

Well prior to this code we're assuming we are in the worktree. I
wasn't actually certain if we should run from within the gitdir or the
worktree. Probably we actually want to be in the gitdir so that we can
work even if we're not checked out. Additionally, we probably need to
protect this check with a "is_submodule_initialized" unless ls-files
somehow ignores the submodule already in this case.. it didn't look
like it at a glance.

> We could introduce
>   const char *get_submodule_work_tree(const char *submodule_path);
> as a wrapper around
>   mkpathdup("%s/%s", get_git_work_tree(), ce->name);
>
> Code and test look fine in this patch,
>

Yea adding a helper function seems like a good thing.

Thanks,
Jake

> Thanks,
> Stefan
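
(The suggested helper would be a one-liner; sketched here assuming git's
existing mkpathdup() and get_git_work_tree() helpers:

	const char *get_submodule_work_tree(const char *submodule_path)
	{
		return mkpathdup("%s/%s", get_git_work_tree(), submodule_path);
	}

The caller would be responsible for freeing the result, since
mkpathdup() allocates.)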


[PATCH v2 4/6] run-command: don't die in child when duping /dev/null

2017-04-13 Thread Brandon Williams
Signed-off-by: Brandon Williams 
---
 run-command.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/run-command.c b/run-command.c
index 5e2a03145..6751b8319 100644
--- a/run-command.c
+++ b/run-command.c
@@ -117,18 +117,6 @@ static inline void close_pair(int fd[2])
close(fd[1]);
 }
 
-#ifndef GIT_WINDOWS_NATIVE
-static inline void dup_devnull(int to)
-{
-   int fd = open("/dev/null", O_RDWR);
-   if (fd < 0)
-   die_errno(_("open /dev/null failed"));
-   if (dup2(fd, to) < 0)
-   die_errno(_("dup2(%d,%d) failed"), fd, to);
-   close(fd);
-}
-#endif
-
 static char *locate_in_PATH(const char *file)
 {
const char *p = getenv("PATH");
@@ -458,12 +446,20 @@ int start_command(struct child_process *cmd)
 #ifndef GIT_WINDOWS_NATIVE
 {
int notify_pipe[2];
+   int null_fd = -1;
char **childenv;
struct argv_array argv = ARGV_ARRAY_INIT;
 
if (pipe(notify_pipe))
notify_pipe[0] = notify_pipe[1] = -1;
 
+   if (cmd->no_stdin || cmd->no_stdout || cmd->no_stderr) {
+   null_fd = open("/dev/null", O_RDWR | O_CLOEXEC | O_NONBLOCK);
+   if (null_fd < 0)
+   die_errno(_("open /dev/null failed"));
+   set_cloexec(null_fd);
+   }
+
	prepare_cmd(&argv, cmd);
childenv = prep_childenv(cmd->env);
 
@@ -487,7 +483,7 @@ int start_command(struct child_process *cmd)
atexit(notify_parent);
 
if (cmd->no_stdin)
-   dup_devnull(0);
+   dup2(null_fd, 0);
else if (need_in) {
dup2(fdin[0], 0);
close_pair(fdin);
@@ -497,7 +493,7 @@ int start_command(struct child_process *cmd)
}
 
if (cmd->no_stderr)
-   dup_devnull(2);
+   dup2(null_fd, 2);
else if (need_err) {
dup2(fderr[1], 2);
close_pair(fderr);
@@ -507,7 +503,7 @@ int start_command(struct child_process *cmd)
}
 
if (cmd->no_stdout)
-   dup_devnull(1);
+   dup2(null_fd, 1);
else if (cmd->stdout_to_stderr)
dup2(2, 1);
else if (need_out) {
@@ -558,6 +554,8 @@ int start_command(struct child_process *cmd)
}
close(notify_pipe[0]);
 
+   if (null_fd > 0)
+   close(null_fd);
	argv_array_clear(&argv);
free(childenv);
 }
-- 
2.12.2.762.g0e3151a226-goog



[PATCH v2 1/6] t5550: use write_script to generate post-update hook

2017-04-13 Thread Brandon Williams
The post-update hook created in t5550-http-fetch-dumb.sh is missing the
"#!/bin/sh" line, which can cause issues with portability.  Instead,
create the hook using the 'write_script' function which includes the
proper "#!" line.

Signed-off-by: Brandon Williams 
---
 t/t5550-http-fetch-dumb.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 87308cdce..8552184e7 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -20,8 +20,9 @@ test_expect_success 'create http-accessible bare repository with loose objects'
(cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
 git config core.bare true &&
 mkdir -p hooks &&
-echo "exec git update-server-info" >hooks/post-update &&
-chmod +x hooks/post-update &&
+write_script "hooks/post-update" <<-\EOF &&
+exec git update-server-info
+   EOF
 hooks/post-update
) &&
git remote add public "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
-- 
2.12.2.762.g0e3151a226-goog



[PATCH v2 5/6] run-command: eliminate calls to error handling functions in child

2017-04-13 Thread Brandon Williams
All of our standard error handling paths have the potential to
call malloc or take stdio locks; so we must avoid them inside
the forked child.

Instead, the child only writes an 8 byte struct atomically to
the parent through the notification pipe to propagate an error.
All user-visible error reporting happens from the parent;
even avoiding functions like atexit(3) and exit(3).

Helped-by: Eric Wong 
Signed-off-by: Brandon Williams 
---
 run-command.c | 121 ++
 1 file changed, 89 insertions(+), 32 deletions(-)

diff --git a/run-command.c b/run-command.c
index 6751b8319..4230c4933 100644
--- a/run-command.c
+++ b/run-command.c
@@ -211,14 +211,82 @@ static const char **prepare_shell_cmd(struct argv_array *out, const char **argv)
 #ifndef GIT_WINDOWS_NATIVE
 static int child_notifier = -1;
 
-static void notify_parent(void)
+enum child_errcode {
+   CHILD_ERR_CHDIR,
+   CHILD_ERR_ENOENT,
+   CHILD_ERR_SILENT,
+   CHILD_ERR_ERRNO,
+};
+
+struct child_err {
+   enum child_errcode err;
+   int syserr; /* errno */
+};
+
+static void child_die(enum child_errcode err)
 {
-   /*
-* execvp failed.  If possible, we'd like to let start_command
-* know, so failures like ENOENT can be handled right away; but
-* otherwise, finish_command will still report the error.
-*/
-   xwrite(child_notifier, "", 1);
+   struct child_err buf;
+
+   buf.err = err;
+   buf.syserr = errno;
+
+   /* write(2) on buf smaller than PIPE_BUF (min 512) is atomic: */
+	xwrite(child_notifier, &buf, sizeof(buf));
+   _exit(1);
+}
+
+/*
+ * parent will make it look like the child spewed a fatal error and died
+ * this is needed to prevent changes to t0061.
+ */
+static void fake_fatal(const char *err, va_list params)
+{
+   vreportf("fatal: ", err, params);
+}
+
+static void child_error_fn(const char *err, va_list params)
+{
+   const char msg[] = "error() should not be called in child\n";
+   xwrite(2, msg, sizeof(msg) - 1);
+}
+
+static void child_warn_fn(const char *err, va_list params)
+{
+   const char msg[] = "warn() should not be called in child\n";
+   xwrite(2, msg, sizeof(msg) - 1);
+}
+
+static void NORETURN child_die_fn(const char *err, va_list params)
+{
+   const char msg[] = "die() should not be called in child\n";
+   xwrite(2, msg, sizeof(msg) - 1);
+   _exit(2);
+}
+
+/* this runs in the parent process */
+static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
+{
+   static void (*old_errfn)(const char *err, va_list params);
+
+   old_errfn = get_error_routine();
+   set_error_routine(fake_fatal);
+   errno = cerr->syserr;
+
+   switch (cerr->err) {
+   case CHILD_ERR_CHDIR:
+   error_errno("exec '%s': cd to '%s' failed",
+   cmd->argv[0], cmd->dir);
+   break;
+   case CHILD_ERR_ENOENT:
+   error_errno("cannot run %s", cmd->argv[0]);
+   break;
+   case CHILD_ERR_SILENT:
+   break;
+   case CHILD_ERR_ERRNO:
+   error_errno("cannot exec '%s'", cmd->argv[0]);
+   break;
+   }
+   set_error_routine(old_errfn);
 }
 
static void prepare_cmd(struct argv_array *out, const struct child_process *cmd)
@@ -355,13 +423,6 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
code += 128;
} else if (WIFEXITED(status)) {
code = WEXITSTATUS(status);
-   /*
-* Convert special exit code when execvp failed.
-*/
-   if (code == 127) {
-   code = -1;
-   failed_errno = ENOENT;
-   }
} else {
error("waitpid is confused (%s)", argv0);
}
@@ -449,6 +510,7 @@ int start_command(struct child_process *cmd)
int null_fd = -1;
char **childenv;
struct argv_array argv = ARGV_ARRAY_INIT;
+   struct child_err cerr;
 
if (pipe(notify_pipe))
notify_pipe[0] = notify_pipe[1] = -1;
@@ -467,20 +529,16 @@ int start_command(struct child_process *cmd)
failed_errno = errno;
if (!cmd->pid) {
/*
-* Redirect the channel to write syscall error messages to
-* before redirecting the process's stderr so that all die()
-* in subsequent call paths use the parent's stderr.
+* Ensure the default die/error/warn routines do not get
+* called, they can take stdio locks and malloc.
 */
-   if (cmd->no_stderr || need_err) {
-   int child_err = dup(2);
-   set_cloexec(child_err);
-   set_error_handle(fdopen(child_err, "w"));
-   }
+   

[PATCH v2 0/6] forking and threading

2017-04-13 Thread Brandon Williams
v2 does a bit of restructuring based on comments from reviewers.  I took the
patch by Eric and broke it up and tweaked it a bit to flow better with v2.  I
left out the part of Eric's patch which did signal manipulation as I wasn't
experienced enough to know what it was doing or why it was necessary.  Though I
believe the code is structured in such a way that Eric could make another patch
on top of this series with just the signal changes.

I switched to using 'execve' instead of 'execvpe' because 'execvpe' isn't a
portable call and doesn't work on systems like macOS.  This means that the path
resolution needs to be done by hand before forking (there already existed
a function to do just that).

From what I can see, there are now no calls in the child process (after fork
and before exec/_exit) which are not Async-Signal-Safe.  This means that
fork/exec in a threaded context should work without deadlock and we could
potentially move to using vfork instead of fork, though I'll let others more
experienced make that decision.

Brandon Williams (6):
  t5550: use write_script to generate post-update hook
  run-command: prepare command before forking
  run-command: prepare child environment before forking
  run-command: don't die in child when duping /dev/null
  run-command: eliminate calls to error handling functions in child
  run-command: add note about forking and threading

 run-command.c  | 291 ++---
 t/t5550-http-fetch-dumb.sh |   5 +-
 2 files changed, 223 insertions(+), 73 deletions(-)

-- 
2.12.2.762.g0e3151a226-goog



Re: [PATCH v2 1/2] ls-files: fix recurse-submodules with nested submodules

2017-04-13 Thread Jacob Keller
On Thu, Apr 13, 2017 at 11:03 AM, Brandon Williams  wrote:
> On 04/13, Jacob Keller wrote:
>> From: Jacob Keller 
>>
>> Since commit e77aa336f116 ("ls-files: optionally recurse into
>> submodules", 2016-10-07) ls-files has known how to recurse into
>> submodules when displaying files.
>>
>> Unfortunately this fails for certain cases, including when nesting more
>> than one submodule, called from within a submodule that itself has
>> submodules, or when the GIT_DIR environment variable is set.
>>
>> Prior to commit b58a68c1c187 ("setup: allow for prefix to be passed to
>> git commands", 2017-03-17) this resulted in an error indicating that
>> --prefix and --super-prefix were incompatible.
>>
>> After this commit, instead, the process loops forever with a GIT_DIR set
>> to the parent and continuously reads the parent submodule files and
>> recursing forever.
>>
>> Fix this by preparing the environment properly for submodules when
>> setting up the child process. This is similar to how other commands such
>> as grep behave.
>>
>> This was not caught by the original tests because the scenario is
>> avoided if the submodules are created separately and not stored as the
>> standard method of putting the submodule git directory under
>> .git/modules/. We can update the test to show the failure by the
>> addition of "git submodule absorbgitdirs" to the test case. However,
>> note that this new test would run forever without the necessary fix in
>> this patch.
>>
>> Signed-off-by: Jacob Keller 
>
> This looks good to me.  Thanks again for catching this.  Dealing with
> submodules definitely isn't easy (I seem to have made a lot of mistakes
> that have been cropping up recently)...it would be easier if we didn't
> have to spin out a process for each submodule but that's not the world
> we live in today :)
>

Spinning out a process is one of the big downsides of working with
submodules in our code. Unfortunately, spinning out a process is also
one of the biggest ways we isolate submodules, and if we wanted to do
this "in-process" we would need to add an abstraction layer that lets
us handle submodules in-process in some clean way.


Re: Simultaneous gc and repack

2017-04-13 Thread David Turner
On Thu, 2017-04-13 at 12:08 -0600, Martin Fick wrote:
> On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote:
> > On Thu, Apr 13, 2017 at 10:31 AM, David Turner 
> 
>  wrote:
> > > Git gc locks the repository (using a gc.pid file) so
> > > that other gcs don't run concurrently. But git repack
> > > doesn't respect this lock, so it's possible to have a
> > > repack running at the same time as a gc.  This makes
> > > the gc sad when its packs are deleted out from under it
> > > with: "fatal: ./objects/pack/pack-$sha.pack cannot be
> > > accessed".  Then it dies, leaving a large temp file
> > > hanging around.
> > > 
> > > Does the following seem reasonable?
> > > 
> > > 1. Make git repack, by default, check for a gc.pid file
> > > (using the same logic as git gc itself does).
> > > 2. Provide a --force option to git repack to ignore said
> > > check. 3. Make git gc provide that --force option when
> > > it calls repack under its own lock.
> > 
> > What about just making the code that calls repack today
> > just call gc instead? I guess it's more work if you don't
> > strictly need it but..?
> 
> There are many scenarios where this does not achieve the 
> same thing.  On the obvious side, gc does more than 
> repacking, but on the other side, repacking has many 
> switches that are not available via gc.
> 
> Would it make more sense to move the lock to repack instead 
> of to gc?

Other gc operations might step on each other too (e.g. packing refs). 
That would be less bad (and less common), but it still seems worth
avoiding.


Re: [PATCH v2 2/2] ls-files: fix path used when recursing into submodules

2017-04-13 Thread Stefan Beller
On Thu, Apr 13, 2017 at 10:12 AM, Jacob Keller  wrote:
> From: Jacob Keller 
>
> Don't assume that the current working directory is the root of the
> repository.

1)  Oh! This bug might be hidden in other commands, too.
($ git grep cp.dir -- submodule.c)

2) But why?
Isn't that what most of setup.c is all about? (discovery of the root of the
repository, staying there, and invoking the correct subcommand with a prefix)

> Correctly generate the path for the recursing child
> processes by building it from the work_tree() root instead. Otherwise if
> we run ls-files using --git-dir or --work-tree it will not work
> correctly as it attempts to change directory into a potentially invalid
> location.

Oh, I see. In that case the setup doesn't cd into the worktree.

> Best case, it doesn't exist and we produce an error. Worst
> case we cd into the wrong location and unknown behavior occurs.
>
> Add a new test which highlights this possibility.
>
> Signed-off-by: Jacob Keller 
> ---
> I'm not sure that I'm convinced by this method of solving the problem as
> I suspect it has some corner cases (what about when run inside a
> subdirectory? It seems to work for me but I'm not sure...) Additionally,
> it felt weird that there's no helper function for creating a toplevel
> relative path.

Do we want to run ls-files from the working tree or from the git dir?
For the git dir there would be git_pathdup_submodule.

We could introduce
  const char *get_submodule_work_tree(const char *submodule_path);
as a wrapper around
  mkpathdup("%s/%s", get_git_work_tree(), ce->name);
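
A minimal sketch of that wrapper (hypothetical; since mkpathdup()
allocates, it would probably have to return a char * that the caller
frees rather than a const char *):

static char *get_submodule_work_tree(const char *submodule_path)
{
	/* caller owns and frees the result */
	return mkpathdup("%s/%s", get_git_work_tree(), submodule_path);
}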

Code and test look fine in this patch,

Thanks,
Stefan


Re: [PATCHv2.1] t6500: wait for detached auto gc at the end of the test script

2017-04-13 Thread Jeff King
On Thu, Apr 13, 2017 at 08:08:12PM +0200, SZEDER Gábor wrote:

> On Thu, Apr 13, 2017 at 6:44 PM, Jeff King  wrote:
> > On Thu, Apr 13, 2017 at 12:31:38PM +0200, SZEDER Gábor wrote:
> 
> > I did wonder what will happen if Windows learns to daemonize() the
> > auto-gc. I don't think we'll get an immediate test failure, but this
> > test will become racy again. But this time we'll actually notice the
> > racy failure, because the "ls" will report extra packs if it runs before
> > the background gc does. At which point we can revisit this.
> 
> Dscho said that it would take significant effort to make daemonize()
> work on Windows, so I guess it will take a while before we'll have to
> revisit this.

Yeah, that's what I figured. I mostly just didn't want to leave a
time-bomb for future developers.

> > I guess we could probably grep for the "in the background" message from
> > the parent gc. OTOH, maybe it is not even worth it.
> 
> That wouldn't work at the moment, because auto gc says that it will go
> to the background even on Windows.

Ah, OK. Let's not worry about it, then. I think the way your test is
constructed we should get a racy failure not long after the change, and
your comments would lead people to realize what is going on.

-Peff


Re: [PATCH v2 2/2] ls-files: fix path used when recursing into submodules

2017-04-13 Thread Brandon Williams
On 04/13, Jacob Keller wrote:
> From: Jacob Keller 
> 
> Don't assume that the current working directory is the root of the
> repository. Correctly generate the path for the recursing child
> processes by building it from the work_tree() root instead. Otherwise if
> we run ls-files using --git-dir or --work-tree it will not work
> correctly as it attempts to change directory into a potentially invalid
> location. Best case, it doesn't exist and we produce an error. Worst
> case we cd into the wrong location and unknown behavior occurs.
> 
> Add a new test which highlights this possibility.
> 
> Signed-off-by: Jacob Keller 
> ---
> I'm not sure that I'm convinced by this method of solving the problem as
> I suspect it has some corner cases (what about when run inside a
> subdirectory? It seems to work for me but I'm not sure...) Additionally,
> it felt weird that there's no helper function for creating a toplevel
> relative path.

I never considered the case where you use --git-dir or --work-tree,
definitely an oversight on my part.  This change seems reasonable to me.

-- 
Brandon Williams


Re: [PATCHv2.1] t6500: wait for detached auto gc at the end of the test script

2017-04-13 Thread SZEDER Gábor
On Thu, Apr 13, 2017 at 6:44 PM, Jeff King  wrote:
> On Thu, Apr 13, 2017 at 12:31:38PM +0200, SZEDER Gábor wrote:

> I did wonder what will happen if Windows learns to daemonize() the
> auto-gc. I don't think we'll get an immediate test failure, but this
> test will become racy again. But this time we'll actually notice the
> racy failure, because the "ls" will report extra packs if it runs before
> the background gc does. At which point we can revisit this.

Dscho said that it would take significant effort to make daemonize()
work on Windows, so I guess it will take a while before we'll have to
revisit this.

> It would be nice if there were a non-racy way to detect whether we
> daemonized or not, and complain on Windows when we do. Then we'll be
> notified immediately when daemonize() changes by the test failure,
> rather than waiting for a racy failure.
>
> I guess we could probably grep for the "in the background" message from
> the parent gc. OTOH, maybe it is not even worth it.

That wouldn't work at the moment, because auto gc says that it will go
to the background even on Windows.

As it is, auto gc first prints the "Auto packing the repo..." message,
and calls daemonize() after that.  Which message it prints, i.e. "in
background" or not, depends solely on the value of the 'gc.autoDetach'
config variable, which is true by default.  The only platform-specific
thing about all this is the #ifdef in daemonize(), but then it's
already too late, the misleading message has already been printed.

See the discussion the patch for the same issue in a different test
script (bb05510e5 (t5510: run auto-gc in the foreground, 2016-05-01)),
including a patch at the end that prevents auto gc on Windows from
lying about going to the background (which I, not being a Windows
user, didn't follow through).

  
http://public-inbox.org/git/20160505171430.horde.-guvdpzbfs8vi1zcfn4b...@webmail.informatik.kit.edu/T/#u

> The racy version
> should fail reasonably promptly, I think, and the comments you've left
> would point any investigator in the right direction.


Re: Simultaneous gc and repack

2017-04-13 Thread Martin Fick
On Thursday, April 13, 2017 11:03:14 AM Jacob Keller wrote:
> On Thu, Apr 13, 2017 at 10:31 AM, David Turner 
 wrote:
> > Git gc locks the repository (using a gc.pid file) so
> > that other gcs don't run concurrently. But git repack
> > doesn't respect this lock, so it's possible to have a
> > repack running at the same time as a gc.  This makes
> > the gc sad when its packs are deleted out from under it
> > with: "fatal: ./objects/pack/pack-$sha.pack cannot be
> > accessed".  Then it dies, leaving a large temp file
> > hanging around.
> > 
> > Does the following seem reasonable?
> > 
> > 1. Make git repack, by default, check for a gc.pid file
> > (using the same logic as git gc itself does).
> > 2. Provide a --force option to git repack to ignore said
> > check. 3. Make git gc provide that --force option when
> > it calls repack under its own lock.
> 
> What about just making the code that calls repack today
> just call gc instead? I guess it's more work if you don't
> strictly need it but..?

There are many scenarios where this does not achieve the 
same thing.  On the obvious side, gc does more than 
repacking, but on the other side, repacking has many 
switches that are not available via gc.

Would it make more sense to move the lock to repack instead 
of to gc?

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation



Re: [PATCH v2 1/2] ls-files: fix recurse-submodules with nested submodules

2017-04-13 Thread Brandon Williams
On 04/13, Jacob Keller wrote:
> From: Jacob Keller 
> 
> Since commit e77aa336f116 ("ls-files: optionally recurse into
> submodules", 2016-10-07) ls-files has known how to recurse into
> submodules when displaying files.
> 
> Unfortunately this fails for certain cases, including when nesting more
> than one submodule, called from within a submodule that itself has
> submodules, or when the GIT_DIR environment variable is set.
> 
> Prior to commit b58a68c1c187 ("setup: allow for prefix to be passed to
> git commands", 2017-03-17) this resulted in an error indicating that
> --prefix and --super-prefix were incompatible.
> 
> After this commit, instead, the process loops forever with a GIT_DIR set
> to the parent and continuously reads the parent submodule files and
> recurses forever.
> 
> Fix this by preparing the environment properly for submodules when
> setting up the child process. This is similar to how other commands such
> as grep behave.
> 
> This was not caught by the original tests because the scenario is
> avoided if the submodules are created separately and not stored as the
> standard method of putting the submodule git directory under
> .git/modules/. We can update the test to show the failure by the
> addition of "git submodule absorbgitdirs" to the test case. However,
> note that this new test would run forever without the necessary fix in
> this patch.
> 
> Signed-off-by: Jacob Keller 

This looks good to me.  Thanks again for catching this.  Dealing with
submodules definitely isn't easy (I seem to have made a lot of mistakes
that have been cropping up recently)...it would be easier if we didn't
have to spin out a process for each submodule but that's not the world
we live in today :)

> ---
> I updated and reworded the description so that the problem would be more
> obvious. It doesn't occur always, but only when run nested with properly
> absorbed gitdirs for the submodules. This explains the reason why the
> test case had not caught the issue before.
> 
>  builtin/ls-files.c | 4 
>  t/t3007-ls-files-recurse-submodules.sh | 1 +
>  2 files changed, 5 insertions(+)
> 
> diff --git a/builtin/ls-files.c b/builtin/ls-files.c
> index d449e46db551..e9b3546ca053 100644
> --- a/builtin/ls-files.c
> +++ b/builtin/ls-files.c
> @@ -15,6 +15,7 @@
>  #include "string-list.h"
>  #include "pathspec.h"
>  #include "run-command.h"
> +#include "submodule.h"
>  
>  static int abbrev;
>  static int show_deleted;
> @@ -203,6 +204,9 @@ static void show_gitlink(const struct cache_entry *ce)
>   struct child_process cp = CHILD_PROCESS_INIT;
>   int status;
>  
> + prepare_submodule_repo_env(&cp.env_array);
> + argv_array_push(&cp.env_array, GIT_DIR_ENVIRONMENT);
> +
>   if (prefix_len)
>   argv_array_pushf(&cp.env_array, "%s=%s",
>GIT_TOPLEVEL_PREFIX_ENVIRONMENT,
> diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh
> index 4cf6ccf5a8ea..c8030dd3299a 100755
> --- a/t/t3007-ls-files-recurse-submodules.sh
> +++ b/t/t3007-ls-files-recurse-submodules.sh
> @@ -77,6 +77,7 @@ test_expect_success 'ls-files recurses more than 1 level' '
>   git -C submodule/subsub commit -m "add d" &&
>   git -C submodule submodule add ./subsub &&
>   git -C submodule commit -m "added subsub" &&
> + git submodule absorbgitdirs &&
>   git ls-files --recurse-submodules >actual &&
>   test_cmp expect actual
>  '
> -- 
> 2.12.2.776.gded3dc243c29.dirty
> 

-- 
Brandon Williams


Re: Simultaneous gc and repack

2017-04-13 Thread Jacob Keller
On Thu, Apr 13, 2017 at 10:31 AM, David Turner  wrote:
> Git gc locks the repository (using a gc.pid file) so that other gcs
> don't run concurrently. But git repack doesn't respect this lock, so
> it's possible to have a repack running at the same time as a gc.  This
> makes the gc sad when its packs are deleted out from under it with:
> "fatal: ./objects/pack/pack-$sha.pack cannot be accessed".  Then it
> dies, leaving a large temp file hanging around.
>
> Does the following seem reasonable?
>
> 1. Make git repack, by default, check for a gc.pid file (using the same
> logic as git gc itself does).
> 2. Provide a --force option to git repack to ignore said check.
> 3. Make git gc provide that --force option when it calls repack under
> its own lock.
>

What about just making the code that calls repack today just call gc
instead? I guess it's more work if you don't strictly need it but..?

Thanks,
Jake

> This came up because Gitlab runs a repack after every N pushes and a gc
> after every M commits, where M >> N.  Sometimes, when pushes come in
> rapidly, the repack catches the still-running gc and the above badness
> happens.  At least, that's my understanding: I don't run our Gitlab
> servers, but I talked to the person who does and that's what he said.
>
> Of course, Gitlab could do its own locking, but the general approach
> seems like it would help other folks too.


Re: [PATCH] t6500: don't run detached auto gc at the end of the test script

2017-04-13 Thread Jeff King
On Thu, Apr 13, 2017 at 10:55:08AM -0700, Stefan Beller wrote:

> On Thu, Apr 13, 2017 at 9:37 AM, Jeff King  wrote:
> > Ah, OK, that makes more sense. I can detect it reliably by just checking
> >
> >   ! test -d $root/trash*
> >
> > in my stress-test after the test successfully completes.
> 
> Would we want to report such an error from the test suite itself?
> (My line of thinking is that this is a similar issue to broken && chains,
> which we do report).

Yeah, I had a similar thought. I can't think of any reason why it would
trigger a false positive, as long as we account for test-failure and the
--debug flag properly. I don't think we can just drop the "-f" from the
final "rm", because it may be overriding funny permissions left by
tests.

-Peff


Re: [PATCH] t6500: don't run detached auto gc at the end of the test script

2017-04-13 Thread Stefan Beller
On Thu, Apr 13, 2017 at 9:37 AM, Jeff King  wrote:
> Ah, OK, that makes more sense. I can detect it reliably by just checking
>
>   ! test -d $root/trash*
>
> in my stress-test after the test successfully completes.

Would we want to report such an error from the test suite itself?
(My line of thinking is that this is a similar issue to broken && chains,
which we do report).

Stefan


Simultaneous gc and repack

2017-04-13 Thread David Turner
Git gc locks the repository (using a gc.pid file) so that other gcs
don't run concurrently. But git repack doesn't respect this lock, so
it's possible to have a repack running at the same time as a gc.  This
makes the gc sad when its packs are deleted out from under it with:
"fatal: ./objects/pack/pack-$sha.pack cannot be accessed".  Then it
dies, leaving a large temp file hanging around.

Does the following seem reasonable?

1. Make git repack, by default, check for a gc.pid file (using the same
logic as git gc itself does).
2. Provide a --force option to git repack to ignore said check.
3. Make git gc provide that --force option when it calls repack under
its own lock.

This came up because Gitlab runs a repack after every N pushes and a gc
after every M commits, where M >> N.  Sometimes, when pushes come in
rapidly, the repack catches the still-running gc and the above badness
happens.  At least, that's my understanding: I don't run our Gitlab
servers, but I talked to the person who does and that's what he said.

Of course, Gitlab could do its own locking, but the general approach
seems like it would help other folks too.
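
For illustration, the check in repack might look roughly like this (a
hypothetical sketch only; a real version would presumably reuse gc's
existing lock_repo_for_gc() logic, which also validates the recorded
hostname and whether the pid is still alive or the lock is stale):

static void check_gc_lock(int force)
{
	struct stat st;
	char *path = git_pathdup("gc.pid");

	/* sketch: mere existence check; gc's real lock logic is smarter */
	if (!force && !stat(path, &st))
		die("a 'git gc' appears to be running (remove %s if not)",
		    path);
	free(path);
}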


Re: [PATCH] submodule.c: add missing ' in error messages

2017-04-13 Thread Stefan Beller
On Thu, Apr 13, 2017 at 9:40 AM, Ralf Thielow  wrote:
> Signed-off-by: Ralf Thielow 
> ---

Thanks!

Reviewed-by: Stefan Beller 


[PATCH v2 2/2] ls-files: fix path used when recursing into submodules

2017-04-13 Thread Jacob Keller
From: Jacob Keller 

Don't assume that the current working directory is the root of the
repository. Correctly generate the path for the recursing child
processes by building it from the work_tree() root instead. Otherwise if
we run ls-files using --git-dir or --work-tree it will not work
correctly as it attempts to change directory into a potentially invalid
location. Best case, it doesn't exist and we produce an error. Worst
case we cd into the wrong location and unknown behavior occurs.

Add a new test which highlights this possibility.

Signed-off-by: Jacob Keller 
---
I'm not sure that I'm convinced by this method of solving the problem as
I suspect it has some corner cases (what about when run inside a
subdirectory? It seems to work for me but I'm not sure...) Additionally,
it felt weird that there's no helper function for creating a toplevel
relative path.

 builtin/ls-files.c |  5 -
 t/t3007-ls-files-recurse-submodules.sh | 11 +++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index e9b3546ca053..a6c70dbe9ec8 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -203,6 +203,7 @@ static void show_gitlink(const struct cache_entry *ce)
 {
struct child_process cp = CHILD_PROCESS_INIT;
int status;
+   char *dir;
 
prepare_submodule_repo_env(&cp.env_array);
argv_array_push(&cp.env_array, GIT_DIR_ENVIRONMENT);
@@ -221,8 +222,10 @@ static void show_gitlink(const struct cache_entry *ce)
argv_array_pushv(&cp.args, submodule_options.argv);
 
cp.git_cmd = 1;
-   cp.dir = ce->name;
+   dir = mkpathdup("%s/%s", get_git_work_tree(), ce->name);
+   cp.dir = dir;
status = run_command(&cp);
+   free(dir);
if (status)
exit(status);
 }
diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh
index c8030dd3299a..ebb956fd16cc 100755
--- a/t/t3007-ls-files-recurse-submodules.sh
+++ b/t/t3007-ls-files-recurse-submodules.sh
@@ -82,6 +82,17 @@ test_expect_success 'ls-files recurses more than 1 level' '
test_cmp expect actual
 '
 
+test_expect_success 'ls-files works with GIT_DIR' '
+   cat >expect <<-\EOF &&
+   .gitmodules
+   c
+   subsub/d
+   EOF
+
+   git --git-dir=submodule/.git ls-files --recurse-submodules >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success '--recurse-submodules and pathspecs setup' '
echo e >submodule/subsub/e.txt &&
git -C submodule/subsub add e.txt &&
-- 
2.12.2.776.gded3dc243c29.dirty



[PATCH v2 1/2] ls-files: fix recurse-submodules with nested submodules

2017-04-13 Thread Jacob Keller
From: Jacob Keller 

Since commit e77aa336f116 ("ls-files: optionally recurse into
submodules", 2016-10-07) ls-files has known how to recurse into
submodules when displaying files.

Unfortunately this fails for certain cases, including when nesting more
than one submodule, called from within a submodule that itself has
submodules, or when the GIT_DIR environment variable is set.

Prior to commit b58a68c1c187 ("setup: allow for prefix to be passed to
git commands", 2017-03-17) this resulted in an error indicating that
--prefix and --super-prefix were incompatible.

After this commit, instead, the process loops forever with a GIT_DIR set
to the parent and continuously reads the parent submodule files and
recurses forever.

Fix this by preparing the environment properly for submodules when
setting up the child process. This is similar to how other commands such
as grep behave.

This was not caught by the original tests because the scenario is
avoided if the submodules are created separately and not stored as the
standard method of putting the submodule git directory under
.git/modules/. We can update the test to show the failure by the
addition of "git submodule absorbgitdirs" to the test case. However,
note that this new test would run forever without the necessary fix in
this patch.

Signed-off-by: Jacob Keller 
---
I updated and reworded the description so that the problem would be more
obvious. It doesn't occur always, but only when run nested with properly
absorbed gitdirs for the submodules. This explains the reason why the
test case had not caught the issue before.

 builtin/ls-files.c | 4 
 t/t3007-ls-files-recurse-submodules.sh | 1 +
 2 files changed, 5 insertions(+)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index d449e46db551..e9b3546ca053 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -15,6 +15,7 @@
 #include "string-list.h"
 #include "pathspec.h"
 #include "run-command.h"
+#include "submodule.h"
 
 static int abbrev;
 static int show_deleted;
@@ -203,6 +204,9 @@ static void show_gitlink(const struct cache_entry *ce)
struct child_process cp = CHILD_PROCESS_INIT;
int status;
 
+   prepare_submodule_repo_env(&cp.env_array);
+   argv_array_push(&cp.env_array, GIT_DIR_ENVIRONMENT);
+
if (prefix_len)
argv_array_pushf(&cp.env_array, "%s=%s",
 GIT_TOPLEVEL_PREFIX_ENVIRONMENT,
diff --git a/t/t3007-ls-files-recurse-submodules.sh b/t/t3007-ls-files-recurse-submodules.sh
index 4cf6ccf5a8ea..c8030dd3299a 100755
--- a/t/t3007-ls-files-recurse-submodules.sh
+++ b/t/t3007-ls-files-recurse-submodules.sh
@@ -77,6 +77,7 @@ test_expect_success 'ls-files recurses more than 1 level' '
git -C submodule/subsub commit -m "add d" &&
git -C submodule submodule add ./subsub &&
git -C submodule commit -m "added subsub" &&
+   git submodule absorbgitdirs &&
git ls-files --recurse-submodules >actual &&
test_cmp expect actual
 '
-- 
2.12.2.776.gded3dc243c29.dirty



Errors running gitk on OS X

2017-04-13 Thread Evan Aad
Running gitk from the command line, while at the top level of a Git
working directory, produces the following error message, and gitk
fails to open:

> objc[1031]: Objective-C garbage collection is no longer supported.

> /usr/local/bin/wish: line 2:  1031 Abort trap: 6   "$(dirname 
> $0)/../../../Library/Frameworks/Tk.framework/Versions/8.5/Resources/Wish.app/Contents/MacOS/Wish"
>  "$@"

Additionally, the following error message pops up in a window:

> Wish quit unexpectedly.

> Click Reopen to open the application again. Click Report to see more detailed 
> information and send a report to Apple.

> Ignore | Report... | Reopen

How can this be fixed?

***

OS: macOS Sierra, Version 10.12.4
Git version: 2.11.0 (Apple Git-81)



Re: [PATCHv2.1] t6500: wait for detached auto gc at the end of the test script

2017-04-13 Thread Jeff King
On Thu, Apr 13, 2017 at 12:31:38PM +0200, SZEDER Gábor wrote:

> Note, that this fd trickery doesn't work on Windows, because due to
> MSYS limitations the git process only inherits the standard fds 0, 1
> and 2 from the shell.  Luckily, it doesn't matter in this case,
> because on Windows daemonize() is basically a noop, thus 'git gc
> --auto' always runs in the foreground.
> 
> And since we can now continue the test reliably after the detached gc
> finished, check that there is only a single packfile left at the end,
> i.e. that the detached gc actually did what it was supposed to do.
> Also add a comment at the end of the test script to warn developers of
> future tests about this issue of long running detached gc processes.

The whole thing looks nicely explained, and I'm happy that we're able to
reliably add this extra check at the end of the test.

I did wonder what will happen if Windows learns to daemonize() the
auto-gc. I don't think we'll get an immediate test failure, but this
test will become racy again. But this time we'll actually notice the
racy failure, because the "ls" will report extra packs if it runs before
the background gc does. At which point we can revisit this.

It would be nice if there were a non-racy way to detect whether we
daemonized or not, and complain on Windows when we do. Then we'll be
notified immediately when daemonize() changes by the test failure,
rather than waiting for a racy failure.

I guess we could probably grep for the "in the background" message from
the parent gc. OTOH, maybe it is not even worth it. The racy version
should fail reasonably promptly, I think, and the comments you've left
would point any investigator in the right direction.

-Peff


[PATCH] git-add--interactive.perl: add missing dot in a message

2017-04-13 Thread Ralf Thielow
One message appears twice in the translations and the only
difference is a dot at the end.  So add this dot to make
the messages identical.

Signed-off-by: Ralf Thielow 
---
 git-add--interactive.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index 77b4ed53a..709a5f6ce 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -1040,7 +1040,7 @@ marked for unstaging."),
 marked for applying."),
checkout_index => N__(
 "If the patch applies cleanly, the edited hunk will immediately be
-marked for discarding"),
+marked for discarding."),
checkout_head => N__(
 "If the patch applies cleanly, the edited hunk will immediately be
 marked for discarding."),
-- 
2.12.2.762.g0e3151a22



[PATCH] submodule.c: add missing ' in error messages

2017-04-13 Thread Ralf Thielow
Signed-off-by: Ralf Thielow 
---
 submodule.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/submodule.c b/submodule.c
index c52d6634c..02033c97e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1251,7 +1251,7 @@ int bad_to_remove_submodule(const char *path, unsigned flags)
cp.dir = path;
if (start_command(&cp)) {
if (flags & SUBMODULE_REMOVAL_DIE_ON_ERROR)
-   die(_("could not start 'git status in submodule '%s'"),
+   die(_("could not start 'git status' in submodule '%s'"),
path);
ret = -1;
goto out;
@@ -1264,7 +1264,7 @@ int bad_to_remove_submodule(const char *path, unsigned flags)
 
if (finish_command(&cp)) {
if (flags & SUBMODULE_REMOVAL_DIE_ON_ERROR)
-   die(_("could not run 'git status in submodule '%s'"),
+   die(_("could not run 'git status' in submodule '%s'"),
path);
ret = -1;
}
-- 
2.12.2.762.g0e3151a22



Re: [PATCH] t6500: don't run detached auto gc at the end of the test script

2017-04-13 Thread Jeff King
On Thu, Apr 13, 2017 at 12:03:20AM +0200, SZEDER Gábor wrote:

> > I couldn't get the original to show a failure, though, even under heavy
> > load. So maybe widening the race is enough.
> 
> Just to be clear: it's only an occasionally appearing error message.
> There is no failure in the sense of test failure, because 'rm -rf
> $trash' erroring out during housekeeping does not fail the test suite.

Ah, OK, that makes more sense. I can detect it reliably by just checking

  ! test -d $root/trash*

in my stress-test after the test successfully completes.

-Peff


RE: [PATCHv2.1] t6500: wait for detached auto gc at the end of the test script

2017-04-13 Thread David Turner
Thanks for fixing this!

> -Original Message-
> From: SZEDER Gábor [mailto:szeder@gmail.com]
> Sent: Thursday, April 13, 2017 6:32 AM
> To: Junio C Hamano 
> Cc: Jeff King ; Johannes Sixt ; David Turner
> ; git@vger.kernel.org; SZEDER Gábor
> 
> Subject: [PATCHv2.1] t6500: wait for detached auto gc at the end of the test
> script
> 
> The last test in 't6500-gc', 'background auto gc does not run if gc.log is 
> present
> and recent but does if it is old', added in
> a831c06a2 (gc: ignore old gc.log files, 2017-02-10), may sporadically trigger 
> an
> error message from the test harness:
> 
>   rm: cannot remove 'trash directory.t6500-gc/.git/objects': Directory not 
> empty
> 
> The test in question ends with executing an auto gc in the background, which
> occasionally takes so long that it's still running when 'test_done' is about 
> to
> remove the trash directory.  This 'rm -rf $trash' in the foreground might race
> with the detached auto gc to create and delete files and directories, and gc
> might (re-)create a path that 'rm' already visited and removed, triggering the
> above error message when 'rm' attempts to remove its parent directory.
> 
> Commit bb05510e5 (t5510: run auto-gc in the foreground, 2016-05-01) fixed the
> same problem in a different test script by simply disallowing background gc.
> Unfortunately, what worked there is not applicable here, because the purpose
> of this test is to check the behavior of a detached auto gc.
> 
> Make sure that the test doesn't continue before the gc is finished in the
> background with a clever bit of shell trickery:
> 
>   - Open fd 9 in the shell, to be inherited by the background gc
> process, because our daemonize() only closes the standard fds 0,
> 1 and 2.
>   - Duplicate this fd 9 to stdout.
>   - Read 'git gc's stdout, and thus fd 9, through a command
> substitution.  We don't actually care about gc's output, but this
> construct has two useful properties:
>   - This read blocks until stdout or fd 9 are open.  While stdout is
> closed after the main gc process creates the background process
> and exits, fd 9 remains open until the background process exits.
>   - The variable assignment from the command substitution gets its
> exit status from the command executed within the command
> substitution, i.e. a failing main gc process will cause the test
> to fail.
> 
> Note, that this fd trickery doesn't work on Windows, because due to MSYS
> limitations the git process only inherits the standard fds 0, 1 and 2 from 
> the shell.
> Luckily, it doesn't matter in this case, because on Windows daemonize() is
> basically a noop, thus 'git gc --auto' always runs in the foreground.
> 
> And since we can now continue the test reliably after the detached gc 
> finished,
> check that there is only a single packfile left at the end, i.e. that the 
> detached gc
> actually did what it was supposed to do.
> Also add a comment at the end of the test script to warn developers of future
> tests about this issue of long running detached gc processes.
> 
> Helped-by: Jeff King 
> Helped-by: Johannes Sixt 
> Signed-off-by: SZEDER Gábor 
> ---
> 
> Updated subject line, but otherwise the same as v2.
> 
>  t/t6500-gc.sh | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
> index 08de2e8ab..cc7acd101 100755
> --- a/t/t6500-gc.sh
> +++ b/t/t6500-gc.sh
> @@ -67,6 +67,16 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
>   test_line_count = 2 new # There is one new pack and its .idx
>  '
> 
> +run_and_wait_for_auto_gc () {
> + # We read stdout from gc for the side effect of waiting until the
> + # background gc process exits, closing its fd 9.  Furthermore, the
> + # variable assignment from a command substitution preserves the
> + # exit status of the main gc process.
> + # Note: this fd trickery doesn't work on Windows, but there is no
> + # need to, because on Win the auto gc always runs in the foreground.
> + doesnt_matter=$(git gc --auto 9>&1)
> +}
> +
>  test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
>   test_commit foo &&
>   test_commit bar &&
> @@ -80,7 +90,13 @@ test_expect_success 'background auto gc does not run if gc.log is present and re
>   test-chmtime =-345600 .git/gc.log &&
>   test_must_fail git gc --auto &&
>   test_config gc.logexpiry 2.days &&
> - git gc --auto
> + run_and_wait_for_auto_gc &&
> + ls .git/objects/pack/pack-*.pack >packs &&
> + test_line_count = 1 packs
>  '
> 
> +# DO NOT leave a detached auto gc process running near the end of the
> +# test script: it can run long enough in the background to racily
> +# interfere with the cleanup in 'test_done'.

[PATCH v3] unpack-trees: avoid duplicate ODB lookups during checkout

2017-04-13 Thread git
From: Jeff Hostetler 

Version 3 uses a structure copy rather than memcpy and adds a comment.

Jeff Hostetler (1):
  unpack-trees: avoid duplicate ODB lookups during checkout

 unpack-trees.c | 37 +
 1 file changed, 33 insertions(+), 4 deletions(-)

-- 
2.9.3



[PATCH v3] unpack-trees: avoid duplicate ODB lookups during checkout

2017-04-13 Thread git
From: Jeff Hostetler 

Teach traverse_trees_recursive() to not do redundant ODB
lookups when both directories refer to the same OID.

In operations such as read-tree and checkout, there will
likely be many peer directories that have the same OID when
the differences between the commits are relatively small.
In these cases we can avoid hitting the ODB multiple times
for the same OID.

This patch handles n=2 and n=3 cases and simply copies the
data rather than repeating the fill_tree_descriptor().


On the Windows repo (500K trees, 3.1M files, 450MB index),
this reduced the overall time by 0.75 seconds when cycling
between 2 commits with a single file difference.

(avg) before: 22.699
(avg) after:  21.955
===


On Linux using p0006-read-tree-checkout.sh with linux.git:

Test                                                           HEAD^              HEAD
---------------------------------------------------------------------------------------------------
0006.2: read-tree br_base br_ballast (57994)                   0.24(0.20+0.03)    0.24(0.22+0.01) +0.0%
0006.3: switch between br_base br_ballast (57994)              10.58(6.23+2.86)   10.67(5.94+2.87) +0.9%
0006.4: switch between br_ballast br_ballast_plus_1 (57994)    0.60(0.44+0.17)    0.57(0.44+0.14) -5.0%
0006.5: switch between aliases (57994)                         0.59(0.48+0.13)    0.57(0.44+0.15) -3.4%


Signed-off-by: Jeff Hostetler 
---
 unpack-trees.c | 37 +
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3a8ee19..a674423 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -531,6 +531,11 @@ static int switch_cache_bottom(struct traverse_info *info)
return ret;
 }
 
+static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k)
+{
+   return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
+}
+
 static int traverse_trees_recursive(int n, unsigned long dirmask,
unsigned long df_conflicts,
struct name_entry *names,
@@ -553,11 +558,35 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
newinfo.pathlen += tree_entry_len(p) + 1;
newinfo.df_conflicts |= df_conflicts;
 
+   /*
+* Fetch the tree from the ODB for each peer directory in the
+* n commits.
+*
+* For 2- and 3-way traversals, we try to avoid hitting the
+* ODB twice for the same OID.  This should yield a nice speed
+* up in checkouts and merges when the commits are similar.
+*
+* We don't bother doing the full O(n^2) search for larger n,
+* because wider traversals don't happen that often and we
+* avoid the search setup.
+*
+* When 2 peer OIDs are the same, we just copy the tree
+* descriptor data.  This implicitly borrows the buffer
+* data from the earlier cell.
+*/
for (i = 0; i < n; i++, dirmask >>= 1) {
-   const unsigned char *sha1 = NULL;
-   if (dirmask & 1)
-   sha1 = names[i].oid->hash;
-   buf[i] = fill_tree_descriptor(t+i, sha1);
+   if (i > 0 && are_same_oid(&names[i], &names[i - 1])) {
+   t[i] = t[i - 1];
+   buf[i] = NULL;
+   } else if (i > 1 && are_same_oid(&names[i], &names[i - 2])) {
+   t[i] = t[i - 2];
+   buf[i] = NULL;
+   } else {
+   const unsigned char *sha1 = NULL;
+   if (dirmask & 1)
+   sha1 = names[i].oid->hash;
+   buf[i] = fill_tree_descriptor(t+i, sha1);
+   }
}
 
bottom = switch_cache_bottom(&newinfo);
-- 
2.9.3



Re: [PATCH 5/7] Add support for gnupg < 1.4

2017-04-13 Thread Ævar Arnfjörð Bjarmason
On Wed, Apr 5, 2017 at 3:45 PM, Ævar Arnfjörð Bjarmason
 wrote:
> On Wed, Apr 5, 2017 at 3:04 PM, Tom G. Christensen  
> wrote:
>> This adds an OLD_GNUPG define to the Makefile which when activated will
>> ensure git does not use the --keyid-format argument when calling the
>> 'gpg' program.
>> This is consistent with how 'gpg' was used in git < 2.10.0 and slightly
>> decreases security.
>
> This changes the code Linus Torvalds added in b624a3e67f to mitigate
> the evil32 project generating keys which looked the same for 32 bit
> signatures.
>
> I think this change makes sense, but the Makefile should have a
> slightly scarier warning, something like:
>
> "Define OLD_GNUPG if you need support for gnupg <1.4. Note that this
> will cause git to only show the first 32 bits of PGP keys instead of
> 64, and there's a wide variety of brute-forced 32 bit keys in the wild
> thanks to the evil32 project (https://evil32.com). Enabling this will
> make GPG work old versions, but you might be fooled into accepting

grammar fix: "work on older versions"

> malicious keys as a result".
>
>> Signed-off-by: Tom G. Christensen 
>> ---
>>  Makefile| 6 ++
>>  gpg-interface.c | 2 ++
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/Makefile b/Makefile
>> index ca9f16d19..f8f585d21 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -386,6 +386,8 @@ all::
>>  #
>>  # to say "export LESS=FRX (and LV=-c) if the environment variable
>>  # LESS (and LV) is not set, respectively".
>> +#
>> +# Define OLD_GNUPG if you need support for gnupg < 1.4.
>>
>>  GIT-VERSION-FILE: FORCE
>> @$(SHELL_PATH) ./GIT-VERSION-GEN
>> @@ -1529,6 +1531,10 @@ ifndef PAGER_ENV
>>  PAGER_ENV = LESS=FRX LV=-c
>>  endif
>>
>> +ifdef OLD_GNUPG
>> +   BASIC_CFLAGS += -DOLD_GNUPG
>> +endif
>> +
>>  QUIET_SUBDIR0  = +$(MAKE) -C # space to separate -C and subdir
>>  QUIET_SUBDIR1  =
>>
>> diff --git a/gpg-interface.c b/gpg-interface.c
>> index e44cc27da..57f1ea792 100644
>> --- a/gpg-interface.c
>> +++ b/gpg-interface.c
>> @@ -224,7 +224,9 @@ int verify_signed_buffer(const char *payload, size_t 
>> payload_size,
>> argv_array_pushl(,
>>  gpg_program,
>>  "--status-fd=1",
>> +#ifndef OLD_GNUPG
>>  "--keyid-format=long",
>> +#endif
>>  "--verify", temp.filename.buf, "-",
>>  NULL);
>>
>> --
>> 2.12.2
>>


Re: Rebase sequencer changes prevent exec commands from modifying the todo file?

2017-04-13 Thread Johannes Schindelin
Hi Stephen,

On Wed, 12 Apr 2017, Stephen Hicks wrote:

> On Wed, Apr 12, 2017 at 3:05 PM, Johannes Schindelin
>  wrote:
> >
> > On Tue, 11 Apr 2017, Stephen Hicks wrote:
> >
> > > I'm hesitant to only use mtime, size, and inode, since it's quite
> > > likely that these are all identical even if the file has changed.
> >
> > Not at all. The mtime and the size will most likely be different.
> 
> In my experience, mtime is almost always the same, since the file is
> written pretty much immediately before the exec - as long as the
> command takes a small fraction of a second (which it usually does) the
> mtime (which is in seconds, not millis or micros) will look the same.

Oh, I see, you assume that mtime is in seconds. However, on pretty much
all platforms supported by Git we use nanosecond-precision mtimes [*1*].
In practice, the mtimes are not always nanosecond-precision, as some file
systems have a coarser granularity. It is typically a pretty fine a
granularity, though, much finer than a second [*2*].

My mistake, I should have explained that I wanted to suggest adding a
field of type `struct cache_time` to the `todo_list` structure, and to use
ST_MTIME_NSEC() to populate and compare the `nsec` part of that.
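
Roughly, and untested (the field and helper name below are invented,
but struct cache_time and ST_MTIME_NSEC() already exist, and struct
todo_list would just grow the extra field):

struct todo_list {
	/* ... existing fields ... */
	struct cache_time mtime;	/* mtime of the todo file when read */
};

static int todo_file_changed(const struct todo_list *todo_list,
			     const struct stat *st)
{
	return todo_list->mtime.sec != (unsigned int)st->st_mtime ||
	       todo_list->mtime.nsec != ST_MTIME_NSEC(*st);
}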

Ciao,
Johannes

Footnote *1*: The platforms for which we disable nanosecond mtimes are:
OSF-1, AIX, HP-UX, Interix, Minix, NONSTOP_KERNEL (?), and QNX. (If you
look at git.git's config.mak.uname, you will see that Git for Windows'
nanosecond support has not yet made it upstream.) In other words, the
major platforms (Windows, MacOSX and Linux) all compare the nanosecond
part of the mtime.

Footnote *2*: the coarsest mtime of which I know is FAT32 (2-second
granularity). If you want to use Git on such a file system, well, that's
what you get. And then some performance penalties, too. It would appear
from a quick web search that ext4 has true nanosecond granularity. NTFS
has a 100ns granularity which is still plenty enough for our purposes.


gitk feature request: checkout any commit

2017-04-13 Thread stef

Hello,

it would be helpful if gitk would allow checkouts not only for local  
branches but also for remote branches and any other commit.


For remote branches, it is sufficient to remove the code that greys out  
the context menu. The context menu for regular commits would need a new  
entry.


Of course, this can easily be done via command line - but that's true for  
every gitk feature :)


Another feature that would really suit gitk is the "Tools" menu from  
git-gui. A simple 1:1 copy would be fine ;)




--
regards,
stef


Re: [PATCH 7/7] Do not use curl_easy_strerror with curl < 7.12.0

2017-04-13 Thread Jacob Keller
On Wed, Apr 12, 2017 at 11:28 PM, Junio C Hamano  wrote:
> Jeff King  writes:
>
>> On Wed, Apr 05, 2017 at 03:04:24PM +0200, Tom G. Christensen wrote:
>> ...
>> These kinds of interleaved conditionals make me nervous that we'll get
>> something wrong (especially without braces, it's not immediately clear
>> that both sides are a single statement).
>>
>> I wonder if it would be more readable to do something like:
>>
>>   #if LIBCURL_VERSION_NUM < 0x070c00
>>   static const char *curl_easy_strerror(CURL *curl)
>>   {
>>   return "[error code unavailable; curl version too old]";
>>   }
>>   #endif
>>
>> Then callers don't have to individually deal with the ifdef. It does
>> mean that the user sees that kind-of ugly message, but maybe that is a
>> good thing. They know they need to upgrade curl to see more details.
>
> Yup, thanks for a very good suggestion.

I also think this is a good solution..

Thanks,
Jake


Re: [PATCH] status: show in-progress info for short status

2017-04-13 Thread SZEDER Gábor
On Fri, Apr 7, 2017 at 4:05 PM, Michael J Gruber  wrote:
> SZEDER Gábor venit, vidit, dixit 06.04.2017 16:33:
>>> @@ -1779,6 +1780,31 @@ static void wt_shortstatus_print_tracking(struct wt_status *s)
>>>  }
>>>
>>>  color_fprintf(s->fp, header_color, "]");
>>> +
>>> + inprogress:
>>> +if (!s->show_inprogress)
>>> +goto conclude;
>>> +memset(&state, 0, sizeof(state));
>>> +wt_status_get_state(&state,
>>> +s->branch && !strcmp(s->branch, "HEAD"));
>>> +if (state.merge_in_progress)
>>> +color_fprintf(s->fp, header_color, "; %s", LABEL(N_("MERGING")));
>>> +else if (state.am_in_progress)
>>> +color_fprintf(s->fp, header_color, "; %s", LABEL(N_("AM")));
>>> +else if (state.rebase_in_progress)
>>> +color_fprintf(s->fp, header_color, "; %s", LABEL(N_("REBASE-m")));

In the prompt "REBASE-m" is only shown during 'rebase --merge', while
"REBASE" is shown during a plain 'rebase'.  Here "REBASE-m" will be
shown during both.

>>> +else if (state.rebase_interactive_in_progress)
>>> +color_fprintf(s->fp, header_color, "; %s", LABEL(N_("REBASE-i")));
>>> +else if (state.cherry_pick_in_progress)
>>> +color_fprintf(s->fp, header_color, "; %s", LABEL(N_("CHERRY-PICKING")));
>>> +else if (state.revert_in_progress)
>>> +color_fprintf(s->fp, header_color, "; %s", LABEL(N_("REVERTING")));
>>> +if (state.bisect_in_progress)
>>
>> else if?
>
> No. You can do all of the above during a bisect.

Right indeed.  I think the prompt should do the same.  Onto the TODO
list it goes, then.

>>> +color_fprintf(s->fp, header_color, "; %s", LABEL(N_("BISECTING")));
>>> +free(state.branch);
>>> +free(state.onto);
>>> +free(state.detached_from);
>>> +
>>>   conclude:
>>>  fputc(s->null_termination ? '\0' : '\n', s->fp);
>>>  }
>>
>> This reminded me of a patch that I have been using for almost two
>> years now...
>>
>> git-prompt.sh's similar long conditional chain to show the ongoing
>> operation has an else-branch at the end showing "AM/REBASE".  Your
>> patch doesn't add an equivalent branch.  Is this intentional or an
>> oversight?
>
> I go over all states that wt_status_get_state can give.

Yeah, but are those states exclusive?  Or is it possible that both
'am_in_progress' and 'rebase_in_progress' are set?  I suppose it's not
possible.
It would certainly be more obvious if it were a single enum field
instead of a bunch of ints.
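
Something like this, hypothetically (bisect would presumably stay a
separate flag, since bisecting can overlap with all of the others):

enum wt_status_operation {
	WT_STATUS_OP_NONE = 0,
	WT_STATUS_OP_MERGE,
	WT_STATUS_OP_AM,
	WT_STATUS_OP_REBASE,
	WT_STATUS_OP_REBASE_INTERACTIVE,
	WT_STATUS_OP_CHERRY_PICK,
	WT_STATUS_OP_REVERT
};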

>> I suppose it's intentional, because that "AM/REBASE" branch in the
>> prompt seems to be unreachable (see below), but I never took the
>> effort to actually check that (hence the "seems" and that's why I
>> never submitted it).
>
> Note that wt_status_get_state and the prompt script do things quite
> differently.
>
> I suppose that the prompt could make use of the in-progress info being
> exposed by "git status" rather doing its on thing.

The prompt currently gets all this in-progress info using only Bash
builtins, which is much faster than running a git process in a
subshell.  This is especially important on Windows, where the overhead
of running a git process is large enough to make the runtime of
__git_ps1() noticeable when displaying the prompt.  That's the main
reason for much of the shell complexity and ugliness in git-prompt.sh.

Besides, current output formats make 'git status' unsuitable for the
prompt.


[PATCHv2.1] t6500: wait for detached auto gc at the end of the test script

2017-04-13 Thread SZEDER Gábor
The last test in 't6500-gc', 'background auto gc does not run if
gc.log is present and recent but does if it is old', added in
a831c06a2 (gc: ignore old gc.log files, 2017-02-10), may sporadically
trigger an error message from the test harness:

  rm: cannot remove 'trash directory.t6500-gc/.git/objects': Directory not empty

The test in question ends with executing an auto gc in the background,
which occasionally takes so long that it's still running when
'test_done' is about to remove the trash directory.  This 'rm -rf
$trash' in the foreground might race with the detached auto gc to
create and delete files and directories, and gc might (re-)create a
path that 'rm' already visited and removed, triggering the above error
message when 'rm' attempts to remove its parent directory.

Commit bb05510e5 (t5510: run auto-gc in the foreground, 2016-05-01)
fixed the same problem in a different test script by simply
disallowing background gc.  Unfortunately, what worked there is not
applicable here, because the purpose of this test is to check the
behavior of a detached auto gc.

Make sure that the test doesn't continue before the gc is finished in
the background with a clever bit of shell trickery:

  - Open fd 9 in the shell, to be inherited by the background gc
process, because our daemonize() only closes the standard fds 0,
1 and 2.
  - Duplicate this fd 9 to stdout.
  - Read 'git gc's stdout, and thus fd 9, through a command
substitution.  We don't actually care about gc's output, but this
construct has two useful properties:
  - This read blocks until stdout or fd 9 are open.  While stdout is
closed after the main gc process creates the background process
and exits, fd 9 remains open until the background process exits.
  - The variable assignment from the command substitution gets its
exit status from the command executed within the command
substitution, i.e. a failing main gc process will cause the test
to fail.

Note, that this fd trickery doesn't work on Windows, because due to
MSYS limitations the git process only inherits the standard fds 0, 1
and 2 from the shell.  Luckily, it doesn't matter in this case,
because on Windows daemonize() is basically a noop, thus 'git gc
--auto' always runs in the foreground.

And since we can now continue the test reliably after the detached gc
finished, check that there is only a single packfile left at the end,
i.e. that the detached gc actually did what it was supposed to do.
Also add a comment at the end of the test script to warn developers of
future tests about this issue of long running detached gc processes.

Helped-by: Jeff King 
Helped-by: Johannes Sixt 
Signed-off-by: SZEDER Gábor 
---

Updated subject line, but otherwise the same as v2.

 t/t6500-gc.sh | 18 +-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index 08de2e8ab..cc7acd101 100755
--- a/t/t6500-gc.sh
+++ b/t/t6500-gc.sh
@@ -67,6 +67,16 @@ test_expect_success 'auto gc with too many loose objects does not attempt to cre
test_line_count = 2 new # There is one new pack and its .idx
 '
 
+run_and_wait_for_auto_gc () {
+   # We read stdout from gc for the side effect of waiting until the
+   # background gc process exits, closing its fd 9.  Furthermore, the
+   # variable assignment from a command substitution preserves the
+   # exit status of the main gc process.
+   # Note: this fd trickery doesn't work on Windows, but there is no
+   # need to, because on Win the auto gc always runs in the foreground.
+   doesnt_matter=$(git gc --auto 9>&1)
+}
+
 test_expect_success 'background auto gc does not run if gc.log is present and recent but does if it is old' '
test_commit foo &&
test_commit bar &&
@@ -80,7 +90,13 @@ test_expect_success 'background auto gc does not run if gc.log is present and re
test-chmtime =-345600 .git/gc.log &&
test_must_fail git gc --auto &&
test_config gc.logexpiry 2.days &&
-   git gc --auto
+   run_and_wait_for_auto_gc &&
+   ls .git/objects/pack/pack-*.pack >packs &&
+   test_line_count = 1 packs
 '
 
+# DO NOT leave a detached auto gc process running near the end of the
+# test script: it can run long enough in the background to racily
+# interfere with the cleanup in 'test_done'.
+
 test_done
-- 
2.12.2.613.g9c5b79913



Re: [PATCH] status: show in-progress info for short status

2017-04-13 Thread Jacob Keller
On Wed, Apr 12, 2017 at 11:09 PM, Junio C Hamano  wrote:
> Jacob Keller  writes:
>
>> Personally, I would want this to become the default and not have a new
>> option to trigger it. I think we could also extend the porcelain
>> format to include this information as well, but I'm not too familiar
>> with how the v2 format extends or not.
>
> I think the general rule of thumb for --porcelain is that we can
> freely introduce new record types without version bump, and expect
> the reading scripts to ignore unrecognised records (we may need to
> describe this a bit more strongly in our document, though), while
> changes to the format of existing records must require a command
> line option that cannot be turned on by default with configuration
> (or a version bump, if you want to change the output format by
> default).
>
> I am getting the impression that this "we are doing X" is a new and
> distinct record type that existing readers can safely ignore?  If
> that is the case, it may be better to add it without making it
> optional.

I think we are safe in extending porcelain v2.

Thanks,
Jake


Re: [PATCH] status: show in-progress info for short status

2017-04-13 Thread Jacob Keller
On Wed, Apr 12, 2017 at 11:09 PM, Junio C Hamano  wrote:
> Jacob Keller  writes:
>
>> Personally, I would want this to become the default and not have a new
>> option to trigger it. I think we could also extend the porcelain
>> format to include this information as well, but I'm not too familiar
>> with how the v2 format extends or not.
>
> I think the general rule of thumb for --porcelain is that we can
> freely introduce new record types without version bump, and expect
> the reading scripts to ignore unrecognised records (we may need to
> describe this a bit more strongly in our document, though), while
> changes to the format of existing records must require a command
> line option that cannot be turned on by default with configuration
> (or a version bump, if you want to change the output format by
> default).
>
> I am getting the impression that this "we are doing X" is a new and
> distinct record type that existing readers can safely ignore?  If
> that is the case, it may be better to add it without making it
> optional.

Correct. But either way we should be free to change and extend the
non-porcelain format without worry I thought?

Thanks,
Jake


Re: [PATCH 5/7] Add support for gnupg < 1.4

2017-04-13 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Wed, Apr 5, 2017 at 3:04 PM, Tom G. Christensen  
> wrote:
>> This adds an OLD_GNUPG define to the Makefile which when activated will
>> ensure git does not use the --keyid-format argument when calling the
>> 'gpg' program.
>> This is consistent with how 'gpg' was used in git < 2.10.0 and slightly
>> decreases security.
>
> This changes the code Linus Torvalds added in b624a3e67f to mitigate
> the evil32 project generating keys which looked the same for 32 bit
> signatures.
>
> I think this change makes sense, but the Makefile should have a
> slightly scarier warning, something like:
>
> "Define OLD_GNUPG if you need support for gnupg <1.4. Note that this
> will cause git to only show the first 32 bits of PGP keys instead of
> 64, and there's a wide variety of brute-forced 32 bit keys in the wild
> thanks to the evil32 project (https://evil32.com). Enabling this will
> make GPG work old versions, but you might be fooled into accepting
> malicious keys as a result".

Very good point.  It surprised me somewhat to see that this is the
only change necessary (iow, there is no need to tweak anything in t/
directory).  Perhaps we would need a test or two (and need to figure
out a way to make them work with both old and more recent GnuPG)?


Re: [PATCH 7/7] Do not use curl_easy_strerror with curl < 7.12.0

2017-04-13 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Apr 05, 2017 at 03:04:24PM +0200, Tom G. Christensen wrote:
> ...
> These kinds of interleaved conditionals make me nervous that we'll get
> something wrong (especially without braces, it's not immediately clear
> that both sides are a single statement).
>
> I wonder if it would be more readable to do something like:
>
>   #if LIBCURL_VERSION_NUM < 0x070c00
>   static const char *curl_easy_strerror(CURL *curl)
>   {
>   return "[error code unavailable; curl version too old]";
>   }
>   #endif
>
> Then callers don't have to individually deal with the ifdef. It does
> mean that the user sees that kind-of ugly message, but maybe that is a
> good thing. They know they need to upgrade curl to see more details.

Yup, thanks for a very good suggestion.


Re: [PATCH v2 2/2] name-hash: fix buffer overrun

2017-04-13 Thread Junio C Hamano
g...@jeffhostetler.com writes:

> From: Kevin Willford 
>
> Add check for the end of the entries for the thread partition.
> Add test for lazy init name hash with specific directory structure
>
> The lazy init name hash was causing a buffer overflow when the last
> entry in the index was multiple folders deep with parent folders that
> did not have any files in them.
>
> This adds a test for the boundary condition of the thread partitions
> with the folder structure that was triggering the buffer overflow.
> The test is skipped on single-cpu machines because the original code
> path is used in name-hash.c
>
> The fix was to check if it is the last entry for the thread partition
> in the handle_range_dir and not try to use the next entry in the cache.

As I merged the older one already to 'next', I'll queue 1/2 of v2 on
top of them and then the following (which is incremental between v1
and this v2 2/2) on top.

-- >8 --
From: Kevin Willford 
Date: Mon, 3 Apr 2017 15:16:42 +
Subject: [PATCH] t3008: skip lazy-init test on a single-core box

The lazy-init codepath will not be exercised unless threaded.  Skip
the entire test on a single-core box.  Also replace a hard-coded
constant of 2000 (number of cache entries to manufacture for tests)
with a variable with a human-readable name.

Signed-off-by: Kevin Willford 
Signed-off-by: Johannes Schindelin 
Signed-off-by: Jeff Hostetler 
Signed-off-by: Junio C Hamano 
---
 t/t3008-ls-files-lazy-init-name-hash.sh | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/t/t3008-ls-files-lazy-init-name-hash.sh b/t/t3008-ls-files-lazy-init-name-hash.sh
index 971975bff4..bdf5198b7e 100755
--- a/t/t3008-ls-files-lazy-init-name-hash.sh
+++ b/t/t3008-ls-files-lazy-init-name-hash.sh
@@ -4,14 +4,22 @@ test_description='Test the lazy init name hash with various 
folder structures'
 
 . ./test-lib.sh
 
+if test 1 -eq $($GIT_BUILD_DIR/t/helper/test-online-cpus)
+then
+   skip_all='skipping lazy-init tests, single cpu'
+   test_done
+fi
+
+LAZY_THREAD_COST=2000
+
 test_expect_success 'no buffer overflow in lazy_init_name_hash' '
(
-   test_seq 2000 | sed "s/^/a_/"
+   test_seq $LAZY_THREAD_COST | sed "s/^/a_/"
echo b/b/b
-   test_seq 2000 | sed "s/^/c_/"
+   test_seq $LAZY_THREAD_COST | sed "s/^/c_/"
test_seq 50 | sed "s/^/d_/" | tr "\n" "/"; echo d
) |
-   sed -e "s/^/100644 $EMPTY_BLOB  /" |
+   sed "s/^/100644 $EMPTY_BLOB /" |
git update-index --index-info &&
test-lazy-init-name-hash -m
 '
-- 
2.12.2-776-gded3dc243c



Re: [PATCH] status: show in-progress info for short status

2017-04-13 Thread Junio C Hamano
Jacob Keller  writes:

> Personally, I would want this to become the default and not have a new
> option to trigger it. I think we could also extend the porcelain
> format to include this information as well, but I'm not too familiar
> with how the v2 format extends or not.

I think the general rule of thumb for --porcelain is that we can
freely introduce new record types without version bump, and expect
the reading scripts to ignore unrecognised records (we may need to
describe this a bit more strongly in our document, though), while
changes to the format of existing records must require a command
line option that cannot be turned on by default with configuration
(or a version bump, if you want to change the output format by
default).

I am getting the impression that this "we are doing X" is a new and
distinct record type that existing readers can safely ignore?  If
that is the case, it may be better to add it without making it
optional.
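
(For illustration: under that rule, a hypothetical new v2 header line,
say

  # operation merge

-- the record name is invented here, not an existing one -- could simply
be skipped by readers that only parse the record types they know,
without bumping the format version.)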