from:"Thomas Gummerer"

Re: [PATCH/RFC v3 00/14] Introduce new commands switch-branch and restore-files

2018-12-02 Thread Thomas Gummerer

On 11/30, Junio C Hamano wrote:
> 
> I am unsure about the wisdom of calling it "--index", though.
> 
> The "--index" option is "the command can work only on the index, or
> only on the working tree files, or on both the index and the working
> tree files, and this option tells it to work in the 'both the index
> and the working tree files' mode", but when "restore-files" touches
> paths, it always modifies both the index and the working tree, so
> the "--index" option does not capture the differences well in this
> context [*1*].  As I saw this was described as "not using the usual
> 'overlay' semantics [*2*]", perhaps --overlay/--no-overlay option
> that defaults to --no-overlay is easier to explain.

Agreed, I think --{no-,}overlay is a much better name for the option,
I'll use that for my patch series (I hope to send that soon after 2.20
is released).

I must admit that I was not aware that the mode is called overlay
mode, before you explained it to me, so I wouldn't expect most users
to know either.  But as it's easy to explain, that probably doesn't
matter much.

> side note 1.  I think the original mention of "--index" came in
> the context of contrasting "git reset" with "git checkout".
> "git reset (--hard|--mixed) -- <pathspec>" (that does not move
> HEAD), which does not but perhaps should exist, is very much
> like "git checkout -- <pathspec>", and if "reset" were written
> after the "--index/--cached" convention was established, "reset
> --hard" would have been called "reset --index" while "reset --mixed"
> would have been "reset --cached" (i.e. only work on the index
> and not on the working tree).  And "reset --index <pathspec>"
> would have worked by removing paths in <pathspec> that are not
> in the HEAD and updating paths in <pathspec> that are in the
> HEAD, i.e. identical to the non overlay behaviour proposed for
> the "git checkout" command.  So calling the non overlay mode
> "--index" makes sense in the context of discussing "git reset",
> and not in the context of "git checkout".
> 
> side note 2.  "git checkout <tree-ish> <pathspec>" grabs entries
> from the <tree-ish> that match <pathspec> and adds them to the
> index and checks them out to the working tree.  If the original
> index has entries that match <pathspec> but do not appear in
> <tree-ish>, they are left in the result.  That is "overlaying
> what was taken from the <tree-ish> on top of what is in the
> index".
> 
> Having said all that, I will not be looking at the series until 2.20
> final.  And I hope more people do the same to concentrate on helping
> us prevent the last minute glitch slipping in the final release.
> 
> Thanks.

Re: [PATCH v11 03/22] strbuf.c: add `strbuf_insertf()` and `strbuf_vinsertf()`

2018-11-27 Thread Thomas Gummerer

On 11/27, Johannes Schindelin wrote:
> Hi,
> 
> On Sun, 25 Nov 2018, Thomas Gummerer wrote:
> 
> > On 11/23, Paul-Sebastian Ungureanu wrote:
> > > Implement `strbuf_insertf()` and `strbuf_vinsertf()` to
> > > insert data using a printf format string.
> > > 
> > > Original-idea-by: Johannes Schindelin 
> > > Signed-off-by: Paul-Sebastian Ungureanu 
> > > ---
> > >  strbuf.c | 36 
> > >  strbuf.h |  9 +
> > >  2 files changed, 45 insertions(+)
> > > 
> > > diff --git a/strbuf.c b/strbuf.c
> > > index 82e90f1dfe..bfbbdadbf3 100644
> > > --- a/strbuf.c
> > > +++ b/strbuf.c
> > > @@ -249,6 +249,42 @@ void strbuf_insert(struct strbuf *sb, size_t pos, 
> > > const void *data, size_t len)
> > >   strbuf_splice(sb, pos, 0, data, len);
> > >  }
> > >  
> > > +void strbuf_vinsertf(struct strbuf *sb, size_t pos, const char *fmt, 
> > > va_list ap)
> > > +{
> > > + int len, len2;
> > > + char save;
> > > + va_list cp;
> > > +
> > > + if (pos > sb->len)
> > > + die("`pos' is too far after the end of the buffer");
> > 
> > I was going to ask about translation of this and other messages in
> > 'die()' calls, but I see other messages in strbuf.c are not marked for
> > translation either.  It may make sense to mark them all for
> > translation at some point in the future, but having them all
> > untranslated for now makes sense.
> > 
> > In the long run it may even be better to return an error here rather
> > than 'die()'ing, but again this is consistent with the rest of the
> > API, so this wouldn't be a good time to take that on.
> 
> I guess I was too overzealous in my copying. These conditions really
> indicate bugs in the caller... So I'd actually rather change that die() to
> BUG().
> 
> But then, the original code in strbuf_vaddf() calls die() and would have
> to be changed, too.

Right, making these 'BUG()' makes sense to me.  But at this stage of
the series it's probably better to just aim for consistency with the
surrounding code without starting to do more cleanups that were not
included in earlier iterations.  I think that's best left for patches
on top.

> > > + va_copy(cp, ap);
> > > + len = vsnprintf(sb->buf + sb->len, 0, fmt, cp);
> > 
> > Here we're just getting the length of what we're trying to format
> > (excluding the final NUL).  As the second argument is 0, we do not
> > modify the strbuf at this point...
> > 
> > > + va_end(cp);
> > > + if (len < 0)
> > > + BUG("your vsnprintf is broken (returned %d)", len);
> > > + if (!len)
> > > + return; /* nothing to do */
> > > + if (unsigned_add_overflows(sb->len, len))
> > > + die("you want to use way too much memory");
> > > + strbuf_grow(sb, len);
> > 
> > > ... and then we grow the strbuf by the length we previously computed, which
> > excludes the NUL character, plus one extra character, so even if pos
> > == len we are sure to have enough space in the strbuf ...
> > 
> > > + memmove(sb->buf + pos + len, sb->buf + pos, sb->len - pos);
> > > + /* vsnprintf() will append a NUL, overwriting one of our characters */
> > > + save = sb->buf[pos + len];
> > > + len2 = vsnprintf(sb->buf + pos, sb->alloc - sb->len, fmt, ap);
> > 
> > ... and we use vsnprintf to write the formatted string to the
> > beginning of the buffer.
> 
> It is not actually the beginning of the buffer, but possibly the middle of
> the buffer ;-)

Oops, you're right of course :) 

> > sb->alloc - sb->len can be larger than 'len', which is fine as vsnprintf
> > doesn't write anything after the NUL character.  And as 'strbuf_grow'
> > adds len + 1 bytes to the strbuf we'll always have enough space for
> > adding the formatted string ...
> > 
> > > + sb->buf[pos + len] = save;
> > > + if (len2 != len)
> > > + BUG("your vsnprintf is broken (returns inconsistent lengths)");
> > > + strbuf_setlen(sb, sb->len + len);
> > 
> > And finally we set the strbuf to the new length.  So all this is just
> > a very roundabout way to say that this function does the right thing
> > according to my reading (and tests).
> 
> :-)
> 
> It seems that Junio likes this way of reviewing, giving him confidence
> that the review was thorough.
>
> Thanks!
> Dscho
> 
> > > +}

Re: [RFC] Introduce two new commands, switch-branch and restore-paths

2018-11-25 Thread Thomas Gummerer

On 11/20, Duy Nguyen wrote:
> On Mon, Nov 19, 2018 at 04:19:53PM +0100, Duy Nguyen wrote:
> > I promise to come back with something better (at least it still
> > sounds better in my mind). If that idea does not work out, we can
> > come back and see if we can improve this.
> 
> So this is it. The patch isn't pretty, mostly as a proof of
> concept. Just look at the three functions at the bottom of checkout.c,
> which is the main thing.
> 
> This patch tries to split "git checkout" command in two new ones:
> 
> - git switch-branch is all about switching branches
> - git restore-paths (maybe restore-file is better) for checking out
>   paths
> 
> The main idea is these two commands will co-exist with the good old
> 'git checkout', which will NOT be deprecated. Old timers will still
> use "git checkout". But new people should be introduced to the new two
> instead. And the new ones are just as capable as "git checkout".
> 
> Since the three commands will co-exist (with duplicate functionality),
> maintenance cost must be kept to minimum. The way I did this is simply
> split the command line options into three pieces: common,
> switch-branch and checkout-paths. "git checkout" has all three, the
> other two have common and another piece.
>
> With this, a new option added to git checkout will be automatically
> available in either switch-branch or checkout-paths. Bug fixes apply
> to all relevant commands.
> 
> Later on, we could start to add a bit more stuff in, e.g. some form of
> disambiguation is no longer needed when running as switch-branch, or
> restore-paths.
> 
> So, what do you think?

I like the idea of splitting those commands up, in fact it is
something I've been considering working on myself.  I do think we
should consider if we want to change the behaviour of those new
commands in any way compared to 'git checkout', since we're starting
with a clean slate.

One thing in particular that I have in mind is something I'm currently
working on, namely adding a --index flag to 'git checkout', which
would make 'git checkout' work in non-overlay mode (for more
discussion on that see also [*1*]).  I have something working that
still needs to be polished a bit, and I am hoping to send it to the
list sometime soon.

I wonder if the --index behaviour could be the default in the
restore-paths command?

Most of the underlying machinery for 'checkout' could and should of
course still be shared between the commands.

*1*: 

> -- 8< --
> diff --git a/builtin.h b/builtin.h
> index 6538932e99..6e321ec8a4 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -214,6 +214,7 @@ extern int cmd_remote_fd(int argc, const char **argv, 
> const char *prefix);
>  extern int cmd_repack(int argc, const char **argv, const char *prefix);
>  extern int cmd_rerere(int argc, const char **argv, const char *prefix);
>  extern int cmd_reset(int argc, const char **argv, const char *prefix);
> +extern int cmd_restore_paths(int argc, const char **argv, const char 
> *prefix);
>  extern int cmd_rev_list(int argc, const char **argv, const char *prefix);
>  extern int cmd_rev_parse(int argc, const char **argv, const char *prefix);
>  extern int cmd_revert(int argc, const char **argv, const char *prefix);
> @@ -227,6 +228,7 @@ extern int cmd_show_index(int argc, const char **argv, 
> const char *prefix);
>  extern int cmd_status(int argc, const char **argv, const char *prefix);
>  extern int cmd_stripspace(int argc, const char **argv, const char *prefix);
>  extern int cmd_submodule__helper(int argc, const char **argv, const char 
> *prefix);
> +extern int cmd_switch_branch(int argc, const char **argv, const char 
> *prefix);
>  extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
>  extern int cmd_tag(int argc, const char **argv, const char *prefix);
>  extern int cmd_tar_tree(int argc, const char **argv, const char *prefix);
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index acdafc6e4c..868ca3c223 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -33,6 +33,16 @@ static const char * const checkout_usage[] = {
>   NULL,
>  };
>  
> +static const char * const switch_branch_usage[] = {
> + N_("git switch-branch [] "),
> + NULL,
> +};
> +
> +static const char * const restore_paths_usage[] = {
> + N_("git restore-paths [] [] -- ..."),
> + NULL,
> +};
> +
>  struct checkout_opts {
>   int patch_mode;
>   int quiet;
> @@ -44,6 +54,7 @@ struct checkout_opts {
>   int ignore_skipworktree;
>   int ignore_other_worktrees;
>   int show_progress;
> + int dwim_new_local_branch;
>   /*
>* If new checkout options are added, skip_merge_working_tree
>* should be updated accordingly.
> @@ -55,6 +66,7 @@ struct checkout_opts {
>   int new_branch_log;
>   enum branch_track track;
>   struct diff_options diff_options;
> + char *conflict_style;
>  
>   int branch_exists;
>   const char *prefix;
> @@ -1223,78 +1235,105 @@ static int

Re: t5570 shaky for anyone ?

2018-11-25 Thread Thomas Gummerer

On 11/25, Torsten Bögershausen wrote:
> After running the  "Git 2.20-rc1" testsuite here on a raspi,
> the only TC that failed was t5570.
> When the "grep" was run on daemon.log, the file was empty (?).
> When inspecting it later, it was filled, and grep would have found
> the "extended.attribute" it was looking for.

I believe this has been reported before in
https://public-inbox.org/git/1522783990.964448.1325338528.0d49c...@webmail.messagingengine.com/,
but it seems the thread ended without anyone actually fixing it.
Reading the first reply, Peff seemed to be fine with just removing the
test completely, which would be the easiest solution ;)  Adding him to
Cc: here.

> The following fixes it, but I am not sure if this is the ideal
> solution.
> 
> 
> diff --git a/t/t5570-git-daemon.sh b/t/t5570-git-daemon.sh
> index 7466aad111..e259fee0ed 100755
> --- a/t/t5570-git-daemon.sh
> +++ b/t/t5570-git-daemon.sh
> @@ -192,6 +192,7 @@ test_expect_success 'daemon log records all attributes' '
>   GIT_OVERRIDE_VIRTUAL_HOST=localhost \
>   git -c protocol.version=1 \
>   ls-remote "$GIT_DAEMON_URL/interp.git" &&
> + sleep 1 &&
>   grep -i extended.attribute daemon.log | cut -d" " -f2- >actual &&
>   test_cmp expect actual
>  '
> 
> A slightly better approach may be to use a "sleep on demand":
> 
> + ( grep -i -q extended.attribute daemon.log || sleep 1 ) &&
>

Re: [PATCH v11 00/22] Convert "git stash" to C builtin

2018-11-25 Thread Thomas Gummerer

On 11/23, Paul-Sebastian Ungureanu wrote:
> Hello,
> 
> This is the 11th iteration of C git stash. Here are some of the changes,
> based on Thomas's and dscho's suggestions (from mailing list / pull request
> #495):

Thanks for your work on this!  I have read through the range-diff and
the new patch of this last round, and this addresses all the comments
I had on v10 (and some more :)).  I consider it
Reviewed-by: Thomas Gummerer 

> - improved memory management. Now, the callers of `do_create_stash()`
> are responsible for freeing the parameter they pass in. Moreover, the
> stash message is now a pointer to a buffer (in the previous iteration
> it was a pointer to a string). This should make it more clear who is
> responsible for freeing the memory.
> 
> - added `strbuf_insertf()` which inserts a format string at a given
> position in the buffer.
> 
> - some minor changes (changed "!oidcmp" to "oideq")
> 
> - fixed merge conflicts
> 
> Best regards,
> Paul
> 
> Joel Teichroeb (5):
>   stash: improve option parsing test coverage
>   stash: convert apply to builtin
>   stash: convert drop and clear to builtin
>   stash: convert branch to builtin
>   stash: convert pop to builtin
> 
> Paul-Sebastian Ungureanu (17):
>   sha1-name.c: add `get_oidf()` which acts like `get_oid()`
>   strbuf.c: add `strbuf_join_argv()`
>   strbuf.c: add `strbuf_insertf()` and `strbuf_vinsertf()`
>   t3903: modernize style
>   stash: rename test cases to be more descriptive
>   stash: add tests for `git stash show` config
>   stash: mention options in `show` synopsis
>   stash: convert list to builtin
>   stash: convert show to builtin
>   stash: convert store to builtin
>   stash: convert create to builtin
>   stash: convert push to builtin
>   stash: make push -q quiet
>   stash: convert save to builtin
>   stash: convert `stash--helper.c` into `stash.c`
>   stash: optimize `get_untracked_files()` and `check_changes()`
>   stash: replace all `write-tree` child processes with API calls
> 
>  Documentation/git-stash.txt  |4 +-
>  Makefile |2 +-
>  builtin.h|1 +
>  builtin/stash.c  | 1596 ++
>  cache.h  |1 +
>  git-stash.sh |  752 
>  git.c|1 +
>  sha1-name.c  |   19 +
>  strbuf.c |   51 ++
>  strbuf.h |   16 +
>  t/t3903-stash.sh |  192 ++--
>  t/t3907-stash-show-config.sh |   83 ++
>  12 files changed, 1897 insertions(+), 821 deletions(-)
>  create mode 100644 builtin/stash.c
>  delete mode 100755 git-stash.sh
>  create mode 100755 t/t3907-stash-show-config.sh
> 
> -- 
> 2.19.1.878.g0482332a22
>

Re: [PATCH v11 03/22] strbuf.c: add `strbuf_insertf()` and `strbuf_vinsertf()`

2018-11-25 Thread Thomas Gummerer

On 11/23, Paul-Sebastian Ungureanu wrote:
> Implement `strbuf_insertf()` and `strbuf_vinsertf()` to
> insert data using a printf format string.
> 
> Original-idea-by: Johannes Schindelin 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  strbuf.c | 36 
>  strbuf.h |  9 +
>  2 files changed, 45 insertions(+)
> 
> diff --git a/strbuf.c b/strbuf.c
> index 82e90f1dfe..bfbbdadbf3 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -249,6 +249,42 @@ void strbuf_insert(struct strbuf *sb, size_t pos, const 
> void *data, size_t len)
>   strbuf_splice(sb, pos, 0, data, len);
>  }
>  
> +void strbuf_vinsertf(struct strbuf *sb, size_t pos, const char *fmt, va_list 
> ap)
> +{
> + int len, len2;
> + char save;
> + va_list cp;
> +
> + if (pos > sb->len)
> + die("`pos' is too far after the end of the buffer");

I was going to ask about translation of this and other messages in
'die()' calls, but I see other messages in strbuf.c are not marked for
translation either.  It may make sense to mark them all for
translation at some point in the future, but having them all
untranslated for now makes sense.

In the long run it may even be better to return an error here rather
than 'die()'ing, but again this is consistent with the rest of the
API, so this wouldn't be a good time to take that on.

> + va_copy(cp, ap);
> + len = vsnprintf(sb->buf + sb->len, 0, fmt, cp);

Here we're just getting the length of what we're trying to format
(excluding the final NUL).  As the second argument is 0, we do not
modify the strbuf at this point...

> + va_end(cp);
> + if (len < 0)
> + BUG("your vsnprintf is broken (returned %d)", len);
> + if (!len)
> + return; /* nothing to do */
> + if (unsigned_add_overflows(sb->len, len))
> + die("you want to use way too much memory");
> + strbuf_grow(sb, len);

... and then we grow the strbuf by the length we previously computed, which
excludes the NUL character, plus one extra character, so even if pos
== len we are sure to have enough space in the strbuf ...

> + memmove(sb->buf + pos + len, sb->buf + pos, sb->len - pos);
> + /* vsnprintf() will append a NUL, overwriting one of our characters */
> + save = sb->buf[pos + len];
> + len2 = vsnprintf(sb->buf + pos, sb->alloc - sb->len, fmt, ap);

... and we use vsnprintf to write the formatted string to the
beginning of the buffer.  sb->alloc - sb->len can be larger than
'len', which is fine as vsnprintf doesn't write anything after the NUL
character.  And as 'strbuf_grow' adds len + 1 bytes to the strbuf
we'll always have enough space for adding the formatted string ...

> + sb->buf[pos + len] = save;
> + if (len2 != len)
> + BUG("your vsnprintf is broken (returns inconsistent lengths)");
> + strbuf_setlen(sb, sb->len + len);

And finally we set the strbuf to the new length.  So all this is just
a very roundabout way to say that this function does the right thing
according to my reading (and tests).

> +}
> +
> +void strbuf_insertf(struct strbuf *sb, size_t pos, const char *fmt, ...)
> +{
> + va_list ap;
> + va_start(ap, fmt);
> + strbuf_vinsertf(sb, pos, fmt, ap);
> + va_end(ap);
> +}
> +
>  void strbuf_remove(struct strbuf *sb, size_t pos, size_t len)
>  {
>   strbuf_splice(sb, pos, len, "", 0);
> diff --git a/strbuf.h b/strbuf.h
> index be02150df3..8f8fe01e68 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -244,6 +244,15 @@ void strbuf_addchars(struct strbuf *sb, int c, size_t n);
>   */
>  void strbuf_insert(struct strbuf *sb, size_t pos, const void *, size_t);
>  
> +/**
> + * Insert data to the given position of the buffer giving a printf format
> + * string. The contents will be shifted, not overwritten.
> + */
> +void strbuf_vinsertf(struct strbuf *sb, size_t pos, const char *fmt,
> +  va_list ap);
> +
> +void strbuf_insertf(struct strbuf *sb, size_t pos, const char *fmt, ...);
> +
>  /**
>   * Remove given amount of data from a given position of the buffer.
>   */
> -- 
> 2.19.1.878.g0482332a22
>

Re: Failed stash caused untracked changes to be lost

2018-11-05 Thread Thomas Gummerer

On 11/05, Quinn, David wrote:
> Hi,
>
> Thanks for the reply. Sorry I forgot the version number, completely
> slipped my mind. At the time of writing the report it was Git ~ 2.17
> I believe. All of our software is updated centrally at my work, we
> have received an update since writing this to 2.19.1. Unfortunately
> because of it being centrally controlled, I couldn't update and try
> the latest version at the time (and now I can't go back and check
> exactly what version I had).
>
> I've never even looked at the git source or contributing before so I
> wouldn't be sure where to start. If you (or someone) is happy to
> point me in the right direction I'd be happy to take a look, I can't
> promise I'll be able to get anything done in a timely manner (or at
> all)

Sure, I'd be happy to help :) There's a nice document in the
git-for-windows repository [*1*] that gives a good introduction to
developing git.  Some of the advice is applicable to both Windows and
Linux, but it should be especially helpful for you as you seem to be
working in a Windows environment.

The stash implementation is currently written in shell script, and
lives in 'git-stash.sh'.  There is currently an effort under way to
rewrite this in C, but as we don't yet know when that's going to be
merged, it's probably worth fixing this in the shell script for
now.

Don't worry about making any promises, or getting it done very soon.
This is no longer a data loss bug at this time, so it's not critical
to fix it immediately, but it should definitely be fixed at some
point.

Some of us also hang out on the #git-devel IRC channel on freenode,
which can be a good place to ask questions.

[*1*]: https://github.com/git-for-windows/git/blob/master/CONTRIBUTING.md

> Thanks
> 
> -----Original Message-----
> From: Thomas Gummerer  
> Sent: 03 November 2018 15:35
> To: Quinn, David 
> Cc: git@vger.kernel.org
> Subject: Re: Failed stash caused untracked changes to be lost
> 
> 
> On 10/23, Quinn, David wrote:
> >
> > Issue: While running a git stash command including the '-u' flag to include 
> > untracked files, the command failed due to arguments in the incorrect 
> > order. After this, the untracked files that were present had been removed and 
> > permanently lost.
> 
> Thanks for your report (and sorry for the late reply)!
> 
> I believe this is (somewhat) fixed in 833622a945 ("stash push: avoid printing 
> errors", 2018-03-19), which was first included in Git 2.18.
> Your message doesn't state which version of Git you encountered the bug with, but 
> I'm going to assume it was something below 2.18 (for future reference, please 
> include the version of Git in bug reports, or even better test with the 
> latest version of Git, as the bug may have been fixed in the meantime).
> 
> Now I'm saying somewhat fixed above, because we still create a stash if a 
> pathspec that doesn't match any files is passed to the command, but then 
> don't remove anything from the working tree, which is a bit confusing.
> 
> I think the right solution here would be to error out early if we were given 
> a pathspec that doesn't match anything.  I'll look into that, unless you're 
> interested in giving it a try? :)
> 
> > Environment: Windows 10, Powershell w/ PoshGit
> >
> >
> > State before running command: 9 Modified files, 2 (new) untracked 
> > files
> >
> > Note: I only wanted to commit some of the modified files (essentially 
> > all the files/changes I wanted to commit were in one directory)
> >
> > Actual command run:  git stash push -u -- Directory/To/Files/* -m "My 
> > Message"
> >
> > Returned:
> >
> > Saved working directory and index state WIP on [BranchName]: [Commit 
> > hash] [Commit Message]
> > fatal: pathspec '-m' did not match any files
> > error: unrecognized input
> >
> > State after Command ran: 9 Modified files, 0 untracked files
> >
> >
> > The command I should have run was
> >
> > git stash push -u -m "My Message" -- Directory/To/Files/*
> >
> >
> > I have found the stash that was created by running this command:
> >
> > gitk --all $(git fsck --no-reflog | Select-String "(dangling 
> > commit )(.*)" | %{ $_.Line.Split(' ')[2] })
> > and searching for the commit number that was returned from the original 
> > (partially failed??) stash command. However there is nothing in that stash. 
> > It is empty.
> >
> >
> >
> > I think that the fact my untracked files were lost is not correct 
> > behaviour and hence why I'm filin

Re: Failed stash caused untracked changes to be lost

2018-11-03 Thread Thomas Gummerer

On 10/23, Quinn, David wrote:
> 
> Issue: While running a git stash command including the '-u' flag to include 
> untracked files, the command failed due to arguments in the incorrect order. 
> After this, the untracked files that were present had been removed and permanently 
> lost.

Thanks for your report (and sorry for the late reply)!

I believe this is (somewhat) fixed in 833622a945 ("stash push: avoid
printing errors", 2018-03-19), which was first included in Git 2.18.
Your message doesn't state which version of Git you encountered the
bug with, but I'm going to assume it was something below 2.18 (for future
reference, please include the version of Git in bug reports, or even
better test with the latest version of Git, as the bug may have been
fixed in the meantime).

Now I'm saying somewhat fixed above, because we still create a stash
if a pathspec that doesn't match any files is passed to the command,
but then don't remove anything from the working tree, which is a bit
confusing.

I think the right solution here would be to error out early if we were
given a pathspec that doesn't match anything.  I'll look into that,
unless you're interested in giving it a try? :)

> Environment: Windows 10, Powershell w/ PoshGit
> 
> 
> State before running command: 9 Modified files, 2 (new) untracked files
> 
> Note: I only wanted to commit some of the modified files (essentially all the 
> files/changes I wanted to commit were in one directory)
> 
> Actual command run:  git stash push -u -- Directory/To/Files/* -m "My Message"
> 
> Returned:
> 
> Saved working directory and index state WIP on [BranchName]: [Commit 
> hash] [Commit Message]
> fatal: pathspec '-m' did not match any files
> error: unrecognized input
> 
> State after Command ran: 9 Modified files, 0 untracked files
> 
> 
> The command I should have run was
> 
> git stash push -u -m "My Message" -- Directory/To/Files/*
> 
> 
> I have found the stash that was created by running this command:
> 
> gitk --all $(git fsck --no-reflog | Select-String "(dangling commit 
> )(.*)" | %{ $_.Line.Split(' ')[2] })
> and searching for the commit number that was returned from the original 
> (partially failed??) stash command. However there is nothing in that stash. 
> It is empty.
> 
> 
> 
> I think that the fact my untracked files were lost is not correct behaviour 
> and hence why I'm filing this bug report
> 
> 
> 
> 
> 

Re: [PATCH] commit-reach: fix sorting commits by generation

2018-10-23 Thread Thomas Gummerer

On 10/22, René Scharfe wrote:
> Am 22.10.2018 um 23:10 schrieb Thomas Gummerer:
> > compare_commits_by_gen is used to sort a list of pointers to 'struct
> > commit'.  The comparison function for qsort is called with pointers to
> > the objects it needs to compare, so when sorting a list of 'struct
> > commit *', the arguments are of type 'struct commit **'.  However,
> > currently the comparison function casts its arguments to 'struct
> > commit *' and uses those, leading to out of bounds memory access and
> > potentially to wrong results.  Fix that.
> > 
> > Signed-off-by: Thomas Gummerer 
> > ---
> > 
> > I noticed this by running the test suite through valgrind.  I'm not
> > familiar with this code, so I'm not sure why this didn't cause any
> > issues or how they would manifest, but this seems like the right fix
> > for this function either way.
> 
> Right; I sent a similar patch a while ago, but it seems to have fallen
> through the cracks:
> 
> https://public-inbox.org/git/d1b58614-989f-5998-6c53-c19eee409...@web.de/

Whoops, I didn't notice that; I only checked whether the problem still
exists in pu.  I'd be more than happy to go with your patch instead.

> Anyway, your implied question was discussed back then.  Derrick wrote:
> 
>The reason to sort is to hopefully minimize the amount we walk by 
>exploring the "lower" commits first. This is a performance-only thing, 
>not a correctness issue (which is why the bug exists). Even then, it is 
>just a heuristic.

Thanks for pointing that out!

> Does b6723e4671 in pu (commit-reach: fix first-parent heuristic) change
> that picture?  Did a quick test and found no performance difference with
> and without the fix on top, i.e. proper sorting didn't seem to matter.

I just gave 'test-tool reach can_all_from_reach' a try and got the
same results: with or without the fix, the times are very similar.  I
haven't had time to follow the commit-graph series though, so I'm not
sure I used it correctly.  I tried it on the linux repository with the
following input:

X:v4.10
X:v4.9
X:v4.8
X:v4.7
X:v4.6
X:v4.5
X:v4.4
X:v4.3
X:v4.2
X:v4.1
Y:v3.10
Y:v3.9
Y:v3.8
Y:v3.7
Y:v3.6
Y:v3.5
Y:v3.4
Y:v3.3
Y:v3.2
Y:v3.1

> >  commit-reach.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/commit-reach.c b/commit-reach.c
> > index bc522d6840..9efddfd7a0 100644
> > --- a/commit-reach.c
> > +++ b/commit-reach.c
> > @@ -516,8 +516,8 @@ int commit_contains(struct ref_filter *filter, struct 
> > commit *commit,
> >  
> >  static int compare_commits_by_gen(const void *_a, const void *_b)
> >  {
> > -   const struct commit *a = (const struct commit *)_a;
> > -   const struct commit *b = (const struct commit *)_b;
> > +   const struct commit *a = *(const struct commit **)_a;
> > +   const struct commit *b = *(const struct commit **)_b;
> >  
> > if (a->generation < b->generation)
> > return -1;
> > 
> 
> Looks good to me.
> 
> René

[PATCH] commit-reach: fix sorting commits by generation

2018-10-22 Thread Thomas Gummerer

compare_commits_by_gen is used to sort a list of pointers to 'struct
commit'.  The comparison function for qsort is called with pointers to
the objects it needs to compare, so when sorting a list of 'struct
commit *', the arguments are of type 'struct commit **'.  However,
currently the comparison function casts its arguments to 'struct
commit *' and uses those, leading to out of bounds memory access and
potentially to wrong results.  Fix that.

Signed-off-by: Thomas Gummerer 
---

I noticed this by running the test suite through valgrind.  I'm not
familiar with this code, so I'm not sure why this didn't cause any
issues or how they would manifest, but this seems like the right fix
for this function either way.

 commit-reach.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index bc522d6840..9efddfd7a0 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -516,8 +516,8 @@ int commit_contains(struct ref_filter *filter, struct 
commit *commit,
 
 static int compare_commits_by_gen(const void *_a, const void *_b)
 {
-   const struct commit *a = (const struct commit *)_a;
-   const struct commit *b = (const struct commit *)_b;
+   const struct commit *a = *(const struct commit **)_a;
+   const struct commit *b = *(const struct commit **)_b;
 
    if (a->generation < b->generation)
        return -1;
-- 
2.19.1.759.g500967bb5e

Re: [PATCH v10 00/21] Convert "git stash" to C builtin

2018-10-16 Thread Thomas Gummerer

On 10/16, Johannes Schindelin wrote:
> Hi Thomas,
> 
> On Mon, 15 Oct 2018, Thomas Gummerer wrote:
> 
> >  2:  63f2e0e6f9 !  2:  2d45985676 strbuf.c: add `strbuf_join_argv()`
> > @@ -14,19 +14,17 @@
> > strbuf_setlen(sb, sb->len + sb2->len);
> >   }
> >   
> > -+const char *strbuf_join_argv(struct strbuf *buf,
> > -+   int argc, const char **argv, char delim)
> > ++void strbuf_join_argv(struct strbuf *buf,
> > ++int argc, const char **argv, char delim)
> 
> While the patch series does not use the return value, I have to ask
> whether it would really be useful to change it to return `void`. I could
> imagine that there may already be quite a few code paths that would love
> to use strbuf_join_argv(), *and* would benefit from the `const char *`
> return value.

Fair enough.  I did suggest changing the return type to void here, as
I found the API a bit odd compared to the rest of the strbuf API,
however after looking at this again I agree with you, and returning a
const char * here does seem more helpful.  Sorry about the confusion
Paul-Sebastian!

> In other words: just because the *current* patches do not make use of that
> quite convenient return value does not mean that we should remove that
> convenience.
>
> >  7:  a2abd1b4bd !  8:  974dbaa492 stash: convert apply to builtin
> > @@ -370,18 +370,20 @@
> >  +
> >  +  if (diff_tree_binary(, >w_commit)) {
> >  +  strbuf_release();
> > -+  return -1;
> > ++  return error(_("Could not generate diff 
> > %s^!."),
> > ++   
> > oid_to_hex(>w_commit));
> 
> Please start the argument of an `error()` call with a lower-case letter.

I think this comes from your fixup! commit ;) But I do agree, these should be
lower-case.

> >  +  }
> >  +
> >  +  ret = apply_cached();
> >  +  strbuf_release();
> >  +  if (ret)
> > -+  return -1;
> > ++  return error(_("Conflicts in index."
> > ++ "Try without --index."));
> 
> Same here.
> 
> >  +
> >  +  discard_cache();
> >  +  read_cache();
> >  +  if (write_cache_as_tree(_tree, 0, NULL))
> > -+  return -1;
> > ++  return error(_("Could not save index 
> > tree"));
> 
> And here.
> 
> > 15:  bd827be103 ! 15:  989db67e9a stash: convert create to builtin
> > @@ -119,7 +119,6 @@
> >  +static int check_changes(struct pathspec ps, int include_untracked)
> >  +{
> >  +  int result;
> > -+  int ret = 0;
> 
> I was curious about this change, and could not find it in the
> git-stash-v10 tag of https://github.com/ungps/git...

This line has been removed in v10, but did exist in v9, so
the git-stash-v10 should indeed not have this line.  I suggested
removing it in [*1*], because it breaks compilation with DEVELOPER=1
at this step.

> > 18:  1c501ad666 ! 18:  c90e30173a stash: convert save to builtin
> > @@ -72,8 +72,10 @@
> >  +   git_stash_helper_save_usage,
> >  +   PARSE_OPT_KEEP_DASHDASH);
> >  +
> > -+  if (argc)
> > -+  stash_msg = (char*) strbuf_join_argv(, argc, argv, 
> > ' ');
> > ++  if (argc) {
> > ++  strbuf_join_argv(, argc, argv, ' ');
> > ++  stash_msg = buf.buf;
> > ++  }
> 
> Aha! So there *was* a user of that return value. I really would prefer a
> non-void return value here.

Right, I'd argue we're mis-using the API here though.  do_push_stash,
to which we later pass stash_msg, takes ownership and frees the
memory before returning.  This doesn't cause issues in the test suite
at the moment, because do_create_stash() doesn't always free stash_msg
before assigning a new value to the pointer, but would cause issues
when do_create_stash exits early.

Rather than the solution I proposed in I think it would be nicer to
use 'stash_msg = strbuf_detach(...)' above.
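
To illustrate the difference between borrowing the buffer contents and
detaching them, here is a minimal sketch using a toy buffer.  It is not
git's strbuf implementation: 'toy_buf', 'toy_add', 'toy_join_argv' and
'toy_detach' are hypothetical names, and the sketch only assumes that
"detach" means handing the heap memory to the caller and resetting the
buffer:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy buffer, loosely modelled on the idea of a strbuf. */
struct toy_buf {
	char *buf;	/* NUL-terminated contents, or NULL when empty */
	size_t len;
};

/* Append n bytes from s and keep the contents NUL-terminated. */
static void toy_add(struct toy_buf *b, const char *s, size_t n)
{
	char *grown;

	assert(b && s);
	grown = realloc(b->buf, b->len + n + 1);
	if (!grown)
		abort();
	memcpy(grown + b->len, s, n);
	b->len += n;
	grown[b->len] = '\0';
	b->buf = grown;
}

/*
 * Join argv[0..argc) with delim.  The return value is only a borrowed
 * view of the buffer; the buffer still owns the memory.
 */
static const char *toy_join_argv(struct toy_buf *b, int argc,
				 const char **argv, char delim)
{
	int i;

	for (i = 0; i < argc; i++) {
		if (i)
			toy_add(b, &delim, 1);
		toy_add(b, argv[i], strlen(argv[i]));
	}
	return b->buf ? b->buf : "";
}

/* Hand ownership of the heap memory to the caller and reset the buffer. */
static char *toy_detach(struct toy_buf *b)
{
	char *ret = b->buf;

	if (!ret) {
		/* Always return memory the caller may free(). */
		ret = malloc(1);
		if (!ret)
			abort();
		ret[0] = '\0';
	}
	b->buf = NULL;
	b->len = 0;
	return ret;
}
```

With this split, a callee that frees its argument should be given the
result of the detach step, never the borrowed 'const char *' view.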

I'm still happy with the function returning buf->buf as const char *,
but I'm not sure we should use that retur

Re: [PATCH v10 00/21] Convert "git stash" to C builtin

2018-10-15 Thread Thomas Gummerer

On 10/15, Paul-Sebastian Ungureanu wrote:
> Hello,
> 
> This is a new iteration of `git stash`, based on the last review.
> This iteration fixes some code styling issues, bring some changes
> to `do_push_stash()` and `do_create_stash()` to be closer to API by
> following Thomas Gummerer's review of last iteration [1]. Also, there
> were some missing messages [2], which are now included.

Thanks for your work!  I had two more comments (on the patches
inline).  Once those are addressed, I'd be happy for this to be merged
to 'next'.

Since I applied this locally, and it may help someone else who may
want to have a look at this, the range-diff is below:

 1:  b7224e494e =  1:  89142f99e7 sha1-name.c: add `get_oidf()` which acts like `get_oid()`
 2:  63f2e0e6f9 !  2:  2d45985676 strbuf.c: add `strbuf_join_argv()`
@@ -14,19 +14,17 @@
strbuf_setlen(sb, sb->len + sb2->len);
  }
  
-+const char *strbuf_join_argv(struct strbuf *buf,
-+   int argc, const char **argv, char delim)
++void strbuf_join_argv(struct strbuf *buf,
++int argc, const char **argv, char delim)
 +{
 +  if (!argc)
-+  return buf->buf;
++  return;
 +
 +  strbuf_addstr(buf, *argv);
 +  while (--argc) {
 +  strbuf_addch(buf, delim);
 +  strbuf_addstr(buf, *(++argv));
 +  }
-+
-+  return buf->buf;
 +}
 +
  void strbuf_addchars(struct strbuf *sb, int c, size_t n)
@@ -40,12 +38,12 @@
   */
  extern void strbuf_addbuf(struct strbuf *sb, const struct strbuf *sb2);
  
-+
 +/**
-+ *
++ * Join the arguments into a buffer. `delim` is put between every
++ * two arguments.
 + */
-+extern const char *strbuf_join_argv(struct strbuf *buf, int argc,
-+  const char **argv, char delim);
++extern void strbuf_join_argv(struct strbuf *buf, int argc,
++   const char **argv, char delim);
 +
  /**
   * This function can be used to expand a format string containing
 3:  9b9433781b =  3:  63d10ee599 stash: improve option parsing test coverage
 4:  c1d38060c5 !  4:  a6953b57e5 stash: update test cases conform to coding guidelines
@@ -1,8 +1,9 @@
 Author: Paul-Sebastian Ungureanu 
 
-stash: update test cases conform to coding guidelines
+t3903: modernize style
 
-Removed whitespaces after redirection operators.
+Remove whitespaces after redirection operators and wrap
+long lines.
 
 Signed-off-by: Paul-Sebastian Ungureanu 

 
 5:  ac7a8267e6 =  5:  9985d8650b stash: rename test cases to be more descriptive
 6:  0e6458a280 =  6:  93d7e82b96 stash: add tests for `git stash show` config
13:  6e04c948cf !  7:  e06aca5ff5 stash: mention options in `show` synopsis.
@@ -1,9 +1,9 @@
 Author: Paul-Sebastian Ungureanu 
 
-stash: mention options in `show` synopsis.
+stash: mention options in `show` synopsis
 
-Mention in the usage text and in the documentation, that `show`
-accepts any option known to `git diff`.
+Mention in the documentation, that `show` accepts any
+option known to `git diff`.
 
 Signed-off-by: Paul-Sebastian Ungureanu 

 
@@ -28,25 +28,3 @@
  
Show the changes recorded in the stash entry as a diff between the
stashed contents and the commit back when the stash entry was first
-
- diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
- --- a/builtin/stash--helper.c
- +++ b/builtin/stash--helper.c
-@@
- 
- static const char * const git_stash_helper_usage[] = {
-   N_("git stash--helper list []"),
--  N_("git stash--helper show []"),
-+  N_("git stash--helper show [] []"),
-   N_("git stash--helper drop [-q|--quiet] []"),
-   N_("git stash--helper ( pop | apply ) [--index] [-q|--quiet] 
[]"),
-   N_("git stash--helper branch  []"),
-@@
- };
- 
- static const char * const git_stash_helper_show_usage[] = {
--  N_("git stash--helper show []"),
-+  N_("git stash--helper show [] []"),
-   NULL
- };
- 
 7:  a2abd1b4bd !  8:  974dbaa492 stash: convert apply to builtin
@@ -121,8 +121,8 @@
 +  get_oidf(>w_tree, "%s:", revision) ||
 +  get_oidf(>b_tree, "%s^1:", revision) ||
 +  get_oidf(>i_tree, "%s^2:", revision)) {
-+  free_stash_info(info);
 +  error(_("'%s' is not a stash-like commit"), revision);
++  free_stash_info(info);
 +  exit(128);
 +  }
 +}
@@ -370,18 +370,20 @@
 +
 +  if (diff_tree_binary(, >w_commit)) {
 +  strbuf_release();
-+  return -1;
++  return error(_("Could not generate diff %s^!."),
++

Re: [PATCH v10 18/21] stash: convert save to builtin

2018-10-15 Thread Thomas Gummerer

On 10/15, Paul-Sebastian Ungureanu wrote:
> The `-m` option is no longer supported as it might not make
> sense to have two ways of passing a message. Even if this is
> a change in behaviour, the documentation remains the same
> because the `-m` parameter was omitted before.

[...]

> + OPT_STRING('m', "message", _msg, "message",
> +N_("stash message")),
> + OPT_END()

We do seem to support a '-m' option here though.  I'm happy not
supporting it, but the commit message seems to say otherwise.  I can't
remember the discussion here, but either the commit message or the
option parsing should be updated.

Re: [PATCH v10 19/21] stash: convert `stash--helper.c` into `stash.c`

2018-10-15 Thread Thomas Gummerer

On 10/15, Paul-Sebastian Ungureanu wrote:
> The old shell script `git-stash.sh`  was removed and replaced
> entirely by `builtin/stash.c`. In order to do that, `create` and
> `push` were adapted to work without `stash.sh`. For example, before
> this commit, `git stash create` called `git stash--helper create
> --message "$*"`. If it called `git stash--helper create "$@"`, then
> some of these changes wouldn't have been necessary.
> 
> This commit also removes the word `helper` since now stash is
> called directly and not by a shell script.
> 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
> @@ -1138,7 +1133,6 @@ static int do_create_stash(struct pathspec ps, char 
> **stash_msg,
>   fprintf_ln(stderr, _("You do not have "
>"the initial commit yet"));
>   ret = -1;
> - *stash_msg = NULL;
>   goto done;
>   } else {
>   head_commit = lookup_commit(the_repository, >b_commit);
> @@ -1146,7 +1140,6 @@ static int do_create_stash(struct pathspec ps, char 
> **stash_msg,
>  
>   if (!check_changes(ps, include_untracked)) {
>   ret = 1;
> - *stash_msg = NULL;
>   goto done;
>   }
>  
> @@ -1167,7 +1160,6 @@ static int do_create_stash(struct pathspec ps, char 
> **stash_msg,
>   fprintf_ln(stderr, _("Cannot save the current "
>"index state"));
>   ret = -1;
> - *stash_msg = NULL;
>   goto done;
>   }
>  
> @@ -1178,14 +1170,12 @@ static int do_create_stash(struct pathspec ps, char 
> **stash_msg,
>   fprintf_ln(stderr, _("Cannot save "
>"the untracked files"));
>   ret = -1;
> - *stash_msg = NULL;
>   goto done;
>   }
>   untracked_commit_option = 1;
>   }
>   if (patch_mode) {
>   ret = stash_patch(info, ps, patch, quiet);
> - *stash_msg = NULL;
>   if (ret < 0) {
>   if (!quiet)
>   fprintf_ln(stderr, _("Cannot save the current "
> @@ -1200,7 +1190,6 @@ static int do_create_stash(struct pathspec ps, char 
> **stash_msg,
>   fprintf_ln(stderr, _("Cannot save the current "
>"worktree state"));
>   ret = -1;
> - *stash_msg = NULL;
>   goto done;
>   }
>   }
> @@ -1210,7 +1199,7 @@ static int do_create_stash(struct pathspec ps, char 
> **stash_msg,
>   else
>   strbuf_addf(_msg_buf, "On %s: %s", branch_name,
>   *stash_msg);
> - *stash_msg = strbuf_detach(_msg_buf, NULL);
> + *stash_msg = xstrdup(stash_msg_buf.buf);
>  
>   /*
>* `parents` will be empty after calling `commit_tree()`, so there is
> @@ -1244,30 +1233,23 @@ static int do_create_stash(struct pathspec ps, char 
> **stash_msg,
>  
>  static int create_stash(int argc, const char **argv, const char *prefix)
>  {
> - int include_untracked = 0;
>   int ret = 0;
>   char *stash_msg = NULL;
>   struct stash_info info;
>   struct pathspec ps;
> - struct option options[] = {
> - OPT_BOOL('u', "include-untracked", _untracked,
> -  N_("include untracked files in stash")),
> - OPT_STRING('m', "message", _msg, N_("message"),
> -  N_("stash message")),
> - OPT_END()
> - };
> + struct strbuf stash_msg_buf = STRBUF_INIT;
>  
> - argc = parse_options(argc, argv, prefix, options,
> -  git_stash_helper_create_usage,
> -  0);
> + /* Starting with argv[1], since argv[0] is "create" */
> + strbuf_join_argv(_msg_buf, argc - 1, ++argv, ' ');
> + stash_msg = stash_msg_buf.buf;

stash_msg is just a pointer to stash_msg_buf.buf here..
>  
>   memset(, 0, sizeof(ps));
> - ret = do_create_stash(ps, _msg, include_untracked, 0, ,
> -   NULL, 0);
> + ret = do_create_stash(ps, _msg, 0, 0, , NULL, 0);
>  
>   if (!ret)
>   printf_ln("%s", oid_to_hex(_commit));
>  
> + strbuf_release(_msg_buf);

We release the strbuf here, which means stash_msg_buf.buf is now
'strbuf_slopbuf', which is a global variable and can't be free'd.  If
stash_msg is not changed within do_create_stash, it is now pointing to
'strbuf_slopbuf', and we try to free that below, which makes git
crash in t3903.44, which breaks bisection.
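
To make the hazard concrete, here is a minimal sketch with a toy
buffer type.  It is not git's strbuf; 'toy_buf', 'toy_slop' and the
helpers are hypothetical names.  The sketch assumes, as described
above, that an empty or released buffer points at a shared static
string that must never be passed to free().  Taking an owned copy (or
NULL) before releasing keeps a later free() valid in both cases:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Shared empty string that every empty or released buffer points at. */
static char toy_slop[1];

struct toy_buf {
	char *buf;
	size_t len;
};

#define TOY_BUF_INIT { toy_slop, 0 }

/* Replace the contents with a heap copy of s. */
static void toy_set(struct toy_buf *b, const char *s)
{
	size_t n = strlen(s);
	char *copy = malloc(n + 1);

	if (!copy)
		abort();
	memcpy(copy, s, n + 1);
	if (b->buf != toy_slop)
		free(b->buf);
	b->buf = copy;
	b->len = n;
}

/* Free heap memory, if any, and point back at the shared static string. */
static void toy_release(struct toy_buf *b)
{
	if (b->buf != toy_slop)
		free(b->buf);
	b->buf = toy_slop;
	b->len = 0;
}

/*
 * Return a heap copy the caller owns, or NULL for an empty buffer.
 * Either result can be handed to free() no matter what later happens
 * to the buffer itself.
 */
static char *toy_owned_copy(const struct toy_buf *b)
{
	char *copy;

	assert(b);
	if (!b->len)
		return NULL;
	copy = malloc(b->len + 1);
	if (!copy)
		abort();
	memcpy(copy, b->buf, b->len + 1);
	return copy;
}
```

A plain 'msg = b.buf' followed by toy_release() and free(msg) would, in
this toy model, either free the static 'toy_slop' (empty buffer) or
free the same heap block twice (non-empty buffer).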

>   free(stash_msg);
>  
>   /*

I think the following diff fixes memory management, by making
do_push_stash responsible for freeing stash_msg when it's done with
it, while the callers of do_create_stash have to free the parameter

Re: [PATCH v10 08/21] stash: convert apply to builtin

2018-10-15 Thread Thomas Gummerer

On 10/15, Johannes Schindelin wrote:
> Hi Paul,
> 
> On Mon, 15 Oct 2018, Paul-Sebastian Ungureanu wrote:
> 
> > +static void assert_stash_like(struct stash_info *info, const char 
> > *revision)
> > +{
> > +   if (get_oidf(>b_commit, "%s^1", revision) ||
> > +   get_oidf(>w_tree, "%s:", revision) ||
> > +   get_oidf(>b_tree, "%s^1:", revision) ||
> > +   get_oidf(>i_tree, "%s^2:", revision)) {
> > +   error(_("'%s' is not a stash-like commit"), revision);
> > +   free_stash_info(info);
> > +   exit(128);
> 
> Thomas had mentioned earlier that this should probably be a die() (and
> that the `free_stash_info()` should simply be dropped), see
> https://public-inbox.org/git/20180930174848.ge2...@hank.intra.tgummerer.com/

I think the way this is now is fine by me.  Not sure how much we care
about freeing 'info' or not (do we care about leaks when we 'die()'
anyway?), but this is done in the right order now, so we don't print
garbage in the error message anymore, and I'm happy with either this
or replacing all this with 'die()'.  
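
As a minimal sketch of why the order matters (hypothetical names, not
the stash code itself): when the string being reported is owned by the
structure that is about to be freed, the message has to be produced
first.  The sketch writes the message into a caller-supplied buffer
instead of calling error(), purely so that it is self-contained:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_info {
	char *revision;	/* heap-allocated, owned by the struct */
};

static void toy_info_free(struct toy_info *info)
{
	free(info->revision);
	info->revision = NULL;
}

/*
 * Format the message into 'out' first, then release 'info'.  Doing it
 * the other way round would read info->revision after it was freed.
 */
static int toy_reject(struct toy_info *info, char *out, size_t outlen)
{
	assert(info && info->revision && out && outlen);
	snprintf(out, outlen, "'%s' is not a stash-like commit",
		 info->revision);
	toy_info_free(info);
	return -1;
}
```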

> > +   }
> > +}
> > +
> > +static int get_stash_info(struct stash_info *info, int argc, const char 
> > **argv)
> > +{
> > +   int ret;
> > +   char *end_of_rev;
> > +   char *expanded_ref;
> > +   const char *revision;
> > +   const char *commit = NULL;
> > +   struct object_id dummy;
> > +   struct strbuf symbolic = STRBUF_INIT;
> > +
> > +   if (argc > 1) {
> > +   int i;
> > +   struct strbuf refs_msg = STRBUF_INIT;
> > +   for (i = 0; i < argc; i++)
> > +   strbuf_addf(_msg, " '%s'", argv[i]);
> 
> Thomas had also mentioned that this should be a `strbuf_join_argv()` call
> now.

Re-reading this we quote the individual args here, which is not
possible with 'strbuf_join_argv()', which I failed to notice when
reading this the other time.  We don't currently quote them, but I
think the quoting may actually be useful.
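
A small sketch of the difference, under the assumption that a plain
join only puts a delimiter between arguments: quoting each argument
needs a per-argument format, which is what the loop in the patch does
with " '%s'".  'toy_quote_args' is a hypothetical helper that writes
into a fixed-size buffer so the example stands alone:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append " 'arg'" for every argument to 'out' (capacity 'outlen').
 * Returns 0 on success, -1 if the result would not fit.
 */
static int toy_quote_args(char *out, size_t outlen,
			  int argc, const char **argv)
{
	size_t used = 0;
	int i;

	assert(out && outlen);
	out[0] = '\0';
	for (i = 0; i < argc; i++) {
		int n = snprintf(out + used, outlen - used,
				 " '%s'", argv[i]);

		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += (size_t)n;
	}
	return 0;
}
```

For the arguments "stash@{0}" and "stash@{1}" this produces
" 'stash@{0}' 'stash@{1}'", which fits the "Too many revisions
specified:%s" message quoted below.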

It would however have been nice if the reason why the suggestion was
rejected had been written down as a reply to my original
review, to avoid misunderstandings like this :)

> Maybe v10 is an accidental re-send of v9?
> 
> Ciao,
> Dscho
> 
> > +
> > +   fprintf_ln(stderr, _("Too many revisions specified:%s"),
> > +  refs_msg.buf);
> > +   strbuf_release(_msg);
> > +
> > +   return -1;
> > +   }
> > +
> > +   if (argc == 1)
> > +   commit = argv[0];
> > +
> > +   strbuf_init(>revision, 0);
> > +   if (!commit) {
> > +   if (!ref_exists(ref_stash)) {
> > +   free_stash_info(info);
> > +   fprintf_ln(stderr, _("No stash entries found."));
> > +   return -1;
> > +   }
> > +
> > +   strbuf_addf(>revision, "%s@{0}", ref_stash);
> > +   } else if (strspn(commit, "0123456789") == strlen(commit)) {
> > +   strbuf_addf(>revision, "%s@{%s}", ref_stash, commit);
> > +   } else {
> > +   strbuf_addstr(>revision, commit);
> > +   }
> > +
> > +   revision = info->revision.buf;
> > +
> > +   if (get_oid(revision, >w_commit)) {
> > +   error(_("%s is not a valid reference"), revision);
> > +   free_stash_info(info);
> > +   return -1;
> > +   }
> > +
> > +   assert_stash_like(info, revision);
> > +
> > +   info->has_u = !get_oidf(>u_tree, "%s^3:", revision);
> > +
> > +   end_of_rev = strchrnul(revision, '@');
> > +   strbuf_add(, revision, end_of_rev - revision);
> > +
> > +   ret = dwim_ref(symbolic.buf, symbolic.len, , _ref);
> > +   strbuf_release();
> > +   switch (ret) {
> > +   case 0: /* Not found, but valid ref */
> > +   info->is_stash_ref = 0;
> > +   break;
> > +   case 1:
> > +   info->is_stash_ref = !strcmp(expanded_ref, ref_stash);
> > +   break;
> > +   default: /* Invalid or ambiguous */
> > +   free_stash_info(info);
> > +   }
> > +
> > +   free(expanded_ref);
> > +   return !(ret == 0 || ret == 1);
> > +}
> > +
> > +static int reset_tree(struct object_id *i_tree, int update, int reset)
> > +{
> > +   int nr_trees = 1;
> > +   struct unpack_trees_options opts;
> > +   struct tree_desc t[MAX_UNPACK_TREES];
> > +   struct tree *tree;
> > +   struct lock_file lock_file = LOCK_INIT;
> > +
> > +   read_cache_preload(NULL);
> > +   if (refresh_cache(REFRESH_QUIET))
> > +   return -1;
> > +
> > +   hold_locked_index(_file, LOCK_DIE_ON_ERROR);
> > +
> > +   memset(, 0, sizeof(opts));
> > +
> > +   tree = parse_tree_indirect(i_tree);
> > +   if (parse_tree(tree))
> > +   return -1;
> > +
> > +   init_tree_desc(t, tree->buffer, tree->size);
> > +
> > +   opts.head_idx = 1;
> > +   opts.src_index = _index;
> > +   opts.dst_index = _index;
> > +   opts.merge = 1;
> > +   opts.reset = reset;
> > +   opts.update = update;
> > +   opts.fn = oneway_merge;
> > +
> > +   if (unpack_trees(nr_trees, t, ))
> > +   return -1;
> > +
> > +   if (write_locked_index(_index, _file,

Re: Does git load index file into memory?

2018-10-14 Thread Thomas Gummerer

On 10/12, Farhan Khan wrote:
> Hi all,
> 
> Does git load the entire index file into memory when it wants to
> edit/view it? I ask because I wonder if this can become a problem with
> the index file becomes arbitrarily large, like for the Linux kernel.

Yes, currently git always loads the entire index file for any
operation on it.  This is not particularly a problem for projects like
the linux kernel, as the index file for it is relatively small, ~6MB
at the moment.

It is more of a problem for larger repositories, such as the windows
repository, which has an index file of ~300-400MB, iirc, where
loading the index has a significant cost.  There are some patch series
in progress to improve the performance, e.g. Ben Peart's series to load
the index in parallel [*1*].

For writing the index to disk again, the split index feature can help
improve performance.  See also 'man git-update-index' and
"core.splitIndex" in 'man git-config'.

[*1*]: https://public-inbox.org/git/20181010155938.20996-1-peart...@gmail.com/

> Thanks,
> --
> Farhan Khan
> PGP Fingerprint: B28D 2726 E2BC A97E 3854 5ABE 9A9F 00BC D525 16EE

Re: [PATCH] config.mak.dev: add -Wformat

2018-10-12 Thread Thomas Gummerer

On 10/12, Jonathan Nieder wrote:
> Jeff King wrote:
> > On Fri, Oct 12, 2018 at 07:40:37PM +0100, Thomas Gummerer wrote:
> 
> >> 801fa63a90 ("config.mak.dev: add -Wformat-security", 2018-09-08) added
> >> the -Wformat-security to the flags set in config.mak.dev.  In the gcc
> >> man page this is documented as:
> >>
> >>  If -Wformat is specified, also warn about uses of format
> >>  functions that represent possible security problems.  [...]
> >>
> >> That commit did however not add the -Wformat flag, and -Wformat is not
> >> specified anywhere else by default, so the added -Wformat-security had
> >> no effect.  Newer versions of gcc (gcc 8.2.1 in this particular case)
> >> warn about this and thus compilation fails with this option set.
> [...]
> > -Wformat is part of -Wall, which we already turn on by default (even for
> > non-developer builds).
> >
> > So I don't think we need to do anything more, though I'm puzzled that
> > you saw a failure. Do you set CFLAGS explicitly in your config.mak to
> > something that doesn't include -Wall?

Whoops embarrassing.  I had this set in my config.mak:

CFLAGS = -O$(O) -g $(EXTRA_CFLAGS)

What happened is that I had included -Wall in an old config.mak that I
copied from Thomas Rast when I started with my GSoC project.  Then
when "DEVELOPER=1" came around I switched to that at some point and
just removed everything from CFLAGS, except the possibility to
override the optimization level, the ability to add extra flags and
including debug symbols, but failed to notice that I had lost -Wall.

Maybe it would still be a good idea to add -Wall to avoid the surprise for
others.  But then again if someone overrides CFLAGS they should at
least check better what they're overriding ;)

> Thomas, do you use autoconf to generate config.mak.autogen?  I'm
> wondering if that produces a CFLAGS that doesn't include -Wall.

No, this was all my mistake :)

> > I'm not opposed to making config.mak.dev a bit more redundant to handle
> > this case, but we'd probably want to include all of -Wall, since it
> > contains many other warnings we'd want to make sure are enabled.
> 
> Do you mean putting -Wall instead of -Wformat?
> 
> Should we add -Wextra too?  From a quick test, it seems to build okay.

We do have that with setting DEVELOPER=extra-all.

> Thanks,
> Jonathan

[PATCH] config.mak.dev: add -Wformat

2018-10-12 Thread Thomas Gummerer

801fa63a90 ("config.mak.dev: add -Wformat-security", 2018-09-08) added
the -Wformat-security to the flags set in config.mak.dev.  In the gcc
man page this is documented as:

 If -Wformat is specified, also warn about uses of format
 functions that represent possible security problems.  [...]

That commit did however not add the -Wformat flag, and -Wformat is not
specified anywhere else by default, so the added -Wformat-security had
no effect.  Newer versions of gcc (gcc 8.2.1 in this particular case)
warn about this and thus compilation fails with this option set.

Fix that, and make -Wformat-security actually useful by adding the
-Wformat flag as well.  git compiles cleanly with both these flags
applied.

Signed-off-by: Thomas Gummerer 
---

Sorry for not catching this before the patch made it to next.  

 config.mak.dev | 1 +
 1 file changed, 1 insertion(+)

diff --git a/config.mak.dev b/config.mak.dev
index 92d268137f..bf6f943452 100644
--- a/config.mak.dev
+++ b/config.mak.dev
@@ -7,6 +7,7 @@ CFLAGS += -pedantic
 CFLAGS += -DUSE_PARENS_AROUND_GETTEXT_N=0
 endif
 CFLAGS += -Wdeclaration-after-statement
+CFLAGS += -Wformat
 CFLAGS += -Wformat-security
 CFLAGS += -Wno-format-zero-length
 CFLAGS += -Wold-style-definition
-- 
2.19.1.937.g12227c8702.dirty

Re: What's cooking in git.git (Oct 2018, #01; Wed, 10)

2018-10-10 Thread Thomas Gummerer

On 10/10, Junio C Hamano wrote:
> * ps/stash-in-c (2018-08-31) 20 commits
>  - stash: replace all `write-tree` child processes with API calls
>  - stash: optimize `get_untracked_files()` and `check_changes()`
>  - stash: convert `stash--helper.c` into `stash.c`
>  - stash: convert save to builtin
>  - stash: make push -q quiet
>  - stash: convert push to builtin
>  - stash: convert create to builtin
>  - stash: convert store to builtin
>  - stash: mention options in `show` synopsis
>  - stash: convert show to builtin
>  - stash: convert list to builtin
>  - stash: convert pop to builtin
>  - stash: convert branch to builtin
>  - stash: convert drop and clear to builtin
>  - stash: convert apply to builtin
>  - stash: add tests for `git stash show` config
>  - stash: rename test cases to be more descriptive
>  - stash: update test cases conform to coding guidelines
>  - stash: improve option parsing test coverage
>  - sha1-name.c: add `get_oidf()` which acts like `get_oid()`
> 
>  "git stash" rewritten in C.
> 
>  Undecided.  This also has been part of my personal build.  I do not
>  offhand recall if this also had the same exposure to the end users
>  as "rebase" and "rebase -i".  I am tempted to merge this to 'next'
>  soonish.
> 
>  Opinions?

There was a v9 of this series [*1*], which hasn't been picked up yet.
Was that intentional, or an oversight?

I left some comments on that iteration.  Some were just style nits,
but I think at least [*2*] should be addressed before we merge this
down to master, not sure if any of my other comments apply to v8 as
well.  I'm happy to send fixup patches, or patches on top of
this series for that and my other comments, should they apply to v8,
or wait for Paul-Sebastian to send a re-roll.  What do you prefer?

[*1*]: 
[*2*]: <20180930174848.ge2...@hank.intra.tgummerer.com>

Re: [PATCH v9 19/21] stash: convert `stash--helper.c` into `stash.c`

2018-10-02 Thread Thomas Gummerer

On 09/26, Paul-Sebastian Ungureanu wrote:
> The old shell script `git-stash.sh`  was removed and replaced
> entirely by `builtin/stash.c`. In order to do that, `create` and
> `push` were adapted to work without `stash.sh`. For example, before
> this commit, `git stash create` called `git stash--helper create
> --message "$*"`. If it called `git stash--helper create "$@"`, then
> some of these changes wouldn't have been necessary.
> 
> This commit also removes the word `helper` since now stash is
> called directly and not by a shell script.
> 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  .gitignore   |   1 -
>  Makefile |   3 +-
>  builtin.h|   2 +-
>  builtin/{stash--helper.c => stash.c} | 162 ---
>  git-stash.sh | 153 -
>  git.c|   2 +-
>  6 files changed, 98 insertions(+), 225 deletions(-)
>  rename builtin/{stash--helper.c => stash.c} (90%)
>  delete mode 100755 git-stash.sh
>
> [...]
>
> @@ -1571,7 +1562,44 @@ int cmd_stash__helper(int argc, const char **argv, 
> const char *prefix)
>   return !!push_stash(argc, argv, prefix);
>   else if (!strcmp(argv[0], "save"))
>   return !!save_stash(argc, argv, prefix);
> + else if (*argv[0] != '-')
> + usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]),
> +   git_stash_usage, options);
> +
> + if (strcmp(argv[0], "-p")) {
> + while (++i < argc && strcmp(argv[i], "--")) {
> + /*
> +  * `akpqu` is a string which contains all short options,
> +  * except `-m` which is verified separately.
> +  */
> + if ((strlen(argv[i]) == 2) && *argv[i] == '-' &&
> + strchr("akpqu", argv[i][1]))
> + continue;
> +
> + if (!strcmp(argv[i], "--all") ||
> + !strcmp(argv[i], "--keep-index") ||
> + !strcmp(argv[i], "--no-keep-index") ||
> + !strcmp(argv[i], "--patch") ||
> + !strcmp(argv[i], "--quiet") ||
> + !strcmp(argv[i], "--include-untracked"))
> + continue;
> +
> + /*
> +  * `-m` and `--message=` are verified separately because
> +  * they need to be immediately followed by a string
> +  * (i.e.`-m"foobar"` or `--message="foobar"`).
> +  */
> + if ((strlen(argv[i]) > 2 &&
> +  !strncmp(argv[i], "-m", 2)) ||
> + (strlen(argv[i]) > 10 &&
> +  !strncmp(argv[i], "--message=", 10)))

These 'strlen && !strncmp' calls could be replaced with
'starts_with()'.
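
A rough sketch of what that could look like.  'toy_starts_with' is a
hypothetical local helper written so that the example is
self-contained; git has its own starts_with().  One detail to keep in
mind: the quoted length checks also require at least one character
after the prefix, so a prefix test alone is slightly more permissive
and the sketch keeps that extra check explicit:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: does 'str' begin with 'prefix'? */
static int toy_starts_with(const char *str, const char *prefix)
{
	assert(str && prefix);
	return !strncmp(str, prefix, strlen(prefix));
}

/*
 * Accept "-m<msg>" and "--message=<msg>" only when something follows
 * the prefix, mirroring the 'strlen > N && !strncmp' checks.
 */
static int toy_is_message_opt(const char *arg)
{
	if (toy_starts_with(arg, "--message="))
		return arg[strlen("--message=")] != '\0';
	if (toy_starts_with(arg, "-m"))
		return arg[2] != '\0';
	return 0;
}
```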

> + continue;
> +
> + usage_with_options(git_stash_usage, options);
> + }
> + }

This is a bit more complex than what we used to have, which was just
"if it starts with a "-" it's an option", but I don't think it hurts
being more explicit here either.

>  
> - usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]),
> -   git_stash_helper_usage, options);
> + argv_array_push(, "push");
> + argv_array_pushv(, argv);
> + return !!push_stash(args.argc, args.argv, prefix);
>  }

Re: [PATCH v9 16/21] stash: convert push to builtin

2018-10-02 Thread Thomas Gummerer

On 09/26, Paul-Sebastian Ungureanu wrote:
> Add stash push to the helper.
> 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  builtin/stash--helper.c | 244 +++-
>  git-stash.sh|   6 +-
>  2 files changed, 244 insertions(+), 6 deletions(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index 49b05f2458..d79233d7ec 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -23,6 +23,9 @@ static const char * const git_stash_helper_usage[] = {
>   N_("git stash--helper ( pop | apply ) [--index] [-q|--quiet] 
> []"),
>   N_("git stash--helper branch  []"),
>   N_("git stash--helper clear"),
> + N_("git stash--helper [push [-p|--patch] [-k|--[no-]keep-index] 
> [-q|--quiet]\n"
> +"  [-u|--include-untracked] [-a|--all] [-m|--message 
> ]\n"
> +"  [--] [...]]"),
>   NULL
>  };
>  
> @@ -71,6 +74,13 @@ static const char * const git_stash_helper_create_usage[] 
> = {
>   NULL
>  };
>  
> +static const char * const git_stash_helper_push_usage[] = {
> + N_("git stash--helper [push [-p|--patch] [-k|--[no-]keep-index] 
> [-q|--quiet]\n"
> +"  [-u|--include-untracked] [-a|--all] [-m|--message 
> ]\n"
> +"  [--] [...]]"),
> + NULL
> +};
> +
>  static const char *ref_stash = "refs/stash";
>  static struct strbuf stash_index_path = STRBUF_INIT;
>  
> @@ -1088,7 +1098,7 @@ static int stash_working_tree(struct stash_info *info, 
> struct pathspec ps)
>  
>  static int do_create_stash(struct pathspec ps, char **stash_msg,
>  int include_untracked, int patch_mode,
> -struct stash_info *info)
> +struct stash_info *info, struct strbuf *patch)
>  {
>   int ret = 0;
>   int flags = 0;
> @@ -1102,7 +1112,6 @@ static int do_create_stash(struct pathspec ps, char 
> **stash_msg,
>   struct strbuf commit_tree_label = STRBUF_INIT;
>   struct strbuf untracked_files = STRBUF_INIT;
>   struct strbuf stash_msg_buf = STRBUF_INIT;
> - struct strbuf patch = STRBUF_INIT;
>  
>   read_cache_preload(NULL);
>   refresh_cache(REFRESH_QUIET);
> @@ -1152,7 +1161,7 @@ static int do_create_stash(struct pathspec ps, char 
> **stash_msg,
>   untracked_commit_option = 1;
>   }
>   if (patch_mode) {
> - ret = stash_patch(info, ps, );
> + ret = stash_patch(info, ps, patch);
>   *stash_msg = NULL;
>   if (ret < 0) {
>   fprintf_ln(stderr, _("Cannot save the current worktree 
> state"));
> @@ -1221,7 +1230,8 @@ static int create_stash(int argc, const char **argv, 
> const char *prefix)
>0);
>  
>   memset(, 0, sizeof(ps));
> - ret = do_create_stash(ps, _msg, include_untracked, 0, );
> + ret = do_create_stash(ps, _msg, include_untracked, 0, ,
> +   NULL);
>  
>   if (!ret)
>   printf_ln("%s", oid_to_hex(_commit));
> @@ -1234,6 +1244,230 @@ static int create_stash(int argc, const char **argv, 
> const char *prefix)
>   return ret < 0;
>  }
>  
> +static int do_push_stash(struct pathspec ps, char *stash_msg, int quiet,
> +  int keep_index, int patch_mode, int include_untracked)
> +{
> + int ret = 0;
> + struct stash_info info;
> + struct strbuf patch = STRBUF_INIT;
> +
> + if (patch_mode && keep_index == -1)
> + keep_index = 1;
> +
> + if (patch_mode && include_untracked) {
> + fprintf_ln(stderr, _("Can't use --patch and --include-untracked"
> +  " or --all at the same time"));
> + ret = -1;

We should set "stash_msg" to NULL here, otherwise we'll get an invalid
free if these options are used together and a message is given.

In general I find this API and the do_create_stash API a bit
cumbersome to use.  Maybe it would help if we'd 'xstrdup' the string
before passing it in, so we can free it unconditionally, without
setting it to NULL everywhere.  As it's only a single strdup for each
command that's run that isn't very costly compared to what else we're
doing here, and I think the readability we're gaining would be worth
it.
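
A minimal sketch of that ownership convention, with hypothetical names
('toy_xstrdup', 'toy_push', 'toy_push_from_arg') rather than the real
stash functions: the caller always passes a heap copy (or NULL), and
the callee frees it on every path, so no error branch has to reset the
pointer:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an xstrdup()-style helper: copy or abort. */
static char *toy_xstrdup(const char *s)
{
	size_t n;
	char *copy;

	assert(s);
	n = strlen(s) + 1;
	copy = malloc(n);
	if (!copy)
		abort();
	memcpy(copy, s, n);
	return copy;
}

/*
 * Takes ownership of 'msg' (may be NULL) and frees it on every path,
 * including the early error return.
 */
static int toy_push(char *msg, int patch_mode, int include_untracked)
{
	int ret = 0;

	if (patch_mode && include_untracked) {
		ret = -1;
		goto done;
	}

	/* ... the real work would use 'msg' here ... */

done:
	free(msg);	/* free(NULL) is a no-op */
	return ret;
}

/* Caller side: duplicate once, then hand the copy over. */
static int toy_push_from_arg(const char *arg, int patch_mode,
			     int include_untracked)
{
	return toy_push(arg ? toy_xstrdup(arg) : NULL,
			patch_mode, include_untracked);
}
```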

> + goto done;
> + }
> +
> + read_cache_preload(NULL);
> + if (!include_untracked && ps.nr) {
> + int i;
> + char *ps_matched = xcalloc(ps.nr, 1);
> +
> + for (i = 0; i < active_nr; i++)
> + ce_path_match(_index, active_cache[i], ,
> +   ps_matched);
> +
> + if (report_path_error(ps_matched, , NULL)) {
> + fprintf_ln(stderr, _("Did you forget to 'git add'?"));
> + stash_msg = NULL;
> + ret = -1;

'ps_matched' is not being free'd in this error case.
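
One common way to avoid this kind of leak is to route every exit
through a single cleanup label.  The sketch below is illustrative only:
'toy_check_paths' and its parameters are hypothetical and merely mimic
the shape of the quoted code (a scratch array of match flags that is
needed only for the duration of the check):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return 0 if every entry of 'paths' appears in 'known', -1 otherwise.
 * The scratch array is freed on both the success and the error path.
 */
static int toy_check_paths(const char **paths, size_t nr,
			   const char **known, size_t known_nr)
{
	int ret = 0;
	size_t i, j;
	char *matched = calloc(nr ? nr : 1, 1);

	assert(paths && known);
	if (!matched)
		abort();

	for (i = 0; i < nr; i++)
		for (j = 0; j < known_nr; j++)
			if (!strcmp(paths[i], known[j]))
				matched[i] = 1;

	for (i = 0; i < nr; i++) {
		if (!matched[i]) {
			ret = -1;
			goto done;	/* still reaches free() below */
		}
	}

done:
	free(matched);
	return ret;
}
```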

> + goto done;
> +

Re: [PATCH v9 15/21] stash: convert create to builtin

2018-10-02 Thread Thomas Gummerer

On 09/26, Paul-Sebastian Ungureanu wrote:
> Add stash create to the helper.
> 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  builtin/stash--helper.c | 450 
>  git-stash.sh|   2 +-
>  2 files changed, 451 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index b7421b68aa..49b05f2458 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -12,6 +12,9 @@
>  #include "rerere.h"
>  #include "revision.h"
>  #include "log-tree.h"
> +#include "diffcore.h"
> +
> +#define INCLUDE_ALL_FILES 2
>  
>  static const char * const git_stash_helper_usage[] = {
>   N_("git stash--helper list []"),
> @@ -63,6 +66,11 @@ static const char * const git_stash_helper_store_usage[] = 
> {
>   NULL
>  };
>  
> +static const char * const git_stash_helper_create_usage[] = {
> + N_("git stash--helper create []"),
> + NULL
> +};
> +
>  static const char *ref_stash = "refs/stash";
>  static struct strbuf stash_index_path = STRBUF_INIT;
>  
> @@ -289,6 +297,24 @@ static int reset_head(void)
>   return run_command();
>  }
>  
> +static void add_diff_to_buf(struct diff_queue_struct *q,
> + struct diff_options *options,
> + void *data)
> +{
> + int i;
> +
> + for (i = 0; i < q->nr; i++) {
> + strbuf_addstr(data, q->queue[i]->one->path);
> +
> + /*
> +  * The reason we add "0" at the end of this strbuf
> +  * is because we will pass the output further to
> +  * "git update-index -z ...".
> +  */
> + strbuf_addch(data, 0);

I'd find it slightly clearer to pass '\0' instead of 0 here.  That
makes it immediately clear to the reader that we mean the NUL
character (even though both spell the same value).
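
As a rough illustration of the NUL-separated format that
"update-index -z" style consumers expect, here is a small standalone
sketch.  It uses a plain char array instead of a strbuf, and
add_path_nul() and count_nul_entries() are hypothetical helpers, not
functions from git.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append `path` plus a terminating NUL to `buf` at offset `*len`.
 * Assumes *len <= cap.  Returns 0 on success, -1 if the entry would
 * not fit in `cap` bytes.
 */
static int add_path_nul(char *buf, size_t cap, size_t *len, const char *path)
{
	size_t n = strlen(path);

	if (n + 1 > cap - *len)
		return -1;
	memcpy(buf + *len, path, n);
	buf[*len + n] = '\0';	/* the separator, spelled as a character */
	*len += n + 1;
	return 0;
}

/* Count the NUL-terminated entries in the first `len` bytes of `buf`. */
static size_t count_nul_entries(const char *buf, size_t len)
{
	size_t i, count = 0;

	for (i = 0; i < len; i++)
		if (buf[i] == '\0')
			count++;
	return count;
}
```

Writing the separator as '\0' here documents the intent in the same way
the comment in the patch tries to.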

> + }
> +}
> +
>  static int get_newly_staged(struct strbuf *out, struct object_id *c_tree)
>  {
>   struct child_process cp = CHILD_PROCESS_INIT;
> @@ -786,6 +812,428 @@ static int store_stash(int argc, const char **argv, 
> const char *prefix)
>   return do_store_stash(, stash_msg, quiet);
>  }
>  
> +static void add_pathspecs(struct argv_array *args,
> +struct pathspec ps) {

This indentation looks a bit weird.

> + int i;
> +
> + for (i = 0; i < ps.nr; i++)
> + argv_array_push(args, ps.items[i].match);
> +}
> +
> +/*
> + * `untracked_files` will be filled with the names of untracked files.
> + * The return value is:
> + *
> + * = 0 if there are not any untracked files
> + * > 0 if there are untracked files
> + */
> +static int get_untracked_files(struct pathspec ps, int include_untracked,
> +struct strbuf *untracked_files)
> +{
> + int i;
> + int max_len;
> + int found = 0;
> + char *seen;
> + struct dir_struct dir;
> +
> + memset(, 0, sizeof(dir));
> + if (include_untracked != INCLUDE_ALL_FILES)
> + setup_standard_excludes();
> +
> + seen = xcalloc(ps.nr, 1);
> +
> + max_len = fill_directory(, the_repository->index, );
> + for (i = 0; i < dir.nr; i++) {
> + struct dir_entry *ent = dir.entries[i];
> + if (dir_path_match(_index, ent, , max_len, seen)) {
> + found++;
> + strbuf_addstr(untracked_files, ent->name);
> + /* NUL-terminate: will be fed to update-index -z */
> + strbuf_addch(untracked_files, 0);
> + }
> + free(ent);
> + }
> +
> + free(seen);
> + free(dir.entries);
> + free(dir.ignored);
> + clear_directory();
> + return found;
> +}
> +
> +/*
> + * The return value of `check_changes()` can be:
> + *
> + * < 0 if there was an error
> + * = 0 if there are no changes.
> + * > 0 if there are changes.
> + */
> +static int check_changes(struct pathspec ps, int include_untracked)
> +{
> + int result;
> + int ret = 0;

This variable is unused, so compilation with -Werror breaks at this
patch.

> + struct rev_info rev;
> + struct object_id dummy;
> + struct strbuf out = STRBUF_INIT;
> +
> + /* No initial commit. */
> + if (get_oid("HEAD", ))
> + return -1;
> +
> + if (read_cache() < 0)
> + return -1;
> +
> + init_revisions(, NULL);
> + rev.prune_data = ps;
> +
> + rev.diffopt.flags.quick = 1;
> + rev.diffopt.flags.ignore_submodules = 1;
> + rev.abbrev = 0;
> +
> + add_head_to_pending();
> + diff_setup_done();
> +
> + result = run_diff_index(, 1);
> + if (diff_result_code(, result))
> + return 1;
> +
> + object_array_clear();
> + result = run_diff_files(, 0);
> + if (diff_result_code(, result))
> + return 1;
> +
> + if (include_untracked && get_untracked_files(ps, include_untracked,
> +

Re: [PATCH v9 13/21] stash: mention options in `show` synopsis.

2018-10-02 Thread Thomas Gummerer

> Subject: stash: mention options in `show` synopsis.

Really minor point, but the '.' at the end should be dropped.

Also as this is fixing a pre-existing problem I would have put this
patch near the beginning of the series, rather than in between
conversions of functions, and just incorporated the bits that changed
in the helper into the patch that converts 'git stash show'.

The rest of the patch looks good to me.

On 09/26, Paul-Sebastian Ungureanu wrote:
> Mention in the usage text and in the documentation, that `show`
> accepts any option known to `git diff`.
> 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  Documentation/git-stash.txt | 4 ++--
>  builtin/stash--helper.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
> index 7ef8c47911..e31ea7d303 100644
> --- a/Documentation/git-stash.txt
> +++ b/Documentation/git-stash.txt
> @@ -9,7 +9,7 @@ SYNOPSIS
>  
>  [verse]
>  'git stash' list []
> -'git stash' show []
> +'git stash' show [] []
>  'git stash' drop [-q|--quiet] []
>  'git stash' ( pop | apply ) [--index] [-q|--quiet] []
>  'git stash' branch  []
> @@ -106,7 +106,7 @@ stash@{1}: On master: 9cc0589... Add git-stash
>  The command takes options applicable to the 'git log'
>  command to control what is shown and how. See linkgit:git-log[1].
>  
> -show []::
> +show [] []::
>  
>   Show the changes recorded in the stash entry as a diff between the
>   stashed contents and the commit back when the stash entry was first
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index 1bc838ee6b..1f02f5f2e9 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -15,7 +15,7 @@
>  
>  static const char * const git_stash_helper_usage[] = {
>   N_("git stash--helper list []"),
> - N_("git stash--helper show []"),
> + N_("git stash--helper show [] []"),
>   N_("git stash--helper drop [-q|--quiet] []"),
>   N_("git stash--helper ( pop | apply ) [--index] [-q|--quiet] 
> []"),
>   N_("git stash--helper branch  []"),
> @@ -29,7 +29,7 @@ static const char * const git_stash_helper_list_usage[] = {
>  };
>  
>  static const char * const git_stash_helper_show_usage[] = {
> - N_("git stash--helper show []"),
> + N_("git stash--helper show [] []"),
>   NULL
>  };
>  
> -- 
> 2.19.0.rc0.23.g1fb9f40d88
>

Re: [PATCH v9 09/21] stash: convert branch to builtin

2018-09-30 Thread Thomas Gummerer

On 09/26, Paul-Sebastian Ungureanu wrote:
> From: Joel Teichroeb 
> 
> Add stash branch to the helper and delete the apply_to_branch
> function from the shell script.
> 
> Checkout does not currently provide a function for checking out
> a branch as cmd_checkout does a large amount of sanity checks
> first that we require here.
> 
> Signed-off-by: Joel Teichroeb 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  builtin/stash--helper.c | 46 +
>  git-stash.sh| 17 ++-
>  2 files changed, 48 insertions(+), 15 deletions(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index 72472eaeb7..5841bd0e98 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -14,6 +14,7 @@
>  static const char * const git_stash_helper_usage[] = {
>   N_("git stash--helper drop [-q|--quiet] []"),
>   N_("git stash--helper apply [--index] [-q|--quiet] []"),
> + N_("git stash--helper branch  []"),
>   N_("git stash--helper clear"),
>   NULL
>  };
> @@ -28,6 +29,11 @@ static const char * const git_stash_helper_apply_usage[] = 
> {
>   NULL
>  };
>  
> +static const char * const git_stash_helper_branch_usage[] = {
> + N_("git stash--helper branch  []"),
> + NULL
> +};
> +
>  static const char * const git_stash_helper_clear_usage[] = {
>   N_("git stash--helper clear"),
>   NULL
> @@ -536,6 +542,44 @@ static int drop_stash(int argc, const char **argv, const 
> char *prefix)
>   return ret;
>  }
>  
> +static int branch_stash(int argc, const char **argv, const char *prefix)
> +{
> + int ret;
> + const char *branch = NULL;
> + struct stash_info info;
> + struct child_process cp = CHILD_PROCESS_INIT;
> + struct option options[] = {
> + OPT_END()
> + };
> +
> + argc = parse_options(argc, argv, prefix, options,
> +  git_stash_helper_branch_usage, 0);
> +
> + if (!argc) {
> + fprintf_ln(stderr, "No branch name specified");

This should be marked for translation.

> + return -1;
> + }
> +
> + branch = argv[0];
> +
> + if (get_stash_info(, argc - 1, argv + 1))
> + return -1;
> +
> + cp.git_cmd = 1;
> + argv_array_pushl(, "checkout", "-b", NULL);
> + argv_array_push(, branch);
> + argv_array_push(, oid_to_hex(_commit));
> + ret = run_command();
> + if (!ret)
> + ret = do_apply_stash(prefix, , 1, 0);
> + if (!ret && info.is_stash_ref)
> + ret = do_drop_stash(prefix, , 0);
> +
> + free_stash_info();
> +
> + return ret;
> +}
> +
>  int cmd_stash__helper(int argc, const char **argv, const char *prefix)
>  {
>   pid_t pid = getpid();
> @@ -562,6 +606,8 @@ int cmd_stash__helper(int argc, const char **argv, const 
> char *prefix)
>   return !!clear_stash(argc, argv, prefix);
>   else if (!strcmp(argv[0], "drop"))
>   return !!drop_stash(argc, argv, prefix);
> + else if (!strcmp(argv[0], "branch"))
> + return !!branch_stash(argc, argv, prefix);
>  
>   usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]),
> git_stash_helper_usage, options);
> diff --git a/git-stash.sh b/git-stash.sh
> index a99d5dc9e5..29d9f44255 100755
> --- a/git-stash.sh
> +++ b/git-stash.sh
> @@ -598,20 +598,6 @@ drop_stash () {
>   clear_stash
>  }
>  
> -apply_to_branch () {
> - test -n "$1" || die "$(gettext "No branch name specified")"
> - branch=$1
> - shift 1
> -
> - set -- --index "$@"
> - assert_stash_like "$@"
> -
> - git checkout -b $branch $REV^ &&
> - apply_stash "$@" && {
> - test -z "$IS_STASH_REF" || drop_stash "$@"
> - }
> -}
> -
>  test "$1" = "-p" && set "push" "$@"
>  
>  PARSE_CACHE='--not-parsed'
> @@ -673,7 +659,8 @@ pop)
>   ;;
>  branch)
>   shift
> - apply_to_branch "$@"
> + cd "$START_DIR"
> + git stash--helper branch "$@"
>   ;;
>  *)
>   case $# in
> -- 
> 2.19.0.rc0.23.g1fb9f40d88
>

Re: [PATCH v9 07/21] stash: convert apply to builtin

2018-09-30 Thread Thomas Gummerer

On 09/26, Paul-Sebastian Ungureanu wrote:
> From: Joel Teichroeb 
> 
> Add a builtin helper for performing stash commands. Converting
> all at once proved hard to review, so starting with just apply
> lets conversion get started without the other commands being
> finished.
> 
> The helper is being implemented as a drop in replacement for
> stash so that when it is complete it can simply be renamed and
> the shell script deleted.
> 
> Delete the contents of the apply_stash shell function and replace
> it with a call to stash--helper apply until pop is also
> converted.
> 
> Signed-off-by: Joel Teichroeb 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  .gitignore  |   1 +
>  Makefile|   1 +
>  builtin.h   |   1 +
>  builtin/stash--helper.c | 452 
>  git-stash.sh|  78 +--
>  git.c   |   1 +
>  6 files changed, 463 insertions(+), 71 deletions(-)
>  create mode 100644 builtin/stash--helper.c
> 
> diff --git a/.gitignore b/.gitignore
> index ffceea7d59..b59661cb88 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -157,6 +157,7 @@
>  /git-show-ref
>  /git-stage
>  /git-stash
> +/git-stash--helper
>  /git-status
>  /git-stripspace
>  /git-submodule
> diff --git a/Makefile b/Makefile
> index d03df31c2a..f900c68e69 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1093,6 +1093,7 @@ BUILTIN_OBJS += builtin/shortlog.o
>  BUILTIN_OBJS += builtin/show-branch.o
>  BUILTIN_OBJS += builtin/show-index.o
>  BUILTIN_OBJS += builtin/show-ref.o
> +BUILTIN_OBJS += builtin/stash--helper.o
>  BUILTIN_OBJS += builtin/stripspace.o
>  BUILTIN_OBJS += builtin/submodule--helper.o
>  BUILTIN_OBJS += builtin/symbolic-ref.o
> diff --git a/builtin.h b/builtin.h
> index 99206df4bd..317bc338f7 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -223,6 +223,7 @@ extern int cmd_show(int argc, const char **argv, const 
> char *prefix);
>  extern int cmd_show_branch(int argc, const char **argv, const char *prefix);
>  extern int cmd_show_index(int argc, const char **argv, const char *prefix);
>  extern int cmd_status(int argc, const char **argv, const char *prefix);
> +extern int cmd_stash__helper(int argc, const char **argv, const char 
> *prefix);
>  extern int cmd_stripspace(int argc, const char **argv, const char *prefix);
>  extern int cmd_submodule__helper(int argc, const char **argv, const char 
> *prefix);
>  extern int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> new file mode 100644
> index 00..7819dae332
> --- /dev/null
> +++ b/builtin/stash--helper.c
> @@ -0,0 +1,452 @@
> +#include "builtin.h"
> +#include "config.h"
> +#include "parse-options.h"
> +#include "refs.h"
> +#include "lockfile.h"
> +#include "cache-tree.h"
> +#include "unpack-trees.h"
> +#include "merge-recursive.h"
> +#include "argv-array.h"
> +#include "run-command.h"
> +#include "dir.h"
> +#include "rerere.h"
> +
> +static const char * const git_stash_helper_usage[] = {
> + N_("git stash--helper apply [--index] [-q|--quiet] []"),
> + NULL
> +};
> +
> +static const char * const git_stash_helper_apply_usage[] = {
> + N_("git stash--helper apply [--index] [-q|--quiet] []"),
> + NULL
> +};
> +
> +static const char *ref_stash = "refs/stash";
> +static struct strbuf stash_index_path = STRBUF_INIT;
> +
> +/*
> + * w_commit is set to the commit containing the working tree
> + * b_commit is set to the base commit
> + * i_commit is set to the commit containing the index tree
> + * u_commit is set to the commit containing the untracked files tree
> + * w_tree is set to the working tree
> + * b_tree is set to the base tree
> + * i_tree is set to the index tree
> + * u_tree is set to the untracked files tree
> + */
> +
> +struct stash_info {
> + struct object_id w_commit;
> + struct object_id b_commit;
> + struct object_id i_commit;
> + struct object_id u_commit;
> + struct object_id w_tree;
> + struct object_id b_tree;
> + struct object_id i_tree;
> + struct object_id u_tree;
> + struct strbuf revision;
> + int is_stash_ref;
> + int has_u;
> +};
> +
> +static void free_stash_info(struct stash_info *info)
> +{
> + strbuf_release(>revision);
> +}
> +
> +static void assert_stash_like(struct stash_info *info, const char *revision)
> +{
> + if (get_oidf(>b_commit, "%s^1", revision) ||
> + get_oidf(>w_tree, "%s:", revision) ||
> + get_oidf(>b_tree, "%s^1:", revision) ||
> + get_oidf(>i_tree, "%s^2:", revision)) {
> + free_stash_info(info);
> + error(_("'%s' is not a stash-like commit"), revision);
> + exit(128);

This seems to just emulate 'die()'.  Can we just use that directly?
The only reason I could imagine for not doing that would be to keep
the same exit code, or the exact same message we had in the shell
script.  But we're doing neither here.  The

Re: [PATCH v9 04/21] stash: update test cases conform to coding guidelines

2018-09-30 Thread Thomas Gummerer

> Subject: stash: update test cases conform to coding guidelines

s/stash/t3903/
s/conform/to &/

Alternatively the subject could also be "t3903: modernize style",
which would be a bit shorter, and still convey the same information to
a reader of 'git log --oneline'.

On 09/26, Paul-Sebastian Ungureanu wrote:
> Removed whitespaces after redirection operators.

s/Removed/Remove/.  Commit messages should always use the imperative
mood, as described in Documentation/SubmittingPatches.

> 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  t/t3903-stash.sh | 120 ---
>  1 file changed, 61 insertions(+), 59 deletions(-)
> 
> diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
> index af7586d43d..de6cab1fe7 100755
> --- a/t/t3903-stash.sh
> +++ b/t/t3903-stash.sh
> @@ -8,22 +8,22 @@ test_description='Test git stash'
>  . ./test-lib.sh
>  
>  test_expect_success 'stash some dirty working directory' '
> - echo 1 > file &&
> + echo 1 >file &&
>   git add file &&
>   echo unrelated >other-file &&
>   git add other-file &&
>   test_tick &&
>   git commit -m initial &&
> - echo 2 > file &&
> + echo 2 >file &&
>   git add file &&
> - echo 3 > file &&
> + echo 3 >file &&
>   test_tick &&
>   git stash &&
>   git diff-files --quiet &&
>   git diff-index --cached --quiet HEAD
>  '
>  
> -cat > expect << EOF
> +cat >expect <<EOF
>  diff --git a/file b/file
>  index 0cfbf08..00750ed 100644
>  --- a/file
> @@ -35,7 +35,7 @@ EOF
>  
>  test_expect_success 'parents of stash' '
>   test $(git rev-parse stash^) = $(git rev-parse HEAD) &&
> - git diff stash^2..stash > output &&
> + git diff stash^2..stash >output &&
>   test_cmp output expect
>  '
>  
> @@ -74,7 +74,7 @@ test_expect_success 'apply stashed changes' '
>  
>  test_expect_success 'apply stashed changes (including index)' '
>   git reset --hard HEAD^ &&
> - echo 6 > other-file &&
> + echo 6 >other-file &&
>   git add other-file &&
>   test_tick &&
>   git commit -m other-file &&
> @@ -99,12 +99,12 @@ test_expect_success 'stash drop complains of extra 
> options' '
>  
>  test_expect_success 'drop top stash' '
>   git reset --hard &&
> - git stash list > stashlist1 &&
> - echo 7 > file &&
> + git stash list >expected &&
> + echo 7 >file &&
>   git stash &&
>   git stash drop &&
> - git stash list > stashlist2 &&
> - test_cmp stashlist1 stashlist2 &&
> + git stash list >actual &&
> + test_cmp expected actual &&
>   git stash apply &&
>   test 3 = $(cat file) &&
>   test 1 = $(git show :file) &&
> @@ -113,9 +113,9 @@ test_expect_success 'drop top stash' '
>  
>  test_expect_success 'drop middle stash' '
>   git reset --hard &&
> - echo 8 > file &&
> + echo 8 >file &&
>   git stash &&
> - echo 9 > file &&
> + echo 9 >file &&
>   git stash &&
>   git stash drop stash@{1} &&
>   test 2 = $(git stash list | wc -l) &&
> @@ -160,7 +160,7 @@ test_expect_success 'stash pop' '
>   test 0 = $(git stash list | wc -l)
>  '
>  
> -cat > expect << EOF
> +cat >expect <<EOF
>  diff --git a/file2 b/file2
>  new file mode 100644
>  index 000..1fe912c
> @@ -170,7 +170,7 @@ index 000..1fe912c
>  +bar2
>  EOF
>  
> -cat > expect1 << EOF
> +cat >expect1 <<EOF
>  diff --git a/file b/file
>  index 257cc56..5716ca5 100644
>  --- a/file
> @@ -180,7 +180,7 @@ index 257cc56..5716ca5 100644
>  +bar
>  EOF
>  
> -cat > expect2 << EOF
> +cat >expect2 <<EOF
>  diff --git a/file b/file
>  index 7601807..5716ca5 100644
>  --- a/file
> @@ -198,79 +198,79 @@ index 000..1fe912c
>  EOF
>  
>  test_expect_success 'stash branch' '
> - echo foo > file &&
> + echo foo >file &&
>   git commit file -m first &&
> - echo bar > file &&
> - echo bar2 > file2 &&
> + echo bar >file &&
> + echo bar2 >file2 &&
>   git add file2 &&
>   git stash &&
> - echo baz > file &&
> + echo baz >file &&
>   git commit file -m second &&
>   git stash branch stashbranch &&
>   test refs/heads/stashbranch = $(git symbolic-ref HEAD) &&
>   test $(git rev-parse HEAD) = $(git rev-parse master^) &&
> - git diff --cached > output &&
> + git diff --cached >output &&
>   test_cmp output expect &&
> - git diff > output &&
> + git diff >output &&
>   test_cmp output expect1 &&
>   git add file &&
>   git commit -m alternate\ second &&
> - git diff master..stashbranch > output &&
> + git diff master..stashbranch >output &&
>   test_cmp output expect2 &&
>   test 0 = $(git stash list | wc -l)
>  '
>  
>  test_expect_success 'apply -q is quiet' '
> - echo foo > file &&
> + echo foo >file &&
>   git stash &&
> - git stash apply -q > output.out 2>&1 &&
> + git stash apply -q >output.out 2>&1 &&
>   test_must_be_empty output.out
>  '
>  
>  test_expect_success 'save -q is quiet' '
> - git

Re: [PATCH v9 02/21] strbuf.c: add `strbuf_join_argv()`

2018-09-30 Thread Thomas Gummerer

On 09/26, Paul-Sebastian Ungureanu wrote:
> Implement `strbuf_join_argv()` to join arguments
> into a strbuf.
> 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  strbuf.c | 15 +++
>  strbuf.h |  7 +++
>  2 files changed, 22 insertions(+)
> 
> diff --git a/strbuf.c b/strbuf.c
> index 64041c3c24..3eb431b2b0 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -259,6 +259,21 @@ void strbuf_addbuf(struct strbuf *sb, const struct 
> strbuf *sb2)
>   strbuf_setlen(sb, sb->len + sb2->len);
>  }
>  
> +const char *strbuf_join_argv(struct strbuf *buf,
> +  int argc, const char **argv, char delim)
> +{
> + if (!argc)
> + return buf->buf;
> +
> + strbuf_addstr(buf, *argv);
> + while (--argc) {
> + strbuf_addch(buf, delim);
> + strbuf_addstr(buf, *(++argv));
> + }
> +
> + return buf->buf;

Why are we returning buf->buf here?  The strbuf is modified by the
function, so the caller can just use buf->buf directly if they want
to.  Is there something I'm missing?
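
For comparison, a variant that returns nothing could look roughly like
the sketch below.  This is only an illustration under assumptions:
struct mini_buf is a tiny fixed-size stand-in for struct strbuf, and
mini_buf_add() and mini_buf_join_argv() are hypothetical names, not
part of strbuf.h.

```c
#include <stddef.h>
#include <string.h>

/* Tiny fixed-size stand-in for struct strbuf, for illustration only. */
struct mini_buf {
	char buf[128];
	size_t len;
};

/* Append `n` bytes of `s`, truncating rather than overflowing. */
static void mini_buf_add(struct mini_buf *sb, const char *s, size_t n)
{
	size_t room = sizeof(sb->buf) - 1 - sb->len;

	if (n > room)
		n = room;
	memcpy(sb->buf + sb->len, s, n);
	sb->len += n;
	sb->buf[sb->len] = '\0';
}

/* Join argv with `delim`.  Nothing is returned; the result lives in `sb`. */
static void mini_buf_join_argv(struct mini_buf *sb, int argc,
			       const char **argv, char delim)
{
	int i;

	for (i = 0; i < argc; i++) {
		if (i)
			mini_buf_add(sb, &delim, 1);
		mini_buf_add(sb, argv[i], strlen(argv[i]));
	}
}
```

The caller already holds the buffer, so it can read sb.buf after the
call without needing a return value.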

> +}
> +
>  void strbuf_addchars(struct strbuf *sb, int c, size_t n)
>  {
>   strbuf_grow(sb, n);
> diff --git a/strbuf.h b/strbuf.h
> index 60a35aef16..7ed859bb8a 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -284,6 +284,13 @@ static inline void strbuf_addstr(struct strbuf *sb, 
> const char *s)
>   */
>  extern void strbuf_addbuf(struct strbuf *sb, const struct strbuf *sb2);
>  
> +

Stray newline?  We usually only have one blank line between functions.

> +/**
> + *
> + */

Forgot to write some documentation here? :)

> +extern const char *strbuf_join_argv(struct strbuf *buf, int argc,
> + const char **argv, char delim);
> +
>  /**
>   * This function can be used to expand a format string containing
>   * placeholders. To that end, it parses the string and calls the specified
> -- 
> 2.19.0.rc0.23.g1fb9f40d88
>

Re: Null pointer dereference in rerere.c

2018-09-27 Thread Thomas Gummerer

On 09/27, Ruud van Asseldonk wrote:
> Hi,
> 
> I just ran into a segmentation fault during a rebase with rerere
> enabled. Inspecting the core dump with gdb shows:

Thanks for reporting this bug.

> (gdb) bt
> #0  0x55d673375ce0 in do_rerere_one_path (update=0x7fff03c37f30,
> rr_item=0x55d6746d0b30) at rerere.c:755
> #1  do_plain_rerere (fd=3, rr=0x7fff03c37ef0) at rerere.c:853
> #2  rerere (flags=flags@entry=0) at rerere.c:918
> #3  0x55d673246b01 in am_resolve (state=0x7fff03c38120) at 
> builtin/am.c:1901
> #4  cmd_am (argc=, argv=,
> prefix=) at builtin/am.c:2394
> #5  0x55d67323f975 in run_builtin (argv=,
> argc=, p=) at git.c:346
> #6  handle_builtin (argc=, argv=) at git.c:554
> #7  0x55d6732405e5 in run_argv (argv=0x7fff03c394a0,
> argcp=0x7fff03c394ac) at git.c:606
> #8  cmd_main (argc=, argv=) at git.c:683
> #9  0x55d67323f64a in main (argc=4, argv=0x7fff03c396f8) at 
> common-main.c:43
> (gdb) info locals
> path = 0x55d6746d08e0 ""
> id = 0x55d6746d01e0
> rr_dir = 0x55d6746ccb80
> variant = 
> path = 
> id = 
> rr_dir = 
> variant = 
> both = 
> vid = 
> path = 
> (gdb) print id
> $1 = (struct rerere_id *) 0x55d6746d01e0
> (gdb) print id->collection
> $2 = (struct rerere_dir *) 0x55d6746ccb80
> (gdb) print id->collection->status
> $3 = (unsigned char *) 0x0
> 
> This is using Git 2.17.1 from the 1:2.17.1-1ubuntu0.1 Ubuntu package.
> Looking at the diff between v2.17.1 and master for rerere.c it looks
> like the part of the rerere.c where the null pointer dereference
> happens has not been touched, so the issue might still be there.
> Unfortunately I was unable to reproduce the bug; after removing
> .git/MERGE_RR.lock and restarting the rebase, it completed fine.

I do believe this bug may actually be fixed in master, by 93406a282f
("rerere: fix crash with files rerere can't handle", 2018-08-05).  Do
you by any chance remember if you committed a file that contained
conflict markers during the rebase at some point?

The problem I found at the time looked the same as your backtrace
above in any case.

Would have been nice if you were able to reproduce it, just to make
sure it's not something else we're seeing here.

> Please let me know if there is anything I can do to help diagnose the
> problem, or whether I should report the bug to Ubuntu instead.
> 
> Kind regards,
> 
> Ruud van Asseldonk

Re: Git Test Coverage Report (Tuesday, Sept 25)

2018-09-26 Thread Thomas Gummerer

On 09/26, Derrick Stolee wrote:
> This is a bit tricky to do, but I will investigate. For some things, the
> values can conflict with each other (GIT_TEST_SPLIT_INDEX doesn't play
> nicely with other index options, I think).

Just commenting on this point.  I think all the index options should
be playing nicely with each other.  I occasionally run the test suite
with some of them turned on, and if something failed that was always
an actual bug.  The different modes can be used in different
combinations in the wild as well, so we should get them to interact
nicely in the test suite.

[PATCH v2 2/2] t5551: compare sorted cookies files

2018-09-17 Thread Thomas Gummerer

In t5551 we check that we save cookies correctly to a file when
http.cookiefile and http.savecookies are set.  To do so we create an
expect file that expects the cookies in a certain order.

However after e2ef8d6fa ("cookies: support creation-time attribute for
cookies", 2018-08-28) in curl.git (released in curl 7.61.1) that order
changed.

We document the file format as "Netscape/Mozilla cookie file
format (see curl(1))", so any format produced by libcurl should be
fine here.  Sort the files, to be agnostic to the order of the
cookies, and make the test pass with both curl versions >= 7.61.1 and
earlier curl versions.

Reported-by: Todd Zullinger 
Helped-by: Jonathan Nieder 
Signed-off-by: Thomas Gummerer 
---
 t/t5551-http-fetch-smart.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 71535631d3..3dc8f8ecec 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -207,7 +207,7 @@ test_expect_success 'cookies stored in http.cookiefile when 
http.savecookies set
cat >cookies.txt <<-\EOF &&
127.0.0.1   FALSE   /smart_cookies/ FALSE   0   othername   
othervalue
EOF
-   cat >expect_cookies.txt <<-\EOF &&
+   sort >expect_cookies.txt <<-\EOF &&
 
127.0.0.1   FALSE   /smart_cookies/ FALSE   0   othername   
othervalue
127.0.0.1   FALSE   /smart_cookies/repo.git/info/   FALSE   0   
namevalue
@@ -215,7 +215,7 @@ test_expect_success 'cookies stored in http.cookiefile when 
http.savecookies set
git config http.cookiefile cookies.txt &&
git config http.savecookies true &&
git ls-remote $HTTPD_URL/smart_cookies/repo.git master &&
-   tail -3 cookies.txt >cookies_tail.txt &&
+   tail -3 cookies.txt | sort >cookies_tail.txt &&
test_cmp expect_cookies.txt cookies_tail.txt
 '
 
-- 
2.19.0.444.g18242da7ef

[PATCH v2 0/2] t5551: compare sorted cookies files

2018-09-17 Thread Thomas Gummerer

Thanks Jonathan and Junio for the comments on the first round.

Changes since the first round:
- add a preparatory patch to modernize the test script
- add Reported-by to credit Todd
- just use 'sort' instead of 'cat | sort'

Thomas Gummerer (2):
  t5551: move setup code inside test_expect blocks
  t5551: compare sorted cookies files

 t/t5551-http-fetch-smart.sh | 68 ++---
 1 file changed, 34 insertions(+), 34 deletions(-)

-- 
2.19.0.444.g18242da7ef

[PATCH v2 1/2] t5551: move setup code inside test_expect blocks

2018-09-17 Thread Thomas Gummerer

Move setup code inside test_expect blocks, to catch unexpected
failures in the setup steps, and bring the test scripts in line with
our modern test style.

Suggested-by: Jonathan Nieder 
Signed-off-by: Thomas Gummerer 
---
 t/t5551-http-fetch-smart.sh | 66 ++---
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 771f36f9ff..71535631d3 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -23,26 +23,26 @@ test_expect_success 'create http-accessible bare 
repository' '
 
 setup_askpass_helper
 
-cat >exp <<EOF
-> GET /smart/repo.git/info/refs?service=git-upload-pack HTTP/1.1
-> Accept: */*
-> Accept-Encoding: ENCODINGS
-> Pragma: no-cache
-< HTTP/1.1 200 OK
-< Pragma: no-cache
-< Cache-Control: no-cache, max-age=0, must-revalidate
-< Content-Type: application/x-git-upload-pack-advertisement
-> POST /smart/repo.git/git-upload-pack HTTP/1.1
-> Accept-Encoding: ENCODINGS
-> Content-Type: application/x-git-upload-pack-request
-> Accept: application/x-git-upload-pack-result
-> Content-Length: xxx
-< HTTP/1.1 200 OK
-< Pragma: no-cache
-< Cache-Control: no-cache, max-age=0, must-revalidate
-< Content-Type: application/x-git-upload-pack-result
-EOF
 test_expect_success 'clone http repository' '
+   cat >exp <<-\EOF &&
+   > GET /smart/repo.git/info/refs?service=git-upload-pack HTTP/1.1
+   > Accept: */*
+   > Accept-Encoding: ENCODINGS
+   > Pragma: no-cache
+   < HTTP/1.1 200 OK
+   < Pragma: no-cache
+   < Cache-Control: no-cache, max-age=0, must-revalidate
+   < Content-Type: application/x-git-upload-pack-advertisement
+   > POST /smart/repo.git/git-upload-pack HTTP/1.1
+   > Accept-Encoding: ENCODINGS
+   > Content-Type: application/x-git-upload-pack-request
+   > Accept: application/x-git-upload-pack-result
+   > Content-Length: xxx
+   < HTTP/1.1 200 OK
+   < Pragma: no-cache
+   < Cache-Control: no-cache, max-age=0, must-revalidate
+   < Content-Type: application/x-git-upload-pack-result
+   EOF
GIT_TRACE_CURL=true git clone --quiet $HTTPD_URL/smart/repo.git clone 
2>err &&
test_cmp file clone/file &&
tr '\''\015'\'' Q exp <exp <<-\EOF &&
+   GET  /smart/repo.git/info/refs?service=git-upload-pack HTTP/1.1 200
+   POST /smart/repo.git/git-upload-pack HTTP/1.1 200
+   GET  /smart/repo.git/info/refs?service=git-upload-pack HTTP/1.1 200
+   POST /smart/repo.git/git-upload-pack HTTP/1.1 200
+   EOF
check_access_log exp
 '
 
@@ -203,15 +203,15 @@ test_expect_success 'dumb clone via http-backend respects 
namespace' '
test_cmp expect actual
 '
 
-cat >cookies.txt <expect_cookies.txt <cookies.txt <<-\EOF &&
+   127.0.0.1   FALSE   /smart_cookies/ FALSE   0   othername   
othervalue
+   EOF
+   cat >expect_cookies.txt <<-\EOF &&
+
+   127.0.0.1   FALSE   /smart_cookies/ FALSE   0   othername   
othervalue
+   127.0.0.1   FALSE   /smart_cookies/repo.git/info/   FALSE   0   
namevalue
+   EOF
git config http.cookiefile cookies.txt &&
git config http.savecookies true &&
git ls-remote $HTTPD_URL/smart_cookies/repo.git master &&
-- 
2.19.0.444.g18242da7ef

Re: [PATCH] t5551: compare sorted cookies files

2018-09-17 Thread Thomas Gummerer

On 09/17, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > In t5551 we check that we save cookies correctly to a file when
> > http.cookiefile and http.savecookies are set.  To do so we create an
> > expect file that expects the cookies in a certain order.
> >
> > However after e2ef8d6fa ("cookies: support creation-time attribute for
> > cookies", 2018-08-28) in curl.git (released in curl 7.61.1) that order
> > changed.
> >
> > We document the file format as "Netscape/Mozilla cookie file
> > format (see curl(1))", so any format produced by libcurl should be
> > fine here.  Sort the files, to be agnostic to the order of the
> > cookies, and make the test pass with both curl versions > 7.61.1 and
> > earlier curl versions.
> >
> > Signed-off-by: Thomas Gummerer 
> > ---
> 
> Thanks.  f5b2c9c9 ("t5551-http-fetch-smart.sh: sort cookies before
> comparing", 2018-09-07) that came from
> 
> https://public-inbox.org/git/20180907232205.31328-1-...@pobox.com
> 
> has almost the identical patch text, and this (presumably an
> independent effort) confirms that the patch is needed.

Whoops, awkward.  This was an independent effort, but I really should
have checked 'pu' before starting to work on this.

> The other
> effort implicitly depends on the expected output being kept sorted, but
> this one is more explicit---I tend to prefer this approach as tools
> and automation are easier to maintain than having to remember that
> the source must be sorted.

I'm happy going with either patch, but if we want to go with mine, I'd
like to make sure Todd is credited appropriately, as he sent a very
similar patch first.  I'm not sure what the appropriate way to do
that is, though.

> Thanks.
> 
> >  t/t5551-http-fetch-smart.sh | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
> > index 771f36f9ff..d13b993201 100755
> > --- a/t/t5551-http-fetch-smart.sh
> > +++ b/t/t5551-http-fetch-smart.sh
> > @@ -206,7 +206,7 @@ test_expect_success 'dumb clone via http-backend 
> > respects namespace' '
> >  cat >cookies.txt <<EOF
> >  127.0.0.1  FALSE   /smart_cookies/ FALSE   0   othername   othervalue
> >  EOF
> > -cat >expect_cookies.txt <<EOF
> > +cat <<EOF | sort >expect_cookies.txt
> >  
> >  127.0.0.1  FALSE   /smart_cookies/ FALSE   0   othername   
> > othervalue
> >  127.0.0.1  FALSE   /smart_cookies/repo.git/info/   FALSE   0   name
> > value
> > @@ -215,7 +215,7 @@ test_expect_success 'cookies stored in http.cookiefile 
> > when http.savecookies set
> > git config http.cookiefile cookies.txt &&
> > git config http.savecookies true &&
> > git ls-remote $HTTPD_URL/smart_cookies/repo.git master &&
> > -   tail -3 cookies.txt >cookies_tail.txt &&
> > +   tail -3 cookies.txt | sort >cookies_tail.txt &&
> > test_cmp expect_cookies.txt cookies_tail.txt
> >  '

[PATCH] t5551: compare sorted cookies files

2018-09-17 Thread Thomas Gummerer

In t5551 we check that we save cookies correctly to a file when
http.cookiefile and http.savecookies are set.  To do so we create an
expect file that expects the cookies in a certain order.

However after e2ef8d6fa ("cookies: support creation-time attribute for
cookies", 2018-08-28) in curl.git (released in curl 7.61.1) that order
changed.

We document the file format as "Netscape/Mozilla cookie file
format (see curl(1))", so any format produced by libcurl should be
fine here.  Sort the files, to be agnostic to the order of the
cookies, and make the test pass with both curl versions >= 7.61.1 and
earlier curl versions.

Signed-off-by: Thomas Gummerer 
---
 t/t5551-http-fetch-smart.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 771f36f9ff..d13b993201 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -206,7 +206,7 @@ test_expect_success 'dumb clone via http-backend respects 
namespace' '
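 cat >cookies.txt <<EOF
 127.0.0.1  FALSE   /smart_cookies/ FALSE   0   othername   othervalue
 EOF
-cat >expect_cookies.txt <<EOF
+cat <<EOF | sort >expect_cookies.txt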
 
 127.0.0.1  FALSE   /smart_cookies/ FALSE   0   othername   
othervalue
 127.0.0.1  FALSE   /smart_cookies/repo.git/info/   FALSE   0   name
value
@@ -215,7 +215,7 @@ test_expect_success 'cookies stored in http.cookiefile when 
http.savecookies set
git config http.cookiefile cookies.txt &&
git config http.savecookies true &&
git ls-remote $HTTPD_URL/smart_cookies/repo.git master &&
-   tail -3 cookies.txt >cookies_tail.txt &&
+   tail -3 cookies.txt | sort >cookies_tail.txt &&
test_cmp expect_cookies.txt cookies_tail.txt
 '
 
-- 
2.19.0.444.g18242da7ef
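
For readers who want the idea behind the patch above spelled out: comparing
two cookie lists independently of their order boils down to sorting both
sides before comparing them.  Below is a minimal sketch of that idea, written
in C rather than shell purely for illustration.  The function name
same_lines_any_order() is made up for this example and is not part of git or
its test suite.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() callback: compare two elements of an array of C strings. */
static int compare_lines(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Hypothetical helper: returns 1 if both arrays hold the same n lines,
 * ignoring their order, and 0 otherwise.  Both arrays are sorted in
 * place, which mirrors piping both files through "sort".
 */
int same_lines_any_order(const char **expect, const char **actual, size_t n)
{
	size_t i;

	assert(expect && actual);
	qsort(expect, n, sizeof(*expect), compare_lines);
	qsort(actual, n, sizeof(*actual), compare_lines);
	for (i = 0; i < n; i++)
		if (strcmp(expect[i], actual[i]))
			return 0;
	return 1;
}
```

The point is only that neither side has to be written down in any particular
order, which is what makes the test independent of the curl version.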

[PATCH v2] linear-assignment: fix potential out of bounds memory access

2018-09-13 Thread Thomas Gummerer

Currently the 'compute_assignment()' function may read memory out
of bounds, even if used correctly.  Namely this happens when we only
have one column.  In that case we try to calculate the initial
minimum cost using '!j1' as column in the reduction transfer code.
That in turn causes us to try and get the cost from column 1 in the
cost matrix, which does not exist, and thus results in an out of
bounds memory read.
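
To illustrate the index arithmetic, here is a minimal sketch.  It is not the
code from linear-assignment.c: the helper names cost_index() and
initial_min_read_in_bounds() are made up, and the
cost[column + column_count * row] layout is my assumption about how the
COST() macro addresses the matrix.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout of the cost matrix: cost[column + column_count * row]. */
size_t cost_index(int column_count, int column, int row)
{
	assert(column_count > 0 && column >= 0 && row >= 0);
	return (size_t)column + (size_t)column_count * (size_t)row;
}

/*
 * Returns 1 if reading COST(!j1, row) stays inside the allocated
 * column_count * row_count array, and 0 if it reads past its end.
 */
int initial_min_read_in_bounds(int column_count, int row_count,
			       int j1, int row)
{
	size_t cells = (size_t)column_count * (size_t)row_count;

	return cost_index(column_count, !j1, row) < cells;
}
```

With a 1x1 matrix and j1 == 0, '!j1' is 1, so the read lands at index 1 of a
one-element array, which would be consistent with the "0 bytes after a block
of size 4" that valgrind reported.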

In the original paper [1], the example code initializes that minimum
cost to "infinite".  We could emulate something similar by setting the
minimum cost to INT_MAX, which would result in the same minimum cost
as the current algorithm, as we'd always go into the if condition at
least once, except when we only have one column, and column_count thus
equals 1.

If column_count does equal 1, the condition in the loop would always
be false, and we'd end up with a minimum of INT_MAX, which may lead to
integer overflows later in the algorithm.
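
As a rough illustration of that concern, the sketch below models a step of
the form 'v[j1] -= min' with plain ints.  The helpers sub_would_overflow()
and checked_sub() are hypothetical and only exist for this example; they are
not part of linear-assignment.c.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if "value - min" cannot be represented in an int. */
int sub_would_overflow(int value, int min)
{
	if (min > 0)
		return value < INT_MIN + min;
	return value > INT_MAX + min;
}

/* Subtract, but insist that the caller has ruled out overflow. */
int checked_sub(int value, int min)
{
	assert(!sub_would_overflow(value, min));
	return value - min;
}
```

With min left at INT_MAX, any value below -1 can no longer be reduced
without overflowing, which is why leaving the sentinel in place is risky.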

For a column count of 1, we however do not even really need to go
through the whole algorithm.  A column count of 1 means that there are
no possible assignments, and we can just zero out the column2row and
row2column arrays, and return early from the function, while keeping
the reduction transfer part of the function the same as it is
currently.

Another solution would be to just not call the 'compute_assignment()'
function from the range diff code in this case, however it's better to
make the compute_assignment function more robust, so future callers
don't run into this potential problem.

Note that the test only fails under valgrind on Linux, but the same
command has been reported to segfault on Mac OS.

[1]: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path
 algorithm for dense and sparse linear assignment
 problems. Computing, 38(4), 325–340.

Reported-by: ryenus 
Helped-by: Derrick Stolee 
Signed-off-by: Thomas Gummerer 
---
 linear-assignment.c   | 6 ++
 t/t3206-range-diff.sh | 5 +
 2 files changed, 11 insertions(+)

diff --git a/linear-assignment.c b/linear-assignment.c
index 9b3e56e283..ecffc09be6 100644
--- a/linear-assignment.c
+++ b/linear-assignment.c
@@ -19,6 +19,12 @@ void compute_assignment(int column_count, int row_count, int 
*cost,
int *free_row, free_count = 0, saved_free_count, *pred, *col;
int i, j, phase;
 
+   if (column_count < 2) {
+   memset(column2row, 0, sizeof(int) * column_count);
+   memset(row2column, 0, sizeof(int) * row_count);
+   return;
+   }
+
memset(column2row, -1, sizeof(int) * column_count);
memset(row2column, -1, sizeof(int) * row_count);
ALLOC_ARRAY(v, column_count);
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index 2237c7f4af..fb4c13a84a 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -142,4 +142,9 @@ test_expect_success 'changed message' '
test_cmp expected actual
 '
 
+test_expect_success 'no commits on one side' '
+   git commit --amend -m "new message" &&
+   git range-diff master HEAD@{1} HEAD
+'
+
 test_done
-- 
2.19.0.397.gdd90340f6a

Re: [PATCH] linear-assignment: fix potential out of bounds memory access (was: Re: Git 2.19 Segmentation fault 11 on macOS)

2018-09-13 Thread Thomas Gummerer

On 09/12, Johannes Schindelin wrote:
> Hi Thomas,
> 
> [quickly, as I will go back to a proper vacation after this]

Sorry about interrupting your vacation, enjoy wherever you are! :)

> On Wed, 12 Sep 2018, Thomas Gummerer wrote:
> 
> > diff --git a/linear-assignment.c b/linear-assignment.c
> > index 9b3e56e283..7700b80eeb 100644
> > --- a/linear-assignment.c
> > +++ b/linear-assignment.c
> > @@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, 
> > int *cost,
> > else if (j1 < -1)
> > row2column[i] = -2 - j1;
> > else {
> > -   int min = COST(!j1, i) - v[!j1];
> > -   for (j = 1; j < column_count; j++)
> > +   int min = INT_MAX;
> 
> I am worried about this, as I tried very hard to avoid integer overruns.

Ah fair enough, now I think I understand where the calculation of the
initial value of min comes from, thanks!
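
For the record, here is how I read the original initialization now.  This is
a sketch with made-up helper names, and reduced[j] stands in for
COST(j, i) - v[j].  Seeding the minimum from column '!j1' and then scanning
from column 1 covers the same columns as starting from INT_MAX and scanning
from column 0, as long as there are at least two columns.

```c
#include <assert.h>
#include <limits.h>

/* Original form: seed from column !j1, then scan columns 1..n-1. */
int min_seeded(const int *reduced, int column_count, int j1)
{
	int j, min;

	assert(column_count >= 2); /* reduced[!j1] must exist */
	min = reduced[!j1];
	for (j = 1; j < column_count; j++)
		if (j != j1 && min > reduced[j])
			min = reduced[j];
	return min;
}

/* Sentinel form: start from INT_MAX and scan every column from 0. */
int min_sentinel(const int *reduced, int column_count, int j1)
{
	int j, min = INT_MAX;

	for (j = 0; j < column_count; j++)
		if (j != j1 && min > reduced[j])
			min = reduced[j];
	return min;
}
```

The seeded form never produces INT_MAX, which avoids the overflow worry, but
it has no valid seed when there is only one column.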

> Wouldn't it be possible to replace the `else {` by an appropriate `else if
> (...) { ... } else {`? E.g. `else if (column_count < 2)` or some such?

Yes, I think that would be possible.  However if we're already special
casing "column_count < 2", I think we might as well just exit early
before running through the whole algorithm in that case.  If there's
only one column, there are no commits that can be assigned to
each other, as there is only the one.

We could also just not call 'compute_assignment' in the first
place if column_count == 1, however I'd rather make the function safer
to call, just in case we find it useful for something else in the
future.

Will send an updated patch in a bit.

> Ciao,
> Dscho
> 
> > +   for (j = 0; j < column_count; j++)
> > if (j != j1 && min > COST(j, i) - v[j])
> > min = COST(j, i) - v[j];
> > v[j1] -= min;
> > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> > index 2237c7f4af..fb4c13a84a 100755
> > --- a/t/t3206-range-diff.sh
> > +++ b/t/t3206-range-diff.sh
> > @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
> > test_cmp expected actual
> >  '
> >  
> > +test_expect_success 'no commits on one side' '
> > +   git commit --amend -m "new message" &&
> > +   git range-diff master HEAD@{1} HEAD
> > +'
> > +
> >  test_done
> > -- 
> > 2.19.0.397.gdd90340f6a
> > 
> >

Re: [PATCH v1] read-cache: add GIT_TEST_INDEX_VERSION support

2018-09-13 Thread Thomas Gummerer

On 09/13, Ben Peart wrote:
> 
> 
> On 9/12/2018 6:31 PM, Thomas Gummerer wrote:
> > On 09/12, Ben Peart wrote:
> > > Teach get_index_format_default() to support running the test suite
> > > with specific index versions.  In particular, this enables the test suite
> > > to be run using index version 4 which is not the default so gets less 
> > > testing.
> > 
> > I found this commit message slightly misleading.  Running the test
> > suite with specific index versions is already supported, by defining
> > TEST_GIT_INDEX_VERSION in 'config.mak'.  What we're doing here is
> > introducing an additional environment variable that can also be used to
> > set the index format in tests.
> > 
> > Even setting TEST_GIT_INDEX_VERSION=4 in the environment does run the
> > test suite with index-v4.  Admittedly the name is a bit strange
> > compared to our usual GIT_TEST_* environment variable names, and it
> > should probably be documented better (it's only documented in the
> > Makefile currently), but I'm not sure we should introduce another
> > environment variable for this purpose?
> 
> TEST_GIT_INDEX_VERSION enables the testing I was looking for but you're
> right, it isn't well documented and the atypical naming and implementation
> don't help either.
> 
> I checked the documentation and code but didn't see any way to test the V4
> index code path so wrote this patch.  I wonder if we can improve the
> discoverability of TEST_GIT_INDEX_VERSION through better naming and
> documentation.
> 
> How about this as an alternative?

Thanks, I do think this is a good idea.  I do however share Ævar's
concern in https://public-inbox.org/git/87h8itkz2h@evledraar.gmail.com/.
I have had TEST_GIT_INDEX_VERSION=4 set in my config.mak for quite a
long time, and had I missed this thread, I would all of a sudden not
run the test suite with index format 4 anymore without any notice.

I think the suggestion of erroring out if TEST_GIT_INDEX_VERSION is
set would be useful in this case (and probably others in which you're
renaming these variables).  Not sure how many people it would affect
(and most of those would probably read the mailing list), but it's not
a big change either.

Btw, I think it would be nice to have all these patches that rename
and document test suite variables in one series, so they are easier
to look at with more context.

> 
> diff --git a/Makefile b/Makefile
> index 5a969f5830..9e84ef02f7 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -400,7 +400,7 @@ all::
>  # (defaults to "man") if you want to have a different default when
>  # "git help" is called without a parameter specifying the format.
>  #
> -# Define TEST_GIT_INDEX_VERSION to 2, 3 or 4 to run the test suite
> +# Define GIT_TEST_INDEX_VERSION to 2, 3 or 4 to run the test suite
>  # with a different indexfile format version.  If it isn't set the index
>  # file format used is index-v[23].
>  #
> @@ -2599,8 +2599,8 @@ endif
>  ifdef GIT_INTEROP_MAKE_OPTS
> @echo GIT_INTEROP_MAKE_OPTS=\''$(subst ','\'',$(subst
> ','\'',$(GIT_INTEROP_MAKE_OPTS)))'\' >>$@+
>  endif
> -ifdef TEST_GIT_INDEX_VERSION
> -   @echo TEST_GIT_INDEX_VERSION=\''$(subst ','\'',$(subst
> ','\'',$(TEST_GIT_INDEX_VERSION)))'\' >>$@+
> +ifdef GIT_TEST_INDEX_VERSION
> +   @echo GIT_TEST_INDEX_VERSION=\''$(subst ','\'',$(subst
> ','\'',$(GIT_TEST_INDEX_VERSION)))'\' >>$@+
> 
>  endif
> @if cmp $@+ $@ >/dev/null 2>&1; then $(RM) $@+; else mv $@+ $@; fi
> 
> diff --git a/t/test-lib.sh b/t/test-lib.sh
> index 44288cbb59..31698c01c4 100644
> --- a/t/test-lib.sh
> +++ b/t/test-lib.sh
> @@ -134,9 +134,9 @@ export EDITOR
>  GIT_TRACE_BARE=1
>  export GIT_TRACE_BARE
> 
> -if test -n "${TEST_GIT_INDEX_VERSION:+isset}"
> +if test -n "${GIT_TEST_INDEX_VERSION:+isset}"
>  then
> -   GIT_INDEX_VERSION="$TEST_GIT_INDEX_VERSION"
> +   GIT_INDEX_VERSION="$GIT_TEST_INDEX_VERSION"
> export GIT_INDEX_VERSION
>  fi
> 
> diff --git a/t/README b/t/README
> index 9028b47d92..f872638a78 100644
> --- a/t/README
> +++ b/t/README
> @@ -315,10 +315,14 @@ packs on demand. This normally only happens when the
> object size is
>   over 2GB. This variable forces the code path on any object larger than
>   <n> bytes.
> 
> -GIT_TEST_OE_DELTA_SIZE=<n> exercises the uncomon pack-objects code
> +GIT_TEST_OE_DELTA_SIZE=<n> exercises the uncommon pack-objects code
>   path where deltas larger than this limit require extra memory
>   allocation for bookkeeping.
> 
> +GIT_TEST_INDEX_VERSION=<n> exercises the index read/write code path
> +for the index version specified.  Can be set to any valid version
> +but the non-default version 4 is probably the most beneficial.
> +
>   Naming Tests
>   
>

Re: [PATCH] linear-assignment: fix potential out of bounds memory access

2018-09-12 Thread Thomas Gummerer

On 09/12, Junio C Hamano wrote:
> Thomas Gummerer  writes:

> > --- >8 ---
> >
> > Subject: [PATCH] linear-assignment: fix potential out of bounds memory 
> > access
> >
> > Currently the 'compute_assignment()' function may read memory out
> > of bounds, even if used correctly.  Namely this happens when we only
> > have one column.  In that case we try to calculate the initial
> > minimum cost using '!j1' as column in the reduction transfer code.
> > That in turn causes us to try and get the cost from column 1 in the
> > cost matrix, which does not exist, and thus results in an out of
> > bounds memory read.
> 
> This nicely explains what goes wrong.
> 
> > Instead of trying to initialize the minimum cost from another column,
> > just set it to INT_MAX.  This also matches what the example code in the
> > original paper for the algorithm [1] does (it initializes the value to
> > inf, for which INT_MAX is the closest match in C).
> 
> Yeah, if we really want to avoid INT_MAX we could use another "have
> we found any value yet?" boolean variable, but the caller in
> get_correspondences() does not even worry about integer overflows
> when stuffing diffsize to the cost[] array, and the other possible
> value that can go to cost[] array is COST_MAX that is mere 65k, so
> it would be OK to use INT_MAX as sentinel here, I guess.

Right, I'm not sure it would be worth introducing another boolean
variable here.  In the normal case we'll always enter the if condition
inside the loop, and set a reasonable 'min' value.

That does not happen if we only have one column, and the 'min' will
remain 'INT_MAX'.  Now in that case it doesn't matter much, as having
only one column means there's no possibility to assign anything, so
the actual values shouldn't matter (at least that's my understanding
of the algorithm so far).
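
For comparison, the "have we found any value yet?" variant could look
roughly like the sketch below.  The function name min_excluding_checked()
and its signature are made up for illustration, and reduced[j] again stands
in for COST(j, i) - v[j].

```c
#include <assert.h>

/*
 * Find the smallest entry of reduced[] over all columns except `skip`.
 * Returns 1 and stores the minimum in *out if such a column exists,
 * and returns 0 (leaving *out untouched) if there is none.
 */
int min_excluding_checked(const int *reduced, int column_count,
			  int skip, int *out)
{
	int found = 0, min = 0, j;

	assert(reduced && out);
	for (j = 0; j < column_count; j++) {
		if (j == skip)
			continue;
		if (!found || reduced[j] < min) {
			min = reduced[j];
			found = 1;
		}
	}
	if (found)
		*out = min;
	return found;
}
```

This keeps INT_MAX out of the picture entirely, at the cost of an extra
variable and a return value that every caller has to check.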

Another improvement we may be able to make here is to not even try to
compute the assignment if there's only one column for that reason, but
I'm out of time today and the rest of my week looks a bit busy, so I
probably won't get to do anything before the beginning of next week.

> > Note that the test only fails under valgrind on Linux, but the same
> > command has been reported to segfault on Mac OS.
> >
> > Also start from 0 in the loop, which matches what the example code in
> > the original paper does as well.  Starting from 1 means we'd ignore
> > the first column during the reduction transfer phase.  Note that in
> > the original paper the loop does start from 1, but the implementation
> > is in Pascal, where arrays are 1 indexed.
> >
> > [1]: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path
> >  algorithm for dense and sparse linear assignment
> >  problems. Computing, 38(4), 325–340.
> >
> > Reported-by: ryenus 
> > Helped-by: Derrick Stolee 
> > Signed-off-by: Thomas Gummerer 
> > ---
> >  linear-assignment.c   | 4 ++--
> >  t/t3206-range-diff.sh | 5 +
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/linear-assignment.c b/linear-assignment.c
> > index 9b3e56e283..7700b80eeb 100644
> > --- a/linear-assignment.c
> > +++ b/linear-assignment.c
> > @@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, 
> > int *cost,
> > else if (j1 < -1)
> > row2column[i] = -2 - j1;
> > else {
> > -   int min = COST(!j1, i) - v[!j1];
> > -   for (j = 1; j < column_count; j++)
> > +   int min = INT_MAX;
> > +   for (j = 0; j < column_count; j++)
> > if (j != j1 && min > COST(j, i) - v[j])
> > min = COST(j, i) - v[j];
> > v[j1] -= min;
> > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> > index 2237c7f4af..fb4c13a84a 100755
> > --- a/t/t3206-range-diff.sh
> > +++ b/t/t3206-range-diff.sh
> > @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
> > test_cmp expected actual
> >  '
> >  
> > +test_expect_success 'no commits on one side' '
> > +   git commit --amend -m "new message" &&
> > +   git range-diff master HEAD@{1} HEAD
> > +'
> > +
> >  test_done

Re: [PATCH v1] read-cache: add GIT_TEST_INDEX_VERSION support

2018-09-12 Thread Thomas Gummerer

On 09/12, Ben Peart wrote:
> Teach get_index_format_default() to support running the test suite
> with specific index versions.  In particular, this enables the test suite
> to be run using index version 4 which is not the default so gets less testing.

I found this commit message slightly misleading.  Running the test
suite with specific index versions is already supported, by defining
TEST_GIT_INDEX_VERSION in 'config.mak'.  What we're doing here is
introducing an additional environment variable that can also be used to
set the index format in tests.

Even setting TEST_GIT_INDEX_VERSION=4 in the environment does run the
test suite with index-v4.  Admittedly the name is a bit strange
compared to our usual GIT_TEST_* environment variable names, and it
should probably be documented better (it's only documented in the
Makefile currently), but I'm not sure we should introduce another
environment variable for this purpose?
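
Whichever variable name we end up with, the per-value validation in the
quoted patch can be sketched in isolation as below.  This is a simplified
illustration, not the code in read-cache.c: the helper names are
hypothetical, the macro names mirror those in the diff, and the numeric
bounds are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed values; only the names are taken from the quoted diff. */
#define INDEX_FORMAT_LB 2
#define INDEX_FORMAT_UB 4
#define INDEX_FORMAT_DEFAULT 3

/*
 * Returns the parsed version if `text` is a plain decimal number
 * within [INDEX_FORMAT_LB, INDEX_FORMAT_UB], and the default otherwise.
 */
unsigned int parse_index_version(const char *text)
{
	char *endp;
	unsigned long version;

	if (!text || !*text)
		return INDEX_FORMAT_DEFAULT;
	version = strtoul(text, &endp, 10);
	if (*endp || version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version)
		return INDEX_FORMAT_DEFAULT;
	return (unsigned int)version;
}

/* Hypothetical helper: read the version from the named variable. */
unsigned int index_version_from_env(const char *name)
{
	assert(name);
	return parse_index_version(getenv(name));
}
```

The same check would apply to any of the variable names discussed here, so
the open question is really about naming and discoverability, not parsing.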

> Signed-off-by: Ben Peart 
> ---
> 
> Notes:
> Base Ref: v2.19.0
> Web-Diff: https://github.com/benpeart/git/commit/52e733e2ce
> Checkout: git fetch https://github.com/benpeart/git 
> git-test-index-version-v1 && git checkout 52e733e2ce
> 
>  read-cache.c | 47 +--
>  t/README |  6 +-
>  2 files changed, 38 insertions(+), 15 deletions(-)
> 
> diff --git a/read-cache.c b/read-cache.c
> index 7b1354d759..d140ce9989 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1570,26 +1570,45 @@ static unsigned int get_index_format_default(void)
>   char *envversion = getenv("GIT_INDEX_VERSION");
>   char *endp;
>   int value;
> - unsigned int version = INDEX_FORMAT_DEFAULT;
> + unsigned int version = -1;
> +
> + if (envversion) {
> + version = strtoul(envversion, &endp, 10);
> + if (*endp ||
> + version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) 
> {
> + warning(_("GIT_INDEX_VERSION set, but the value is 
> invalid.\n"
> + "Using version %i"), INDEX_FORMAT_DEFAULT);
> + version = INDEX_FORMAT_DEFAULT;
> + }
> + }
>  
> - if (!envversion) {
> - if (!git_config_get_int("index.version", &value))
> + if (version == -1) {
> + if (!git_config_get_int("index.version", &value)) {
>   version = value;
> - if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
> - warning(_("index.version set, but the value is 
> invalid.\n"
> -   "Using version %i"), INDEX_FORMAT_DEFAULT);
> - return INDEX_FORMAT_DEFAULT;
> + if (version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < 
> version) {
> + warning(_("index.version set, but the value is 
> invalid.\n"
> + "Using version %i"), 
> INDEX_FORMAT_DEFAULT);
> + version = INDEX_FORMAT_DEFAULT;
> + }
>   }
> - return version;
>   }
>  
> - version = strtoul(envversion, &endp, 10);
> - if (*endp ||
> - version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < version) {
> - warning(_("GIT_INDEX_VERSION set, but the value is invalid.\n"
> -   "Using version %i"), INDEX_FORMAT_DEFAULT);
> - version = INDEX_FORMAT_DEFAULT;
> + if (version == -1) {
> + envversion = getenv("GIT_TEST_INDEX_VERSION");
> + if (envversion) {
> + version = strtoul(envversion, &endp, 10);
> + if (*endp ||
> + version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < 
> version) {
> + warning(_("GIT_TEST_INDEX_VERSION set, but the 
> value is invalid.\n"
> + "Using version %i"), 
> INDEX_FORMAT_DEFAULT);
> + version = INDEX_FORMAT_DEFAULT;
> + }
> + }
>   }
> +
> + if (version == -1)
> + version = INDEX_FORMAT_DEFAULT;
> +
>   return version;
>  }
>  
> diff --git a/t/README b/t/README
> index 9028b47d92..f872638a78 100644
> --- a/t/README
> +++ b/t/README
> @@ -315,10 +315,14 @@ packs on demand. This normally only happens when the 
> object size is
>  over 2GB. This variable forces the code path on any object larger than
>  <n> bytes.
>  
> -GIT_TEST_OE_DELTA_SIZE=<n> exercises the uncomon pack-objects code
> +GIT_TEST_OE_DELTA_SIZE=<n> exercises the uncommon pack-objects code
>  path where deltas larger than this limit require extra memory
>  allocation for bookkeeping.
>  
> +GIT_TEST_INDEX_VERSION=<n> exercises the index read/write code path
> +for the index version specified.  Can be set to any valid version
> +but the non-default version 4 is probably the most beneficial.
> +
>  Naming Tests
>  
>  
> 
> base-commit:

[PATCH] linear-assignment: fix potential out of bounds memory access (was: Re: Git 2.19 Segmentation fault 11 on macOS)

2018-09-12 Thread Thomas Gummerer

On 09/11, Thomas Gummerer wrote:
> On 09/11, Thomas Gummerer wrote:
> > I think you're on the right track here.  I can not test this on Mac
> > OS, but on Linux, the following fails when running the test under
> > valgrind:
> > 
> > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> > index 2237c7f4af..a8b0ef8c1d 100755
> > --- a/t/t3206-range-diff.sh
> > +++ b/t/t3206-range-diff.sh
> > @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
> > test_cmp expected actual
> >  '
> >  
> > +test_expect_success 'amend and check' '
> > +   git commit --amend -m "new message" &&
> > +   git range-diff master HEAD@{1} HEAD
> > +'
> > +
> >  test_done
> > 
> > valgrind gives me the following:
> > 
> > ==18232== Invalid read of size 4
> > ==18232==at 0x34D7B5: compute_assignment (linear-assignment.c:54)
> > ==18232==by 0x2A4253: get_correspondences (range-diff.c:245)
> > ==18232==by 0x2A4BFB: show_range_diff (range-diff.c:427)
> > ==18232==by 0x19D453: cmd_range_diff (range-diff.c:108)
> > ==18232==by 0x122698: run_builtin (git.c:418)
> > ==18232==by 0x1229D8: handle_builtin (git.c:637)
> > ==18232==by 0x122BCC: run_argv (git.c:689)
> > ==18232==by 0x122D90: cmd_main (git.c:766)
> > ==18232==by 0x1D55A3: main (common-main.c:45)
> > ==18232==  Address 0x4f4d844 is 0 bytes after a block of size 4 alloc'd
> > ==18232==at 0x483777F: malloc (vg_replace_malloc.c:299)
> > ==18232==by 0x3381B0: do_xmalloc (wrapper.c:60)
> > ==18232==by 0x338283: xmalloc (wrapper.c:87)
> > ==18232==by 0x2A3F8C: get_correspondences (range-diff.c:207)
> > ==18232==by 0x2A4BFB: show_range_diff (range-diff.c:427)
> > ==18232==by 0x19D453: cmd_range_diff (range-diff.c:108)
> > ==18232==by 0x122698: run_builtin (git.c:418)
> > ==18232==by 0x1229D8: handle_builtin (git.c:637)
> > ==18232==by 0x122BCC: run_argv (git.c:689)
> > ==18232==by 0x122D90: cmd_main (git.c:766)
> > ==18232==by 0x1D55A3: main (common-main.c:45)
> > ==18232== 
> > 
> > I'm looking into why that fails.  Also adding Dscho to Cc here as the
> > author of this code.
> 
> The diff below seems to fix it.  Not submitting this as a proper
> patch [...]

I found the time to actually have a look at the paper, so here's a
proper patch:

I'm still not entirely sure what the initial code tried to do here,
but I think staying as close as possible to the original is probably
our best option here, also for future readers of this code.

--- >8 ---

Subject: [PATCH] linear-assignment: fix potential out of bounds memory access

Currently the 'compute_assignment()' function may read memory out
of bounds, even if used correctly.  Namely this happens when we only
have one column.  In that case we try to calculate the initial
minimum cost using '!j1' as column in the reduction transfer code.
That in turn causes us to try and get the cost from column 1 in the
cost matrix, which does not exist, and thus results in an out of
bounds memory read.

Instead of trying to initialize the minimum cost from another column,
just set it to INT_MAX.  This also matches what the example code in the
original paper for the algorithm [1] does (it initializes the value to
inf, for which INT_MAX is the closest match in C).

Note that the test only fails under valgrind on Linux, but the same
command has been reported to segfault on Mac OS.

Also start from 0 in the loop, which matches what the example code in
the original paper does as well.  Starting from 1 means we'd ignore
the first column during the reduction transfer phase.  Note that in
the original paper the loop does start from 1, but the implementation
is in Pascal, where arrays are 1 indexed.

[1]: Jonker, R., & Volgenant, A. (1987). A shortest augmenting path
 algorithm for dense and sparse linear assignment
 problems. Computing, 38(4), 325–340.

Reported-by: ryenus 
Helped-by: Derrick Stolee 
Signed-off-by: Thomas Gummerer 
---
 linear-assignment.c   | 4 ++--
 t/t3206-range-diff.sh | 5 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/linear-assignment.c b/linear-assignment.c
index 9b3e56e283..7700b80eeb 100644
--- a/linear-assignment.c
+++ b/linear-assignment.c
@@ -51,8 +51,8 @@ void compute_assignment(int column_count, int row_count, int 
*cost,
else if (j1 < -1)
row2column[i] = -2 - j1;
else {
-   int min = COST(!j1, i) - v[!j1];
-   for (j = 1; j < column_count; j++)
+   int min = INT_MAX;
+   for (j = 0; j &

Re: Git 2.19 Segmentation fault 11 on macOS

2018-09-11 Thread Thomas Gummerer

On 09/11, Thomas Gummerer wrote:
> I think you're on the right track here.  I can not test this on Mac
> OS, but on Linux, the following fails when running the test under
> valgrind:
> 
> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index 2237c7f4af..a8b0ef8c1d 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
> test_cmp expected actual
>  '
>  
> +test_expect_success 'amend and check' '
> +   git commit --amend -m "new message" &&
> +   git range-diff master HEAD@{1} HEAD
> +'
> +
>  test_done
> 
> valgrind gives me the following:
> 
> ==18232== Invalid read of size 4
> ==18232==at 0x34D7B5: compute_assignment (linear-assignment.c:54)
> ==18232==by 0x2A4253: get_correspondences (range-diff.c:245)
> ==18232==by 0x2A4BFB: show_range_diff (range-diff.c:427)
> ==18232==by 0x19D453: cmd_range_diff (range-diff.c:108)
> ==18232==by 0x122698: run_builtin (git.c:418)
> ==18232==by 0x1229D8: handle_builtin (git.c:637)
> ==18232==by 0x122BCC: run_argv (git.c:689)
> ==18232==by 0x122D90: cmd_main (git.c:766)
> ==18232==by 0x1D55A3: main (common-main.c:45)
> ==18232==  Address 0x4f4d844 is 0 bytes after a block of size 4 alloc'd
> ==18232==at 0x483777F: malloc (vg_replace_malloc.c:299)
> ==18232==by 0x3381B0: do_xmalloc (wrapper.c:60)
> ==18232==by 0x338283: xmalloc (wrapper.c:87)
> ==18232==by 0x2A3F8C: get_correspondences (range-diff.c:207)
> ==18232==by 0x2A4BFB: show_range_diff (range-diff.c:427)
> ==18232==by 0x19D453: cmd_range_diff (range-diff.c:108)
> ==18232==by 0x122698: run_builtin (git.c:418)
> ==18232==by 0x1229D8: handle_builtin (git.c:637)
> ==18232==by 0x122BCC: run_argv (git.c:689)
> ==18232==by 0x122D90: cmd_main (git.c:766)
> ==18232==by 0x1D55A3: main (common-main.c:45)
> ==18232== 
> 
> I'm looking into why that fails.  Also adding Dscho to Cc here as the
> author of this code.

The diff below seems to fix it.  Not submitting this as a proper
patch, as I don't quite understand what the original code tried to do
here.  However this does pass all tests we currently have and fixes
the out of bounds memory read that's caught by valgrind (and that I
imagine could cause the segfault on Mac OS).

This matches how the initial minimum for the reduction transfer is
calculated in [1].

I'll try to convince myself of the right solution, but should someone
more familiar with the linear-assignment algorithm have an idea, feel
free to take this over :)

[1]: https://github.com/src-d/lapjv/blob/master/lap.h#L276

--- >8 ---

diff --git a/linear-assignment.c b/linear-assignment.c
index 9b3e56e283..ab0aa5fd41 100644
--- a/linear-assignment.c
+++ b/linear-assignment.c
@@ -51,7 +51,7 @@ void compute_assignment(int column_count, int row_count, int 
*cost,
else if (j1 < -1)
row2column[i] = -2 - j1;
else {
-   int min = COST(!j1, i) - v[!j1];
+   int min = INT_MAX;
for (j = 1; j < column_count; j++)
if (j != j1 && min > COST(j, i) - v[j])
min = COST(j, i) - v[j];
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index 2237c7f4af..a8b0ef8c1d 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -142,4 +142,9 @@ test_expect_success 'changed message' '
test_cmp expected actual
 '
 
+test_expect_success 'amend and check' '
+   git commit --amend -m "new message" &&
+   git range-diff master HEAD@{1} HEAD
+'
+
 test_done

--- >8 ---

Re: Git 2.19 Segmentation fault 11 on macOS

2018-09-11 Thread Thomas Gummerer

On 09/11, Derrick Stolee wrote:
> On 9/11/2018 12:04 PM, Derrick Stolee wrote:
> > On 9/11/2018 11:38 AM, Derrick Stolee wrote:
> > The patch below includes a test that fails on Mac OSX with a segfault.
> > 
> > GitGitGadget PR: https://github.com/gitgitgadget/git/pull/36
> > Failed Build: 
> > https://git-for-windows.visualstudio.com/git/_build/results?buildId=18616=logs
> > 
> > -->8--
> > 
> > From 3ee470d09d54b9ad7ab950f17051d625db0c8654 Mon Sep 17 00:00:00 2001
> > From: Derrick Stolee 
> > Date: Tue, 11 Sep 2018 11:42:03 -0400
> > Subject: [PATCH] range-diff: attempt to create test that fails on OSX
> > 
> > Signed-off-by: Derrick Stolee 
> > ---
> >  t/t3206-range-diff.sh | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> > index 2237c7f4af..02744b07a8 100755
> > --- a/t/t3206-range-diff.sh
> > +++ b/t/t3206-range-diff.sh
> > @@ -142,4 +142,9 @@ test_expect_success 'changed message' '
> >     test_cmp expected actual
> >  '
> > 
> > +test_expect_success 'amend and check' '
> > +   git commit --amend -m "new message" &&
> > +   git range-diff changed-message HEAD@{2} HEAD
> > +'
> > +
> >  test_done
> > -- 
> > 2.19.0.rc2.windows.1
> 
> 
> Sorry, nevermind. The test failed for a different reason:

I think you're on the right track here.  I can not test this on Mac
OS, but on Linux, the following fails when running the test under
valgrind:

diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index 2237c7f4af..a8b0ef8c1d 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -142,4 +142,9 @@ test_expect_success 'changed message' '
test_cmp expected actual
 '
 
+test_expect_success 'amend and check' '
+   git commit --amend -m "new message" &&
+   git range-diff master HEAD@{1} HEAD
+'
+
 test_done

valgrind gives me the following:

==18232== Invalid read of size 4
==18232==    at 0x34D7B5: compute_assignment (linear-assignment.c:54)
==18232==    by 0x2A4253: get_correspondences (range-diff.c:245)
==18232==    by 0x2A4BFB: show_range_diff (range-diff.c:427)
==18232==    by 0x19D453: cmd_range_diff (range-diff.c:108)
==18232==    by 0x122698: run_builtin (git.c:418)
==18232==    by 0x1229D8: handle_builtin (git.c:637)
==18232==    by 0x122BCC: run_argv (git.c:689)
==18232==    by 0x122D90: cmd_main (git.c:766)
==18232==    by 0x1D55A3: main (common-main.c:45)
==18232==  Address 0x4f4d844 is 0 bytes after a block of size 4 alloc'd
==18232==    at 0x483777F: malloc (vg_replace_malloc.c:299)
==18232==    by 0x3381B0: do_xmalloc (wrapper.c:60)
==18232==    by 0x338283: xmalloc (wrapper.c:87)
==18232==    by 0x2A3F8C: get_correspondences (range-diff.c:207)
==18232==    by 0x2A4BFB: show_range_diff (range-diff.c:427)
==18232==    by 0x19D453: cmd_range_diff (range-diff.c:108)
==18232==    by 0x122698: run_builtin (git.c:418)
==18232==    by 0x1229D8: handle_builtin (git.c:637)
==18232==    by 0x122BCC: run_argv (git.c:689)
==18232==    by 0x122D90: cmd_main (git.c:766)
==18232==    by 0x1D55A3: main (common-main.c:45)
==18232== 

I'm looking into why that fails.  Also adding Dscho to Cc here as the
author of this code.

Re: Git 2.19 Segmentation fault 11 on macOS

2018-09-11 Thread Thomas Gummerer

Hi,

thanks for your bug report!

On 09/11, ryenus wrote:
> I just updated to 2.19 via Homebrew, git range-diff seems cool, but I
> only got a Segmentation fault: 11
> 
> $ git version; git range-diff origin/master  HEAD@{2} HEAD

Unfortunately the HEAD@{2} syntax needs your reflog, which is not
available when just cloning the repository (the reflog is only local
and not pushed to the remote repository).  Would it be possible to
create a short script to create the repository where you're
experiencing the behaviour, or replacing 'origin/master', 'HEAD@{2}'
and 'HEAD' with the actual commit ids?

I tried with various values, but unfortunately failed to reproduce
this so far (although admittedly I tried it on linux, not Mac OS).

> git version 2.19.0
> Segmentation fault: 11
> 
> Both origin/master and my local branch each got two new commits of their own,
> please correct me if this is not the expected way to use git range-diff.
> 
> FYI, I've created a sample repo here:
> https://github.com/ryenus/range-diff-segfault/

Re: [PATCH v2 1/2] rerere: mention caveat about unmatched conflict markers

2018-09-01 Thread Thomas Gummerer

On 08/29, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > Yeah that makes sense.  Maybe something like this?
> >
> > (replying to  here to keep
> > the patches in one thread)
> >
> >  Documentation/technical/rerere.txt | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/Documentation/technical/rerere.txt 
> > b/Documentation/technical/rerere.txt
> > index e65ba9b0c6..8fefe51b00 100644
> > --- a/Documentation/technical/rerere.txt
> > +++ b/Documentation/technical/rerere.txt
> > @@ -149,7 +149,10 @@ version, and the sorting the conflict hunks, both for 
> > the outer and the
> >  inner conflict.  This is done recursively, so any number of nested
> >  conflicts can be handled.
> >  
> > +Note that this only works for conflict markers that "cleanly nest".  If
> > +there are any unmatched conflict markers, rerere will fail to handle
> > +the conflict and record a conflict resolution.
> > +
> >  The only difference is in how the conflict ID is calculated.  For the
> >  inner conflict, the conflict markers themselves are not stripped out
> >  before calculating the sha1.
> 
> Looks good to me except for the line count on the @@ line.  The
> preimage ought to have 6 (not 7) lines and adding 4 new lines makes
> it a 10 line postimage.  I wonder who miscounted the hunk---it is
> immediately followed by the signature cut mark "-- \n" and some
> tools (including Emacs's patch editing mode) are known to
> misinterpret it as a preimage line that was removed.

Sorry about that.  Yeah Emacs's patch editing mode doing that would
explain it.  I did a round of proof-reading in my editor, and spotted
a typo.  Since it was trivial to fix I just edited the patch
directly, and Emacs changed the line count.  Sorry about that, I'll be
more careful about this in the future.

> What is curious is that your 2/2 counts the preimage lines
> correctly.

I only added some text after the '---' line in 2/2, but did not edit
the patch directly.  Emacs's patch editing mode only seems to change
the line numbers of the patch that's being edited, not if anything
surrounding that is changed, so the line count stayed the same as what
format-patch put in the file in the first place.
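
A rough way to double-check a hand-edited hunk before sending is to
recount its lines.  The Python sketch below is only an illustration: the
helper names `claimed_counts` and `actual_counts` are made up for it, and
the hunk body is abbreviated from the patch quoted above.  It shows how a
trailing "-- " signature line, if mistaken for part of the hunk, is
counted as a removed preimage line and turns the correct 6 into 7.

```python
import re

# "@@ -start,count +start,count @@"; a missing count means 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def claimed_counts(header):
    """Return (preimage, postimage) line counts claimed by a hunk header."""
    m = HUNK_HEADER.match(header)
    if m is None:
        raise ValueError("not a hunk header: %r" % header)
    return int(m.group(2) or 1), int(m.group(4) or 1)


def actual_counts(body):
    """Count preimage and postimage lines in the body of a hunk."""
    pre = post = 0
    for line in body:
        if line.startswith("\\"):      # "\ No newline at end of file"
            continue
        if line.startswith("+"):       # added: postimage only
            post += 1
        elif line.startswith("-"):     # removed: preimage only
            pre += 1
        else:                          # context: both sides
            pre += 1
            post += 1
    return pre, post


HEADER = "@@ -149,7 +149,10 @@ version, and the sorting the conflict hunks"
BODY = [
    " inner conflict.  This is done recursively, so any number of nested",
    " conflicts can be handled.",
    " ",
    "+Note that this only works for conflict markers that ...",
    "+there are any unmatched conflict markers, rerere will ...",
    "+the conflict and record a conflict resolution.",
    "+",
    " The only difference is in how the conflict ID is calculated.  For the",
    " inner conflict, the conflict markers themselves are not stripped out",
    " before calculating the sha1.",
]

print(claimed_counts(HEADER))           # what the header says
print(actual_counts(BODY))              # what the hunk really contains
print(actual_counts(BODY + ["-- "]))    # signature line miscounted as a removal
```

Comparing the first two results flags the mismatch; the third reproduces
the miscount.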

> In any case, both patches look good.  Will apply.

Thanks!

> Thanks.

Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts

2018-08-28 Thread Thomas Gummerer

On 08/27, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > Agreed.  I think it may be solvable if we'd actually get the
> > information about what belongs to which side from the merge algorithm
> > directly.
> 
> The merge machinery may (eh, rather, "does") know, but we do not
> have a way to express that in the working tree file that becomes the
> input to the rerere algorithm, without making backward-incompatible
> changes to the output format.

Right, I was more thinking along the lines of using the stages in the
index to redo the merge and get the information that way.  But that
may not work as well with using 'git rerere' from the command line,
and have other backwards compatibility woes, that I didn't quite think
through yet :)

> In a sense, that is already a solved problem, even though the
> solution was done a bit differently ;-) If the end users need to
> commit a half-resolved result with conflict markers (perhaps they
> want to share it among themselves and work on resolving further),
> what they can do is to also say that these are now part of contents,
> not conflict markers, with conflict-marker-size attribute.  Perhaps
> they prepare such a half-resolved result with unusual value of the
> attribute, so that later merge of these with standard conflict
> marker size will not get confused.

Right, I wasn't aware of the conflict-marker-size attribute.  Thanks
for mentioning it!

> That reminds me of another thing.  I've been running with these in
> my $GIT_DIR/info/attributes file for the past few years.  Perhaps we
> should add them to Documentation/.gitattributes and t/.gitattributes
> so that project participants would all benefit?
> 
> Documentation/git-merge.txt   conflict-marker-size=32
> Documentation/user-manual.txt conflict-marker-size=32
> t/t-*.sh  conflict-marker-size=32

I do think that would be a good idea.  I am wondering what the right
value is though.  Seeing such a long conflict marker before I knew
about this setting would have struck me as odd, and probably made me
try and track down where it is coming from.  But on the other hand it
makes the conflict markers very easy to tell apart from the rest of
the lines that kind of look like conflict markers.

I think these tradeoffs probably make it worth setting them to a value
this large.
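
To make the ambiguity concrete, here is a small Python sketch of a
marker check.  It is a simplification written for this discussion, not
the logic in rerere.c: the function name `looks_like_marker` is
hypothetical, and the rule "N copies of the marker character followed by
whitespace or end of line" is an assumption about what counts as a
marker.

```python
def looks_like_marker(line, marker_char, size=7):
    """Return True if `line` starts with exactly `size` copies of
    `marker_char`, followed by whitespace or the end of the line."""
    prefix = marker_char * size
    if not line.startswith(prefix):
        return False
    rest = line[size:]
    return rest == "" or rest[0].isspace()


# Ordinary content that happens to be seven "=" characters, for example
# an asciidoc heading underline.
content_line = "======="

# With the default size it is indistinguishable from a real separator.
print(looks_like_marker(content_line, "="))            # default size 7
# With conflict-marker-size=32 it is clearly just content...
print(looks_like_marker(content_line, "=", size=32))
# ...while a real 32-character separator is still recognized.
print(looks_like_marker("=" * 32, "=", size=32))
```

The larger the marker size, the less likely it is that file content
collides with it, at the cost of odd-looking conflict markers.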

One other file that I see needs such a treatment is
Documentation/gitk.txt, where the first header is 7 "="s, and
therefore could confuse 'git rerere' as well.  Arguably that's less
important, as there's unlikely to be a conflict containing that line,
but it may be worth including for completeness' sake.

Maybe something like this?  Though it may be good for others to chime
in if they find this helpful or whether they find the long conflict
markers distracting.

--- >8 ---
Subject: [PATCH] .gitattributes: add conflict-marker-size for relevant files

Some files in git.git contain lines that look like conflict markers,
either in examples or tests, or in the case of Documentation/gitk.txt
because of the asciidoc heading.

Having conflict markers the same length as the actual content can be
confusing for humans, and is impossible to handle for tools like 'git
rerere'.  Work around that by setting the 'conflict-marker-size'
attribute for those files to 32, which makes the conflict markers
unambiguous.

Helped-by: Junio C Hamano 
Signed-off-by: Thomas Gummerer 
---
 .gitattributes | 4 
 1 file changed, 4 insertions(+)

diff --git a/.gitattributes b/.gitattributes
index 1bdc91e282..49b3051641 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -9,3 +9,7 @@
 /command-list.txt eol=lf
 /GIT-VERSION-GEN eol=lf
 /mergetools/* eol=lf
+/Documentation/git-merge.txt conflict-marker-size=32
+/Documentation/gitk.txt conflict-marker-size=32
+/Documentation/user-manual.txt conflict-marker-size=32
+/t/t-*.sh conflict-marker-size=32
-- 
2.18.0.1088.ge017bf2cd1

[PATCH v2 2/2] rerere: add note about files with existing conflict markers

2018-08-28 Thread Thomas Gummerer

When a file contains lines that look like conflict markers, 'git
rerere' may fail to record a conflict resolution.
Emphasize that in the man page, and mention a possible workaround for
the issue.

Suggested-by: Junio C Hamano 
Signed-off-by: Thomas Gummerer 
---

Compared to v1, this now mentions the workaround of setting the
'conflict-marker-size', as mentioned in


 Documentation/git-rerere.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/git-rerere.txt b/Documentation/git-rerere.txt
index 031f31fa47..df310d2a58 100644
--- a/Documentation/git-rerere.txt
+++ b/Documentation/git-rerere.txt
@@ -211,6 +211,12 @@ would conflict the same way as the test merge you resolved 
earlier.
 'git rerere' will be run by 'git rebase' to help you resolve this
 conflict.
 
+[NOTE] 'git rerere' relies on the conflict markers in the file to
+detect the conflict.  If the file already contains lines that look the
+same as lines with conflict markers, 'git rerere' may fail to record a
+conflict resolution.  To work around this, the `conflict-marker-size`
+setting in linkgit:gitattributes[5] can be used.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
-- 
2.18.0.1088.ge017bf2cd1

[PATCH v2 1/2] rerere: mention caveat about unmatched conflict markers

2018-08-28 Thread Thomas Gummerer

4af3220 ("rerere: teach rerere to handle nested conflicts",
2018-08-05) introduced slightly better behaviour if the user commits
conflict markers and then gets another conflict in 'git rerere'.

However this is just a heuristic to punt on such conflicts better, and
doesn't deal with any unmatched conflict markers.  Make that clearer
in the documentation.

Suggested-by: Junio C Hamano 
Signed-off-by: Thomas Gummerer 
---

> That's fine.  I'd rather keep it but perhaps add a reminder to tell
> readers that it works only when the merging of contents that already
> records with nested conflict markers happen to "cleanly nest".

Yeah that makes sense.  Maybe something like this?

(replying to  here to keep
the patches in one thread)

 Documentation/technical/rerere.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/technical/rerere.txt 
b/Documentation/technical/rerere.txt
index e65ba9b0c6..8fefe51b00 100644
--- a/Documentation/technical/rerere.txt
+++ b/Documentation/technical/rerere.txt
@@ -149,7 +149,10 @@ version, and the sorting the conflict hunks, both for the 
outer and the
 inner conflict.  This is done recursively, so any number of nested
 conflicts can be handled.
 
+Note that this only works for conflict markers that "cleanly nest".  If
+there are any unmatched conflict markers, rerere will fail to handle
+the conflict and record a conflict resolution.
+
 The only difference is in how the conflict ID is calculated.  For the
 inner conflict, the conflict markers themselves are not stripped out
 before calculating the sha1.
-- 
2.18.0.1088.ge017bf2cd1

[PATCH 1/2] rerere: remove documentation for "nested conflicts"

2018-08-24 Thread Thomas Gummerer

4af32207bc ("rerere: teach rerere to handle nested conflicts",
2018-08-05) introduced slightly better behaviour if the user commits
conflict markers and then gets another conflict in 'git rerere'.
However this is just a heuristic to punt on such conflicts better, and
the documentation might be misleading to users, in case we change the
heuristic in the future.

Remove this documentation to avoid potentially misleading users.

Suggested-by: Junio C Hamano 
Signed-off-by: Thomas Gummerer 
---

The original series already made it into 'next', so these patches are
on top of that.  I also see it is marked as "will merge to master" in
the "What's cooking" email, so these two patches would be on top of
that.  If you are not planning to merge the series down to master
before 2.19, we could squash this into 10/11, otherwise I'm happy with
the patches on top.

 Documentation/technical/rerere.txt | 42 --
 1 file changed, 42 deletions(-)

diff --git a/Documentation/technical/rerere.txt 
b/Documentation/technical/rerere.txt
index e65ba9b0c6..3d10dbfa67 100644
--- a/Documentation/technical/rerere.txt
+++ b/Documentation/technical/rerere.txt
@@ -138,45 +138,3 @@ SHA1('BC').
 If there are multiple conflicts in one file, the sha1 is calculated
 the same way with all hunks appended to each other, in the order in
 which they appear in the file, separated by a <NUL> character.
-
-Nested conflicts
-
-
-Nested conflicts are handled very similarly to "simple" conflicts.
-Similar to simple conflicts, the conflict is first normalized by
-stripping the labels from conflict markers, stripping the common ancestor
-version, and the sorting the conflict hunks, both for the outer and the
-inner conflict.  This is done recursively, so any number of nested
-conflicts can be handled.
-
-The only difference is in how the conflict ID is calculated.  For the
-inner conflict, the conflict markers themselves are not stripped out
-before calculating the sha1.
-
-Say we have the following conflict for example:
-
-<<<<<<< HEAD
-1
-=======
-<<<<<<< HEAD
-3
-=======
-2
->>>>>>> branch-2
->>>>>>> branch-3~
-
-After stripping out the labels of the conflict markers, and sorting
-the hunks, the conflict would look as follows:
-
-<<<<<<<
-1
-=======
-<<<<<<<
-2
-=======
-3
->>>>>>>
->>>>>>>
-
-and finally the conflict ID would be calculated as:
-`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')`
-- 
2.18.0.1088.ge017bf2cd1

[PATCH 2/2] rerere: add note about files with existing conflict markers

2018-08-24 Thread Thomas Gummerer

When a file contains lines that look like conflict markers, 'git
rerere' may fail to record a conflict resolution.
Emphasize that in the man page.

Helped-by: Junio C Hamano 
Signed-off-by: Thomas Gummerer 
---

Not sure if there may be a better place in the man page for this, but
this is the best I could come up with.

 Documentation/git-rerere.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/git-rerere.txt b/Documentation/git-rerere.txt
index 031f31fa47..036ea11528 100644
--- a/Documentation/git-rerere.txt
+++ b/Documentation/git-rerere.txt
@@ -211,6 +211,12 @@ would conflict the same way as the test merge you resolved 
earlier.
 'git rerere' will be run by 'git rebase' to help you resolve this
 conflict.
 
+[NOTE]
+'git rerere' relies on the conflict markers in the file to detect the
+conflict.  If the file already contains lines that look the same as
+lines with conflict markers, 'git rerere' may fail to record a
+conflict resolution.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
-- 
2.18.0.1088.ge017bf2cd1

Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts

2018-08-24 Thread Thomas Gummerer

On 08/22, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > Hmm, it does describe what happens in the code, which is what this
> > patch implements.  Maybe we should rephrase the title here?
> >
> > Or are you suggesting dropping this patch (and the next one)
> > completely, as we don't want to try and handle the case where this
> > kind of garbage is thrown at 'rerere'?
> 
> I consider these two patches as merely attempting to punt a bit
> better.  Once users start committing conflict-marker-looking lines
> in the contents, and getting them involved in actual conflicts, I do
> not think any approach (including what the original rerere uses
> before this patch) that assumes the markers will neatly form set of
> blocks of text enclosed in << == >> will reliably step around such
> broken contents.  E.g. it is entirely conceivable both branches have
> the <<< beginning of conflict marker plus contents from the HEAD
> before they recorded the marker that are identical, that diverge as
> you scan the text down and get closer to ===, something like:
> 
> side A                  side B
> 
> 
> shared                  shared
> <<<<<<<                 <<<<<<<
> version before          version before
> these guys merged       these guys merged
> their ancestor          their ancestor
> versions                versions.
> but some                now some
> lines are different     lines are different
> =======                 =======
> and other               totally different
> contents                contents
> ...                     ...
> 
> And a merge of these may make <<< part shared (i.e. outside the
> conflicted region) while lines near and below === part of conflict,
> which would give us something like
> 
> merge of side A & B
> ---
> 
> shared
> <<<<<<<                 (this is part of contents)
> version before
> these guys merged
> their ancestor
> <<<<<<< HEAD            (conflict marker)
> versions
> but some
> lines are different
> =======                 (this is part of contents)
> and other
> contents
> ...
> =======                 (conflict marker)
> versions.
> now some
> lines are different
> =======                 (this is part of contents)
> totally different
> contents
> ...
> >>>>>>> theirs          (conflict marker)
> 
> Depending on the shape of the original conflict that was committed,
> we may have two versions of <<<, together with the real conflict
> marker, but shared closing >>> marker.  With contents like that,
> there is no way for us to split these lines into two groups at a
> line '=' (which one?) and swap to come up with the normalized
> shape.
> 
> The original rerere algorithm would punt when such an unmatched
> markers are found, and deals with "nested conflict" situation by
> avoiding to create such a thing altogether.  I am sure your two
> patches may make the code punt less, but I suspect that is not a
> foolproof "solution" but more of a workaround, as I do not think it
> is solvable, once you allow users to commit conflict-marker looking
> strings in contents.

Agreed.  I think it may be solvable if we'd actually get the
information about what belongs to which side from the merge algorithm
directly.  But that sounds way more involved than what I'm able to
commit to for something that I don't foresee running into myself :)

>   As the heuristics used in such a workaround
> are very likely to change, and something the end-users should not
> even rely on, I'd rather not document and promise the exact
> behaviour---perhaps we should stress "don't do that" even stronger
> instead.

Fair enough.  I thought of the technical documentation as something
that doesn't promise users anything, but rather describes how the
internals work right now, which is what this bit of documentation
attempted to write down.  But if we are worried about this giving end
users ideas then I definitely agree and we should get rid of this bit
of documentation.  I'll send a patch for that, and for adding a note
about "don't do that" in the man page.

Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts

2018-08-22 Thread Thomas Gummerer

On 08/22, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason  writes:
> 
> > But why not add this to the git-rerere manpage? These technical docs
> > get way less exposure, and in this case we're not describing some
> > internal implementation detail, which the technical docs are for, but
> > something that's user-visible, let's put that in the user-visible
> > docs.
> 
> I actually consider that the documentation describes low-level
> internal implementation detail, which the end users do not care nor
> need to know in order to make use of "rerere".  How would it help
> the end-users to know that the common ancestor portion of diff3
> style conflict does not participate in conflict identification,
> sides of conflicts sometimes get swapped for easier indexing of
> conflicts, or conflict shapes are hashed via SHA-1 to determine
> which subdirectory of $GIT_DIR/rr-cache/ to use to store it, etc.?

Agreed, I don't think this would be very helpful for users.
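
For anyone following the thread who does want to see the shape of those
steps, here is a rough Python sketch for a single, non-nested conflict.
It is an illustration only, not a port of rerere.c: the function names
are made up, marker detection is reduced to a prefix check, and hashing
the two sorted sides each followed by a NUL byte is assumed from the
technical documentation.

```python
import hashlib


def normalize_conflict(lines, marker_size=7):
    """Return the two sides of one conflict, sorted, with marker labels
    and any diff3 common-ancestor section dropped."""
    ours, theirs = [], []
    state = None
    for line in lines:
        if line.startswith("<" * marker_size):
            state = "ours"           # label after the marker is ignored
        elif line.startswith("|" * marker_size):
            state = "base"           # common ancestor: skipped entirely
        elif line.startswith("=" * marker_size):
            state = "theirs"
        elif line.startswith(">" * marker_size):
            state = None
        elif state == "ours":
            ours.append(line)
        elif state == "theirs":
            theirs.append(line)
    # Sorting makes the result independent of which side was "ours".
    return tuple(sorted(("".join(ours), "".join(theirs))))


def conflict_id(lines):
    """SHA-1 over both normalized sides, each terminated by NUL."""
    first, second = normalize_conflict(lines)
    data = (first + "\0" + second + "\0").encode()
    return hashlib.sha1(data).hexdigest()


merge_style = [
    "<<<<<<< HEAD\n", "B\n", "=======\n", "C\n", ">>>>>>> topic\n",
]
diff3_swapped = [
    "<<<<<<< ours\n", "C\n", "||||||| base\n", "A\n",
    "=======\n", "B\n", ">>>>>>> theirs\n",
]

# Same conflict seen from the other side, with different labels and a
# common-ancestor section: both map to the same ID.
print(conflict_id(merge_style) == conflict_id(diff3_swapped))
```

Note that a content line starting with seven "=" characters would be
taken for a marker by this sketch, which is exactly the kind of input
the rest of this thread is about.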

> By the way, I just noticed that what the last section (i.e. nested
> conflicts) says is completely bogus.  Nested conflicts are handled
> by lengthening markers for conflict in inner-merge and paying
> attention only to the outermost merge.  The only case where the
> conflict markers can appear in the way depicted in the section is
> when the contents from branches being merged had these conflict
> marker looking strings from the beginning---that's "doctor it hurts
> when I do this---don't do it then" situation.  The section may
> describe correctly what the code happens to do when it gets thrown
> such a garbage at, but I do not think it is a useful piece of
> information about a designed behaviour.

Hmm, it does describe what happens in the code, which is what this
patch implements.  Maybe we should rephrase the title here?

Or are you suggesting dropping this patch (and the next one)
completely, as we don't want to try and handle the case where this
kind of garbage is thrown at 'rerere'?  I don't think it would make
sense to drop this documentation without dropping the patch itself, as
it does document how rerere handles this case.  Without this bit of
documentation (but with the code in this patch), the technical
'rerere' documentation feels incomplete to me.

Re: [GSoC][PATCH v7 26/26] stash: replace all "git apply" child processes with API calls

2018-08-19 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> `apply_all_patches()` does not provide a method to apply patches
> from strbuf. Because of this, this commit introduces a new
> function `apply_patch_from_buf()` which applies a patch from buf.
> It works by saving the strbuf as a file. This way we can call
> `apply_all_patches()`. Before returning, the created file is
> removed.

I'm not a fan of this approach.  We're going from doing the operation
in memory to using a temporary file that we write to disk and have to
re-read afterwards, which I suspect might be slower than using the
'run-command' API.

From a quick look the 'apply_patch' function almost does what we want.
It reads the patch file, and then does everything else in memory.  It
seems to me that by factoring out reading the patch file from that
function, we should be able to use the rest to do the operation
in-memory here, which would be much nicer.

> ---
>  builtin/stash.c | 61 +++--
>  1 file changed, 34 insertions(+), 27 deletions(-)
> 
> diff --git a/builtin/stash.c b/builtin/stash.c
> index 46e76a34e..74eda822c 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -13,6 +13,7 @@
>  #include "revision.h"
>  #include "log-tree.h"
>  #include "diffcore.h"
> +#include "apply.h"
>  
>  static const char * const git_stash_usage[] = {
>   N_("git stash list []"),
> @@ -277,10 +278,6 @@ static int diff_tree_binary(struct strbuf *out, struct 
> object_id *w_commit)
>   struct child_process cp = CHILD_PROCESS_INIT;
>   const char *w_commit_hex = oid_to_hex(w_commit);
>  
> - /*
> -  * Diff-tree would not be very hard to replace with a native function,
> -  * however it should be done together with apply_cached.
> -  */

Hmm there was probably a good reason why we wrote this comment in the
first place.  I can't recall what that reason was, but we should
probably explore that.  If there was no reason for it, then we should
remove the comment where it was added in the series (since this is all
new code the comment comes from somewhere else in this series).

>   cp.git_cmd = 1;
>   argv_array_pushl(&cp.args, "diff-tree", "--binary", NULL);
>   argv_array_pushf(&cp.args, "%s^2^..%s^2", w_commit_hex, w_commit_hex);
> @@ -288,18 +285,36 @@ static int diff_tree_binary(struct strbuf *out, struct 
> object_id *w_commit)
>   return pipe_command(&cp, NULL, 0, out, 0, NULL, 0);
>  }
>  
> -static int apply_cached(struct strbuf *out)
> +static int apply_patch_from_buf(struct strbuf *patch, int cached, int 
> reverse,
> + int check_index)
>  {
> - struct child_process cp = CHILD_PROCESS_INIT;
> + int ret = 0;
> + struct apply_state state;
> + struct argv_array args = ARGV_ARRAY_INIT;
> + const char *patch_path = ".git/stash_patch.patch";

We should not rely on '.git/' here.  This will not work if 'GIT_DIR'
is set, or even in a worktree, where '.git' is just a file, not a
directory.
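
To illustrate why a hard-coded '.git/' prefix breaks, here is a small
Python sketch of how the git directory can differ from
'<worktree>/.git'.  It is deliberately simplified and the function name
`resolve_git_dir` is made up; inside git the path helpers should be used
instead, and from scripts `git rev-parse --git-dir` gives the real
answer.

```python
import os


def resolve_git_dir(worktree_root, env=None):
    """Simplified lookup of the git directory for a working tree."""
    env = os.environ if env is None else env
    # 1. GIT_DIR overrides everything.
    if env.get("GIT_DIR"):
        return env["GIT_DIR"]
    dot_git = os.path.join(worktree_root, ".git")
    # 2. The common case: .git is a directory.
    if os.path.isdir(dot_git):
        return dot_git
    # 3. In a linked worktree, .git is a file containing "gitdir: <path>".
    if os.path.isfile(dot_git):
        with open(dot_git) as f:
            first = f.readline().strip()
        if first.startswith("gitdir: "):
            target = first[len("gitdir: "):]
            # A relative path is taken relative to the worktree root.
            return os.path.normpath(os.path.join(worktree_root, target))
    raise FileNotFoundError("no git directory found for %s" % worktree_root)
```

In cases 1 and 3, writing to ".git/stash_patch.patch" relative to the
working tree would either land in the wrong place or fail because ".git"
is not a directory.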

> + FILE *patch_file;
>  
> - /*
> -  * Apply currently only reads either from stdin or a file, thus
> -  * apply_all_patches would have to be updated to optionally take a
> -  * buffer.
> -  */

Ah and indeed the comment here is suggesting a very similar thing to
what I suggested above :)

> - cp.git_cmd = 1;
> - argv_array_pushl(&cp.args, "apply", "--cached", NULL);
> - return pipe_command(&cp, out->buf, out->len, NULL, 0, NULL, 0);
> + if (init_apply_state(&state, NULL))
> + return -1;
> +
> + state.cached = cached;
> + state.apply_in_reverse = reverse;
> + state.check_index = check_index;
> + if (state.cached)
> + state.check_index = 1;
> + if (state.check_index)
> + state.unsafe_paths = 0;
> +
> + patch_file = fopen(patch_path, "w");
> + strbuf_write(patch, patch_file);
> + fclose(patch_file);
> +
> + argv_array_push(&args, patch_path);
> + ret = apply_all_patches(&state, args.argc, args.argv, 0);
> +
> + remove_path(patch_path);
> + clear_apply_state(&state);
> + return ret;
>  }
>  
>  static int reset_head(const char *prefix)
> @@ -418,7 +433,7 @@ static int do_apply_stash(const char *prefix, struct 
> stash_info *info,
>   return -1;
>   }
>  
> - ret = apply_cached();
> + ret = apply_patch_from_buf(, 1, 0, 0);
>   strbuf_release();
>   if (ret)
>   return -1;
> @@ -1341,7 +1356,6 @@ static int do_push_stash(int argc, const char **argv, 
> const char *prefix,
>   int i;
>   struct child_process cp1 = CHILD_PROCESS_INIT;
>   struct child_process cp2 = CHILD_PROCESS_INIT;
> - struct child_process cp3 = CHILD_PROCESS_INIT;
>   struct strbuf out = STRBUF_INIT;
>  
>   cp1.git_cmd = 1;
> @@ -1365,11

Re: [GSoC][PATCH v7 25/26] stash: replace all `write-tree` child processes with API calls

2018-08-19 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> This commit replaces spawning `git write-tree` with API calls.
> ---
>  builtin/stash.c | 40 
>  1 file changed, 12 insertions(+), 28 deletions(-)

Nice reduction in lines here!

> 
> diff --git a/builtin/stash.c b/builtin/stash.c
> index 4d5c0d16e..46e76a34e 100644
> --- a/builtin/stash.c
> +++ b/builtin/stash.c
> @@ -949,9 +949,8 @@ static int save_untracked_files(struct stash_info *info, 
> struct strbuf *msg)
>  {
>   int ret = 0;
>   struct strbuf untracked_msg = STRBUF_INIT;
> - struct strbuf out2 = STRBUF_INIT;
>   struct child_process cp = CHILD_PROCESS_INIT;
> - struct child_process cp2 = CHILD_PROCESS_INIT;
> + struct index_state state = { NULL };

We often call this 'istate' throughout the codebase.  Would be nice to
call it that here as well, to reduce the cognitive load for people
already familiar with the codebase.

>  
>   cp.git_cmd = 1;
>   argv_array_pushl(, "update-index", "--add",
> @@ -966,15 +965,11 @@ static int save_untracked_files(struct stash_info 
> *info, struct strbuf *msg)
>   goto done;
>   }
>  
> - cp2.git_cmd = 1;
> - argv_array_push(, "write-tree");
> - argv_array_pushf(_array, "GIT_INDEX_FILE=%s",
> -  stash_index_path.buf);
> - if (pipe_command(, NULL, 0, , 0,NULL, 0)) {
> + if (write_index_as_tree(>u_tree, , stash_index_path.buf, 0,
> + NULL)) {
>   ret = -1;
>   goto done;
>   }
> - get_oid_hex(out2.buf, >u_tree);
>  
>   if (commit_tree(untracked_msg.buf, untracked_msg.len,
>   >u_tree, NULL, >u_commit, NULL, NULL)) {
> @@ -984,7 +979,6 @@ static int save_untracked_files(struct stash_info *info, 
> struct strbuf *msg)
>  
>  done:
>   strbuf_release(_msg);
> - strbuf_release();
>   remove_path(stash_index_path.buf);
>   return ret;
>  }
> @@ -994,11 +988,10 @@ static struct strbuf patch = STRBUF_INIT;
>  static int stash_patch(struct stash_info *info, const char **argv)
>  {
>   int ret = 0;
> - struct strbuf out2 = STRBUF_INIT;
>   struct child_process cp0 = CHILD_PROCESS_INIT;
>   struct child_process cp1 = CHILD_PROCESS_INIT;
> - struct child_process cp2 = CHILD_PROCESS_INIT;
>   struct child_process cp3 = CHILD_PROCESS_INIT;
> + struct index_state state = { NULL };
>  
>   remove_path(stash_index_path.buf);
>  
> @@ -1023,17 +1016,12 @@ static int stash_patch(struct stash_info *info, const 
> char **argv)
>   goto done;
>   }
>  
> - cp2.git_cmd = 1;
> - argv_array_push(, "write-tree");
> - argv_array_pushf(_array, "GIT_INDEX_FILE=%s",
> -  stash_index_path.buf);
> - if (pipe_command(, NULL, 0, , 0,NULL, 0)) {
> + if (write_index_as_tree(>w_tree, , stash_index_path.buf, 0,
> + NULL)) {
>   ret = -1;
>   goto done;
>   }
>  
> - get_oid_hex(out2.buf, >w_tree);
> -
>   cp3.git_cmd = 1;
>   argv_array_pushl(, "diff-tree", "-p", "HEAD",
>oid_to_hex(>w_tree), "--", NULL);
> @@ -1046,7 +1034,6 @@ static int stash_patch(struct stash_info *info, const 
> char **argv)
>   }
>  
>  done:
> - strbuf_release();
>   remove_path(stash_index_path.buf);
>   return ret;
>  }
> @@ -1056,11 +1043,10 @@ static int stash_working_tree(struct stash_info *info,
>  {
>   int ret = 0;
>   struct child_process cp2 = CHILD_PROCESS_INIT;
> - struct child_process cp3 = CHILD_PROCESS_INIT;
> - struct strbuf out3 = STRBUF_INIT;
>   struct argv_array args = ARGV_ARRAY_INIT;
>   struct strbuf diff_output = STRBUF_INIT;
>   struct rev_info rev;
> + struct index_state state = { NULL };
>  
>   set_alternate_index_output(stash_index_path.buf);
>   if (reset_tree(>i_tree, 0, 0)) {
> @@ -1103,20 +1089,18 @@ static int stash_working_tree(struct stash_info *info,
>   goto done;
>   }
>  
> - cp3.git_cmd = 1;
> - argv_array_push(, "write-tree");
> - argv_array_pushf(_array, "GIT_INDEX_FILE=%s",
> -  stash_index_path.buf);
> - if (pipe_command(, NULL, 0, , 0,NULL, 0)) {
> + if (write_index_as_tree(>w_tree, , stash_index_path.buf, 0,
> + NULL)) {
> +
>   ret = -1;
>   goto done;
>   }
>  
> - get_oid_hex(out3.buf, >w_tree);
> + discard_cache();
> + read_cache();

This 'discard_cache()'/'read_cache()' pair surprises me a bit, and I
can't figure out why it's necessary now. 'write_index_as_tree()' reads
and writes from the index file at 'stash_index_path', while
'{discard,read}_cache()' operate on 'the_index', which should always be
distinct from the temporary index we are using here.  So this
shouldn't be needed, at least not because of the changes we are making
in this patch.

>  
>

Re: [GSoC][PATCH v7 24/26] stash: optimize `get_untracked_files()` and `check_changes()`

2018-08-18 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> This commit introduces an optimization by avoiding calling the
> same functions again. For example, `git stash push -u`
> would call at some points the following functions:
> 
>  * `check_changes()`
>  * `do_create_stash()`, which calls: `check_changes()` and
> `get_untracked_files()`
> 
> Note that `check_changes()` also calls `get_untracked_files()`.
> So, `check_changes()` is called 2 times and `get_untracked_files()`
> 3 times. By checking at the beginning of the function if we already
> performed a check, we can avoid making useless calls.

While I can see that this may give us some performance gains, what's
being described above sounds like we should look into why we are
making these duplicate calls in the first place, rather than trying to
return early from them.  I feel like the duplicate calls are mostly a
remnant from the way the shell script was written, but not inherent to
the design of 'git stash'. 

For example 'check_changes' could be called from 'create_stash'
directly, so we don't have to call it in 'do_create_stash', in the
process removing the duplicate call from the 'git stash push'
codepath.  That would provide the same improvements, and keep the code
cleaner, rather than introducing more special cases for these
functions.
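
As a toy model of that restructuring (Python stand-ins only; the
function names mirror the ones discussed here, but the bodies are
hypothetical and do no real work), moving the check out of
'do_create_stash' into its callers leaves exactly one 'check_changes'
call per command:

```python
from collections import Counter

calls = Counter()


def check_changes():
    """Stand-in for the real check; only counts how often it runs."""
    calls["check_changes"] += 1
    return True


def do_create_stash():
    """Shared worker: assumes the caller already checked for changes."""
    return "stash-commit"


def create_stash():
    """Entry point for 'git stash create'."""
    if not check_changes():
        return None
    return do_create_stash()


def push_stash():
    """Entry point for 'git stash push'."""
    if not check_changes():
        return None
    return do_create_stash()


push_stash()
print(calls["check_changes"])   # number of checks for a single push
```

No early-return bookkeeping is needed in the checked functions
themselves; the duplicate call simply no longer exists.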

I haven't looked into the 'get_untracked_files()' call chain yet, but
I imagine we can do something similar for that.
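
To make the suggestion above concrete, here is a minimal sketch of the restructuring. It is not the actual stash code: every name in it ('check_changes_sketch', 'do_create_sketch', 'create_entry_sketch', 'push_entry_sketch') is a hypothetical stand-in, and the counter exists only so the number of checks can be observed.

```c
/* Hypothetical counter so the number of checks can be observed. */
static int check_calls;

/* Stand-in for an expensive "are there any changes?" check. */
static int check_changes_sketch(void)
{
	check_calls++;
	return 1;	/* pretend there are always changes */
}

/* The worker no longer checks; it relies on its callers having done so. */
static int do_create_sketch(void)
{
	return 0;
}

/* 'create' entry point: checks once, then does the work. */
static int create_entry_sketch(void)
{
	if (!check_changes_sketch())
		return 1;
	return do_create_sketch();
}

/* 'push' entry point: checks once and calls the worker directly. */
static int push_entry_sketch(void)
{
	if (!check_changes_sketch())
		return 0;
	return do_create_sketch();
}
```

With the check living in the entry points, each codepath performs it exactly once, and no "did we already check?" state has to be carried around.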

Re: [GSoC][PATCH v7 21/26] stash: replace spawning `git ls-files` child process

2018-08-18 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> This commit replaces spawning `git ls-files` child process with
> API calls to get the untracked files.
> ---
>  builtin/stash--helper.c | 49 +++--
>  1 file changed, 32 insertions(+), 17 deletions(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index 4fd79532c..5c27f5dcf 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -813,27 +813,42 @@ static int store_stash(int argc, const char **argv, 
> const char *prefix)
>  /*
>   * `out` will be filled with the names of untracked files. The return value 
> is:
>   *
> - * < 0 if there was a bug (any arg given outside the repo will be detected
> - * by `setup_revision()`)
>   * = 0 if there are not any untracked files
>   * > 0 if there are untracked files
>   */
> -static int get_untracked_files(const char **argv, int line_term,
> +static int get_untracked_files(const char **argv, const char *prefix,
>  int include_untracked, struct strbuf *out)
>  {
> - struct child_process cp = CHILD_PROCESS_INIT;
> - cp.git_cmd = 1;
> - argv_array_pushl(, "ls-files", "-o", NULL);
> - if (line_term)
> - argv_array_push(, "-z");
> + int max_len;
> + int i;
> + char *seen;
> + struct dir_struct dir;
> + struct pathspec pathspec;
> +
> + memset(, 0, sizeof(dir));
>   if (include_untracked != 2)
> - argv_array_push(, "--exclude-standard");
> - argv_array_push(, "--");
> - if (argv)
> - argv_array_pushv(, argv);
> + setup_standard_excludes();
>  
> - if (pipe_command(, NULL, 0, out, 0, NULL, 0))
> - return -1;
> + parse_pathspec(, 0,
> +PATHSPEC_PREFER_FULL,
> +prefix, argv);
> + seen = xcalloc(pathspec.nr, 1);
> +
> + max_len = fill_directory(, the_repository->index, );
> + for (i = 0; i < dir.nr; i++) {
> + struct dir_entry *ent = dir.entries[i];
> + if (!dir_path_match(ent, , max_len, seen)) {
> + free(ent);
> + continue;
> + }
> + strbuf_addf(out, "%s\n", ent->name);

As mentioned in my comments about the 'diff-index' replacement, this
'\n' should probably be '\0', and we should keep the '-z' flag in 'git
update-index', in case somebody has a '\n' in their filenames.

While creating such a file is probably not a good idea anyway, we
should still be able to handle it (and have been before this, so we
shouldn't break it).
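
As a small illustration of why the terminator matters, the sketch below counts the records in a buffer for a given terminator byte. 'count_records' is a hypothetical helper written for this example, not a git function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the records in `buf` (of length `len`), where every record is
 * terminated by the byte `term`.
 */
static size_t count_records(const char *buf, size_t len, char term)
{
	size_t n = 0;
	const char *p = buf;
	const char *end = buf + len;

	while (p < end) {
		const char *q = memchr(p, term, (size_t)(end - p));
		if (!q)
			break;	/* unterminated trailing data is ignored */
		n++;
		p = q + 1;
	}
	return n;
}
```

A single file named "a<LF>b" written with a '\n' terminator is read back as two records, while the same name written with a '\0' terminator is read back as one.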

> + free(ent);
> + }
> +
> + free(dir.entries);
> + free(dir.ignored);
> + clear_directory();
> + free(seen);
>   return out->len;
>  }
>  
> @@ -888,7 +903,7 @@ static int check_changes(const char **argv, int 
> include_untracked,
>   goto done;
>   }
>  
> - if (include_untracked && get_untracked_files(argv, 0,
> + if (include_untracked && get_untracked_files(argv, prefix,
>include_untracked, ))
>   ret = 1;
>  
> @@ -908,7 +923,7 @@ static int save_untracked_files(struct stash_info *info, 
> struct strbuf *msg,
>   struct child_process cp2 = CHILD_PROCESS_INIT;
>  
>   cp.git_cmd = 1;
> - argv_array_pushl(, "update-index", "-z", "--add",
> + argv_array_pushl(, "update-index", "--add",
>"--remove", "--stdin", NULL);
>   argv_array_pushf(_array, "GIT_INDEX_FILE=%s",
>stash_index_path.buf);
> @@ -1134,7 +1149,7 @@ static int do_create_stash(int argc, const char **argv, 
> const char *prefix,
>   goto done;
>   }
>  
> - if (include_untracked && get_untracked_files(argv, 1,
> + if (include_untracked && get_untracked_files(argv, prefix,
>include_untracked, )) {
>   if (save_untracked_files(info, , )) {
>   if (!quiet)
> -- 
> 2.18.0.573.g56500d98f
>

Re: [GSoC][PATCH v7 17/26] stash: avoid spawning a "diff-index" process

2018-08-18 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> This commits replaces spawning `diff-index` child process by using
> the already existing `diff` API

I think this should be squashed into the previous commit.  It's easier
to review a commit that replaces all the 'run_command'/'pipe_command'
calls in one function, rather than doing it call by call, especially
if they interact with eachother.

E.g. I was going to suggest replacing the 'write_tree' call as well,
but reading ahead in the series I see that that's already being done :)  

While replacing all the calls of one type with the internal API call
is probably easiest for writing the patches, at least I would find it
easier to review replacing the run-command API calls in one codepath
at a time.

> ---
>  builtin/stash--helper.c | 56 ++---
>  1 file changed, 42 insertions(+), 14 deletions(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index 887b78d05..f905d3908 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -12,6 +12,7 @@
>  #include "rerere.h"
>  #include "revision.h"
>  #include "log-tree.h"
> +#include "diffcore.h"
>  
>  static const char * const git_stash_helper_usage[] = {
>   N_("git stash--helper list []"),
> @@ -297,6 +298,18 @@ static int reset_head(const char *prefix)
>   return run_command();
>  }
>  
> +static void add_diff_to_buf(struct diff_queue_struct *q,
> + struct diff_options *options,
> + void *data)
> +{
> + int i;
> + for (i = 0; i < q->nr; i++) {
> + struct diff_filepair *p = q->queue[i];
> + strbuf_addstr(data, p->one->path);
> + strbuf_addch(data, '\n');

What about filenames that include a '\n'?  I think this in combination
with removing the '-z' flag from the 'update-index' call will break
with filenames that have a LF in them.  This should be a '\0' instead
of a '\n', and we should still be using the '-z' flag in
'update-index'.

> + }
> +}
> +
>  static int get_newly_staged(struct strbuf *out, struct object_id *c_tree)
>  {
>   struct child_process cp = CHILD_PROCESS_INIT;
> @@ -981,14 +994,16 @@ static int stash_patch(struct stash_info *info, const 
> char **argv)
>   return ret;
>  }
>  
> -static int stash_working_tree(struct stash_info *info, const char **argv)
> +static int stash_working_tree(struct stash_info *info,
> +   const char **argv, const char *prefix)
>  {
>   int ret = 0;
> - struct child_process cp1 = CHILD_PROCESS_INIT;
>   struct child_process cp2 = CHILD_PROCESS_INIT;
>   struct child_process cp3 = CHILD_PROCESS_INIT;
> - struct strbuf out1 = STRBUF_INIT;
>   struct strbuf out3 = STRBUF_INIT;

We're left with cp{2,3} and out3 here, which is a bit weird.  Then
again renaming them in this patch adds more churn, making it harder to
review.  Maybe instead of numbering them it would be better to name
them after the child process they are calling?  e.g. 'cp1' could
become 'cp_di', and so on?

> + struct argv_array args = ARGV_ARRAY_INIT;
> + struct strbuf diff_output = STRBUF_INIT;
> + struct rev_info rev;
>  
>   set_alternate_index_output(stash_index_path.buf);
>   if (reset_tree(>i_tree, 0, 0)) {
> @@ -997,26 +1012,36 @@ static int stash_working_tree(struct stash_info *info, 
> const char **argv)
>   }
>   set_alternate_index_output(".git/index");
>  
> - cp1.git_cmd = 1;
> - argv_array_pushl(, "diff-index", "--name-only", "-z",
> - "HEAD", "--", NULL);
> + argv_array_push(, "dummy");

Not being familiar with the setup_revisions code, I had to dig a bit
to figure out why this makes sense.  This is a dummy replacement for
argv[0] in normal operation.  In retrospect it's kind of obvious, but
maybe call "dummy" "fake_argv0" instead, to help nudge future readers
in the right direction?
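
For readers who haven't run into this convention, a minimal sketch of why the placeholder is needed: argument parsers conventionally start scanning at index 1, because index 0 holds the program name. 'count_quiet' is a hypothetical function used only for illustration.

```c
#include <string.h>

/*
 * Count the arguments equal to "--quiet", skipping argv[0] the way
 * argument parsers conventionally do.
 */
static int count_quiet(int argc, const char **argv)
{
	int i, n = 0;

	for (i = 1; i < argc; i++)
		if (!strcmp(argv[i], "--quiet"))
			n++;
	return n;
}
```

Without a placeholder in slot 0, the first real argument would be silently skipped.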

>   if (argv)
> - argv_array_pushv(, argv);
> - argv_array_pushf(_array, "GIT_INDEX_FILE=%s",
> -  stash_index_path.buf);
> + argv_array_pushv(, argv);
> + git_config(git_diff_basic_config, NULL);
> + init_revisions(, prefix);
> + args.argc = setup_revisions(args.argc, args.argv, , NULL);
> +
> + rev.diffopt.output_format |= DIFF_FORMAT_CALLBACK;
> + rev.diffopt.format_callback = add_diff_to_buf;
> + rev.diffopt.format_callback_data = _output;
> +
> + if (read_cache_preload() < 0) {
> + ret = -1;
> + goto done;
> + }
>  
> - if (pipe_command(, NULL, 0, , 0, NULL, 0)) {
> + add_pending_object(, parse_object(the_repository, >b_commit), 
> "");
> + if (run_diff_index(, 0)) {
>   ret = -1;
>   goto done;
>   }
>  
>   cp2.git_cmd = 1;
> - argv_array_pushl(, "update-index", "-z", "--add",
> + argv_array_pushl(, "update-index", "--add",
>"--remove",

Re: [GSoC][PATCH v7 16/26] stash: replace spawning a "read-tree" process

2018-08-18 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> Instead of spawning a child process, make use of `reset_tree()`
> function already implemented in `stash-helper.c`.
> ---
>  builtin/stash--helper.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index a4e57899b..887b78d05 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -984,21 +984,18 @@ static int stash_patch(struct stash_info *info, const 
> char **argv)
>  static int stash_working_tree(struct stash_info *info, const char **argv)
>  {
>   int ret = 0;
> - struct child_process cp0 = CHILD_PROCESS_INIT;
>   struct child_process cp1 = CHILD_PROCESS_INIT;
>   struct child_process cp2 = CHILD_PROCESS_INIT;
>   struct child_process cp3 = CHILD_PROCESS_INIT;
>   struct strbuf out1 = STRBUF_INIT;
>   struct strbuf out3 = STRBUF_INIT;
>  
> - cp0.git_cmd = 1;
> - argv_array_push(, "read-tree");
> - argv_array_pushf(, "--index-output=%s", stash_index_path.buf);
> - argv_array_pushl(, "-m", oid_to_hex(>i_tree), NULL);
> - if (run_command()) {
> + set_alternate_index_output(stash_index_path.buf);
> + if (reset_tree(>i_tree, 0, 0)) {
>   ret = -1;
>   goto done;
>   }
> + set_alternate_index_output(".git/index");

I think this second 'set_alternate_index_output()' should be
'set_alternate_index_output(NULL)', which has slightly different
semantics than setting it to '.git/index'.  Having it set means that
the index is written unconditionally even if it has not changed.

Also the index file could be something other than ".git/index", if
the GIT_INDEX_FILE environment variable is set, so it should be
replaced with 'get_index_file()' if we were to keep this.
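
To show the kind of lookup meant here, a minimal sketch follows. 'index_path_sketch' is a hypothetical helper, and it does not model what 'get_index_file()' really does (for example resolving the default relative to the git directory); it only shows why hard-coding ".git/index" ignores the environment.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: prefer $GIT_INDEX_FILE when it is set,
 * otherwise fall back to the given default path.
 */
static const char *index_path_sketch(const char *fallback)
{
	const char *env = getenv("GIT_INDEX_FILE");

	return env ? env : fallback;
}
```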

I was also wondering if we could avoid writing a temporary index to
disk, in 'stash_working_tree', but I don't see an easy way for doing
that.

>  
>   cp1.git_cmd = 1;
>   argv_array_pushl(, "diff-index", "--name-only", "-z",
> -- 
> 2.18.0.573.g56500d98f
>

Re: [GSoC][PATCH v7 15/26] stash: convert create to builtin

2018-08-18 Thread Thomas Gummerer

On 08/18, Paul Sebastian Ungureanu wrote:
> On Thu, Aug 16, 2018 at 1:13 AM, Thomas Gummerer  wrote:
> > On 08/08, Paul-Sebastian Ungureanu wrote:
> >>
> >>  [...]
> >>
> >> + ret = -1;
> >> + goto done;
> >> + }
> >> + untracked_commit_option = 1;
> >> + }
> >> + if (patch_mode) {
> >> + ret = stash_patch(info, argv);
> >> + if (ret < 0) {
> >> + printf_ln("Cannot save the current worktree state");
> >> + goto done;
> >> + } else if (ret > 0) {
> >> + goto done;
> >> + }
> >> + } else {
> >> + if (stash_working_tree(info, argv)) {
> >> + printf_ln("Cannot save the current worktree state");
> >> + ret = -1;
> >> + goto done;
> >> + }
> >> + }
> >> +
> >> + if (!*stash_msg || !strlen(*stash_msg))
> >> + strbuf_addf(_stash_msg, "WIP on %s", msg.buf);
> >> + else
> >> + strbuf_addf(_stash_msg, "On %s: %s\n", branch_name,
> >> + *stash_msg);
> >> + *stash_msg = strbuf_detach(_stash_msg, NULL);
> >
> > strbuf_detach means we're taking ownership of the memory, so we'll
> > have to free it afterwards. Looking at this we may not even want to
> > re-use the 'stash_msg' variable here, but instead introduce another
> > variable for it, so it doesn't have a dual meaning in this function.
> >
> >> +
> >> + /*
> >> +  * `parents` will be empty after calling `commit_tree()`, so there is
> >> +  * no need to call `free_commit_list()`
> >> +  */
> >> + parents = NULL;
> >> + if (untracked_commit_option)
> >> + commit_list_insert(lookup_commit(the_repository, 
> >> >u_commit), );
> >> + commit_list_insert(lookup_commit(the_repository, >i_commit), 
> >> );
> >> + commit_list_insert(head_commit, );
> >> +
> >> + if (commit_tree(*stash_msg, strlen(*stash_msg), >w_tree,
> >> + parents, >w_commit, NULL, NULL)) {
> >> + printf_ln("Cannot record working tree state");
> >> + ret = -1;
> >> + goto done;
> >> + }
> >> +
> >> +done:
> >> + strbuf_release(_tree_label);
> >> + strbuf_release();
> >> + strbuf_release();
> >> + strbuf_release(_stash_msg);
> >> + return ret;
> >> +}
> >> +
> >> +static int create_stash(int argc, const char **argv, const char *prefix)
> >> +{
> >> + int include_untracked = 0;
> >> + int ret = 0;
> >> + const char *stash_msg = NULL;
> >> + struct stash_info info;
> >> + struct option options[] = {
> >> + OPT_BOOL('u', "include-untracked", _untracked,
> >> +  N_("include untracked files in stash")),
> >> + OPT_STRING('m', "message", _msg, N_("message"),
> >> +  N_("stash message")),
> >> + OPT_END()
> >> + };
> >> +
> >> + argc = parse_options(argc, argv, prefix, options,
> >> +  git_stash_helper_create_usage,
> >> +  0);
> >> +
> >> + ret = do_create_stash(argc, argv, prefix, _msg,
> >> +   include_untracked, 0, );
> >
> > stash_msg doesn't have to be passed as a pointer to a pointer here, as
> > we never need the modified value after this function returns.  I think
> > just passing 'stash_msg' here instead of '_msg' will make
> > 'do_create_stash' slightly easier to read.
> 
> That's right, but `do_create_stash()` is also called by
> `do_push_stash()`, which will need the modified value.

Ah right, I didn't read that far yet when leaving this comment :)

Reading the original push_stash again though, do we actually need the
modified value in 'do_push_stash()'?  The original lines are:

create_stash -m "$stash_msg" -u "$untracked" -- "$@"
store_stash -m "$stash_msg" -q $w_commit ||
die "$(gettext "Cannot save the current status")"

'$stash_msg' gets passed in to 'create_stash()', but the
'stash_msg' variable inside of 'create_stash()' is local and only the
local copy is modified, so 'store_stash()' would still get the
original.  Or am I missing something?
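
The C analogue of that shell behaviour is passing the pointer by value. A minimal sketch, with hypothetical function names and an invented message string:

```c
/* Receives a copy of the pointer; reassigning it is invisible to the caller. */
static void set_by_value(const char *msg)
{
	msg = "On master: wip";
	(void)msg;	/* the new value is dropped when the function returns */
}

/* Receives the address of the caller's pointer and can update it. */
static void set_by_pointer(const char **msg)
{
	*msg = "On master: wip";
}
```

If the caller never needs the modified message, the by-value form matches what the shell script did and is a bit easier to follow.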

Re: [GSoC][PATCH v7 23/26] stash: convert `stash--helper.c` into `stash.c`

2018-08-18 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> The old shell script `git-stash.sh`  was removed and replaced
> entirely by `builtin/stash.c`. In order to do that, `create` and
> `push` were adapted to work without `stash.sh`. For example, before
> this commit, `git stash create` called `git stash--helper create
> --message "$*"`. If it called `git stash--helper create "$@"`, then
> some of these changes wouldn't have been necessary.
> 
> This commit also removes the word `helper` since now stash is
> called directly and not by a shell script.
> ---
>  .gitignore   |   1 -
>  Makefile |   3 +-
>  builtin.h|   2 +-
>  builtin/{stash--helper.c => stash.c} | 132 ---
>  git-stash.sh | 153 ---
>  git.c|   2 +-
>  6 files changed, 74 insertions(+), 219 deletions(-)
>  rename builtin/{stash--helper.c => stash.c} (91%)
>  delete mode 100755 git-stash.sh
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash.c
> similarity index 91%
> rename from builtin/stash--helper.c
> rename to builtin/stash.c
> index f54a476e3..0ef88408a 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash.c
>
> [...]
>
> @@ -1445,9 +1448,10 @@ static int push_stash(int argc, const char **argv, 
> const char *prefix)
>   OPT_END()
>   };
>  
> - argc = parse_options(argc, argv, prefix, options,
> -  git_stash_helper_push_usage,
> -  0);
> + if (argc)
> + argc = parse_options(argc, argv, prefix, options,
> +  git_stash_push_usage,
> +  0);

This change is a bit surprising here.  Why is this necessary?  I
thought parse_options would handle no arguments just fine?

>   return do_push_stash(argc, argv, prefix, keep_index, patch_mode,
>include_untracked, quiet, stash_msg);
> @@ -1479,7 +1483,7 @@ static int save_stash(int argc, const char **argv, 
> const char *prefix)
>   };
>  
>   argc = parse_options(argc, argv, prefix, options,
> -  git_stash_helper_save_usage,
> +  git_stash_save_usage,
>0);
>  
>   for (i = 0; i < argc; ++i)
> @@ -1491,7 +1495,7 @@ static int save_stash(int argc, const char **argv, 
> const char *prefix)
>include_untracked, quiet, stash_msg);
>  }
>  
> -int cmd_stash__helper(int argc, const char **argv, const char *prefix)
> +int cmd_stash(int argc, const char **argv, const char *prefix)
>  {
>   pid_t pid = getpid();
>   const char *index_file;
> @@ -1502,16 +1506,16 @@ int cmd_stash__helper(int argc, const char **argv, 
> const char *prefix)
>  
>   git_config(git_default_config, NULL);
>  
> - argc = parse_options(argc, argv, prefix, options, 
> git_stash_helper_usage,
> + argc = parse_options(argc, argv, prefix, options, git_stash_usage,
>PARSE_OPT_KEEP_UNKNOWN | PARSE_OPT_KEEP_DASHDASH);
>  
>   index_file = get_index_file();
>   strbuf_addf(_index_path, "%s.stash.%" PRIuMAX, index_file,
>   (uintmax_t)pid);
>  
> - if (argc < 1)
> - usage_with_options(git_stash_helper_usage, options);
> - if (!strcmp(argv[0], "apply"))
> + if (argc == 0)
> + return !!push_stash(0, NULL, prefix);
> + else if (!strcmp(argv[0], "apply"))
>   return !!apply_stash(argc, argv, prefix);
>   else if (!strcmp(argv[0], "clear"))
>   return !!clear_stash(argc, argv, prefix);
> @@ -1533,7 +1537,13 @@ int cmd_stash__helper(int argc, const char **argv, 
> const char *prefix)
>   return !!push_stash(argc, argv, prefix);
>   else if (!strcmp(argv[0], "save"))
>   return !!save_stash(argc, argv, prefix);
> + if (*argv[0] == '-') {
> + struct argv_array args = ARGV_ARRAY_INIT;
> + argv_array_push(, "push");
> + argv_array_pushv(, argv);
> + return !!push_stash(args.argc, args.argv, prefix);
> + }

This is a bit different than what the current code does.  Currently
the rules for when a plain 'git stash' becomes 'git stash push' are
the following:

- If there are no arguments.
- If all arguments are option arguments.
- If the first argument of 'git stash' is '-p'.
- If the first argument of 'git stash' is '--'.

This is to avoid someone typing 'git stash -q drop' for example, and
then being surprised that a new stash was created instead of an old
one being dropped, which is what the code above would do.

For more reasoning about these aliasing rules see also the thread at [1].

[1]: 
https://public-inbox.org/git/20170213200950.m3bcyp52wd25p...@sigill.intra.peff.net/
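
A minimal sketch of the four rules listed above, written as a standalone predicate. 'assume_push_sketch' is a hypothetical name and this is only one straightforward reading of the rules, not the code from 'git-stash.sh'.

```c
#include <string.h>

/*
 * Return 1 if `git stash <args>` should be treated as
 * `git stash push <args>` under the four rules listed above, 0
 * otherwise.  `argv` holds only the arguments after "stash".
 */
static int assume_push_sketch(int argc, const char **argv)
{
	int i;

	if (argc == 0)
		return 1;	/* no arguments */
	if (!strcmp(argv[0], "-p") || !strcmp(argv[0], "--"))
		return 1;	/* first argument is '-p' or '--' */
	for (i = 0; i < argc; i++)
		if (argv[i][0] != '-')
			return 0;	/* a non-option such as "drop" was seen */
	return 1;		/* all arguments are options */
}
```

With this predicate, 'git stash -q drop' is not turned into a push, because "drop" is a non-option argument.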

>  
>   usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]),
> -

Re: [GSoC][PATCH v7 22/26] stash: convert save to builtin

2018-08-18 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> Add stash save to the helper and delete functions which are no
> longer needed (`show_help()`, `save_stash()`, `push_stash()`,
> `create_stash()`, `clear_stash()`, `untracked_files()` and
> `no_changes()`).
> ---
>  builtin/stash--helper.c |  48 +++
>  git-stash.sh| 311 +---
>  2 files changed, 50 insertions(+), 309 deletions(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index 5c27f5dcf..f54a476e3 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -26,6 +26,8 @@ static const char * const git_stash_helper_usage[] = {
>   N_("git stash--helper [push [-p|--patch] [-k|--[no-]keep-index] 
> [-q|--quiet]\n"
>  "  [-u|--include-untracked] [-a|--all] [-m|--message 
> ]\n"
>  "  [--] [...]]"),
> + N_("git stash--helper save [-p|--patch] [-k|--[no-]keep-index] 
> [-q|--quiet]\n"
> +"  [-u|--include-untracked] [-a|--all] []"),
>   NULL
>  };
>  
> @@ -81,6 +83,12 @@ static const char * const git_stash_helper_push_usage[] = {
>   NULL
>  };
>  
> +static const char * const git_stash_helper_save_usage[] = {
> + N_("git stash--helper save [-p|--patch] [-k|--[no-]keep-index] 
> [-q|--quiet]\n"
> +"  [-u|--include-untracked] [-a|--all] []"),
> + NULL
> +};
> +
>  static const char *ref_stash = "refs/stash";
>  static int quiet;
>  static struct strbuf stash_index_path = STRBUF_INIT;
> @@ -1445,6 +1453,44 @@ static int push_stash(int argc, const char **argv, 
> const char *prefix)
>include_untracked, quiet, stash_msg);
>  }
>  
> +static int save_stash(int argc, const char **argv, const char *prefix)
> +{
> + int i;
> + int keep_index = -1;
> + int patch_mode = 0;
> + int include_untracked = 0;
> + int quiet = 0;
> + char *stash_msg = NULL;
> + struct strbuf alt_stash_msg = STRBUF_INIT;
> + struct option options[] = {
> + OPT_SET_INT('k', "keep-index", _index,
> + N_("keep index"), 1),
> + OPT_BOOL('p', "patch", _mode,
> + N_("stash in patch mode")),
> + OPT_BOOL('q', "quiet", ,
> + N_("quiet mode")),

This could be OPT__QUIET again as mentioned in the previous patch as
well.

> + OPT_BOOL('u', "include-untracked", _untracked,
> +  N_("include untracked files in stash")),
> + OPT_SET_INT('a', "all", _untracked,
> + N_("include ignore files"), 2),
> + OPT_STRING('m', "message", _msg, N_("message"),
> +  N_("stash message")),
> + OPT_END()
> + };
> +
> + argc = parse_options(argc, argv, prefix, options,
> +  git_stash_helper_save_usage,
> +  0);
> +
> + for (i = 0; i < argc; ++i)
> + strbuf_addf(_stash_msg, "%s ", argv[i]);
> +
> + stash_msg = strbuf_detach(_stash_msg, NULL);

We unconditionally overwrite 'stash_msg' here, even if a '-m'
parameter was given earlier, which I don't think is what we intended
here.  I think we started "supporting" (or rather, not erroring out on)
the '-m' flag accidentally (my bad, sorry) in 'git stash save' rather
than intentionally, and indeed it has the same behaviour as the code
above.

However I think we should just not add support for the '-m' flag here, as
it doesn't make a lot of sense to have two ways of passing a message.

We also never free the memory we get back here from 'strbuf_detach'.
As this is not code in 'libgit.a' that's probably fine, and we can
just add an 'UNLEAK(stash_msg)' here I think.

It may generally be interesting to consider using a leak checker, and
see how far we can get in making this leak free.  It may not be
possible to make 'git stash' completely leak free, as the underlying
APIs may not be, but it may be interesting to see how far we can get
for the new code only.

> +
> + return do_push_stash(0, NULL, prefix, keep_index, patch_mode,
> +  include_untracked, quiet, stash_msg);
> +}
> +
>  int cmd_stash__helper(int argc, const char **argv, const char *prefix)
>  {
>   pid_t pid = getpid();
> @@ -1485,6 +1531,8 @@ int cmd_stash__helper(int argc, const char **argv, 
> const char *prefix)
>   return !!create_stash(argc, argv, prefix);
>   else if (!strcmp(argv[0], "push"))
>   return !!push_stash(argc, argv, prefix);
> + else if (!strcmp(argv[0], "save"))
> + return !!save_stash(argc, argv, prefix);
>  
>   usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]),
> git_stash_helper_usage, options);
>
> [...]

Re: [GSoC][PATCH v7 20/26] stash: add tests for `git stash push -q`

2018-08-18 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> This commit introduces more tests for the quiet option of
> `git stash push`.

I think this commit should be squashed into the previous one, so we
have implementation and tests in one commit.  That way it's easier to
see during review that there are tests for the change.  For more
discussion on that also see [1].

[1]: 
https://public-inbox.org/git/20180806144726.gb97...@aiede.svl.corp.google.com/

> ---
>  t/t3903-stash.sh | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
> index 8d002a7f2..b78db74ae 100755
> --- a/t/t3903-stash.sh
> +++ b/t/t3903-stash.sh
> @@ -1064,6 +1064,27 @@ test_expect_success 'push:  not in the 
> repository errors out' '
>   test_path_is_file untracked
>  '
>  
> +test_expect_success 'push: -q is quiet with changes' '
> + >foo &&
> + git stash push -q >output 2>&1 &&

We create an untracked file here and then call 'git stash push', which
will not create a new stash, as we don't use the --include-untracked
option.  In fact, right now this test is doing the same thing as the
test below.  There should be a 'git add foo' above the 'git stash
push' call to test what we're claiming to test here.

> + test_must_be_empty output
> +'
> +
> +test_expect_success 'push: -q is quiet with no changes' '
> + git stash push -q >output 2>&1 &&
> + test_must_be_empty output
> +'
> +
> +test_expect_success 'push: -q is quiet even if there is no initial commit' '
> + git init foo_dir &&
> + cd foo_dir &&
> + touch bar &&

The typical style in the test suite for creating a new file is to use
'>bar', unless you care about the 'mtime' the file has.  We don't seem
to care about that in this test, so avoiding 'touch' would be better.

> + test_must_fail git stash push -q >output 2>&1 &&
> + test_must_be_empty output &&
> + cd .. &&

The above should be in a subshell, i.e.

	(
		cd foo_dir &&
		touch bar &&
		test_must_fail git stash push -q >output 2>&1 &&
		test_must_be_empty output
	)

then you don't have to do the 'cd ..' in the end.  With the 'cd ..' in
the end, if one of the commands between the 'cd foo_dir' and 'cd ..'
fails, all subsequent tests will be run inside of 'foo_dir', which
puts them in a different environment than they expect.  That can cause
all kinds of weirdness.

If inside a subshell, the current working directory of the parent
shell is unaffected, so we don't have to worry about cd'ing back, and
subsequent tests will get the correct cwd even if things go wrong in
this test.
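
The same isolation can be seen at the process level. The sketch below is an analogy in C, not part of the test suite: 'chdir_in_child' is a hypothetical helper that changes directory in a forked child, much like '( cd dir )' does in the shell, and leaves the caller's working directory alone.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run chdir(dir) in a child process and return the child's exit status
 * (0 on success), or -1 if the child could not be created or waited
 * for.  The caller's working directory is never changed.
 */
static int chdir_in_child(const char *dir)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(chdir(dir) ? 1 : 0);	/* child: like "( cd dir )" */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```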

> + rm -rf foo_dir

We'll want this cleanup to run even if the test fails.  To do
so, the 'test_when_finished' helper can be used.  Using that, this
would go at the top of the test, as 'test_when_finished rm -rf
foo_dir'.  Otherwise if any of the commands above fail, 'foo_dir' will
not be removed, and may interfere with subsequent tests.

> +'
> +
>  test_expect_success 'untracked files are left in place when -u is not given' 
> '
>   >file &&
>   git add file &&
> -- 
> 2.18.0.573.g56500d98f
>

Re: [GSoC][PATCH v7 19/26] stash: make push to be quiet

2018-08-18 Thread Thomas Gummerer

> Subject: stash: make push to be quiet

Nit: maybe "stash: make push -q quiet"?  I think the subject should at
least mention the -q option.

On 08/08, Paul-Sebastian Ungureanu wrote:
> There is a change in behaviour with this commit. When there was
> no initial commit, the shell version of stash would still display
> a message. This commit makes `push` to not display any message if
> `--quiet` or `-q` is specified.

Yeah, not being quiet here can be considered a bug, so this change in
behaviour makes sense.

Should the "No changes selected" message in 'stash_patch' also be made
quiet?

> ---
>  builtin/stash--helper.c | 41 +++--
>  1 file changed, 27 insertions(+), 14 deletions(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index c26cad3d5..4fd79532c 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -1079,7 +1079,7 @@ static int stash_working_tree(struct stash_info *info,
>  
>  static int do_create_stash(int argc, const char **argv, const char *prefix,
>  const char **stash_msg, int include_untracked,
> -int patch_mode, struct stash_info *info)
> +int patch_mode, struct stash_info *info, int quiet)
>  {
>   int untracked_commit_option = 0;
>   int ret = 0;
> @@ -1105,7 +1105,8 @@ static int do_create_stash(int argc, const char **argv, 
> const char *prefix,
>   }
>  
>   if (get_oid("HEAD", >b_commit)) {
> - fprintf_ln(stderr, "You do not have the initial commit yet");
> + if (!quiet)
> + fprintf_ln(stderr, "You do not have the initial commit 
> yet");
>   ret = -1;
>   goto done;
>   } else {
> @@ -1127,7 +1128,8 @@ static int do_create_stash(int argc, const char **argv, 
> const char *prefix,
>   if (write_cache_as_tree(>i_tree, 0, NULL) ||
>   commit_tree(commit_tree_label.buf, commit_tree_label.len,
>   >i_tree, parents, >i_commit, NULL, NULL)) {
> - fprintf_ln(stderr, "Cannot save the current index state");
> + if (!quiet)
> + fprintf_ln(stderr, "Cannot save the current index 
> state");
>   ret = -1;
>   goto done;
>   }
> @@ -1135,7 +1137,8 @@ static int do_create_stash(int argc, const char **argv, 
> const char *prefix,
>   if (include_untracked && get_untracked_files(argv, 1,
>include_untracked, )) {
>   if (save_untracked_files(info, , )) {
> - printf_ln("Cannot save the untracked files");
> + if (!quiet)
> + printf_ln("Cannot save the untracked files");
>   ret = -1;
>   goto done;
>   }
> @@ -1144,14 +1147,16 @@ static int do_create_stash(int argc, const char 
> **argv, const char *prefix,
>   if (patch_mode) {
>   ret = stash_patch(info, argv);
>   if (ret < 0) {
> - printf_ln("Cannot save the current worktree state");
> + if (!quiet)
> + printf_ln("Cannot save the current worktree 
> state");
>   goto done;
>   } else if (ret > 0) {
>   goto done;
>   }
>   } else {
>   if (stash_working_tree(info, argv, prefix)) {
> - printf_ln("Cannot save the current worktree state");
> + if (!quiet)
> + printf_ln("Cannot save the current worktree 
> state");
>   ret = -1;
>   goto done;
>   }
> @@ -1176,7 +1181,8 @@ static int do_create_stash(int argc, const char **argv, 
> const char *prefix,
>  
>   if (commit_tree(*stash_msg, strlen(*stash_msg), >w_tree,
>   parents, >w_commit, NULL, NULL)) {
> - printf_ln("Cannot record working tree state");
> + if (!quiet)
> + printf_ln("Cannot record working tree state");
>   ret = -1;
>   goto done;
>   }
> @@ -1208,7 +1214,7 @@ static int create_stash(int argc, const char **argv, 
> const char *prefix)
>0);
>  
>   ret = do_create_stash(argc, argv, prefix, _msg,
> -   include_untracked, 0, );
> +   include_untracked, 0, , 0);
>  
>   if (!ret)
>   printf_ln("%s", oid_to_hex(_commit));
> @@ -1261,25 +1267,31 @@ static int do_push_stash(int argc, const char **argv, 
> const char *prefix,
>   return -1;
>  
>   if (!check_changes(argv, include_untracked, prefix)) {
> - fprintf_ln(stdout, "No local changes to save");
> + if (!quiet)
> + fprintf_ln(stdout, "No local changes to save");
>

Re: [GSoC][PATCH v7 18/26] stash: convert push to builtin

2018-08-18 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> Add stash push to the helper.
> ---

This (and the previous two and I think most subsequent patches) are
missing your sign-off.

>  builtin/stash--helper.c | 209 
>  git-stash.sh|   6 +-
>  2 files changed, 213 insertions(+), 2 deletions(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index f905d3908..c26cad3d5 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -23,6 +23,9 @@ static const char * const git_stash_helper_usage[] = {
>   N_("git stash--helper clear"),
>   N_("git stash--helper store [-m|--message ] [-q|--quiet] 
> "),
>   N_("git stash--helper create []"),
> + N_("git stash--helper [push [-p|--patch] [-k|--[no-]keep-index] 
> [-q|--quiet]\n"
> +"  [-u|--include-untracked] [-a|--all] [-m|--message 
> ]\n"
> +"  [--] [...]]"),
>   NULL
>  };
>  
> @@ -71,6 +74,13 @@ static const char * const git_stash_helper_create_usage[] 
> = {
>   NULL
>  };
>  
> +static const char * const git_stash_helper_push_usage[] = {
> + N_("git stash--helper [push [-p|--patch] [-k|--[no-]keep-index] 
> [-q|--quiet]\n"
> +"  [-u|--include-untracked] [-a|--all] [-m|--message 
> ]\n"
> +"  [--] [...]]"),
> + NULL
> +};
> +
>  static const char *ref_stash = "refs/stash";
>  static int quiet;
>  static struct strbuf stash_index_path = STRBUF_INIT;
> @@ -1210,6 +1220,203 @@ static int create_stash(int argc, const char **argv, 
> const char *prefix)
>   return ret < 0;
>  }
>  
> +static int do_push_stash(int argc, const char **argv, const char *prefix,
> +  int keep_index, int patch_mode, int include_untracked,
> +  int quiet, const char *stash_msg)
> +{
> + int ret = 0;
> + struct pathspec ps;
> + struct stash_info info;
> + if (patch_mode && keep_index == -1)
> + keep_index = 1;
> +
> + if (patch_mode && include_untracked) {
> + fprintf_ln(stderr, "Can't use --patch and --include-untracked 
> or --all at the same time");

This should be marked for translation.  Similar for the messages
below.  I noticed this in a previous patch as well, so it may be worth
reviewing all the output, and checking that it's going to the right
stream and is marked for translation.
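
To make the suggestion concrete, here is a minimal sketch of what I
mean.  It is not the code from the patch: the `_()` macro below is an
identity stand-in for git's gettext wrapper so that the snippet
compiles on its own, and both function names are made up for
illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Stand-in for git's _() macro, which wraps gettext().  It is an
 * identity macro here so the sketch builds without libintl.
 */
#define _(msgid) (msgid)

/* Hypothetical helper: the message is marked for translation where it is defined. */
const char *patch_untracked_conflict_msg(void)
{
	return _("Can't use --patch and --include-untracked or --all at the same time");
}

/* Hypothetical helper: the caller names the stream; errors belong on stderr. */
int report_conflict(FILE *stream)
{
	assert(stream);
	fprintf(stream, "%s\n", patch_untracked_conflict_msg());
	return -1;
}
```

A caller would then do 'return report_conflict(stderr);', which keeps
the choice of stream visible at the call site.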

> + return -1;
> + }
> +
> + parse_pathspec(&ps, 0, PATHSPEC_PREFER_FULL, prefix, argv);
> +
> + if (read_cache() < 0)
> + die(_("index file corrupt"));
> +
> + if (!include_untracked && ps.nr) {
> + int i;
> + char *ps_matched = xcalloc(ps.nr, 1);
> +
> + for (i = 0; i < active_nr; ++i) {
> + const struct cache_entry *ce = active_cache[i];
> + if (!ce_path_match(ce, &ps, ps_matched))
> + continue;
> + }
> +
> + if (report_path_error(ps_matched, &ps, prefix)) {
> + fprintf_ln(stderr, "Did you forget to 'git add'?");
> + return -1;
> + }
> + }
> +
> + read_cache_preload(NULL);

Instead of doing a 'read_cache' before and 'read_cache_preload(NULL)'
here, we could just use 'read_cache_preload(NULL)' above.
'read_cache' does return early if the index has already been read, so
there's no big harm in doing this twice, but just having one call is
still neater I think.

It would make the command slightly slower in the error case above, but
I doubt that's worth worrying about.
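
As a toy model of why the second read is harmless (this is a sketch
under my own assumptions, not git's index code; 'struct toy_index'
and 'toy_read_index' are invented names), a loader that remembers
whether it already ran only touches the disk once:

```c
#include <assert.h>

/* Toy stand-in for the in-core index; not git's 'struct index_state'. */
struct toy_index {
	int initialized;	/* set once the "file" has been read */
	int nr_reads;		/* how many times we actually did the read */
	int nr_entries;
};

/* Pretend to read the index; returns early when it is already loaded. */
int toy_read_index(struct toy_index *istate)
{
	assert(istate);
	if (istate->initialized)
		return istate->nr_entries;
	istate->nr_reads++;
	istate->nr_entries = 3;	/* arbitrary value for the sketch */
	istate->initialized = 1;
	return istate->nr_entries;
}
```

Calling it twice gives the same answer with a single real read, which
is why one call is enough.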

> + if (refresh_cache(REFRESH_QUIET))
> + return -1;
> +
> + if (!check_changes(argv, include_untracked, prefix)) {
> + fprintf_ln(stdout, "No local changes to save");
> + return 0;
> + }
> +
> + if (!reflog_exists(ref_stash) && do_clear_stash()) {
> + fprintf_ln(stderr, "Cannot initialize stash");
> + return -1;
> + }
> +
> + if ((ret = do_create_stash(argc, argv, prefix, &stash_msg,
> +include_untracked, patch_mode, &info)))
> + return ret;

Should this be 'return ret < 0'?  'ret == 1' means there are no
changes, for which we currently get a 0 exit code.  Though on second
thought that can't happen, because we already have 'check_changes'
above.  Why do we want the 'ret' variable here?
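
To spell out the difference, here is a small sketch.  The enum names
are hypothetical; they only encode the convention described above
(negative for errors, 0 for "stash created", 1 for "no changes"):

```c
#include <assert.h>

/* Hypothetical names for the return convention discussed above. */
enum { TOY_ERROR = -1, TOY_CREATED = 0, TOY_NO_CHANGES = 1 };

/* What a plain 'return !!ret' would report to the shell. */
int exit_code_double_negation(int ret)
{
	return !!ret;
}

/* What 'return ret < 0' reports: only real errors become non-zero. */
int exit_code_errors_only(int ret)
{
	assert(ret >= TOY_ERROR && ret <= TOY_NO_CHANGES);
	return ret < 0;
}
```

With 'ret == 1' the first variant exits with 1 and the second with 0,
which is the case I was wondering about.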

Something I notice here is that we are passing 'argc' and 'argv'
around a lot.  We passed that through parse-options already, and it
seems to me that we're mostly left with pathspecs here, rather than
'argv'.  It looks to me like we could just parse the pathspecs in the
callers (which we do in some places, but maybe not in all of them) and
then pass 'struct pathspec' around instead of the leftover argv, which
is easier to understand, and gives all these functions a neater/easier
to understand interface.
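
Roughly what I have in mind, as a sketch only: 'struct toy_pathspec'
below is a simplified stand-in for git's 'struct pathspec' that only
does literal prefix matching, and both function names are invented.
The caller parses once, and the helpers no longer see argc/argv:

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for git's 'struct pathspec'; literal prefixes only. */
struct toy_pathspec {
	int nr;
	const char **items;
};

/* Done once, in the caller, right after option parsing. */
void toy_parse_pathspec(struct toy_pathspec *ps, const char **argv)
{
	assert(ps && argv);
	ps->items = argv;
	for (ps->nr = 0; argv[ps->nr]; ps->nr++)
		; /* count the NULL-terminated leftover arguments */
}

/* Helpers take the parsed pathspec rather than the leftover argv. */
int toy_path_matches(const struct toy_pathspec *ps, const char *path)
{
	int i;

	if (!ps->nr)
		return 1; /* an empty pathspec matches everything */
	for (i = 0; i < ps->nr; i++)
		if (!strncmp(path, ps->items[i], strlen(ps->items[i])))
			return 1;
	return 0;
}
```

The function signatures then say what they operate on, instead of
leaving the reader to work out what is left in argv at that point.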

Also looking at 'do_create_stash', the 'argc' argument seems

Re: [PATCH v2] worktree: add --quiet option

2018-08-15 Thread Thomas Gummerer

On 08/15, Elia Pinto wrote:
> Add the '--quiet' option to git worktree,
> as for the other git commands. 'add' is the
> only command affected by it since all other
> commands, except 'list', are currently
> silent by default.
> 
> Helped-by: Martin Ågren 
> Helped-by: Duy Nguyen 
> Helped-by: Eric Sunshine 
> Signed-off-by: Elia Pinto 
> ---
> This is the second version of the patch.
> 
> Changes from the first version
> (https://public-inbox.org/git/CACsJy8A=zp7nfbuwyfep4uff3kssiaor3m0mtgvnhceyhsw...@mail.gmail.com/T/):
> 
> - deleted garbage in git-worktree.c and deleted
> superfluous blank line in git-worktree.txt.
> - when giving "--quiet" to 'add', call git symbolic-ref also with
> "--quiet".
> - changed the commit message to be more general, but
> specifying why the "--quiet" option is meaningful only for
> the 'add' command of git-worktree.
> - in git-worktree.txt the option
> "--quiet" is described near the "--verbose" option.
> 
>  Documentation/git-worktree.txt |  4 
>  builtin/worktree.c | 16 +---
>  t/t2025-worktree-add.sh|  5 +
>  3 files changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/git-worktree.txt b/Documentation/git-worktree.txt
> index 9c26be40f..29a5b7e25 100644
> --- a/Documentation/git-worktree.txt
> +++ b/Documentation/git-worktree.txt
> @@ -173,6 +173,10 @@ This can also be set up as the default behaviour by 
> using the
>   This format will remain stable across Git versions and regardless of 
> user
>   configuration.  See below for details.
>  
> +-q::
> +--quiet::
> + With 'add', suppress feedback messages.

Very minor nit here, we seem to use backticks everywhere else in this
document, maybe we should do that here as well?  Not sure it's worth
another iteration though.

The rest of the patch looks good to me, thanks!

>  -v::
>  --verbose::
>   With `prune`, report all removals.
> diff --git a/builtin/worktree.c b/builtin/worktree.c
> index a763dbdcc..41e771439 100644
> --- a/builtin/worktree.c
> +++ b/builtin/worktree.c
> @@ -27,6 +27,7 @@ static const char * const worktree_usage[] = {
>  struct add_opts {
>   int force;
>   int detach;
> + int quiet;
>   int checkout;
>   int keep_locked;
>  };
> @@ -303,9 +304,13 @@ static int add_worktree(const char *path, const char 
> *refname,
>   if (!is_branch)
>   argv_array_pushl(, "update-ref", "HEAD",
>oid_to_hex(>object.oid), NULL);
> - else
> + else {
>   argv_array_pushl(, "symbolic-ref", "HEAD",
>symref.buf, NULL);
> + if (opts->quiet)
> + argv_array_push(, "--quiet");
> + }
> +
>   cp.env = child_env.argv;
>   ret = run_command();
>   if (ret)
> @@ -315,6 +320,8 @@ static int add_worktree(const char *path, const char 
> *refname,
>   cp.argv = NULL;
>   argv_array_clear();
>   argv_array_pushl(, "reset", "--hard", NULL);
> + if (opts->quiet)
> + argv_array_push(, "--quiet");
>   cp.env = child_env.argv;
>   ret = run_command();
>   if (ret)
> @@ -437,6 +444,7 @@ static int add(int ac, const char **av, const char 
> *prefix)
>   OPT_BOOL(0, "detach", , N_("detach HEAD at named 
> commit")),
>   OPT_BOOL(0, "checkout", , N_("populate the new 
> working tree")),
>   OPT_BOOL(0, "lock", _locked, N_("keep the new working 
> tree locked")),
> + OPT__QUIET(, N_("suppress progress reporting")),
>   OPT_PASSTHRU(0, "track", _track, NULL,
>N_("set up tracking mode (see git-branch(1))"),
>PARSE_OPT_NOARG | PARSE_OPT_OPTARG),
> @@ -491,8 +499,8 @@ static int add(int ac, const char **av, const char 
> *prefix)
>   }
>   }
>   }
> -
> - print_preparing_worktree_line(opts.detach, branch, new_branch, 
> !!new_branch_force);
> + if (!opts.quiet)
> + print_preparing_worktree_line(opts.detach, branch, new_branch, 
> !!new_branch_force);
>  
>   if (new_branch) {
>   struct child_process cp = CHILD_PROCESS_INIT;
> @@ -500,6 +508,8 @@ static int add(int ac, const char **av, const char 
> *prefix)
>   argv_array_push(, "branch");
>   if (new_branch_force)
>   argv_array_push(, "--force");
> + if (opts.quiet)
> + argv_array_push(, "--quiet");
>   argv_array_push(, new_branch);
>   argv_array_push(, branch);
>   if (opt_track)
> diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
> index be6e09314..658647d83 100755
> --- a/t/t2025-worktree-add.sh
> +++ b/t/t2025-worktree-add.sh
> @@ -252,6 +252,11 @@ test_expect_success 'add -B' '
>   test_cmp_rev master^ poodle
>  '
>  
> +test_expect_success

Re: [GSoC][PATCH v7 00/26] Convert "git stash" to C builtin

2018-08-15 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> Hello,
> 
> Here is the whole `git stash` C version. Some of the previous
> patches were already reviewed (up to and including "stash: convert
> store to builtin"), but there are some which were not
> (starting with "stash: convert create to builtin").

Thanks for this new iteration, and sorry I took a while to find some
time to review this.  I had another read through the patches up until
patch 15, and left some comments, before running out of time again.  I
hope to find some time in the next few days to go through the rest of
the series as well.

One more comment in terms of the structure of the series.  The
patches doing the actual conversion from shell to C seem to be
interleaved with cleanup patches and patches that make the C version
use more internal APIs.  I'd suggest putting all the cleanup patches
(e.g. "stash: change `git stash show` usage text and documentation")
to the front of the series, as that's more likely to be
uncontroversial, and could maybe even be merged by itself.

Then I'd put all the conversion from shell to C patches, and only once
everything is converted I'd put the patches to use more of the
internal APIs rather than using run_command everywhere.  A possible
alternative would be to squash the patches that replace the run_command
calls into the conversion patches, so that the internal API is used
directly and reviewers have less churn to read through.  Though I'm kind of
on the fence with that, as a faithful conversion using 'run_command'
may be easier to review as a first step.

Hope this helps!

> In order to see the difference between the shell version and
> the C version, I ran `time` on:
> 
> * git test suite (t3903-stash.sh, t3904-stash-patch.sh,
> t3905-stash-include-untracked.sh and t3906-stash-submodule.sh)
> 
> t3903-stash.sh:
> ** SHELL: 12,69s user 9,95s system 109% cpu 20,730 total
> ** C:  2,67s user 2,84s system 105% cpu  5,206 total
> 
> t3904-stash-patch.sh:
> ** SHELL: 1,43s user 0,94s system 106% cpu 2,242 total
> ** C: 1,01s user 0,58s system 104% cpu 1,530 total
> 
> t3905-stash-include-untracked.sh
> ** SHELL: 2,22s user 1,73s system 110% cpu 3,569 total
> ** C: 0,59s user 0,57s system 106% cpu 1,085 total
> 
> t3906-stash-submodule.sh
> ** SHELL: 2,89s user 2,99s system 106% cpu 5,527 total
> ** C: 2,21s user 2,61s system 105% cpu 4,568 total
> 
> TOTAL:
> ** SHELL: 19.23s user 15.61s system
> ** C:  6.48s user  6.60s system

Awesome!

> * a git repository with 4000 files: 1000 not changed,
> 1000 staged files, 1000 unstaged files, 1000 untracked.
> In this case I ran some of the most used commands:
> 
> git stash push:
> 
> ** SHELL: 0,12s user 0,21s system 101% cpu 0,329 total
> ** C: 0,06s user 0,13s system 105% cpu 0,185 total
> 
> git stash push -u:
> 
> ** SHELL: 0,18s user 0,27s system  108% cpu 0,401 total
> ** C: 0,09s user 0,19s system  103% cpu 0,267 total
> 
> git stash pop:
> 
> ** SHELL: 0,16s user 0,26s system 103% cpu 0,399 total
> ** C: 0,13s user 0,19s system 102% cpu 0,308 total
> 
> Best regards,
> Paul Ungureanu
> 
> 
> Joel Teichroeb (5):
>   stash: improve option parsing test coverage
>   stash: convert apply to builtin
>   stash: convert drop and clear to builtin
>   stash: convert branch to builtin
>   stash: convert pop to builtin
> 
> Paul-Sebastian Ungureanu (21):
>   sha1-name.c: added 'get_oidf', which acts like 'get_oid'
>   stash: update test cases conform to coding guidelines
>   stash: renamed test cases to be more descriptive
>   stash: implement the "list" command in the builtin
>   stash: convert show to builtin
>   stash: change `git stash show` usage text and documentation
>   stash: refactor `show_stash()` to use the diff API
>   stash: update `git stash show` documentation
>   stash: convert store to builtin
>   stash: convert create to builtin
>   stash: replace spawning a "read-tree" process
>   stash: avoid spawning a "diff-index" process
>   stash: convert push to builtin
>   stash: make push to be quiet
>   stash: add tests for `git stash push -q`
>   stash: replace spawning `git ls-files` child process
>   stash: convert save to builtin
>   stash: convert `stash--helper.c` into `stash.c`
>   stash: optimize `get_untracked_files()` and `check_changes()`
>   stash: replace all `write-tree` child processes with API calls
>   stash: replace all "git apply" child processes with API calls
> 
>  Documentation/git-stash.txt |7 +-
>  Makefile|2 +-
>  builtin.h   |1 +
>  builtin/stash.c | 1562 +++
>  cache.h |1 +
>  git-stash.sh|  752 -
>  git.c   |1 +
>  sha1-name.c |   19 +
>

Re: [GSoC][PATCH v7 15/26] stash: convert create to builtin

2018-08-15 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> Add stash create to the helper.
> 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  builtin/stash--helper.c | 406 
>  git-stash.sh|   2 +-
>  2 files changed, 407 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index 5ff810f8c..a4e57899b 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -21,6 +21,7 @@ static const char * const git_stash_helper_usage[] = {
>   N_("git stash--helper branch  []"),
>   N_("git stash--helper clear"),
>   N_("git stash--helper store [-m|--message ] [-q|--quiet] 
> "),
> + N_("git stash--helper create []"),
>   NULL
>  };
>  
> @@ -64,6 +65,11 @@ static const char * const git_stash_helper_store_usage[] = 
> {
>   NULL
>  };
>  
> +static const char * const git_stash_helper_create_usage[] = {
> + N_("git stash--helper create []"),
> + NULL
> +};
> +
>  static const char *ref_stash = "refs/stash";
>  static int quiet;
>  static struct strbuf stash_index_path = STRBUF_INIT;
> @@ -781,6 +787,404 @@ static int store_stash(int argc, const char **argv, 
> const char *prefix)
>   return do_store_stash(argv[0], stash_msg, quiet);
>  }
>
> [...]
> 
> +
> +static int do_create_stash(int argc, const char **argv, const char *prefix,
> +const char **stash_msg, int include_untracked,
> +int patch_mode, struct stash_info *info)
> +{
> + int untracked_commit_option = 0;
> + int ret = 0;
> + int subject_len;
> + int flags;
> + const char *head_short_sha1 = NULL;
> + const char *branch_ref = NULL;
> + const char *head_subject = NULL;
> + const char *branch_name = "(no branch)";
> + struct commit *head_commit = NULL;
> + struct commit_list *parents = NULL;
> + struct strbuf msg = STRBUF_INIT;
> + struct strbuf commit_tree_label = STRBUF_INIT;
> + struct strbuf out = STRBUF_INIT;
> + struct strbuf final_stash_msg = STRBUF_INIT;
> +
> + read_cache_preload(NULL);
> + refresh_cache(REFRESH_QUIET);
> +
> + if (!check_changes(argv, include_untracked, prefix)) {
> + ret = 1;
> + goto done;

I wonder if we can just 'exit(0)' here, instead of returning.  This
whole command is a builtin, and I *think* outside of 'libgit.a' exiting
early is fine.  It does mean that we're not free'ing the memory
though, which means a leak checker would probably complain.  So
dunno.  It would simplify the code a little, but not sure it's worth it.
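
For reference, the trade-off looks roughly like this in a toy version
(a sketch with invented names, nothing here is from the patch).  The
'goto done' form keeps a single cleanup path, which is what a leak
checker wants to see:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical function: returns 1 when there is nothing to stash,
 * 0 on success, -1 on allocation failure.  'nr_freed' only exists so
 * the sketch can show that the cleanup ran.
 */
int toy_create(int have_changes, int *nr_freed)
{
	int ret = 0;
	char *msg = malloc(16);

	assert(nr_freed);
	if (!msg)
		return -1;
	strcpy(msg, "WIP");

	if (!have_changes) {
		ret = 1;	/* an exit(0) here would skip the free() below */
		goto done;
	}
	/* ... the real work would happen here ... */
done:
	free(msg);
	(*nr_freed)++;
	return ret;
}
```

Both paths release 'msg'; with an early exit(0) the "no changes" path
would not.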

> + }
> +
> + if (get_oid("HEAD", >b_commit)) {
> + fprintf_ln(stderr, "You do not have the initial commit yet");
> + ret = -1;
> + goto done;
> + } else {
> + head_commit = lookup_commit(the_repository, >b_commit);
> + }
> +
> + branch_ref = resolve_ref_unsafe("HEAD", 0, NULL, );
> + if (flags & REF_ISSYMREF)
> + branch_name = strrchr(branch_ref, '/') + 1;
> + head_short_sha1 = find_unique_abbrev(_commit->object.oid,
> +  DEFAULT_ABBREV);
> + subject_len = find_commit_subject(get_commit_buffer(head_commit, NULL),
> +   _subject);
> + strbuf_addf(, "%s: %s %.*s\n", branch_name, head_short_sha1,
> + subject_len, head_subject);

I think this can be written in a slightly simpler way:

	head_short_sha1 = find_unique_abbrev(&head_commit->object.oid,
					     DEFAULT_ABBREV);
	strbuf_addf(&msg, "%s: %s ", branch_name, head_short_sha1);
	pp_commit_easy(CMIT_FMT_ONELINE, head_commit, &msg);
	strbuf_addch(&msg, '\n');

The other advantage this brings is that it is consistent with other
places where we print/use the subject of a commit (e.g. in 'git reset
--hard').

> +
> + strbuf_addf(_tree_label, "index on %s\n", msg.buf);
> + commit_list_insert(head_commit, );
> + if (write_cache_as_tree(>i_tree, 0, NULL) ||
> + commit_tree(commit_tree_label.buf, commit_tree_label.len,
> + >i_tree, parents, >i_commit, NULL, NULL)) {
> + fprintf_ln(stderr, "Cannot save the current index state");

Looks like this message is translated in the current 'git stash'
implementation, so it should be here as well.  Same for the messages
below.

> + ret = -1;
> + goto done;
> + }
> +
> + if (include_untracked && get_untracked_files(argv, 1,
> +  include_untracked, )) {
> + if (save_untracked_files(info, , )) {
> + printf_ln("Cannot save the untracked files");

Why does this go to stdout, whereas "Cannot save the current index
state" above goes to stderr?  In the shell version of git stash these
all go to stderr fwiw.  There are a few similar cases, it would
probably be worth going

Re: [GSoC][PATCH v7 14/26] stash: convert store to builtin

2018-08-15 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> Add stash store to the helper and delete the store_stash function
> from the shell script.
> 
> Add the usage string which was forgotten in the shell script.

I think similarly to 'git stash create', which also doesn't appear in
the usage, this was intentionally omitted in the shell script.  The
reason for the omission is that this is only intended to be useful in
scripts, and not in interactive usage.  As such it doesn't add much
value in showing it in 'git stash -h'.  Meanwhile it is in the
synopsis in the man page.

If we want to add it to the help output, I think it would be best to
do so in a separate commit, and for 'git stash create' as well.  But
I'm not sure that's a good change.

> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  builtin/stash--helper.c | 52 +
>  git-stash.sh| 43 ++
>  2 files changed, 54 insertions(+), 41 deletions(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index ec8c38c6f..5ff810f8c 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -20,6 +20,7 @@ static const char * const git_stash_helper_usage[] = {
>   N_("git stash--helper ( pop | apply ) [--index] [-q|--quiet] 
> []"),
>   N_("git stash--helper branch  []"),
>   N_("git stash--helper clear"),
> + N_("git stash--helper store [-m|--message ] [-q|--quiet] 
> "),
>   NULL
>  };
>  
> @@ -58,6 +59,11 @@ static const char * const git_stash_helper_clear_usage[] = 
> {
>   NULL
>  };
>  
> +static const char * const git_stash_helper_store_usage[] = {
> + N_("git stash--helper store [-m|--message ] [-q|--quiet] 
> "),
> + NULL
> +};
> +
>  static const char *ref_stash = "refs/stash";
>  static int quiet;
>  static struct strbuf stash_index_path = STRBUF_INIT;
> @@ -731,6 +737,50 @@ static int show_stash(int argc, const char **argv, const 
> char *prefix)
>   return 0;
>  }
>  
> +static int do_store_stash(const char *w_commit, const char *stash_msg,
> +   int quiet)
> +{
> + int ret = 0;
> + struct object_id obj;
> +
> + if (!stash_msg)
> + stash_msg  = xstrdup("Created via \"git stash--helper 
> store\".");

I assume we're going to s/--helper// in a later commit?  Not sure
adding the '--helper' here is necessary, as a user would never invoke
'git stash--helper' directly, so they would expect the stash to be
created by 'git stash store'.  Anyway that's fairly minor, as I assume
this is going to change by the end of the patch series.

> +
> + ret = get_oid(w_commit, );
> + if (!ret) {
> + ret = update_ref(stash_msg, ref_stash, , NULL,
> +  REF_FORCE_CREATE_REFLOG,
> +  quiet ? UPDATE_REFS_QUIET_ON_ERR :
> +  UPDATE_REFS_MSG_ON_ERR);
> + }
> + if (ret && !quiet)
> + fprintf_ln(stderr, _("Cannot update %s with %s"),
> +ref_stash, w_commit);
> +
> + return ret;
> +}
> +
> +static int store_stash(int argc, const char **argv, const char *prefix)
> +{
> + const char *stash_msg = NULL;
> + struct option options[] = {
> + OPT__QUIET(, N_("be quiet, only report errors")),
> + OPT_STRING('m', "message", _msg, "message", N_("stash 
> message")),
> + OPT_END()
> + };
> +
> + argc = parse_options(argc, argv, prefix, options,
> +  git_stash_helper_store_usage,
> +  PARSE_OPT_KEEP_UNKNOWN);
> +
> + if (argc != 1) {
> + fprintf(stderr, _("\"git stash--helper store\" requires one 
>  argument\n"));
> + return -1;
> + }
> +
> + return do_store_stash(argv[0], stash_msg, quiet);
> +}
> +
>  int cmd_stash__helper(int argc, const char **argv, const char *prefix)
>  {
>   pid_t pid = getpid();
> @@ -765,6 +815,8 @@ int cmd_stash__helper(int argc, const char **argv, const 
> char *prefix)
>   return !!list_stash(argc, argv, prefix);
>   else if (!strcmp(argv[0], "show"))
>   return !!show_stash(argc, argv, prefix);
> + else if (!strcmp(argv[0], "store"))
> + return !!store_stash(argc, argv, prefix);
>  
>   usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]),
> git_stash_helper_usage, options);
> diff --git a/git-stash.sh b/git-stash.sh
> index 0d05cbc1e..5739c5152 100755
> --- a/git-stash.sh
> +++ b/git-stash.sh
> @@ -191,45 +191,6 @@ create_stash () {
>   die "$(gettext "Cannot record working tree state")"
>  }
>  
> -store_stash () {
> - while test $# != 0
> - do
> - case "$1" in
> - -m|--message)
> - shift
> - stash_msg="$1"
> - ;;
> - -m*)
> - stash_msg=${1#-m}
> - ;;
> -

Re: [GSoC][PATCH v7 13/26] stash: update `git stash show` documentation

2018-08-15 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> Add in documentation about the change of behavior regarding
> the `--quiet` option, which was introduced in the last commit.
> (the `--quiet` option does not exit anymore with erorr if it

s/erorr/error/

> is given an empty stash as argument)

If we want to keep the change in behaviour here (which I'm not sure
about as mentioned in my comment on the previous patch), I think this
should be folded into the previous patch.  I don't think there's much
value in having this as a separate commit, and folding it into the
previous commit has the advantage that we can easily see that the new
behaviour is documented.

> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  Documentation/git-stash.txt | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
> index e31ea7d30..d60ebdb96 100644
> --- a/Documentation/git-stash.txt
> +++ b/Documentation/git-stash.txt
> @@ -117,6 +117,9 @@ show [] []::
>   You can use stash.showStat and/or stash.showPatch config variables
>   to change the default behavior.
>  
> + It accepts any option known to `git diff`, but acts different on

I notice that we are using single quotes for git commands in some
places and backticks in other places in this man page.  We may want to
clean that up at some point.  I wouldn't want to do it in this series
though, as this is already long enough, and we've had this
inconsistency for a while already.

> + `--quiet` option and exit with zero regardless of differences.
> +
>  pop [--index] [-q|--quiet] []::
>  
>   Remove a single stashed state from the stash list and apply it
> -- 
> 2.18.0.573.g56500d98f
>

Re: [GSoC][PATCH v7 12/26] stash: refactor `show_stash()` to use the diff API

2018-08-15 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> Currently, `show_stash()` uses `cmd_diff()` to generate
> the output. After this commit, the output will be generated
> using the internal API.
> 
> Before this commit, `git stash show --quiet` would act like
> `git diff` and error out if the stash is not empty. Now, the
> `--quiet` option does not error out given an empty stash.

I think this needs a bit more justification.  As I mentioned in my
comment to a previous patch, I'm not sure '--quiet' makes much sense
with 'git stash show' (it will show nothing, and will always exit with
an error code, as the stash will always contain something), but if we
are supporting the same flags as 'git diff', and essentially just
forwarding them, shouldn't they keep the same behaviour as well?
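
For comparison, the behaviour of 'git diff --quiet' that I would
expect to be preserved can be modelled like this (a toy function with
an invented name, not the diff machinery): no output, and the exit
status alone says whether there were differences.

```c
#include <assert.h>

/*
 * Toy model of 'git diff --quiet': prints nothing, exits with 1 when
 * there are differences and 0 when there are none.
 */
int toy_diff_quiet_status(int nr_changed_paths)
{
	assert(nr_changed_paths >= 0);
	return nr_changed_paths ? 1 : 0;
}
```

Since a stash always records some change, forwarding the flag
unchanged means the non-zero case is the one users would see.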

> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  builtin/stash--helper.c | 73 +
>  1 file changed, 45 insertions(+), 28 deletions(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index 0c1efca6b..ec8c38c6f 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -10,6 +10,8 @@
>  #include "run-command.h"
>  #include "dir.h"
>  #include "rerere.h"
> +#include "revision.h"
> +#include "log-tree.h"
>  
>  static const char * const git_stash_helper_usage[] = {
>   N_("git stash--helper list []"),
> @@ -662,56 +664,71 @@ static int git_stash_config(const char *var, const char 
> *value, void *cb)
>  
>  static int show_stash(int argc, const char **argv, const char *prefix)
>  {
> - int i, ret = 0;
> - struct child_process cp = CHILD_PROCESS_INIT;
> - struct argv_array args_refs = ARGV_ARRAY_INIT;
> + int i;
> + int flags = 0;
>   struct stash_info info;
> + struct rev_info rev;
> + struct argv_array stash_args = ARGV_ARRAY_INIT;
>   struct option options[] = {
>   OPT_END()
>   };
>  
> - argc = parse_options(argc, argv, prefix, options,
> -  git_stash_helper_show_usage,
> -  PARSE_OPT_KEEP_UNKNOWN);
> + init_diff_ui_defaults();
> + git_config(git_diff_ui_config, NULL);
>  
> - cp.git_cmd = 1;
> - argv_array_push(, "diff");
> + init_revisions(, prefix);
>  
> - /* Push arguments which are not options into args_refs. */
> - for (i = 0; i < argc; ++i) {
> - if (argv[i][0] == '-')
> - argv_array_push(, argv[i]);
> + /* Push arguments which are not options into stash_args. */
> + for (i = 1; i < argc; ++i) {
> + if (argv[i][0] != '-')
> + argv_array_push(_args, argv[i]);
>   else
> - argv_array_push(_refs, argv[i]);
> - }
> -
> - if (get_stash_info(, args_refs.argc, args_refs.argv)) {
> - child_process_clear();
> - argv_array_clear(_refs);
> - return -1;
> + flags++;
>   }
>  
>   /*
>* The config settings are applied only if there are not passed
>* any flags.
>*/
> - if (cp.args.argc == 1) {
> + if (!flags) {
>   git_config(git_stash_config, NULL);
>   if (show_stat)
> - argv_array_push(, "--stat");
> + rev.diffopt.output_format |= DIFF_FORMAT_DIFFSTAT;
> + if (show_patch) {
> + rev.diffopt.output_format = ~DIFF_FORMAT_NO_OUTPUT;
> + rev.diffopt.output_format |= DIFF_FORMAT_PATCH;
> + }

I failed to notice this in the previous patch (the problem existed
there as well), but this changes the behaviour of 'git -c
stash.showStat=false stash show '.  Previously doing this would
not show anything, which is the correct behaviour, while now it still
shows the diffstat.

I think the show_stat variable is interpreted the wrong way around in
the previous patch.

Something else I noticed now that I was playing around more with the
config options is that the parsing of the config options is not
correctly done in the previous patch.  It does a 'strcmp(var,
"stash.showStat")', but the config API makes all variables lowercase
(config options are case insensitive, and making everything lowercase
is the way to ensure that), so it should be 'strcmp(var, "stash.showstat")',
and similar for the 'stash.showpatch' config option.
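
A small sketch of the effect, under simplifying assumptions: the
normalization below lowercases the whole key, which is enough for a
two-level name like this one, and all 'toy_' names are invented for
illustration rather than taken from the config API.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Toy normalization: lowercase the whole key before callbacks see it. */
void toy_normalize_key(char *key)
{
	assert(key);
	for (; *key; key++)
		*key = tolower((unsigned char)*key);
}

int toy_show_stat = 1;

/* The callback sees the normalized name, so it compares against lowercase. */
int toy_stash_config(const char *var, int value)
{
	if (!strcmp(var, "stash.showstat")) {
		toy_show_stat = value;
		return 0;
	}
	return 1; /* not one of ours */
}
```

After normalization "stash.showStat" no longer compares equal to the
camel-case spelling, so a callback written with the camel-case string
never fires and the default stays in effect.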

This all sounds like it would be nice to have some tests for these
config options, to make sure we get it right, and won't break them in
the future.

> + }
>  
> - if (show_patch)
> - argv_array_push(, "-p");
> + if (get_stash_info(, stash_args.argc, stash_args.argv)) {
> + argv_array_clear(_args);
> + return -1;
>   }
>  
> - argv_array_pushl(, oid_to_hex(_commit),
> -  oid_to_hex(_commit), NULL);
> + argc = setup_revisions(argc, argv, , NULL);
> + if (!rev.diffopt.output_format)
> +

Re: [GSoC][PATCH v7 11/26] stash: change `git stash show` usage text and documentation

2018-08-15 Thread Thomas Gummerer

> Subject: stash: change `git stash show` usage text and documentation

Another nitpick about commit messages.  "change ... usage text and
documentation" doesn't say much about what the actual change is.
How about something like "stash: mention options in "show" synopsis"
instead?

The change itself looks good to me, thanks!

On 08/08, Paul-Sebastian Ungureanu wrote:
> It is already stated in documentation that it will accept any
> option known to `git diff`, but not in the usage text and some
> parts of the documentation.
> 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  Documentation/git-stash.txt | 4 ++--
>  builtin/stash--helper.c | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
> index 7ef8c4791..e31ea7d30 100644
> --- a/Documentation/git-stash.txt
> +++ b/Documentation/git-stash.txt
> @@ -9,7 +9,7 @@ SYNOPSIS
>  
>  [verse]
>  'git stash' list []
> -'git stash' show []
> +'git stash' show [] []
>  'git stash' drop [-q|--quiet] []
>  'git stash' ( pop | apply ) [--index] [-q|--quiet] []
>  'git stash' branch  []
> @@ -106,7 +106,7 @@ stash@{1}: On master: 9cc0589... Add git-stash
>  The command takes options applicable to the 'git log'
>  command to control what is shown and how. See linkgit:git-log[1].
>  
> -show []::
> +show [] []::
>  
>   Show the changes recorded in the stash entry as a diff between the
>   stashed contents and the commit back when the stash entry was first
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index e764cd33e..0c1efca6b 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -13,7 +13,7 @@
>  
>  static const char * const git_stash_helper_usage[] = {
>   N_("git stash--helper list []"),
> - N_("git stash--helper show []"),
> + N_("git stash--helper show [] []"),
>   N_("git stash--helper drop [-q|--quiet] []"),
>   N_("git stash--helper ( pop | apply ) [--index] [-q|--quiet] 
> []"),
>   N_("git stash--helper branch  []"),
> @@ -27,7 +27,7 @@ static const char * const git_stash_helper_list_usage[] = {
>  };
>  
>  static const char * const git_stash_helper_show_usage[] = {
> - N_("git stash--helper show []"),
> + N_("git stash--helper show [] []"),
>   NULL
>  };
>  
> -- 
> 2.18.0.573.g56500d98f
>

Re: [GSoC][PATCH v7 10/26] stash: convert show to builtin

2018-08-15 Thread Thomas Gummerer

On 08/08, Paul-Sebastian Ungureanu wrote:
> Add stash show to the helper and delete the show_stash, have_stash,
> assert_stash_like, is_stash_like and parse_flags_and_rev functions
> from the shell script now that they are no longer needed.
> 
> Before this commit, `git stash show` would ignore `--index` and
> `--quiet` options. Now, `git stash show` errors out on `--index`
> and does not display any message on `--quiet`, but errors out
> if the stash is not empty.

I think "errors out" is slightly misleading here.  Maybe "but exits
with an exit code similar to 'git diff'" instead?

Looking at why we ignored them before, it's because we filtered them
out in 'parse_flags_and_rev', which looks more accidental than
intentional, and which I think we could consider a bug, so this change in
behaviour here is okay.

'--quiet' doesn't make too much sense to use with 'git stash show', so
I'm not sure whether or not it makes sense to support it at all.  But
we do promise in our documentation to pass all options through to
'git diff', so the new behaviour is what we are documenting.

> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  builtin/stash--helper.c |  78 
>  git-stash.sh| 132 +---
>  2 files changed, 79 insertions(+), 131 deletions(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index daa4d0034..e764cd33e 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -13,6 +13,7 @@
>  
>  static const char * const git_stash_helper_usage[] = {
>   N_("git stash--helper list []"),
> + N_("git stash--helper show []"),
>   N_("git stash--helper drop [-q|--quiet] []"),
>   N_("git stash--helper ( pop | apply ) [--index] [-q|--quiet] 
> []"),
>   N_("git stash--helper branch  []"),
> @@ -25,6 +26,11 @@ static const char * const git_stash_helper_list_usage[] = {
>   NULL
>  };
>  
> +static const char * const git_stash_helper_show_usage[] = {
> + N_("git stash--helper show []"),
> + NULL
> +};
> +
>  static const char * const git_stash_helper_drop_usage[] = {
>   N_("git stash--helper drop [-q|--quiet] []"),
>   NULL
> @@ -638,6 +644,76 @@ static int list_stash(int argc, const char **argv, const 
> char *prefix)
>   return run_command();
>  }
>  
> +static int show_stat = 1;
> +static int show_patch;
> +
> +static int git_stash_config(const char *var, const char *value, void *cb)
> +{
> + if (!strcmp(var, "stash.showStat")) {
> + show_stat = git_config_bool(var, value);
> + return 0;
> + }
> + if (!strcmp(var, "stash.showPatch")) {
> + show_patch = git_config_bool(var, value);
> + return 0;
> + }
> + return git_default_config(var, value, cb);
> +}
> +
> +static int show_stash(int argc, const char **argv, const char *prefix)
> +{
> + int i, ret = 0;
> + struct child_process cp = CHILD_PROCESS_INIT;
> + struct argv_array args_refs = ARGV_ARRAY_INIT;
> + struct stash_info info;
> + struct option options[] = {
> + OPT_END()
> + };
> +
> + argc = parse_options(argc, argv, prefix, options,
> +  git_stash_helper_show_usage,
> +  PARSE_OPT_KEEP_UNKNOWN);
> +
> + cp.git_cmd = 1;
> + argv_array_push(, "diff");
> +
> + /* Push arguments which are not options into args_refs. */
> + for (i = 0; i < argc; ++i) {
> + if (argv[i][0] == '-')
> + argv_array_push(, argv[i]);
> + else
> + argv_array_push(_refs, argv[i]);
> + }
> +
> + if (get_stash_info(, args_refs.argc, args_refs.argv)) {
> + child_process_clear();
> + argv_array_clear(_refs);
> + return -1;
> + }
> +
> + /*
> +  * The config settings are applied only if there are not passed
> +  * any flags.
> +  */
> + if (cp.args.argc == 1) {
> + git_config(git_stash_config, NULL);
> + if (show_stat)
> + argv_array_push(, "--stat");
> +
> + if (show_patch)
> + argv_array_push(, "-p");
> + }
> +
> + argv_array_pushl(, oid_to_hex(_commit),
> +  oid_to_hex(_commit), NULL);
> +
> + ret = run_command();
> +
> + free_stash_info();
> + argv_array_clear(_refs);
> + return ret;
> +}
> +
>  int cmd_stash__helper(int argc, const char **argv, const char *prefix)
>  {
>   pid_t pid = getpid();
> @@ -670,6 +746,8 @@ int cmd_stash__helper(int argc, const char **argv, const 
> char *prefix)
>   return !!branch_stash(argc, argv, prefix);
>   else if (!strcmp(argv[0], "list"))
>   return !!list_stash(argc, argv, prefix);
> + else if (!strcmp(argv[0], "show"))
> + return !!show_stash(argc, argv, prefix);
>  
>   usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]),
>

Re: [GSoC][PATCH v7 09/26] stash: implement the "list" command in the builtin

2018-08-15 Thread Thomas Gummerer

> Subject: stash: implement the "list" command in the builtin

Nit: The previous commit messages all have the format "stash: convert
 to builtin", maybe follow the same pattern here?

The rest of the patch looks good to me.

On 08/08, Paul-Sebastian Ungureanu wrote:
> Add stash list to the helper and delete the list_stash function
> from the shell script.
> 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  builtin/stash--helper.c | 31 +++
>  git-stash.sh|  7 +--
>  2 files changed, 32 insertions(+), 6 deletions(-)
> 
> diff --git a/builtin/stash--helper.c b/builtin/stash--helper.c
> index d6bd468e0..daa4d0034 100644
> --- a/builtin/stash--helper.c
> +++ b/builtin/stash--helper.c
> @@ -12,6 +12,7 @@
>  #include "rerere.h"
>  
>  static const char * const git_stash_helper_usage[] = {
> + N_("git stash--helper list []"),
>   N_("git stash--helper drop [-q|--quiet] []"),
>   N_("git stash--helper ( pop | apply ) [--index] [-q|--quiet] 
> []"),
>   N_("git stash--helper branch  []"),
> @@ -19,6 +20,11 @@ static const char * const git_stash_helper_usage[] = {
>   NULL
>  };
>  
> +static const char * const git_stash_helper_list_usage[] = {
> + N_("git stash--helper list []"),
> + NULL
> +};
> +
>  static const char * const git_stash_helper_drop_usage[] = {
>   N_("git stash--helper drop [-q|--quiet] []"),
>   NULL
> @@ -609,6 +615,29 @@ static int branch_stash(int argc, const char **argv, 
> const char *prefix)
>   return ret;
>  }
>  
> +static int list_stash(int argc, const char **argv, const char *prefix)
> +{
> + struct child_process cp = CHILD_PROCESS_INIT;
> + struct option options[] = {
> + OPT_END()
> + };
> +
> + argc = parse_options(argc, argv, prefix, options,
> +  git_stash_helper_list_usage,
> +  PARSE_OPT_KEEP_UNKNOWN);
> +
> + if (!ref_exists(ref_stash))
> + return 0;
> +
> + cp.git_cmd = 1;
> + argv_array_pushl(, "log", "--format=%gd: %gs", "-g",
> +  "--first-parent", "-m", NULL);
> + argv_array_pushv(, argv);
> + argv_array_push(, ref_stash);
> + argv_array_push(, "--");
> + return run_command();
> +}
> +
>  int cmd_stash__helper(int argc, const char **argv, const char *prefix)
>  {
>   pid_t pid = getpid();
> @@ -639,6 +668,8 @@ int cmd_stash__helper(int argc, const char **argv, const 
> char *prefix)
>   return !!pop_stash(argc, argv, prefix);
>   else if (!strcmp(argv[0], "branch"))
>   return !!branch_stash(argc, argv, prefix);
> + else if (!strcmp(argv[0], "list"))
> + return !!list_stash(argc, argv, prefix);
>  
>   usage_msg_opt(xstrfmt(_("unknown subcommand: %s"), argv[0]),
> git_stash_helper_usage, options);
> diff --git a/git-stash.sh b/git-stash.sh
> index 8f2640fe9..6052441aa 100755
> --- a/git-stash.sh
> +++ b/git-stash.sh
> @@ -382,11 +382,6 @@ have_stash () {
>   git rev-parse --verify --quiet $ref_stash >/dev/null
>  }
>  
> -list_stash () {
> - have_stash || return 0
> - git log --format="%gd: %gs" -g --first-parent -m "$@" $ref_stash --
> -}
> -
>  show_stash () {
>   ALLOW_UNKNOWN_FLAGS=t
>   assert_stash_like "$@"
> @@ -574,7 +569,7 @@ test -n "$seen_non_option" || set "push" "$@"
>  case "$1" in
>  list)
>   shift
> - list_stash "$@"
> + git stash--helper list "$@"
>   ;;
>  show)
>   shift
> -- 
> 2.18.0.573.g56500d98f
>

Re: [GSoC][PATCH v7 04/26] stash: renamed test cases to be more descriptive

2018-08-15 Thread Thomas Gummerer

> Subject: Re: [GSoC][PATCH v7 04/26] stash: renamed test cases to be more 
> descriptive

Please use the imperative mood in the title and the commit messages
themselves.  From Documentation/SubmittingPatches:

Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behavior.

From a quick skim over the rest of the series, this also applies to
some of the subsequent patches in the series.

On 08/08, Paul-Sebastian Ungureanu wrote:
> Renamed some test cases' labels to be more descriptive and under 80
> characters per line.
> 
> Signed-off-by: Paul-Sebastian Ungureanu 
> ---
>  t/t3903-stash.sh | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
> index de6cab1fe..8d002a7f2 100755
> --- a/t/t3903-stash.sh
> +++ b/t/t3903-stash.sh
> @@ -604,7 +604,7 @@ test_expect_success 'stash show -p - no stashes on stack, 
> stash-like argument' '
>   test_cmp expected actual
>  '
>  
> -test_expect_success 'stash drop - fail early if specified stash is not a 
> stash reference' '
> +test_expect_success 'drop: fail early if specified stash is not a stash ref' 
> '
>   git stash clear &&
>   test_when_finished "git reset --hard HEAD && git stash clear" &&
>   git reset --hard &&
> @@ -618,7 +618,7 @@ test_expect_success 'stash drop - fail early if specified 
> stash is not a stash r
>   git reset --hard HEAD
>  '
>  
> -test_expect_success 'stash pop - fail early if specified stash is not a 
> stash reference' '
> +test_expect_success 'pop: fail early if specified stash is not a stash ref' '
>   git stash clear &&
>   test_when_finished "git reset --hard HEAD && git stash clear" &&
>   git reset --hard &&
> @@ -682,7 +682,7 @@ test_expect_success 'invalid ref of the form "n", n >= N' 
> '
>   git stash drop
>  '
>  
> -test_expect_success 'stash branch should not drop the stash if the branch 
> exists' '
> +test_expect_success 'branch: should not drop the stash if the branch exists' 
> '

Since we're adjusting the titles of the tests here I'll allow myself
to nitpick a little :)

Maybe "branch: do not drop the stash if the branch exists", which
reads more like an assertion, as the "pop" and "drop" titles above do.

>   git stash clear &&
>   echo foo >file &&
>   git add file &&
> @@ -693,7 +693,7 @@ test_expect_success 'stash branch should not drop the 
> stash if the branch exists
>   git rev-parse stash@{0} --
>  '
>  
> -test_expect_success 'stash branch should not drop the stash if the apply 
> fails' '
> +test_expect_success 'branch: should not drop the stash if the apply fails' '
>   git stash clear &&
>   git reset HEAD~1 --hard &&
>   echo foo >file &&
> @@ -707,7 +707,7 @@ test_expect_success 'stash branch should not drop the 
> stash if the apply fails'
>   git rev-parse stash@{0} --
>  '
>  
> -test_expect_success 'stash apply shows status same as git status (relative 
> to current directory)' '
> +test_expect_success 'apply: shows same status as git status (relative to 
> ./)' '

s/shows/show/ above maybe?  This used to be a full sentence
previously, where 'shows' was appropriate, but I think "show" sounds
better after the colon.

>   git stash clear &&
>   echo 1 >subdir/subfile1 &&
>   echo 2 >subdir/subfile2 &&
> @@ -1048,7 +1048,7 @@ test_expect_success 'stash push -p with pathspec shows 
> no changes only once' '
>   test_i18ncmp expect actual
>  '
>  
> -test_expect_success 'stash push with pathspec shows no changes when there 
> are none' '
> +test_expect_success 'push:  shows no changes when there are none' '

Maybe "push <pathspec>: show no changes when there are none"?  "push
<pathspec>" would be the rest of the 'git stash' command, so having the
colon in between them seems a bit odd.

>   >foo &&
>   git add foo &&
>   git commit -m "tmp" &&
> @@ -1058,7 +1058,7 @@ test_expect_success 'stash push with pathspec shows no 
> changes when there are no
>   test_i18ncmp expect actual
>  '
>  
> -test_expect_success 'stash push with pathspec not in the repository errors 
> out' '
> +test_expect_success 'push:  not in the repository errors out' '

This one makes sense to me.

>   >untracked &&
>   test_must_fail git stash push untracked &&
>   test_path_is_file untracked
> -- 
> 2.18.0.573.g56500d98f
>

Re: [PATCH v6 00/21] Add range-diff, a tbdiff lookalike

2018-08-13 Thread Thomas Gummerer

On 08/13, Johannes Schindelin wrote:
> Hi,
> 
> On Mon, 13 Aug 2018, Johannes Schindelin via GitGitGadget wrote:
> 
> > The incredibly useful git-tbdiff [https://github.com/trast/tbdiff] tool to
> > compare patch series (say, to see what changed between two iterations sent
> > to the Git mailing list) is slightly less useful for this developer due to
> > the fact that it requires the hungarian and numpy Python packages which are
> > for some reason really hard to build in MSYS2. So hard that I even had to
> > give up, because it was simply easier to re-implement the whole shebang as a
> > builtin command.
> > 
> > The project at https://github.com/trast/tbdiff seems to be dormant, anyway.
> > Funny (and true) story: I looked at the open Pull Requests to see how active
> > that project is, only to find to my surprise that I had submitted one in
> > August 2015, and that it was still unanswered let alone merged.
> > 
> > While at it, I forward-ported AEvar's patch to force --decorate=no because 
> > git -p tbdiff would fail otherwise.
> > 
> > Side note: I work on implementing range-diff not only to make life easier
> > for reviewers who have to suffer through v2, v3, ... of my patch series, but
> > also to verify my changes before submitting a new iteration. And also, maybe
> > even more importantly, I plan to use it to verify my merging-rebases of Git
> > for Windows (for which I previously used to redirect the
> > pre-rebase/post-rebase diffs vs upstream and then compare them using git
> > diff --no-index). And of course any interested person can see what changes
> > were necessary e.g. in the merging-rebase of Git for Windows onto v2.17.0 by
> > running a command like:
> > 
> > base=^{/Start.the.merging-rebase}
> > tag=v2.17.0.windows.1
> > pre=$tag$base^2
> > git range-diff $pre$base..$pre $tag$base..$tag
> > 
> > The command uses what it calls the "dual color mode" (can be disabled via 
> > --no-dual-color) which helps identifying what actually changed: it prefixes
> > lines with a - (and red background) that correspond to the first commit
> > range, and with a + (and green background) that correspond to the second
> > range. The rest of the lines will be colored according to the original
> > diffs.
> 
> Changes since v5:
> 
> - Fixed the bug (introduced in v5) where a dashdash would not be handled
>   appropriately.

Thanks!  I've read through all the patches (and the range-diff :))
again and played around a bit with the newest version, and I think
this is ready for 'next'.

While playing around with it I did find one error message that reads
slightly odd, but it's still understandable, so I'm not sure it's
worth worrying about now (we can always improve it on top):

    $ ./git range-diff -- js/range-diff-v4...HEADt
    fatal: ambiguous argument 'HEADt..js/range-diff-v4': unknown revision or path not in the working tree.
    Use '--' to separate paths from revisions, like this:
    'git <command> [<revision>...] -- [<file>...]'
    error: could not parse log for 'HEADt..js/range-diff-v4'


> [...]

Re: [PATCH v6 11/21] range-diff: add tests

2018-08-13 Thread Thomas Gummerer

On 08/13, Thomas Rast via GitGitGadget wrote:
> From: Thomas Rast 
> 
> These are essentially lifted from https://github.com/trast/tbdiff, with
> light touch-ups to account for the command now being named `git
> range-diff`.
> 
> Apart from renaming `tbdiff` to `range-diff`, only one test case needed
> to be adjusted: 11 - 'changed message'.
> 
> The underlying reason it had to be adjusted is that diff generation is
> sometimes ambiguous. In this case, a comment line and an empty line are
> added, but it is ambiguous whether they were added after the existing
> empty line, or whether an empty line and the comment line are added
> *before* the existing empty line. And apparently xdiff picks a different
> option here than Python's difflib.
>

Just noticed while reading the whole series again (hopefully for the
last time :)), do we need Thomas Rast's Sign-off here, as he is
credited as the author here? 

> Signed-off-by: Johannes Schindelin 
> ---
>  t/.gitattributes   |   1 +
>  t/t3206-range-diff.sh  | 145 ++
>  t/t3206/history.export | 604 +
>  3 files changed, 750 insertions(+)
>  create mode 100755 t/t3206-range-diff.sh
>  create mode 100644 t/t3206/history.export
> 
> diff --git a/t/.gitattributes b/t/.gitattributes
> index 3bd959ae5..b17bf71b8 100644
> --- a/t/.gitattributes
> +++ b/t/.gitattributes
> @@ -1,6 +1,7 @@
>  t[0-9][0-9][0-9][0-9]/* -whitespace
>  /diff-lib/* eol=lf
>  /t0110/url-* binary
> +/t3206/* eol=lf
>  /t3900/*.txt eol=lf
>  /t3901/*.txt eol=lf
>  /t4034/*/* eol=lf
> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> new file mode 100755
> index 0..2237c7f4a
> --- /dev/null
> +++ b/t/t3206-range-diff.sh
> @@ -0,0 +1,145 @@
> +#!/bin/sh
> +
> +test_description='range-diff tests'
> +
> +. ./test-lib.sh
> +
> +# Note that because of the range-diff's heuristics, test_commit does more
> +# harm than good.  We need some real history.
> +
> +test_expect_success 'setup' '
> + git fast-import < "$TEST_DIRECTORY"/t3206/history.export
> +'
> +
> +test_expect_success 'simple A..B A..C (unmodified)' '
> + git range-diff --no-color master..topic master..unmodified \
> + >actual &&
> + cat >expected <<-EOF &&
> + 1:  4de457d = 1:  35b9b25 s/5/A/
> + 2:  fccce22 = 2:  de345ab s/4/A/
> + 3:  147e64e = 3:  9af6654 s/11/B/
> + 4:  a63e992 = 4:  2901f77 s/12/B/
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'simple B...C (unmodified)' '
> + git range-diff --no-color topic...unmodified >actual &&
> + # same "expected" as above
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'simple A B C (unmodified)' '
> + git range-diff --no-color master topic unmodified >actual &&
> + # same "expected" as above
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'trivial reordering' '
> + git range-diff --no-color master topic reordered >actual &&
> + cat >expected <<-EOF &&
> + 1:  4de457d = 1:  aca177a s/5/A/
> + 3:  147e64e = 2:  14ad629 s/11/B/
> + 4:  a63e992 = 3:  ee58208 s/12/B/
> + 2:  fccce22 = 4:  307b27a s/4/A/
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'removed a commit' '
> + git range-diff --no-color master topic removed >actual &&
> + cat >expected <<-EOF &&
> + 1:  4de457d = 1:  7657159 s/5/A/
> + 2:  fccce22 < -:  --- s/4/A/
> + 3:  147e64e = 2:  43d84d3 s/11/B/
> + 4:  a63e992 = 3:  a740396 s/12/B/
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'added a commit' '
> + git range-diff --no-color master topic added >actual &&
> + cat >expected <<-EOF &&
> + 1:  4de457d = 1:  2716022 s/5/A/
> + 2:  fccce22 = 2:  b62accd s/4/A/
> + -:  --- > 3:  df46cfa s/6/A/
> + 3:  147e64e = 4:  3e64548 s/11/B/
> + 4:  a63e992 = 5:  12b4063 s/12/B/
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'new base, A B C' '
> + git range-diff --no-color master topic rebased >actual &&
> + cat >expected <<-EOF &&
> + 1:  4de457d = 1:  cc9c443 s/5/A/
> + 2:  fccce22 = 2:  c5d9641 s/4/A/
> + 3:  147e64e = 3:  28cc2b6 s/11/B/
> + 4:  a63e992 = 4:  5628ab7 s/12/B/
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'new base, B...C' '
> + # this syntax includes the commits from master!
> + git range-diff --no-color topic...rebased >actual &&
> + cat >expected <<-EOF &&
> + -:  --- > 1:  a31b12e unrelated
> + 1:  4de457d = 2:  cc9c443 s/5/A/
> + 2:  fccce22 = 3:  c5d9641 s/4/A/
> + 3:  147e64e = 4:  28cc2b6 s/11/B/
> + 4:  a63e992 = 5:  5628ab7 s/12/B/
> + EOF
> + test_cmp expected actual
> +'
> +
> +test_expect_success 'changed commit' '
> + git range-diff --no-color topic...changed >actual &&
> + cat >expected <<-EOF &&
> + 1:  4de457d = 1:  a4b s/5/A/
> + 2:

Re: [PATCH v5 05/21] range-diff: also show the diff between patches

2018-08-13 Thread Thomas Gummerer

On 08/13, Johannes Schindelin wrote:
> Hi Thomas,
> 
> On Sun, 12 Aug 2018, Thomas Gummerer wrote:
> 
> > On 08/10, Johannes Schindelin via GitGitGadget wrote:
> > > From: Johannes Schindelin 
> >
> > [...]
> > 
> > I don't think this handles "--" quite as would be expected.  Trying to
> > use "git range-diff -- js/range-diff-v4...HEAD" I get:
> > 
> > $ ./git range-diff -- js/range-diff-v4...HEAD
> > error: need two commit ranges
> > usage: git range-diff [] .. 
> > ..
> >or: git range-diff [] ...
> >or: git range-diff []   
> > 
> > --creation-factor 
> >   Percentage by which creation is weighted
> > --no-dual-color   color both diff and diff-between-diffs
> > 
> > while what I would have expected is to actually get a range diff.
> > This happens because after we break out of the loop we don't add the
> > actual ranges to argv, but just skip them instead.
> 
> Ouch, good point.
> 
> > I think something like the following should be squashed in to this
> > patch.
> > 
> > --->8---
> > diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> > index ef3ba22e29..132574c57a 100644
> > --- a/builtin/range-diff.c
> > +++ b/builtin/range-diff.c
> > @@ -53,6 +53,11 @@ int cmd_range_diff(int argc, const char **argv, const 
> > char *prefix)
> > else
> > i += c;
> > }
> > +   if (i < argc && !strcmp("--", argv[i])) {
> > +   i++; j++;
> > +   while (i < argc)
> > +   argv[j++] = argv[i++];
> > +   }
> > argc = j;
> > diff_setup_done();
> 
> I do not think that is correct. The original idea was for the first
> `parse_options()` call to keep the dashdash, for the second one to keep
> the dashdash, too, and for the final one to swallow it.
> 
> Also, if `i < argc` at this point, we already know that `argv[i]` refers
> to the dashdash, otherwise the previous loop would not have exited early.
> 
> I went with this simple version instead:
> 
>   while (i < argc)
>   argv[j++] = argv[i++];

Right, that's much better, thanks!
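
For the archives, a minimal standalone sketch of how the loop looks
with that folded in (this is not the code from the series).
demo_opt_parse() is a made-up stand-in for diff_opt_parse() that only
knows "--stat" and "-p", and compact_args() is a hypothetical name.
The point is just that the dashdash and everything after it is copied
through, so the last parse_options() call can still swallow the "--":

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for diff_opt_parse(): returns how many argv
 * entries it consumed, or 0 if argv[0] is not an option it knows.
 * Here it only recognizes "--stat" and "-p".
 */
static int demo_opt_parse(const char **argv)
{
	if (!strcmp(argv[0], "--stat") || !strcmp(argv[0], "-p"))
		return 1;
	return 0;
}

/*
 * Compact argv in place: drop the entries demo_opt_parse() consumes,
 * keep everything else, and stop filtering at "--".  The "--" itself
 * and all entries after it are copied through unchanged.
 * argv[0] is left alone.  Returns the new argc.
 */
static int compact_args(int argc, const char **argv)
{
	int i, j;

	assert(argc >= 1);

	for (i = j = 1; i < argc && strcmp("--", argv[i]); ) {
		int c = demo_opt_parse(argv + i);

		if (!c)
			argv[j++] = argv[i++];
		else
			i += c;
	}
	/* Either i == argc, or argv[i] is "--": keep the remainder. */
	while (i < argc)
		argv[j++] = argv[i++];
	return j;
}
```

Note that options after the dashdash are deliberately not filtered
here; they are left for whatever parses the arguments next.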

> Thanks!
> Dscho

Re: [PATCH v5 05/21] range-diff: also show the diff between patches

2018-08-12 Thread Thomas Gummerer

Hi Dscho,

On 08/10, Johannes Schindelin via GitGitGadget wrote:
> From: Johannes Schindelin 
>
> [..]
> 
> @@ -13,15 +14,38 @@ NULL
>  int cmd_range_diff(int argc, const char **argv, const char *prefix)
>  {
>   int creation_factor = 60;
> + struct diff_options diffopt = { NULL };
>   struct option options[] = {
>   OPT_INTEGER(0, "creation-factor", _factor,
>   N_("Percentage by which creation is weighted")),
>   OPT_END()
>   };
> - int res = 0;
> + int i, j, res = 0;
>   struct strbuf range1 = STRBUF_INIT, range2 = STRBUF_INIT;
>  
> + git_config(git_diff_ui_config, NULL);
> +
> + diff_setup();
> + diffopt.output_format = DIFF_FORMAT_PATCH;
> +
>   argc = parse_options(argc, argv, NULL, options,
> +  builtin_range_diff_usage, PARSE_OPT_KEEP_UNKNOWN |
> +  PARSE_OPT_KEEP_DASHDASH | PARSE_OPT_KEEP_ARGV0);
> +
> + for (i = j = 1; i < argc && strcmp("--", argv[i]); ) {
> + int c = diff_opt_parse(, argv + i, argc - i, prefix);
> +
> + if (!c)
> + argv[j++] = argv[i++];
> + else
> + i += c;
> + }

I don't think this handles "--" quite as would be expected.  Trying to
use "git range-diff -- js/range-diff-v4...HEAD" I get:

$ ./git range-diff -- js/range-diff-v4...HEAD
error: need two commit ranges
usage: git range-diff [] .. 
..
   or: git range-diff [] ...
   or: git range-diff []   

--creation-factor 
  Percentage by which creation is weighted
--no-dual-color   color both diff and diff-between-diffs

while what I would have expected is to actually get a range diff.
This happens because after we break out of the loop we don't add the
actual ranges to argv, but just skip them instead.

I think something like the following should be squashed in to this
patch.

--->8---
diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index ef3ba22e29..132574c57a 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -53,6 +53,11 @@ int cmd_range_diff(int argc, const char **argv, const char 
*prefix)
else
i += c;
}
+   if (i < argc && !strcmp("--", argv[i])) {
+   i++; j++;
+   while (i < argc)
+   argv[j++] = argv[i++];
+   }
argc = j;
diff_setup_done();
 
--->8---

> + argc = j;
> + diff_setup_done();
> +
> + /* Make sure that there are no unparsed options */
> + argc = parse_options(argc, argv, NULL,
> +  options + ARRAY_SIZE(options) - 1, /* OPT_END */
>builtin_range_diff_usage, 0);
>  
>   if (argc == 2) {
> @@ -59,7 +83,8 @@ int cmd_range_diff(int argc, const char **argv, const char 
> *prefix)
>   usage_with_options(builtin_range_diff_usage, options);
>   }
>  
> - res = show_range_diff(range1.buf, range2.buf, creation_factor);
> + res = show_range_diff(range1.buf, range2.buf, creation_factor,
> +   );
>  
>   strbuf_release();
>   strbuf_release();
> diff --git a/range-diff.c b/range-diff.c
> index 2d94200d3..71883a4b7 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -6,6 +6,7 @@
>  #include "hashmap.h"
>  #include "xdiff-interface.h"
>  #include "linear-assignment.h"
> +#include "diffcore.h"
>  
>  struct patch_util {
>   /* For the search for an exact match */
> @@ -258,7 +259,31 @@ static const char *short_oid(struct patch_util *util)
>   return find_unique_abbrev(>oid, DEFAULT_ABBREV);
>  }
>  
> -static void output(struct string_list *a, struct string_list *b)
> +static struct diff_filespec *get_filespec(const char *name, const char *p)
> +{
> + struct diff_filespec *spec = alloc_filespec(name);
> +
> + fill_filespec(spec, _oid, 0, 0644);
> + spec->data = (char *)p;
> + spec->size = strlen(p);
> + spec->should_munmap = 0;
> + spec->is_stdin = 1;
> +
> + return spec;
> +}
> +
> +static void patch_diff(const char *a, const char *b,
> +   struct diff_options *diffopt)
> +{
> + diff_queue(_queued_diff,
> +get_filespec("a", a), get_filespec("b", b));
> +
> + diffcore_std(diffopt);
> + diff_flush(diffopt);
> +}
> +
> +static void output(struct string_list *a, struct string_list *b,
> +struct diff_options *diffopt)
>  {
>   int i = 0, j = 0;
>  
> @@ -300,6 +325,9 @@ static void output(struct string_list *a, struct 
> string_list *b)
>   printf("%d: %s ! %d: %s\n",
>  b_util->matching + 1, short_oid(a_util),
>  j + 1, short_oid(b_util));
> + if (!(diffopt->output_format & DIFF_FORMAT_NO_OUTPUT))
> + patch_diff(a->items[b_util->matching].string,

exit code in git diff-index [was: Re: concurrent access to multiple local git repos is error prone]

2018-08-05 Thread Thomas Gummerer

On 08/05, Alexander Mills wrote:
> Also, as an aside, this seems to be a bug, but probably a known bug:
> 
> $ git diff-index  HEAD; echo $?
> 
> :100755 100755 60e5d683c1eb3e61381b1a8ec2db822b94b9faec
>  M  cli/npp_check_merge.sh
> :100644 100644 35a453544de41e2227ab0afab31a396d299139e9
>  M  src/find-projects.ts
> :100644 100644 c1ee7bc18e6604cbf0d16653e9366109d6ac2ec9
>  M  src/tables.ts
> :100644 100644 29d9674fbb48f223f3434179d666b2aa991ad05a
>  M
> src/vcs-helpers/git-helpers.ts
> 0
> 
> $ git diff-index --quiet HEAD; echo $?
> 1
> 
> different exit codes depending on whether --quiet was used. In this
> case, the exit code should be consistent.
> The bug is with the `git diff-index` command, as you can see.

This is not a bug. 'git diff-index' (and 'git diff') only give an exit
code other than 0 in the default case if something actually goes wrong
with generating the diff, which in the usual case it shouldn't.

To get an exit code from 'git diff-index' if there are differences,
you'd have to pass the '--exit-code' flag.  The '--quiet' flag implies
'--exit-code', as there's not much use in 'git diff --quiet' if
there's not even an exit code showing whether there are differences or
not.

The original patch (and more importantly the reasoning why
'--exit-code' is not the default behaviour for 'git diff') can be
found at [1].

[1]: 
https://public-inbox.org/git/81b0412b0703131717k7106ee1cg964628f0bda2c...@mail.gmail.com/

> -alex

[PATCH v4 10/11] rerere: teach rerere to handle nested conflicts

2018-08-05 Thread Thomas Gummerer

Currently rerere can't handle nested conflicts and will error out when
it encounters such conflicts.  Teach it to handle them by recursively
calling the 'handle_conflict' function to normalize the conflict.

Note that a conflict like this would only be produced if a user
commits a file with conflict markers, and gets a conflict including
that in a subsequent operation.

The conflict ID calculation here deserves some explanation:

As we are using the same handle_conflict function, the nested conflict
is normalized the same way as for non-nested conflicts, which means
the ancestor in the diff3 case is stripped out, and the parts of the
conflict are ordered alphabetically.

The conflict ID, however, is only calculated in the top level
handle_conflict call, so it will include the markers that 'rerere'
adds to the output.  E.g. say there's the following conflict:

<<<<<<< HEAD
1
===
<<<<<<< HEAD
3
===
2
>>>>>>> branch-2
>>>>>>> branch-3~

it would be recorded as follows in the preimage:

<<<<<<<
1
===
<<<<<<<
2
===
3
>>>>>>>
>>>>>>>

and the conflict ID would be calculated as

sha1(1<<<<<<<
2
===
3
>>>>>>>)

Stripping out vs. leaving the conflict markers in place in the inner
conflict should have no practical impact, but it simplifies the
implementation.
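
To illustrate the recursion, here is a rough standalone sketch of the
normalization step only.  It is not the code in this patch: the
fixed-size buffers, the marker size of 7 and all the names in it
(normalize(), is_marker(), append() and so on) are assumptions made for
the example, and the hashing of the two sides is left out entirely.

```c
#include <assert.h>
#include <string.h>

#define BUF_MAX 1024
#define MARKER_SIZE 7

/* Does the line start with MARKER_SIZE copies of ch?  Labels are ignored. */
static int is_marker(const char *line, char ch)
{
	int i;

	for (i = 0; i < MARKER_SIZE; i++)
		if (line[i] != ch)
			return 0;
	return 1;
}

/* Bounded append; silently truncates, which is good enough for a demo. */
static void append(char *dst, const char *src)
{
	assert(strlen(dst) < BUF_MAX);
	strncat(dst, src, BUF_MAX - strlen(dst) - 1);
}

/* Append a bare marker line, i.e. a marker with its label stripped. */
static void append_marker(char *dst, char ch)
{
	char m[MARKER_SIZE + 2];

	memset(m, ch, MARKER_SIZE);
	m[MARKER_SIZE] = '\n';
	m[MARKER_SIZE + 1] = '\0';
	append(dst, m);
}

/*
 * Normalize one conflict whose opening "<<<<<<<" line has already been
 * consumed; *pos indexes the next unread line.  The normalized text is
 * appended to out.  Returns 0 on success, -1 on unbalanced markers.
 */
static int normalize(const char **lines, int n, int *pos, char *out)
{
	enum { SIDE_1, ORIGINAL, SIDE_2 } hunk = SIDE_1;
	char one[BUF_MAX] = "", two[BUF_MAX] = "", inner[BUF_MAX];

	while (*pos < n) {
		const char *line = lines[(*pos)++];

		if (is_marker(line, '<')) {
			/* Nested conflict: normalize it first, then treat
			 * the result as content of the current side. */
			inner[0] = '\0';
			if (normalize(lines, n, pos, inner) < 0)
				return -1;
			if (hunk == SIDE_1)
				append(one, inner);
			else if (hunk == SIDE_2)
				append(two, inner);
		} else if (is_marker(line, '|')) {
			if (hunk != SIDE_1)
				return -1;
			hunk = ORIGINAL;
		} else if (is_marker(line, '=')) {
			if (hunk == SIDE_2)
				return -1;
			hunk = SIDE_2;
		} else if (is_marker(line, '>')) {
			int swap = strcmp(one, two) > 0;

			if (hunk != SIDE_2)
				return -1;
			/* Emit the alphabetically earlier side first. */
			append_marker(out, '<');
			append(out, swap ? two : one);
			append_marker(out, '=');
			append(out, swap ? one : two);
			append_marker(out, '>');
			return 0;
		} else if (hunk == SIDE_1) {
			append(one, line);
			append(one, "\n");
		} else if (hunk == SIDE_2) {
			append(two, line);
			append(two, "\n");
		}
		/* hunk == ORIGINAL: the common ancestor is dropped. */
	}
	return -1;
}
```

Fed the example conflict above (with the first "<<<<<<< HEAD" line
already consumed), the sketch sorts the inner hunks to 2 before 3 and
leaves the outer sides in place, as "1" sorts before "<<<<<<<".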

Signed-off-by: Thomas Gummerer 
---
 Documentation/technical/rerere.txt | 42 ++
 rerere.c   | 10 +--
 t/t4200-rerere.sh  | 37 ++
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/rerere.txt 
b/Documentation/technical/rerere.txt
index 3d10dbfa67..e65ba9b0c6 100644
--- a/Documentation/technical/rerere.txt
+++ b/Documentation/technical/rerere.txt
@@ -138,3 +138,45 @@ SHA1('BC').
 If there are multiple conflicts in one file, the sha1 is calculated
 the same way with all hunks appended to each other, in the order in
 which they appear in the file, separated by a  character.
+
+Nested conflicts
+
+
+Nested conflicts are handled very similarly to "simple" conflicts.
+Similar to simple conflicts, the conflict is first normalized by
+stripping the labels from conflict markers, stripping the common ancestor
+version, and the sorting the conflict hunks, both for the outer and the
+inner conflict.  This is done recursively, so any number of nested
+conflicts can be handled.
+
+The only difference is in how the conflict ID is calculated.  For the
+inner conflict, the conflict markers themselves are not stripped out
+before calculating the sha1.
+
+Say we have the following conflict for example:
+
+<<<<<<< HEAD
+1
+===
+<<<<<<< HEAD
+3
+===
+2
+>>>>>>> branch-2
+>>>>>>> branch-3~
+
+After stripping out the labels of the conflict markers, and sorting
+the hunks, the conflict would look as follows:
+
+<<<<<<<
+1
+===
+<<<<<<<
+2
+===
+3
+>>>>>>>
+>>>>>>>
+
+and finally the conflict ID would be calculated as:
+`sha1('1<<<<<<<\n3\n===\n2\n>>>>>>>')`
diff --git a/rerere.c b/rerere.c
index a35b88916c..f78bef80b1 100644
--- a/rerere.c
+++ b/rerere.c
@@ -365,12 +365,18 @@ static int handle_conflict(struct strbuf *out, struct 
rerere_io *io,
RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
} hunk = RR_SIDE_1;
struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
-   struct strbuf buf = STRBUF_INIT;
+   struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
int has_conflicts = -1;
 
while (!io->getline(, io)) {
if (is_cmarker(buf.buf, '<', marker_size)) {
-   break;
+   if (handle_conflict(, io, marker_size, NULL) < 
0)
+   break;
+   if (hunk == RR_SIDE_1)
+   strbuf_addbuf(, );
+   else
+   strbuf_addbuf(, );
+   strbuf_release();
} else if (is_cmarker(buf.buf, '|', marker_size)) {
if (hunk != RR_SIDE_1)
break;
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index 23f9c0ca45..afaf085e42 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -601,4 +601,41 @@ test_expect_success 'rerere with unexpected conflict 
markers does not crash' '
git rerere clear
 '
 
+test_expect_success 'rerere with

[PATCH v4 08/11] rerere: factor out handle_conflict function

2018-08-05 Thread Thomas Gummerer

Factor out the handle_conflict function, which handles a single
conflict in a path.  This is in preparation for a subsequent commit,
where this function will be re-used.

Note that this does change the behaviour of 'git rerere' slightly.
Where previously we'd consider all files where an unmatched conflict
marker is found as invalid, we now only consider files invalid when
the "ours" conflict marker ("<<<<<<< ") is unmatched, not when
other conflict markers (e.g. "===") are unmatched.

Signed-off-by: Thomas Gummerer 
---
 rerere.c | 87 ++--
 1 file changed, 47 insertions(+), 40 deletions(-)

diff --git a/rerere.c b/rerere.c
index bf803043e2..2d62251943 100644
--- a/rerere.c
+++ b/rerere.c
@@ -384,85 +384,92 @@ static int is_cmarker(char *buf, int marker_char, int 
marker_size)
return isspace(*buf);
 }
 
-/*
- * Read contents a file with conflicts, normalize the conflicts
- * by (1) discarding the common ancestor version in diff3-style,
- * (2) reordering our side and their side so that whichever sorts
- * alphabetically earlier comes before the other one, while
- * computing the "conflict ID", which is just an SHA-1 hash of
- * one side of the conflict, NUL, the other side of the conflict,
- * and NUL concatenated together.
- *
- * Return 1 if conflict hunks are found, 0 if there are no conflict
- * hunks and -1 if an error occured.
- */
-static int handle_path(unsigned char *sha1, struct rerere_io *io, int 
marker_size)
+static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX 
*ctx)
 {
-   git_SHA_CTX ctx;
-   int has_conflicts = 0;
enum {
-   RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
-   } hunk = RR_CONTEXT;
+   RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
+   } hunk = RR_SIDE_1;
struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
struct strbuf buf = STRBUF_INIT;
-
-   if (sha1)
-   git_SHA1_Init();
+   int has_conflicts = -1;
 
while (!io->getline(, io)) {
if (is_cmarker(buf.buf, '<', marker_size)) {
-   if (hunk != RR_CONTEXT)
-   goto bad;
-   hunk = RR_SIDE_1;
+   break;
} else if (is_cmarker(buf.buf, '|', marker_size)) {
if (hunk != RR_SIDE_1)
-   goto bad;
+   break;
hunk = RR_ORIGINAL;
} else if (is_cmarker(buf.buf, '=', marker_size)) {
if (hunk != RR_SIDE_1 && hunk != RR_ORIGINAL)
-   goto bad;
+   break;
hunk = RR_SIDE_2;
} else if (is_cmarker(buf.buf, '>', marker_size)) {
if (hunk != RR_SIDE_2)
-   goto bad;
+   break;
if (strbuf_cmp(, ) > 0)
strbuf_swap(, );
has_conflicts = 1;
-   hunk = RR_CONTEXT;
rerere_io_putconflict('<', marker_size, io);
rerere_io_putmem(one.buf, one.len, io);
rerere_io_putconflict('=', marker_size, io);
rerere_io_putmem(two.buf, two.len, io);
rerere_io_putconflict('>', marker_size, io);
-   if (sha1) {
-   git_SHA1_Update(&ctx, one.buf ? one.buf : "",
+   if (ctx) {
+   git_SHA1_Update(ctx, one.buf ? one.buf : "",
one.len + 1);
-   git_SHA1_Update(&ctx, two.buf ? two.buf : "",
+   git_SHA1_Update(ctx, two.buf ? two.buf : "",
two.len + 1);
}
-   strbuf_reset(&one);
-   strbuf_reset(&two);
+   break;
} else if (hunk == RR_SIDE_1)
strbuf_addbuf(&one, &buf);
else if (hunk == RR_ORIGINAL)
; /* discard */
else if (hunk == RR_SIDE_2)
strbuf_addbuf(&two, &buf);
-   else
-   rerere_io_putstr(buf.buf, io);
-   continue;
-   bad:
-   hunk = 99; /* force error exit */
-   break;
}
strbuf_release(&one);
strbuf_release(&two);
strbuf_release(&buf);
 
+   return has_conflicts;
+}
+
+/*
+ * Read contents a file with conflicts, normalize the conflicts
+ * by (1) discarding the common ancestor version in diff3-style,
+ * (2) reorde

[PATCH v4 09/11] rerere: return strbuf from handle path

2018-08-05 Thread Thomas Gummerer

Currently we write the conflict to disk directly in the handle_path
function.  To make it re-usable for nested conflicts, instead of
writing the conflict out directly, store it in a strbuf and let the
caller write it out.

This does mean some slight increase in memory usage, however that
increase is limited to the size of the largest conflict we've
currently processed.  We already keep one copy of the conflict in
memory, and it shouldn't be too large, so the increase in memory usage
seems acceptable.

As a bonus this lets us replace the rerere_io_putconflict function
with a trivial two-line function.
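
To illustrate the idea outside of git, here is a minimal sketch of
collecting one normalized conflict in memory and leaving the write to
the caller.  Note that 'struct outbuf', 'outbuf_add', 'put_conflict'
and 'emit_conflict' are hypothetical stand-ins made up for this
example; they are not git's strbuf API, and this is not the code in
the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable, NUL-terminated buffer. */
struct outbuf {
	char *buf;
	size_t len, alloc;
};

/* Append n bytes, growing the allocation as needed. */
static void outbuf_add(struct outbuf *b, const char *data, size_t n)
{
	if (b->len + n + 1 > b->alloc) {
		size_t alloc = (b->len + n + 1) * 2;
		char *p = realloc(b->buf, alloc);
		if (!p)
			abort();
		b->buf = p;
		b->alloc = alloc;
	}
	memcpy(b->buf + b->len, data, n);
	b->len += n;
	b->buf[b->len] = '\0';
}

/* Append a marker line: 'size' copies of 'ch' followed by a newline. */
static void put_conflict(struct outbuf *b, int ch, size_t size)
{
	char c = (char)ch;
	size_t i;

	for (i = 0; i < size; i++)
		outbuf_add(b, &c, 1);
	outbuf_add(b, "\n", 1);
}

/* Store a whole conflict in 'out'; nothing is written to disk here. */
static void emit_conflict(struct outbuf *out, const char *one,
			  const char *two, size_t marker_size)
{
	assert(out && one && two);
	put_conflict(out, '<', marker_size);
	outbuf_add(out, one, strlen(one));
	put_conflict(out, '=', marker_size);
	outbuf_add(out, two, strlen(two));
	put_conflict(out, '>', marker_size);
}
```

The caller would start from a zero-initialized 'struct outbuf', write
out 'buf'/'len' in one go after each conflict, and free the buffer
once it is done.  Memory use is then bounded by the largest single
conflict, as described above.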

Signed-off-by: Thomas Gummerer 
---
 rerere.c | 58 ++--
 1 file changed, 18 insertions(+), 40 deletions(-)

diff --git a/rerere.c b/rerere.c
index 2d62251943..a35b88916c 100644
--- a/rerere.c
+++ b/rerere.c
@@ -302,38 +302,6 @@ static void rerere_io_putstr(const char *str, struct 
rerere_io *io)
ferr_puts(str, io->output, &io->wrerror);
 }
 
-/*
- * Write a conflict marker to io->output (if defined).
- */
-static void rerere_io_putconflict(int ch, int size, struct rerere_io *io)
-{
-   char buf[64];
-
-   while (size) {
-   if (size <= sizeof(buf) - 2) {
-   memset(buf, ch, size);
-   buf[size] = '\n';
-   buf[size + 1] = '\0';
-   size = 0;
-   } else {
-   int sz = sizeof(buf) - 1;
-
-   /*
-* Make sure we will not write everything out
-* in this round by leaving at least 1 byte
-* for the next round, giving the next round
-* a chance to add the terminating LF.  Yuck.
-*/
-   if (size <= sz)
-   sz -= (sz - size) + 1;
-   memset(buf, ch, sz);
-   buf[sz] = '\0';
-   size -= sz;
-   }
-   rerere_io_putstr(buf, io);
-   }
-}
-
 static void rerere_io_putmem(const char *mem, size_t sz, struct rerere_io *io)
 {
if (io->output)
@@ -384,7 +352,14 @@ static int is_cmarker(char *buf, int marker_char, int 
marker_size)
return isspace(*buf);
 }
 
-static int handle_conflict(struct rerere_io *io, int marker_size, git_SHA_CTX 
*ctx)
+static void rerere_strbuf_putconflict(struct strbuf *buf, int ch, size_t size)
+{
+   strbuf_addchars(buf, ch, size);
+   strbuf_addch(buf, '\n');
+}
+
+static int handle_conflict(struct strbuf *out, struct rerere_io *io,
+  int marker_size, git_SHA_CTX *ctx)
 {
enum {
RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
@@ -410,11 +385,11 @@ static int handle_conflict(struct rerere_io *io, int 
marker_size, git_SHA_CTX *c
if (strbuf_cmp(&one, &two) > 0)
strbuf_swap(&one, &two);
has_conflicts = 1;
-   rerere_io_putconflict('<', marker_size, io);
-   rerere_io_putmem(one.buf, one.len, io);
-   rerere_io_putconflict('=', marker_size, io);
-   rerere_io_putmem(two.buf, two.len, io);
-   rerere_io_putconflict('>', marker_size, io);
+   rerere_strbuf_putconflict(out, '<', marker_size);
+   strbuf_addbuf(out, &one);
+   rerere_strbuf_putconflict(out, '=', marker_size);
+   strbuf_addbuf(out, &two);
+   rerere_strbuf_putconflict(out, '>', marker_size);
if (ctx) {
git_SHA1_Update(ctx, one.buf ? one.buf : "",
one.len + 1);
@@ -451,21 +426,24 @@ static int handle_conflict(struct rerere_io *io, int 
marker_size, git_SHA_CTX *c
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int 
marker_size)
 {
git_SHA_CTX ctx;
-   struct strbuf buf = STRBUF_INIT;
+   struct strbuf buf = STRBUF_INIT, out = STRBUF_INIT;
int has_conflicts = 0;
if (sha1)
git_SHA1_Init(&ctx);
 
while (!io->getline(&buf, io)) {
if (is_cmarker(buf.buf, '<', marker_size)) {
-   has_conflicts = handle_conflict(io, marker_size,
+   has_conflicts = handle_conflict(&out, io, marker_size,
sha1 ? &ctx : NULL);
if (has_conflicts < 0)
break;
+   rerere_io_putmem(out.buf, out.len, io);
+   strbuf_reset(&out);
} else
rerere_io_putstr(buf.buf, io);
}
strbuf_release(&buf);
+   strbuf_release(&out);
 
if (sha1)

[PATCH v4 11/11] rerere: recalculate conflict ID when unresolved conflict is committed

2018-08-05 Thread Thomas Gummerer

Currently when a user doesn't resolve a conflict, commits the results,
and does an operation which creates another conflict, rerere will use
the ID of the previously unresolved conflict for the new conflict.
This is because the conflict is kept in the MERGE_RR file, which
'rerere' reads every time it is invoked.

After the new conflict is solved, rerere will record the resolution
with the ID of the old conflict.  So in order to replay the conflict,
both merges would have to be re-done, instead of just the last one, in
order for rerere to be able to automatically resolve the conflict.

Instead of that, assign a new conflict ID if there are still conflicts
in a file and the file had conflicts at a previous step.  This ID
matches the conflict we actually resolved at the corresponding step.

Note that there are no backwards compatibility worries here, as rerere
would have failed to even normalize the conflict before this patch
series.
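
For illustration, the per-path decision this patch ends up with can be
sketched as a small pure function.  The enum and the name
'plan_for_path' are made up for this example; 'ret' stands for the
return value of handle_file() (1: conflicts found, 0: none, -1: error)
and 'in_merge_rr' for whether the path already has an entry in
MERGE_RR.

```c
#include <assert.h>

/* Hypothetical names, only used to spell out the four outcomes. */
enum rerere_action {
	ACTION_SKIP,            /* nothing to record for this path */
	ACTION_RECORD_NEW,      /* assign an ID from the current conflict */
	ACTION_DROP_AND_SKIP,   /* drop the stale entry, file is unparsable */
	ACTION_DROP_AND_RECORD  /* drop the stale entry, then record anew */
};

static enum rerere_action plan_for_path(int ret, int in_merge_rr)
{
	/* A stale entry is dropped whenever the file is not conflict free. */
	int drop = (ret != 0 && in_merge_rr);

	/* Errors and conflict-free files never get a new conflict ID. */
	if (ret < 1)
		return drop ? ACTION_DROP_AND_SKIP : ACTION_SKIP;
	return drop ? ACTION_DROP_AND_RECORD : ACTION_RECORD_NEW;
}
```

The interesting case is ret == 1 with an existing entry: the old ID is
dropped and a fresh one is computed from the conflict that is actually
in the file now.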

Signed-off-by: Thomas Gummerer 
---
 rerere.c  | 7 +++
 t/t4200-rerere.sh | 7 +++
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/rerere.c b/rerere.c
index f78bef80b1..dd81d09e19 100644
--- a/rerere.c
+++ b/rerere.c
@@ -815,7 +815,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
struct rerere_id *id;
unsigned char sha1[20];
const char *path = conflict.items[i].string;
-   int ret, has_string;
+   int ret;
 
/*
 * Ask handle_file() to scan and assign a
@@ -823,12 +823,11 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 * yet.
 */
ret = handle_file(path, sha1, NULL);
-   has_string = string_list_has_string(rr, path);
-   if (ret < 0 && has_string) {
+   if (ret != 0 && string_list_has_string(rr, path)) {
remove_variant(string_list_lookup(rr, path)->util);
string_list_remove(rr, path, 1);
}
-   if (ret < 1 || has_string)
+   if (ret < 1)
continue;
 
id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index afaf085e42..819f6dd672 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -635,6 +635,13 @@ test_expect_success 'rerere with inner conflict markers' '
git commit -q -m "will solve conflicts later" &&
test_must_fail git merge A &&
cat test >actual &&
+   test_cmp expect actual &&
+
+   git add test &&
+   git commit -m "rerere solved conflict" &&
+   git reset --hard HEAD~ &&
+   test_must_fail git merge A &&
+   cat test >actual &&
test_cmp expect actual
 '
 
-- 
2.18.0.720.gf7a957e2e7

[PATCH v4 07/11] rerere: only return whether a path has conflicts or not

2018-08-05 Thread Thomas Gummerer

We currently return the exact number of conflict hunks a certain path
has from the 'handle_path' function.  However, all of its callers only
care whether there are conflicts or not, or whether there is an error.
Return only that information, and document that only that information
is returned.  This will simplify the code in the subsequent steps.
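
As a rough sketch of that return convention, the following standalone
scanner walks over a text and reports 1, 0 or -1 in the same way.
'is_marker' and 'scan_conflicts' are hypothetical names for this
example, and the marker check is simplified compared to is_cmarker()
in rerere.c (it only requires whitespace or end of string after the
marker characters).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Simplified check: 'size' copies of 'ch', then whitespace or end. */
static int is_marker(const char *line, int ch, int size)
{
	while (size--)
		if (*line++ != ch)
			return 0;
	return *line == '\0' || isspace((unsigned char)*line);
}

/*
 * Return 1 if at least one complete conflict hunk is found, 0 if
 * there is none, and -1 if the markers do not pair up.
 */
static int scan_conflicts(const char *text, int marker_size)
{
	enum { CONTEXT, SIDE_1, ORIGINAL, SIDE_2 } state = CONTEXT;
	int has_conflicts = 0;
	const char *line = text;
	const char *nl;

	while (*line) {
		if (is_marker(line, '<', marker_size)) {
			if (state != CONTEXT)
				return -1;
			state = SIDE_1;
		} else if (is_marker(line, '|', marker_size)) {
			if (state != SIDE_1)
				return -1;
			state = ORIGINAL;
		} else if (is_marker(line, '=', marker_size)) {
			if (state != SIDE_1 && state != ORIGINAL)
				return -1;
			state = SIDE_2;
		} else if (is_marker(line, '>', marker_size)) {
			if (state != SIDE_2)
				return -1;
			has_conflicts = 1; /* a flag, not a hunk counter */
			state = CONTEXT;
		}
		nl = strchr(line, '\n');
		if (!nl)
			break;
		line = nl + 1;
	}
	/* An unterminated hunk at the end of the text is an error. */
	return state == CONTEXT ? has_conflicts : -1;
}
```

Two hunks and one hunk give the same result here, which is all the
callers need.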

Signed-off-by: Thomas Gummerer 
---
 rerere.c | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/rerere.c b/rerere.c
index 895ad80c0c..bf803043e2 100644
--- a/rerere.c
+++ b/rerere.c
@@ -393,12 +393,13 @@ static int is_cmarker(char *buf, int marker_char, int 
marker_size)
  * one side of the conflict, NUL, the other side of the conflict,
  * and NUL concatenated together.
  *
- * Return the number of conflict hunks found.
+ * Return 1 if conflict hunks are found, 0 if there are no conflict
+ * hunks and -1 if an error occured.
  */
 static int handle_path(unsigned char *sha1, struct rerere_io *io, int 
marker_size)
 {
git_SHA_CTX ctx;
-   int hunk_no = 0;
+   int has_conflicts = 0;
enum {
RR_CONTEXT = 0, RR_SIDE_1, RR_SIDE_2, RR_ORIGINAL
} hunk = RR_CONTEXT;
@@ -426,7 +427,7 @@ static int handle_path(unsigned char *sha1, struct 
rerere_io *io, int marker_siz
goto bad;
if (strbuf_cmp(&one, &two) > 0)
strbuf_swap(&one, &two);
-   hunk_no++;
+   has_conflicts = 1;
hunk = RR_CONTEXT;
rerere_io_putconflict('<', marker_size, io);
rerere_io_putmem(one.buf, one.len, io);
@@ -462,7 +463,7 @@ static int handle_path(unsigned char *sha1, struct 
rerere_io *io, int marker_siz
git_SHA1_Final(sha1, &ctx);
if (hunk != RR_CONTEXT)
return -1;
-   return hunk_no;
+   return has_conflicts;
 }
 
 /*
@@ -471,7 +472,7 @@ static int handle_path(unsigned char *sha1, struct 
rerere_io *io, int marker_siz
  */
 static int handle_file(const char *path, unsigned char *sha1, const char 
*output)
 {
-   int hunk_no = 0;
+   int has_conflicts = 0;
struct rerere_io_file io;
int marker_size = ll_merge_marker_size(path);
 
@@ -491,7 +492,7 @@ static int handle_file(const char *path, unsigned char 
*sha1, const char *output
}
}
 
-   hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+   has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
 
fclose(io.input);
if (io.io.wrerror)
@@ -500,14 +501,14 @@ static int handle_file(const char *path, unsigned char 
*sha1, const char *output
if (io.io.output && fclose(io.io.output))
io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
 
-   if (hunk_no < 0) {
+   if (has_conflicts < 0) {
if (output)
unlink_or_warn(output);
return error(_("could not parse conflict hunks in '%s'"), path);
}
if (io.io.wrerror)
return -1;
-   return hunk_no;
+   return has_conflicts;
 }
 
 /*
@@ -954,7 +955,7 @@ static int handle_cache(const char *path, unsigned char 
*sha1, const char *outpu
mmfile_t mmfile[3] = {{NULL}};
mmbuffer_t result = {NULL, 0};
const struct cache_entry *ce;
-   int pos, len, i, hunk_no;
+   int pos, len, i, has_conflicts;
struct rerere_io_mem io;
int marker_size = ll_merge_marker_size(path);
 
@@ -1008,11 +1009,11 @@ static int handle_cache(const char *path, unsigned char 
*sha1, const char *outpu
 * Grab the conflict ID and optionally write the original
 * contents with conflict markers out.
 */
-   hunk_no = handle_path(sha1, (struct rerere_io *)&io, marker_size);
+   has_conflicts = handle_path(sha1, (struct rerere_io *)&io, marker_size);
strbuf_release(&io.input);
if (io.io.output)
fclose(io.io.output);
-   return hunk_no;
+   return has_conflicts;
 }
 
 static int rerere_forget_one_path(const char *path, struct string_list *rr)
-- 
2.18.0.720.gf7a957e2e7

[PATCH v4 01/11] rerere: unify error messages when read_cache fails

2018-08-05 Thread Thomas Gummerer

We have multiple different variants of the error message we show to
the user if 'read_cache' fails.  The "Could not read index" variant we
are using in 'rerere.c' is currently not used anywhere in translated
form.

As a subsequent commit will mark all output that comes from 'rerere.c'
for translation, make the life of the translators a little bit easier
by using a string that is used elsewhere, and marked for translation
there, and thus most likely already translated.

"index file corrupt" seems to be the most common error message we show
when 'read_cache' fails, so use that here as well.

Signed-off-by: Thomas Gummerer 
---
 rerere.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/rerere.c b/rerere.c
index e0862e2778..473d32a5cd 100644
--- a/rerere.c
+++ b/rerere.c
@@ -568,7 +568,7 @@ static int find_conflict(struct string_list *conflict)
 {
int i;
if (read_cache() < 0)
-   return error("Could not read index");
+   return error("index file corrupt");
 
for (i = 0; i < active_nr;) {
int conflict_type;
@@ -601,7 +601,7 @@ int rerere_remaining(struct string_list *merge_rr)
if (setup_rerere(merge_rr, RERERE_READONLY))
return 0;
if (read_cache() < 0)
-   return error("Could not read index");
+   return error("index file corrupt");
 
for (i = 0; i < active_nr;) {
int conflict_type;
@@ -1103,7 +1103,7 @@ int rerere_forget(struct pathspec *pathspec)
struct string_list merge_rr = STRING_LIST_INIT_DUP;
 
if (read_cache() < 0)
-   return error("Could not read index");
+   return error("index file corrupt");
 
fd = setup_rerere(&merge_rr, RERERE_NOAUTOUPDATE);
if (fd < 0)
-- 
2.18.0.720.gf7a957e2e7

[PATCH v4 00/11] rerere: handle nested conflicts

2018-08-05 Thread Thomas Gummerer

The previous rounds were at
<20180520211210.1248-1-t.gumme...@gmail.com>,
<20180605215219.28783-1-t.gumme...@gmail.com> and
<20180714214443.7184-1-t.gumme...@gmail.com>.

Thanks Junio for the review and Simon for pointing out an error in my
commit message.

The changes in this round are mainly improving the commit messages,
and polishing the documentation.

It also simplifies one test case in patch 6/11.

Patches 10 and 11 are still included, however I'm not going to be too
sad if we decide to not include them, as they really only help in an
obscure case, which could be considered using git "wrong".

I also realized that while I wrote "no functional changes intended" in
7/11, and functional changes were in fact not intended, there still is
a slight functional change.  As I think that's a good change, I
documented it in the commit message.

Thomas Gummerer (11):
  rerere: unify error messages when read_cache fails
  rerere: lowercase error messages
  rerere: wrap paths in output in sq
  rerere: mark strings for translation
  rerere: add documentation for conflict normalization
  rerere: fix crash with files rerere can't handle
  rerere: only return whether a path has conflicts or not
  rerere: factor out handle_conflict function
  rerere: return strbuf from handle path
  rerere: teach rerere to handle nested conflicts
  rerere: recalculate conflict ID when unresolved conflict is committed

 Documentation/technical/rerere.txt | 182 +
 builtin/rerere.c   |   4 +-
 rerere.c   | 243 ++---
 t/t4200-rerere.sh  |  65 
 4 files changed, 365 insertions(+), 129 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

Range diff below:

 1:  ce876f1b6b =  1:  018bd68a8a rerere: unify error messages when read_cache 
fails
 2:  0326503c4a =  2:  281fcbf24f rerere: lowercase error messages
 3:  a33211e3d3 =  3:  b6d5e2e26d rerere: wrap paths in output in sq
 4:  3da84604f0 !  4:  6ed390c8f5 rerere: mark strings for translation
@@ -2,7 +2,7 @@
 
 rerere: mark strings for translation
 
-'git rerere' is considered a plumbing command and as such its output
+'git rerere' is considered a porcelain command and as such its output
 should be translated.  Its functionality is also only enabled through
 a config setting, so scripts really shouldn't rely on the output
 either way.
 5:  749d49a625 !  5:  3cef1d57bc rerere: add documentation for conflict 
normalization
@@ -28,8 +28,8 @@
 +conflicts before writing them to the rerere database.
 +
 +Different conflict styles and branch names are normalized by stripping
-+the labels from the conflict markers, and removing extraneous
-+information from the `diff3` conflict style. Branches that are merged
++the labels from the conflict markers, and removing the common ancestor
++version from the `diff3` conflict style. Branches that are merged
 +in different order are normalized by sorting the conflict hunks.  More
 +on each of those steps in the following sections.
 +
@@ -37,8 +37,8 @@
 +calculated based on the normalized conflict, which is later used by
 +rerere to look up the conflict in the rerere database.
 +
-+Stripping extraneous information
-+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
++Removing the common ancestor version
++~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 +
 +Say we have three branches AB, AC and AC2.  The common ancestor of
 +these branches has a file with a line containing the string "A" (for
@@ -79,7 +79,7 @@
 +
 +By extension, this means that rerere should recognize that the above
 +conflicts are the same.  To do this, the labels on the conflict
-+markers are stripped, and the diff3 output is removed.  The above
++markers are stripped, and the common ancestor version is removed.  The 
above
 +examples would both result in the following normalized conflict:
 +
 +<<<<<<<
 6:  d465bd087e !  6:  a02d90157d rerere: fix crash when conflict goes 
unresolved
@@ -1,37 +1,42 @@
 Author: Thomas Gummerer 
 
-rerere: fix crash when conflict goes unresolved
+rerere: fix crash with files rerere can't handle
 
-Currently when a user doesn't resolve a conflict in a file, but
-commits the file with the conflict markers, and later the file ends up
-in a state in which rerere can't handle it, subsequent rerere
-operations that are interested in that path, such as 'rerere clear' or
-'rerere forget <path>' will fail, or even worse in the case of 'rerere
-clear' segfault.
+Currently when a user does a conflict resolution and ends it (in any
+way that calls 'git rerere' again) with a file 'rerere' can't handle,
+subsequent rerere operat

[PATCH v4 03/11] rerere: wrap paths in output in sq

2018-08-05 Thread Thomas Gummerer

It looks like most paths in the output in the git codebase are wrapped
in single quotes.  Standardize on that in rerere as well.

Apart from being more consistent, this also makes some of the strings
match strings that are already translated in other parts of the
codebase, thus reducing the work for translators, when the strings are
marked for translation in a subsequent commit.

Signed-off-by: Thomas Gummerer 
---
 builtin/rerere.c |  2 +-
 rerere.c | 26 +-
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/builtin/rerere.c b/builtin/rerere.c
index 0bc40298c2..e0c67c98e9 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -107,7 +107,7 @@ int cmd_rerere(int argc, const char **argv, const char 
*prefix)
const char *path = merge_rr.items[i].string;
const struct rerere_id *id = merge_rr.items[i].util;
if (diff_two(rerere_path(id, "preimage"), path, path, 
path))
-   die("unable to generate diff for %s", 
rerere_path(id, NULL));
+   die("unable to generate diff for '%s'", 
rerere_path(id, NULL));
}
} else
usage_with_options(rerere_usage, options);
diff --git a/rerere.c b/rerere.c
index c5d9ea171f..cde1f6e696 100644
--- a/rerere.c
+++ b/rerere.c
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char 
*sha1, const char *output
io.input = fopen(path, "r");
io.io.wrerror = 0;
if (!io.input)
-   return error_errno("could not open %s", path);
+   return error_errno("could not open '%s'", path);
 
if (output) {
io.io.output = fopen(output, "w");
if (!io.io.output) {
-   error_errno("could not write %s", output);
+   error_errno("could not write '%s'", output);
fclose(io.input);
return -1;
}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char 
*sha1, const char *output
 
fclose(io.input);
if (io.io.wrerror)
-   error("there were errors while writing %s (%s)",
+   error("there were errors while writing '%s' (%s)",
  path, strerror(io.io.wrerror));
if (io.io.output && fclose(io.io.output))
-   io.io.wrerror = error_errno("failed to flush %s", path);
+   io.io.wrerror = error_errno("failed to flush '%s'", path);
 
if (hunk_no < 0) {
if (output)
unlink_or_warn(output);
-   return error("could not parse conflict hunks in %s", path);
+   return error("could not parse conflict hunks in '%s'", path);
}
if (io.io.wrerror)
return -1;
@@ -684,17 +684,17 @@ static int merge(const struct rerere_id *id, const char 
*path)
 * Mark that "postimage" was used to help gc.
 */
if (utime(rerere_path(id, "postimage"), NULL) < 0)
-   warning_errno("failed utime() on %s",
+   warning_errno("failed utime() on '%s'",
  rerere_path(id, "postimage"));
 
/* Update "path" with the resolution */
f = fopen(path, "w");
if (!f)
-   return error_errno("could not open %s", path);
+   return error_errno("could not open '%s'", path);
if (fwrite(result.ptr, result.size, 1, f) != 1)
-   error_errno("could not write %s", path);
+   error_errno("could not write '%s'", path);
if (fclose(f))
-   return error_errno("writing %s failed", path);
+   return error_errno("writing '%s' failed", path);
 
 out:
free(cur.ptr);
@@ -878,7 +878,7 @@ static int is_rerere_enabled(void)
return rr_cache_exists;
 
if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-   die("could not create directory %s", git_path_rr_cache());
+   die("could not create directory '%s'", git_path_rr_cache());
return 1;
 }
 
@@ -1067,9 +1067,9 @@ static int rerere_forget_one_path(const char *path, 
struct string_list *rr)
filename = rerere_path(id, "postimage");
if (unlink(filename)) {
if (errno == ENOENT)
-   error("no remembered resolution for %s", path);
+   error("no remembered resolution for '%s'", path);
else
-   error_errno("cannot

[PATCH v4 02/11] rerere: lowercase error messages

2018-08-05 Thread Thomas Gummerer

Documentation/CodingGuidelines mentions that error messages should be
lowercase.  Prior to marking them for translation follow that pattern
in rerere as well, so translators won't have to translate messages
that don't conform to our guidelines.

Signed-off-by: Thomas Gummerer 
---
 rerere.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/rerere.c b/rerere.c
index 473d32a5cd..c5d9ea171f 100644
--- a/rerere.c
+++ b/rerere.c
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char 
*sha1, const char *output
io.input = fopen(path, "r");
io.io.wrerror = 0;
if (!io.input)
-   return error_errno("Could not open %s", path);
+   return error_errno("could not open %s", path);
 
if (output) {
io.io.output = fopen(output, "w");
if (!io.io.output) {
-   error_errno("Could not write %s", output);
+   error_errno("could not write %s", output);
fclose(io.input);
return -1;
}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char 
*sha1, const char *output
 
fclose(io.input);
if (io.io.wrerror)
-   error("There were errors while writing %s (%s)",
+   error("there were errors while writing %s (%s)",
  path, strerror(io.io.wrerror));
if (io.io.output && fclose(io.io.output))
-   io.io.wrerror = error_errno("Failed to flush %s", path);
+   io.io.wrerror = error_errno("failed to flush %s", path);
 
if (hunk_no < 0) {
if (output)
unlink_or_warn(output);
-   return error("Could not parse conflict hunks in %s", path);
+   return error("could not parse conflict hunks in %s", path);
}
if (io.io.wrerror)
return -1;
@@ -690,11 +690,11 @@ static int merge(const struct rerere_id *id, const char 
*path)
/* Update "path" with the resolution */
f = fopen(path, "w");
if (!f)
-   return error_errno("Could not open %s", path);
+   return error_errno("could not open %s", path);
if (fwrite(result.ptr, result.size, 1, f) != 1)
-   error_errno("Could not write %s", path);
+   error_errno("could not write %s", path);
if (fclose(f))
-   return error_errno("Writing %s failed", path);
+   return error_errno("writing %s failed", path);
 
 out:
free(cur.ptr);
@@ -720,7 +720,7 @@ static void update_paths(struct string_list *update)
 
if (write_locked_index(&the_index, &index_lock,
   COMMIT_LOCK | SKIP_IF_UNCHANGED))
-   die("Unable to write new index file");
+   die("unable to write new index file");
 }
 
 static void remove_variant(struct rerere_id *id)
@@ -878,7 +878,7 @@ static int is_rerere_enabled(void)
return rr_cache_exists;
 
if (!rr_cache_exists && mkdir_in_gitdir(git_path_rr_cache()))
-   die("Could not create directory %s", git_path_rr_cache());
+   die("could not create directory %s", git_path_rr_cache());
return 1;
 }
 
@@ -1031,7 +1031,7 @@ static int rerere_forget_one_path(const char *path, 
struct string_list *rr)
 */
ret = handle_cache(path, sha1, NULL);
if (ret < 1)
-   return error("Could not parse conflict hunks in '%s'", path);
+   return error("could not parse conflict hunks in '%s'", path);
 
/* Nuke the recorded resolution for the conflict */
id = new_rerere_id(sha1);
@@ -1049,7 +1049,7 @@ static int rerere_forget_one_path(const char *path, 
struct string_list *rr)
handle_cache(path, sha1, rerere_path(id, "thisimage"));
if (read_mmfile(&cur, rerere_path(id, "thisimage"))) {
free(cur.ptr);
-   error("Failed to update conflicted state in '%s'", 
path);
+   error("failed to update conflicted state in '%s'", 
path);
goto fail_exit;
}
cleanly_resolved = !try_merge(id, path, &cur, &result);
-- 
2.18.0.720.gf7a957e2e7

[PATCH v4 06/11] rerere: fix crash with files rerere can't handle

2018-08-05 Thread Thomas Gummerer

Currently when a user does a conflict resolution and ends it (in any
way that calls 'git rerere' again) with a file 'rerere' can't handle,
subsequent rerere operations that are interested in that path, such as
'rerere clear' or 'rerere forget <path>' will fail, or even worse in
the case of 'rerere clear' segfault.

Such states include nested conflicts, or a conflict marker that
doesn't have any match.

This is because 'git rerere' calculates a conflict file and writes it
to the MERGE_RR file.  When the user then changes the file in any way
rerere can't handle, and then calls 'git rerere' on it again to record
the conflict resolution, the handle_file function fails, and removes
the 'preimage' file in the rr-cache in the process, while leaving the
ID in the MERGE_RR file.

Now when 'rerere clear' is run, it reads the ID from the MERGE_RR
file, however the 'fit_variant' function for the ID is never called as
the 'preimage' file does not exist anymore.  This means
'collection->status' in 'has_rerere_resolution' is NULL, and the
command will crash.

To fix this, remove the rerere ID from the MERGE_RR file in the case
when we can't handle it, just after the 'preimage' file was removed
and remove the corresponding variant from .git/rr-cache/.  Removing it
unconditionally is fine here, because if the user would have resolved
the conflict and ran rerere, the entry would no longer be in the
MERGE_RR file, so we wouldn't have this problem in the first place,
while if the conflict was not resolved.

Currently there is nothing left in this folder, as the 'preimage'
was already deleted by the 'handle_file' function, so 'remove_variant'
is a no-op.  Still call the function, to make sure we clean everything
up, in case we add some other files corresponding to a variant in the
future.

Note that other variants that have the same conflict ID will not be
touched.

Signed-off-by: Thomas Gummerer 
---
 rerere.c  | 12 +++-
 t/t4200-rerere.sh | 21 +
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/rerere.c b/rerere.c
index da1ab54027..895ad80c0c 100644
--- a/rerere.c
+++ b/rerere.c
@@ -823,10 +823,7 @@ static int do_plain_rerere(struct string_list *rr, int fd)
struct rerere_id *id;
unsigned char sha1[20];
const char *path = conflict.items[i].string;
-   int ret;
-
-   if (string_list_has_string(rr, path))
-   continue;
+   int ret, has_string;
 
/*
 * Ask handle_file() to scan and assign a
@@ -834,7 +831,12 @@ static int do_plain_rerere(struct string_list *rr, int fd)
 * yet.
 */
ret = handle_file(path, sha1, NULL);
-   if (ret < 1)
+   has_string = string_list_has_string(rr, path);
+   if (ret < 0 && has_string) {
+   remove_variant(string_list_lookup(rr, path)->util);
+   string_list_remove(rr, path, 1);
+   }
+   if (ret < 1 || has_string)
continue;
 
id = new_rerere_id(sha1);
diff --git a/t/t4200-rerere.sh b/t/t4200-rerere.sh
index 8417e5a4b1..23f9c0ca45 100755
--- a/t/t4200-rerere.sh
+++ b/t/t4200-rerere.sh
@@ -580,4 +580,25 @@ test_expect_success 'multiple identical conflicts' '
count_pre_post 0 0
 '
 
+test_expect_success 'rerere with unexpected conflict markers does not crash' '
+   git reset --hard &&
+
+   git checkout -b branch-1 master &&
+   echo "bar" >test &&
+   git add test &&
+   git commit -q -m two &&
+
+   git reset --hard &&
+   git checkout -b branch-2 master &&
+   echo "foo" >test &&
+   git add test &&
+   git commit -q -a -m one &&
+
+   test_must_fail git merge branch-1 &&
+   echo "<<<<<<< a" >test &&
+   git rerere &&
+
+   git rerere clear
+'
+
 test_done
-- 
2.18.0.720.gf7a957e2e7

[PATCH v4 05/11] rerere: add documentation for conflict normalization

2018-08-05 Thread Thomas Gummerer

Add some documentation for the logic behind the conflict normalization
in rerere.
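
As a small illustration of the normalization being documented, the
sketch below orders the two sides of a conflict and emits them between
bare markers.  It assumes the labels and the common ancestor version
have already been stripped by the caller, that the sides contain no
embedded NULs, and that the marker size is the default 7.
'normalize_conflict' is a name made up for this example and not a
function in rerere.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the normalized form of a conflict between side_a and side_b
 * into 'out'.  The side that sorts earlier always comes first, so the
 * result does not depend on which branch was merged into which.
 * Returns 0 on success, -1 if 'out' is too small.
 */
static int normalize_conflict(const char *side_a, const char *side_b,
			      char *out, size_t outsz)
{
	const char *one = side_a, *two = side_b;
	int n;

	if (strcmp(one, two) > 0) {
		one = side_b;
		two = side_a;
	}
	n = snprintf(out, outsz, "<<<<<<<\n%s=======\n%s>>>>>>>\n", one, two);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Calling it with ("B\n", "C\n") and with ("C\n", "B\n") gives the same
text, which is what allows the same conflict ID to be computed for
both merge orders.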

Helped-by: Junio C Hamano 
Signed-off-by: Thomas Gummerer 
---
 Documentation/technical/rerere.txt | 140 +
 rerere.c   |   4 -
 2 files changed, 140 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/technical/rerere.txt

diff --git a/Documentation/technical/rerere.txt 
b/Documentation/technical/rerere.txt
new file mode 100644
index 00..3d10dbfa67
--- /dev/null
+++ b/Documentation/technical/rerere.txt
@@ -0,0 +1,140 @@
+Rerere
+======
+
+This document describes the rerere logic.
+
+Conflict normalization
+----------------------
+
+To ensure recorded conflict resolutions can be looked up in the rerere
+database, even when branches are merged in a different order,
+different branches are merged that result in the same conflict, or
+when different conflict style settings are used, rerere normalizes the
+conflicts before writing them to the rerere database.
+
+Different conflict styles and branch names are normalized by stripping
+the labels from the conflict markers, and removing the common ancestor
+version from the `diff3` conflict style. Branches that are merged
+in different order are normalized by sorting the conflict hunks.  More
+on each of those steps in the following sections.
+
+Once these two normalization operations are applied, a conflict ID is
+calculated based on the normalized conflict, which is later used by
+rerere to look up the conflict in the rerere database.
+
+Removing the common ancestor version
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Say we have three branches AB, AC and AC2.  The common ancestor of
+these branches has a file with a line containing the string "A" (for
+brevity this is called "line A" in the rest of the document).  In
+branch AB this line is changed to "B", in AC, this line is changed to
+"C", and branch AC2 is forked off of AC, after the line was changed to
+"C".
+
+Forking a branch ABAC off of branch AB and then merging AC into it, we
+get a conflict like the following:
+
+<<<<<<< HEAD
+B
+=======
+C
+>>>>>>> AC
+
+Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB
+and then merging branch AC2 into it), using the diff3 conflict style,
+we get a conflict like the following:
+
+<<<<<<< HEAD
+B
+||| merged common ancestors
+A
+=======
+C
+>>>>>>> AC2
+
+By resolving this conflict, to leave line D, the user declares:
+
+After examining what branches AB and AC did, I believe that making
+line A into line D is the best thing to do that is compatible with
+what AB and AC wanted to do.
+
+As branch AC2 refers to the same commit as AC, the above implies that
+this is also compatible with what AB and AC2 wanted to do.
+
+By extension, this means that rerere should recognize that the above
+conflicts are the same.  To do this, the labels on the conflict
+markers are stripped, and the common ancestor version is removed.  The above
+examples would both result in the following normalized conflict:
+
+<<<<<<<
+B
+=======
+C
+>>>>>>>
+
+Sorting hunks
+~~~~~~~~~~~~~
+
+As before, let's imagine that a common ancestor had a file with line A
+in its early part, and line X in its late part.  And then four branches
+are forked that do these things:
+
+- AB: changes A to B
+- AC: changes A to C
+- XY: changes X to Y
+- XZ: changes X to Z
+
+Now, forking a branch ABAC off of branch AB and then merging AC into
+it, and forking a branch ACAB off of branch AC and then merging AB
+into it, would yield the conflict in a different order.  The former
+would say "A became B or C, what now?" while the latter would say "A
+became C or B, what now?"
+
+As a reminder, the act of merging AC into ABAC and resolving the
+conflict to leave line D means that the user declares:
+
+After examining what branches AB and AC did, I believe that
+making line A into line D is the best thing to do that is
+compatible with what AB and AC wanted to do.
+
+So the conflict we would see when merging AB into ACAB should be
+resolved the same way---it is the resolution that is in line with that
+declaration.
+
+Similarly, imagine that a branch XYXZ was previously forked from XY,
+and XZ was merged into it, and "X became Y or Z" was resolved into "X
+became W".
+
+Now, if a branch ABXY was forked from AB and then merged XY, then ABXY
+would have line B in its early part and line Y in its later part.
+Such a merge would be quite clean.  We can construct 4 combinations
+using these four branches ((AB, AC) x (XY, XZ)).
+
+Merging ABXY and ACXZ would make "an early A became B or C, a late X
+became Y or Z" conflict, while merging A

[PATCH v4 04/11] rerere: mark strings for translation

2018-08-05 Thread Thomas Gummerer

'git rerere' is considered a porcelain command and as such its output
should be translated.  Its functionality is also only enabled through
a config setting, so scripts really shouldn't rely on the output
either way.

Signed-off-by: Thomas Gummerer 
---
 builtin/rerere.c |  4 +--
 rerere.c | 68 
 2 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/builtin/rerere.c b/builtin/rerere.c
index e0c67c98e9..5ed941b91f 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -75,7 +75,7 @@ int cmd_rerere(int argc, const char **argv, const char *prefix)
if (!strcmp(argv[0], "forget")) {
struct pathspec pathspec;
if (argc < 2)
-   warning("'git rerere forget' without paths is deprecated");
+   warning(_("'git rerere forget' without paths is deprecated"));
parse_pathspec(&pathspec, 0, PATHSPEC_PREFER_CWD,
   prefix, argv + 1);
return rerere_forget(&pathspec);
@@ -107,7 +107,7 @@ int cmd_rerere(int argc, const char **argv, const char 
*prefix)
const char *path = merge_rr.items[i].string;
const struct rerere_id *id = merge_rr.items[i].util;
if (diff_two(rerere_path(id, "preimage"), path, path, 
path))
-   die("unable to generate diff for '%s'", 
rerere_path(id, NULL));
+   die(_("unable to generate diff for '%s'"), 
rerere_path(id, NULL));
}
} else
usage_with_options(rerere_usage, options);
diff --git a/rerere.c b/rerere.c
index cde1f6e696..be98c0afcb 100644
--- a/rerere.c
+++ b/rerere.c
@@ -212,7 +212,7 @@ static void read_rr(struct string_list *rr)
 
/* There has to be the hash, tab, path and then NUL */
if (buf.len < 42 || get_sha1_hex(buf.buf, sha1))
-   die("corrupt MERGE_RR");
+   die(_("corrupt MERGE_RR"));
 
if (buf.buf[40] != '.') {
variant = 0;
@@ -221,10 +221,10 @@ static void read_rr(struct string_list *rr)
errno = 0;
variant = strtol(buf.buf + 41, , 10);
if (errno)
-   die("corrupt MERGE_RR");
+   die(_("corrupt MERGE_RR"));
}
if (*(path++) != '\t')
-   die("corrupt MERGE_RR");
+   die(_("corrupt MERGE_RR"));
buf.buf[40] = '\0';
id = new_rerere_id_hex(buf.buf);
id->variant = variant;
@@ -259,12 +259,12 @@ static int write_rr(struct string_list *rr, int out_fd)
rr->items[i].string, 0);
 
if (write_in_full(out_fd, buf.buf, buf.len) < 0)
-   die("unable to write rerere record");
+   die(_("unable to write rerere record"));
 
strbuf_release();
}
if (commit_lock_file(_lock) != 0)
-   die("unable to write rerere record");
+   die(_("unable to write rerere record"));
return 0;
 }
 
@@ -484,12 +484,12 @@ static int handle_file(const char *path, unsigned char 
*sha1, const char *output
io.input = fopen(path, "r");
io.io.wrerror = 0;
if (!io.input)
-   return error_errno("could not open '%s'", path);
+   return error_errno(_("could not open '%s'"), path);
 
if (output) {
io.io.output = fopen(output, "w");
if (!io.io.output) {
-   error_errno("could not write '%s'", output);
+   error_errno(_("could not write '%s'"), output);
fclose(io.input);
return -1;
}
@@ -499,15 +499,15 @@ static int handle_file(const char *path, unsigned char 
*sha1, const char *output
 
fclose(io.input);
if (io.io.wrerror)
-   error("there were errors while writing '%s' (%s)",
+   error(_("there were errors while writing '%s' (%s)"),
  path, strerror(io.io.wrerror));
if (io.io.output && fclose(io.io.output))
-   io.io.wrerror = error_errno("failed to flush '%s'", path);
+   io.io.wrerror = error_errno(_("failed to flush '%s'"), path);
 
if (hunk_no < 0) {
if (output)
unlink_or_warn(output);
-   return error("could not parse conflict hunks in '%s'", p

Re: git worktree add verbosity

2018-08-05 Thread Thomas Gummerer

On 08/05, Karen Arutyunov wrote:
> Hello,
> 
> We are using git for automation in our build2 project.
> 
> What's quite inconvenient is that the 'git worktree add' command prints some
> output by default and there is no way to suppress it, as it normally can be
> achieved with the --quiet option for most git commands.
> 
> Could you add support for the --quiet option for the worktree command?

I think a '--quiet' option would be nice to have.  I wouldn't need it
much personally, so I'm probably not going to work on it, but it would
be great if you could work on that.  The best place to get started
with contributing is the Documentation/SubmittingPatches document.
The CONTRIBUTING.md in the git-for-windows repository [1] may also
have some helpful pointers.  Some of it is Windows-specific, but a lot
of it is generally applicable.

I'm happy to help review these patches and give more pointers to
help you get started.  Another useful resource is the #git-devel IRC
channel on Freenode, where you may be able to get some help (may need
some patience though :))

[1]: https://github.com/git-for-windows/git/blob/master/CONTRIBUTING.md

> Best regards,
> Karen

Re: git worktree add prints to stdout

2018-08-05 Thread Thomas Gummerer

On 08/05, Karen Arutyunov wrote:
> Hello,
> 
> The 'git worktree add' command prints to both standard streams. So in the
> following example the first line is printed to stderr and the second to
> stdout.

git 2.18.0 should print both of those lines to stdout.  This was done
to match where 'git reset --hard' prints the 'HEAD is now at...'
message.  See also the thread at [1] where we made that decision.

[1]: https://public-inbox.org/git/capig+cq8vzdycumo-qoexndbgqgegj2bpmpa-y0vhgct_br...@mail.gmail.com/

> $ git worktree add ../pub build2-control
> Preparing ../pub (identifier pub)
> HEAD is now at b03ea86 Update
> 
> This looks like a bug, as, for example, the checkout command prints 'HEAD is
> now at...' message to stderr.

I think eventually it would be nice to write all those messages to
'stderr', as I think they do make more sense there.  I said I may do
that at some point in [2], but haven't gotten around to it yet.  If you
want to take a stab at it, feel free :)

[2]: https://public-inbox.org/git/xmqq604rzytx@gitster-ct.c.googlers.com/

> Best regards,
> Karen

Re: [PATCH v4 05/21] range-diff: also show the diff between patches

2018-07-30 Thread Thomas Gummerer

On 07/30, Johannes Schindelin wrote:
> Hi Thomas & Eric,
> 
> On Sun, 29 Jul 2018, Thomas Gummerer wrote:
> 
> > On 07/29, Eric Sunshine wrote:
> > > On Sun, Jul 29, 2018 at 3:04 PM Thomas Gummerer wrote:
> > > > On 07/21, Johannes Schindelin via GitGitGadget wrote:
> > > > > Just like tbdiff, we now show the diff between matching patches.
> > > > > This is a "diff of two diffs", so it can be a bit daunting to read
> > > > > for the beginner.
> > > > > [...]
> > > > > Note also: while tbdiff accepts the `--no-patches` option to suppress
> > > > > these diffs between patches, we prefer the `-s` option that is
> > > > > automatically supported via our use of diff_opt_parse().
> > > >
> > > > One slightly unfortunate thing here is that we don't show these
> > > > options in 'git range-diff -h', which would be nice to have.  I don't
> > > > know if that's possible in git right now, if it's not easily possible,
> > > > I definitely wouldn't want to delay this series for that, and we could
> > > > just add it to the list of possible future enhancements that other
> > > > people mentioned.
> > > 
> > > This issue is not specific to git-range-diff; it's shared by other
> > > commands which inherit diff options via diff_opt_parse(). For
> > > instance, "git log -h" doesn't show diff-related options either, yet
> > > it accepts them.
> > 
> > Fair enough, that makes sense.  Thanks for the pointer!
> > 
> > There's one more thing that I noticed here:
> > 
> > git range-diff --no-patches
> > fatal: single arg format requires a symmetric range
> > 
> > Which is a slightly confusing error message.  In contrast git log does
> > the following on an unrecognized argument:
> > 
> > git log --no-patches
> > fatal: unrecognized argument: --no-patches
> > 
> > which is a little better I think.  I do however also think the "fatal:
> > single arg format requires a symmetric range" is useful when someone
> > genuinely tries to use the single argument version of the command.  So
> > I don't know what a good solution for this would be.
> 
> I immediately thought of testing for a leading `-` of the remaining
> argument, but I could imagine that somebody enterprisey uses
> 
>   git range-diff -- -my-first-attempt...-my-second-attempt
> 
> and I do not really want to complexify the code... Ideas?

Good point.  I can't really come up with a good option right now
either.  It's not too bad, as users just typed the command, so it
should be easy enough to see from the previous line what went wrong.

One potential option may be to turn "die(_("single arg format requires
a symmetric range"));" into an 'error()', and show the usage?  I think
that may be nice anyway, as "symmetric range" may not be immediately
obvious to everyone, but together with the usage it may be clearer?
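
For illustration only, the two ideas discussed above (checking for a
leading `-`, while still accepting a symmetric range whose ends start
with `-`) could be combined roughly as follows.  This is a hypothetical
sketch, not how range-diff parses its arguments; the enum and the
function name `classify_single_arg()` are made up for this example.
Checking for "..." first keeps Dscho's
"-my-first-attempt...-my-second-attempt" case working:

```c
#include <string.h>

/* Hypothetical classification of the single remaining argument. */
enum single_arg_kind {
	ARG_SYMMETRIC_RANGE,   /* contains "...", e.g. A...B */
	ARG_LOOKS_LIKE_OPTION, /* starts with '-' and has no "..." */
	ARG_OTHER              /* anything else */
};

/*
 * Look for the symmetric range separator first, so that refs whose
 * names begin with '-' are still treated as a range.  Only when no
 * "..." is present is a leading '-' taken as a mistyped option.
 */
static enum single_arg_kind classify_single_arg(const char *arg)
{
	if (strstr(arg, "..."))
		return ARG_SYMMETRIC_RANGE;
	if (arg[0] == '-')
		return ARG_LOOKS_LIKE_OPTION;
	return ARG_OTHER;
}
```

A caller could then report "unrecognized argument" for
ARG_LOOKS_LIKE_OPTION and keep the "symmetric range" message, together
with the usage, for ARG_OTHER.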

> > > > > diff --git a/range-diff.c b/range-diff.c
> > > > > @@ -300,6 +325,9 @@ static void output(struct string_list *a, struct string_list *b)
> > > > >   printf("%d: %s ! %d: %s\n",
> > > > >  b_util->matching + 1, short_oid(a_util),
> > > > >  j + 1, short_oid(b_util));
> > > > > + if (!(diffopt->output_format & DIFF_FORMAT_NO_OUTPUT))
> > > >
> > > > Looking at this line, it looks like it would be easy to support
> > > > '--no-patches' as well, which may be slightly easier to understand than
> > > > '-s' to someone new to the command.  But again that can be added later
> > > > if someone actually cares about it.
> > > 
> > > What wasn't mentioned (but was implied) by the commit message is that
> > > "-s" is short for "--no-patch", which also comes for free via
> > > diff_opt_parse(). True, "--no-patch" isn't spelled exactly the same as
> > > "--no-patches", but git-range-diff isn't exactly a perfect tbdiff
> > > clone, so hopefully not a git problem. Moreover, "--no-patch" is
> > > internally consistent within the Git builtin commands.
> > 
> > Makes sense, thanks!  "--no-patch" does make sense to me.  There's
> > still a lot of command line flags in git to learn for me, even after
> > all this time using it ;)  Might be nice to spell it out in the commit
> > message for someone like me, especially as "--no-patches" is already
> > mentioned.  Though I guess most regulars here would know about
> > "--no-patch", so maybe it's not worth it.  Anyway that is definitely
> > not worth another round here.
> 
> Sure, but not many users learn from reading the commit history...
> 
> :-)
> 
> Ciao,
> Dscho

Re: [PATCH v4 03/21] range-diff: first rudimentary implementation

2018-07-30 Thread Thomas Gummerer

On 07/30, Johannes Schindelin wrote:
> Hi Thomas,
> 
> On Sun, 29 Jul 2018, Thomas Gummerer wrote:
> 
> > On 07/21, Johannes Schindelin via GitGitGadget wrote:
> > > 
> > > [...]
> > > 
> > > +static void find_exact_matches(struct string_list *a, struct string_list *b)
> > > +{
> > > + struct hashmap map;
> > > + int i;
> > > +
> > > + hashmap_init(&map, (hashmap_cmp_fn)patch_util_cmp, NULL, 0);
> > > +
> > > + /* First, add the patches of a to a hash map */
> > > + for (i = 0; i < a->nr; i++) {
> > > + struct patch_util *util = a->items[i].util;
> > > +
> > > + util->i = i;
> > > + util->patch = a->items[i].string;
> > > + util->diff = util->patch + util->diff_offset;
> > > + hashmap_entry_init(util, strhash(util->diff));
> > > + hashmap_add(&map, util);
> > > + }
> > > +
> > > + /* Now try to find exact matches in b */
> > > + for (i = 0; i < b->nr; i++) {
> > > + struct patch_util *util = b->items[i].util, *other;
> > > +
> > > + util->i = i;
> > > + util->patch = b->items[i].string;
> > > + util->diff = util->patch + util->diff_offset;
> > > + hashmap_entry_init(util, strhash(util->diff));
> > > + other = hashmap_remove(&map, util, NULL);
> > > + if (other) {
> > > + if (other->matching >= 0)
> > > + BUG("already assigned!");
> > > +
> > > + other->matching = i;
> > > + util->matching = other->i;
> > > + }
> > > + }
> > 
> > One possibly interesting corner case here is what happens when there
> > are two patches that have the exact same diff, for example in the
> > pathological case of commit A doing something, commit B reverting
> > commit A, and then commit C reverting commit B, so it ends up with the
> > same diff.
> > 
> > Having those same commits unchanged in both ranges (e.g. if a commit
> > earlier in the range has been changed, and range B has been rebased on
> > top of that), we'd get the following mapping from range A to range B
> > for the commits in question:
> > 
> > A -> C
> > B -> B
> > C -> A
> > 
> > Which is not quite what I would expect as the user (even though it is
> > a valid mapping, and it probably doesn't matter too much for the end
> > result of the range diff, as nothing has changed between the commits
> > anyway).  So I'm not sure it's worth fixing this, as it is a
> > pathological case, and nothing really breaks.
> 
> Indeed. As far as I am concerned, this falls squarely into the "let's
> cross that bridge when, or if, we reach it" category.

Makes sense, this can definitely be addressed later.
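
To make the corner case concrete, here is a minimal, self-contained
sketch.  It is not git's hashmap code: `struct patch`, `chain_add()`,
`chain_remove()` and `match_exact()` are hypothetical names, and the
sketch assumes that a newly added entry is found before older entries
with the same key.  Under that assumption, three patches where the
first and the third have identical diffs pair up as A -> C, B -> B,
C -> A:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for range-diff's patch_util: one per patch. */
struct patch {
	const char *diff;   /* diff text, used as the lookup key */
	int matching;       /* index of the match in the other range, or -1 */
	struct patch *next; /* link in the toy bucket chain */
};

/* Push onto the head of the chain, so the newest entry is found first. */
static void chain_add(struct patch **head, struct patch *p)
{
	p->next = *head;
	*head = p;
}

/* Unlink and return the first entry whose diff equals `diff`, or NULL. */
static struct patch *chain_remove(struct patch **head, const char *diff)
{
	struct patch **pp;

	for (pp = head; *pp; pp = &(*pp)->next) {
		if (!strcmp((*pp)->diff, diff)) {
			struct patch *found = *pp;
			*pp = found->next;
			found->next = NULL;
			return found;
		}
	}
	return NULL;
}

/* Pair up patches with identical diffs, mirroring the two loops above. */
static void match_exact(struct patch *a, int a_nr, struct patch *b, int b_nr)
{
	struct patch *head = NULL;
	int i;

	/* First, add the patches of a to the chain. */
	for (i = 0; i < a_nr; i++) {
		a[i].matching = -1;
		chain_add(&head, &a[i]);
	}
	/* Now look for exact matches in b, consuming each match once. */
	for (i = 0; i < b_nr; i++) {
		struct patch *other = chain_remove(&head, b[i].diff);

		b[i].matching = -1;
		if (other) {
			other->matching = i;
			b[i].matching = (int)(other - a);
		}
	}
}
```

With a = b = { "do", "undo", "do" }, the first "do" in b is paired with
the last "do" in a, which is the crossed mapping described above.
Walking the candidates in insertion order instead would give the
identity mapping, at the cost of a slightly more involved lookup.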

> > > +
> > > + hashmap_free(&map, 0);
> > > +}
> > > +
> > > +static void diffsize_consume(void *data, char *line, unsigned long len)
> > > +{
> > > + (*(int *)data)++;
> > > +}
> > > +
> > > +static int diffsize(const char *a, const char *b)
> > > +{
> > > + xpparam_t pp = { 0 };
> > > + xdemitconf_t cfg = { 0 };
> > > + mmfile_t mf1, mf2;
> > > + int count = 0;
> > > +
> > > + mf1.ptr = (char *)a;
> > > + mf1.size = strlen(a);
> > > + mf2.ptr = (char *)b;
> > > + mf2.size = strlen(b);
> > > +
> > > + cfg.ctxlen = 3;
> > > + if (!xdi_diff_outf(&mf1, &mf2, diffsize_consume, &count, &pp, &cfg))
> > > + return count;
> > > +
> > > + error(_("failed to generate diff"));
> > > + return COST_MAX;
> > > +}
> > > +
> > > +static void get_correspondences(struct string_list *a, struct 
> > > string_list *b,
> > > + int creation_factor)
> > > +{
> > > + int n = a->nr + b->nr;
> > > + int *cost, c, *a2b, *b2a;
> > > + int i, j;
> > > +
> > > + ALLOC_ARRAY(cost, st_mult(n, n));
> > > + ALLOC_ARRAY(a2b, n);
> > > + ALLOC_ARRAY(b2a, n);
> > > +
> > > + for (i = 0; i < a->nr; i++) {
> > > + struct patch_util *a_util = a->items[i].util;
> > > +
> > > + for (j = 0; j < b->nr; j++) {
> > > + struct patch_util *b_util = b->items[j].util;
> > &

Re: [PATCH v3 00/11] rerere: handle nested conflicts

2018-07-30 Thread Thomas Gummerer

On 07/30, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > Thomas Gummerer (11):
> >   rerere: unify error messages when read_cache fails
> >   rerere: lowercase error messages
> >   rerere: wrap paths in output in sq
> >   rerere: mark strings for translation
> >   rerere: add documentation for conflict normalization
> >   rerere: fix crash when conflict goes unresolved
> >   rerere: only return whether a path has conflicts or not
> >   rerere: factor out handle_conflict function
> >   rerere: return strbuf from handle path
> >   rerere: teach rerere to handle nested conflicts
> >   rerere: recalculate conflict ID when unresolved conflict is committed
> 
> Even though I am not certain about the last two steps, everything
> before them looked trivially correct and good changes (well, the
> "strbuf" one's goodness obviously depends on the goodness of the
> last two, which are helped by it).
> 
> Sorry for taking so long before getting to the series.

No worries, I realize you are busy with a lot of other things.  Thanks
a lot for your review!

Re: [PATCH v3 07/11] rerere: only return whether a path has conflicts or not

2018-07-30 Thread Thomas Gummerer

On 07/30, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > We currently return the exact number of conflict hunks a certain path
> > has from the 'handle_paths' function.  However all of its callers only
> > care whether there are conflicts or not or if there is an error.
> > Return only that information, and document that only that information
> > is returned.  This will simplify the code in the subsequent steps.
> >
> > Signed-off-by: Thomas Gummerer 
> > ---
> >  rerere.c | 23 ---
> >  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> I do recall writing this code without knowing if the actual number
> of conflicts would be useful by callers, but it is apparent that it
> wasn't.  I won't mind losing that bit of info at all.  Besides, we
> won't risk mistaking a file with 2 billion conflicts with a file
> whose conflicts cannot be parsed ;-).

Hah, I would love to see someone actually achieve that ;)

> The patch looks good.  Thanks.

Re: [PATCH v3 06/11] rerere: fix crash when conflict goes unresolved

2018-07-30 Thread Thomas Gummerer

On 07/30, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > Currently when a user doesn't resolve a conflict in a file, but
> > commits the file with the conflict markers, and later the file ends up
> > in a state in which rerere can't handle it, subsequent rerere
> > operations that are interested in that path, such as 'rerere clear' or
> > 'rerere forget ' will fail, or even worse in the case of 'rerere
> > clear' segfault.
> >
> > Such states include nested conflicts, or an extra conflict marker that
> > doesn't have any match.
> >
> > This is because the first 'git rerere' when there was only one
> > conflict in the file leaves an entry in the MERGE_RR file behind.  The
> 
> I find this sentence, especially the "only one conflict in the file"
> part, a bit unclear.  What does the sentence count as one conflict?
> One block of lines enclosed inside "<<<"...">>>" pair?  The command
> behaves differently when there are two such blocks instead?

Yeah as you mentioned below, conflict marker(s) that cannot be parsed
here would make more sense.  Will adjust the commit message.

> > next 'git rerere' will then pick the rerere ID for that file up, and
> > not assign a new ID as it can't successfully calculate one.  It will
> > however still try to do the rerere operation, because of the existing
> > ID.  As the handle_file function fails, it will remove the 'preimage'
> > for the ID in the process, while leaving the ID in the MERGE_RR file.
> >
> > Now when 'rerere clear' for example is run, it will segfault in
> > 'has_rerere_resolution', because status is NULL.
> 
> I think this "status" refers to the collection->status[].  How do we
> get into that state, though?
> 
> new_rerere_id() and new_rerere_id_hex() fills id->collection by
> calling find_rerere_dir(), which either finds an existing rerere_dir
> instance or manufactures one with .status==NULL.  The .status[]
> array is later grown by calling fit_variant as we scan and find the
> pre/post images, but because there is no pre/post image for a file
> with unparseable conflicts, it is left NULL.
> 
> So another possible fix could be to make sure that .status[] is only
> read when .status_nr says there is something worth reading.  I am
> not saying that would be a better fix---I am just thinking out loud
> to make sure I understand the issue correctly.

Yeah what you are writing above matches my understanding, and that
should fix the issue as well.  I haven't actually tried what you're
proposing above, but I think I find it nicer to just remove the entry
we can't do anything with anyway.
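
For reference, the alternative fix mentioned above (only reading
.status[] when .status_nr says there is something to read) could look
roughly like the sketch below.  It is a standalone illustration, not
the code in rerere.c: `struct rerere_dir_sketch`,
`has_resolution_sketch()` and the SK_* flags are hypothetical names,
and the flag values are assumptions chosen for the example.

```c
#include <stddef.h>

/* Assumed flag bits, standing in for the RR_HAS_* flags in rerere.c. */
#define SK_HAS_POSTIMAGE 1
#define SK_HAS_PREIMAGE 2

/* Hypothetical, trimmed-down stand-in for struct rerere_dir. */
struct rerere_dir_sketch {
	unsigned char *status; /* NULL until a pre/post image has been seen */
	int status_nr;         /* number of valid entries in status[] */
};

/* Return 1 only if `variant` is in range and has both images recorded. */
static int has_resolution_sketch(const struct rerere_dir_sketch *dir,
				 int variant)
{
	const int both = SK_HAS_POSTIMAGE | SK_HAS_PREIMAGE;

	if (variant < 0 || variant >= dir->status_nr)
		return 0; /* nothing worth reading, so never touch status[] */
	return (dir->status[variant] & both) == both;
}
```

This avoids the NULL dereference, but it leaves the stale entry in
MERGE_RR, which is why removing the entry still seems nicer to me.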

> > To fix this, remove the rerere ID from the MERGE_RR file in the case
> > when we can't handle it, and remove the corresponding variant from
> > .git/rr-cache/.  Removing it unconditionally is fine here, because if
> > the user would have resolved the conflict and ran rerere, the entry
> > would no longer be in the MERGE_RR file, so we wouldn't have this
> > problem in the first place, while if the conflict was not resolved,
> > the only thing that's left in the folder is the 'preimage', which by
> > itself will be regenerated by git if necessary, so the user won't
> > loose any work.
> 
> s/loose/lose/
> 
> > Note that other variants that have the same conflict ID will not be
> > touched.
> 
> Nice.  Thanks for a fix.
> 
> >
> > Signed-off-by: Thomas Gummerer 
> > ---
> >  rerere.c  | 12 +++-
> >  t/t4200-rerere.sh | 22 ++
> >  2 files changed, 29 insertions(+), 5 deletions(-)
> >
> > diff --git a/rerere.c b/rerere.c
> > index da1ab54027..895ad80c0c 100644
> > --- a/rerere.c
> > +++ b/rerere.c
> > @@ -823,10 +823,7 @@ static int do_plain_rerere(struct string_list *rr, int 
> > fd)
> > struct rerere_id *id;
> > unsigned char sha1[20];
> > const char *path = conflict.items[i].string;
> > -   int ret;
> > -
> > -   if (string_list_has_string(rr, path))
> > -   continue;
> > +   int ret, has_string;
> >  
> > /*
> >  * Ask handle_file() to scan and assign a
> > @@ -834,7 +831,12 @@ static int do_plain_rerere(struct string_list *rr, int 
> > fd)
> >  * yet.
> >  */
> > ret = handle_file(path, sha1, NULL);
> > -   if (ret < 1)
> > +   has_string = string_list_has_string(rr, path);
> > +   if (ret < 0 && has_string) {
> > +

Re: [PATCH v3 05/11] rerere: add documentation for conflict normalization

2018-07-30 Thread Thomas Gummerer

On 07/30, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > +Different conflict styles and branch names are normalized by stripping
> > +the labels from the conflict markers, and removing extraneous
> > +information from the `diff3` conflict style. Branches that are merged
> 
> s/extraneous information/common ancestor version/ perhaps, to be
> fact-based without passing value judgment?

Yeah, I meant "extraneous information for rerere", but "common ancestor
version" is better.

> We drop the common ancestor version only because we cannot normalize
> from `merge` style to `diff3` style by adding one, and not because
> it is extraneous.  It does help humans understand the conflict a lot
> better to have that section.
> 
> > +By extension, this means that rerere should recognize that the above
> > +conflicts are the same.  To do this, the labels on the conflict
> > +markers are stripped, and the diff3 output is removed.  The above
> 
> s/diff3 output/common ancestor version/, as "diff3 output" would
> mean the whole thing between <<< and >>> to readers.

Makes sense, will fix in the re-roll, thanks!
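
As a rough illustration of the normalization being documented here,
the following standalone sketch strips the labels from the conflict
markers, drops the common ancestor version, and orders the two sides
alphabetically.  It is not the code in rerere.c: the names
`is_marker()` and `normalize_conflict()`, the fixed 7-character marker
size and the fixed-size buffers are assumptions made for this example,
and nested conflicts are not handled.

```c
#include <stdio.h>
#include <string.h>

enum side { SIDE_1, ORIGINAL, SIDE_2 };

/* True if `line` starts with seven `ch` followed by ' ', '\n' or NUL. */
static int is_marker(const char *line, char ch)
{
	int i;

	for (i = 0; i < 7; i++)
		if (line[i] != ch)
			return 0;
	return line[7] == ' ' || line[7] == '\n' || line[7] == '\0';
}

/*
 * Normalize one non-nested conflict hunk from `in` into `out`.
 * Returns 0 on success, -1 if the hunk cannot be parsed or does not fit.
 */
static int normalize_conflict(const char *in, char *out, size_t outsz)
{
	char one[512] = "", two[512] = "";
	const char *first = one, *second = two;
	enum side hunk = SIDE_1;
	int seen_start = 0, seen_end = 0;
	const char *p = in;
	int n;

	while (*p && !seen_end) {
		const char *eol = strchr(p, '\n');
		size_t len = eol ? (size_t)(eol - p) + 1 : strlen(p);

		if (is_marker(p, '<')) {
			seen_start = 1;
			hunk = SIDE_1;
		} else if (is_marker(p, '|')) {
			hunk = ORIGINAL; /* common ancestor: skipped below */
		} else if (is_marker(p, '=')) {
			hunk = SIDE_2;
		} else if (is_marker(p, '>')) {
			seen_end = 1;
		} else if (seen_start && hunk != ORIGINAL) {
			char *dst = hunk == SIDE_1 ? one : two;

			if (strlen(dst) + len >= sizeof(one))
				return -1;
			strncat(dst, p, len);
		}
		p += len;
	}
	if (!seen_start || !seen_end)
		return -1;

	/* Sort the sides so "B or C" and "C or B" normalize identically. */
	if (strcmp(one, two) > 0) {
		first = two;
		second = one;
	}
	n = snprintf(out, outsz, "<<<<<<<\n%s=======\n%s>>>>>>>\n",
		     first, second);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Feeding it the `merge` style conflict from merging AC into ABAC and the
`diff3` style conflict from merging AB into ACAB yields the same
normalized text in both cases.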

> > diff --git a/rerere.c b/rerere.c
> > index be98c0afcb..da1ab54027 100644
> > --- a/rerere.c
> > +++ b/rerere.c
> > @@ -394,10 +394,6 @@ static int is_cmarker(char *buf, int marker_char, int 
> > marker_size)
> >   * and NUL concatenated together.
> >   *
> >   * Return the number of conflict hunks found.
> > - *
> > - * NEEDSWORK: the logic and theory of operation behind this conflict
> > - * normalization may deserve to be documented somewhere, perhaps in
> > - * Documentation/technical/rerere.txt.
> >   */
> >  static int handle_path(unsigned char *sha1, struct rerere_io *io, int 
> > marker_size)
> >  {
> 
> Thanks for finally removing this age-old NEEDSWORK comment.

Re: [PATCH v3 10/11] rerere: teach rerere to handle nested conflicts

2018-07-30 Thread Thomas Gummerer

On 07/30, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > Currently rerere can't handle nested conflicts and will error out when
> > it encounters such conflicts.  Teach it to handle them by recursively
> > calling the 'handle_conflict' function to normalize the conflict.
> >
> > The conflict ID calculation here deserves some explanation:
> >
> > As we are using the same handle_conflict function, the nested conflict
> > is normalized the same way as for non-nested conflicts, which means
> > the ancestor in the diff3 case is stripped out, and the parts of the
> > conflict are ordered alphabetically.
> >
> > The conflict ID is however only calculated in the top level
> > handle_conflict call, so it will include the markers that 'rerere'
> > adds to the output.  e.g. say there's the following conflict:
> >
> > <<<<<<< HEAD
> > 1
> > =======
> > <<<<<<< HEAD
> > 3
> > =======
> > 2
> > >>>>>>> branch-2
> > >>>>>>> branch-3~
> 
> Hmph, I vaguely recall that I made inner merges to use the conflict
> markers automatically lengthened (by two, if I recall correctly)
> than its immediate outer merge.  Wouldn't the above look more like
> 
>  <<<<<<< HEAD
>  1
>  =======
>  <<<<<<<<< HEAD
>  3
>  =========
>  2
>  >>>>>>>>> branch-2
>  >>>>>>> branch-3~
> 
> Perhaps I am not recalling it correctly.

The only way I could reproduce this is by not resolving a conflict
(just leaving the conflict markers in place, but running 'git add
conflicted'), and then merging something else, which produces another
conflict, where one of the sides was the one with conflict markers
already in the file, same as what I did in the test.

So in that case, the conflict markers of the already existing conflict
would just be treated as normal text during the merge I believe, and
thus the new conflict markers would be the same length.

The usage of git is really a bit wrong here, so I don't know if it's
actually worth helping the users at this point.  But trying to
understand how rerere exactly works, I had this written up already, so
I thought I would include it in this series anyway in case it helps
somebody :)

> > it would be recorded as follows in the preimage:
> >
> > <<<<<<<
> > 1
> > =======
> > <<<<<<<
> > 2
> > =======
> > 3
> > >>>>>>>
> > >>>>>>>
> >
> > and the conflict ID would be calculated as
> >
> > sha1(1<<<<<<<
> > 2
> > =======
> > 3
> > >>>>>>>)
> >
> > Stripping out vs. leaving the conflict markers in place in the inner
> > conflict should have no practical impact, but it simplifies the
> > implementation.
> >
> > Signed-off-by: Thomas Gummerer 
> > ---
> >  Documentation/technical/rerere.txt | 42 ++
> >  rerere.c   | 10 +--
> >  t/t4200-rerere.sh  | 37 ++
> >  3 files changed, 87 insertions(+), 2 deletions(-)
> >
> > [..snip..]
> > 
> > diff --git a/rerere.c b/rerere.c
> > index a35b88916c..f78bef80b1 100644
> > --- a/rerere.c
> > +++ b/rerere.c
> > @@ -365,12 +365,18 @@ static int handle_conflict(struct strbuf *out, struct 
> > rerere_io *io,
> > RR_SIDE_1 = 0, RR_SIDE_2, RR_ORIGINAL
> > } hunk = RR_SIDE_1;
> > struct strbuf one = STRBUF_INIT, two = STRBUF_INIT;
> > -   struct strbuf buf = STRBUF_INIT;
> > +   struct strbuf buf = STRBUF_INIT, conflict = STRBUF_INIT;
> > int has_conflicts = -1;
> >  
> > while (!io->getline(&buf, io)) {
> > if (is_cmarker(buf.buf, '<', marker_size)) {
> > -   break;
> > +   if (handle_conflict(&conflict, io, marker_size, NULL) < 0)
> > +   break;
> > +   if (hunk == RR_SIDE_1)
> > +   strbuf_addbuf(&one, &conflict);
> > +   else
> > +   strbuf_addbuf(&two, &conflict);
> 
> Hmph, do we ever see the inner conflict block while we are skipping
> and ignoring the common ancestor version, or it is impossible that
> we see '<' only while processing either our or their side?

As ment

Re: [PATCH v4 00/21] Add `range-diff`, a `tbdiff` lookalike

2018-07-29 Thread Thomas Gummerer

>   -static void output_pair_header(struct strbuf *buf,
>  -+static void output_pair_header(struct diff_options *diffopt, struct 
> strbuf *buf,
>  ++static void output_pair_header(struct diff_options *diffopt,
>  ++  struct strbuf *buf,
>  +   struct strbuf *dashes,
>  struct patch_util *a_util,
>  struct patch_util *b_util)
>{
>  -static char *dashes;
>   struct object_id *oid = a_util ? _util->oid : _util->oid;
>   struct commit *commit;
>   +   char status;
>  @@ -34,11 +35,10 @@
>   +   const char *color_commit = diff_get_color_opt(diffopt, 
> DIFF_COMMIT);
>   +   const char *color;
>
>  -if (!dashes) {
>  -char *p;
>  -@@
>  -*p = '-';
>  -}
>  +if (!dashes->len)
>  +strbuf_addchars(dashes, '-',
>  +strlen(find_unique_abbrev(oid,
>  +  
> DEFAULT_ABBREV)));
>
>   +   if (!b_util) {
>   +   color = color_old;
>  @@ -57,7 +57,7 @@
>   strbuf_reset(buf);
>   +   strbuf_addstr(buf, status == '!' ? color_old : color);
>   if (!a_util)
>  -strbuf_addf(buf, "-:  %s ", dashes);
>  +strbuf_addf(buf, "-:  %s ", dashes->buf);
>   else
>   strbuf_addf(buf, "%d:  %s ", a_util->i + 1,
>   find_unique_abbrev(_util->oid, 
> DEFAULT_ABBREV));
>  @@ -77,7 +77,7 @@
>   +   strbuf_addf(buf, "%s%s", color_reset, color_new);
>
>   if (!b_util)
>  -strbuf_addf(buf, " -:  %s", dashes);
>  +strbuf_addf(buf, " -:  %s", dashes->buf);
>   @@
>   const char *commit_buffer = get_commit_buffer(commit, 
> NULL);
>   const char *subject;
>  @@ -99,24 +99,27 @@
>
>   /* Show unmatched LHS commit whose predecessors were 
> shown. */
>   if (i < a->nr && a_util->matching < 0) {
>  --   output_pair_header(, a_util, NULL);
>  -+   output_pair_header(diffopt, , a_util, NULL);
>  +-   output_pair_header(, , a_util, NULL);
>  ++   output_pair_header(diffopt,
>  ++  , , a_util, NULL);
>   i++;
>   continue;
>   }
>
>   /* Show unmatched RHS commits. */
>   while (j < b->nr && b_util->matching < 0) {
>  --   output_pair_header(, NULL, b_util);
>  -+   output_pair_header(diffopt, , NULL, b_util);
>  +-   output_pair_header(, , NULL, b_util);
>  ++   output_pair_header(diffopt,
>  ++  , , NULL, b_util);
>   b_util = ++j < b->nr ? b->items[j].util : NULL;
>   }
>
>   /* Show matching LHS/RHS pair. */
>   if (j < b->nr) {
>   a_util = a->items[b_util->matching].util;
>  --   output_pair_header(, a_util, b_util);
>  -+   output_pair_header(diffopt, , a_util, 
> b_util);
>  +-   output_pair_header(, , a_util, 
> b_util);
>  ++   output_pair_header(diffopt,
>  ++  , , a_util, 
> b_util);
>   if (!(diffopt->output_format & 
> DIFF_FORMAT_NO_OUTPUT))
>   
> patch_diff(a->items[b_util->matching].string,
>  b->items[j].string, diffopt);
>  13:  96a3073fb = 13:  9ccb9516a color: add the meta color GIT_COLOR_REVERSE
>  14:  6be4baf60 = 14:  9de5bd229 diff: add an internal option to dual-color 
> diffs of diffs
>  15:  02e13c0c6 ! 15:  21b2f9e4b range-diff: offer to dual-color the diffs
>  @@ -40,4 +40,4 @@
>   +
>   if (argc == 2) {
>   if (!str
