Re: [PATCH] strbuf: support long paths w/o read rights in strbuf_getcwd() on FreeBSD

2017-03-28 Thread Christian Couder
On Tue, Mar 28, 2017 at 11:49 PM, Jeff King  wrote:
> On Tue, Mar 28, 2017 at 11:15:12PM +0200, Christian Couder wrote:
>
>> On Sun, Mar 26, 2017 at 3:43 PM, René Scharfe  wrote:
>> > FreeBSD implements getcwd(3) as a syscall, but falls back to a version
>> > based on readdir(3) if it fails for some reason.  The latter requires
>> > permissions to read and execute path components, while the former does
>> > not.  That means that if our buffer is too small and we're missing
>> > rights we could get EACCES, but we may succeed with a bigger buffer.
>> >
>> > Keep retrying if getcwd(3) indicates lack of permissions until our
>> > buffer can fit PATH_MAX bytes, as that's the maximum supported by the
>> > syscall on FreeBSD anyway.  This way we do what we can to be able to
>> > benefit from the syscall, but we also won't loop forever if there is a
>> > real permission issue.
>>
>> Sorry to be late and maybe I missed something obvious, but the above
>> and the patch seem complex to me compared with something like:
>>
>> diff --git a/strbuf.c b/strbuf.c
>> index ace58e7367..25eadcbedc 100644
>> --- a/strbuf.c
>> +++ b/strbuf.c
>> @@ -441,7 +441,7 @@ int strbuf_readlink(struct strbuf *sb, const char
>> *path, size_t hint)
>>  int strbuf_getcwd(struct strbuf *sb)
>>  {
>> size_t oldalloc = sb->alloc;
>> -   size_t guessed_len = 128;
>> +   size_t guessed_len = PATH_MAX > 128 ? PATH_MAX : 128;
>>
>> for (;; guessed_len *= 2) {
>> strbuf_grow(sb, guessed_len);
>
> I think the main reason is just that we do not have to pay the price to
> allocate PATH_MAX-sized buffers when they are rarely used.
>
> I doubt it matters all that much in practice, though.

Yeah, I can understand that.

But I also wonder if René's patch relies on PATH_MAX being a multiple
of 128 on FreeBSD and what would happen for a path between 129 and
PATH_MAX if PATH_MAX was something like 254.


Re: [PATCH v4 1/5] dir_iterator: add helpers to dir_iterator_advance

2017-03-28 Thread Michael Haggerty
On 03/29/2017 02:32 AM, Daniel Ferreira wrote:
> Create inline helpers to dir_iterator_advance(). Make
> dir_iterator_advance()'s code more legible and allow some behavior to
> be reusable.

Thanks for breaking up the patches. That makes them a lot easier to review.

> Signed-off-by: Daniel Ferreira 
> ---
>  dir-iterator.c | 65 
> +-
>  1 file changed, 42 insertions(+), 23 deletions(-)
> 
> diff --git a/dir-iterator.c b/dir-iterator.c
> index 34182a9..853c040 100644
> --- a/dir-iterator.c
> +++ b/dir-iterator.c
> @@ -50,6 +50,43 @@ struct dir_iterator_int {
>   struct dir_iterator_level *levels;
>  };
> 
> +static inline void push_dir_level(struct dir_iterator_int *iter, struct 
> dir_iterator_level *level)
> +{
> + level->dir_state = DIR_STATE_RECURSE;
> + ALLOC_GROW(iter->levels, iter->levels_nr + 1,
> +iter->levels_alloc);
> + level = >levels[iter->levels_nr++];
> + level->initialized = 0;
> +}
> +
> +static inline int pop_dir_level(struct dir_iterator_int *iter, struct 
> dir_iterator_level *level)
> +{
> + return --iter->levels_nr;
> +}

`pop_dir_level()` doesn't use its `level` argument; it can be removed.

> [...]

Michael



Re: [PATCH 0/18] snprintf cleanups

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 03:33:48PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > It's a lot of patches, but hopefully they're all pretty straightforward
> > to read.
> 
> Yes, quite a lot of changes.  I didn't see anything questionable in
> there.
> 
> As to the "patch-id" thing, I find the alternate one slightly easier
> to read.  Also, exactly because this is not a performance critical
> codepath, it may be better if patch_id_add_string() filtered out
> whitespaces; that would allow the source to express things in more
> natural way, e.g.
> 
>   patch_id_addf(, "new file mode");
>   patch_id_addf(, "%06o", p->two->mode);
>   patch_id_addf(, "--- /dev/null");
>   patch_id_addf(, "+++ b/%.*s", len2, p->two->path);
> 
> Or I may be going overboard by bringing "addf" into the mix X-<.

I think there are two things going on in your example.

One is that obviously patch_id_addf() removes the spaces from the
result. But we could do that now by keeping the big strbuf_addf(), and
then just walking the result and feeding non-spaces.

The second is that your addf means we are back to formatting everything
into a buffer again. And it has to be dynamic to handle the final line
there, because "len2" isn't bounded. At which point we may as well go
back to sticking it all in one big strbuf (your example also breaks it
down line by line, but we could do that with separte strbuf_addf calls,
too).

Or you have to reimplement the printf format-parsing yourself, and write
into the sha1 instead of into the buffers. But that's probably insane.

I think the "no extra buffer with whitespace" combo is more like:

  void patch_id_add_buf(git_SHA1_CTX *ctx, const char *buf, size_t len)
  {
for (; len > 0; buf++, len--) {
if (!isspace(*buf))
git_SHA1_Update(ctx, buf, 1);
}
  }

  void patch_id_add_str(git_SHA1_CTX *ctx, const char *str)
  {
patch_id_add_buf(ctx, strlen(str));
  }

  void patch_id_add_mode(git_SHA1_CTX *ctx, unsigned mode)
  {
char buf[16]; /* big enough... */
int len = xsnprintf(buf, "%06o", mode);
patch_id_add_buf(ctx, buf, len);
  }

  patch_id_add_str(, "new file mode");
  patch_id_add_mode(, p->two->mode);
  patch_id_add_str(, "--- /dev/null");
  patch_id_add_str(, "+++ b/");
  patch_id_add_buf(, p->two->path, len2);

I dunno. I wondered if feeding single bytes to the sha1 update might
actually be noticeably slower, because I would assume that internally it
generally copies data in larger chunks. I didn't measure it, though.

-Peff


Re: Git fails to build on Ubuntu Server 16.04

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 07:17:16PM -0400, Jeffrey Walton wrote:

> I configured with --enable-pthreads, and LIBS included -lpthread.
> 
> $ make V=1
> gcc -I/usr/local/include -g -O2 -I. -DHAVE_ALLOCA_H
> -I/usr/local/include -DUSE_CURL_FOR_IMAP_SEND -I/usr/local/include
> -I/usr/local/include  -DHAVE_PATHS_H -DHAVE_STRINGS_H -DHAVE_DEV_TTY
> -DHAVE_CLOCK_GETTIME -DHAVE_CLOCK_MONOTONIC -DHAVE_GETDELIM
> -DSHA1_HEADER=''  -DNO_STRLCPY -DSHELL_PATH='"/bin/sh"'
> -DPAGER_ENV='"LESS=FRX LV=-c"' -o git-credential-store
> -Wl,-rpath,/usr/local/lib -L/usr/local/lib  credential-store.o
> common-main.o libgit.a xdiff/lib.a  -L/usr/local/lib
> -Wl,-rpath,/usr/local/lib -lz -L/usr/local/lib
> -Wl,-rpath,/usr/local/lib -lcrypto  -lrt
> /usr/bin/ld: libgit.a(run-command.o): undefined reference to symbol
> 'pthread_sigmask@@GLIBC_2.2.5'
> //lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO
> missing from command line
> collect2: error: ld returned 1 exit status
> Makefile:2053: recipe for target 'git-credential-store' failed
> make: *** [git-credential-store] Error 1

Hmm. I can reproduce with:

  LIBS=-lpthread ./configure --enable-pthreads
  make

I think the problem is that $LIBS is meaningful to autoconf, but not to
Git's Makefile. So it tricks autoconf into writing a blank PTHREAD_LIBS
variable (because it can compile a pthread program without any extra
options), but the Makefile does not include $LIBS.

Just doing:

  ./configure --enable-pthreads
  make

works fine. So should:

  ./configure
  make

which should detect pthreads. Or just:

  make

as building with pthreads is the default on Linux.

So depending on your perspective, it's either:

  - not a bug (because we do not advertise $LIBS as a meaningful input
to the build process)

  - a bug that the configure script respects $LIBS at all, since it is
not meaningful to the Makefile

  - a bug that the configure script does not propagate $LIBS into
something the Makefile _does_ understand, like $EXTLIBS

  - a bug that the Makefile does not care about $LIBS

Patches welcome for any of the latter three (I do not have an opinion
myself; I don't use autoconf at all).

-Peff


Re: Can't locate ExtUtils/MakeMaker.pm in @INC

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 09:03:43PM -0400, Jeffrey Walton wrote:

> This looks like the last issue with Git 2.12.2. This time the machine
> is Fedora 25.
> 
> I configured with PERL_PATH=/usr/local/bin/perl. The local Perl was
> built specifically for this error, and it includes
> ExtUtils/MakeMaker.pm:

I'm not sure what "configured with PERL_PATH" means exactly. If you did:

  PERL_PATH=/usr/local/bin/perl ./configure

then I don't think that works. The way to tell configure that you want
to use a specific version of perl is with a command-line option:

  ./configure --with-perl=/usr/local/bin/perl

When you're running make itself, you can override the default (or what
was specified during configure) with:

  make PERL_PATH=/usr/local/bin/perl

Both of the latter two work for me:

  $ ./configure --with-perl=/perl/from/configure
  [...]
  $ make
  [...]
  /perl/from/configure Makefile.PL PREFIX='/home/peff/local/git/master' 
INSTALL_BASE='' --localedir='/home/peff/local/git/master/share/locale'
  make[1]: /perl/from/configure: Command not found

  $ make PERL_PATH=/perl/from/make
  [...]
  /perl/from/make Makefile.PL PREFIX='/home/peff/local/git/master' 
INSTALL_BASE='' --localedir='/home/peff/local/git/master/share/locale'
  make[1]: /perl/from/make: Command not found

Obviously those are nonsense, but they quickly show that we're using the
requested version of perl.

-Peff


Re: Re: Re: Microproject | Add more builtin patterns for userdiff

2017-03-28 Thread Pickfire
Jacob Keller  wrote:

> On Tue, Mar 28, 2017 at 10:53 AM, Pickfire  wrote:
> >
> > Yes, I can't reproduce it outside the test suite. I have added the builtin
> > and yet the test fails. test_decode_color gets same output as expect but
> > still it fails, should I send in the patch?
> 
> You also need to ensure you have the exact same color settings as used
> by the test scripts.
> 
> Thanks,
> Jake

Yes, I used the same color, looks like the block that are failing is:

test_must_fail git diff --no-index "$@" pre post >output


Re: Re: Microproject | Add more builtin patterns for userdiff

2017-03-28 Thread Pickfire
Stefan Beller  wrote:

> On Tue, Mar 28, 2017 at 3:46 AM, Pickfire  wrote:
> 
> > EOF
> >
> > echo '* diff="cpp"' > .gitmodules
> 
> Did you mean .gitattributes?

Yeah, that's a spelling error.


Can't locate ExtUtils/MakeMaker.pm in @INC

2017-03-28 Thread Jeffrey Walton
This looks like the last issue with Git 2.12.2. This time the machine
is Fedora 25.

I configured with PERL_PATH=/usr/local/bin/perl. The local Perl was
built specifically for this error, and it includes
ExtUtils/MakeMaker.pm:

$ find /usr/local -name MakeMaker.pm
/usr/local/lib/perl5/5.24.1/ExtUtils/MakeMaker.pm

$ make all
...

GEN git-bisect
GEN git-difftool--helper
GEN git-filter-branch
GEN git-merge-octopus
GEN git-merge-one-file
GEN git-merge-resolve
GEN git-mergetool
GEN git-quiltimport
GEN git-rebase
GEN git-request-pull
GEN git-stash
GEN git-submodule
GEN git-web--browse
SUBDIR perl
/usr/bin/perl Makefile.PL PREFIX='/usr/local' INSTALL_BASE=''
--localedir='/usr/local/share/locale'
GEN git-p4
Can't locate ExtUtils/MakeMaker.pm in @INC (you may need to install
the ExtUtils::MakeMaker module) (@INC contains: /usr/local/lib64/perl5
/usr/local/share/perl5 /usr/lib64/perl5/vendor_perl
/usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at
Makefile.PL line 3.
BEGIN failed--compilation aborted at Makefile.PL line 3.
Makefile:83: recipe for target 'perl.mak' failed
make[1]: *** [perl.mak] Error 2
Makefile:1843: recipe for target 'perl/perl.mak' failed
make: *** [perl/perl.mak] Error 2
make: *** Waiting for unfinished jobs
Failed to build Git

/usr/local/bin/perl is on path but Git is using the old one in /usr/bin:

$ which perl
/usr/local/bin/perl

It appears Git is not honoring the request for the updated Perl.

Thanks,


[PATCH v4 3/5] remove_subtree(): reimplement using iterators

2017-03-28 Thread Daniel Ferreira
Use dir_iterator to traverse through remove_subtree()'s directory tree,
avoiding the need for recursive calls to readdir(). Simplify
remove_subtree()'s code.

A conversion similar in purpose was previously done at 46d092a
("for_each_reflog(): reimplement using iterators", 2016-05-21).

Signed-off-by: Daniel Ferreira 
---
 entry.c | 31 ++-
 1 file changed, 10 insertions(+), 21 deletions(-)

diff --git a/entry.c b/entry.c
index c6eea24..bbebd16 100644
--- a/entry.c
+++ b/entry.c
@@ -2,6 +2,8 @@
 #include "blob.h"
 #include "dir.h"
 #include "streaming.h"
+#include "iterator.h"
+#include "dir-iterator.h"

 static void create_directories(const char *path, int path_len,
   const struct checkout *state)
@@ -46,29 +48,16 @@ static void create_directories(const char *path, int 
path_len,

 static void remove_subtree(struct strbuf *path)
 {
-   DIR *dir = opendir(path->buf);
-   struct dirent *de;
-   int origlen = path->len;
-
-   if (!dir)
-   die_errno("cannot opendir '%s'", path->buf);
-   while ((de = readdir(dir)) != NULL) {
-   struct stat st;
-
-   if (is_dot_or_dotdot(de->d_name))
-   continue;
-
-   strbuf_addch(path, '/');
-   strbuf_addstr(path, de->d_name);
-   if (lstat(path->buf, ))
-   die_errno("cannot lstat '%s'", path->buf);
-   if (S_ISDIR(st.st_mode))
-   remove_subtree(path);
-   else if (unlink(path->buf))
+   struct dir_iterator *diter = dir_iterator_begin(path->buf, 
DIR_ITERATOR_DEPTH_FIRST);
+
+   while (dir_iterator_advance(diter) == ITER_OK) {
+   if (S_ISDIR(diter->st.st_mode)) {
+   if (rmdir(diter->path.buf))
+   die_errno("cannot rmdir '%s'", path->buf);
+   } else if (unlink(diter->path.buf))
die_errno("cannot unlink '%s'", path->buf);
-   strbuf_setlen(path, origlen);
}
-   closedir(dir);
+
if (rmdir(path->buf))
die_errno("cannot rmdir '%s'", path->buf);
 }
--
2.7.4 (Apple Git-66)



[PATCH v4 2/5] dir_iterator: iterate over dir after its contents

2017-03-28 Thread Daniel Ferreira
Create an option for the dir_iterator API to iterate over subdirectories
only after having iterated through their contents. This feature was
predicted, although not implemented by 0fe5043 ("dir_iterator: new API
for iterating over a directory tree", 2016-06-18).

Add the "flags" parameter to dir_iterator_create, allowing for the
aforementioned "depth-first" iteration mode to be enabled. Currently,
the only acceptable flag is DIR_ITERATOR_DEPTH_FIRST.

This is useful for recursively removing a directory and calling rmdir()
on a directory only after all of its contents have been wiped.

Signed-off-by: Daniel Ferreira 
---
 dir-iterator.c | 46 ++
 dir-iterator.h | 14 +++---
 2 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 853c040..545d333 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -48,6 +48,9 @@ struct dir_iterator_int {
 * that will be included in this iteration.
 */
struct dir_iterator_level *levels;
+
+   /* Holds the flags passed to dir_iterator_begin(). */
+   unsigned flags;
 };

 static inline void push_dir_level(struct dir_iterator_int *iter, struct 
dir_iterator_level *level)
@@ -114,12 +117,14 @@ int dir_iterator_advance(struct dir_iterator 
*dir_iterator)
}

level->initialized = 1;
-   } else if (S_ISDIR(iter->base.st.st_mode)) {
+   } else if (S_ISDIR(iter->base.st.st_mode) &&
+   !iter->flags & DIR_ITERATOR_DEPTH_FIRST) {
if (level->dir_state == DIR_STATE_ITER) {
/*
 * The directory was just iterated
 * over; now prepare to iterate into
-* it.
+* it (unless an option is set for us
+* to do otherwise).
 */
push_dir_level(iter, level);
continue;
@@ -153,10 +158,27 @@ int dir_iterator_advance(struct dir_iterator 
*dir_iterator)
de = readdir(level->dir);

if (!de) {
-   /* This level is exhausted; pop up a level. */
+   /* This level is exhausted  */
if (errno) {
warning("error reading directory %s: 
%s",
iter->base.path.buf, 
strerror(errno));
+   } else if (iter->flags & 
DIR_ITERATOR_DEPTH_FIRST) {
+   /* If we are handling dirpaths after 
their contents,
+* we have to iterate over the 
directory now that we'll
+* have finished iterating into it. */
+   level->dir = NULL;
+
+   if (pop_dir_level(iter, level) == 0)
+   return 
dir_iterator_abort(dir_iterator);
+
+   level = >levels[iter->levels_nr - 
1];
+   /* Remove a trailing slash */
+   strbuf_strip_suffix(>base.path, 
"/");
+
+   if (set_iterator_data(iter, level))
+   continue;
+
+   return ITER_OK;
} else if (closedir(level->dir))
warning("error closing directory %s: 
%s",
iter->base.path.buf, 
strerror(errno));
@@ -175,8 +197,22 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
if (set_iterator_data(iter, level))
continue;

+   /*
+* If we want to iterate dirs after files, we shall
+* begin looking into them *before* we return the dir
+* itself.
+*/
+   if (S_ISDIR(iter->base.st.st_mode) &&
+   iter->flags & DIR_ITERATOR_DEPTH_FIRST) {
+   push_dir_level(iter, level);
+   goto continue_outer_loop;
+   }
+
return ITER_OK;
}
+
+continue_outer_loop:
+   ;
}
 }

@@ -201,7 +237,7 @@ int dir_iterator_abort(struct dir_iterator *dir_iterator)
return ITER_DONE;
 }

-struct dir_iterator *dir_iterator_begin(const char *path)
+struct dir_iterator *dir_iterator_begin(const char *path, unsigned flags)
 {
struct 

[PATCH v4 5/5] files_reflog_iterator: amend use of dir_iterator

2017-03-28 Thread Daniel Ferreira
Amend a call to dir_iterator_begin() to pass the flags parameter
introduced in 3efb5c0 ("dir_iterator: iterate over dir after its
contents", 2017-28-03).

Signed-off-by: Daniel Ferreira 
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 50188e9..b4bba74 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3346,7 +3346,7 @@ static struct ref_iterator 
*files_reflog_iterator_begin(struct ref_store *ref_st
files_downcast(ref_store, 0, "reflog_iterator_begin");

base_ref_iterator_init(ref_iterator, _reflog_iterator_vtable);
-   iter->dir_iterator = dir_iterator_begin(git_path("logs"));
+   iter->dir_iterator = dir_iterator_begin(git_path("logs"), 0);
return ref_iterator;
 }

--
2.7.4 (Apple Git-66)



[PATCH v4 1/5] dir_iterator: add helpers to dir_iterator_advance

2017-03-28 Thread Daniel Ferreira
Create inline helpers to dir_iterator_advance(). Make
dir_iterator_advance()'s code more legible and allow some behavior to
be reusable.

Signed-off-by: Daniel Ferreira 
---
 dir-iterator.c | 65 +-
 1 file changed, 42 insertions(+), 23 deletions(-)

diff --git a/dir-iterator.c b/dir-iterator.c
index 34182a9..853c040 100644
--- a/dir-iterator.c
+++ b/dir-iterator.c
@@ -50,6 +50,43 @@ struct dir_iterator_int {
struct dir_iterator_level *levels;
 };

+static inline void push_dir_level(struct dir_iterator_int *iter, struct 
dir_iterator_level *level)
+{
+   level->dir_state = DIR_STATE_RECURSE;
+   ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+  iter->levels_alloc);
+   level = >levels[iter->levels_nr++];
+   level->initialized = 0;
+}
+
+static inline int pop_dir_level(struct dir_iterator_int *iter, struct 
dir_iterator_level *level)
+{
+   return --iter->levels_nr;
+}
+
+static inline int set_iterator_data(struct dir_iterator_int *iter, struct 
dir_iterator_level *level)
+{
+   if (lstat(iter->base.path.buf, >base.st) < 0) {
+   if (errno != ENOENT)
+   warning("error reading path '%s': %s",
+   iter->base.path.buf,
+   strerror(errno));
+   return -1;
+   }
+
+   /*
+* We have to set these each time because
+* the path strbuf might have been realloc()ed.
+*/
+   iter->base.relative_path =
+   iter->base.path.buf + iter->levels[0].prefix_len;
+   iter->base.basename =
+   iter->base.path.buf + level->prefix_len;
+   level->dir_state = DIR_STATE_ITER;
+
+   return 0;
+}
+
 int dir_iterator_advance(struct dir_iterator *dir_iterator)
 {
struct dir_iterator_int *iter =
@@ -84,11 +121,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 * over; now prepare to iterate into
 * it.
 */
-   level->dir_state = DIR_STATE_RECURSE;
-   ALLOC_GROW(iter->levels, iter->levels_nr + 1,
-  iter->levels_alloc);
-   level = >levels[iter->levels_nr++];
-   level->initialized = 0;
+   push_dir_level(iter, level);
continue;
} else {
/*
@@ -104,7 +137,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
 * This level is exhausted (or wasn't opened
 * successfully); pop up a level.
 */
-   if (--iter->levels_nr == 0)
+   if (pop_dir_level(iter, level) == 0)
return dir_iterator_abort(dir_iterator);

continue;
@@ -129,7 +162,7 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
iter->base.path.buf, 
strerror(errno));

level->dir = NULL;
-   if (--iter->levels_nr == 0)
+   if (pop_dir_level(iter, level) == 0)
return dir_iterator_abort(dir_iterator);
break;
}
@@ -138,23 +171,9 @@ int dir_iterator_advance(struct dir_iterator *dir_iterator)
continue;

strbuf_addstr(>base.path, de->d_name);
-   if (lstat(iter->base.path.buf, >base.st) < 0) {
-   if (errno != ENOENT)
-   warning("error reading path '%s': %s",
-   iter->base.path.buf,
-   strerror(errno));
-   continue;
-   }

-   /*
-* We have to set these each time because
-* the path strbuf might have been realloc()ed.
-*/
-   iter->base.relative_path =
-   iter->base.path.buf + 
iter->levels[0].prefix_len;
-   iter->base.basename =
-   iter->base.path.buf + level->prefix_len;
-   level->dir_state = DIR_STATE_ITER;
+   if (set_iterator_data(iter, level))
+   continue;

return ITER_OK;
}
--
2.7.4 (Apple Git-66)



[PATCH v4 0/5] [GSoC] remove_subtree(): reimplement using iterators

2017-03-28 Thread Daniel Ferreira
Hi there,

This is the fourth version of the GSoC microproject of refactoring
remove_subtree() from recursively using readdir() to use dir_iterator. Below
are the threads for other versions:

v1: 
https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsah16a...@mail.gmail.com/T/#t
v2: 
https://public-inbox.org/git/cacsjy8dxh-qpbblfafwpawusba9gvxa7x+mxljevykhk1zo...@mail.gmail.com/T/#t
v3: 
https://public-inbox.org/git/CAGZ79kYtpmURSQWPumobA=e3jbfjkhwcdv_lphkcd71zrwm...@mail.gmail.com/T/#t

In this version of the patch, I followed Michael's suggestion
of splitting the commits responsible for adding the new feature to
dir_iterator. His suggestion of using flags instead of an options
struct has also been adopted.

This version also contains a test that is finally able to test the
function decently (not the one Stefan had suggested, which
did not result in a call to remove_subtree). I am just unsure about
its location in t/. I'd appreciate suggestions to put it in a more
decent home.

Daniel Ferreira (5):
  dir_iterator: add helpers to dir_iterator_advance
  dir_iterator: iterate over dir after its contents
  remove_subtree(): reimplement using iterators
  remove_subtree(): test removing nested directories
  files_reflog_iterator: amend use of dir_iterator

 dir-iterator.c  | 105 +++-
 dir-iterator.h  |  14 --
 entry.c |  31 
 refs/files-backend.c|   2 +-
 t/t2000-checkout-cache-clash.sh |  11 +
 5 files changed, 114 insertions(+), 49 deletions(-)

--
2.7.4 (Apple Git-66)



[PATCH v4 4/5] remove_subtree(): test removing nested directories

2017-03-28 Thread Daniel Ferreira
Test removing a nested directory when an attempt is made to restore the
index to a state where it does not exist. A similar test could be found
previously in t/t2000-checkout-cache-clash.sh, but it did not check for
nested directories, which could allow a faulty implementation of
remove_subtree() pass the tests.

Signed-off-by: Daniel Ferreira 
---
 t/t2000-checkout-cache-clash.sh | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/t/t2000-checkout-cache-clash.sh b/t/t2000-checkout-cache-clash.sh
index de3edb5..ac10ba3 100755
--- a/t/t2000-checkout-cache-clash.sh
+++ b/t/t2000-checkout-cache-clash.sh
@@ -57,4 +57,15 @@ test_expect_success SYMLINKS 'checkout-index -f twice with 
--prefix' '
git checkout-index -a -f --prefix=there/
 '

+test_expect_success 'git checkout-index -f should remove nested subtrees' '
+   echo content >path &&
+   git update-index --add path &&
+   rm path &&
+   mkdir -p path/with/nested/paths &&
+   echo content >path/file1 &&
+   echo content >path/with/nested/paths/file2 &&
+   git checkout-index -f -a &&
+   test ! -d path
+'
+
 test_done
--
2.7.4 (Apple Git-66)



Re: [PATCH v2 16/21] Make sha1_array_append take a struct object_id *

2017-03-28 Thread brian m. carlson
On Tue, Mar 28, 2017 at 10:27:41AM -0700, Junio C Hamano wrote:
> "brian m. carlson"  writes:
> 
> > Convert the callers to pass struct object_id by changing the function
> > declaration and definition and applying the following semantic patch:
> >
> > @@
> > expression E1, E2, E3;
> > @@
> > - sha1_array_append(E1, E2[E3].hash)
> > + sha1_array_append(E1, E2 + E3)
> >
> > @@
> > expression E1, E2;
> > @@
> > - sha1_array_append(E1, E2.hash)
> > + sha1_array_append(E1, )
> 
> I noticed something similar in the change to bisect.c while reading
> the previous step, and I suspect that the above two rules leave
> somewhat inconsistent and harder-to-read result.  Wouldn't it make
> the result more readable if the former rule were
> 
> -sha1_array_append(E1, E2[E3].hash)
> +sha1_array_append(E1, [E3])
> 
> 
> FWIW, the bit that made me read it twice in the previous step was
> this change
> 
> - strbuf_addstr(_hexs, sha1_to_hex(array->sha1[i]));
> + strbuf_addstr(_hexs, oid_to_hex(array->oid + i));
> 
> which I would have written &(array->oid[i]) instead.
> 
> After all, the original written by a human said E2[E3].hash (or
> array->sha1[i]) because to the human's mind, E2 is a series of
> things that can be indexed with an int E3, and even though 
> 
> *(E2 + E3)
> E2[E3]
> E3[E2]
> 
> all mean the same thing, the human decided that E2[E3] is the most
> natural way to express this particular reference to an item in the
> array.  [E3] would keep that intention by the original author
> better than E2 + E3.

I'm happy to make that change.  I'm an experienced C programmer, so a
pointer addition seems very readable and natural to me, but if you think
it's better or more readable as [E3], I can certainly reroll.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH v3 2/5] setup: allow for prefix to be passed to git commands

2017-03-28 Thread Stefan Beller
On Mon, Mar 20, 2017 at 3:34 PM, Brandon Williams  wrote:
> That the gist of how I'm hoping to solve the problem.  Hopefully that was
> clear enough to get some feedback on.

Junio wrote in  "What's cooking in git.git (Mar 2017, #10; Fri, 24)"
> I saw no particular issues myself.  Do others find this series good
> (not just "meh--it is harmless" but I want to hear a positive "yes
> these are good changes")?

So I reviewed them again, and I think they are good to go.
As a followup we may want to consider this

diff --git a/setup.c b/setup.c
index 56cd68ba93..bdc294091a 100644
--- a/setup.c
+++ b/setup.c
@@ -944,6 +944,10 @@ const char *setup_git_directory_gently(int *nongit_ok)
 prefix = setup_git_directory_gently_1(nongit_ok);
 env_prefix = getenv(GIT_TOPLEVEL_PREFIX_ENVIRONMENT);

+if (prefix && env_prefix)
+die("BUG: can't invoke command in sub directory with %s set",
+GIT_TOPLEVEL_PREFIX_ENVIRONMENT);
+
 if (env_prefix)
 prefix = env_prefix;
--

Thanks,
Stefan


Re: [PATCH 2/2] submodule.c: correctly handle nested submodules in is_submodule_modified

2017-03-28 Thread Jonathan Nieder
Stefan Beller wrote:

> Suppose I have a superproject 'super', with two submodules 'super/sub'
> and 'super/sub1'. 'super/sub' itself contains a submodule
> 'super/sub/subsub'. Now suppose I run, from within 'super':
>
> echo hi >sub/subsub/stray-file
> echo hi >sub1/stray-file
>
> Currently we get would see the following output in git-status:
>
> git status --short
>  m sub
>  ? sub1
>
> With this patch applied, the untracked file in the nested submodule is
> turned into an untracked file on the 'super' level as well.

s/turned into/displayed as/

> git status --short
>  ? sub
>  ? sub1
>
> This doesn't change the output of 'git status --porcelain=1' for nested
> submodules, because its output is always ' M' for either untracked files
> or local modifications no matter the nesting level of the submodule.
>
> 'git status --porcelain=2' is affected by this change in a nested
> submodule, though. Without this patch it would report the direct submodule
> as modified and having no untracked files. With this patch it would report
> untracked files. Chalk this up as a bug fix.

I think that's reasonable, and the documentation does a good job of
describing it.

Does this have any effect on the default mode of 'git status' (without
--short or --porcelain)?

[...]
> --- a/Documentation/git-status.txt
> +++ b/Documentation/git-status.txt
> @@ -187,6 +187,8 @@ Submodules have more state and instead report
>   mthe submodule has modified content
>   ?the submodule has untracked files
>  
> +Note that 'm' and '?' are applied recursively, e.g. if a nested submodule
> +in a submodule contains an untracked file, this is reported as '?' as well.

Language nits:

* Can simplify by leaving out "Note that ".
* s/, e\.g\./. For example,/

Should this say a brief word about rationale?  The obvious way to
describe it would involve "git add --recurse-submodules", which alas
doesn't exist yet.  But we could say

  this is reported as '?' as well,
   since the change cannot be added using "git add -u" from within any of the
   submodules.

[...]
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1078,8 +1078,27 @@ unsigned is_submodule_modified(const char *path, int 
> ignore_untracked)
>   /* regular untracked files */
>   if (buf.buf[0] == '?')
>   dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
> - else
> - dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
> +
> + if (buf.buf[0] == 'u' ||
> + buf.buf[0] == '1' ||
> + buf.buf[0] == '2') {
> + /*
> +  * T XY :
> +  * T = line type, XY = status,  = submodule state
> +  */
> + if (buf.len < 1 + 1 + 2 + 1 + 4)

optional: Can shorten:

/* T = line type, XY = status,  = submodule state */
if (buf.len < strlen("T XY "))
...

That produces the same code at run time because compilers are able to
inline the strlen of a constant.  What you already have also seems
sensible, though.

[...]
> + die("BUG: invalid status --porcelain=2 line %s",
> + buf.buf);
> +
> + /* regular unmerged and renamed files */
> + if (buf.buf[5] == 'S' && buf.buf[8] == 'U')
> + /* nested untracked file */
> + dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
> +
> + if (memcmp(buf.buf + 5, "S..U", 4))
> + /* other change */
> + dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;

sanity check: What does this do for a "2" line indicating a sub-submodule
that has been renamed that contains an untracked file?  Do we need to
rely on some other indication to show this as a change?

Enumerating some more cases, since I keep finding myself getting lost:

 - if the HEAD commit of "sub" changes, we show this as " M sub".
   What should we show if the HEAD commit of "sub/subsub" changes?
   I think this should be " m".

 - if "sub" is renamed, we show this as "R  sub -> newname".
   What should we show if "sub/subsub" is renamed?  It is tempting
   to show this as " m".

 - if "sub" is deleted, we show this as "D  sub". What should we
   show if "sub/subsub" is deleted? I think this is " m".

It keeps getting more complicated, but I think this is making sense.

Thanks and hope that helps,
Jonathan


Re: [PATCH 1/2] short status: improve reporting for submodule changes

2017-03-28 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

[...]
> +++ b/t/t7506-status-submodule.sh
[...]
> @@ -287,4 +311,82 @@ test_expect_success 'diff --submodule with merge 
> conflict in .gitmodules' '
>   test_cmp diff_submodule_actual diff_submodule_expect
>  '
>  
> +test_expect_success 'setup superproject with untracked file in nested 
> submodule' '
> + (
> + cd super &&
> + git clean -dfx &&
> + rm .gitmodules &&
> + git submodule add -f ./sub1 &&
> + git submodule add -f ./sub2 &&
> + git commit -a -m "messy merge in superproject" &&
> + (
> + cd sub1 &&
> + git submodule add ../sub2 &&
> + git commit -a -m "add sub2 to sub1"
> + ) &&
> + git add sub1 &&
> + git commit -a -m "update sub1 to contain nested sub"
> + ) &&
> + echo "{ \$7=\"HASH\"; \$8=\"HASH\"; print }" >suppress_hashes.awk &&
> + echo "suppress_hashes.awk" >>.git/info/exclude &&

I had some trouble following what suppress_hashes.awk does at first.

Some other examples that could be worth imitating:

- diff-lib.sh has some sanitize_diff functions using 'sed'
- t4202 has a sanitize_output functino, also using 'sed'
- grepping for $_x40 finds some other examples (these will be fun to
  change when the hash function changes, but at least they're easy to
  find).

The main general idea I have in mind is that providing a function at the
top of the file for the test to use instead of a script that interacts
with what git is tracking can make things easier to understand.

Also, could there be a comment with a diagram describing the setup
(sub1 vs sub2, etc)?

[...]
> +test_expect_success 'status with untracked file in nested submodule 
> (porcelain)' '
> + git -C super status --porcelain >output &&
> + diff output - <<-\EOF
> +  M sub1
> +  M sub2
> + EOF
> +'
> +
> +test_expect_success 'status with untracked file in nested submodule 
> (porcelain=2)' '
> + git -C super status --porcelain=2 >output &&
> + awk -f suppress_hashes.awk output >output2 &&
> + diff output2 - <<-\EOF
> + 1 .M S.M. 16 16 16 HASH HASH sub1
> + 1 .M S..U 16 16 16 HASH HASH sub2
> + EOF
> +'
> +
> +test_expect_success 'status with untracked file in nested submodule (short)' 
> '
> + git -C super status --short >output &&
> + diff output - <<-\EOF
> +  m sub1
> +  ? sub2
> + EOF
> +'
> +
> +test_expect_success 'setup superproject with modified file in nested 
> submodule' '
> + git -C super/sub1/sub2 add file &&
> + git -C super/sub2 add file
> +'
> +
> +test_expect_success 'status with added file in nested submodule (porcelain)' 
> '
> + git -C super status --porcelain >output &&
> + diff output - <<-\EOF
> +  M sub1
> +  M sub2
> + EOF
> +'
> +
> +test_expect_success 'status with added file in nested submodule 
> (porcelain=2)' '
> + git -C super status --porcelain=2 >output &&
> + awk -f suppress_hashes.awk output >output2 &&
> + diff output2 - <<-\EOF
> + 1 .M S.M. 16 16 16 HASH HASH sub1
> + 1 .M S.M. 16 16 16 HASH HASH sub2
> + EOF
> +'
> +
> +test_expect_success 'status with added file in nested submodule (short)' '
> + git -C super status --short >output &&
> + diff output - <<-\EOF
> +  m sub1
> +  m sub2
> + EOF

How does ordinary non --short "git status" handle these cases?

What should happen when there is a new untracked repository within a
submodule?

What should happen if there is both a modified tracked file and an
untracked file in a sub-submodule?


> +'
> +
>  test_done

Very nice.  Thanks --- this was exactly what I was looking for.

The rest looks good.

Sincerely,
Jonathan


Re: Proposal for "fetch-any-blob Git protocol" and server design

2017-03-28 Thread Stefan Beller
+cc Ben Peart, who sent
"[RFC] Add support for downloading blobs on demand" to the list recently.
This proposal here seems like it has the same goal, so maybe your review
could go a long way here?

Thanks,
Stefan

On Tue, Mar 14, 2017 at 3:57 PM, Jonathan Tan  wrote:
> As described in "Background" below, there have been at least 2 patch sets to
> support "partial clones" and on-demand blob fetches, where the server part
> that supports on-demand blob fetches was treated at least in outline. Here
> is a proposal treating that server part in detail.
>
> == Background
>
> The desire for Git to support (i) missing blobs and (ii) fetching them as
> needed from a remote repository has surfaced on the mailing list a few
> times, most recently in the form of RFC patch sets [1] [2].
>
> A local repository that supports (i) will be created by a "partial clone",
> that is, a clone with some special parameters (exact parameters are still
> being discussed) that does not download all blobs normally downloaded. Such
> a repository should support (ii), which is what this proposal describes.
>
> == Design
>
> A new endpoint "server" is created. The client will send a message in the
> following format:
>
> 
> fbp-request = PKT-LINE("fetch-blob-pack")
>   1*want
>   flush-pkt
> want = PKT-LINE("want" SP obj-id)
> 
>
> The client may send one or more SHA-1s for which it wants blobs, then a
> flush-pkt.
>
> The server will then reply:
>
> 
> server-reply = flush-pkt | PKT-LINE("ERR" SP message)
> 
>
> If there was no error, the server will then send them in a packfile,
> formatted like described in "Packfile Data" in pack-protocol.txt with
> "side-band-64k" enabled.
>
> Any server that supports "partial clone" will also support this, and the
> client will automatically assume this. (How a client discovers "partial
> clone" is not covered by this proposal.)
>
> The server will perform reachability checks on requested blobs through the
> equivalent of "git rev-list --use-bitmap-index" (like "git upload-pack" when
> using the allowreachablesha1inwant option), unless configured to suppress
> reachability checks through a config option. The server administrator is
> highly recommended to regularly regenerate the bitmap (or suppress
> reachability checks).
>
> === Endpoint support for forward compatibility
>
> This "server" endpoint requires that the first line be understood, but will
> ignore any other lines starting with words that it does not understand. This
> allows new "commands" to be added (distinguished by their first lines) and
> existing commands to be "upgraded" with backwards compatibility.
>
> === Related improvements possible with new endpoint
>
> Previous protocol upgrade suggestions have had to face the difficulty of
> allowing updated clients to discover the server support while not slowing
> down (for example, through extra network round-trips) any client, whether
> non-updated or updated. The introduction of "partial clone" allows clients
> to rely on the guarantee that any server that supports "partial clone"
> supports "fetch-blob-pack", and we can extend the guarantee to other
> protocol upgrades that such repos would want.
>
> One such upgrade is "ref-in-want" [3]. The full details can be obtained from
> that email thread, but to summarize, the patch set eliminates the need for
> the initial ref advertisement and allows communication in ref name globs,
> making it much easier for multiple load-balanced servers to serve large
> repos to clients - this is something that would greatly benefit the Android
> project, for example, and possibly many others.
>
> Bundling support for "ref-in-want" with "fetch-blob-pack" simplifies matters
> for the client in that a client needs to only handle one "version" of server
> (a server that supports both). If "ref-in-want" were added later, instead of
> now, clients would need to be able to handle two "versions" (one with only
> "fetch-blob-pack" and one with both "fetch-blob-pack" and "ref-in-want").
>
> As for its implementation, that email thread already contains a patch set
> that makes it work with the existing "upload-pack" endpoint; I can update
> that patch set to use the proposed "server" endpoint (with a
> "fetch-commit-pack" message) if need be.
>
> == Client behavior
>
> This proposal is concerned with server behavior only, but it is useful to
> envision how the client would use this to ensure that the server behavior is
> useful.
>
> === Indication to use the proposed endpoint
>
> The client will probably already record that at least one of its remotes
> (the one that it successfully performed a "partial clone" from) supports
> this new endpoint (if not, it can’t determine whether a missing blob was
> caused by repo corruption or by the "partial clone"). This knowledge can be
> used both to know that the server supports "fetch-blob-pack" and
> "fetch-commit-pack" (for the latter, the client can 

Git fails to build on Ubuntu Server 16.04

2017-03-28 Thread Jeffrey Walton
I configured with --enable-pthreads, and LIBS included -lpthread.

$ make V=1
gcc -I/usr/local/include -g -O2 -I. -DHAVE_ALLOCA_H
-I/usr/local/include -DUSE_CURL_FOR_IMAP_SEND -I/usr/local/include
-I/usr/local/include  -DHAVE_PATHS_H -DHAVE_STRINGS_H -DHAVE_DEV_TTY
-DHAVE_CLOCK_GETTIME -DHAVE_CLOCK_MONOTONIC -DHAVE_GETDELIM
-DSHA1_HEADER=''  -DNO_STRLCPY -DSHELL_PATH='"/bin/sh"'
-DPAGER_ENV='"LESS=FRX LV=-c"' -o git-credential-store
-Wl,-rpath,/usr/local/lib -L/usr/local/lib  credential-store.o
common-main.o libgit.a xdiff/lib.a  -L/usr/local/lib
-Wl,-rpath,/usr/local/lib -lz -L/usr/local/lib
-Wl,-rpath,/usr/local/lib -lcrypto  -lrt
/usr/bin/ld: libgit.a(run-command.o): undefined reference to symbol
'pthread_sigmask@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO
missing from command line
collect2: error: ld returned 1 exit status
Makefile:2053: recipe for target 'git-credential-store' failed
make: *** [git-credential-store] Error 1


If the makefile is modified after 'make configure':

find `pwd` -name Makefile -exec sed -i 's|-lrt|-lrt -lpthread|g' {} \;

Then Git builds successfully.

It appears the request to use pthreads is not being honored.

Thanks.


[PATCH v8 0/7] short status: improve reporting for submodule changes

2017-03-28 Thread Stefan Beller
v8:
* This is a resend of the last two patches, i.e. these two patches apply
  at 5c896f7c3ec (origin/sb/submodule-short-status^^)
* below is a diff of this patch series against origin/sb/submodule-short-status
* add tests showing the subtle bug fix in case of nesting.
* add a bit of documentation

v7:
previous work at
https://public-inbox.org/git/20170325003610.15282-1-sbel...@google.com/

diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt
index 01b457c322..452c6eb875 100644
--- a/Documentation/git-status.txt
+++ b/Documentation/git-status.txt
@@ -187,6 +187,8 @@ Submodules have more state and instead report
mthe submodule has modified content
?the submodule has untracked files
 
+Note that 'm' and '?' are applied recursively, e.g. if a nested submodule
+in a submodule contains an untracked file, this is reported as '?' as well.
 
 If -b is used the short-format status is preceded by a line
 
diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh
index ab822c79e6..4d6d8f6817 100755
--- a/t/t7506-status-submodule.sh
+++ b/t/t7506-status-submodule.sh
@@ -327,20 +327,65 @@ test_expect_success 'setup superproject with untracked 
file in nested submodule'
git add sub1 &&
git commit -a -m "update sub1 to contain nested sub"
) &&
-   echo untracked >super/sub1/sub2/untracked
+   echo "{ \$7=\"HASH\"; \$8=\"HASH\"; print }" >suppress_hashes.awk &&
+   echo "suppress_hashes.awk" >>.git/info/exclude &&
+   echo "output2" >>.git/info/exclude &&
+   echo content >super/sub1/sub2/file &&
+   echo content >super/sub2/file
 '
 
 test_expect_success 'status with untracked file in nested submodule 
(porcelain)' '
git -C super status --porcelain >output &&
diff output - <<-\EOF
 M sub1
+M sub2
+   EOF
+'
+
+test_expect_success 'status with untracked file in nested submodule 
(porcelain=2)' '
+   git -C super status --porcelain=2 >output &&
+   awk -f suppress_hashes.awk output >output2 &&
+   diff output2 - <<-\EOF
+   1 .M S..U 16 16 16 HASH HASH sub1
+   1 .M S..U 16 16 16 HASH HASH sub2
EOF
 '
 
 test_expect_success 'status with untracked file in nested submodule (short)' '
git -C super status --short >output &&
diff output - <<-\EOF
-? sub1
+m sub1
+? sub2
+   EOF
+'
+
+test_expect_success 'setup superproject with modified file in nested 
submodule' '
+   git -C super/sub1/sub2 add file &&
+   git -C super/sub2 add file
+'
+
+test_expect_success 'status with added file in nested submodule (porcelain)' '
+   git -C super status --porcelain >output &&
+   diff output - <<-\EOF
+M sub1
+M sub2
+   EOF
+'
+
+test_expect_success 'status with added file in nested submodule (porcelain=2)' 
'
+   git -C super status --porcelain=2 >output &&
+   awk -f suppress_hashes.awk output >output2 &&
+   diff output2 - <<-\EOF
+   1 .M S.M. 16 16 16 HASH HASH sub1
+   1 .M S.M. 16 16 16 HASH HASH sub2
+   EOF
+'
+
+test_expect_success 'status with added file in nested submodule (short)' '
+   git -C super status --short >output &&
+   diff output - <<-\EOF
+m sub1
+m sub2
EOF
 '
 


[PATCH 2/2] submodule.c: correctly handle nested submodules in is_submodule_modified

2017-03-28 Thread Stefan Beller
Suppose I have a superproject 'super', with two submodules 'super/sub'
and 'super/sub1'. 'super/sub' itself contains a submodule
'super/sub/subsub'. Now suppose I run, from within 'super':

echo hi >sub/subsub/stray-file
echo hi >sub1/stray-file

Currently we get would see the following output in git-status:

git status --short
 m sub
 ? sub1

With this patch applied, the untracked file in the nested submodule is
turned into an untracked file on the 'super' level as well.

git status --short
 ? sub
 ? sub1

This doesn't change the output of 'git status --porcelain=1' for nested
submodules, because its output is always ' M' for either untracked files
or local modifications no matter the nesting level of the submodule.

'git status --porcelain=2' is affected by this change in a nested
submodule, though. Without this patch it would report the direct submodule
as modified and having no untracked files. With this patch it would report
untracked files. Chalk this up as a bug fix.

Signed-off-by: Stefan Beller 
---
 Documentation/git-status.txt |  2 ++
 submodule.c  | 23 +--
 t/t3600-rm.sh|  2 +-
 t/t7506-status-submodule.sh  |  2 +-
 4 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt
index 01b457c322..452c6eb875 100644
--- a/Documentation/git-status.txt
+++ b/Documentation/git-status.txt
@@ -187,6 +187,8 @@ Submodules have more state and instead report
mthe submodule has modified content
?the submodule has untracked files
 
+Note that 'm' and '?' are applied recursively, e.g. if a nested submodule
+in a submodule contains an untracked file, this is reported as '?' as well.
 
 If -b is used the short-format status is preceded by a line
 
diff --git a/submodule.c b/submodule.c
index fa21c7bb72..730cc9513a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1078,8 +1078,27 @@ unsigned is_submodule_modified(const char *path, int 
ignore_untracked)
/* regular untracked files */
if (buf.buf[0] == '?')
dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
-   else
-   dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
+
+   if (buf.buf[0] == 'u' ||
+   buf.buf[0] == '1' ||
+   buf.buf[0] == '2') {
+   /*
+* T XY :
+* T = line type, XY = status,  = submodule state
+*/
+   if (buf.len < 1 + 1 + 2 + 1 + 4)
+   die("BUG: invalid status --porcelain=2 line %s",
+   buf.buf);
+
+   /* regular unmerged and renamed files */
+   if (buf.buf[5] == 'S' && buf.buf[8] == 'U')
+   /* nested untracked file */
+   dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
+
+   if (memcmp(buf.buf + 5, "S..U", 4))
+   /* other change */
+   dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
+   }
 
if ((dirty_submodule & DIRTY_SUBMODULE_MODIFIED) &&
((dirty_submodule & DIRTY_SUBMODULE_UNTRACKED) ||
diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index a6e5c5bd56..b58793448b 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -659,7 +659,7 @@ test_expect_success 'rm of a populated nested submodule 
with nested untracked fi
test -d submod &&
test -f submod/.git &&
git status -s -uno --ignore-submodules=none >actual &&
-   test_cmp expect.modified_inside actual &&
+   test_cmp expect.modified_untracked actual &&
git rm -f submod &&
test ! -d submod &&
git status -s -uno --ignore-submodules=none >actual &&
diff --git a/t/t7506-status-submodule.sh b/t/t7506-status-submodule.sh
index fd057751df..4d6d8f6817 100755
--- a/t/t7506-status-submodule.sh
+++ b/t/t7506-status-submodule.sh
@@ -346,7 +346,7 @@ test_expect_success 'status with untracked file in nested 
submodule (porcelain=2
git -C super status --porcelain=2 >output &&
awk -f suppress_hashes.awk output >output2 &&
diff output2 - <<-\EOF
-   1 .M S.M. 16 16 16 HASH HASH sub1
+   1 .M S..U 16 16 16 HASH HASH sub1
1 .M S..U 16 16 16 HASH HASH sub2
EOF
 '
-- 
2.12.1.438.g67623a8358



[PATCH 1/2] short status: improve reporting for submodule changes

2017-03-28 Thread Stefan Beller
If I add an untracked file to a submodule or modify a tracked file,
currently "git status --short" treats the change in the same way as
changes to the current HEAD of the submodule:

$ git clone --quiet --recurse-submodules 
https://gerrit.googlesource.com/gerrit
$ echo hello >gerrit/plugins/replication/stray-file
$ sed -i -e 's/.*//' gerrit/plugins/replication/.mailmap
$ git -C gerrit status --short
 M plugins/replication

This is by analogy with ordinary files, where "M" represents a change
that has not been added yet to the index.  But this change cannot be
added to the index without entering the submodule, "git add"-ing it,
and running "git commit", so the analogy is counterproductive.

Introduce new status letters " ?" and " m" for this.  These are similar
to the existing "??" and " M" but mean that the submodule (not the
parent project) has new untracked files and modified files, respectively.
The user can use "git add" and "git commit" from within the submodule to
add them.

Changes to the submodule's HEAD commit can be recorded in the index with
a plain "git add -u" and are shown with " M", like today.

To avoid excessive clutter, show at most one of " ?", " m", and " M" for
the submodule.  They represent increasing levels of change --- the last
one that applies is shown (e.g., " m" if there are both modified files
and untracked files in the submodule, or " M" if the submodule's HEAD
has been modified and it has untracked files).

While making these changes, we need to make sure to not break porcelain
level 1, which shares code with "status --short".  We only change
"git status --short".

Non-short "git status" and "git status --porcelain=2" already handle
these cases by showing more detail:

$ git -C gerrit status --porcelain=2
1 .M S.MU 16 16 16 305c864db28eb0c77c8499bc04c87de3f849cf3c 
305c864db28eb0c77c8499bc04c87de3f849cf3c plugins/replication
$ git -C gerrit status
[...]
modified:   plugins/replication (modified content, untracked content)

Scripts caring about these distinctions should use --porcelain=2.

Helped-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
Reviewed-by: Jonathan Nieder 
---
 Documentation/git-status.txt |   9 
 t/t3600-rm.sh|  18 +---
 t/t7506-status-submodule.sh  | 102 +++
 wt-status.c  |  17 +++-
 4 files changed, 139 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-status.txt b/Documentation/git-status.txt
index ba873657cf..01b457c322 100644
--- a/Documentation/git-status.txt
+++ b/Documentation/git-status.txt
@@ -181,6 +181,13 @@ in which case `XY` are `!!`.
 !   !ignored
 -
 
+Submodules have more state and instead report
+   Mthe submodule has a different HEAD than
+recorded in the index
+   mthe submodule has modified content
+   ?the submodule has untracked files
+
+
 If -b is used the short-format status is preceded by a line
 
 ## branchname tracking info
@@ -210,6 +217,8 @@ field from the first filename).  Third, filenames 
containing special
 characters are not specially formatted; no quoting or
 backslash-escaping is performed.
 
+Any submodule changes are reported as modified `M` instead of `m` or single 
`?`.
+
 Porcelain Format Version 2
 ~~
 
diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
index 5aa6db584c..a6e5c5bd56 100755
--- a/t/t3600-rm.sh
+++ b/t/t3600-rm.sh
@@ -268,6 +268,14 @@ cat >expect.modified actual &&
@@ -436,7 +444,7 @@ test_expect_success 'rm of a populated submodule with 
untracked files fails unle
test -d submod &&
test -f submod/.git &&
git status -s -uno --ignore-submodules=none >actual &&
-   test_cmp expect.modified actual &&
+   test_cmp expect.modified_untracked actual &&
git rm -f submod &&
test ! -d submod &&
git status -s -uno --ignore-submodules=none >actual &&
@@ -621,7 +629,7 @@ test_expect_success 'rm of a populated nested submodule 
with different nested HE
test -d submod &&
test -f submod/.git &&
git status -s -uno --ignore-submodules=none >actual &&
-   test_cmp expect.modified actual &&
+   test_cmp expect.modified_inside actual &&
git rm -f submod &&
test ! -d submod &&
git status -s -uno --ignore-submodules=none >actual &&
@@ -636,7 +644,7 @@ test_expect_success 'rm of a populated nested submodule 
with nested modification
test -d 

Re: [PATCH 0/18] snprintf cleanups

2017-03-28 Thread Junio C Hamano
Jeff King  writes:

> It's a lot of patches, but hopefully they're all pretty straightforward
> to read.

Yes, quite a lot of changes.  I didn't see anything questionable in
there.

As to the "patch-id" thing, I find the alternate one slightly easier
to read.  Also, exactly because this is not a performance critical
codepath, it may be better if patch_id_add_string() filtered out
whitespaces; that would allow the source to express things in more
natural way, e.g.

patch_id_addf(, "new file mode");
patch_id_addf(, "%06o", p->two->mode);
patch_id_addf(, "--- /dev/null");
patch_id_addf(, "+++ b/%.*s", len2, p->two->path);

Or I may be going overboard by bringing "addf" into the mix X-<.


Git hackathon New York / London - call for mentors

2017-03-28 Thread Charles Bailey
Bloomberg would like to host a Git hackathon over a weekend in both New
York and London, towards the end of April or the beginning of May.

Crucial to the success of the weekend will be having mentors available
in both locations who can guide people on the project. Mentors should
have some experience with developing for Git and should be familiar with
the process and guidelines around contributing.

If you are interested in being a mentor or have further questions, then
please get in contact with me via email (either to this address or to
cbaile...@bloomberg.net) letting me know whether you are closer to New
York or London and if you have any date restrictions.

Charles.

---

Git was the first project for which we hosted an "Open Source Day" and
since then we've learned a lot and would like to revisit Git again.

The event will involve volunteers who are usually competent programmers
but who don't necessarily have experience with contributing to Git,
working to contribute to the project over two days. Typically the type
of tasks tackled would include documentation improvements, test case
improvements and very simple bug fixes that have previously been
identified as "low hanging fruit".


Re: [PATCH] strbuf: support long paths w/o read rights in strbuf_getcwd() on FreeBSD

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 11:15:12PM +0200, Christian Couder wrote:

> On Sun, Mar 26, 2017 at 3:43 PM, René Scharfe  wrote:
> > FreeBSD implements getcwd(3) as a syscall, but falls back to a version
> > based on readdir(3) if it fails for some reason.  The latter requires
> > permissions to read and execute path components, while the former does
> > not.  That means that if our buffer is too small and we're missing
> > rights we could get EACCES, but we may succeed with a bigger buffer.
> >
> > Keep retrying if getcwd(3) indicates lack of permissions until our
> > buffer can fit PATH_MAX bytes, as that's the maximum supported by the
> > syscall on FreeBSD anyway.  This way we do what we can to be able to
> > benefit from the syscall, but we also won't loop forever if there is a
> > real permission issue.
> 
> Sorry to be late and maybe I missed something obvious, but the above
> and the patch seem complex to me compared with something like:
> 
> diff --git a/strbuf.c b/strbuf.c
> index ace58e7367..25eadcbedc 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -441,7 +441,7 @@ int strbuf_readlink(struct strbuf *sb, const char
> *path, size_t hint)
>  int strbuf_getcwd(struct strbuf *sb)
>  {
> size_t oldalloc = sb->alloc;
> -   size_t guessed_len = 128;
> +   size_t guessed_len = PATH_MAX > 128 ? PATH_MAX : 128;
> 
> for (;; guessed_len *= 2) {
> strbuf_grow(sb, guessed_len);

I think the main reason is just that we do not have to pay the price to
allocate PATH_MAX-sized buffers when they are rarely used.

I doubt it matters all that much in practice, though.

-Peff


Re: [PATCH] strbuf: support long paths w/o read rights in strbuf_getcwd() on FreeBSD

2017-03-28 Thread Christian Couder
On Tue, Mar 28, 2017 at 11:24 PM, Stefan Beller  wrote:
> On Tue, Mar 28, 2017 at 2:15 PM, Christian Couder
>  wrote:
>> On Sun, Mar 26, 2017 at 3:43 PM, René Scharfe  wrote:
>>> FreeBSD implements getcwd(3) as a syscall, but falls back to a version
>>> based on readdir(3) if it fails for some reason.  The latter requires
>>> permissions to read and execute path components, while the former does
>>> not.  That means that if our buffer is too small and we're missing
>>> rights we could get EACCES, but we may succeed with a bigger buffer.
>>>
>>> Keep retrying if getcwd(3) indicates lack of permissions until our
>>> buffer can fit PATH_MAX bytes, as that's the maximum supported by the
>>> syscall on FreeBSD anyway.  This way we do what we can to be able to
>>> benefit from the syscall, but we also won't loop forever if there is a
>>> real permission issue.
>>
>> Sorry to be late and maybe I missed something obvious, but the above
>> and the patch seem complex to me compared with something like:
>>
>> diff --git a/strbuf.c b/strbuf.c
>> index ace58e7367..25eadcbedc 100644
>> --- a/strbuf.c
>> +++ b/strbuf.c
>> @@ -441,7 +441,7 @@ int strbuf_readlink(struct strbuf *sb, const char
>> *path, size_t hint)
>>  int strbuf_getcwd(struct strbuf *sb)
>>  {
>> size_t oldalloc = sb->alloc;
>> -   size_t guessed_len = 128;
>> +   size_t guessed_len = PATH_MAX > 128 ? PATH_MAX : 128;
>>
>> for (;; guessed_len *= 2) {
>> strbuf_grow(sb, guessed_len);
>
> From f22a76e911 (strbuf: add strbuf_getcwd(), 2014-07-28):
> Because it doesn't use a fixed-size buffer it supports
> arbitrarily long paths, provided the platform's getcwd() does as well.
> At least on Linux and FreeBSD it handles paths longer than PATH_MAX
> just fine.

Well René's patch says:

>>> Keep retrying if getcwd(3) indicates lack of permissions until our
>>> buffer can fit PATH_MAX bytes, as that's the maximum supported by the
>>> syscall on FreeBSD anyway.

So it says that FreeBSD syscall doesn't handle paths longer than PATH_MAX.

> So with your patch, we'd still see the original issue for paths > PATH_MAX
> IIUC.

Also, René's patch just adds:

+   if (errno == EACCES && guessed_len < PATH_MAX)
+   continue;

so if the length of the path is > PATH_MAX, then guessed_len will have
to be > PATH_MAX and then René's patch will do nothing.


Re: [PATCH] strbuf: support long paths w/o read rights in strbuf_getcwd() on FreeBSD

2017-03-28 Thread Stefan Beller
On Tue, Mar 28, 2017 at 2:15 PM, Christian Couder
 wrote:
> On Sun, Mar 26, 2017 at 3:43 PM, René Scharfe  wrote:
>> FreeBSD implements getcwd(3) as a syscall, but falls back to a version
>> based on readdir(3) if it fails for some reason.  The latter requires
>> permissions to read and execute path components, while the former does
>> not.  That means that if our buffer is too small and we're missing
>> rights we could get EACCES, but we may succeed with a bigger buffer.
>>
>> Keep retrying if getcwd(3) indicates lack of permissions until our
>> buffer can fit PATH_MAX bytes, as that's the maximum supported by the
>> syscall on FreeBSD anyway.  This way we do what we can to be able to
>> benefit from the syscall, but we also won't loop forever if there is a
>> real permission issue.
>
> Sorry to be late and maybe I missed something obvious, but the above
> and the patch seem complex to me compared with something like:
>
> diff --git a/strbuf.c b/strbuf.c
> index ace58e7367..25eadcbedc 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -441,7 +441,7 @@ int strbuf_readlink(struct strbuf *sb, const char
> *path, size_t hint)
>  int strbuf_getcwd(struct strbuf *sb)
>  {
> size_t oldalloc = sb->alloc;
> -   size_t guessed_len = 128;
> +   size_t guessed_len = PATH_MAX > 128 ? PATH_MAX : 128;
>
> for (;; guessed_len *= 2) {
> strbuf_grow(sb, guessed_len);

>From f22a76e911 (strbuf: add strbuf_getcwd(), 2014-07-28):
Because it doesn't use a fixed-size buffer it supports
arbitrarily long paths, provided the platform's getcwd() does as well.
At least on Linux and FreeBSD it handles paths longer than PATH_MAX
just fine.

So with your patch, we'd still see the original issue for paths > PATH_MAX
IIUC.

Thanks,
Stefan


Re: [PATCH 7/7] submodule.c: correctly handle nested submodules in is_submodule_modified

2017-03-28 Thread Stefan Beller
On Mon, Mar 27, 2017 at 2:46 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> When a nested submodule has untracked files, it would be reported as
>> "modified submodule" in the superproject, because submodules are not
>> parsed correctly in is_submodule_modified as they are bucketed into
>> the modified pile as "they are not an untracked file".
>
> I cannot quite parse the above.

I tried to describe the example Jonathan gave in his reply in a shorter form.
I'll
>> + /* regular unmerged and renamed files */
>> + if (buf.buf[5] == 'S' && buf.buf[8] == 'U')
>> + /* nested untracked file */
>> + dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
>
> OK, we have untracked one.
>
>> + if (memcmp(buf.buf + 5, "S..U", 4))
>> + /* other change */
>> + dirty_submodule |= DIRTY_SUBMODULE_MODIFIED;
>
> And for other cases like C at buf.buf[6] or M at buf.buf[7],
> i.e. where the submodule not just has untracked files but has real
> changes, we say it is truly MODIFIED here.
>
> If there are changes to paths that is not a submodule but a tracked
> file in the submodule in question would have N at buf.buf[5] and is
> also caught with the same "not S..U so that's MODIFIED" logic.
>
> OK.

ok, thanks for checking.

>
> Shouldn't this done as part of 4/7 where is_submodule_modified()
> starts reading from the porcelain v2 output?

I did that in an earlier version of the series. However the change from
porcelain=1 to 2 should not be observable by the end user.

>  4/7 does adjust for
> the change from double question mark (porcelain v1) to a single one
> for untracked, but at the same time it needs to prepare for these
> 'u' (unmerged), '1' (normal modification) and '2' (mods with rename)
> to appear in the output, no?

No, up to patch 5/7 we only had refactors, no user visible changes
intended. And until then we had "has untracked files" and "everything
else". The nice part of the conversion was to cover the "everything else"
part easily.

This patch transforms it into "has untracked files or submodule reports
untracked files (possibly nested)" and "everything else", but the former
is more complicated to compute.

> IOW, with 4/7 and 7/7 done as separate steps, isn't the system
> broken between these steps?

No, see Jonathans answer.

We could argue, that 6/7 and 7/7 done as separate steps is broken,
because since 6/7 we promise to report untracked files inside
submodules as " ?", but we do not report them properly for nested
submodules.

Suppose I have a superproject 'parent', with two submodules 'parent/sub'
and 'parent/sub1'. 'parent/sub' itself contains a submodule 'parent/sub/subsub'.
Now suppose I run, from within 'parent':

echo hi >sub/subsub/stray-file
echo hi >sub1/stray-file

with patches 1..5:
git status --short
 M sub
 M sub1

patch 6:
git status --short
 m sub
 ? sub1

with patch 7:
git status --short
 ? sub
 ? sub1

The documentation update in patch 6 is unclear what to expect from
nested submodules, so that documentation is technically not wrong,
but maybe misleading.

I want to resend patch 7 with a documentation update as well as tests.

Thanks,
Stefan


Re: [PATCH] strbuf: support long paths w/o read rights in strbuf_getcwd() on FreeBSD

2017-03-28 Thread Christian Couder
On Sun, Mar 26, 2017 at 3:43 PM, René Scharfe  wrote:
> FreeBSD implements getcwd(3) as a syscall, but falls back to a version
> based on readdir(3) if it fails for some reason.  The latter requires
> permissions to read and execute path components, while the former does
> not.  That means that if our buffer is too small and we're missing
> rights we could get EACCES, but we may succeed with a bigger buffer.
>
> Keep retrying if getcwd(3) indicates lack of permissions until our
> buffer can fit PATH_MAX bytes, as that's the maximum supported by the
> syscall on FreeBSD anyway.  This way we do what we can to be able to
> benefit from the syscall, but we also won't loop forever if there is a
> real permission issue.

Sorry to be late and maybe I missed something obvious, but the above
and the patch seem complex to me compared with something like:

diff --git a/strbuf.c b/strbuf.c
index ace58e7367..25eadcbedc 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -441,7 +441,7 @@ int strbuf_readlink(struct strbuf *sb, const char
*path, size_t hint)
 int strbuf_getcwd(struct strbuf *sb)
 {
size_t oldalloc = sb->alloc;
-   size_t guessed_len = 128;
+   size_t guessed_len = PATH_MAX > 128 ? PATH_MAX : 128;

for (;; guessed_len *= 2) {
strbuf_grow(sb, guessed_len);


Re: [PATCH RFC 2/2] diff: teach diff to expand tabs in output

2017-03-28 Thread Junio C Hamano
Jeff King  writes:

> That also means an option to something like "expand" is tough, because
> "first character" really means "first non-ANSI-code character".

That is true, but once you commit to the mindset that you are
extending "expand", then that becomes a mere part of what must be
done, i.e. if you want your "expand" to be able to handle coloured
input, you'd need to know how wide each segment of the input is.
For that matter, you would also want your "expand" to pay attention
to the differences between display- vs byte-widths of a string,
perhaps reusing utf8_strwidth() or something.

Also "the first character is special" does not have to be a "diff
specific hack".  Your extended "expand" can take a list of
tab-widths and the special case for highlighting diff output happens
to take 9,8 as the "list of tab-widths" parameter (whose semantics
is that each number in the list tells how wide the tab is, and the
last number repeats forever, so 9,8 really means 9,8,8,8,8,...).


Re: [PATCH RFC 2/2] diff: teach diff to expand tabs in output

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 01:05:40PM -0700, Jacob Keller wrote:

> On Tue, Mar 28, 2017 at 12:03 PM, Junio C Hamano  wrote:
> > Jacob Keller  writes:
> >
> >> I'm really not a fan of how the ws code ended up. It seems pretty ugly
> >> and weird to hack in the expand_tabs stuff here. However, I'm really not
> >> sure how else I could handle this. Additionally, I'm not 100% sure
> >> whether this interacts with format-patch or other machinery which may
> >> well want some way to be excluded. Thoughts?
> >
> > As long as you do the same as "do we color the output?  no, no, we
> > are format-patch and must not color" logic to refrain from expanding
> > the tabs, you should be OK.
> >
> >> I think there also may be some wonky bits when performing the tab
> >> expansion during whitespace checks, due to the way we expand, because I
> >> don't think that the tabexpand function takes into account the "current"
> >> location when adding a string, so it very well may not be correct I
> >> am unsure if there is a good way to fix this.
> >
> > This "feature" is limited to the diff output, so one way may be to
> > leave the code as-is and pipe the output to a filter that is similar
> > to /usr/bin/expand but knows that the first column is special (this
> > is the part that "this is limited to diff" kicks in).  You may even
> > be able to implement it as a new option to "expand(1)" and then
> > people who aren't Git users would also benefit.
> >
> 
> That makes more sense. I'll take a look at that. It might even be
> possible to modify the pager setup so that it does that as part of its
> process.

This is similar to how I use diff-highlight, which is basically:

  [pager]
  log = diff-highlight | less
  show = diff-highlight | less
  diff = diff-highlight | less

I've wanted to move diff-highlight inside git, but ran into ugly
conflicts with the whitespace-marking code as well. Something like:

  git show b16a991c1be5681b4b673d4343dfcc0c2f5ad498 |
  perl -pe 's/^(.)(\t+)/$1 . (" " x (length($2) * 8))/e'

But sticking it in your pager pipeline is tough, because you actually
need to skip over the color bytes when they are present. You can see in
diff-highlight how this is handled.

That also means an option to something like "expand" is tough, because
"first character" really means "first non-ANSI-code character".

-Peff


Re: [PATCH v3 0/2] read-cache: call verify_hdr() in a background thread

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 03:50:34PM -0400, Jeff Hostetler wrote:

> It was a convenient way to isolate, average, and compare
> read_index() times, but I suppose we could do something
> like that.
> 
> I did confirm that a ls-files does show a slight 0.008
> second difference on the 58K file Linux tree when toggled
> on or off.

Yeah, I agree it helps isolate the change. I'm just not sure we want to
carry a bunch of function-specific perf-testing code. And one of the
nice things about testing a real command is that it's...a real command.
So it's an actual improvement a user might see.

> But I'm tempted to suggest that we just omit my helper exe
> and not worry about a test -- since we don't have any test
> repos large enough to really demonstrate the differences.
> My concern is that that 0.008 would be lost in the noise
> of the rest of the test and make for an unreliable result.

Yeah, I think that would be fine. You _could_ write a t/perf test and
then use your 400MB monstrosity as GIT_PERF_LARGE_REPO. But given that
most people don't have such a thing, there's not much value over you
just showing off the perf improvement in the commit message.

We could also have a t/perf test that generates a monstrous index and
shows that it's faster. But frankly, I don't think this is all that
interesting as a performance regression test. It's not like there's
something subtle about the performance improvement; we stopped computing
the SHA-1, and (gasp!) it takes exactly one SHA-1 computation's less
time.

So just mentioning the test case and the improvement in the commit
message is sufficient, IMHO.

-Peff


Re: [PATCH RFC 2/2] diff: teach diff to expand tabs in output

2017-03-28 Thread Jacob Keller
On Tue, Mar 28, 2017 at 12:03 PM, Junio C Hamano  wrote:
> Jacob Keller  writes:
>
>> I'm really not a fan of how the ws code ended up. It seems pretty ugly
>> and weird to hack in the expand_tabs stuff here. However, I'm really not
>> sure how else I could handle this. Additionally, I'm not 100% sure
>> whether this interacts with format-patch or other machinery which may
>> well want some way to be excluded. Thoughts?
>
> As long as you do the same as "do we color the output?  no, no, we
> are format-patch and must not color" logic to refrain from expanding
> the tabs, you should be OK.
>
>> I think there also may be some wonky bits when performing the tab
>> expansion during whitespace checks, due to the way we expand, because I
>> don't think that the tabexpand function takes into account the "current"
>> location when adding a string, so it very well may not be correct I
>> am unsure if there is a good way to fix this.
>
> This "feature" is limited to the diff output, so one way may be to
> leave the code as-is and pipe the output to a filter that is similar
> to /usr/bin/expand but knows that the first column is special (this
> is the part that "this is limited to diff" kicks in).  You may even
> be able to implement it as a new option to "expand(1)" and then
> people who aren't Git users would also benefit.
>

That makes more sense. I'll take a look at that. It might even be
possible to modify the pager setup so that it does that as part of its
process.

Thanks,
Jake


Re: [PATCH v2 00/21] object_id part 7

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 12:40:29PM -0700, Junio C Hamano wrote:

> > Here's that minor tweak, in case anybody is interested. It's less useful
> > without that follow-on that touches "eol" more, but perhaps it increases
> > readability on its own.
> 
> Yup, the only thing that the original (with Brian's fix) appears to
> be more careful about is it tries very hard to avoid setting boc
> past eoc.  As we are not checking "boc != eoc" but doing the
> comparison, that "careful" appearance does not give us any benefit
> in practice, other than having to do an extra "eol ? eol+1 : eoc";
> the result of this patch is easier to read.
> 
> By the way, eoc is "one past the end" of the array that begins at
> boc, so setting a pointer to eoc+1 may technically be in violation.
> I do not know how much it matters, though ;-)

I think that is OK. We are reading a strbuf, so eoc must either be the
first character of the PGP signature, or the terminating NUL if there
was no signature block[1]. So it's actually _inside_ the array, and
eoc+1 is our "one past".

-Peff

[1] Arguably we should bail when parse_signature() does not find a PGP
signature at all. We already bail with "malformed push certificate"
when there are other syntactic anomalies.


Re: [PATCH 04/18] diff: avoid fixed-size buffer for patch-ids

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 03:46:18PM -0400, Jeff King wrote:

> As a result, the data we feed to the sha1 computation may be
> truncated, and it's possible that a commit with a very long
> filename could erroneously collide in the patch-id space
> with another commit. For instance, if one commit modified
> "really-long-filename/foo" and another modified "bar" in the
> same directory.
> 
> In practice this is unlikely. Because the filenames are
> repeated, and because there's a single cutoff at the end of
> the buffer, the offending filename would have to be on the
> order of four times larger than PATH_MAX.
> 
> But it's easy to fix by moving to a strbuf.

The other obvious solution is to avoid formatting the string entirely,
and just feed the pieces to the sha1 computation. Unfortunately that
still involves formatting the modes (into a fixed-size buffer!) since we
need the sha1 of their string representations.

That patch is below for reference. I'm not sure if it's more readable or
not.

---
diff --git a/diff.c b/diff.c
index a628ac3a9..435c734f4 100644
--- a/diff.c
+++ b/diff.c
@@ -4570,6 +4570,19 @@ static void patch_id_consume(void *priv, char *line, 
unsigned long len)
data->patchlen += new_len;
 }
 
+static void patch_id_add_string(git_SHA_CTX *ctx, const char *str)
+{
+   git_SHA1_Update(ctx, str, strlen(str));
+}
+
+static void patch_id_add_mode(git_SHA_CTX *ctx, unsigned mode)
+{
+   /* large enough for 2^32 in octal */
+   char buf[12];
+   int len = xsnprintf(buf, sizeof(buf), "%06o", mode);
+   git_SHA1_Update(ctx, buf, len);
+}
+
 /* returns 0 upon success, and writes result into sha1 */
 static int diff_get_patch_id(struct diff_options *options, unsigned char 
*sha1, int diff_header_only)
 {
@@ -4577,7 +4590,6 @@ static int diff_get_patch_id(struct diff_options 
*options, unsigned char *sha1,
int i;
git_SHA_CTX ctx;
struct patch_id_t data;
-   char buffer[PATH_MAX * 4 + 20];
 
git_SHA1_Init();
memset(, 0, sizeof(struct patch_id_t));
@@ -4609,36 +4621,30 @@ static int diff_get_patch_id(struct diff_options 
*options, unsigned char *sha1,
 
len1 = remove_space(p->one->path, strlen(p->one->path));
len2 = remove_space(p->two->path, strlen(p->two->path));
-   if (p->one->mode == 0)
-   len1 = snprintf(buffer, sizeof(buffer),
-   "diff--gita/%.*sb/%.*s"
-   "newfilemode%06o"
-   "---/dev/null"
-   "+++b/%.*s",
-   len1, p->one->path,
-   len2, p->two->path,
-   p->two->mode,
-   len2, p->two->path);
-   else if (p->two->mode == 0)
-   len1 = snprintf(buffer, sizeof(buffer),
-   "diff--gita/%.*sb/%.*s"
-   "deletedfilemode%06o"
-   "---a/%.*s"
-   "+++/dev/null",
-   len1, p->one->path,
-   len2, p->two->path,
-   p->one->mode,
-   len1, p->one->path);
-   else
-   len1 = snprintf(buffer, sizeof(buffer),
-   "diff--gita/%.*sb/%.*s"
-   "---a/%.*s"
-   "+++b/%.*s",
-   len1, p->one->path,
-   len2, p->two->path,
-   len1, p->one->path,
-   len2, p->two->path);
-   git_SHA1_Update(, buffer, len1);
+   patch_id_add_string(, "diff--git");
+   patch_id_add_string(, "a/");
+   git_SHA1_Update(, p->one->path, len1);
+   patch_id_add_string(, "b/");
+   git_SHA1_Update(, p->two->path, len2);
+
+   if (p->one->mode == 0) {
+   patch_id_add_string(, "newfilemode");
+   patch_id_add_mode(, p->two->mode);
+   patch_id_add_string(, "---/dev/null");
+   patch_id_add_string(, "+++b/");
+   git_SHA1_Update(, p->two->path, len2);
+   } else if (p->two->mode == 0) {
+   patch_id_add_string(, "deletedfilemode");
+   patch_id_add_mode(, p->one->mode);
+   patch_id_add_string(, "---a/");
+   git_SHA1_Update(, p->one->path, len1);
+   patch_id_add_string(, "+++/dev/null");
+   } else {
+   

Re: [PATCH v3 0/2] read-cache: call verify_hdr() in a background thread

2017-03-28 Thread Jeff Hostetler



On 3/28/2017 3:16 PM, Jeff King wrote:

On Tue, Mar 28, 2017 at 07:07:30PM +, g...@jeffhostetler.com wrote:


From: Jeff Hostetler 

Version 3 of this patch series simplifies this effort to just turn
on/off the hash verification using a "core.checksumindex" config variable.

I've preserved the original checksum validation code so that we can
force it on in fsck if desired.

It eliminates the original threading model completely.

Jeff Hostetler (2):
  read-cache: core.checksumindex
  test-core-checksum-index: core.checksumindex test helper

 Makefile|  1 +
 read-cache.c| 12 ++
 t/helper/.gitignore |  1 +
 t/helper/test-core-checksum-index.c | 77 +


Do we still need test-core-checksum-index? Can we just time ls-files or
something in t/perf?


It was a convenient way to isolate, average, and compare
read_index() times, but I suppose we could do something
like that.

I did confirm that a ls-files does show a slight 0.008
second difference on the 58K file Linux tree when toggled
on or off.

But I'm tempted to suggest that we just omit my helper exe
and not worry about a test -- since we don't have any test
repos large enough to really demonstrate the differences.
My concern is that that 0.008 would be lost in the noise
of the rest of the test and make for an unreliable result.

Jeff


[PATCH 18/18] daemon: use an argv_array to exec children

2017-03-28 Thread Jeff King
Our struct child_process already has its own argv_array.
Let's use that to avoid having to format options into
separate buffers.

Note that we'll need to declare the child process outside of
the run_service_command() helper to do this. But that opens
up a further simplification, which is that the helper can
append to our argument list, saving each caller from
specifying "." manually.

Signed-off-by: Jeff King 
---
Of all the patches in this series, this is the one where I was most
undecided on whether the result was more readable or not.

My ulterior motive, of course, was to get rid of the unchecked
snprintf() into timebuf. But it could also just become an xsnprintf()
with no further changes.

 daemon.c | 38 +-
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/daemon.c b/daemon.c
index 473e6b6b6..f70d27b82 100644
--- a/daemon.c
+++ b/daemon.c
@@ -449,46 +449,42 @@ static void copy_to_log(int fd)
fclose(fp);
 }
 
-static int run_service_command(const char **argv)
+static int run_service_command(struct child_process *cld)
 {
-   struct child_process cld = CHILD_PROCESS_INIT;
-
-   cld.argv = argv;
-   cld.git_cmd = 1;
-   cld.err = -1;
-   if (start_command())
+   argv_array_push(>args, ".");
+   cld->git_cmd = 1;
+   cld->err = -1;
+   if (start_command(cld))
return -1;
 
close(0);
close(1);
 
-   copy_to_log(cld.err);
+   copy_to_log(cld->err);
 
-   return finish_command();
+   return finish_command(cld);
 }
 
 static int upload_pack(void)
 {
-   /* Timeout as string */
-   char timeout_buf[64];
-   const char *argv[] = { "upload-pack", "--strict", NULL, ".", NULL };
-
-   argv[2] = timeout_buf;
-
-   snprintf(timeout_buf, sizeof timeout_buf, "--timeout=%u", timeout);
-   return run_service_command(argv);
+   struct child_process cld = CHILD_PROCESS_INIT;
+   argv_array_pushl(, "upload-pack", "--strict", NULL);
+   argv_array_pushf(, "--timeout=%u", timeout);
+   return run_service_command();
 }
 
 static int upload_archive(void)
 {
-   static const char *argv[] = { "upload-archive", ".", NULL };
-   return run_service_command(argv);
+   struct child_process cld = CHILD_PROCESS_INIT;
+   argv_array_push(, "upload-archive");
+   return run_service_command();
 }
 
 static int receive_pack(void)
 {
-   static const char *argv[] = { "receive-pack", ".", NULL };
-   return run_service_command(argv);
+   struct child_process cld = CHILD_PROCESS_INIT;
+   argv_array_push(, "receive-pack");
+   return run_service_command();
 }
 
 static struct daemon_service daemon_service[] = {
-- 
2.12.2.845.g55fcf8b10


[PATCH 16/18] transport-helper: replace checked snprintf with xsnprintf

2017-03-28 Thread Jeff King
We can use xsnprintf to do our truncation check with less
code. The error message isn't as specific, but the point is
that this isn't supposed to trigger in the first place
(because our buffer is big enough to handle any int).

Signed-off-by: Jeff King 
---
 transport-helper.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index dc90a1fb7..36408046e 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -347,14 +347,11 @@ static int set_helper_option(struct transport *transport,
 static void standard_options(struct transport *t)
 {
char buf[16];
-   int n;
int v = t->verbose;
 
set_helper_option(t, "progress", t->progress ? "true" : "false");
 
-   n = snprintf(buf, sizeof(buf), "%d", v + 1);
-   if (n >= sizeof(buf))
-   die("impossibly large verbosity value");
+   xsnprintf(buf, sizeof(buf), "%d", v + 1);
set_helper_option(t, "verbosity", buf);
 
switch (t->family) {
-- 
2.12.2.845.g55fcf8b10



[PATCH 17/18] gc: replace local buffer with git_path

2017-03-28 Thread Jeff King
We probe the "17/" loose object directory for auto-gc, and
use a local buffer to format the path. We can just use
git_path() for this. It handles paths of any length
(reducing our error handling). And because we feed the
result straight to a system call, we can just use the static
variant.

Note that git_path also knows the string "objects/" is
special, and will replace it with git_object_directory()
when necessary.

Another alternative would be to use sha1_file_name() for the
pretend object "17...", but that ends up being more
hassle for no gain, as we have to truncate the final path
component.

Signed-off-by: Jeff King 
---
 builtin/gc.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index c2c61a57b..2daede782 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -135,8 +135,6 @@ static int too_many_loose_objects(void)
 * distributed, we can check only one and get a reasonable
 * estimate.
 */
-   char path[PATH_MAX];
-   const char *objdir = get_object_directory();
DIR *dir;
struct dirent *ent;
int auto_threshold;
@@ -146,11 +144,7 @@ static int too_many_loose_objects(void)
if (gc_auto_threshold <= 0)
return 0;
 
-   if (sizeof(path) <= snprintf(path, sizeof(path), "%s/17", objdir)) {
-   warning(_("insanely long object directory %.*s"), 50, objdir);
-   return 0;
-   }
-   dir = opendir(path);
+   dir = opendir(git_path("objects/17"));
if (!dir)
return 0;
 
-- 
2.12.2.845.g55fcf8b10



[PATCH 14/18] combine-diff: replace malloc/snprintf with xstrfmt

2017-03-28 Thread Jeff King
There's no need to use the magic "100" when a strbuf can do
it for us.

Signed-off-by: Jeff King 
---
 combine-diff.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/combine-diff.c b/combine-diff.c
index 59501db99..d3560573a 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -292,9 +292,10 @@ static char *grab_blob(const struct object_id *oid, 
unsigned int mode,
enum object_type type;
 
if (S_ISGITLINK(mode)) {
-   blob = xmalloc(100);
-   *size = snprintf(blob, 100,
-"Subproject commit %s\n", oid_to_hex(oid));
+   struct strbuf buf = STRBUF_INIT;
+   strbuf_addf(, "Subproject commit %s\n", oid_to_hex(oid));
+   *size = buf.len;
+   blob = strbuf_detach(, NULL);
} else if (is_null_oid(oid)) {
/* deleted blob */
*size = 0;
-- 
2.12.2.845.g55fcf8b10



[PATCH 05/18] tag: use strbuf to format tag header

2017-03-28 Thread Jeff King
We format the tag header into a fixed 1024-byte buffer. But
since the tag-name and tagger ident can be arbitrarily
large, we may unceremoniously die with "tag header too big".
Let's just use a strbuf instead.

Note that it looks at first glance like we can just format
this directly into the "buf" strbuf where it will ultimately
go. But that buffer may already contain the tag message, and
we have no easy way to prepend formatted data to a strbuf
(we can only splice in an already-generated buffer). This
isn't a performance-critical path, so going through an extra
buffer isn't a big deal.

Signed-off-by: Jeff King 
---
 builtin/tag.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index ad29be692..045e7ad23 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -231,26 +231,22 @@ static void create_tag(const unsigned char *object, const 
char *tag,
   unsigned char *prev, unsigned char *result)
 {
enum object_type type;
-   char header_buf[1024];
-   int header_len;
+   struct strbuf header = STRBUF_INIT;
char *path = NULL;
 
type = sha1_object_info(object, NULL);
if (type <= OBJ_NONE)
die(_("bad object type."));
 
-   header_len = snprintf(header_buf, sizeof(header_buf),
- "object %s\n"
- "type %s\n"
- "tag %s\n"
- "tagger %s\n\n",
- sha1_to_hex(object),
- typename(type),
- tag,
- git_committer_info(IDENT_STRICT));
-
-   if (header_len > sizeof(header_buf) - 1)
-   die(_("tag header too big."));
+   strbuf_addf(,
+   "object %s\n"
+   "type %s\n"
+   "tag %s\n"
+   "tagger %s\n\n",
+   sha1_to_hex(object),
+   typename(type),
+   tag,
+   git_committer_info(IDENT_STRICT));
 
if (!opt->message_given) {
int fd;
@@ -288,7 +284,8 @@ static void create_tag(const unsigned char *object, const 
char *tag,
if (!opt->message_given && !buf->len)
die(_("no tag message?"));
 
-   strbuf_insert(buf, 0, header_buf, header_len);
+   strbuf_insert(buf, 0, header.buf, header.len);
+   strbuf_release();
 
if (build_tag_object(buf, opt->sign, result) < 0) {
if (path)
-- 
2.12.2.845.g55fcf8b10



[PATCH 10/18] create_branch: use xstrfmt for reflog message

2017-03-28 Thread Jeff King
We generate a reflog message that contains some fixed text
plus a branch name, and use a buffer of size PATH_MAX + 20.
This mostly works if you assume that refnames are shorter
than PATH_MAX, but:

  1. That's not necessarily true. PATH_MAX is not always the
 filesystem's limit.

  2. The "20" is not sufficiently large for the fixed text
 anyway.

Let's just switch to a heap buffer so we don't have to even
care.

Signed-off-by: Jeff King 
---
 branch.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/branch.c b/branch.c
index 6d0ca94cc..ad5a2299b 100644
--- a/branch.c
+++ b/branch.c
@@ -296,14 +296,12 @@ void create_branch(const char *name, const char 
*start_name,
if (!dont_change_ref) {
struct ref_transaction *transaction;
struct strbuf err = STRBUF_INIT;
-   char msg[PATH_MAX + 20];
+   char *msg;
 
if (forcing)
-   snprintf(msg, sizeof msg, "branch: Reset to %s",
-start_name);
+   msg = xstrfmt("branch: Reset to %s", start_name);
else
-   snprintf(msg, sizeof msg, "branch: Created from %s",
-start_name);
+   msg = xstrfmt("branch: Created from %s", start_name);
 
transaction = ref_transaction_begin();
if (!transaction ||
@@ -314,6 +312,7 @@ void create_branch(const char *name, const char *start_name,
die("%s", err.buf);
ref_transaction_free(transaction);
strbuf_release();
+   free(msg);
}
 
if (real_ref && track)
-- 
2.12.2.845.g55fcf8b10



[PATCH 08/18] avoid using mksnpath for refs

2017-03-28 Thread Jeff King
Like the previous commit, we'd like to avoid the assumption
that refs fit into PATH_MAX-sized buffers. These callsites
have an extra twist, though: they write the refnames using
mksnpath. This does two things beyond a regular snprintf:

  1. It quietly writes "/bad-path/" when truncation occurs.
 This saves the caller having to check the error code,
 but if you aren't actually feeding the result to a
 system call (and we aren't here), it's questionable.

  2. It calls cleanup_path(), which removes leading
 instances of "./".  That's questionable when dealing
 with refnames, as we could silently canonicalize a
 syntactically bogus refname into a valid one.

Let's convert each case to use a strbuf. This is preferable
to xstrfmt() because we can reuse the same buffer as we
loop.

Signed-off-by: Jeff King 
---
 refs.c | 44 ++--
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/refs.c b/refs.c
index e7606716d..9de6979ac 100644
--- a/refs.c
+++ b/refs.c
@@ -429,29 +429,31 @@ int expand_ref(const char *str, int len, unsigned char 
*sha1, char **ref)
 {
const char **p, *r;
int refs_found = 0;
+   struct strbuf fullref = STRBUF_INIT;
 
*ref = NULL;
for (p = ref_rev_parse_rules; *p; p++) {
-   char fullref[PATH_MAX];
unsigned char sha1_from_ref[20];
unsigned char *this_result;
int flag;
 
this_result = refs_found ? sha1_from_ref : sha1;
-   mksnpath(fullref, sizeof(fullref), *p, len, str);
-   r = resolve_ref_unsafe(fullref, RESOLVE_REF_READING,
+   strbuf_reset();
+   strbuf_addf(, *p, len, str);
+   r = resolve_ref_unsafe(fullref.buf, RESOLVE_REF_READING,
   this_result, );
if (r) {
if (!refs_found++)
*ref = xstrdup(r);
if (!warn_ambiguous_refs)
break;
-   } else if ((flag & REF_ISSYMREF) && strcmp(fullref, "HEAD")) {
-   warning("ignoring dangling symref %s.", fullref);
-   } else if ((flag & REF_ISBROKEN) && strchr(fullref, '/')) {
-   warning("ignoring broken ref %s.", fullref);
+   } else if ((flag & REF_ISSYMREF) && strcmp(fullref.buf, 
"HEAD")) {
+   warning("ignoring dangling symref %s.", fullref.buf);
+   } else if ((flag & REF_ISBROKEN) && strchr(fullref.buf, '/')) {
+   warning("ignoring broken ref %s.", fullref.buf);
}
}
+   strbuf_release();
return refs_found;
 }
 
@@ -460,21 +462,22 @@ int dwim_log(const char *str, int len, unsigned char 
*sha1, char **log)
char *last_branch = substitute_branch_name(, );
const char **p;
int logs_found = 0;
+   struct strbuf path = STRBUF_INIT;
 
*log = NULL;
for (p = ref_rev_parse_rules; *p; p++) {
unsigned char hash[20];
-   char path[PATH_MAX];
const char *ref, *it;
 
-   mksnpath(path, sizeof(path), *p, len, str);
-   ref = resolve_ref_unsafe(path, RESOLVE_REF_READING,
+   strbuf_reset();
+   strbuf_addf(, *p, len, str);
+   ref = resolve_ref_unsafe(path.buf, RESOLVE_REF_READING,
 hash, NULL);
if (!ref)
continue;
-   if (reflog_exists(path))
-   it = path;
-   else if (strcmp(ref, path) && reflog_exists(ref))
+   if (reflog_exists(path.buf))
+   it = path.buf;
+   else if (strcmp(ref, path.buf) && reflog_exists(ref))
it = ref;
else
continue;
@@ -485,6 +488,7 @@ int dwim_log(const char *str, int len, unsigned char *sha1, 
char **log)
if (!warn_ambiguous_refs)
break;
}
+   strbuf_release();
free(last_branch);
return logs_found;
 }
@@ -944,6 +948,7 @@ char *shorten_unambiguous_ref(const char *refname, int 
strict)
static char **scanf_fmts;
static int nr_rules;
char *short_name;
+   struct strbuf resolved_buf = STRBUF_INIT;
 
if (!nr_rules) {
/*
@@ -1002,7 +1007,6 @@ char *shorten_unambiguous_ref(const char *refname, int 
strict)
 */
for (j = 0; j < rules_to_fail; j++) {
const char *rule = ref_rev_parse_rules[j];
-   char refname[PATH_MAX];
 
/* skip matched rule */
if (i == j)
@@ -1013,9 +1017,10 @@ char *shorten_unambiguous_ref(const char *refname, int 
strict)
   

[PATCH 11/18] name-rev: replace static buffer with strbuf

2017-03-28 Thread Jeff King
When name-rev needs to format an actual name, we do so into
a fixed-size buffer. That includes the actual ref tip, as
well as any traversal information. Since refs can exceed
1024 bytes, this means you can get a bogus result. E.g.,
doing:

   git tag $(perl -e 'print join("/", 1..1024)')
   git describe --contains HEAD^

results in ".../282/283", when it should be
".../1023/1024~1".

We can solve this by using a heap buffer. We'll use a
strbuf, which lets us write into the same buffer from our
loop without having to reallocate.

Signed-off-by: Jeff King 
---
 builtin/name-rev.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 8bdc3eaa6..92a5d8a5d 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -238,10 +238,9 @@ static const char *get_exact_ref_match(const struct object 
*o)
return NULL;
 }
 
-/* returns a static buffer */
-static const char *get_rev_name(const struct object *o)
+/* may return a constant string or use "buf" as scratch space */
+static const char *get_rev_name(const struct object *o, struct strbuf *buf)
 {
-   static char buffer[1024];
struct rev_name *n;
struct commit *c;
 
@@ -258,10 +257,9 @@ static const char *get_rev_name(const struct object *o)
int len = strlen(n->tip_name);
if (len > 2 && !strcmp(n->tip_name + len - 2, "^0"))
len -= 2;
-   snprintf(buffer, sizeof(buffer), "%.*s~%d", len, n->tip_name,
-   n->generation);
-
-   return buffer;
+   strbuf_reset(buf);
+   strbuf_addf(buf, "%.*s~%d", len, n->tip_name, n->generation);
+   return buf->buf;
}
 }
 
@@ -271,10 +269,11 @@ static void show_name(const struct object *obj,
 {
const char *name;
const struct object_id *oid = >oid;
+   struct strbuf buf = STRBUF_INIT;
 
if (!name_only)
printf("%s ", caller_name ? caller_name : oid_to_hex(oid));
-   name = get_rev_name(obj);
+   name = get_rev_name(obj, );
if (name)
printf("%s\n", name);
else if (allow_undefined)
@@ -283,6 +282,7 @@ static void show_name(const struct object *obj,
printf("%s\n", find_unique_abbrev(oid->hash, DEFAULT_ABBREV));
else
die("cannot describe '%s'", oid_to_hex(oid));
+   strbuf_release();
 }
 
 static char const * const name_rev_usage[] = {
@@ -294,6 +294,7 @@ static char const * const name_rev_usage[] = {
 
 static void name_rev_line(char *p, struct name_ref_data *data)
 {
+   struct strbuf buf = STRBUF_INIT;
int forty = 0;
char *p_start;
for (p_start = p; *p; p++) {
@@ -314,7 +315,7 @@ static void name_rev_line(char *p, struct name_ref_data 
*data)
struct object *o =
lookup_object(sha1);
if (o)
-   name = get_rev_name(o);
+   name = get_rev_name(o, );
}
*(p+1) = c;
 
@@ -332,6 +333,8 @@ static void name_rev_line(char *p, struct name_ref_data 
*data)
/* flush */
if (p_start != p)
fwrite(p_start, p - p_start, 1, stdout);
+
+   strbuf_release();
 }
 
 int cmd_name_rev(int argc, const char **argv, const char *prefix)
-- 
2.12.2.845.g55fcf8b10



[PATCH 13/18] replace unchecked snprintf calls with heap buffers

2017-03-28 Thread Jeff King
We'd prefer to avoid unchecked snprintf calls because
truncation can lead to unexpected results.

These are all cases where truncation shouldn't ever happen,
because the input to snprintf is fixed in size. That makes
them candidates for xsnprintf(), but it's simpler still to
just use the heap, and then nobody has to wonder if "100" is
big enough.

We'll use xstrfmt() where possible, and a strbuf when we need
the resulting size or to reuse the same buffer in a loop.

Signed-off-by: Jeff King 
---
 bisect.c | 8 +---
 builtin/index-pack.c | 9 +
 builtin/notes.c  | 9 -
 builtin/rev-parse.c  | 5 +++--
 4 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/bisect.c b/bisect.c
index 30808cadf..d12583eaa 100644
--- a/bisect.c
+++ b/bisect.c
@@ -200,6 +200,7 @@ static struct commit_list *best_bisection_sorted(struct 
commit_list *list, int n
 {
struct commit_list *p;
struct commit_dist *array = xcalloc(nr, sizeof(*array));
+   struct strbuf buf = STRBUF_INIT;
int cnt, i;
 
for (p = list, cnt = 0; p; p = p->next) {
@@ -217,17 +218,18 @@ static struct commit_list *best_bisection_sorted(struct 
commit_list *list, int n
}
QSORT(array, cnt, compare_commit_dist);
for (p = list, i = 0; i < cnt; i++) {
-   char buf[100]; /* enough for dist=%d */
struct object *obj = &(array[i].commit->object);
 
-   snprintf(buf, sizeof(buf), "dist=%d", array[i].distance);
-   add_name_decoration(DECORATION_NONE, buf, obj);
+   strbuf_reset();
+   strbuf_addf(, "dist=%d", array[i].distance);
+   add_name_decoration(DECORATION_NONE, buf.buf, obj);
 
p->item = array[i].commit;
p = p->next;
}
if (p)
p->next = NULL;
+   strbuf_release();
free(array);
return list;
 }
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f4af2ab37..197c51912 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1443,10 +1443,11 @@ static void final(const char *final_pack_name, const 
char *curr_pack_name,
if (!from_stdin) {
printf("%s\n", sha1_to_hex(sha1));
} else {
-   char buf[48];
-   int len = snprintf(buf, sizeof(buf), "%s\t%s\n",
-  report, sha1_to_hex(sha1));
-   write_or_die(1, buf, len);
+   struct strbuf buf = STRBUF_INIT;
+
+   strbuf_addf(, "%s\t%s\n", report, sha1_to_hex(sha1));
+   write_or_die(1, buf.buf, buf.len);
+   strbuf_release();
 
/*
 * Let's just mimic git-unpack-objects here and write
diff --git a/builtin/notes.c b/builtin/notes.c
index 0513f7455..7b891471c 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -554,7 +554,7 @@ static int append_edit(int argc, const char **argv, const 
char *prefix)
struct notes_tree *t;
unsigned char object[20], new_note[20];
const unsigned char *note;
-   char logmsg[100];
+   char *logmsg;
const char * const *usage;
struct note_data d = { 0, 0, NULL, STRBUF_INIT };
struct option options[] = {
@@ -618,17 +618,16 @@ static int append_edit(int argc, const char **argv, const 
char *prefix)
write_note_data(, new_note);
if (add_note(t, object, new_note, combine_notes_overwrite))
die("BUG: combine_notes_overwrite failed");
-   snprintf(logmsg, sizeof(logmsg), "Notes added by 'git notes 
%s'",
-   argv[0]);
+   logmsg = xstrfmt("Notes added by 'git notes %s'", argv[0]);
} else {
fprintf(stderr, _("Removing note for object %s\n"),
sha1_to_hex(object));
remove_note(t, object);
-   snprintf(logmsg, sizeof(logmsg), "Notes removed by 'git notes 
%s'",
-   argv[0]);
+   logmsg = xstrfmt("Notes removed by 'git notes %s'", argv[0]);
}
commit_notes(t, logmsg);
 
+   free(logmsg);
free_note_data();
free_notes(t);
return 0;
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 9e53a1a7c..f54d7b502 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -213,13 +213,14 @@ static int show_abbrev(const unsigned char *sha1, void 
*cb_data)
 
 static void show_datestring(const char *flag, const char *datestr)
 {
-   static char buffer[100];
+   char *buffer;
 
/* date handling requires both flags and revs */
if ((filter & (DO_FLAGS | DO_REVS)) != (DO_FLAGS | DO_REVS))
return;
-   snprintf(buffer, sizeof(buffer), "%s%lu", flag, approxidate(datestr));
+   buffer = xstrfmt("%s%lu", flag, approxidate(datestr));
show(buffer);
+   free(buffer);
 }
 
 static int 

[PATCH 06/18] fetch: use heap buffer to format reflog

2017-03-28 Thread Jeff King
Part of the reflog content comes from the environment, which
can be much larger than our fixed buffer. Let's use a heap
buffer so we avoid truncating it.

Signed-off-by: Jeff King 
---
 builtin/fetch.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index b5ad09d04..4ef7a08af 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -421,7 +421,7 @@ static int s_update_ref(const char *action,
struct ref *ref,
int check_old)
 {
-   char msg[1024];
+   char *msg;
char *rla = getenv("GIT_REFLOG_ACTION");
struct ref_transaction *transaction;
struct strbuf err = STRBUF_INIT;
@@ -431,7 +431,7 @@ static int s_update_ref(const char *action,
return 0;
if (!rla)
rla = default_rla.buf;
-   snprintf(msg, sizeof(msg), "%s: %s", rla, action);
+   msg = xstrfmt("%s: %s", rla, action);
 
transaction = ref_transaction_begin();
if (!transaction ||
@@ -449,11 +449,13 @@ static int s_update_ref(const char *action,
 
ref_transaction_free(transaction);
strbuf_release();
+   free(msg);
return 0;
 fail:
ref_transaction_free(transaction);
error("%s", err.buf);
strbuf_release();
+   free(msg);
return df_conflict ? STORE_REF_ERROR_DF_CONFLICT
   : STORE_REF_ERROR_OTHER;
 }
-- 
2.12.2.845.g55fcf8b10



[PATCH 07/18] avoid using fixed PATH_MAX buffers for refs

2017-03-28 Thread Jeff King
Many functions which handle refs use a PATH_MAX-sized buffer
to do so. This is mostly reasonable as we have to write
loose refs into the filesystem, and at least on Linux the 4K
PATH_MAX is big enough that nobody would care. But:

  1. The static PATH_MAX is not always the filesystem limit.

  2. On other platforms, PATH_MAX may be much smaller.

  3. As we move to alternate ref storage, we won't be bound
 by filesystem limits.

Let's convert these to heap buffers so we don't have to
worry about truncation or size limits.

We may want to eventually constrain ref lengths for sanity
and to prevent malicious names, but we should do so
consistently across all platforms, and in a central place
(like the ref code).

Signed-off-by: Jeff King 
---
 builtin/checkout.c  |  5 ++---
 builtin/ls-remote.c | 10 ++
 builtin/replace.c   | 50 +++---
 builtin/tag.c   | 15 ++-
 4 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 81f07c3ef..93e8dfc82 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -889,11 +889,10 @@ static int check_tracking_name(struct remote *remote, 
void *cb_data)
 static const char *unique_tracking_name(const char *name, struct object_id 
*oid)
 {
struct tracking_name_data cb_data = { NULL, NULL, NULL, 1 };
-   char src_ref[PATH_MAX];
-   snprintf(src_ref, PATH_MAX, "refs/heads/%s", name);
-   cb_data.src_ref = src_ref;
+   cb_data.src_ref = xstrfmt("refs/heads/%s", name);
cb_data.dst_oid = oid;
for_each_remote(check_tracking_name, _data);
+   free(cb_data.src_ref);
if (cb_data.unique)
return cb_data.dst_ref;
free(cb_data.dst_ref);
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index 66cdd45cc..b2d7d5ce6 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -17,17 +17,19 @@ static const char * const ls_remote_usage[] = {
 static int tail_match(const char **pattern, const char *path)
 {
const char *p;
-   char pathbuf[PATH_MAX];
+   char *pathbuf;
 
if (!pattern)
return 1; /* no restriction */
 
-   if (snprintf(pathbuf, sizeof(pathbuf), "/%s", path) > sizeof(pathbuf))
-   return error("insanely long ref %.*s...", 20, path);
+   pathbuf = xstrfmt("/%s", path);
while ((p = *(pattern++)) != NULL) {
-   if (!wildmatch(p, pathbuf, 0, NULL))
+   if (!wildmatch(p, pathbuf, 0, NULL)) {
+   free(pathbuf);
return 1;
+   }
}
+   free(pathbuf);
return 0;
 }
 
diff --git a/builtin/replace.c b/builtin/replace.c
index f83e7b8fc..065515bab 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -93,26 +93,31 @@ typedef int (*each_replace_name_fn)(const char *name, const 
char *ref,
 static int for_each_replace_name(const char **argv, each_replace_name_fn fn)
 {
const char **p, *full_hex;
-   char ref[PATH_MAX];
+   struct strbuf ref = STRBUF_INIT;
+   size_t base_len;
int had_error = 0;
struct object_id oid;
 
+   strbuf_addstr(, git_replace_ref_base);
+   base_len = ref.len;
+
for (p = argv; *p; p++) {
if (get_oid(*p, )) {
error("Failed to resolve '%s' as a valid ref.", *p);
had_error = 1;
continue;
}
-   full_hex = oid_to_hex();
-   snprintf(ref, sizeof(ref), "%s%s", git_replace_ref_base, 
full_hex);
-   /* read_ref() may reuse the buffer */
-   full_hex = ref + strlen(git_replace_ref_base);
-   if (read_ref(ref, oid.hash)) {
+
+   strbuf_setlen(, base_len);
+   strbuf_addstr(, oid_to_hex());
+   full_hex = ref.buf + base_len;
+
+   if (read_ref(ref.buf, oid.hash)) {
error("replace ref '%s' not found.", full_hex);
had_error = 1;
continue;
}
-   if (fn(full_hex, ref, ))
+   if (fn(full_hex, ref.buf, ))
had_error = 1;
}
return had_error;
@@ -129,21 +134,18 @@ static int delete_replace_ref(const char *name, const 
char *ref,
 
 static void check_ref_valid(struct object_id *object,
struct object_id *prev,
-   char *ref,
-   int ref_size,
+   struct strbuf *ref,
int force)
 {
-   if (snprintf(ref, ref_size,
-"%s%s", git_replace_ref_base,
-oid_to_hex(object)) > ref_size - 1)
-   die("replace ref name too long: %.*s...", 50, ref);
-   if (check_refname_format(ref, 0))
-   die("'%s' is not a valid ref name.", 

[PATCH 15/18] convert unchecked snprintf into xsnprintf

2017-03-28 Thread Jeff King
These calls to snprintf should always succeed, because their
input is small and fixed. Let's use xsnprintf to make sure
this is the case (and to make auditing for actual truncation
easier).

These could be candidates for turning into heap buffers, but
they fall into a few broad categories that make it not worth
doing:

  - formatting single numbers is simple enough that we can
see the result should fit

  - the size of a sha1 is likewise well-known, and I didn't
want to cause unnecessary conflicts with the ongoing
process to convert these constants to GIT_MAX_HEXSZ

  - the interface for curl_errorstr is dictated by curl

Signed-off-by: Jeff King 
---
 grep.c  |  4 ++--
 http.c  | 10 +-
 imap-send.c |  2 +-
 sha1_file.c |  4 ++--
 submodule.c |  2 +-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/grep.c b/grep.c
index 0dbdc1d00..39b4b60d2 100644
--- a/grep.c
+++ b/grep.c
@@ -1164,7 +1164,7 @@ static void show_line(struct grep_opt *opt, char *bol, 
char *eol,
}
if (opt->linenum) {
char buf[32];
-   snprintf(buf, sizeof(buf), "%d", lno);
+   xsnprintf(buf, sizeof(buf), "%d", lno);
output_color(opt, buf, strlen(buf), opt->color_lineno);
output_sep(opt, sign);
}
@@ -1651,7 +1651,7 @@ static int grep_source_1(struct grep_opt *opt, struct 
grep_source *gs, int colle
 opt->color_filename);
output_sep(opt, ':');
}
-   snprintf(buf, sizeof(buf), "%u\n", count);
+   xsnprintf(buf, sizeof(buf), "%u\n", count);
opt->output(opt, buf, strlen(buf));
return 1;
}
diff --git a/http.c b/http.c
index 96d84bbed..8d94e2c63 100644
--- a/http.c
+++ b/http.c
@@ -1366,9 +1366,9 @@ static int handle_curl_result(struct slot_results 
*results)
 * FAILONERROR it is lost, so we can give only the numeric
 * status code.
 */
-   snprintf(curl_errorstr, sizeof(curl_errorstr),
-"The requested URL returned error: %ld",
-results->http_code);
+   xsnprintf(curl_errorstr, sizeof(curl_errorstr),
+ "The requested URL returned error: %ld",
+ results->http_code);
}
 
if (results->curl_result == CURLE_OK) {
@@ -1410,8 +1410,8 @@ int run_one_slot(struct active_request_slot *slot,
 {
slot->results = results;
if (!start_active_slot(slot)) {
-   snprintf(curl_errorstr, sizeof(curl_errorstr),
-"failed to start HTTP request");
+   xsnprintf(curl_errorstr, sizeof(curl_errorstr),
+ "failed to start HTTP request");
return HTTP_START_FAILED;
}
 
diff --git a/imap-send.c b/imap-send.c
index 5c7e27a89..857591660 100644
--- a/imap-send.c
+++ b/imap-send.c
@@ -964,7 +964,7 @@ static struct imap_store *imap_open_store(struct 
imap_server_conf *srvc, char *f
int gai;
char portstr[6];
 
-   snprintf(portstr, sizeof(portstr), "%d", srvc->port);
+   xsnprintf(portstr, sizeof(portstr), "%d", srvc->port);
 
memset(, 0, sizeof(hints));
hints.ai_socktype = SOCK_STREAM;
diff --git a/sha1_file.c b/sha1_file.c
index 71063890f..43990dec7 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3762,8 +3762,8 @@ static int for_each_file_in_obj_subdir(int subdir_nr,
char hex[GIT_SHA1_HEXSZ+1];
struct object_id oid;
 
-   snprintf(hex, sizeof(hex), "%02x%s",
-subdir_nr, de->d_name);
+   xsnprintf(hex, sizeof(hex), "%02x%s",
+ subdir_nr, de->d_name);
if (!get_oid_hex(hex, )) {
if (obj_cb) {
r = obj_cb(, path->buf, data);
diff --git a/submodule.c b/submodule.c
index 3200b7bb2..e11ea7d0a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1221,7 +1221,7 @@ static int find_first_merges(struct object_array *result, 
const char *path,
memset(_opts, 0, sizeof(rev_opts));
 
/* get all revisions that merge commit a */
-   snprintf(merged_revision, sizeof(merged_revision), "^%s",
+   xsnprintf(merged_revision, sizeof(merged_revision), "^%s",
oid_to_hex(>object.oid));
init_revisions(, NULL);
rev_opts.submodule = path;
-- 
2.12.2.845.g55fcf8b10



[PATCH 04/18] diff: avoid fixed-size buffer for patch-ids

2017-03-28 Thread Jeff King
To generate a patch id, we format the diff header into a
fixed-size buffer, and then feed the result to our sha1
computation. The fixed buffer has size '4*PATH_MAX + 20',
which in theory accomodates the four filenames plus some
extra data. Except:

  1. The filenames may not be constrained to PATH_MAX. The
 static value may not be a real limit on the current
 filesystem. Moreover, we may compute patch-ids for
 names stored only in git, without touching the current
 filesystem at all.

  2. The 20 bytes is not nearly enough to cover the
 extra content we put in the buffer.

As a result, the data we feed to the sha1 computation may be
truncated, and it's possible that a commit with a very long
filename could erroneously collide in the patch-id space
with another commit. For instance, if one commit modified
"really-long-filename/foo" and another modified "bar" in the
same directory.

In practice this is unlikely. Because the filenames are
repeated, and because there's a single cutoff at the end of
the buffer, the offending filename would have to be on the
order of four times larger than PATH_MAX.

But it's easy to fix by moving to a strbuf.

Technically this may change the output of patch-id for very
long filenames, but it's not worth making an exception for
this in the --stable output. It was a bug, and one that only
affected an unlikely set of paths.  And anyway, the exact
value would have varied from platform to platform depending
on the value of PATH_MAX, so there is no "stable" value.

Signed-off-by: Jeff King 
---
 diff.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 58cb72d7e..89b5dc890 100644
--- a/diff.c
+++ b/diff.c
@@ -4577,7 +4577,7 @@ static int diff_get_patch_id(struct diff_options 
*options, unsigned char *sha1,
int i;
git_SHA_CTX ctx;
struct patch_id_t data;
-   char buffer[PATH_MAX * 4 + 20];
+   struct strbuf buffer = STRBUF_INIT;
 
git_SHA1_Init();
memset(, 0, sizeof(struct patch_id_t));
@@ -4607,10 +4607,11 @@ static int diff_get_patch_id(struct diff_options 
*options, unsigned char *sha1,
diff_fill_sha1_info(p->one);
diff_fill_sha1_info(p->two);
 
+   strbuf_reset();
len1 = remove_space(p->one->path, strlen(p->one->path));
len2 = remove_space(p->two->path, strlen(p->two->path));
if (p->one->mode == 0)
-   len1 = snprintf(buffer, sizeof(buffer),
+   strbuf_addf(,
"diff--gita/%.*sb/%.*s"
"newfilemode%06o"
"---/dev/null"
@@ -4620,7 +4621,7 @@ static int diff_get_patch_id(struct diff_options 
*options, unsigned char *sha1,
p->two->mode,
len2, p->two->path);
else if (p->two->mode == 0)
-   len1 = snprintf(buffer, sizeof(buffer),
+   strbuf_addf(,
"diff--gita/%.*sb/%.*s"
"deletedfilemode%06o"
"---a/%.*s"
@@ -4630,7 +4631,7 @@ static int diff_get_patch_id(struct diff_options 
*options, unsigned char *sha1,
p->one->mode,
len1, p->one->path);
else
-   len1 = snprintf(buffer, sizeof(buffer),
+   strbuf_addf(,
"diff--gita/%.*sb/%.*s"
"---a/%.*s"
"+++b/%.*s",
@@ -4638,14 +4639,16 @@ static int diff_get_patch_id(struct diff_options 
*options, unsigned char *sha1,
len2, p->two->path,
len1, p->one->path,
len2, p->two->path);
-   git_SHA1_Update(, buffer, len1);
+   git_SHA1_Update(, buffer.buf, buffer.len);
 
if (diff_header_only)
continue;
 
if (fill_mmfile(, p->one) < 0 ||
-   fill_mmfile(, p->two) < 0)
+   fill_mmfile(, p->two) < 0) {
+   strbuf_release();
return error("unable to read files to diff");
+   }
 
if (diff_filespec_is_binary(p->one) ||
diff_filespec_is_binary(p->two)) {
@@ -4660,11 +4663,14 @@ static int diff_get_patch_id(struct diff_options 
*options, unsigned char *sha1,
xecfg.ctxlen = 3;
xecfg.flags = 0;
if (xdi_diff_outf(, , patch_id_consume, ,
- , ))
+ , )) {

[PATCH 09/18] create_branch: move msg setup closer to point of use

2017-03-28 Thread Jeff King
In create_branch() we write the reflog msg into a buffer in
the main function, but then use it only inside a
conditional. If you carefully follow the logic, you can
confirm that we never use the buffer uninitialized nor write
when it would not be used. But we can make this a lot more
obvious by simply moving the write step inside the
conditional.

Signed-off-by: Jeff King 
---
 branch.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/branch.c b/branch.c
index 5c12036b0..6d0ca94cc 100644
--- a/branch.c
+++ b/branch.c
@@ -234,7 +234,7 @@ void create_branch(const char *name, const char *start_name,
 {
struct commit *commit;
unsigned char sha1[20];
-   char *real_ref, msg[PATH_MAX + 20];
+   char *real_ref;
struct strbuf ref = STRBUF_INIT;
int forcing = 0;
int dont_change_ref = 0;
@@ -290,19 +290,20 @@ void create_branch(const char *name, const char 
*start_name,
die(_("Not a valid branch point: '%s'."), start_name);
hashcpy(sha1, commit->object.oid.hash);
 
-   if (forcing)
-   snprintf(msg, sizeof msg, "branch: Reset to %s",
-start_name);
-   else if (!dont_change_ref)
-   snprintf(msg, sizeof msg, "branch: Created from %s",
-start_name);
-
if (reflog)
log_all_ref_updates = LOG_REFS_NORMAL;
 
if (!dont_change_ref) {
struct ref_transaction *transaction;
struct strbuf err = STRBUF_INIT;
+   char msg[PATH_MAX + 20];
+
+   if (forcing)
+   snprintf(msg, sizeof msg, "branch: Reset to %s",
+start_name);
+   else
+   snprintf(msg, sizeof msg, "branch: Created from %s",
+start_name);
 
transaction = ref_transaction_begin();
if (!transaction ||
-- 
2.12.2.845.g55fcf8b10



[PATCH 12/18] receive-pack: print --pack-header directly into argv array

2017-03-28 Thread Jeff King
After receive-pack reads the pack header from the client, it
feeds the already-read part to index-pack and unpack-objects
via their --pack-header command-line options.  To do so, we
format it into a fixed buffer, then duplicate it into the
child's argv_array.

Our buffer is long enough to handle any possible input, so
this isn't wrong. But it's more complicated than it needs to
be; we can just argv_array_pushf() the final value and avoid
the intermediate copy. This drops the magic number and is
more efficient, too.

Note that we need to push to the argv_array in order, which
means we can't do the push until we are in the "unpack-objects
versus index-pack" conditional.  Rather than duplicate the
slightly complicated format specifier, I pushed it into a
helper function.

Signed-off-by: Jeff King 
---
 builtin/receive-pack.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index fb2a090a0..2ca93adef 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1634,12 +1634,17 @@ static const char *parse_pack_header(struct pack_header 
*hdr)
 
 static const char *pack_lockfile;
 
+static void push_header_arg(struct argv_array *args, struct pack_header *hdr)
+{
+   argv_array_pushf(args, "--pack_header=%"PRIu32",%"PRIu32,
+   ntohl(hdr->hdr_version), ntohl(hdr->hdr_entries));
+}
+
 static const char *unpack(int err_fd, struct shallow_info *si)
 {
struct pack_header hdr;
const char *hdr_err;
int status;
-   char hdr_arg[38];
struct child_process child = CHILD_PROCESS_INIT;
int fsck_objects = (receive_fsck_objects >= 0
? receive_fsck_objects
@@ -1653,9 +1658,6 @@ static const char *unpack(int err_fd, struct shallow_info 
*si)
close(err_fd);
return hdr_err;
}
-   snprintf(hdr_arg, sizeof(hdr_arg),
-   "--pack_header=%"PRIu32",%"PRIu32,
-   ntohl(hdr.hdr_version), ntohl(hdr.hdr_entries));
 
if (si->nr_ours || si->nr_theirs) {
alt_shallow_file = setup_temporary_shallow(si->shallow);
@@ -1679,7 +1681,8 @@ static const char *unpack(int err_fd, struct shallow_info 
*si)
tmp_objdir_add_as_alternate(tmp_objdir);
 
if (ntohl(hdr.hdr_entries) < unpack_limit) {
-   argv_array_pushl(, "unpack-objects", hdr_arg, NULL);
+   argv_array_push(, "unpack-objects");
+   push_header_arg(, );
if (quiet)
argv_array_push(, "-q");
if (fsck_objects)
@@ -1697,8 +1700,8 @@ static const char *unpack(int err_fd, struct shallow_info 
*si)
} else {
char hostname[256];
 
-   argv_array_pushl(, "index-pack",
-"--stdin", hdr_arg, NULL);
+   argv_array_pushl(, "index-pack", "--stdin", NULL);
+   push_header_arg(, );
 
if (gethostname(hostname, sizeof(hostname)))
xsnprintf(hostname, sizeof(hostname), "localhost");
-- 
2.12.2.845.g55fcf8b10



[PATCH 03/18] odb_mkstemp: use git_path_buf

2017-03-28 Thread Jeff King
Since git_path_buf() is smart enough to replace "objects/"
with the correct object path, we can use it instead of
manually assembling the path. That's slightly shorter, and
will clean up any non-canonical bits in the path.

Signed-off-by: Jeff King 
---
 environment.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/environment.c b/environment.c
index 88276790d..a9bf5658a 100644
--- a/environment.c
+++ b/environment.c
@@ -282,16 +282,14 @@ int odb_mkstemp(struct strbuf *template, const char 
*pattern)
 * restrictive except to remove write permission.
 */
int mode = 0444;
-   strbuf_reset(template);
-   strbuf_addf(template, "%s/%s", get_object_directory(), pattern);
+   git_path_buf(template, "objects/%s", pattern);
fd = git_mkstemp_mode(template->buf, mode);
if (0 <= fd)
return fd;
 
/* slow path */
/* some mkstemp implementations erase template on failure */
-   strbuf_reset(template);
-   strbuf_addf(template, "%s/%s", get_object_directory(), pattern);
+   git_path_buf(template, "objects/%s", pattern);
safe_create_leading_directories(template->buf);
return xmkstemp_mode(template->buf, mode);
 }
-- 
2.12.2.845.g55fcf8b10



[PATCH 02/18] odb_mkstemp: write filename into strbuf

2017-03-28 Thread Jeff King
The odb_mkstemp() function expects the caller to provide a
fixed buffer to write the resulting tempfile name into. But
it creates the template using snprintf without checking the
return value. This means we could silently truncate the
filename.

In practice, it's unlikely that the truncation would end in
the template-pattern that mkstemp needs to open the file. So
we'd probably end up failing either way, unless the path was
specially crafted.

The simplest fix would be to notice the truncation and die.
However, we can observe that most callers immediately
xstrdup() the result anyway. So instead, let's switch to
using a strbuf, which is easier for them (and isn't a big
deal for the other 2 callers, who can just strbuf_release
when they're done with it).

Note that many of the callers used static buffers, but this
was purely to avoid putting a large buffer on the stack. We
never passed the static buffers out of the function, so
there's no complicated memory handling we need to change.

Signed-off-by: Jeff King 
---
 builtin/index-pack.c |  6 +++---
 cache.h  |  2 +-
 environment.c| 16 
 fast-import.c|  9 +
 pack-bitmap-write.c  | 12 +++-
 pack-write.c | 12 ++--
 6 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 49e7175d9..f4af2ab37 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -307,10 +307,10 @@ static const char *open_pack_file(const char *pack_name)
if (from_stdin) {
input_fd = 0;
if (!pack_name) {
-   static char tmp_file[PATH_MAX];
-   output_fd = odb_mkstemp(tmp_file, sizeof(tmp_file),
+   struct strbuf tmp_file = STRBUF_INIT;
+   output_fd = odb_mkstemp(_file,
"pack/tmp_pack_XX");
-   pack_name = xstrdup(tmp_file);
+   pack_name = strbuf_detach(_file, NULL);
} else {
output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 
0600);
if (output_fd < 0)
diff --git a/cache.h b/cache.h
index acad7078d..1c1889428 100644
--- a/cache.h
+++ b/cache.h
@@ -1678,7 +1678,7 @@ extern void pack_report(void);
  * usual "XX" trailer, and the resulting filename is written into the
  * "template" buffer. Returns the open descriptor.
  */
-extern int odb_mkstemp(char *template, size_t limit, const char *pattern);
+extern int odb_mkstemp(struct strbuf *template, const char *pattern);
 
 /*
  * Generate the filename to be used for a pack file with checksum "sha1" and
diff --git a/environment.c b/environment.c
index 2fdba7622..88276790d 100644
--- a/environment.c
+++ b/environment.c
@@ -274,7 +274,7 @@ char *get_object_directory(void)
return git_object_dir;
 }
 
-int odb_mkstemp(char *template, size_t limit, const char *pattern)
+int odb_mkstemp(struct strbuf *template, const char *pattern)
 {
int fd;
/*
@@ -282,18 +282,18 @@ int odb_mkstemp(char *template, size_t limit, const char 
*pattern)
 * restrictive except to remove write permission.
 */
int mode = 0444;
-   snprintf(template, limit, "%s/%s",
-get_object_directory(), pattern);
-   fd = git_mkstemp_mode(template, mode);
+   strbuf_reset(template);
+   strbuf_addf(template, "%s/%s", get_object_directory(), pattern);
+   fd = git_mkstemp_mode(template->buf, mode);
if (0 <= fd)
return fd;
 
/* slow path */
/* some mkstemp implementations erase template on failure */
-   snprintf(template, limit, "%s/%s",
-get_object_directory(), pattern);
-   safe_create_leading_directories(template);
-   return xmkstemp_mode(template, mode);
+   strbuf_reset(template);
+   strbuf_addf(template, "%s/%s", get_object_directory(), pattern);
+   safe_create_leading_directories(template->buf);
+   return xmkstemp_mode(template->buf, mode);
 }
 
 int odb_pack_keep(const char *name)
diff --git a/fast-import.c b/fast-import.c
index 41a539f97..900b9626f 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -890,14 +890,15 @@ static struct tree_content *dup_tree_content(struct 
tree_content *s)
 
 static void start_packfile(void)
 {
-   static char tmp_file[PATH_MAX];
+   struct strbuf tmp_file = STRBUF_INIT;
struct packed_git *p;
struct pack_header hdr;
int pack_fd;
 
-   pack_fd = odb_mkstemp(tmp_file, sizeof(tmp_file),
- "pack/tmp_pack_XX");
-   FLEX_ALLOC_STR(p, pack_name, tmp_file);
+   pack_fd = odb_mkstemp(_file, "pack/tmp_pack_XX");
+   FLEX_ALLOC_STR(p, pack_name, tmp_file.buf);
+   strbuf_release(_file);
+
p->pack_fd = pack_fd;
p->do_not_close = 1;
pack_file = sha1fd(pack_fd, 

[PATCH 01/18] do not check odb_mkstemp return value for errors

2017-03-28 Thread Jeff King
The odb_mkstemp function does not return an error; it dies
on failure instead. But many of its callers compare the
resulting descriptor against -1 and die themselves.

Mostly this is just pointless, but it does raise a question
when looking at the callers: if they show the results of the
"template" buffer after a failure, what's in it? The answer
is: it doesn't matter, because it cannot happen.

So let's make that clear by removing the bogus error checks.
In bitmap_writer_finish(), we can drop the error-handling
code entirely. In the other two cases, it's shared with the
open() in another code path; we can just move the
error-check next to that open() call.

And while we're at it, let's flesh out the function's
docstring a bit to make the error behavior clear.

Signed-off-by: Jeff King 
---
 builtin/index-pack.c | 7 ---
 cache.h  | 5 -
 pack-bitmap-write.c  | 2 --
 pack-write.c | 4 ++--
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 88d205f85..49e7175d9 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -311,10 +311,11 @@ static const char *open_pack_file(const char *pack_name)
output_fd = odb_mkstemp(tmp_file, sizeof(tmp_file),
"pack/tmp_pack_XX");
pack_name = xstrdup(tmp_file);
-   } else
+   } else {
output_fd = open(pack_name, O_CREAT|O_EXCL|O_RDWR, 
0600);
-   if (output_fd < 0)
-   die_errno(_("unable to create '%s'"), pack_name);
+   if (output_fd < 0)
+   die_errno(_("unable to create '%s'"), 
pack_name);
+   }
nothread_data.pack_fd = output_fd;
} else {
input_fd = open(pack_name, O_RDONLY);
diff --git a/cache.h b/cache.h
index db4120c23..acad7078d 100644
--- a/cache.h
+++ b/cache.h
@@ -1673,7 +1673,10 @@ extern struct packed_git *find_sha1_pack(const unsigned 
char *sha1,
 extern void pack_report(void);
 
 /*
- * Create a temporary file rooted in the object database directory.
+ * Create a temporary file rooted in the object database directory, or
+ * die on failure. The filename is taken from "pattern", which should have the
+ * usual "XX" trailer, and the resulting filename is written into the
+ * "template" buffer. Returns the open descriptor.
  */
 extern int odb_mkstemp(char *template, size_t limit, const char *pattern);
 
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 970559601..44492c346 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -517,8 +517,6 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 
int fd = odb_mkstemp(tmp_file, sizeof(tmp_file), 
"pack/tmp_bitmap_XX");
 
-   if (fd < 0)
-   die_errno("unable to create '%s'", tmp_file);
f = sha1fd(fd, tmp_file);
 
memcpy(header.magic, BITMAP_IDX_SIGNATURE, 
sizeof(BITMAP_IDX_SIGNATURE));
diff --git a/pack-write.c b/pack-write.c
index 88bc7f9f7..19cb514ea 100644
--- a/pack-write.c
+++ b/pack-write.c
@@ -77,9 +77,9 @@ const char *write_idx_file(const char *index_name, struct 
pack_idx_entry **objec
} else {
unlink(index_name);
fd = open(index_name, O_CREAT|O_EXCL|O_WRONLY, 0600);
+   if (fd < 0)
+   die_errno("unable to create '%s'", index_name);
}
-   if (fd < 0)
-   die_errno("unable to create '%s'", index_name);
f = sha1fd(fd, index_name);
}
 
-- 
2.12.2.845.g55fcf8b10



[PATCH 0/18] snprintf cleanups

2017-03-28 Thread Jeff King
Our code base calls snprintf() into a fixed-size buffer in a bunch of
places. Sometimes we check the result, and sometimes we accept a silent
truncation. In some cases an overflow is easy given long input. In some
cases it's impossible. And in some cases it depends on how big PATH_MAX
is on your filesystem, and whether it's actually enforced. :)

This series attempts to give more predictable and consistent results by
removing arbitrary buffer limitations. It also tries to make further
audits of snprintf() easier by converting to xsnprintf() where
appropriate.

There are still some snprintf() calls left after this. A few are in code
that's in flux, or is being cleaned up in nearby series (several of my
recent cleanup series were split off from this). A few should probably
remain (e.g., git-daemon will refuse to consider a repo name larger than
PATH_MAX, which may be a reasonable defense against weird memory tricks.
I wouldn't be sad to see this turned into a strbuf with an explicit
length policy enforced separately, though). And there were a few that I
just didn't get around to converting (the dumb-http walker, for example,
but I think it may need a pretty involved audit overall).

It's a lot of patches, but hopefully they're all pretty straightforward
to read.

  [01/18]: do not check odb_mkstemp return value for errors
  [02/18]: odb_mkstemp: write filename into strbuf
  [03/18]: odb_mkstemp: use git_path_buf
  [04/18]: diff: avoid fixed-size buffer for patch-ids
  [05/18]: tag: use strbuf to format tag header
  [06/18]: fetch: use heap buffer to format reflog
  [07/18]: avoid using fixed PATH_MAX buffers for refs
  [08/18]: avoid using mksnpath for refs
  [09/18]: create_branch: move msg setup closer to point of use
  [10/18]: create_branch: use xstrfmt for reflog message
  [11/18]: name-rev: replace static buffer with strbuf
  [12/18]: receive-pack: print --pack-header directly into argv array
  [13/18]: replace unchecked snprintf calls with heap buffers
  [14/18]: combine-diff: replace malloc/snprintf with xstrfmt
  [15/18]: convert unchecked snprintf into xsnprintf
  [16/18]: transport-helper: replace checked snprintf with xsnprintf
  [17/18]: gc: replace local buffer with git_path
  [18/18]: daemon: use an argv_array to exec children

 bisect.c   |  8 +---
 branch.c   | 16 
 builtin/checkout.c |  5 ++---
 builtin/fetch.c|  6 --
 builtin/gc.c   |  8 +---
 builtin/index-pack.c   | 22 --
 builtin/ls-remote.c| 10 ++
 builtin/name-rev.c | 21 -
 builtin/notes.c|  9 -
 builtin/receive-pack.c | 17 ++---
 builtin/replace.c  | 50 +++---
 builtin/rev-parse.c|  5 +++--
 builtin/tag.c  | 42 ++
 cache.h|  7 +--
 combine-diff.c |  7 ---
 daemon.c   | 38 +-
 diff.c | 20 +---
 environment.c  | 14 ++
 fast-import.c  |  9 +
 grep.c |  4 ++--
 http.c | 10 +-
 imap-send.c|  2 +-
 pack-bitmap-write.c| 14 +++---
 pack-write.c   | 16 
 refs.c | 44 ++--
 sha1_file.c|  4 ++--
 submodule.c|  2 +-
 transport-helper.c |  5 +
 28 files changed, 215 insertions(+), 200 deletions(-)


Re: [PATCH v2 00/21] object_id part 7

2017-03-28 Thread Junio C Hamano
Jeff King  writes:

> Here's that minor tweak, in case anybody is interested. It's less useful
> without that follow-on that touches "eol" more, but perhaps it increases
> readability on its own.

Yup, the only thing that the original (with Brian's fix) appears to
be more careful about is it tries very hard to avoid setting boc
past eoc.  As we are not checking "boc != eoc" but doing the
comparison, that "careful" appearance does not give us any benefit
in practice, other than having to do an extra "eol ? eol+1 : eoc";
the result of this patch is easier to read.

By the way, eoc is "one past the end" of the array that begins at
boc, so setting a pointer to eoc+1 may technically be in violation.
I do not know how much it matters, though ;-)

> -- >8 --
> Subject: [PATCH] receive-pack: simplify eol handling in cert parsing
>
> The queue_commands_from_cert() function wants to handle each
> line of the cert individually. It looks for "\n" in the
> to-be-parsed bytes, and special-cases each use of eol (the
> end-of-line variable) when we didn't find one.  Instead, we
> can just set the end-of-line variable to end-of-cert in the
> latter case.
>
> For advancing to the next line, it's OK for us to move our
> pointer past end-of-cert, because our loop condition just
> checks for pointer inequality. And it doesn't even violate
> the ANSI C "no more than one past the end of an array" rule,
> because we know in the worst case we've hit the terminating
> NUL of the strbuf.
>
> Signed-off-by: Jeff King 
> ---
>  builtin/receive-pack.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index 5d9e4da0a..58de2a1a9 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -1524,8 +1524,10 @@ static void queue_commands_from_cert(struct command 
> **tail,
>  
>   while (boc < eoc) {
>   const char *eol = memchr(boc, '\n', eoc - boc);
> - tail = queue_command(tail, boc, eol ? eol - boc : eoc - boc);
> - boc = eol ? eol + 1 : eoc;
> + if (!eol)
> + eol = eoc;
> + tail = queue_command(tail, boc, eol - boc);
> + boc = eol + 1;
>   }
>  }


Re: [RFC] should these two topics graduate to 'master' soon?

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 11:51:49AM -0700, Jonathan Nieder wrote:

> > * jc/merge-drop-old-syntax (2015-04-29) 1 commit
> >
> >   This topic stops "git merge  HEAD " syntax that
> >   has been deprecated since October 2007 (and we have issued a
> >   warning message since around v2.5.0 when the ancient syntax was
> >   used).
> >
> > * jk/no-looking-at-dotgit-outside-repo-final (2016-10-26) 1 commit
> >
> >   This is the endgame of the topic to avoid blindly falling back to
> >   ".git" when the setup sequence said we are _not_ in Git repository.
> >   A corner case that happens to work right now may be broken by a call
> >   to die("BUG").
> >
> > I am leaning toward including the former in the upcoming release,
> > whose -rc0 is tentatively scheduled to happen on Apr 20th.  I think
> > the rest of the system is also ready for the latter (back when we
> > merged it to 'next' and started cooking, there were still a few
> > codepaths that triggered its die(), which have been fixed).
> >
> > Opinions?
> 
> Google has been running with both of these for a while.  Any problems
> we ran into were already reported and fixed.  I would be all for
> including them in the next release.

Thanks, I was wondering how much exposure the latter got. It might be a
good idea to merge it to "master" early in the post-2.13 cycle to get a
little more exposure (since the point of it is really to flush out
unusual cases, the more people run it before we make a release the
better). But I'm also OK if it's merged to master this cycle, as long as
it's soon-ish. It's much better to flush out problems in pre-release
master than in a released version.

-Peff


Re: [PATCH v3 0/2] read-cache: call verify_hdr() in a background thread

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 07:07:30PM +, g...@jeffhostetler.com wrote:

> From: Jeff Hostetler 
> 
> Version 3 of this patch series simplifies this effort to just turn
> on/off the hash verification using a "core.checksumindex" config variable.
> 
> I've preserved the original checksum validation code so that we can
> force it on in fsck if desired.
> 
> It eliminates the original threading model completely.
> 
> Jeff Hostetler (2):
>   read-cache: core.checksumindex
>   test-core-checksum-index: core.checksumindex test helper
> 
>  Makefile|  1 +
>  read-cache.c| 12 ++
>  t/helper/.gitignore |  1 +
>  t/helper/test-core-checksum-index.c | 77 
> +

Do we still need test-core-checksum-index? Can we just time ls-files or
something in t/perf?

-Peff


[PATCH v3 0/2] read-cache: call verify_hdr() in a background thread

2017-03-28 Thread git
From: Jeff Hostetler 

Version 3 of this patch series simplifies this effort to just turn
on/off the hash verification using a "core.checksumindex" config variable.

I've preserved the original checksum validation code so that we can
force it on in fsck if desired.

It eliminates the original threading model completely.

Jeff Hostetler (2):
  read-cache: core.checksumindex
  test-core-checksum-index: core.checksumindex test helper

 Makefile|  1 +
 read-cache.c| 12 ++
 t/helper/.gitignore |  1 +
 t/helper/test-core-checksum-index.c | 77 +
 4 files changed, 91 insertions(+)
 create mode 100644 t/helper/test-core-checksum-index.c

-- 
2.9.3



[PATCH v3 2/2] test-core-checksum-index: core.checksumindex test helper

2017-03-28 Thread git
From: Jeff Hostetler 

Created test helper to measure read_index() with and without
core.checksumindex set and report performance.

Signed-off-by: Jeff Hostetler 
---
 Makefile|  1 +
 t/helper/.gitignore |  1 +
 t/helper/test-core-checksum-index.c | 77 +
 3 files changed, 79 insertions(+)
 create mode 100644 t/helper/test-core-checksum-index.c

diff --git a/Makefile b/Makefile
index 9ec6065..6049427 100644
--- a/Makefile
+++ b/Makefile
@@ -607,6 +607,7 @@ PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 TEST_PROGRAMS_NEED_X += test-chmtime
 TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-config
+TEST_PROGRAMS_NEED_X += test-core-checksum-index
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index d6e8b36..f651a6b 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -1,6 +1,7 @@
 /test-chmtime
 /test-ctype
 /test-config
+/test-core-checksum-index
 /test-date
 /test-delta
 /test-dump-cache-tree
diff --git a/t/helper/test-core-checksum-index.c 
b/t/helper/test-core-checksum-index.c
new file mode 100644
index 000..0452dc2
--- /dev/null
+++ b/t/helper/test-core-checksum-index.c
@@ -0,0 +1,77 @@
+#include "cache.h"
+#include "parse-options.h"
+
+uint64_t time_runs(int do_checksum, int count)
+{
+   uint64_t t0, t1;
+   uint64_t sum = 0;
+   uint64_t avg;
+   int i;
+
+   git_config_set_gently("core.checksumindex",
+   (do_checksum ? "true" : "false"));
+
+   for (i = 0; i < count; i++) {
+   t0 = getnanotime();
+   read_cache();
+   t1 = getnanotime();
+
+   sum += (t1 - t0);
+
+   printf("%f %d [cache_nr %d]\n",
+  ((double)(t1 - t0))/10,
+  do_checksum,
+  the_index.cache_nr);
+   fflush(stdout);
+
+   discard_cache();
+   }
+
+   avg = sum / count;
+   if (count > 1) {
+   printf("%f %d avg\n",
+  (double)avg/10,
+  do_checksum);
+   fflush(stdout);
+   }
+
+   return avg;
+}
+
+int cmd_main(int argc, const char **argv)
+{
+   int original_do_checksum;
+   int count = 1;
+   const char *usage[] = {
+   "test-core-checksum-index",
+   NULL
+   };
+   struct option options[] = {
+   OPT_INTEGER('c', "count", , "number of passes"),
+   OPT_END(),
+   };
+   const char *prefix;
+   uint64_t avg_0, avg_1;
+
+   prefix = setup_git_directory();
+   argc = parse_options(argc, argv, prefix, options, usage, 0);
+
+   if (count < 1)
+   die("count must greater than zero");
+
+   /* throw away call to get the index in the disk cache */
+   read_cache();
+   discard_cache();
+
+   git_config_get_bool("core.checksumindex", _do_checksum);
+
+   avg_1 = time_runs(1, count);
+   avg_0 = time_runs(0, count);
+
+   git_config_set_gently("core.checksumindex",
+   ((original_do_checksum) ? "true" : "false"));
+
+   if (avg_0 >= avg_1)
+   die("skipping index checksum verification did not help");
+   return 0;
+}
-- 
2.9.3



[PATCH v3 1/2] read-cache: core.checksumindex

2017-03-28 Thread git
From: Jeff Hostetler 

Teach git to skip verification of the index SHA in verify_hdr()
in read_index().

This is a performance optimization.  The index file SHA verification
can be considered an ancient relic from the early days of git and only
useful for detecting disk corruption.  For small repositories, this
SHA calculation is not that significant, but for gigantic repositories
this calculation adds significant time to every command.

Added "core.checksumindex" to enable/disable the SHA verification.

Signed-off-by: Jeff Hostetler 
---
 read-cache.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 9054369..2ab4b74 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1376,12 +1376,24 @@ static int verify_hdr(struct cache_header *hdr, 
unsigned long size)
git_SHA_CTX c;
unsigned char sha1[20];
int hdr_version;
+   int do_checksum = 0;
 
if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
return error("bad signature");
hdr_version = ntohl(hdr->hdr_version);
if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
return error("bad index version %d", hdr_version);
+
+   /*
+* Since we run very early in command startup, git_config()
+* may not have been called yet and the various "core_*"
+* global variables haven't been set.  So look it up
+* explicitly.
+*/
+   git_config_get_bool("core.checksumindex", _checksum);
+   if (!do_checksum)
+   return 0;
+
git_SHA1_Init();
git_SHA1_Update(, hdr, size - 20);
git_SHA1_Final(sha1, );
-- 
2.9.3



Re: [PATCH RFC 2/2] diff: teach diff to expand tabs in output

2017-03-28 Thread Junio C Hamano
Jacob Keller  writes:

> I'm really not a fan of how the ws code ended up. It seems pretty ugly
> and weird to hack in the expand_tabs stuff here. However, I'm really not
> sure how else I could handle this. Additionally, I'm not 100% sure
> whether this interacts with format-patch or other machinery which may
> well want some way to be excluded. Thoughts?

As long as you do the same as "do we color the output?  no, no, we
are format-patch and must not color" logic to refrain from expanding
the tabs, you should be OK.

> I think there also may be some wonky bits when performing the tab
> expansion during whitespace checks, due to the way we expand, because I
> don't think that the tabexpand function takes into account the "current"
> location when adding a string, so it very well may not be correct I
> am unsure if there is a good way to fix this.

This "feature" is limited to the diff output, so one way may be to
leave the code as-is and pipe the output to a filter that is similar
to /usr/bin/expand but knows that the first column is special (this
is the part that "this is limited to diff" kicks in).  You may even
be able to implement it as a new option to "expand(1)" and then
people who aren't Git users would also benefit.



Re: [RFC] should these two topics graduate to 'master' soon?

2017-03-28 Thread Stefan Beller
On Tue, Mar 28, 2017 at 11:35 AM, Junio C Hamano  wrote:
> There are two topics that are marked as "Will cook in 'next'" for
> practically forever in the "What's cooking" reports.  The world may
> have become ready for one or both of them, in which case we should
> do the merge not too late in the cycle.
>
> * jc/merge-drop-old-syntax (2015-04-29) 1 commit
>
>   This topic stops "git merge  HEAD " syntax that
>   has been deprecated since October 2007 (and we have issued a
>   warning message since around v2.5.0 when the ancient syntax was
>   used).

git-gui has:
82fbd8a (git-gui: maintain backwards compatibility for merge syntax, 2016-10-04)
which was the only blocker IIUC.
So this looks good to me.

>
> * jk/no-looking-at-dotgit-outside-repo-final (2016-10-26) 1 commit
>
>   This is the endgame of the topic to avoid blindly falling back to
>   ".git" when the setup sequence said we are _not_ in Git repository.
>   A corner case that happens to work right now may be broken by a call
>   to die("BUG").
>
> I am leaning toward including the former in the upcoming release,
> whose -rc0 is tentatively scheduled to happen on Apr 20th.  I think
> the rest of the system is also ready for the latter (back when we
> merged it to 'next' and started cooking, there were still a few
> codepaths that triggered its die(), which have been fixed).

Just read through the commit messages of that branch and they
look reasonable, but I refrain from having an opinion here.

Thanks,
Stefan


Re: [RFC] should these two topics graduate to 'master' soon?

2017-03-28 Thread Jonathan Nieder
Hi Junio,

Junio C Hamano wrote:

> There are two topics that are marked as "Will cook in 'next'" for
> practically forever in the "What's cooking" reports.  The world may
> have become ready for one or both of them, in which case we should
> do the merge not too late in the cycle.
>
> * jc/merge-drop-old-syntax (2015-04-29) 1 commit
>
>   This topic stops "git merge  HEAD " syntax that
>   has been deprecated since October 2007 (and we have issued a
>   warning message since around v2.5.0 when the ancient syntax was
>   used).
>
> * jk/no-looking-at-dotgit-outside-repo-final (2016-10-26) 1 commit
>
>   This is the endgame of the topic to avoid blindly falling back to
>   ".git" when the setup sequence said we are _not_ in Git repository.
>   A corner case that happens to work right now may be broken by a call
>   to die("BUG").
>
> I am leaning toward including the former in the upcoming release,
> whose -rc0 is tentatively scheduled to happen on Apr 20th.  I think
> the rest of the system is also ready for the latter (back when we
> merged it to 'next' and started cooking, there were still a few
> codepaths that triggered its die(), which have been fixed).
>
> Opinions?

Google has been running with both of these for a while.  Any problems
we ran into were already reported and fixed.  I would be all for
including them in the next release.

Thanks and hope that helps,
Jonathan


[RFC] should these two topics graduate to 'master' soon?

2017-03-28 Thread Junio C Hamano
There are two topics that are marked as "Will cook in 'next'" for
practically forever in the "What's cooking" reports.  The world may
have become ready for one or both of them, in which case we should
do the merge not too late in the cycle.

* jc/merge-drop-old-syntax (2015-04-29) 1 commit

  This topic stops "git merge  HEAD " syntax that
  has been deprecated since October 2007 (and we have issued a
  warning message since around v2.5.0 when the ancient syntax was
  used).

* jk/no-looking-at-dotgit-outside-repo-final (2016-10-26) 1 commit

  This is the endgame of the topic to avoid blindly falling back to
  ".git" when the setup sequence said we are _not_ in Git repository.
  A corner case that happens to work right now may be broken by a call
  to die("BUG").

I am leaning toward including the former in the upcoming release,
whose -rc0 is tentatively scheduled to happen on Apr 20th.  I think
the rest of the system is also ready for the latter (back when we
merged it to 'next' and started cooking, there were still a few
codepaths that triggered its die(), which have been fixed).

Opinions?





Re: [PATCH] [GSOC] get_non_kept_pack_filenames(): reimplement using iterators

2017-03-28 Thread Stefan Beller
On Mon, Mar 27, 2017 at 6:39 PM, Robert Stanca  wrote:
> Replaces recursive traversing of opendir with dir_iterator.
>
> Signed-off-by: Robert Stanca 
> ---
>  builtin/repack.c | 15 ++-
>  1 file changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 677bc7c..27a5597 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -7,6 +7,8 @@
>  #include "strbuf.h"
>  #include "string-list.h"
>  #include "argv-array.h"
> +#include "iterator.h"
> +#include "dir-iterator.h"
>
>  static int delta_base_offset = 1;
>  static int pack_kept_objects = -1;
> @@ -86,26 +88,21 @@ static void remove_pack_on_signal(int signo)
>   */
>  static void get_non_kept_pack_filenames(struct string_list *fname_list)
>  {
> -   DIR *dir;
> -   struct dirent *e;
> +   struct dir_iterator *diter = dir_iterator_begin(packdir);
> char *fname;
>
> -   if (!(dir = opendir(packdir)))
> -   return;
> -
> -   while ((e = readdir(dir)) != NULL) {
> +   while (dir_iterator_advance(diter) == ITER_OK) {
> size_t len;
> -   if (!strip_suffix(e->d_name, ".pack", ))
> +   if (!strip_suffix(diter->relative_path, ".pack", ))
> continue;
>
> -   fname = xmemdupz(e->d_name, len);
> +   fname = xmemdupz(diter->relative_path, len);
>
> if (!file_exists(mkpath("%s/%s.keep", packdir, fname)))
> string_list_append_nodup(fname_list, fname);
> else
> free(fname);
> }
> -   closedir(dir);
>  }
>
>  static void remove_redundant_pack(const char *dir_name, const char 
> *base_name)
> --
> 2.7.4
>
>
>
>
> Hi , this is my first patch submission for Git Gsoc. I ran full tests and 
> local tests with
> prove --timer --jobs 15 ./t*pack*.sh .
>
> Have a great day,
>  Robert.

Hi and welcome to the Git community!

The patch looks like a faithful conversion with no side effects to me.
Reviewed-by: Stefan Beller 

Note: mail readers usually collapse long signatures (e.g. everything
after "--\n2.7.4"  in this email. So I did not know or even read your
comments on how you tested this patch before hitting reply.
You can also put these lines after the "---" after the sign off,
right before the diff stat.  A good example is
https://public-inbox.org/git/20170324231013.23346-1-ava...@gmail.com/
as the patch is also long enough, such that people may not scroll to the
bottom.  Once you're used to it, it is very easy to spot the chatter
that will not be part of the commit message, though.

Thanks,
Stefan


Re: UNS: Re: cherry-pick --message?

2017-03-28 Thread Jonathan Nieder
Andreas Krey wrote:
> On Tue, 21 Mar 2017 13:33:35 +, Jeff King wrote:

>> Probably "format-patch | sed | am -3" is your best bet if you want to
>> modify the patches in transit _and_ have the user just use normal git
>> tools.
>
> Except that 'git am' doesn't have --no-commit like cherry-pick does. :-(
> It's always something. (Perhaps I'm instead going to rewrite the commit
> before cherry-picking it.)

'git apply --index' can do that.  I agree that it would be sensible
for 'git am' to grow a --no-commit option to do that.

Thanks and hope that helps,
Jonathan


Re: Microproject | Add more builtin patterns for userdiff

2017-03-28 Thread Stefan Beller
On Tue, Mar 28, 2017 at 3:46 AM, Pickfire  wrote:

> EOF
>
> echo '* diff="cpp"' > .gitmodules

Did you mean .gitattributes?


Re: Re: Microproject | Add more builtin patterns for userdiff

2017-03-28 Thread Jacob Keller
On Tue, Mar 28, 2017 at 10:53 AM, Pickfire  wrote:
>
> Yes, I can't reproduce it outside the test suite. I have added the builtin
> and yet the test fails. test_decode_color gets same output as expect but
> still it fails, should I send in the patch?

You also need to ensure you have the exact same color settings as used
by the test scripts.

Thanks,
Jake


Re: [GSoC] Proposal: turn git-add--interactive.perl into a builtin

2017-03-28 Thread Stefan Beller
On Sat, Mar 25, 2017 at 8:15 PM, Daniel Ferreira (theiostream)
 wrote:

> SYNOPSIS
> There are many advantages to converting parts of git that are still
> scripts to C builtins, among which execution speed, improved
> compatibility and code deduplication.

agreed.

> git-add--interactive, one of the most useful features of Git.

knee jerk reaction: I never used it, so it cannot be that important ;)
(I use git-gui, which is essentially the same workflow. There are tons
of ways to accomplish a given goal using Git, so I guess we don't
want to get in an argument here).

>
> FEASIBILITY
>
> There was only one discussion regarding the feasibility of its porting
> (https://public-inbox.org/git/CAP8UFD2PcBsU6=FK4OHVrB7E98ycohS_0pYcbCBar=of1hl...@mail.gmail.com/).
> It resulted in a consensus that doing it would be a task too large –
> although interesting – for GSoC 2015 based on the amount of its lines
> of code. It is, however, only a few lines larger than
> git-rebase--interactive, which has been considered an appropriate
> idea. As such, it looks like a possible project for three months of
> full-time work.

ok, it sounds a challenging project. (currently counting 1750 lines of
code). Scrolling over the source code, there are quite a couple of
functions, where the direct equivalent in C springs to mind.

run_cmd_pipe -> see run-command.h
unquote_path -> unquote_c_style ?
refresh -> update_index_if_able()
list_modified -> iterate over "const struct cache_entry *ce = active_cache[i];"


> PROJECTED TIMELINE
> - Prior to May 4
> -- Refine my basic knowledge of Perl
> -- Craft one or two small patches to some of Git's Perl components
> (preferentially to git-add--interactive itself) to improve my
> understanding of the language and of how Git's Perl scripts actually
> work



>
> - May 4 - May 30
> -- Clarify implementation details with my mentor, and work on a more
> detailed roadmap for the project
> -- Investigate roughly how to replace command invocations from the
> script with actual builtin functions; which Git APIs in Perl already
> have functional equivalents in C; which parts will require a full
> rewrite.

There are different approaches for replacing functionality in another
language. Examples:
* Implement the functionality in C and then have a "flag-day" commit
  783d7e865e (builtin-am: remove redirection to git-am.sh, 2015-08-04)
  This only works when the whole functionality was replaced in prior commits
* Implement partial functionality in C and call it via a helper function.
  3604242f08 (submodule: port init from shell to C, 2016-04-15)
  This works well for only partial conversions (the larger the thing to
  convert the more appealing this is, as it gets code shipped early.)
  When choosing this strategy, this part of the Project would be to
  identify parts that could be ported on its own without much
  additional glue-code.

> - May 30 - June 30 (start of coding period)
> -- Define the architecture of the builtin within git (which
> functions/interfaces will it have? where will its code reside?).
> -- Implement a small subset of the builtin (to be defined with my
> mentor) and glue it into the existing Perl script. Present this as a
> first patch to get feedback early regarding the implementation and
> avoid piling up mistakes early.
> -- Do necessary changes based on this initial review.
> -- Have roughly 1/3 of the script's functionality ported to C.
>
> - June 30 - July 28
> -- Port the remainder of the script to a builtin.
> -- Have a weekly roadmap, sending a part of the patch every 15 days to
> the mailing list for review and to avoid massive commits by the end of
> GSoC.

yeah; send early, send often. ;)

> -- Apply suggestions from community reviews when possible; if not,
> save them for doing toward the end of GSoC (see below).

Please do not underestimate the discussion by community, finding
consensus on list consumes a bit of time in some cases.

> (Note: due to a previous commitment, during a five-day period of July
> I will only be able to work part-time on GSoC. The actual week will be
> known over the next weeks.)

Maybe you want to shift the schedule up to here by one week then?
(e.g. the first period would be April 27  - May 23)

>
> - July 28 - August 29
> -- By the start of this period, send a patch with the builtin fully
> implemented to the mailing list.

/a patch/a patch series consisting of many patches/
Experience shows that smaller patches are easier to review as
it is more focused. Consider e.g. e86ab2c1cd (wt-status: convert
to struct object_id, 2017-02-21) and the parents leading up to this
commit. They work on the same big topic, but focus on very
regional areas to ease review.

> -- Fix bugs, test extensively, possibly extend test coverage for
> git-add--interactive.

AFAICT ('$ git grep "git add -i"') there is only t3701 testing the
interactive add. Maybe we need to add tests first to document
current behavior, before attempting a conversion?


Re: Re: Microproject | Add more builtin patterns for userdiff

2017-03-28 Thread Pickfire
Jacob Keller  wrote:

> On Tue, Mar 28, 2017 at 3:46 AM, Pickfire  wrote:
> > While I was working buildins shell patterns for user diffs. I noticed that
> > the tests t4034 passes but I can reproduce it manually with:
> >
> > mkdir cpp/ && cd cpp/ && git init
> >
> > cat > pre < > Foo():x(0&&1){}
> > cout<<"Hello World!\n"< > 1 -1e10 0xabcdef 'x'
> > [a] a->b a.b
> > !a ~a a++ a-- a*b a
> > a*b a/b a%b
> > a+b a-b
> > a<>b
> > ab a>=b
> > a==b a!=b
> > a
> > a^b
> > a|b
> > a&
> > a||b
> > a?b:z
> > a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
> > a,y
> > a::b
> > EOF
> >
> > cat > post < > Foo() : x(0&42) { bar(x); }
> > cout<<"Hello World?\n"< > (1) (-1e10) (0xabcdef) 'y'
> > [x] x->y x.y
> > !x ~x x++ x-- x*y x
> > x*y x/y x%y
> > x+y x-y
> > x<>y
> > xy x>=y
> > x==y x!=y
> > x
> > x^y
> > x|y
> > x&
> > x||y
> > x?y:z
> > x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
> > x,y
> > x::y
> > EOF
> >
> > echo '* diff="cpp"' > .gitmodules
> > git diff --no-index --color-words pre post > output
> >
> > Surprisingly, it shows (which is very different from the expected output):
> >
> 
> The diff test code uses "test_decode_color" function which decodes the
> color commands into human readable text. From the looks of it, you're
> trying to reproduce the test outside the test suite. However, you're
> not decoding the colors using the test library function, so it doesn't
> look right.

Yes, I can't reproduce it outside the test suite. I have added the builtin
and yet the test fails. test_decode_color gets same output as expect but
still it fails, should I send in the patch?


Re: [PATCH v2 00/21] object_id part 7

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 01:35:36PM -0400, Jeff King wrote:

> I thought I'd knock this out quickly before I forgot about it. But it
> actually isn't so simple.
> 
> The main caller in read_head_info() does indeed just pass strlen(line)
> as the length in each case. But the cert parser really does need us to
> respect the line length. So we either have to pass it in, or tie off the
> string.
> 
> The latter looks something like the patch below (on top of a minor
> tweak around "eol" handling). It's sufficiently ugly that it may not
> count as an actual cleanup, though. I'm OK if we just drop the idea.

Here's that minor tweak, in case anybody is interested. It's less useful
without that follow-on that touches "eol" more, but perhaps it increases
readability on its own.

-- >8 --
Subject: [PATCH] receive-pack: simplify eol handling in cert parsing

The queue_commands_from_cert() function wants to handle each
line of the cert individually. It looks for "\n" in the
to-be-parsed bytes, and special-cases each use of eol (the
end-of-line variable) when we didn't find one.  Instead, we
can just set the end-of-line variable to end-of-cert in the
latter case.

For advancing to the next line, it's OK for us to move our
pointer past end-of-cert, because our loop condition just
checks for pointer inequality. And it doesn't even violate
the ANSI C "no more than one past the end of an array" rule,
because we know in the worst case we've hit the terminating
NUL of the strbuf.

Signed-off-by: Jeff King 
---
 builtin/receive-pack.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 5d9e4da0a..58de2a1a9 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1524,8 +1524,10 @@ static void queue_commands_from_cert(struct command 
**tail,
 
while (boc < eoc) {
const char *eol = memchr(boc, '\n', eoc - boc);
-   tail = queue_command(tail, boc, eol ? eol - boc : eoc - boc);
-   boc = eol ? eol + 1 : eoc;
+   if (!eol)
+   eol = eoc;
+   tail = queue_command(tail, boc, eol - boc);
+   boc = eol + 1;
}
 }
 
-- 
2.12.2.845.g55fcf8b10



Re: [PATCH v2 00/21] object_id part 7

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 11:13:15AM +, brian m. carlson wrote:

> > I suggested an additional cleanup around "linelen" in one patch. In the
> > name of keeping the number of re-rolls sane, I'm OK if we skip that for
> > now (the only reason I mentioned it at all is that you have to justify
> > the caveat in the commit message; with the fix, that justification can
> > go away).
> 
> Let's leave it as it is, assuming Junio's okay with it.  I can send in a
> few more patches to clean that up and use skip_prefix that we can drop
> on top and graduate separately.
> 
> I think the justification is useful as it is, since it explains why we
> no longer want to check that particular value for historical reasons.

I thought I'd knock this out quickly before I forgot about it. But it
actually isn't so simple.

The main caller in read_head_info() does indeed just pass strlen(line)
as the length in each case. But the cert parser really does need us to
respect the line length. So we either have to pass it in, or tie off the
string.

The latter looks something like the patch below (on top of a minor
tweak around "eol" handling). It's sufficiently ugly that it may not
count as an actual cleanup, though. I'm OK if we just drop the idea.

---
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 58de2a1a9..561a982e7 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1483,13 +1483,10 @@ static void execute_commands(struct command *commands,
 }
 
 static struct command **queue_command(struct command **tail,
- const char *line,
- int linelen)
+ const char *line)
 {
struct object_id old_oid, new_oid;
struct command *cmd;
-   const char *refname;
-   int reflen;
const char *p;
 
if (parse_oid_hex(line, _oid, ) ||
@@ -1498,9 +1495,7 @@ static struct command **queue_command(struct command 
**tail,
*p++ != ' ')
die("protocol error: expected old/new/ref, got '%s'", line);
 
-   refname = p;
-   reflen = linelen - (p - line);
-   FLEX_ALLOC_MEM(cmd, ref_name, refname, reflen);
+   FLEX_ALLOC_STR(cmd, ref_name, p);
oidcpy(>old_oid, _oid);
oidcpy(>new_oid, _oid);
*tail = cmd;
@@ -1510,7 +1505,7 @@ static struct command **queue_command(struct command 
**tail,
 static void queue_commands_from_cert(struct command **tail,
 struct strbuf *push_cert)
 {
-   const char *boc, *eoc;
+   char *boc, *eoc;
 
if (*tail)
die("protocol error: got both push certificate and unsigned 
commands");
@@ -1523,10 +1518,17 @@ static void queue_commands_from_cert(struct command 
**tail,
eoc = push_cert->buf + parse_signature(push_cert->buf, push_cert->len);
 
while (boc < eoc) {
-   const char *eol = memchr(boc, '\n', eoc - boc);
+   char *eol = memchr(boc, '\n', eoc - boc);
+   char tmp;
+
if (!eol)
eol = eoc;
-   tail = queue_command(tail, boc, eol - boc);
+
+   tmp = *eol;
+   *eol = '\0';
+   tail = queue_command(tail, boc);
+   *eol = tmp;
+
boc = eol + 1;
}
 }
@@ -1590,7 +1592,7 @@ static struct command *read_head_info(struct oid_array 
*shallow)
continue;
}
 
-   p = queue_command(p, line, linelen);
+   p = queue_command(p, line);
}
 
if (push_cert.len)


Re: [PATCH v2 00/21] object_id part 7

2017-03-28 Thread Junio C Hamano
Jeff King  writes:

> On Sun, Mar 26, 2017 at 04:01:22PM +, brian m. carlson wrote:
>
>> This is part 7 in the continuing transition to use struct object_id.
>> 
>> This series focuses on two main areas: adding two constants for the
>> maximum hash size we'll be using (which will be suitable for allocating
>> memory) and converting struct sha1_array to struct oid_array.
>
> Both changes are very welcome. I do think it's probably worth changing
> the name of sha1-array.[ch], but it doesn't need to happen immediately.
>
> I read through the whole series and didn't find anything objectionable.
> The pointer-arithmetic fix should perhaps graduate separately.

I didn't see anything incorrect when I queued the series, either,
and after I re-read it I saw a few minor readability issues, but
modulo that this looks ready.  I did split the push-cert parsing fix
and applied to an older base independently, though.

> I suggested an additional cleanup around "linelen" in one patch. In the
> name of keeping the number of re-rolls sane, I'm OK if we skip that for
> now (the only reason I mentioned it at all is that you have to justify
> the caveat in the commit message; with the fix, that justification can
> go away).

A follow-up after the dust settles could also mention "we earlier
mentioned this caveat but with this fix we no longer have to worry
about it", no?


Thanks both, anyways.


Re: [PATCH v2 16/21] Make sha1_array_append take a struct object_id *

2017-03-28 Thread Junio C Hamano
"brian m. carlson"  writes:

> Convert the callers to pass struct object_id by changing the function
> declaration and definition and applying the following semantic patch:
>
> @@
> expression E1, E2, E3;
> @@
> - sha1_array_append(E1, E2[E3].hash)
> + sha1_array_append(E1, E2 + E3)
>
> @@
> expression E1, E2;
> @@
> - sha1_array_append(E1, E2.hash)
> + sha1_array_append(E1, )

I noticed something similar in the change to bisect.c while reading
the previous step, and I suspect that the above two rules leave
somewhat inconsistent and harder-to-read result.  Wouldn't it make
the result more readable if the former rule were

-sha1_array_append(E1, E2[E3].hash)
+sha1_array_append(E1, [E3])


FWIW, the bit that made me read it twice in the previous step was
this change

-   strbuf_addstr(_hexs, sha1_to_hex(array->sha1[i]));
+   strbuf_addstr(_hexs, oid_to_hex(array->oid + i));

which I would have written &(array->oid[i]) instead.

After all, the original written by a human said E2[E3].hash (or
array->sha1[i]) because to the human's mind, E2 is a series of
things that can be indexed with an int E3, and even though 

*(E2 + E3)
E2[E3]
E3[E2]

all mean the same thing, the human decided that E2[E3] is the most
natural way to express this particular reference to an item in the
array.  [E3] would keep that intention by the original author
better than E2 + E3.

The above comment does not affect the correctness of the conversion,
but I think it would affect the readability of the resulting code.


Re: [PATCH v3 0/2] [GSoC] remove_subtree(): reimplement using iterators

2017-03-28 Thread Stefan Beller
On Sat, Mar 25, 2017 at 11:12 AM, Daniel Ferreira  wrote:
> This is the third version of the GSoC microproject
> of refactoring remove_subtree() from recursively using
> readdir() to use dir_iterator. Below are the threads for
> other versions:
>
> v1: 
> https://public-inbox.org/git/CAGZ79kZwT-9mHTiOJ5CEjk2wDFkn6+NcogjX0=vjhsah16a...@mail.gmail.com/T/#mae023e7a7d7626f00e0923833c4359f5af493730
> v2: 
> https://public-inbox.org/git/cacsjy8dxh-qpbblfafwpawusba9gvxa7x+mxljevykhk1zo...@mail.gmail.com/T/#t
>
> Duy suggested adding features to dir_iterator might go
> beyond the intention of a microproject, but I figured I
> might go for it to learn more about the project.
>
> The dir_iterator reimplementation has been tested in a
> separate binary I created (and linked with libgit.a) to
> reproduce remove_subtree()'s contents. As pointed out in the
> last thread, git's tests for this function were unable to
> catch a daunting bug I had introduced, and I still haven't
> been able to come up with a way to reproduce remove_subtree()
> being called. Any help?
>

I would think a test llike the following would work:

test_expect_success 'remove nested subtrees' '
test_commit initial &&
mkdir -p dir/with/nested/dir &&
echo content >dir/with/nested/dir/file &&
echo content >dir/file &&
git add dir/with/nested/dir/file dir/file &&
git commit -a -m "commit directory structure" &&
git checkout initial &&
! test dir
'


Re: [PATCH v2 06/21] builtin/receive-pack: fix incorrect pointer arithmetic

2017-03-28 Thread Junio C Hamano
Jeff King  writes:

> The patch itself is obviously an improvement. It may be worth graduating
> separately from the rest of the series.

Yup, I will split it out to bc/push-cert-receive-fix that builds
directly on an ancient jc/push-cert topic that was merged at
v2.2.0-rc0~64.

I'll need to drop the duplicate from bc/object-id topic, of course
(which hasn't happend).

Thanks.



Re: What's cooking in git.git (Mar 2017, #11; Mon, 27)

2017-03-28 Thread Stefan Beller
> * sb/submodule-short-status (2017-03-27) 7 commits
>  - submodule.c: correctly handle nested submodules in is_submodule_modified
>  - short status: improve reporting for submodule changes
>  - submodule.c: stricter checking for submodules in is_submodule_modified
>  - submodule.c: port is_submodule_modified to use porcelain 2
>  - submodule.c: convert is_submodule_modified to use strbuf_getwholeline
>  - submodule.c: factor out early loop termination in is_submodule_modified
>  - submodule.c: use argv_array in is_submodule_modified
>
>  The output from "git status --short" has been extended to show
>  various kinds of dirtyness in submodules differently; instead of to
>  "M" for modified, 'm' and '?' can be shown to signal changes only
>  to the working tree of the submodule but not the commit that is
>  checked out.
>
>  Waiting for further comments.
>  The endgame looked mostly OK.

I will reroll the top most commit

>  - submodule.c: correctly handle nested submodules in is_submodule_modified

per jrnieder's request to explain itself more (via tests, documentation)

Thanks,
Stefan


Re: [GSoC] Proposal Discussion

2017-03-28 Thread Devin Lehmacher
On Tue, Mar 28, 2017 at 07:52:42AM +0200, Christian Couder wrote:
> Hi,
>
> On Tue, Mar 28, 2017 at 12:17 AM, Devin Lehmacher  wrote:
> > Hello everyone,
> >
> > I am a student studying Computer Science at Cornell University. I
> > already completed a microproject, Move ~/.git-credential-cache/socket to
> > $XDG_CACHE_HOME/credential/socket a week and a half ago or so.
>
> Nice. It would be better though if you could provide a link to the
> thread where your microproject was discussed. If it has been merged to
> master, you could also provide the merge commit. Otherwise please tell
> what is its branch name and current status in the last "What's cooking
> in git.git" email from Junio.

Here is the merge commit into master: 78cf8efec34c419ecea86bc8d1fe47ec0b51ba37

> > I am interested in 2 different projects and would like some advice on
> > them, to help me decide which one to submit a proposal for.
> >
> > 1. `git rebase -i` conversion.
> >I was initially the most interested in this project but realize that
> >after having a very busy week last week that Ivan Tham started
> >[discussion][1] about this project. Would it be appropriate to submit
> >a proposal for a project that someone else also wants to work on?
>
> Yes, it is ok. Obviously only one student/proposal can be selected for
> a given project, but as anyway the main constraint for us is usually
> the number of available mentors, there is a low chance that this would
> prevent us from selecting one more student than we could otherwise
> select.
>
> You could also submit 2 proposals if you have time to work on more than one.

Ok! I think I will post rough drafts of both proposals sometime tomorrow
or maybe later today if I have time.

> > 2. formatting tool improvements.
> >There are four different git commands mentioned [here][2] as possible
> >tools to improve as can be seen in the email. Of those I think it
> >would make the most sense to extend `git name-rev`. It seems best
> >suited to the desired behavior. It would need to be extended to
> >understand rev's that refer to objects rather than just a commit-ish
> >and also add formatting support similar to the information that log
> >and for-each-ref can output. Since this doesn't seem like much work,
> >would it be feasible to generalize and somewhat standardize all of
> >the formatting commands?
>
> Yeah, I think it would be good. It might involve a lot of discussion
> though and this could slow your project.
> So if you really want to do it, my advice is to try to start the
> discussion as soon as possible, that is now.
>
> To do that you could for example Cc people involved in the email
> discussions, and try to come up with concrete proposals about how to
> generalize and standardize the formatting commands.

I will try to send out an email later this afternoon with a preliminary
plan and to start discussion about how best to rework formatting
commands.

Thanks,
Devin


Re: [PATCH v2 1/2] read-cache: skip index SHA verification

2017-03-28 Thread Jeff King
On Tue, Mar 28, 2017 at 11:27:19AM -0400, Jeff Hostetler wrote:

> > Hrm, there shouldn't be any dependency of the config on the index (and
> > there are a handful of options which impact the index already). Did you
> > try it and run into problems?
> 
> Yeah, I tried adding a new "core.verifyindex" property and the
> corresponding global variable.  But read_index() and verify_hdr()
> was being called BEFORE the config was loaded.  And it wasn't clear
> how best to solve that.
> 
> The issue was in "git status" where cmd_status() called
> status_init_config() which called gitmodules_config() before
> git_config().  but gitmodules_config() called read_index(),
> so my settings weren't loaded yet in verify_hdr().

Ugh, yeah, the callback-oriented interface suffers from these kind of
dependency cycles. You can fix it by doing a limited "basic config that
should always be loaded" git_config() call before anything else, and
then following up with the application-level config.

For something low-level that should _always_ be respected, even in
plumbing programs, I think we're better off lazy-loading the config
inside the function. The configset cache makes them more or less free.

I.e., something like:

diff --git a/read-cache.c b/read-cache.c
index e44775182..89bbf8d1e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1376,17 +1376,23 @@ static int verify_hdr(struct cache_header *hdr, 
unsigned long size)
git_SHA_CTX c;
unsigned char sha1[20];
int hdr_version;
+   int do_checksum = 0;
 
if (hdr->hdr_signature != htonl(CACHE_SIGNATURE))
return error("bad signature");
hdr_version = ntohl(hdr->hdr_version);
if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
return error("bad index version %d", hdr_version);
-   git_SHA1_Init();
-   git_SHA1_Update(, hdr, size - 20);
-   git_SHA1_Final(sha1, );
-   if (hashcmp(sha1, (unsigned char *)hdr + size - 20))
-   return error("bad index file sha1 signature");
+
+   git_config_get_bool("core.checksumindex", _checksum);
+   if (do_checksum) {
+   git_SHA1_Init();
+   git_SHA1_Update(, hdr, size - 20);
+   git_SHA1_Final(sha1, );
+   if (hashcmp(sha1, (unsigned char *)hdr + size - 20))
+   return error("bad index file sha1 signature");
+   }
+
return 0;
 }

-Peff


Re: [PATCHv2 08/14] completion: let 'for-each-ref' and 'ls-remote' filter matching refs

2017-03-28 Thread SZEDER Gábor
On Fri, Mar 24, 2017 at 8:42 PM, Jeff King  wrote:
> On Thu, Mar 23, 2017 at 04:29:18PM +0100, SZEDER Gábor wrote:
>>   case "$cur_" in
>>   refs|refs/*)
>>   format="refname"
>> - refs="${cur_%/*}"
>> + refs=("$match*" "$match*/**")
>>   track=""
>
> Working on the aforementioned patch, I noticed that for-each-ref's
> matching is a little tricky due to its path semantics. So I wanted to
> double-check your patterns. :) I think these should do the right thing.

Yeah, I always thought that it's weird that globbing in for-each-ref
behaves differently from globbing in ls-remote or refspecs, but there
is nothing we can do about it now.

Anyway, this is why the tests added in this patch include e.g. both
'matching-branch' and 'matching/branch'.


Re: [PATCH v2 0/2] read-cache: call verify_hdr() in a background thread

2017-03-28 Thread Jeff Hostetler



On 3/27/2017 6:45 PM, Jeff King wrote:

On Mon, Mar 27, 2017 at 09:09:37PM +, g...@jeffhostetler.com wrote:


From: Jeff Hostetler 

Version 2 of this patch series simplifies this to just
turn off the hash verification.  Independent comments
from Linus and Peff suggested that we could just turn
this off and not worry about it.  So I've updated this
patch to do that.  I added a global variable to allow
the original code path to be used.  I also added a
t/helper command to demonstrate the differences.

On the Linux repo, the effect is rather trivial:

$ ~/work/gfw/t/helper/test-skip-verify-index -c 3
0.029884 0 [cache_nr 57994]
0.031035 0 [cache_nr 57994]
0.024308 0 [cache_nr 57994]
0.028409 0 avg
0.018359 1 [cache_nr 57994]
0.017025 1 [cache_nr 57994]
0.011087 1 [cache_nr 57994]
0.015490 1 avg

On my Windows source tree (450MB index), I'm seeing a
savings of 0.6 seconds -- read_index() went from 1.2 to 0.6
seconds.


Very satisfying. I assume that was with OpenSSL as the SHA-1
implementation (sha1dc would have been much slower on 450MB, I think).

-Peff



Yes, this was with the OpenSSL SHA-1 code in a GfW build.
I haven't played with the sha1dc code yet.

$ $/work/gh_gfw/t/helper/test-skip-verify-index.exe -c 5
1.276485 0 [cache_nr 3077831]
1.261164 0 [cache_nr 3077831]
1.256012 0 [cache_nr 3077831]
1.261411 0 [cache_nr 3077831]
1.266174 0 [cache_nr 3077831]
1.264249 0 avg
0.672057 1 [cache_nr 3077831]
0.666968 1 [cache_nr 3077831]
0.668725 1 [cache_nr 3077831]
0.675879 1 [cache_nr 3077831]
0.670213 1 [cache_nr 3077831]
0.670768 1 avg

Jeff


Re: [PATCH v2 1/2] read-cache: skip index SHA verification

2017-03-28 Thread Jeff Hostetler



On 3/27/2017 6:44 PM, Jeff King wrote:

On Mon, Mar 27, 2017 at 09:09:38PM +, g...@jeffhostetler.com wrote:


From: Jeff Hostetler 

Teach git to skip verification of the index SHA in read_index().

This is a performance optimization.  The index file SHA verification
can be considered an ancient relic from the early days of git and only
useful for detecting disk corruption.  For small repositories, this
SHA calculation is not that significant, but for gigantic repositories
this calculation adds significant time to every command.

I added a global "skip_verify_index" variable to control this and
allow it to be tested.

I did not create a config setting for this because of chicken-n-egg
problems with the loading the config and the index.


Hrm, there shouldn't be any dependency of the config on the index (and
there are a handful of options which impact the index already). Did you
try it and run into problems?


Yeah, I tried adding a new "core.verifyindex" property and the
corresponding global variable.  But read_index() and verify_hdr()
was being called BEFORE the config was loaded.  And it wasn't clear
how best to solve that.

The issue was in "git status" where cmd_status() called
status_init_config() which called gitmodules_config() before
git_config().  but gitmodules_config() called read_index(),
so my settings weren't loaded yet in verify_hdr().

I tried switching the order in status_init_config(), but
that caused errors in t7508 with submodules not being handled
properly.

https://github.com/jeffhostetler/git/commits/upstream/core_verify_index

At this point I decided that it wasn't that important to have
this config setting, since we'll probably default it to be faster
and be done with it.



In general, I'd much rather see us either:

  1. Rip the code out entirely if it is not meant to be configurable,
 and cannot be triggered by the actual git binary.

or

  2. Make it configurable, even if most people wouldn't use it. And then
 have a test to exercise it using a git command (unlike the one-off
 test helper, which isn't run at all).

-Peff



I'm OK with (1) if everyone else is.

Jeff



[PATCH/RFC v2] WIP configurable options facility

2017-03-28 Thread Ævar Arnfjörð Bjarmason
---

FWIW here's the current state of this WIP hack which I worked a bit on
yesterday. I converted it to use a hashmap and got rid of the need to
s/const // the options struct.

I've either converted options that read the config to this already, or
left TODO comments on those that are candidates for migration.

All tests pass with this, but as the various TODO comments note
there's some problems. It's leaking memory currently, and as the
comment in parse_options_step() notes due to how this options code is
called I've had to sprinkle some boilerplate in several parse_*_opt()
functions.

This is getting rid of less code than I'd hoped at first, although
it'll be made up if we make more stuff configurable.

On Tue, Mar 28, 2017 at 7:17 AM, Jeff King  wrote:
> On Sat, Mar 25, 2017 at 11:32:02PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> > So hopefully it's clear that the two are functionally equivalent, and
>> > differ only in syntax (in this case we manually decided which options
>> > are safe to pull from the config, but we'd have to parse the options.log
>> > string, too, and we could make the same decision there).
>>
>> I like the simplicity of this approach a lot. I.e. (to paraphrase it
>> just to make sure we're on the same page): Skip all the complexity of
>> reaching into the getopt guts, and just munge argc/argv given a config
>> we can stick ahead of the getopt (struct options) spec, inject some
>> options at the beginning if they're in the config, and off we go
>> without any further changes to the getopt guts.
>
> Yep, I think that's an accurate description.
>
>> There's two practical issues with this that are easy to solve with my
>> current approach, but I can't find an easy solution to using this
>> method.
>>
>> The first is that we're replacing the semantics of:
>>
>> "If you're specifying it on the command-line, we take it from there,
>> otherwise we use your config, if set, regardless of how the option
>> works"
>>
>> with:
>>
>> "We read your config, inject options implicitly at the start of the
>> command line, and then append whatever command-line you give us"
>>
>> These two are not the same. Consider e.g. the commit.verbose config.
>> With my current patch if have commit.verbose=1 in your config and do
>> "commit --verbose" you just end up with a result equivalent to not
>> having it in your config, but since the --verbose option can be
>> supplied multiple times to increase verbosity with the injection
>> method you'd end up with the equivalent of commit.verbose=2.
>
> Right, for anything where multiple options are meaningful, they'd have
> to give "--no-verbose" to reset the option. In a sense that's less
> friendly, because it's more manual. But it's also less magical, because
> the semantics are clear: the config option behaves exactly as if you
> gave the option on the command line. So for an OPT_STRING_LIST(), you
> could append to the list, or reset it to empty, etc, as you see fit.
>
> But I do agree that it's more manual, and probably would cause some
> confusion.

And some slight breakage of backwards compatibiilty for things that
supply the option on the CLI now, although it wouldn't be a huge deal.

>> I can't think of a good way around that with your proposed approach
>> that doesn't essentially get us back to something very similar to my
>> patch, i.e. we'd need to parse the command-line using the options spec
>> before applying our implicit config.
>
> Yes, the semantics you gave require parsing the options first. I think
> it would be sufficient to just give each "struct option" a "seen" flag
> (rather than having it understand the config mechanism), having
> parse_options() set the flag, and then feeding the result to a separate
> config/cmdline mapping mechanism. That keeps the complexity out of the
> options code.

Right, would require s/const // on the struct though like in my v1, or
keeping this info in some datastructure on the side.

> It does tie us back in to requiring parse-options, which not all the
> options use.

I think if we do keep something like this patch it's fair enough to
say it won't work for everything. It's just there to make it easier
for us to add configuration for options in the common case, but
there's going to be various special cases (e.g. currently the options
synonyms, and I think I'll leave that) that we won't be able to
handle.

> In a lot of cases that "seen" flag is effectively a sentinel value in
> whatever variable the option value is stored in. But some of the options
> don't have reasonable sentinel values (as you noticed with the "revert
> -m" handling recently).
>
>> The second issue is related, i.e. I was going to add some flag an
>> option could supply to say "if I'm provided none of these other
>> maybe-from-config options get to read their config". This is needed
>> for hybrid plumbing/porcelain like "git status --porcelain".
>
> Yeah, I agree you can't make that decision until you've seen the
> command-line 

Re: Microproject | Add more builtin patterns for userdiff

2017-03-28 Thread Jacob Keller
On Tue, Mar 28, 2017 at 3:46 AM, Pickfire  wrote:
> While I was working buildins shell patterns for user diffs. I noticed that
> the tests t4034 passes but I can reproduce it manually with:
>
> mkdir cpp/ && cd cpp/ && git init
>
> cat > pre < Foo():x(0&&1){}
> cout<<"Hello World!\n"< 1 -1e10 0xabcdef 'x'
> [a] a->b a.b
> !a ~a a++ a-- a*b a
> a*b a/b a%b
> a+b a-b
> a<>b
> ab a>=b
> a==b a!=b
> a
> a^b
> a|b
> a&
> a||b
> a?b:z
> a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
> a,y
> a::b
> EOF
>
> cat > post < Foo() : x(0&42) { bar(x); }
> cout<<"Hello World?\n"< (1) (-1e10) (0xabcdef) 'y'
> [x] x->y x.y
> !x ~x x++ x-- x*y x
> x*y x/y x%y
> x+y x-y
> x<>y
> xy x>=y
> x==y x!=y
> x
> x^y
> x|y
> x&
> x||y
> x?y:z
> x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
> x,y
> x::y
> EOF
>
> echo '* diff="cpp"' > .gitmodules
> git diff --no-index --color-words pre post > output
>
> Surprisingly, it shows (which is very different from the expected output):
>

The diff test code uses "test_decode_color" function which decodes the
color commands into human readable text. From the looks of it, you're
trying to reproduce the test outside the test suite. However, you're
not decoding the colors using the test library function, so it doesn't
look right.

Thanks,
Jake


[PATCH RFC 2/2] diff: teach diff to expand tabs in output

2017-03-28 Thread Jacob Keller
From: Jacob Keller 

When creating a diff for contents, we prepend a single character to
represent the state of that line. This character can offset how the tabs
in the real content are displayed, which may result in weird alignment
issues when viewing the diffs. Teach the diff core a new option to
expand these tabs, similar to how we already expand log contents.

This new option can be used to display the lines so that the reader can
see the expected results without the confusion of the offset tabstops
caused by the extra character prepended to each line.

Because some of the printing location is tied up into the whitespace
checking code, we also need to teach this code that it may need to
expand tabs as well. We will expand the output at the last moment so
that the whitespace checks see the contents before it is expanded.

Signed-off-by: Jacob Keller 
---
I'm really not a fan of how the ws code ended up. It seems pretty ugly
and weird to hack in the expand_tabs stuff here. However, I'm really not
sure how else I could handle this. Additionally, I'm not 100% sure
whether this interacts with format-patch or other machinery which may
well want some way to be excluded. Thoughts? Does anyone have a better
implementation?

I think there also may be some wonky bits when performing the tab
expansion during whitespace checks, due to the way we expand, because I
don't think that the tabexpand function takes into account the "current"
location when adding a string, so it very well may not be correct I
am unsure if there is a good way to fix this.

 Documentation/diff-options.txt |  6 
 cache.h|  2 +-
 diff.c | 23 +--
 diff.h |  6 
 t/t4063-diff-expand-tabs.sh| 65 ++
 ws.c   | 42 ++-
 6 files changed, 126 insertions(+), 18 deletions(-)
 create mode 100755 t/t4063-diff-expand-tabs.sh

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 89cc0f48deef..82e314b20b3d 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -329,6 +329,12 @@ endif::git-format-patch[]
the diff-patch output format.  Non default number of
digits can be specified with `--abbrev=`.
 
+--diff-expand-tabs[=]::
+   When showing diff output, expand any tabs into spaces first before
+   printing. The size of the tabs is determined by  which defaults to
+   8 if not provided. --no-diff-expand-tabs or a size of 0 will disable
+   expansion.
+
 -B[][/]::
 --break-rewrites[=[][/]]::
Break complete rewrite changes into pairs of delete and
diff --git a/cache.h b/cache.h
index 5c8078291c47..2e221174edd4 100644
--- a/cache.h
+++ b/cache.h
@@ -2155,7 +2155,7 @@ extern unsigned whitespace_rule_cfg;
 extern unsigned whitespace_rule(const char *);
 extern unsigned parse_whitespace_rule(const char *);
 extern unsigned ws_check(const char *line, int len, unsigned ws_rule);
-extern void ws_check_emit(const char *line, int len, unsigned ws_rule, FILE 
*stream, const char *set, const char *reset, const char *ws);
+extern void ws_check_emit(const char *line, int len, unsigned ws_rule, FILE 
*stream, const char *set, const char *reset, const char *ws, int expand_tabs);
 extern char *whitespace_error_string(unsigned ws);
 extern void ws_fix_copy(struct strbuf *, const char *, int, unsigned, int *);
 extern int ws_blank_line(const char *line, int len, unsigned ws_rule);
diff --git a/diff.c b/diff.c
index 58cb72d7e72a..488019335df7 100644
--- a/diff.c
+++ b/diff.c
@@ -544,7 +544,14 @@ static void emit_line_0(struct diff_options *o, const char 
*set, const char *res
fputs(set, file);
if (!nofirst)
fputc(first, file);
-   fwrite(line, len, 1, file);
+   if (o->expand_tabs) {
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_add_tabexpand(, o->expand_tabs, line, len);
+   fwrite(sb.buf, sb.len, 1, file);
+   strbuf_release();
+   } else {
+   fwrite(line, len, 1, file);
+   }
fputs(reset, file);
}
if (has_trailing_carriage_return)
@@ -595,7 +602,8 @@ static void emit_line_checked(const char *reset,
/* Emit just the prefix, then the rest. */
emit_line_0(ecbdata->opt, set, reset, sign, "", 0);
ws_check_emit(line, len, ecbdata->ws_rule,
- ecbdata->opt->file, set, reset, ws);
+ ecbdata->opt->file, set, reset, ws,
+ ecbdata->opt->expand_tabs);
}
 }
 
@@ -2190,7 +2198,7 @@ static void checkdiff_consume(void *priv, char *line, 
unsigned long len)
free(err);
  

[PATCH RFC 1/2] strbuf: move strbuf_add_tabexpand into strbuf.c

2017-03-28 Thread Jacob Keller
From: Jacob Keller 

In commit 7cc13c717b52 ("pretty: expand tabs in indented logs to make
things line up properly", 2016-03-16) a new function was added to insert
a line into a strbuf while expanding the tabs into spaces. This
functionality was used to help show the log message correctly when it
has been indented, so as to properly display the expected output.

This functionality will be useful in a future patch that adds similar
functionality into git diff, so lets move it into strbuf.c and make it
a public function.

While we're doing this, rename a few of the variables to fix the
surrounding strbuf code.

Signed-off-by: Jacob Keller 
---
 pretty.c | 50 --
 strbuf.c | 50 ++
 strbuf.h |  6 ++
 3 files changed, 56 insertions(+), 50 deletions(-)

diff --git a/pretty.c b/pretty.c
index d0f86f5d85ca..70368509ffea 100644
--- a/pretty.c
+++ b/pretty.c
@@ -1658,56 +1658,6 @@ void pp_title_line(struct pretty_print_context *pp,
strbuf_release();
 }
 
-static int pp_utf8_width(const char *start, const char *end)
-{
-   int width = 0;
-   size_t remain = end - start;
-
-   while (remain) {
-   int n = utf8_width(, );
-   if (n < 0 || !start)
-   return -1;
-   width += n;
-   }
-   return width;
-}
-
-static void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth,
-const char *line, int linelen)
-{
-   const char *tab;
-
-   while ((tab = memchr(line, '\t', linelen)) != NULL) {
-   int width = pp_utf8_width(line, tab);
-
-   /*
-* If it wasn't well-formed utf8, or it
-* had characters with badly defined
-* width (control characters etc), just
-* give up on trying to align things.
-*/
-   if (width < 0)
-   break;
-
-   /* Output the data .. */
-   strbuf_add(sb, line, tab - line);
-
-   /* .. and the de-tabified tab */
-   strbuf_addchars(sb, ' ', tabwidth - (width % tabwidth));
-
-   /* Skip over the printed part .. */
-   linelen -= tab + 1 - line;
-   line = tab + 1;
-   }
-
-   /*
-* Print out everything after the last tab without
-* worrying about width - there's nothing more to
-* align.
-*/
-   strbuf_add(sb, line, linelen);
-}
-
 /*
  * pp_handle_indent() prints out the intendation, and
  * the whole line (without the final newline), after
diff --git a/strbuf.c b/strbuf.c
index 00457940cfc1..6cecfcadb05b 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -275,6 +275,56 @@ void strbuf_commented_addf(struct strbuf *sb, const char 
*fmt, ...)
strbuf_release();
 }
 
+static int find_utf8_width(const char *start, const char *end)
+{
+   int width = 0;
+   size_t remain = end - start;
+
+   while (remain) {
+   int n = utf8_width(, );
+   if (n < 0 || !start)
+   return -1;
+   width += n;
+   }
+   return width;
+}
+
+void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth,
+ const char *buf, size_t size)
+{
+   const char *tab;
+
+   while ((tab = memchr(buf, '\t', size)) != NULL) {
+   int width = find_utf8_width(buf, tab);
+
+   /*
+* If it wasn't well-formed utf8, or it
+* had characters with badly defined
+* width (control characters etc), just
+* give up on trying to align things.
+*/
+   if (width < 0)
+   break;
+
+   /* Output the data .. */
+   strbuf_add(sb, buf, tab - buf);
+
+   /* .. and the de-tabified tab */
+   strbuf_addchars(sb, ' ', tabwidth - (width % tabwidth));
+
+   /* Skip over the printed part .. */
+   size -= tab + 1 - buf;
+   buf = tab + 1;
+   }
+
+   /*
+* Print out everything after the last tab without
+* worrying about width - there's nothing more to
+* align.
+*/
+   strbuf_add(sb, buf, size);
+}
+
 void strbuf_vaddf(struct strbuf *sb, const char *fmt, va_list ap)
 {
int len;
diff --git a/strbuf.h b/strbuf.h
index 80047b1bb7b8..17e04911833e 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -238,6 +238,12 @@ extern void strbuf_splice(struct strbuf *, size_t pos, 
size_t len,
  */
 extern void strbuf_add_commented_lines(struct strbuf *out, const char *buf, 
size_t size);
 
+/**
+ * Add a NUL-terminated string to the buffer. Tabs will be expanded using the
+ * provided tabwidth.
+ */
+extern void strbuf_add_tabexpand(struct strbuf *sb, int tabwidth,
+const 

Re: UNS: Re: cherry-pick --message?

2017-03-28 Thread Andreas Krey
On Tue, 21 Mar 2017 13:33:35 +, Jeff King wrote:
...
> Probably "format-patch | sed | am -3" is your best bet if you want to
> modify the patches in transit _and_ have the user just use normal git
> tools.

Except that 'git am' doesn't have --no-commit like cherry-pick does. :-(
It's always something. (Perhaps I'm instead going to rewrite the commit
before cherry-picking it.)

- Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds 
Date: Fri, 22 Jan 2010 07:29:21 -0800


Re: [PATCH v2 00/21] object_id part 7

2017-03-28 Thread brian m. carlson
On Tue, Mar 28, 2017 at 03:31:59AM -0400, Jeff King wrote:
> I read through the whole series and didn't find anything objectionable.
> The pointer-arithmetic fix should perhaps graduate separately.

Junio's welcome to take that patch separately if he likes.

> I suggested an additional cleanup around "linelen" in one patch. In the
> name of keeping the number of re-rolls sane, I'm OK if we skip that for
> now (the only reason I mentioned it at all is that you have to justify
> the caveat in the commit message; with the fix, that justification can
> go away).

Let's leave it as it is, assuming Junio's okay with it.  I can send in a
few more patches to clean that up and use skip_prefix that we can drop
on top and graduate separately.

I think the justification is useful as it is, since it explains why we
no longer want to check that particular value for historical reasons.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Microproject | Add more builtin patterns for userdiff

2017-03-28 Thread Pickfire
While I was working buildins shell patterns for user diffs. I noticed that
the tests t4034 passes but I can reproduce it manually with:

mkdir cpp/ && cd cpp/ && git init

cat > pre b
ab a>=b
a==b a!=b
a
a^b
a|b
a&
a||b
a?b:z
a=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b
a,y
a::b
EOF

cat > post y
xy x>=y
x==y x!=y
x
x^y
x|y
x&
x||y
x?y:z
x=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y
x,y
x::y
EOF

echo '* diff="cpp"' > .gitmodules
git diff --no-index --color-words pre post > output

Surprisingly, it shows (which is very different from the expected output):

^[[1mdiff --git a/pre b/post^[[m
^[[1mindex 23d5c8a..7e8c026 100644^[[m
^[[1m--- a/pre^[[m
^[[1m+++ b/post^[[m
^[[36m@@ -1,19 +1,19 @@^[[m
^[[31mFoo():x(0&&1){}^[[m^[[32mFoo() : x(0&42) { bar(x); }^[[m
cout<<"Hello ^[[31mWorld!\n"<b a.b^[[m
^[[31m!a ~a a++ a-- a*b a^[[m
^[[31ma*b a/b a%b^[[m
^[[31ma+b a-b^[[m
^[[31ma<>b^[[m
^[[31mab a>=b^[[m
^[[31ma==b a!=b^[[m
^[[31ma^[[m
^[[31ma^b^[[m
^[[31ma|b^[[m
^[[31ma&^[[m
^[[31ma||b^[[m
^[[31ma?b:z^[[m
^[[31ma=b a+=b a-=b a*=b a/=b a%=b a<<=b a>>=b a&=b a^=b a|=b^[[m
^[[31ma,y^[[m
^[[31ma::b^[[m^[[32mWorld?\n"<y x.y^[[m
^[[32m!x ~x x++ x-- x*y x^[[m
^[[32mx*y x/y x%y^[[m
^[[32mx+y x-y^[[m
^[[32mx<>y^[[m
^[[32mxy x>=y^[[m
^[[32mx==y x!=y^[[m
^[[32mx^[[m
^[[32mx^y^[[m
^[[32mx|y^[[m
^[[32mx&^[[m
^[[32mx||y^[[m
^[[32mx?y:z^[[m
^[[32mx=y x+=y x-=y x*=y x/=y x%=y x<<=y x>>=y x&=y x^=y x|=y^[[m
^[[32mx,y^[[m
^[[32mx::y^[[m

Instead of:

diff --git a/pre b/post
index 23d5c8a..7e8c026 100644
--- a/pre
+++ b/post
@@ -1,19 +1,19 @@
Foo() : x(0&&1&42) { bar(x); }
cout<<"Hello World!?\n"<b 
ay x.by
!ax ~a ax x++ 
ax-- ax*b 
ay x&b
ay
x*b ay x/b ay 
x%b
ay
x+b ay x-b
ay
x<>b
ay
xb ay x>=b
ay
x==b ay x!=b
ay
x&b
ay
x^b
ay
x|b
ay
x&&b
ay
x||b
ay
x?by:z
ax=b ay x+=b 
ay x-=b ay x*=b 
ay x/=b ay x%=b 
ay x<<=b ay x>>=b 
ay x&=b ay x^=b 
ay x|=b
ay
x,y
ax::by

That's does not just happens to cpp builtins, it happens to bibtex as well.
Is it that I had missed some configuration since I have tested this on a
few machines?

I have a workaround for that, which is to run log with --color-words=
with regex from the userdiff.c, does it automatically use the builtins diff?


Re: What's cooking in git.git (Mar 2017, #11; Mon, 27)


> On 28 Mar 2017, at 00:35, Junio C Hamano  wrote:
> ...
> 
> * ls/filter-process-delayed (2017-03-06) 1 commit
> - convert: add "status=delayed" to filter process protocol
> 
> What's the status of this one???
> cf. 

I was about to send out a new round last Sunday but then
I discovered a problem. Believe it or not but I am still
working on this :-) ... although way too slow :-(

---

Completely different topic:

Would it be possible to get the travis-ci Windows patch 
queued in pu for testing?
http://public-inbox.org/git/20170324113747.44991-1-larsxschnei...@gmail.com/

Thanks,
Lars


Re: [PATCH v2 00/21] object_id part 7

On Sun, Mar 26, 2017 at 04:01:22PM +, brian m. carlson wrote:

> This is part 7 in the continuing transition to use struct object_id.
> 
> This series focuses on two main areas: adding two constants for the
> maximum hash size we'll be using (which will be suitable for allocating
> memory) and converting struct sha1_array to struct oid_array.

Both changes are very welcome. I do think it's probably worth changing
the name of sha1-array.[ch], but it doesn't need to happen immediately.

I read through the whole series and didn't find anything objectionable.
The pointer-arithmetic fix should perhaps graduate separately.

I suggested an additional cleanup around "linelen" in one patch. In the
name of keeping the number of re-rolls sane, I'm OK if we skip that for
now (the only reason I mentioned it at all is that you have to justify
the caveat in the commit message; with the fix, that justification can
go away).

-Peff


Re: [PATCH v2 16/21] Make sha1_array_append take a struct object_id *

On Sun, Mar 26, 2017 at 04:01:38PM +, brian m. carlson wrote:

> diff --git a/transport.c b/transport.c
> index 8a90b0c29b..e492757726 100644
> --- a/transport.c
> +++ b/transport.c
> @@ -1027,7 +1027,8 @@ int transport_push(struct transport *transport,
>  
>   for (; ref; ref = ref->next)
>   if (!is_null_oid(>new_oid))
> - sha1_array_append(, 
> ref->new_oid.hash);
> + sha1_array_append(,
> +   >new_oid);

Funny that this line wrapped when it got shorter. :)

I think wrapping is the right thing, though (it is longer than 80).

-Peff


Re: [PATCH v2 15/21] sha1-array: convert internal storage for struct sha1_array to object_id

On Sun, Mar 26, 2017 at 04:01:37PM +, brian m. carlson wrote:

> Make the internal storage for struct sha1_array use an array of struct
> object_id internally.  Update the users of this struct which inspect its
> internals.

Yay. I'm happy to see all those gross (*sha1)[20] bits go away.

-Peff


Re: [PATCH v2 07/21] builtin/receive-pack: convert portions to struct object_id

On Sun, Mar 26, 2017 at 04:01:29PM +, brian m. carlson wrote:

> Convert some hardcoded constants into uses of parse_oid_hex.
> Additionally, convert all uses of struct command, and miscellaneous
> other functions necessary for that.  This work is necessary to be able
> to convert sha1_array_append later on.
> 
> To avoid needing to specify a constant, reject shallow lines with the
> wrong length instead of simply ignoring them.

It took me a while to find it. This is the switch from "len == 48" to
"len > 8" when matching "shallow" lines. I think this makes sense.

> Note that in queue_command we are guaranteed to have a NUL-terminated
> buffer or at least one byte of overflow that we can safely read, so the
> linelen check can be elided.  We would die in such a case, but not read
> invalid memory.

I think linelen is always just strlen(line). Since the queue_command
function no longer cares about it, perhaps we can just omit it?

> @@ -1541,12 +1541,12 @@ static struct command *read_head_info(struct 
> sha1_array *shallow)
>   if (!line)
>   break;
>  
> - if (len == 48 && starts_with(line, "shallow ")) {
> - unsigned char sha1[20];
> - if (get_sha1_hex(line + 8, sha1))
> + if (len > 8 && starts_with(line, "shallow ")) {
> + struct object_id oid;
> + if (get_oid_hex(line + 8, ))
>   die("protocol error: expected shallow sha, got 
> '%s'",
>   line + 8);

This would be much nicer as:

  if (skip_prefix(line, "shallow ", ))

It's probably best to keep to one type of cleanup at a time here. I'm
just making a mental note.

-Peff


Re: [PATCH v2 06/21] builtin/receive-pack: fix incorrect pointer arithmetic

On Sun, Mar 26, 2017 at 04:01:28PM +, brian m. carlson wrote:

> If we had already processed the last newline in a push certificate, we
> would end up subtracting NULL from the end-of-certificate pointer when
> computing the length of the line.  This would have resulted in an
> absurdly large length, and possibly a buffer overflow.  Instead,
> subtract the beginning-of-certificate pointer from the
> end-of-certificate pointer, which is what's expected.
> 
> Note that this situation should never occur, since not only do we
> require the certificate to be newline terminated, but the signature will
> only be read from the beginning of a line.  Nevertheless, it seems
> prudent to correct it.

I think you can trigger it with carefully-crafted input. Try this on the
client side:

diff --git a/send-pack.c b/send-pack.c
index 66e652f7e..dd18c9a33 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -311,8 +311,7 @@ static int generate_push_cert(struct strbuf *req_buf,
if (!update_seen)
goto free_return;
 
-   if (sign_buffer(, , signing_key))
-   die(_("failed to sign the push certificate"));
+   strbuf_rtrim();
 
packet_buf_write(req_buf, "push-cert%c%s", 0, cap_string);
for (cp = cert.buf; cp < cert.buf + cert.len; cp = np) {

We omit the signature entirely, which causes the parser to treat the
end of the string as the end-of-cert (we still find the end because the
push-cert-end is in its own pkt-line; you could also just issue a flush,
which has the same effect).

And then the rtrim means that the cert doesn't actually end in a
newline. Running t5534 with this patch causes receive-pack to segfault.
It's not an overflow on writing the buffer, though; the nonsense size is
passed into a FLEX_ALLOC_MEM(). In my test case, I ended up allocating a
1.4GB buffer (which just happened to be the mod-2^32 residue of my "eoc"
pointer), and the segfault comes from trying to read an unallocated
page.

So I don't think it's exploitable in any interesting way.

> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index feafb076a4..116f3177a1 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -1524,7 +1524,7 @@ static void queue_commands_from_cert(struct command 
> **tail,
>  
>   while (boc < eoc) {
>   const char *eol = memchr(boc, '\n', eoc - boc);
> - tail = queue_command(tail, boc, eol ? eol - boc : eoc - eol);
> + tail = queue_command(tail, boc, eol ? eol - boc : eoc - boc);
>   boc = eol ? eol + 1 : eoc;
>   }
>  }

The patch itself is obviously an improvement. It may be worth graduating
separately from the rest of the series.

-Peff


Re: Re: Re: Re: GSoC Project | Convert interactive rebase to C

On Mon, Mar 27, 2017 at 6:31 PM, Pickfire  wrote:
> Johannes Schindelin  wrote:
>
>> On Sat, 25 Mar 2017, Ivan Tham wrote:
>>
>> > Johannes Schindelin  wrote:
>> > > On Tue, 21 Mar 2017, Ivan Tham wrote:
>> > > > Stefan Beller  wrote:
>> > > > > On Mon, Mar 20, 2017 at 9:41 AM, Ivan Tham  
>> > > > > wrote:
>> > > > >
>> > > > > > I am interested to work on "Convert interactive rebase to C"
>> > > > >
>> > > > > +cc Johannes, who recently worked on rebase and the sequencer.
>> > >
>> > > Glad you are interested! Please note that large parts of the
>> > > interactive rebase are already in C now, but there is enough work left
>> > > in that corner.
>> >
>> > Glad to hear that, I would really like to see interactive rebase in C.
>>
>> Please note that a notable part already made it into C in v2.12.1. There
>> are still a few loose ends to tie, of course; it still makes for a great
>> head start on your project, methinks.
>
> Ah, that's great.
>
> And while I was working on the microproject (shell patterns in user diff),
> I can't produce the output of t/t4034-diff-words.sh manually with:

I don't think it's a good idea to discuss a microproject in the same
thread where a project is discussed.
I would suggest to move it in another thread where you describe in
more details what you want to do and why, what you expect and what
happened, and so on.

[...]

> That's does not just happens to cpp builtins, it happens to bibtex as well.
> Is it that I had missed some configuration since I have tested this on a
> few machines?

>From a very quick look the problem seems related to how
test_decode_color() is used or not.