Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-28 Thread Jeff King
On Tue, Aug 28, 2018 at 02:35:25PM -0700, Stefan Beller wrote:

> 3) (optional) instead of putting it all in modules/, use another
>directory gitmodules/ for example. this will make sure we can tell
>if a repository has been converted or is stuck with a setup of a
>current git.

I actually kind of like that idea, as it makes the interaction between
old and new names much simpler to reason about.

And since old code won't know about the new names anyway, there's in
theory no downside. In practice, of course, the encoding may often be a
noop, and lazy scripts would continue to work most of the time if you
didn't change out the prefix directory. I'm not sure if that is an
argument for the scheme (because it will suss out broken scripts more
consistently) or against it (because 99% of the time those old scripts
would just happen to work).

> > This is exactly the reason why I wanted to get some opinions on what the
> > best thing to do here would be.  I _think_ the best thing would probably
> > be to write a specific routine to do the conversion, and it wouldn't
> > even have to be all that complex.  Basically I'm just interested in
> > converting '/' characters so that things no longer behave like
> > nested directories.
> 
> Yeah, then let's just convert '/' with as little overhead as possible.

Do you care about case-folding issues (e.g., submodules "FOO" and "foo"
colliding)?

I'm OK if the answer is "no", but if you do want to deal with it, the
time is probably now.

-Peff


Re: Thank you for public-inbox!

2018-08-28 Thread Jeff King
On Mon, Aug 27, 2018 at 04:25:13PM +0200, Johannes Schindelin wrote:

> I would like to take five minutes to thank you for public-inbox. It is
> invaluable for me in the meantime. And I think I will never be able to
> thank you enough for it.

Let me echo that appreciation. I have always kept my own archive of the
list, but it's so valuable to have stable URLs for communicating with
other folks (and I find the general design of public-inbox _way_ more
useful than gmane ever was).

> P.S.: FWIW I added a mirror of public-inbox to
> https://git-for-windows.visualstudio.com/git.public-inbox, so that my
> automated tasks, as well as my playing around, does not stress your server
> too much.

I've thought about mirroring it to a public server as well, just for
redundancy. But without the same domain, I'm not sure it would be all
that useful as a community resource.

Eric, let me know if there's something there that would help (e.g.,
putting more servers in DNS round-robin).

-Peff


Re: Contributor Summit planning

2018-08-28 Thread Jeff King
On Tue, Aug 28, 2018 at 12:49:55AM +0200, Johannes Schindelin wrote:

> On Mon, 13 Aug 2018, Ævar Arnfjörð Bjarmason wrote:
> 
> >  * Re the second half of "Not everyone can travel or can afford to do
> >so" from Derrick, there's been travel sponsorships in past years.
> 
> Just to make sure that you understand: there are many more reasons than
> just travel costs involved. Just to name a few:
> 
> - visa issues.
> 
> - some people have to take care of family members and are not at liberty
>   to leave for even so much as half a day.
> 
> - conflicting schedules.
> 
> - mental health. As a friend of mine once put it: nobody is completely
>   "healthy", we all suffer, on a spectrum, from anxiety, trauma, and other
>   issues. The more inclusive we are, the more we can benefit from
>   everybody's talents and contributions.
> 
> Just saying. Money is sometimes not the solution.

Thanks for bringing this up. It's very easy to forget that not everybody
is in exactly the same situation.

It's actually one of the things I worry _most_ about having an in-person
meeting: that it ends up widening the gap between people whose employer
is sponsoring their work on Git, and people who have to carve their Git
time out of the rest of their busy lives.

Your virtual meeting idea is obviously meant to help with that. But
fundamentally I think allowing people to participate asynchronously is
the most important part, and we need to make sure there's some kind of
artifact summarizing any real-time meetings (for virtual stuff,
recordings or IRC logs are probably fine; for the in-person one, I was
very happy with the notes you took last year).

-Peff


Re: Contributor Summit planning

2018-08-28 Thread Jeff King
On Mon, Aug 27, 2018 at 03:34:16PM +0200, Johannes Schindelin wrote:

> >   - format
> > 
> > For those who haven't attended before, it's basically 25-ish Git
> > (and associated project) developers sitting in a room for a day
> > chatting about the project. Topics go on a whiteboard in the
> > morning, and then we discuss each for 30-60 minutes.
> > 
> > We could do multiple days (which might give more room for actually
> > working collaboratively instead of just discussing). We could do
> > something more formal (like actual talks). We could do something
> > less formal (like an all-day spaghetti buffet, where conversation
> > happens only between mouthfuls). The sky is the limit. Some of those
> > ideas may be better than others.
> 
> I found the unconference-style, one day meeting to be most productive.
> 
> As to more formal? I don't know... talks seem to be fun and all, but they
> require a lot of preparation. Something championed in our standups are
> "chalk talks", i.e. somebody presenting in a bit more detail what they are
> working on, in particular explaining the context (think: Stolee
> enlightening the audience about finer points of computational graph
> theory) *without* preparing for it specifically. That makes for fun
> presentations, if a bit more chaotic than a real "conference talk". This
> format obviously lends itself to Google Hangouts.
> 
> As to multiple days: Of course it would be nice to have a kind of a "hack
> day", but I wonder how productive this would be in the context of Git,
> where interests very so widely.

Thanks for your input. For what it's worth, that largely matches my
opinion, too. Most of the ideas I threw out there were just trying to
stimulate discussion (except for the spaghetti buffet, for which I am
a true believer).

> Rather than have a "hack day", I would actually prefer to work with other
> contributors in a way that we have not done before, but which I had the
> pleasure of "test ballooning" with Pratik: using Visual Studio Code Live
> Share. This allows multiple users to work on the same code base, in the
> same worktree, seeing what each other is doing. It requires a separate
> communication channel to talk; Pratik & I used IRC, but I think Google
> Hangout (or Skype or WhatsApp or ) would
> have worked a bit better. It's kind of pair programming, but with some of
> the limitations removed.

OK, I said in my earlier email that I would give any scheme a try, but I
really don't like pair programming. ;)

-Peff


Re: Contributor Summit planning

2018-08-28 Thread Jeff King
On Mon, Aug 27, 2018 at 03:22:39PM +0200, Johannes Schindelin wrote:

> Having said that, I believe that we core contributors can learn to have a
> fruitful online meeting. With 30+ participants, too.
> 
> Learning from my past life in academia (it is hard for me to imagine a
> less disciplined crowd than a bunch of white, male, old scientists), we
> would need a moderator, and some forum that allows to "give somebody the
> mic". That software/platform should exist somewhere.

Yes, I agree that software tools could help a lot with a crowd that
size. I have used various "virtual classroom" tools before, and I think
the core of the idea is there, but I was often unimpressed by the
execution (and expense). So if you know of a good tool, it might be
worth trying.

> I would love to have the best of both worlds. For example, it is an annual
> annoyance that we are discussion all kinds of things regarding Git, trying
> to steer the direction, to form collaborations on certain features, and
> the person at the helm is not even there.
> 
> Maybe *two* meetings per year, one attached to GitMerge, and one purely
> online, would help.

I'm somewhat skeptical of the utility of an online meeting. That said,
I'm willing give it a try (or any other scheme people want to come up
with, for that matter).

> Point in favor of the pure-online meeting: the informal standup on IRC
> every second Friday. I really try to attend it (it is a bit awkward
> because it is on a Friday evening in my timezone, right at the time when I
> want to unwind from the work week), as it does have a similar effect to
> in-person standups: surprising collaborations spring up, unexpected help,
> and a general sense of belonging.

Yes, I've been meaning to make it to another one (I popped in for one a
month or two ago, and it didn't seem like much of anything was
happening).

What time is it, again?

> Such an online summit as I suggested above would really only work if
> enough frequent contributors would attend. If enough people like you,
> Junio, and the standup regulars would say: yep, we're willing to plan and
> attend an online summit, where we try to have a timezone-friendly
> "unconference"-style meeting on one day (on which we would of course try
> to free ourselves from our regular work obligations).
> 
> I guess I am asking for a "raise your hands", with mine high up in the
> air.

I'll come if you want to organize it.

-Peff


Re: [PATCH v3 01/11] t: add tool to translate hash-related values

2018-08-28 Thread Jonathan Nieder
Hi,

brian m. carlson wrote:

> Add a test function helper, test_oid, that produces output that varies
> depending on the hash in use.

Cool!

>Add two additional helpers,
> test_oid_cache, which can be used to load data for test_oid from
> standard input, and test_oid_init, which can be used to load certain
> fixed values from lookup charts.  Check that these functions work in
> t, as the rest of the testsuite will soon come to depend on them.
>
> Implement two basic lookup charts, one for common invalid or synthesized
> object IDs, and one for various facts about the hash function in use.
> Provide versions for both SHA-1 and SHA-256.

What do test_oid_cache and test_oid_init do?  How can I use them?

Judging from t-basic.sh, the idea looks something like

Add a test function helper, test_oid, that ...

test_oid allows looking up arbitrary information about an object format:
the length of object ids, values of well known object ids, etc.  Before
calling it, a test script must invoke test_oid_cache (either directly
or indirectly through test_oid_init) to load the lookup charts.

See t for an example, which also serves as a sanity-check that
these functions work in preparation for using them in the rest of the
test suite.

There are two basic lookup charts for now: oid-info/oid, with common
invalid or synthesized object IDs; and oid-info/hash-info, with facts
such as object id length about the formats in use.  The charts contain
information about both SHA-1 and SHA-256.

So now you can update existing tests to be format-independent by (1)
adding an invocation of test_oid_init to test setup and (2) replacing
format dependencies with $(test_oid foo).

Since values are stored as shell variables, names used for lookup can
only consist of shell identifier characters.  If this is a problem in
the future, we can hash the names before use.

Improved-by: Eric Sunshine 

Do these always use sha1 for now?  Ah, t answers but it might be
worth mentioning in the commit message, too:

test_set_hash allows setting which object format test_oid should look
up information for, and test_detect_hash returns to the default format.

[...]
> --- /dev/null
> +++ b/t/oid-info/hash-info
> @@ -0,0 +1,8 @@
> +rawsz sha1:20
> +rawsz sha256:32

Can there be a README in this directory describing the files and format?

[...]
> --- a/t/t-basic.sh
> +++ b/t/t-basic.sh
> @@ -821,6 +821,41 @@ test_expect_success 'tests clean up even on failures' "
>   EOF
>  "
>  
> +test_oid_init

Can this be wrapped in test_expect_success?  That way, if it fails or
prints an error message then the usual test machinery would handle it.

> +
> +test_expect_success 'test_oid provides sane info by default' '
> + test_oid zero >actual &&
> + grep "^00*$" actual &&

nit: can save the reader some confusion by escaping the $.

> + rawsz="$(test_oid rawsz)" &&
> + hexsz="$(test_oid hexsz)" &&

optional: no need for these quotation marks --- a command substitution
assigned to a shell variable is treated as if it were quoted.

> + test "$hexsz" -eq $(wc -c  + test $(( $rawsz * 2)) -eq "$hexsz"

Makes sense.

[...]
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -1155,3 +1155,47 @@ depacketize () {
[...]
> +test_oid_cache () {
> + test -n "$test_hash_algo" || test_detect_hash

Should this use an uninterrupted &&-chain?

> + while read _tag _rest

This appears to be the first use of this naming convention.  I wonder
if we can use "local" instead.

> + do
> + case $_tag in
> + \#*)
> + continue;;
> + ?*)
> + # non-empty
> + ;;
> + *)
> + # blank line
> + continue;;
> +

unnecessary blank line here

> + esac &&
> +
> + _k="${_rest%:*}" &&
> + _v="${_rest#*:}" &&
> + { echo "$_k" | egrep '^[a-z0-9]+$' >/dev/null ||
> + error 'bug in the test script: bad hash algorithm'; } &&
> + eval "test_oid_${_k}_$_tag=\"\$_v\"" || return 1

This is dense, so I'm having trouble taking it in at a glance.

I think the idea is

key=${rest%%:*} &&
val=${rest#*:} &&

if ! expr "$key" : '[a-z0-9]*$' >/dev/null
then
error ...
fi &&
eval "test_oid_${key}_${tag}=\${val}"

> + done
> +}
> +
> +test_oid () {
> + eval "
> + test -n \"\${test_oid_${test_hash_algo}_$1+set}\" &&
> + printf '%s' \"\${test_oid_${test_hash_algo}_$1}\"
> + "

I'm also having trouble taking this one in.  Maybe splitting into two
evals would work?


[RFC PATCH 01/12] sha1-file: rename algorithm to "sha1"

2018-08-28 Thread brian m. carlson
The transition plan anticipates us using a syntax such as "^{sha1}" for
disambiguation.  Since this is a syntax some people will be typing a
lot, it makes sense to provide a short, easy-to-type syntax.  Omitting
the dash doesn't create any ambiguity, but it does make it shorter and
easier to type, especially for touch typists.  In addition, the
transition plan already uses "sha1" in this context.

Rename the name of SHA-1 implementation to "sha1".

Note that this change creates no backwards compatibility concerns, since
we haven't yet used this field in any serialized data formats.

Signed-off-by: brian m. carlson 
---
 sha1-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index 97b7423848..5223e3d1ce 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -97,7 +97,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
NULL,
},
{
-   "sha-1",
+   "sha1",
/* "sha1", big-endian */
0x73686131,
GIT_SHA1_RAWSZ,


[RFC PATCH 02/12] sha1-file: provide functions to look up hash algorithms

2018-08-28 Thread brian m. carlson
There are several ways we might refer to a hash algorithm: by name, such
as in the config file; by format ID, such as in a pack; or internally,
by a pointer to the hash_algos array.  Provide functions to look up hash
algorithms based on these various forms and return the internal constant
used for them.  If conversion to another form is necessary, this
internal constant can be used to look up the proper data in the
hash_algos array.

Signed-off-by: brian m. carlson 
---
 hash.h  | 13 +
 sha1-file.c | 21 +
 2 files changed, 34 insertions(+)

diff --git a/hash.h b/hash.h
index 7c8238bc2e..90f4344619 100644
--- a/hash.h
+++ b/hash.h
@@ -98,4 +98,17 @@ struct git_hash_algo {
 };
 extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
 
+/*
+ * Return a GIT_HASH_* constant based on the name.  Returns GIT_HASH_UNKNOWN if
+ * the name doesn't match a known algorithm.
+ */
+int hash_algo_by_name(const char *name);
+/* Identical, except based on the format ID. */
+int hash_algo_by_id(uint32_t format_id);
+/* Identical, except for a pointer to struct git_hash_algo. */
+inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+{
+   return p - hash_algos;
+}
+
 #endif
diff --git a/sha1-file.c b/sha1-file.c
index 5223e3d1ce..f6976b179f 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -122,6 +122,27 @@ const char *empty_blob_oid_hex(void)
return oid_to_hex_r(buf, the_hash_algo->empty_blob);
 }
 
+int hash_algo_by_name(const char *name)
+{
+   int i;
+   if (!name)
+   return GIT_HASH_UNKNOWN;
+   for (i = 1; i < GIT_HASH_NALGOS; i++)
+   if (!strcmp(name, hash_algos[i].name))
+   return i;
+   return GIT_HASH_UNKNOWN;
+}
+
+int hash_algo_by_id(uint32_t format_id)
+{
+   int i;
+   for (i = 1; i < GIT_HASH_NALGOS; i++)
+   if (format_id == hash_algos[i].format_id)
+   return i;
+   return GIT_HASH_UNKNOWN;
+}
+
+
 /*
  * This is meant to hold a *small* number of objects that you would
  * want read_sha1_file() to be able to return, but yet you do not want


[RFC PATCH 09/12] Add a base implementation of SHA-256 support

2018-08-28 Thread brian m. carlson
SHA-1 is weak and we need to transition to a new hash function.  For
some time, we have referred to this new function as NewHash.

The selection criteria for NewHash specify that it should (a) be 256
bits in length, (b) have high quality implementations available, (c)
should match Git's needs in terms of security, and (d) ideally, be fast
to compute.

SHA-256 has a variety of high quality implementations across various
libraries.  It is implemented by every cryptographic library we support
and is available on every platform and in almost every programming
language.  It is often highly optimized, since it is commonly used in
TLS and elsewhere.  Additionally, there are various command line
utilities that implement it, which is useful for educational and testing
purposes.

SHA-256 is presently considered secure and has received a reasonable
amount of cryptanalysis in the literature.  It is, admittedly, not
resistant to length extension attacks, but Git object storage is immune
to those due to the length field at the beginning.

SHA-256 is somewhat slower to compute than SHA-1 in software.  However,
since our default SHA-1 implementation is collision-detecting, a
reasonable cryptographic library implementation of SHA-256 will actually
be faster than SHA-256.  In addition, modern ARM and AMD processors (and
some Intel processors) contain instructions for implementing SHA-256 in
hardware, making it the fastest possible option.

There are other reasons to select SHA-256.  With signed commits and
tags, it's possible to use SHA-256 for signatures and therefore have to
rely on only one hash algorithm for security.

Add a basic implementation of SHA-256 based off libtomcrypt, which is in
the public domain.  Optimize it and tidy it somewhat.

Wire up SHA-256 in the list of hash algorithms, and add a test that the
algorithm works correctly.

Note that with this patch, it is still not possible to switch to using
SHA-256 in Git.  Additional patches are needed to prepare the code to
handle a larger hash algorithm and further test fixes are needed.

Signed-off-by: brian m. carlson 
---
 Makefile   |   4 +
 cache.h|  12 ++-
 hash.h |  19 -
 sha1-file.c|  45 +++
 sha256/block/sha256.c  | 180 +
 sha256/block/sha256.h  |  26 ++
 t/helper/test-sha256.c |   7 ++
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 t/t0014-hash.sh|  25 ++
 10 files changed, 316 insertions(+), 4 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 t/helper/test-sha256.c

diff --git a/Makefile b/Makefile
index e047dea988..624d852e79 100644
--- a/Makefile
+++ b/Makefile
@@ -733,6 +733,7 @@ TEST_BUILTINS_OBJS += test-run-command.o
 TEST_BUILTINS_OBJS += test-scrap-cache-tree.o
 TEST_BUILTINS_OBJS += test-sha1-array.o
 TEST_BUILTINS_OBJS += test-sha1.o
+TEST_BUILTINS_OBJS += test-sha256.o
 TEST_BUILTINS_OBJS += test-sigchain.o
 TEST_BUILTINS_OBJS += test-strcmp-offset.o
 TEST_BUILTINS_OBJS += test-string-list.o
@@ -1623,6 +1624,9 @@ endif
 endif
 endif
 
+LIB_OBJS += sha256/block/sha256.o
+BASIC_CFLAGS += -DSHA256_BLK
+
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
BASIC_CFLAGS += -DSHA1_MAX_BLOCK_SIZE="$(SHA1_MAX_BLOCK_SIZE)"
diff --git a/cache.h b/cache.h
index c1b5a7a337..5a35497b34 100644
--- a/cache.h
+++ b/cache.h
@@ -48,11 +48,17 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 /* The block size of SHA-1. */
 #define GIT_SHA1_BLKSZ 64
 
+/* The length in bytes and in hex digits of an object name (SHA-256 value). */
+#define GIT_SHA256_RAWSZ 32
+#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
+/* The block size of SHA-256. */
+#define GIT_SHA256_BLKSZ 64
+
 /* The length in byte and in hex digits of the largest possible hash value. */
-#define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
-#define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
+#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
 /* The largest possible block size for any supported hash. */
-#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
+#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
 
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 46dff69eb3..88d18896d7 100644
--- a/hash.h
+++ b/hash.h
@@ -15,6 +15,8 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#include "sha256/block/sha256.h"
+
 #ifndef platform_SHA_CTX
 /*
  * platform's underlying implementation of SHA-1; could be OpenSSL,
@@ -34,6 +36,18 @@
 #define git_SHA1_Updateplatform_SHA1_Update
 #define git_SHA1_Final platform_SHA1_Final
 
+#ifndef platform_SHA256_CTX
+#define platform_SHA256_CTXSHA256_CTX
+#define platform_SHA256_Init   SHA256_Init
+#define platform_SHA256_Update SHA256_Update
+#define platform_SHA256_Final  SHA256_Final
+#endif
+
+#define git_SHA256_CTX platform_SHA256_CTX
+#define git_SHA256_Init 

[RFC PATCH 10/12] sha256: add an SHA-256 implementation using libgcrypt

2018-08-28 Thread brian m. carlson
Generally, one gets better performance out of cryptographic routines
written in assembly than C, and this is also true for SHA-256.  In
addition, most Linux distributions cannot distribute Git linked against
OpenSSL for licensing reasons.

Most systems with GnuPG will also have libgcrypt, since it is a
dependency of GnuPG.  libgcrypt is also faster than the SHA1DC
implementation for messages of a few KiB and larger. It is licensed
under the LGPL 2.1, which is compatible with the GPL.

Add an implementation of SHA-256 that uses libgcrypt.

Signed-off-by: brian m. carlson 
---
 Makefile| 13 +++--
 hash.h  |  4 
 sha256/gcrypt.h | 30 ++
 3 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 sha256/gcrypt.h

diff --git a/Makefile b/Makefile
index 624d852e79..86867af083 100644
--- a/Makefile
+++ b/Makefile
@@ -179,6 +179,10 @@ all::
 # in one call to the platform's SHA1_Update(). e.g. APPLE_COMMON_CRYPTO
 # wants 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined.
 #
+# Define BLK_SHA256 to use the built-in SHA-256 routines.
+#
+# Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1624,8 +1628,13 @@ endif
 endif
 endif
 
-LIB_OBJS += sha256/block/sha256.o
-BASIC_CFLAGS += -DSHA256_BLK
+ifdef GCRYPT_SHA256
+   BASIC_CFLAGS += -DSHA256_GCRYPT
+   EXTLIBS += -lgcrypt
+else
+   LIB_OBJS += sha256/block/sha256.o
+   BASIC_CFLAGS += -DSHA256_BLK
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index 88d18896d7..9df562f2f6 100644
--- a/hash.h
+++ b/hash.h
@@ -15,7 +15,11 @@
 #include "block-sha1/sha1.h"
 #endif
 
+#if defined(SHA256_GCRYPT)
+#include "sha256/gcrypt.h"
+#else
 #include "sha256/block/sha256.h"
+#endif
 
 #ifndef platform_SHA_CTX
 /*
diff --git a/sha256/gcrypt.h b/sha256/gcrypt.h
new file mode 100644
index 00..09bd8bb200
--- /dev/null
+++ b/sha256/gcrypt.h
@@ -0,0 +1,30 @@
+#ifndef SHA256_GCRYPT_H
+#define SHA256_GCRYPT_H
+
+#include 
+
+#define SHA256_DIGEST_SIZE 32
+
+typedef gcry_md_hd_t gcrypt_SHA256_CTX;
+
+inline void gcrypt_SHA256_Init(gcrypt_SHA256_CTX *ctx)
+{
+   gcry_md_open(ctx, GCRY_MD_SHA256, 0);
+}
+
+inline void gcrypt_SHA256_Update(gcrypt_SHA256_CTX *ctx, const void *data, 
size_t len)
+{
+   gcry_md_write(*ctx, data, len);
+}
+
+inline void gcrypt_SHA256_Final(unsigned char *digest, gcrypt_SHA256_CTX *ctx)
+{
+   memcpy(digest, gcry_md_read(*ctx, GCRY_MD_SHA256), SHA256_DIGEST_SIZE);
+}
+
+#define platform_SHA256_CTX gcrypt_SHA256_CTX
+#define platform_SHA256_Init gcrypt_SHA256_Init
+#define platform_SHA256_Update gcrypt_SHA256_Update
+#define platform_SHA256_Final gcrypt_SHA256_Final
+
+#endif


[RFC PATCH 12/12] commit-graph: specify OID version for SHA-256

2018-08-28 Thread brian m. carlson
Since the commit-graph code wants to serialize the hash algorithm into
the data store, specify a version number for each supported algorithm.
Note that we don't use the values of the constants themselves, as they
are internal and could change in the future.

Signed-off-by: brian m. carlson 
---
 commit-graph.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 29356d84a2..5b86acdb43 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -45,7 +45,14 @@ char *get_commit_graph_filename(const char *obj_dir)
 
 static uint8_t oid_version(void)
 {
-   return 1;
+   switch (hash_algo_by_ptr(the_hash_algo)) {
+   case GIT_HASH_SHA1:
+   return 1;
+   case GIT_HASH_SHA256:
+   return 2;
+   default:
+   BUG("unknown hash algorithm");
+   }
 }
 
 static struct commit_graph *alloc_commit_graph(void)


[RFC PATCH 11/12] hash: add an SHA-256 implementation using OpenSSL

2018-08-28 Thread brian m. carlson
We already have OpenSSL routines available for SHA-1, so add routines
for SHA-256 as well.

Signed-off-by: brian m. carlson 
---
 Makefile | 7 +++
 hash.h   | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/Makefile b/Makefile
index 86867af083..8b7df4dfc5 100644
--- a/Makefile
+++ b/Makefile
@@ -183,6 +183,8 @@ all::
 #
 # Define GCRYPT_SHA256 to use the SHA-256 routines in libgcrypt.
 #
+# Define OPENSSL_SHA256 to use the SHA-256 routines in OpenSSL.
+#
 # Define NEEDS_CRYPTO_WITH_SSL if you need -lcrypto when using -lssl (Darwin).
 #
 # Define NEEDS_SSL_WITH_CRYPTO if you need -lssl when using -lcrypto (Darwin).
@@ -1628,6 +1630,10 @@ endif
 endif
 endif
 
+ifdef OPENSSL_SHA256
+   EXTLIBS += $(LIB_4_CRYPTO)
+   BASIC_CFLAGS += -DSHA256_OPENSSL
+else
 ifdef GCRYPT_SHA256
BASIC_CFLAGS += -DSHA256_GCRYPT
EXTLIBS += -lgcrypt
@@ -1635,6 +1641,7 @@ else
LIB_OBJS += sha256/block/sha256.o
BASIC_CFLAGS += -DSHA256_BLK
 endif
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
diff --git a/hash.h b/hash.h
index 9df562f2f6..9df06d56b4 100644
--- a/hash.h
+++ b/hash.h
@@ -17,6 +17,8 @@
 
 #if defined(SHA256_GCRYPT)
 #include "sha256/gcrypt.h"
+#elif defined(SHA256_OPENSSL)
+#include 
 #else
 #include "sha256/block/sha256.h"
 #endif


[RFC PATCH 07/12] t/helper: add a test helper to compute hash speed

2018-08-28 Thread brian m. carlson
Add a utility (which is less for the testsuite and more for developers)
that can compute hash speeds for whatever hash algorithms are
implemented.  This allows developers to test their personal systems to
determine the performance characteristics of various algorithms.

Signed-off-by: brian m. carlson 
---
 Makefile   |  1 +
 t/helper/test-hash-speed.c | 61 ++
 t/helper/test-tool.c   |  1 +
 t/helper/test-tool.h   |  1 +
 4 files changed, 64 insertions(+)
 create mode 100644 t/helper/test-hash-speed.c

diff --git a/Makefile b/Makefile
index eb1a080cc9..e047dea988 100644
--- a/Makefile
+++ b/Makefile
@@ -714,6 +714,7 @@ TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
+TEST_BUILTINS_OBJS += test-hash-speed.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
 TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
diff --git a/t/helper/test-hash-speed.c b/t/helper/test-hash-speed.c
new file mode 100644
index 00..432233c7f0
--- /dev/null
+++ b/t/helper/test-hash-speed.c
@@ -0,0 +1,61 @@
+#include "test-tool.h"
+#include "cache.h"
+
+#define NUM_SECONDS 3
+
+static inline void compute_hash(const struct git_hash_algo *algo, git_hash_ctx 
*ctx, uint8_t *final, const void *p, size_t len)
+{
+   algo->init_fn(ctx);
+   algo->update_fn(ctx, p, len);
+   algo->final_fn(final, ctx);
+}
+
+int cmd__hash_speed(int ac, const char **av)
+{
+   git_hash_ctx ctx;
+   unsigned char hash[GIT_MAX_RAWSZ];
+   clock_t initial, start, end;
+   unsigned bufsizes[] = { 64, 256, 1024, 8192, 16384 };
+   int i;
+   void *p;
+   const struct git_hash_algo *algo = NULL;
+
+   if (ac == 2) {
+   for (i = 1; i < GIT_HASH_NALGOS; i++) {
+   if (!strcmp(av[1], hash_algos[i].name)) {
+   algo = _algos[i];
+   break;
+   }
+   }
+   }
+   if (!algo)
+   die("usage: test-tool hash-speed algo_name");
+
+   /* Use this as an offset to make overflow less likely. */
+   initial = clock();
+
+   printf("algo: %s\n", algo->name);
+
+   for (i = 0; i < ARRAY_SIZE(bufsizes); i++) {
+   unsigned long j, kb;
+   double kb_per_sec;
+   p = xcalloc(1, bufsizes[i]);
+   start = end = clock() - initial;
+   for (j = 0; ((end - start) / CLOCKS_PER_SEC) < NUM_SECONDS; 
j++) {
+   compute_hash(algo, , hash, p, bufsizes[i]);
+
+   /*
+* Only check elapsed time every 128 iterations to avoid
+* dominating the runtime with system calls.
+*/
+   if (!(j & 127))
+   end = clock() - initial;
+   }
+   kb = j * bufsizes[i];
+   kb_per_sec = kb / (1024 * ((double)end - start) / 
CLOCKS_PER_SEC);
+   printf("size %u: %lu iters; %lu KiB; %0.2f KiB/s\n", 
bufsizes[i], j, kb, kb_per_sec);
+   free(p);
+   }
+
+   exit(0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 0edafcfd65..c0d3eecc36 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -18,6 +18,7 @@ static struct test_cmd cmds[] = {
{ "example-decorate", cmd__example_decorate },
{ "genrandom", cmd__genrandom },
{ "hashmap", cmd__hashmap },
+   { "hash-speed", cmd__hash_speed },
{ "index-version", cmd__index_version },
{ "json-writer", cmd__json_writer },
{ "lazy-init-name-hash", cmd__lazy_init_name_hash },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 9026a8f608..abe3a253d5 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -14,6 +14,7 @@ int cmd__dump_split_index(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
+int cmd__hash_speed(int argc, const char **argv);
 int cmd__index_version(int argc, const char **argv);
 int cmd__json_writer(int argc, const char **argv);
 int cmd__lazy_init_name_hash(int argc, const char **argv);


[RFC PATCH 08/12] commit-graph: convert to using the_hash_algo

2018-08-28 Thread brian m. carlson
Instead of using hard-coded constants for object sizes, use
the_hash_algo to look them up.  In addition, use a function call to look
up the object ID version and produce the correct value.

Signed-off-by: brian m. carlson 
---
 commit-graph.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 8a1bec7b8a..29356d84a2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -25,11 +25,6 @@
 #define GRAPH_VERSION_1 0x1
 #define GRAPH_VERSION GRAPH_VERSION_1
 
-#define GRAPH_OID_VERSION_SHA1 1
-#define GRAPH_OID_LEN_SHA1 GIT_SHA1_RAWSZ
-#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
-#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
-
 #define GRAPH_OCTOPUS_EDGES_NEEDED 0x8000
 #define GRAPH_PARENT_MISSING 0x7fff
 #define GRAPH_EDGE_LAST_MASK 0x7fff
@@ -41,13 +36,18 @@
 #define GRAPH_FANOUT_SIZE (4 * 256)
 #define GRAPH_CHUNKLOOKUP_WIDTH 12
 #define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
-   + GRAPH_FANOUT_SIZE + GRAPH_OID_LEN)
+   + GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
 
 char *get_commit_graph_filename(const char *obj_dir)
 {
return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static uint8_t oid_version(void)
+{
+   return 1;
+}
+
 static struct commit_graph *alloc_commit_graph(void)
 {
struct commit_graph *g = xcalloc(1, sizeof(*g));
@@ -100,15 +100,15 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
}
 
hash_version = *(unsigned char*)(data + 5);
-   if (hash_version != GRAPH_OID_VERSION) {
+   if (hash_version != oid_version()) {
error(_("hash version %X does not match version %X"),
- hash_version, GRAPH_OID_VERSION);
+ hash_version, oid_version());
goto cleanup_fail;
}
 
graph = alloc_commit_graph();
 
-   graph->hash_len = GRAPH_OID_LEN;
+   graph->hash_len = the_hash_algo->rawsz;
graph->num_chunks = *(unsigned char*)(data + 6);
graph->graph_fd = fd;
graph->data = graph_map;
@@ -124,7 +124,7 @@ struct commit_graph *load_commit_graph_one(const char 
*graph_file)
 
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
-   if (chunk_offset > graph_size - GIT_MAX_RAWSZ) {
+   if (chunk_offset > graph_size - the_hash_algo->rawsz) {
error(_("improper chunk offset %08x%08x"), 
(uint32_t)(chunk_offset >> 32),
  (uint32_t)chunk_offset);
goto cleanup_fail;
@@ -692,6 +692,7 @@ void write_commit_graph(const char *obj_dir,
int num_chunks;
int num_extra_edges;
struct commit_list *parent;
+   const unsigned hashsz = the_hash_algo->rawsz;
 
oids.nr = 0;
oids.alloc = approximate_object_count() / 4;
@@ -812,7 +813,7 @@ void write_commit_graph(const char *obj_dir,
hashwrite_be32(f, GRAPH_SIGNATURE);
 
hashwrite_u8(f, GRAPH_VERSION);
-   hashwrite_u8(f, GRAPH_OID_VERSION);
+   hashwrite_u8(f, oid_version());
hashwrite_u8(f, num_chunks);
hashwrite_u8(f, 0); /* unused padding byte */
 
@@ -827,8 +828,8 @@ void write_commit_graph(const char *obj_dir,
 
chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
-   chunk_offsets[2] = chunk_offsets[1] + GRAPH_OID_LEN * commits.nr;
-   chunk_offsets[3] = chunk_offsets[2] + (GRAPH_OID_LEN + 16) * commits.nr;
+   chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
+   chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
chunk_offsets[4] = chunk_offsets[3] + 4 * num_extra_edges;
 
for (i = 0; i <= num_chunks; i++) {
@@ -841,8 +842,8 @@ void write_commit_graph(const char *obj_dir,
}
 
write_graph_chunk_fanout(f, commits.list, commits.nr);
-   write_graph_chunk_oids(f, GRAPH_OID_LEN, commits.list, commits.nr);
-   write_graph_chunk_data(f, GRAPH_OID_LEN, commits.list, commits.nr);
+   write_graph_chunk_oids(f, hashsz, commits.list, commits.nr);
+   write_graph_chunk_data(f, hashsz, commits.list, commits.nr);
write_graph_chunk_large_edges(f, commits.list, commits.nr);
 
close_commit_graph();


[RFC PATCH 05/12] t: make the sha1 test-tool helper generic

2018-08-28 Thread brian m. carlson
Since we're going to have multiple hash algorithms to test, it makes
sense to share as much of the test code as possible.  Convert the sha1
helper for the test-tool to be generic and move it out into its own
module.  This will allow us to share most of this code with our NewHash
implementation.

Signed-off-by: brian m. carlson 
---
 Makefile  |  1 +
 t/helper/{test-sha1.c => test-hash.c} | 19 +-
 t/helper/test-sha1.c  | 52 +--
 t/helper/test-tool.h  |  2 ++
 4 files changed, 14 insertions(+), 60 deletions(-)
 copy t/helper/{test-sha1.c => test-hash.c} (66%)

diff --git a/Makefile b/Makefile
index 5a969f5830..eb1a080cc9 100644
--- a/Makefile
+++ b/Makefile
@@ -712,6 +712,7 @@ TEST_BUILTINS_OBJS += test-dump-cache-tree.o
 TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
 TEST_BUILTINS_OBJS += test-genrandom.o
+TEST_BUILTINS_OBJS += test-hash.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-index-version.o
 TEST_BUILTINS_OBJS += test-json-writer.o
diff --git a/t/helper/test-sha1.c b/t/helper/test-hash.c
similarity index 66%
copy from t/helper/test-sha1.c
copy to t/helper/test-hash.c
index 1ba0675c75..9992de2212 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-hash.c
@@ -1,13 +1,14 @@
 #include "test-tool.h"
 #include "cache.h"
 
-int cmd__sha1(int ac, const char **av)
+int cmd_hash_impl(int ac, const char **av, int algo)
 {
-   git_SHA_CTX ctx;
-   unsigned char sha1[20];
+   git_hash_ctx ctx;
+   unsigned char hash[GIT_MAX_HEXSZ];
unsigned bufsz = 8192;
int binary = 0;
char *buffer;
+   const struct git_hash_algo *algop = _algos[algo];
 
if (ac == 2) {
if (!strcmp(av[1], "-b"))
@@ -26,7 +27,7 @@ int cmd__sha1(int ac, const char **av)
die("OOPS");
}
 
-   git_SHA1_Init();
+   algop->init_fn();
 
while (1) {
ssize_t sz, this_sz;
@@ -38,20 +39,20 @@ int cmd__sha1(int ac, const char **av)
if (sz == 0)
break;
if (sz < 0)
-   die_errno("test-sha1");
+   die_errno("test-hash");
this_sz += sz;
cp += sz;
room -= sz;
}
if (this_sz == 0)
break;
-   git_SHA1_Update(, buffer, this_sz);
+   algop->update_fn(, buffer, this_sz);
}
-   git_SHA1_Final(sha1, );
+   algop->final_fn(hash, );
 
if (binary)
-   fwrite(sha1, 1, 20, stdout);
+   fwrite(hash, 1, algop->rawsz, stdout);
else
-   puts(sha1_to_hex(sha1));
+   puts(hash_to_hex_algo(hash, algo));
exit(0);
 }
diff --git a/t/helper/test-sha1.c b/t/helper/test-sha1.c
index 1ba0675c75..d860c387c3 100644
--- a/t/helper/test-sha1.c
+++ b/t/helper/test-sha1.c
@@ -3,55 +3,5 @@
 
 int cmd__sha1(int ac, const char **av)
 {
-   git_SHA_CTX ctx;
-   unsigned char sha1[20];
-   unsigned bufsz = 8192;
-   int binary = 0;
-   char *buffer;
-
-   if (ac == 2) {
-   if (!strcmp(av[1], "-b"))
-   binary = 1;
-   else
-   bufsz = strtoul(av[1], NULL, 10) * 1024 * 1024;
-   }
-
-   if (!bufsz)
-   bufsz = 8192;
-
-   while ((buffer = malloc(bufsz)) == NULL) {
-   fprintf(stderr, "bufsz %u is too big, halving...\n", bufsz);
-   bufsz /= 2;
-   if (bufsz < 1024)
-   die("OOPS");
-   }
-
-   git_SHA1_Init();
-
-   while (1) {
-   ssize_t sz, this_sz;
-   char *cp = buffer;
-   unsigned room = bufsz;
-   this_sz = 0;
-   while (room) {
-   sz = xread(0, cp, room);
-   if (sz == 0)
-   break;
-   if (sz < 0)
-   die_errno("test-sha1");
-   this_sz += sz;
-   cp += sz;
-   room -= sz;
-   }
-   if (this_sz == 0)
-   break;
-   git_SHA1_Update(, buffer, this_sz);
-   }
-   git_SHA1_Final(sha1, );
-
-   if (binary)
-   fwrite(sha1, 1, 20, stdout);
-   else
-   puts(sha1_to_hex(sha1));
-   exit(0);
+   return cmd_hash_impl(ac, av, GIT_HASH_SHA1);
 }
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index e954e8c522..9026a8f608 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -41,4 +41,6 @@ int cmd__urlmatch_normalization(int argc, const char **argv);
 int cmd__wildmatch(int argc, const char **argv);
 int 

[RFC PATCH 06/12] sha1-file: add a constant for hash block size

2018-08-28 Thread brian m. carlson
There is one place we need the hash algorithm block size: the HMAC code
for push certs.  Expose this constant in struct git_hash_algo and expose
values for SHA-1 and for the largest value of any hash.

Signed-off-by: brian m. carlson 
---
 cache.h | 4 
 hash.h  | 3 +++
 sha1-file.c | 2 ++
 3 files changed, 9 insertions(+)

diff --git a/cache.h b/cache.h
index 3cb953445c..c1b5a7a337 100644
--- a/cache.h
+++ b/cache.h
@@ -45,10 +45,14 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
 /* The length in bytes and in hex digits of an object name (SHA-1 value). */
 #define GIT_SHA1_RAWSZ 20
 #define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
+/* The block size of SHA-1. */
+#define GIT_SHA1_BLKSZ 64
 
 /* The length in byte and in hex digits of the largest possible hash value. */
 #define GIT_MAX_RAWSZ GIT_SHA1_RAWSZ
 #define GIT_MAX_HEXSZ GIT_SHA1_HEXSZ
+/* The largest possible block size for any supported hash. */
+#define GIT_MAX_BLKSZ GIT_SHA1_BLKSZ
 
 struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
diff --git a/hash.h b/hash.h
index 90f4344619..46dff69eb3 100644
--- a/hash.h
+++ b/hash.h
@@ -81,6 +81,9 @@ struct git_hash_algo {
/* The length of the hash in hex characters. */
size_t hexsz;
 
+   /* The block size of the hash. */
+   size_t blksz;
+
/* The hash initialization function. */
git_hash_init_fn init_fn;
 
diff --git a/sha1-file.c b/sha1-file.c
index f6976b179f..a1ad1b8268 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -90,6 +90,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
0x,
0,
0,
+   0,
git_hash_unknown_init,
git_hash_unknown_update,
git_hash_unknown_final,
@@ -102,6 +103,7 @@ const struct git_hash_algo hash_algos[GIT_HASH_NALGOS] = {
0x73686131,
GIT_SHA1_RAWSZ,
GIT_SHA1_HEXSZ,
+   GIT_SHA1_BLKSZ,
git_hash_sha1_init,
git_hash_sha1_update,
git_hash_sha1_final,


[RFC PATCH 03/12] hex: introduce functions to print arbitrary hashes

2018-08-28 Thread brian m. carlson
Currently, we have functions that turn an arbitrary SHA-1 value or an
object ID into hex format, either using a static buffer or with a
user-provided buffer.  Add variants of these functions that can handle
an arbitrary hash algorithm, specified by constant.  Update the
documentation as well.

While we're at it, remove the "extern" declaration from this family of
functions, since it's not needed and our style now recommends against
it.

We use the variant taking the algorithm structure pointer as the
internal variant, since in the future we'll want to replace sha1_to_hex
with a hash_to_hex that handles the_hash_algo, and taking an algorithm
pointer is the easiest way to handle all of the variants in use.

Signed-off-by: brian m. carlson 
---
 cache.h | 14 --
 hex.c   | 32 
 2 files changed, 32 insertions(+), 14 deletions(-)

diff --git a/cache.h b/cache.h
index 4d014541ab..3cb953445c 100644
--- a/cache.h
+++ b/cache.h
@@ -1351,9 +1351,9 @@ extern int get_oid_hex(const char *hex, struct object_id 
*sha1);
 extern int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
 
 /*
- * Convert a binary sha1 to its hex equivalent. The `_r` variant is reentrant,
+ * Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
  * and writes the NUL-terminated output to the buffer `out`, which must be at
- * least `GIT_SHA1_HEXSZ + 1` bytes, and returns a pointer to out for
+ * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
  * convenience.
  *
  * The non-`_r` variant returns a static buffer, but uses a ring of 4
@@ -1361,10 +1361,12 @@ extern int hex_to_bytes(unsigned char *binary, const 
char *hex, size_t len);
  *
  *   printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
  */
-extern char *sha1_to_hex_r(char *out, const unsigned char *sha1);
-extern char *oid_to_hex_r(char *out, const struct object_id *oid);
-extern char *sha1_to_hex(const unsigned char *sha1);   /* static buffer 
result! */
-extern char *oid_to_hex(const struct object_id *oid);  /* same static buffer 
as sha1_to_hex */
+char *hash_to_hex_algo_r(char *buffer, const unsigned char *hash, int algo);
+char *sha1_to_hex_r(char *out, const unsigned char *sha1);
+char *oid_to_hex_r(char *out, const struct object_id *oid);
+char *hash_to_hex_algo(const unsigned char *hash, int algo);   /* static 
buffer result! */
+char *sha1_to_hex(const unsigned char *sha1);  /* same static 
buffer */
+char *oid_to_hex(const struct object_id *oid); /* same static 
buffer */
 
 /*
  * Parse a 40-character hexadecimal object ID starting from hex, updating the
diff --git a/hex.c b/hex.c
index 10af1a29e8..870032a868 100644
--- a/hex.c
+++ b/hex.c
@@ -73,14 +73,15 @@ int parse_oid_hex(const char *hex, struct object_id *oid, 
const char **end)
return ret;
 }
 
-char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+static inline char *hash_to_hex_algop_r(char *buffer, const unsigned char 
*hash,
+   const struct git_hash_algo *algop)
 {
static const char hex[] = "0123456789abcdef";
char *buf = buffer;
int i;
 
-   for (i = 0; i < the_hash_algo->rawsz; i++) {
-   unsigned int val = *sha1++;
+   for (i = 0; i < algop->rawsz; i++) {
+   unsigned int val = *hash++;
*buf++ = hex[val >> 4];
*buf++ = hex[val & 0xf];
}
@@ -89,20 +90,35 @@ char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
return buffer;
 }
 
-char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+char *hash_to_hex_algo_r(char *buffer, const unsigned char *hash, int algo)
 {
-   return sha1_to_hex_r(buffer, oid->hash);
+   return hash_to_hex_algop_r(buffer, hash, _algos[algo]);
 }
 
-char *sha1_to_hex(const unsigned char *sha1)
+char *sha1_to_hex_r(char *buffer, const unsigned char *sha1)
+{
+   return hash_to_hex_algo_r(buffer, sha1, GIT_HASH_SHA1);
+}
+
+char *oid_to_hex_r(char *buffer, const struct object_id *oid)
+{
+   return hash_to_hex_algop_r(buffer, oid->hash, the_hash_algo);
+}
+
+char *hash_to_hex_algo(const unsigned char *hash, int algo)
 {
static int bufno;
static char hexbuffer[4][GIT_MAX_HEXSZ + 1];
bufno = (bufno + 1) % ARRAY_SIZE(hexbuffer);
-   return sha1_to_hex_r(hexbuffer[bufno], sha1);
+   return hash_to_hex_algo_r(hexbuffer[bufno], hash, algo);
+}
+
+char *sha1_to_hex(const unsigned char *sha1)
+{
+   return hash_to_hex_algo(sha1, GIT_HASH_SHA1);
 }
 
 char *oid_to_hex(const struct object_id *oid)
 {
-   return sha1_to_hex(oid->hash);
+   return hash_to_hex_algo(oid->hash, hash_algo_by_ptr(the_hash_algo));
 }


[RFC PATCH 04/12] t: add basic tests for our SHA-1 implementation

2018-08-28 Thread brian m. carlson
We have in the past had some unfortunate endianness issues with some
SHA-1 implementations we ship, especially on big-endian machines.  Add
an explicit test using the test helper to catch these issues and point
them out prominently.  This test can also be used as a staging ground
for people testing additional algorithms to verify that their
implementations are working as expected.

Signed-off-by: brian m. carlson 
---
 t/t0014-hash.sh | 29 +
 1 file changed, 29 insertions(+)
 create mode 100755 t/t0014-hash.sh

diff --git a/t/t0014-hash.sh b/t/t0014-hash.sh
new file mode 100755
index 00..8e763c2c3d
--- /dev/null
+++ b/t/t0014-hash.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='test basic hash implementation'
+. ./test-lib.sh
+
+
+test_expect_success 'test basic SHA-1 hash values' '
+   test-tool sha1 actual &&
+   grep da39a3ee5e6b4b0d3255bfef95601890afd80709 actual &&
+   printf "a" | test-tool sha1 >actual &&
+   grep 86f7e437faa5a7fce15d1ddcb9eaeaea377667b8 actual &&
+   printf "abc" | test-tool sha1 >actual &&
+   grep a9993e364706816aba3e25717850c26c9cd0d89d actual &&
+   printf "message digest" | test-tool sha1 >actual &&
+   grep c12252ceda8be8994d5fa0290a47231c1d16aae3 actual &&
+   printf "abcdefghijklmnopqrstuvwxyz" | test-tool sha1 >actual &&
+   grep 32d10c7b8cf96570ca04ce37f2a19d84240d3a89 actual &&
+   perl -E "for (1..10) { print q{aa}; }" | \
+   test-tool sha1 >actual &&
+   grep 34aa973cd4c4daa4f61eeb2bdbad27316534016f actual &&
+   printf "blob 0\0" | test-tool sha1 >actual &&
+   grep e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 actual &&
+   printf "blob 3\0abc" | test-tool sha1 >actual &&
+   grep f2ba8f84ab5c1bce84a7b441cb1959cfc7093b7f actual &&
+   printf "tree 0\0" | test-tool sha1 >actual &&
+   grep 4b825dc642cb6eb9a060e54bf8d69288fbee4904 actual
+'
+
+test_done


[RFC PATCH 00/12] Base SHA-256 algorithm implementation

2018-08-28 Thread brian m. carlson
This RFC series provides an actual SHA-256 implementation and wires it
up, along with a few housekeeping patches to make it usable for testing.

As discussed in some threads, this changes the algorithm name from
"sha-1" to "sha1" (and also adds "sha256") because it's far easier to
type.

I introduced some basic tests for the hash algorithms in use.  Since I
did not import the SHA-256 implementation verbatim, I felt it was
necessary to ensure that the hash algorithm implementations continued to
function as expected.  My main changes were to adjust the code to use
our endianness functions, to adopt something closer to our style, and to
make use of memcpy and friends for performance reasons.

I opted to place all the implementation code for SHA-256 into one
directory, as opposed to the various directories we have for the SHA-1
implementations, mostly for tidiness and ease of use.

I wired up OpenSSL because we already have it and libgcrypt because it
performs better than SHA-1.  I didn't provide an implementation for
SHA-1 with libgcrypt because everybody should be using SHA1DC for
security.

I didn't provide an implementation based on libnettle because its x86-64
assembly implementation isn't vectorized and is pretty slow.  Since this
was written before I had access to a Mac, Apple Common Crypto hasn't
been wired up, either.  Patches welcome.

If libgit2 would like to import this SHA-256 implementation, they're
welcome to do so under their normal license terms.  If not, that's fine,
too.

brian m. carlson (12):
  sha1-file: rename algorithm to "sha1"
  sha1-file: provide functions to look up hash algorithms
  hex: introduce functions to print arbitrary hashes
  t: add basic tests for our SHA-1 implementation
  t: make the sha1 test-tool helper generic
  sha1-file: add a constant for hash block size
  t/helper: add a test helper to compute hash speed
  commit-graph: convert to using the_hash_algo
  Add a base implementation of SHA-256 support
  sha256: add an SHA-256 implementation using libgcrypt
  hash: add an SHA-256 implementation using OpenSSL
  commit-graph: specify OID version for SHA-256

 Makefile  |  22 
 cache.h   |  28 ++--
 commit-graph.c|  38 +++---
 hash.h|  41 +-
 hex.c |  32 +++--
 sha1-file.c   |  70 +-
 sha256/block/sha256.c | 180 ++
 sha256/block/sha256.h |  26 
 sha256/gcrypt.h   |  30 +
 t/helper/test-hash-speed.c|  61 +
 t/helper/{test-sha1.c => test-hash.c} |  19 +--
 t/helper/test-sha1.c  |  52 +---
 t/helper/test-sha256.c|   7 +
 t/helper/test-tool.c  |   2 +
 t/helper/test-tool.h  |   4 +
 t/t0014-hash.sh   |  54 
 16 files changed, 573 insertions(+), 93 deletions(-)
 create mode 100644 sha256/block/sha256.c
 create mode 100644 sha256/block/sha256.h
 create mode 100644 sha256/gcrypt.h
 create mode 100644 t/helper/test-hash-speed.c
 copy t/helper/{test-sha1.c => test-hash.c} (66%)
 create mode 100644 t/helper/test-sha256.c
 create mode 100755 t/t0014-hash.sh



[PATCH v3 06/11] t0064: make hash size independent

2018-08-28 Thread brian m. carlson
Compute test values of the appropriate size instead of hard-coding
40-character values.  Rename the echo20 function to echoid, since the
values may be of varying sizes.

Signed-off-by: brian m. carlson 
---
 t/t0064-sha1-array.sh | 49 ---
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/t/t0064-sha1-array.sh b/t/t0064-sha1-array.sh
index 67484502a0..5dda570b9a 100755
--- a/t/t0064-sha1-array.sh
+++ b/t/t0064-sha1-array.sh
@@ -3,30 +3,30 @@
 test_description='basic tests for the SHA1 array implementation'
 . ./test-lib.sh
 
-echo20 () {
+echoid () {
prefix="${1:+$1 }"
shift
while test $# -gt 0
do
-   echo "$prefix$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1$1"
+   echo "$prefix$ZERO_OID" | sed -e "s/00/$1/g"
shift
done
 }
 
 test_expect_success 'ordered enumeration' '
-   echo20 "" 44 55 88 aa >expect &&
+   echoid "" 44 55 88 aa >expect &&
{
-   echo20 append 88 44 aa 55 &&
+   echoid append 88 44 aa 55 &&
echo for_each_unique
} | test-tool sha1-array >actual &&
test_cmp expect actual
 '
 
 test_expect_success 'ordered enumeration with duplicate suppression' '
-   echo20 "" 44 55 88 aa >expect &&
+   echoid "" 44 55 88 aa >expect &&
{
-   echo20 append 88 44 aa 55 &&
-   echo20 append 88 44 aa 55 &&
+   echoid append 88 44 aa 55 &&
+   echoid append 88 44 aa 55 &&
echo for_each_unique
} | test-tool sha1-array >actual &&
test_cmp expect actual
@@ -34,8 +34,8 @@ test_expect_success 'ordered enumeration with duplicate 
suppression' '
 
 test_expect_success 'lookup' '
{
-   echo20 append 88 44 aa 55 &&
-   echo20 lookup 55
+   echoid append 88 44 aa 55 &&
+   echoid lookup 55
} | test-tool sha1-array >actual &&
n=$(cat actual) &&
test "$n" -eq 1
@@ -43,8 +43,8 @@ test_expect_success 'lookup' '
 
 test_expect_success 'lookup non-existing entry' '
{
-   echo20 append 88 44 aa 55 &&
-   echo20 lookup 33
+   echoid append 88 44 aa 55 &&
+   echoid lookup 33
} | test-tool sha1-array >actual &&
n=$(cat actual) &&
test "$n" -lt 0
@@ -52,9 +52,9 @@ test_expect_success 'lookup non-existing entry' '
 
 test_expect_success 'lookup with duplicates' '
{
-   echo20 append 88 44 aa 55 &&
-   echo20 append 88 44 aa 55 &&
-   echo20 lookup 55
+   echoid append 88 44 aa 55 &&
+   echoid append 88 44 aa 55 &&
+   echoid lookup 55
} | test-tool sha1-array >actual &&
n=$(cat actual) &&
test "$n" -ge 2 &&
@@ -63,19 +63,24 @@ test_expect_success 'lookup with duplicates' '
 
 test_expect_success 'lookup non-existing entry with duplicates' '
{
-   echo20 append 88 44 aa 55 &&
-   echo20 append 88 44 aa 55 &&
-   echo20 lookup 66
+   echoid append 88 44 aa 55 &&
+   echoid append 88 44 aa 55 &&
+   echoid lookup 66
} | test-tool sha1-array >actual &&
n=$(cat actual) &&
test "$n" -lt 0
 '
 
 test_expect_success 'lookup with almost duplicate values' '
+   # n-1 5s
+   root=$(echoid "" 55) &&
+   root=${root%5} &&
{
-   echo "append " &&
-   echo "append 555f" &&
-   echo20 lookup 55
+   id1="${root}5" &&
+   id2="${root}f" &&
+   echo "append $id1" &&
+   echo "append $id2" &&
+   echoid lookup 55
} | test-tool sha1-array >actual &&
n=$(cat actual) &&
test "$n" -eq 0
@@ -83,8 +88,8 @@ test_expect_success 'lookup with almost duplicate values' '
 
 test_expect_success 'lookup with single duplicate value' '
{
-   echo20 append 55 55 &&
-   echo20 lookup 55
+   echoid append 55 55 &&
+   echoid lookup 55
} | test-tool sha1-array >actual &&
n=$(cat actual) &&
test "$n" -ge 0 &&


[PATCH v3 02/11] t0000: use hash translation table

2018-08-28 Thread brian m. carlson
If the hash we're using is 32 bytes in size, attempting to insert a
20-byte object name won't work.  Since these are synthesized objects
that are almost all zeros, look them up in a translation table.

Signed-off-by: brian m. carlson 
---
 t/t-basic.sh | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/t/t-basic.sh b/t/t-basic.sh
index e3cace299e..b7f57ea052 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -1013,12 +1013,13 @@ test_expect_success SHA1 'validate object ID for a 
known tree' '
 
 test_expect_success 'put invalid objects into the index' '
rm -f .git/index &&
-   cat >badobjects <<-\EOF &&
-   100644 blob 1000dir/file1
-   100644 blob 2000dir/file2
-   100644 blob 3000dir/file3
-   100644 blob 4000dir/file4
-   100644 blob 5000dir/file5
+   suffix=$(echo $ZERO_OID | sed -e "s/^.//") &&
+   cat >badobjects <<-EOF &&
+   100644 blob $(test_oid 001) dir/file1
+   100644 blob $(test_oid 002) dir/file2
+   100644 blob $(test_oid 003) dir/file3
+   100644 blob $(test_oid 004) dir/file4
+   100644 blob $(test_oid 005) dir/file5
EOF
git update-index --index-info 

[PATCH v3 08/11] t1400: switch hard-coded object ID to variable

2018-08-28 Thread brian m. carlson
Switch a hard-coded all-zeros object ID to use a variable instead.

Signed-off-by: brian m. carlson 
---
 t/t1400-update-ref.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index 7c8df20955..6072650686 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -346,7 +346,7 @@ test_expect_success "verifying $m's log (logged by config)" 
'
 
 git update-ref $m $D
 cat >.git/logs/$m < 1117150320 -0500
+$Z $C $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150320 -0500
 $C $A $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150350 -0500
 $A $B $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150380 -0500
 $F $Z $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1117150680 -0500


[PATCH v3 03/11] t0000: update tests for SHA-256

2018-08-28 Thread brian m. carlson
Test t tests the "basics of the basics" and as such, checks that we
have various fixed hard-coded object IDs.  The tests relying on these
assertions have been marked with the SHA1 prerequisite, as they will
obviously not function in their current form with SHA-256.

Use the test_oid helper to update these assertions and provide values
for both SHA-1 and SHA-256.

These object IDs were synthesized using a set of scripts that created
the objects for both SHA-1 and SHA-256 using the same method to ensure
that they are indeed the correct values.

Signed-off-by: brian m. carlson 
---
 t/t-basic.sh | 163 +--
 1 file changed, 102 insertions(+), 61 deletions(-)

diff --git a/t/t-basic.sh b/t/t-basic.sh
index b7f57ea052..72cf6bcb12 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -859,6 +859,47 @@ test_expect_success 'test_oid can look up data for 
SHA-256' '
 
 # Basics of the basics
 
+test_oid_cache <<\EOF
+path0f sha1:f87290f8eb2cbbea7857214459a0739927eab154
+path0f sha256:638106af7c38be056f3212cbd7ac65bc1bac74f420ca5a436ff006a9d025d17d
+
+path0s sha1:15a98433ae33114b085f3eb3bb03b832b3180a01
+path0s sha256:3a24cc53cf68edddac490bbf94a418a52932130541361f685df685e41dd6c363
+
+path2f sha1:3feff949ed00a62d9f7af97c15cd8a30595e7ac7
+path2f sha256:2a7f36571c6fdbaf0e3f62751a0b25a3f4c54d2d1137b3f4af9cb794bb498e5f
+
+path2s sha1:d8ce161addc5173867a3c3c730924388daedbc38
+path2s sha256:18fd611b787c2e938ddcc248fabe4d66a150f9364763e9ec133dd01d5bb7c65a
+
+path2d sha1:58a09c23e2ca152193f2786e06986b7b6712bdbe
+path2d sha256:00e4b32b96e7e3d65d79112dcbea53238a22715f896933a62b811377e2650c17
+
+path3f sha1:0aa34cae68d0878578ad119c86ca2b5ed5b28376
+path3f sha256:09f58616b951bd571b8cb9dc76d372fbb09ab99db2393f5ab3189d26c45099ad
+
+path3s sha1:8599103969b43aff7e430efea79ca4636466794f
+path3s sha256:fce1aed087c053306f3f74c32c1a838c662bbc4551a7ac2420f5d6eb061374d0
+
+path3d sha1:21ae8269cacbe57ae09138dcc3a2887f904d02b3
+path3d sha256:9b60497be959cb830bf3f0dc82bcc9ad9e925a24e480837ade46b2295e47efe1
+
+subp3f sha1:00fb5908cb97c2564a9783c0c64087333b3b464f
+subp3f sha256:a1a9e16998c988453f18313d10375ee1d0ddefe757e710dcae0d66aa1e0c58b3
+
+subp3s sha1:6649a1ebe9e9f1c553b66f5a6e74136a07ccc57c
+subp3s sha256:81759d9f5e93c6546ecfcadb560c1ff057314b09f93fe8ec06e2d8610d34ef10
+
+subp3d sha1:3c5e5399f3a333eddecce7a9b9465b63f65f51e2
+subp3d sha256:76b4ef482d4fa1c754390344cf3851c7f883b27cf9bc999c6547928c46aeafb7
+
+root sha1:087704a96baf1c2d1c869a8b084481e121c88b5b
+root sha256:9481b52abab1b2ffeedbf9de63ce422b929f179c1b98ff7bee5f8f1bc0710751
+
+simpletree sha1:7bb943559a305bdd6bdee2cef6e5df2413c3d30a
+simpletree 
sha256:1710c07a6c86f9a3c7376364df04c47ee39e5a5e221fcdd84b743bc9bb7e2bc5
+EOF
+
 # updating a new file without --add should fail.
 test_expect_success 'git update-index without --add should fail adding' '
test_must_fail git update-index should-be-empty
@@ -874,8 +915,8 @@ test_expect_success 'writing tree out with git write-tree' '
 '
 
 # we know the shape and contents of the tree and know the object ID for it.
-test_expect_success SHA1 'validate object ID of a known tree' '
-   test "$tree" = 7bb943559a305bdd6bdee2cef6e5df2413c3d30a
+test_expect_success 'validate object ID of a known tree' '
+   test "$tree" = "$(test_oid simpletree)"
 '
 
 # Removing paths.
@@ -917,16 +958,16 @@ test_expect_success 'showing stage with git ls-files 
--stage' '
git ls-files --stage >current
 '
 
-test_expect_success SHA1 'validate git ls-files output for a known tree' '
-   cat >expected <<-\EOF &&
-   100644 f87290f8eb2cbbea7857214459a0739927eab154 0   path0
-   12 15a98433ae33114b085f3eb3bb03b832b3180a01 0   path0sym
-   100644 3feff949ed00a62d9f7af97c15cd8a30595e7ac7 0   path2/file2
-   12 d8ce161addc5173867a3c3c730924388daedbc38 0   path2/file2sym
-   100644 0aa34cae68d0878578ad119c86ca2b5ed5b28376 0   path3/file3
-   12 8599103969b43aff7e430efea79ca4636466794f 0   path3/file3sym
-   100644 00fb5908cb97c2564a9783c0c64087333b3b464f 0   
path3/subp3/file3
-   12 6649a1ebe9e9f1c553b66f5a6e74136a07ccc57c 0   
path3/subp3/file3sym
+test_expect_success 'validate git ls-files output for a known tree' '
+   cat >expected <<-EOF &&
+   100644 $(test_oid path0f) 0 path0
+   12 $(test_oid path0s) 0 path0sym
+   100644 $(test_oid path2f) 0 path2/file2
+   12 $(test_oid path2s) 0 path2/file2sym
+   100644 $(test_oid path3f) 0 path3/file3
+   12 $(test_oid path3s) 0 path3/file3sym
+   100644 $(test_oid subp3f) 0 path3/subp3/file3
+   12 $(test_oid subp3s) 0 path3/subp3/file3sym
EOF
test_cmp expected current
 '
@@ -935,20 +976,20 @@ test_expect_success 'writing tree out with git 
write-tree' '
tree=$(git write-tree)
 '
 

[PATCH v3 07/11] t1006: make hash size independent

2018-08-28 Thread brian m. carlson
Compute the size of the tree and commit objects we're creating by
checking for the size of an object ID and computing the resulting sizes
accordingly.

Signed-off-by: brian m. carlson 
---
 t/t1006-cat-file.sh | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 7f19d591f2..a7c95bb785 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -140,15 +140,17 @@ test_expect_success '--batch-check without %(rest) 
considers whole line' '
test_cmp expect actual
 '
 
+test_oid_init
+
 tree_sha1=$(git write-tree)
-tree_size=33
+tree_size=$(($(test_oid rawsz) + 13))
 tree_pretty_content="100644 blob $hello_sha1   hello"
 
 run_tests 'tree' $tree_sha1 $tree_size "" "$tree_pretty_content"
 
 commit_message="Initial commit"
 commit_sha1=$(echo_without_newline "$commit_message" | git commit-tree 
$tree_sha1)
-commit_size=177
+commit_size=$(($(test_oid hexsz) + 137))
 commit_content="tree $tree_sha1
 author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 00 +
 committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 00 +


[PATCH v3 09/11] t1405: make hash size independent

2018-08-28 Thread brian m. carlson
Instead of hard-coding a 40-based constant, split the output of
for-each-ref and for-each-reflog by field.

Signed-off-by: brian m. carlson 
---
 t/t1405-main-ref-store.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t1405-main-ref-store.sh b/t/t1405-main-ref-store.sh
index a74c38b5fb..331899ddc4 100755
--- a/t/t1405-main-ref-store.sh
+++ b/t/t1405-main-ref-store.sh
@@ -54,7 +54,7 @@ test_expect_success 'for_each_ref(refs/heads/)' '
 '
 
 test_expect_success 'for_each_ref() is sorted' '
-   $RUN for-each-ref refs/heads/ | cut -c 42- >actual &&
+   $RUN for-each-ref refs/heads/ | cut -d" " -f 2- >actual &&
sort actual > expected &&
test_cmp expected actual
 '
@@ -71,7 +71,7 @@ test_expect_success 'verify_ref(new-master)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-   $RUN for-each-reflog | sort -k2 | cut -c 42- >actual &&
+   $RUN for-each-reflog | sort -k2 | cut -d" " -f 2- >actual &&
cat >expected <<-\EOF &&
HEAD 0x1
refs/heads/master 0x0


[PATCH v3 11/11] t1407: make hash size independent

2018-08-28 Thread brian m. carlson
Instead of hard-coding a 40-based constant, split the output of
for-each-ref and for-each-reflog by field.

Signed-off-by: brian m. carlson 
---
 t/t1407-worktree-ref-store.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t1407-worktree-ref-store.sh b/t/t1407-worktree-ref-store.sh
index 4623ae15c4..9a84858118 100755
--- a/t/t1407-worktree-ref-store.sh
+++ b/t/t1407-worktree-ref-store.sh
@@ -58,7 +58,7 @@ test_expect_success 'for_each_reflog()' '
mkdir -p .git/worktrees/wt/logs/refs/bisect &&
echo $ZERO_OID > .git/worktrees/wt/logs/refs/bisect/wt-random &&
 
-   $RWT for-each-reflog | cut -c 42- | sort >actual &&
+   $RWT for-each-reflog | cut -d" " -f 2- | sort >actual &&
cat >expected <<-\EOF &&
HEAD 0x1
PSEUDO-WT 0x0
@@ -68,7 +68,7 @@ test_expect_success 'for_each_reflog()' '
EOF
test_cmp expected actual &&
 
-   $RMAIN for-each-reflog | cut -c 42- | sort >actual &&
+   $RMAIN for-each-reflog | cut -d" " -f 2- | sort >actual &&
cat >expected <<-\EOF &&
HEAD 0x1
PSEUDO-MAIN 0x0


[PATCH v3 10/11] t1406: make hash-size independent

2018-08-28 Thread brian m. carlson
Instead of hard-coding a 40-based constant, split the output of
for-each-ref and for-each-reflog by field.

Signed-off-by: brian m. carlson 
---
 t/t1406-submodule-ref-store.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t1406-submodule-ref-store.sh b/t/t1406-submodule-ref-store.sh
index e093782cc3..d199d872fb 100755
--- a/t/t1406-submodule-ref-store.sh
+++ b/t/t1406-submodule-ref-store.sh
@@ -39,7 +39,7 @@ test_expect_success 'rename_refs() not allowed' '
 '
 
 test_expect_success 'for_each_ref(refs/heads/)' '
-   $RUN for-each-ref refs/heads/ | cut -c 42- >actual &&
+   $RUN for-each-ref refs/heads/ | cut -d" " -f 2- >actual &&
cat >expected <<-\EOF &&
master 0x0
new-master 0x0
@@ -48,7 +48,7 @@ test_expect_success 'for_each_ref(refs/heads/)' '
 '
 
 test_expect_success 'for_each_ref() is sorted' '
-   $RUN for-each-ref refs/heads/ | cut -c 42- >actual &&
+   $RUN for-each-ref refs/heads/ | cut -d" " -f 2- >actual &&
sort actual > expected &&
test_cmp expected actual
 '
@@ -65,7 +65,7 @@ test_expect_success 'verify_ref(new-master)' '
 '
 
 test_expect_success 'for_each_reflog()' '
-   $RUN for-each-reflog | sort | cut -c 42- >actual &&
+   $RUN for-each-reflog | sort | cut -d" " -f 2- >actual &&
cat >expected <<-\EOF &&
HEAD 0x1
refs/heads/master 0x0


[PATCH v3 04/11] t0002: abstract away SHA-1 specific constants

2018-08-28 Thread brian m. carlson
Adjust the test so that it computes variables for object IDs instead of
using hard-coded hashes.

Signed-off-by: brian m. carlson 
---
 t/t0002-gitfile.sh | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/t/t0002-gitfile.sh b/t/t0002-gitfile.sh
index 3691023d51..0aa9908ea1 100755
--- a/t/t0002-gitfile.sh
+++ b/t/t0002-gitfile.sh
@@ -92,11 +92,12 @@ test_expect_success 'enter_repo non-strict mode' '
mv .git .realgit &&
echo "gitdir: .realgit" >.git
) &&
+   head=$(git -C enter_repo rev-parse HEAD) &&
git ls-remote enter_repo >actual &&
-   cat >expected <<-\EOF &&
-   946e985ab20de757ca5b872b16d64e92ff3803a9HEAD
-   946e985ab20de757ca5b872b16d64e92ff3803a9refs/heads/master
-   946e985ab20de757ca5b872b16d64e92ff3803a9refs/tags/foo
+   cat >expected <<-EOF &&
+   $head   HEAD
+   $head   refs/heads/master
+   $head   refs/tags/foo
EOF
test_cmp expected actual
 '
@@ -106,21 +107,23 @@ test_expect_success 'enter_repo linked checkout' '
cd enter_repo &&
git worktree add  ../foo refs/tags/foo
) &&
+   head=$(git -C enter_repo rev-parse HEAD) &&
git ls-remote foo >actual &&
-   cat >expected <<-\EOF &&
-   946e985ab20de757ca5b872b16d64e92ff3803a9HEAD
-   946e985ab20de757ca5b872b16d64e92ff3803a9refs/heads/master
-   946e985ab20de757ca5b872b16d64e92ff3803a9refs/tags/foo
+   cat >expected <<-EOF &&
+   $head   HEAD
+   $head   refs/heads/master
+   $head   refs/tags/foo
EOF
test_cmp expected actual
 '
 
 test_expect_success 'enter_repo strict mode' '
+   head=$(git -C enter_repo rev-parse HEAD) &&
git ls-remote --upload-pack="git upload-pack --strict" foo/.git >actual 
&&
-   cat >expected <<-\EOF &&
-   946e985ab20de757ca5b872b16d64e92ff3803a9HEAD
-   946e985ab20de757ca5b872b16d64e92ff3803a9refs/heads/master
-   946e985ab20de757ca5b872b16d64e92ff3803a9refs/tags/foo
+   cat >expected <<-EOF &&
+   $head   HEAD
+   $head   refs/heads/master
+   $head   refs/tags/foo
EOF
test_cmp expected actual
 '


[PATCH v3 05/11] t0027: make hash size independent

2018-08-28 Thread brian m. carlson
We transform various object IDs into all-zero object IDs for comparison.
Adjust the length as well so that this works for all hash algorithms.

Signed-off-by: brian m. carlson 
---
 t/t0027-auto-crlf.sh | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index beb5927f77..0f1235d9d1 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -14,11 +14,13 @@ compare_files () {
 compare_ws_file () {
pfx=$1
exp=$2.expect
+   tmp=$2.tmp
act=$pfx.actual.$3
-   tr '\015\000abcdef0123456789' QN0 <"$2" >"$exp" &&
+   tr '\015\000abcdef0123456789' QN0 <"$2" >"$tmp" &&
tr '\015\000abcdef0123456789' QN0 <"$3" >"$act" &&
+   sed -e "s/*/$ZERO_OID/" "$tmp" >"$exp" &&
test_cmp "$exp" "$act" &&
-   rm "$exp" "$act"
+   rm "$exp" "$act" "$tmp"
 }
 
 create_gitattributes () {


[PATCH v3 01/11] t: add tool to translate hash-related values

2018-08-28 Thread brian m. carlson
Add a test function helper, test_oid, that produces output that varies
depending on the hash in use.  Add two additional helpers,
test_oid_cache, which can be used to load data for test_oid from
standard input, and test_oid_init, which can be used to load certain
fixed values from lookup charts.  Check that these functions work in
t, as the rest of the testsuite will soon come to depend on them.

Implement two basic lookup charts, one for common invalid or synthesized
object IDs, and one for various facts about the hash function in use.
Provide versions for both SHA-1 and SHA-256.

Note that due to the implementation used, names used for lookup can
currently consist only of shell identifier characters.  If this is a
problem in the future, we can hash the names before use.

Signed-off-by: Eric Sunshine 
Signed-off-by: brian m. carlson 
---
 t/oid-info/hash-info|  8 
 t/oid-info/oid  | 29 +++
 t/t-basic.sh| 35 
 t/test-lib-functions.sh | 44 +
 4 files changed, 116 insertions(+)
 create mode 100644 t/oid-info/hash-info
 create mode 100644 t/oid-info/oid

diff --git a/t/oid-info/hash-info b/t/oid-info/hash-info
new file mode 100644
index 00..ccdbfdf974
--- /dev/null
+++ b/t/oid-info/hash-info
@@ -0,0 +1,8 @@
+rawsz sha1:20
+rawsz sha256:32
+
+hexsz sha1:40
+hexsz sha256:64
+
+zero sha1:
+zero sha256:
diff --git a/t/oid-info/oid b/t/oid-info/oid
new file mode 100644
index 00..a754970523
--- /dev/null
+++ b/t/oid-info/oid
@@ -0,0 +1,29 @@
+# These are some common invalid and partial object IDs used in tests.
+001sha1:0001
+001sha256:0001
+002sha1:0002
+002sha256:0002
+003sha1:0003
+003sha256:0003
+004sha1:0004
+004sha256:0004
+005sha1:0005
+005sha256:0005
+006sha1:0006
+006sha256:0006
+007sha1:0007
+007sha256:0007
+# All zeros or Fs missing one or two hex segments.
+zero_1 sha1:000
+zero_1 
sha256:000
+zero_2 sha1:00
+zero_2 
sha256:00
+ff_1   sha1:fff
+ff_1   
sha256:fff
+ff_2   sha1:ff
+ff_2   
sha256:ff
+# More various invalid OIDs.
+numericsha1:0123456789012345678901234567890123456789
+numeric
sha256:0123456789012345678901234567890123456789012345678901234567890123
+deadbeef   sha1:deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
+deadbeef   
sha256:deadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeefdeadbeef
diff --git a/t/t-basic.sh b/t/t-basic.sh
index 850f651e4e..e3cace299e 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -821,6 +821,41 @@ test_expect_success 'tests clean up even on failures' "
EOF
 "
 
+test_oid_init
+
+test_expect_success 'test_oid provides sane info by default' '
+   test_oid zero >actual &&
+   grep "^00*$" actual &&
+   rawsz="$(test_oid rawsz)" &&
+   hexsz="$(test_oid hexsz)" &&
+   test "$hexsz" -eq $(wc -c actual &&
+   grep "^00*$" actual &&
+   rawsz="$(test_oid rawsz)" &&
+   hexsz="$(test_oid hexsz)" &&
+   test $(wc -c actual &&
+   grep "^00*$" actual &&
+   rawsz="$(test_oid rawsz)" &&
+   hexsz="$(test_oid hexsz)" &&
+   test $(wc -c /dev/null ||
+   error 'bug in the test script: bad hash algorithm'; } &&
+   eval "test_oid_${_k}_$_tag=\"\$_v\"" || return 1
+   done
+}
+
+test_oid () {
+   eval "
+   test -n \"\${test_oid_${test_hash_algo}_$1+set}\" &&
+   printf '%s' \"\${test_oid_${test_hash_algo}_$1}\"
+   "
+}


[PATCH v3 00/11] Hash-independent tests (part 3)

2018-08-28 Thread brian m. carlson
This is the next in the series of improvements to make tests
hash-independent.

A range-diff is below.

Changes from v2:
* Fix a typo in "zero_2".
* Provide better matching of expected output.
* Add and use test_oid_init instead of filename-based test_oid_cache.
* Add test_set_hash.
* Provide better error checking in newly added test functions.
* Move t constants into the test, removing the separate file.
* Switch to using a differently named temporary file in t0027.

Changes from v1:
* Adopt pure shell approach for helper.
* Add tests for the helpers.
* Explicitly refer to SHA-256 now that we know it will be NewHash.
* Updated t to remove SHA1 prerequisite.
* Change name of helper from test_translate to test_oid.
* Add helper to cache information in the shell.
* Simplified lookup of HEAD in t0002.
* Switched to using existing helper function in t0027.
* Simplified handling of IDs in t0064.

brian m. carlson (11):
  t: add tool to translate hash-related values
  t: use hash translation table
  t: update tests for SHA-256
  t0002: abstract away SHA-1 specific constants
  t0027: make hash size independent
  t0064: make hash size independent
  t1006: make hash size independent
  t1400: switch hard-coded object ID to variable
  t1405: make hash size independent
  t1406: make hash-size independent
  t1407: make hash size independent

 t/oid-info/hash-info   |   8 ++
 t/oid-info/oid |  29 +
 t/t-basic.sh   | 211 ++---
 t/t0002-gitfile.sh |  27 +++--
 t/t0027-auto-crlf.sh   |   6 +-
 t/t0064-sha1-array.sh  |  49 
 t/t1006-cat-file.sh|   6 +-
 t/t1400-update-ref.sh  |   2 +-
 t/t1405-main-ref-store.sh  |   4 +-
 t/t1406-submodule-ref-store.sh |   6 +-
 t/t1407-worktree-ref-store.sh  |   4 +-
 t/test-lib-functions.sh|  44 +++
 12 files changed, 283 insertions(+), 113 deletions(-)
 create mode 100644 t/oid-info/hash-info
 create mode 100644 t/oid-info/oid

range-diff:

1:  fb66b1ff7d !  1:  a897533a90 t: add tool to translate hash-related values
   @@ -3,19 +3,19 @@
t: add tool to translate hash-related values

Add a test function helper, test_oid, that produces output that varies
   -depending on the hash in use.  Add an additional helper, test_oid_cache,
   -that can be used to load data for test_oid, either through a list of
   -filenames or on standard input.  Check that these functions work in
   +depending on the hash in use.  Add two additional helpers,
   +test_oid_cache, which can be used to load data for test_oid from
   +standard input, and test_oid_init, which can be used to load certain
   +fixed values from lookup charts.  Check that these functions work in
t, as the rest of the testsuite will soon come to depend on them.

Implement two basic lookup charts, one for common invalid or synthesized
object IDs, and one for various facts about the hash function in use.
   -Individual tests can use their own translation files to map object IDs
   -or other data that needs to vary across hash functions.  Provide
   -versions for both SHA-1 and SHA-256.
   +Provide versions for both SHA-1 and SHA-256.

   -Note that due to the implementation used, keys cannot be anything but
   -shell identifier characters.
   +Note that due to the implementation used, names used for lookup can
   +currently consist only of shell identifier characters.  If this is a
   +problem in the future, we can hash the names before use.

Signed-off-by: Eric Sunshine 
Signed-off-by: brian m. carlson 
   @@ -56,7 +56,7 @@
+007
sha256:0007
+# All zeros or Fs missing one or two hex segments.
+zero_1 sha1:000
   -+zero_2 
sha256:000
   ++zero_1 
sha256:000
+zero_2 sha1:00
+zero_2 
sha256:00
+ff_1   sha1:fff
   @@ -76,33 +76,39 @@
   EOF
 "
 
   -+test_oid_cache hash-info oid
   ++test_oid_init
+
   -+test_expect_success 'test_oid_info provides sane info by default' '
   -+   test_oid zero &&
   ++test_expect_success 'test_oid provides sane info by default' '
+   test_oid zero >actual &&
   -+   grep "00*" actual &&
   -+   test "$(test_oid hexsz)" -eq $(wc -c actual &&
   -+   grep "00*" actual &&
   -+   test $(wc -c actual &&
   -+   grep "00*" actual &&
   ++   grep "^00*$" actual &&
   ++   rawsz="$(test_oid rawsz)" &&
   ++   hexsz="$(test_oid hexsz)" &&
   ++   test $(wc -c 

Re: avoid "Set preference list" during make test?

2018-08-28 Thread brian m. carlson
On Tue, Aug 28, 2018 at 11:53:55PM +, Tacitus Aedifex wrote:
> While running `make test` on the git source tree I keep getting asked:
> 
>  Set preference list to:
>Cipher: ...
>Digest: ...
>etc...
> 
> Is there any way to turn that prompt off so that `make test` completes
> without any keyboard input?

Can you tell us which test is running when you see that?  I suspect it
might be one of the tests using GnuPG; if so, can you tell us what GnuPG
version you're running?

Also, just for completeness, which version are you trying to run the
tests on, and is it from the repo or a tarball?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 0/9] introducing oideq()

2018-08-28 Thread brian m. carlson
On Tue, Aug 28, 2018 at 05:21:27PM -0400, Jeff King wrote:
> On Sun, Aug 26, 2018 at 08:56:21PM +, brian m. carlson wrote:
> > I would quite like to see this series picked up for v2.20.  If we want
> > to minimize performance regressions with the SHA-256 work, I think it's
> > important.
> 
> Thanks. One of the things I was worried about was causing unnecessary
> conflicts with existing topics, including your work. But if everybody is
> on board, I'd be happy to see this go in early in the next release cycle
> (the longer we wait, the more annoying conflicts Junio has to resolve).

I can send out work that doesn't conflict with this while it makes its
way to master.  There are a lot of test fixes that can go in in the mean
time.

I will be sending out a series that actually implements SHA-256 as an
RFC soon, but I don't think there will be any conflicts (and it will
just be an RFC anyway).
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH] read-cache.c: optimize reading index format v4

2018-08-28 Thread Ben Peart




On 8/28/2018 3:25 PM, Duy Nguyen wrote:

On Mon, Aug 27, 2018 at 9:36 PM Junio C Hamano  wrote:

PS. I notice that v4 does not pad to align entries at 4 byte boundary
like v2/v3. This could cause a slight slow down on x86 and segfault on
some other platforms.


Care to elaborate?

Long time ago, we used to mmap and read directly from the index file
contents, requiring either an unaligned read or padded entries.  But
that was eons ago and we first read and convert from on-disk using
get_be32() etc. to in-core structure, so I am not sure what you mean
by "segfault" here.



My bad. I saw this line

#define get_be16(p) ntohs(*(unsigned short *)(p))

and jumped to conclusion without realizing that block is for safe
unaligned access.


@@ -1898,7 +1884,8 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
   struct cache_header *hdr;
   void *mmap;
   size_t mmap_size;
- struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
+ const struct cache_entry *previous_ce = NULL;
+ struct cache_entry *dummy_entry = NULL;

   if (istate->initialized)
   return istate->cache_nr;
@@ -1936,11 +1923,10 @@ int do_read_index(struct index_state *istate, const 
char *path, int must_exist)
   istate->initialized = 1;

   if (istate->version == 4) {
- previous_name = _name_buf;
+ previous_ce = dummy_entry = make_empty_transient_cache_entry(0);


I do like the idea of passing the previous ce around to tell the
next one what the previous name was, but I would have preferred to
see this done a bit more cleanly without requiring us to support "a
dummy entry with name whose length is 0"; a real cache entry never
has zero-length name, and our code may want to enforce it as a
sanity check.

I think we can just call create_from_disk() with NULL set to
previous_ce in the first round; of course, the logic to assign the
one we just created to previous_ce must check istate->version,
instead of "is previous_ce NULL?" (which is an indirect way to check
the same thing used in this patch).


Yeah I kinda hated dummy_entry too but the feeling wasn't strong
enough to move towards the index->version check. I guess I'm going to
do it now.



I ran some perf tests using p0002-read-cache.sh to compare V4 
performance before and after this patch so I could get a feel for how 
much it helps.


100,000 files

Test  HEAD~1   HEAD

read_cache/discard_cache 1000 times14.1210.75 -23.9%

1,000,000 files

Test  HEAD~1   HEAD

read_cache/discard_cache 1000 times   202.81   170.33 -16.0%


This provides a nice speedup and IMO simplifies the code as well. 
Nicely done.


avoid "Set preference list" during make test?

2018-08-28 Thread Tacitus Aedifex

While running `make test` on the git source tree I keep getting asked:

 Set preference list to:
   Cipher: ...
   Digest: ...
   etc...

Is there any way to turn that prompt off so that `make test` completes without 
any keyboard input?


//tæ


Re: [PATCH 2/2] tests: fix non-portable iconv invocation

2018-08-28 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:

> The iconv that comes with a FreeBSD 11.2-RELEASE-p2 box I have access
> to doesn't support the SHIFT-JIS encoding. Guard a test added in
> e92d62253 ("convert: add round trip check based on
> 'core.checkRoundtripEncoding'", 2018-04-15) first released with Git
> v2.18.0 with a prerequisite that checks for its availability.
>
> The iconv command is in POSIX, and we have numerous tests
> unconditionally relying on its ability to convert ASCII, UTF-8 and
> UTF-16, but unconditionally relying on the presence of more obscure
> encodings isn't portable.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  t/t0028-working-tree-encoding.sh | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)

One nit:

[...]
> diff --git a/t/t0028-working-tree-encoding.sh 
> b/t/t0028-working-tree-encoding.sh
> index 12b8eb963a..b0f4490e8e 100755
> --- a/t/t0028-working-tree-encoding.sh
> +++ b/t/t0028-working-tree-encoding.sh
> @@ -203,7 +203,11 @@ test_expect_success 'error if encoding garbage is 
> already in Git' '
>   test_i18ngrep "error: BOM is required" err.out
>  '
>  
> -test_expect_success 'check roundtrip encoding' '
> +test_lazy_prereq ICONV_SHIFT_JIS '
> + iconv -f UTF-8 -t SHIFT-JIS /dev/null

Is this 2>/dev/null needed?  Leaving it out should make the test
easier to debug.

If I'm reading it correctly, test_run_lazy_prereq_ appears to respect
the normal --verbose etc settings, which means that the 2>/dev/null
could be left out.  A quick check[*] with and without -v seems to bear
this out.

> +'
> +
> +test_expect_success ICONV_SHIFT_JIS 'check roundtrip encoding' '
>   test_when_finished "rm -f roundtrip.shift roundtrip.utf16" &&
>   test_when_finished "git reset --hard HEAD" &&

With that tweak (removal of 2>/dev/null), this is
Reviewed-by: Jonathan Nieder 

Thanks.

[*]
 diff --git i/t/t-basic.sh w/t/t-basic.sh
 index 850f651e4e..879f63118e 100755
 --- i/t/t-basic.sh
 +++ w/t/t-basic.sh
 @@ -25,6 +25,10 @@ try_local_x () {
echo "$x"
  }
  
 +test_lazy_prereq NOISY_TEST '
 +  echo >&2 "PAAARTY!"
 +'
 +
  # This test is an experiment to check whether any Git users are using
  # Shells that don't support the "local" keyword. "local" is not
  # POSIX-standard, but it is very widely supported by POSIX-compliant
 @@ -35,7 +39,7 @@ try_local_x () {
  # fails this test, you can ignore the failure, but please report the
  # problem to the Git mailing list , as it might
  # convince us to continue avoiding the use of "local".
 -test_expect_success 'verify that the running shell supports "local"' '
 +test_expect_success NOISY_TEST 'verify that the running shell supports 
"local"' '
x="notlocal" &&
echo "local" >expected1 &&
try_local_x >actual1 &&


Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts

2018-08-28 Thread Thomas Gummerer
On 08/27, Junio C Hamano wrote:
> Thomas Gummerer  writes:
> 
> > Agreed.  I think it may be solvable if we'd actually get the
> > information about what belongs to which side from the merge algorithm
> > directly.
> 
> The merge machinery may (eh, rather, "does") know, but we do not
> have a way to express that in the working tree file that becomes the
> input to the rerere algorithm, without making backward-incompatible
> changes to the output format.

Right, I was more thinking along the lines of using the stages in the
index to redo the merge and get the information that way.  But that
may not work as well with using 'git rerere' from the command line,
and have other backwards compatibility woes, that I didn't quite think
through yet :)

> In a sense, that is already a solved problem, even though the
> solution was done a bit differently ;-) If the end users need to
> commit a half-resolved result with conflict markers (perhaps they
> want to share it among themselves and work on resolving further),
> what they can do is to also say that these are now part of contents,
> not conflict markers, with conflict-marker-size attribute.  Perhaps
> they prepare such a half-resolved result with unusual value of the
> attribute, so that later merge of these with standard conflict
> marker size will not get confused.

Right, I wasn't aware of the conflict-marker-size attribute.  Thanks
for mentioning it!

> That reminds me of another thing.  I've been running with these in
> my $GIT_DIR/info/attributes file for the past few years.  Perhaps we
> should add them to Documentation/.gitattributes and t/.gitattributes
> so that project participants would all benefit?
> 
> Documentation/git-merge.txt   conflict-marker-size=32
> Documentation/user-manual.txt conflict-marker-size=32
> t/t-*.sh  conflict-marker-size=32

I do think that would be a good idea.  I am wondering what the right
value is though.  Seeing such a long conflict marker before I knew
about this setting would have struck me as odd, and probably made me
try and track down where it is coming from.  But on the other hand it
makes the conflict markers very easy to tell apart from the rest of
the lines that kind of look like conflict markers.

I think these tradeoffs probably make it worth setting them to a value
this large.

One other file that I see needs such a treatment is
Documentation/gitk.txt, where the first header is 7 "="s, and
therefore could confuse 'git rerere' as well.  Arguably that's less
important, as there's unlikely to be a conflict containing that line,
but it may be worth including for completeness sake.

Maybe something like this?  Though it may be good for others to chime
in if they find this helpful or whether they find the long conflict
markers distracting.

--- >8 ---
Subject: [PATCH] .gitattributes: add conflict-marker-size for relevant files

Some files in git.git contain lines that look like conflict markers,
either in examples or tests, or in the case of Documentation/gitk.txt
because of the asciidoc heading.

Having conflict markers the same length as the actual content can be
confusing for humans, and is impossible to handle for tools like 'git
rerere'.  Work around that by setting the 'conflict-marker-size'
attribute for those files to 32, which makes the conflict markers
unambiguous.

Helped-by: Junio C Hamano 
Signed-off-by: Thomas Gummerer 
---
 .gitattributes | 4 
 1 file changed, 4 insertions(+)

diff --git a/.gitattributes b/.gitattributes
index 1bdc91e282..49b3051641 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -9,3 +9,7 @@
 /command-list.txt eol=lf
 /GIT-VERSION-GEN eol=lf
 /mergetools/* eol=lf
+/Documentation/git-merge.txt conflict-marker-size=32
+/Documentation/gitk.txt conflict-marker-size=32
+/Documentation/user-manual.txt conflict-marker-size=32
+/t/t-*.sh conflict-marker-size=32
-- 
2.18.0.1088.ge017bf2cd1


Re: [PATCH 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-08-28 Thread Eric Sunshine
On Tue, Aug 28, 2018 at 5:31 PM Derrick Stolee  wrote:
> On 8/28/2018 4:41 PM, Stefan Beller wrote:
> > On Tue, Aug 28, 2018 at 1:33 PM Derrick Stolee via GitGitGadget
> >  wrote:
> >> +   GIT_TEST_COMMIT_GRAPH=0 &&
> >> +   test_must_fail git merge -m final G
> > This could go on the same line without the && in between, setting the
> > variable as a prefix.
>
> It cannot! The Linux build I ran complained that you can't put
> environment variables through test_must_fail.

Is GIT_TEST_COMMIT_GRAPH exported? If not, it won't have an impact on
git-merge anyhow.

As for the special case of one-shot environment variable and
test_must_fail(), you'll find "env" used as a workaround in a number
of tests:

test_must_fail env GIT_TEST_COMMIT_GRAPH=0 git merge ... &&


Re: [PATCH 2/2] submodule.c: warn about missing submodule git directories

2018-08-28 Thread Stefan Beller
On Tue, Aug 28, 2018 at 11:56 AM Junio C Hamano  wrote:
>
> Stefan Beller  writes:
>
> > This is the continuation of f2d48994dc1 (submodule.c: submodule_move_head
> > works with broken submodules, 2017-04-18), which tones down the case of
> > "broken submodule" in case of a missing git directory of the submodule to
> > be only a warning.
>
> After seeing this warning, as we do not do any remedial action in
> this codepath, the user with a repository in this state will keep
> seeing the 'missing' message.  Wouldn't we want to give a hint in
> addition (e.g. 'you can run "git submodule update $name" to
> recover', or something like that)?

Not quite, as this is only triggered in the case of 'old_head = NULL', which
is when you have a superproject that is missing the submodule in the
working tree before and wants to have it afterwards.

Looking at the test in the previous patch, I would think a reasonable workflow
in the test is

git clone --recurse-submodules super1 super1_clone && cd super1_clone
git checkout --recurse-submodules historic_state
# see warning, but checkout proceeds

git fetch --recurse-submodules
# clones the submodule as it was configured active via the clone

git checkout --recurse-submodules historic_state
# this checkout will put the submodule in place
#  not sure if -f is needed here, though.


I am currently working on the cloning of submodules that are not currently
in the working tree while fetching in the superproject, which would address
the latter part.

> The MOVE_HEAD_FORCE codepath that follows this hunk is, eh, already
> forcing to correct the situation, so there is no need to touch that,
> which makes sense, if I understand correctly.

No, that is not executed for now as it depends on 'old_head'.

In case of FORCE we might want to die instead of just warn about that submodule.

Stefan


Re: [PATCH] commit-reach: correct accidental #include of C file

2018-08-28 Thread Derrick Stolee

On 8/28/2018 5:36 PM, Jonathan Nieder wrote:

Without this change, the build breaks with clang:
For some reason, it didn't fail with GCC for me, but this is an 
obviously correct change to make. Thanks!

  libgit/ref-filter.pic.o: multiple definition of 'filter_refs'
  libgit/commit-reach.pic.o: previous definition here

Signed-off-by: Jonathan Nieder 
---
Jonathan Nieder wrote:

Derrick Stolee wrote:

--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1,8 +1,10 @@
  #include "cache.h"
  #include "commit.h"
+#include "commit-graph.h"
  #include "decorate.h"
  #include "prio-queue.h"
  #include "tree.h"
+#include "ref-filter.c"

Did you mean "ref-filter.h"?

This broke the build here.  Is there some check that we can use to
prevent it happening again?  I don't think we ever intentionally
#include a .c file.

Here's what I'm applying locally.

Thanks,
Jonathan

  commit-reach.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-reach.c b/commit-reach.c
index c996524032..86715c103c 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -4,7 +4,7 @@
  #include "decorate.h"
  #include "prio-queue.h"
  #include "tree.h"
-#include "ref-filter.c"
+#include "ref-filter.h"
  #include "revision.h"
  #include "tag.h"
  #include "commit-reach.h"


[PATCH] commit-reach: correct accidental #include of C file

2018-08-28 Thread Jonathan Nieder
Without this change, the build breaks with clang:

 libgit/ref-filter.pic.o: multiple definition of 'filter_refs'
 libgit/commit-reach.pic.o: previous definition here

Signed-off-by: Jonathan Nieder 
---
Jonathan Nieder wrote:
> Derrick Stolee wrote:

>> --- a/commit-reach.c
>> +++ b/commit-reach.c
>> @@ -1,8 +1,10 @@
>>  #include "cache.h"
>>  #include "commit.h"
>> +#include "commit-graph.h"
>>  #include "decorate.h"
>>  #include "prio-queue.h"
>>  #include "tree.h"
>> +#include "ref-filter.c"
>
> Did you mean "ref-filter.h"?
>
> This broke the build here.  Is there some check that we can use to
> prevent it happening again?  I don't think we ever intentionally
> #include a .c file.

Here's what I'm applying locally.

Thanks,
Jonathan

 commit-reach.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-reach.c b/commit-reach.c
index c996524032..86715c103c 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -4,7 +4,7 @@
 #include "decorate.h"
 #include "prio-queue.h"
 #include "tree.h"
-#include "ref-filter.c"
+#include "ref-filter.h"
 #include "revision.h"
 #include "tag.h"
 #include "commit-reach.h"
-- 
2.19.0.rc0.228.g281dcd1b4d0



Re: [PATCH 0/9] introducing oideq()

2018-08-28 Thread Derrick Stolee

On 8/28/2018 5:21 PM, Jeff King wrote:

On Sun, Aug 26, 2018 at 08:56:21PM +, brian m. carlson wrote:


Due to the simplicity of the current code and our inlining, the compiler
can usually figure this out for now. So I wouldn't expect this patch to
actually improve performance right away. But as that discussion shows,
we are likely to take a performance hit as we move to more runtime
determination of the_hash_algo parameters. Having these callers use the
more strict form will potentially help us recover that.

So in that sense we _could_ simply punt on this series until then (and
it's certainly post-v2.19 material). But I think it's worth doing now,
simply from a readability/annotation standpoint. IMHO the resulting code
is more clear (though I've long since taught myself to read !foocmp() as
equality).

I would quite like to see this series picked up for v2.20.  If we want
to minimize performance regressions with the SHA-256 work, I think it's
important.

Thanks. One of the things I was worried about was causing unnecessary
conflicts with existing topics, including your work. But if everybody is
on board, I'd be happy to see this go in early in the next release cycle
(the longer we wait, the more annoying conflicts Junio has to resolve).


I'm happy to take this change whenever. In my opinion, right after 
v2.19.0 is cut would be a great time to merge it into master.


This v2 is good.

Reviewed-by: Derrick Stolee 



Re: [PATCH 2/2] submodule: munge paths to submodule git directories

2018-08-28 Thread Stefan Beller
> > > -   echo "gitdir: 
> > > ../../../.git/modules/sub3/modules/dirdir/subsub" 
> > > >./sub3/dirdir/subsub/.git_expect
> > > +   echo "gitdir: 
> > > ../../../.git/modules/sub3/modules/dirdir%2fsubsub" 
> > > >./sub3/dirdir/subsub/.git_expect
> >
> > One interesting thing about url-encoding is that it's not one-to-one.
> > This case could also be %2F, which is a different file (on a
> > case-sensitive filesystem). I think "%20" and "+" are similarly
> > interchangeable.
> >
> > If we were decoding the filenames, that's fine. The round-trip is
> > lossless.
> >
> > But that's not quite how the new code behaves. We encode the input and
> > then check to see if it matches an encoding we previously performed. So
> > if our urlencode routines ever change, this will subtly break.

And this is the problem:
a) we have a 'complicated' encoding here, which must never change
b) the "encode and check if it matches", will produce ugly code going forward,
as it tries to differentiate between submodules named "url_encoded(a)"
and "a" (e.g. "a%20b" and "a b" would conflict and we have to resolve
the conflict, although those two names are perfectly fine as they do not
have the original problem of having slashes)

Hence I would propose a simpler encoding:

1)/ -> _ ( replace a slash by an underscore)
2)_ -> __ (replace any underscore by 2 underscores, this is just the
  escaping mechanism to differentiate a/b and a_b)

3) (optional) instead of putting it all in modules/, use another
directory gitmodules/
for example. this will make sure we can tell if a repository has
been converted
or is stuck with a setup of a current git.

> This is exactly the reason why I wanted to get some opinions on what the
> best thing to do here would be.  I _think_ the best thing would probably
> be to write a specific routine to do the conversion, and it wouldn't
> even have to be all that complex.  Basically I'm just interested in
> converting '/' characters so that things no longer behave like
> nested directories.

Yeah, then let's just convert '/' with as little overhead as possible.

Thanks,
Stefan


Re: [PATCH] doc: Don't echo sed command for manpage-base-url.xsl

2018-08-28 Thread Junio C Hamano
Tim Schumacher  writes:

> Previously, the sed command for generating manpage-base-url.xsl
> was printed to the console when being run.
>
> For the purpose of silencing it, define a $(QUIET) variable which
> contains an '@' if verbose mode isn't enabled and which is empty
> otherwise. This just silences the command invocation without doing
> anything else.
>
> Signed-off-by: Tim Schumacher 
> ---

I am not sure if this is a good change.  All these QUIET_$TOOL hide
details of running the $TOOL to produce the final output of the step,
but they still do report what they are creating via which $TOOL.

Shouldn't the step to create manpage-base-url.xsl be the same?  The
detail of creating it (i.e. token @@MAN_BASE_URL@@ is replaced with
the actual value) may want to be squelched, but shouldn't we still
be reporting that we are creating manpage-base-url.xsl file?

>  Documentation/Makefile | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index a42dcfc74..45454e9b5 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -217,6 +217,7 @@ endif
>  
>  ifneq ($(findstring $(MAKEFLAGS),s),s)
>  ifndef V
> + QUIET   = @
>   QUIET_ASCIIDOC  = @echo '   ' ASCIIDOC $@;
>   QUIET_XMLTO = @echo '   ' XMLTO $@;
>   QUIET_DB2TEXI   = @echo '   ' DB2TEXI $@;
> @@ -344,7 +345,7 @@ $(OBSOLETE_HTML): %.html : %.txto asciidoc.conf
>   mv $@+ $@
>  
>  manpage-base-url.xsl: manpage-base-url.xsl.in
> - sed "s|@@MAN_BASE_URL@@|$(MAN_BASE_URL)|" $< > $@
> + $(QUIET)sed "s|@@MAN_BASE_URL@@|$(MAN_BASE_URL)|" $< > $@
>  
>  %.1 %.5 %.7 : %.xml manpage-base-url.xsl
>   $(QUIET_XMLTO)$(RM) $@ && \


Re: [PATCH v2 04/18] commit-reach: move commit_contains from ref-filter

2018-08-28 Thread Derrick Stolee

On 8/28/2018 5:24 PM, Jonathan Nieder wrote:

Hi,

Derrick Stolee wrote:


There are several commit walks in the codebase. Group them together into
a new commit-reach.c file and corresponding header. After we group these
walks into one place, we can reduce duplicate logic by calling
equivalent methods.

All methods are direct moves, except we also make the commit_contains()
method public so its consumers in ref-filter.c can still call it. We can
also test this method in a test-tool in a later commit.

Signed-off-by: Derrick Stolee 
---
  commit-reach.c | 121 +
  commit-reach.h |  20 ++-
  ref-filter.c   | 145 +++--
  3 files changed, 147 insertions(+), 139 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index a6bc4781a6..01d796f011 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1,8 +1,10 @@
  #include "cache.h"
  #include "commit.h"
+#include "commit-graph.h"
  #include "decorate.h"
  #include "prio-queue.h"
  #include "tree.h"
+#include "ref-filter.c"

Did you mean "ref-filter.h"?

This broke the build here.  Is there some check that we can use to
prevent it happening again?  I don't think we ever intentionally
#include a .c file.

Woah! How did that ever work? I definitely built this locally.


Re: [PATCH 0/1] Define GIT_TEST_COMMIT_GRAPH for commit-graph test coverage

2018-08-28 Thread Derrick Stolee

On 8/28/2018 4:37 PM, Stefan Beller wrote:

On Tue, Aug 28, 2018 at 1:33 PM Derrick Stolee via GitGitGadget
 wrote:

The commit-graph (and multi-pack-index) features are optional data
structures that can make Git operations faster. Since they are optional, we
do not enable them in most Git tests. The commit-graph is tested in
t5318-commit-graph.sh (and t6600-test-reach.sh in ds/reachable), but that
one script cannot cover the data shapes present in the rest of the test
suite.

This patch introduces a new test environment variable, GIT_TEST_COMMIT_GRAPH
. Similar to GIT_TEST_SPLIT_INDEX, it enables the commit-graph and writes it
with every git commit command.
Thanks, Duy, for pointing out this direction

Did you mean to cc Duy (instead of me)?
(I'll happily review the patch, too... just asking)


I just added you because you've been on a lot of commit-graph things 
lately. But yes, I forgot to add Duy to the CC list. Added to this message.


Thanks,

-Stolee



Re: [PATCH 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-08-28 Thread Derrick Stolee

On 8/28/2018 4:41 PM, Stefan Beller wrote:

On Tue, Aug 28, 2018 at 1:33 PM Derrick Stolee via GitGitGadget
 wrote:

From: Derrick Stolee 

The commit-graph feature is tested in isolation by
t5318-commit-graph.sh and t6600-test-reach.sh, but there are many
more interesting scenarios involving commit walks. Many of these
scenarios are covered by the existing test suite, but we need to
maintain coverage when the optional commit-graph structure is not
present.

To allow running the full test suite with the commit-graph present,
add a new test environment variable, GIT_TEST_COMMIT_GRAPH. Similar
to GIT_TEST_SPLIT_INDEX, this variable makes every Git command try
to load the commit-graph when parsing commits, and writes the
commit-graph file after every 'git commit' command.

There are a few tests that rely on commits not existing in
pack-files to trigger important events, so manually set
GIT_TEST_COMMIT_GRAPH to false for the necessary commands.

So the plan is to turn on the commit graph for the whole test suite
excluding these selected tests?

Excluding these specific _steps_, but yes.



+   GIT_TEST_COMMIT_GRAPH=0 &&
+   test_must_fail git merge -m final G

This could go on the same line without the && in between, setting the
variable as a prefix.


It cannot! The Linux build I ran complained that you can't put 
environment variables through test_must_fail.


-Stolee



[PATCH v2 2/2] rerere: add note about files with existing conflict markers

2018-08-28 Thread Thomas Gummerer
When a file contains lines that look like conflict markers, 'git
rerere' may fail not be able to record a conflict resolution.
Emphasize that in the man page, and mention a possible workaround for
the issue.

Suggested-by: Junio C Hamano 
Signed-off-by: Thomas Gummerer 
---

Compared to v1, this now mentions the workaround of setting the
'conflict-marker-size', as mentioned in


 Documentation/git-rerere.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/git-rerere.txt b/Documentation/git-rerere.txt
index 031f31fa47..df310d2a58 100644
--- a/Documentation/git-rerere.txt
+++ b/Documentation/git-rerere.txt
@@ -211,6 +211,12 @@ would conflict the same way as the test merge you resolved 
earlier.
 'git rerere' will be run by 'git rebase' to help you resolve this
 conflict.
 
+[NOTE] 'git rerere' relies on the conflict markers in the file to
+detect the conflict.  If the file already contains lines that look the
+same as lines with conflict markers, 'git rerere' may fail to record a
+conflict resolution.  To work around this, the `conflict-marker-size`
+setting in linkgit:gitattributes[5] can be used.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
-- 
2.18.0.1088.ge017bf2cd1



[PATCH v2 1/2] rerere: mention caveat about unmatched conflict markers

2018-08-28 Thread Thomas Gummerer
4af3220 ("rerere: teach rerere to handle nested conflicts",
2018-08-05) introduced slightly better behaviour if the user commits
conflict markers and then gets another conflict in 'git rerere'.

However this is just a heuristic to punt on such conflicts better, and
doesn't deal with any unmatched conflict markers.  Make that clearer
in the documentation.

Suggested-by: Junio C Hamano 
Signed-off-by: Thomas Gummerer 
---

> That's fine.  I'd rather keep it but perhaps add a reminder to tell
> readers that it works only when the merging of contents that already
> records with nested conflict markers happen to "cleanly nest".

Yeah that makes sense.  Maybe something like this?

(replying to  here to keep
the patches in one thread)

 Documentation/technical/rerere.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/technical/rerere.txt 
b/Documentation/technical/rerere.txt
index e65ba9b0c6..8fefe51b00 100644
--- a/Documentation/technical/rerere.txt
+++ b/Documentation/technical/rerere.txt
@@ -149,7 +149,10 @@ version, and the sorting the conflict hunks, both for the 
outer and the
 inner conflict.  This is done recursively, so any number of nested
 conflicts can be handled.
 
+Note that this only works for conflict markers that "cleanly nest".  If
+there are any unmatched conflict markers, rerere will fail to handle
+the conflict and record a conflict resolution.
+
 The only difference is in how the conflict ID is calculated.  For the
 inner conflict, the conflict markers themselves are not stripped out
 before calculating the sha1.
-- 
2.18.0.1088.ge017bf2cd1



Re: [PATCH] doc: Don't echo sed command for manpage-base-url.xsl

2018-08-28 Thread Eric Sunshine
On Tue, Aug 28, 2018 at 5:21 PM Tim Schumacher  wrote:
> Previously, the sed command for generating manpage-base-url.xsl
> was printed to the console when being run.
>
> For the purpose of silencing it, define a $(QUIET) variable which
> contains an '@' if verbose mode isn't enabled and which is empty
> otherwise. This just silences the command invocation without doing
> anything else.

One thing that is missing from this commit message is the explanation
of _why_ this is desirable. And, why only this command? Without
understanding the underlying reason(s), it's hard to judge the value
of this change.

Thanks.


Re: [PATCH v2 04/18] commit-reach: move commit_contains from ref-filter

2018-08-28 Thread Jonathan Nieder
Hi,

Derrick Stolee wrote:

> There are several commit walks in the codebase. Group them together into
> a new commit-reach.c file and corresponding header. After we group these
> walks into one place, we can reduce duplicate logic by calling
> equivalent methods.
>
> All methods are direct moves, except we also make the commit_contains()
> method public so its consumers in ref-filter.c can still call it. We can
> also test this method in a test-tool in a later commit.
>
> Signed-off-by: Derrick Stolee 
> ---
>  commit-reach.c | 121 +
>  commit-reach.h |  20 ++-
>  ref-filter.c   | 145 +++--
>  3 files changed, 147 insertions(+), 139 deletions(-)
> 
> diff --git a/commit-reach.c b/commit-reach.c
> index a6bc4781a6..01d796f011 100644
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -1,8 +1,10 @@
>  #include "cache.h"
>  #include "commit.h"
> +#include "commit-graph.h"
>  #include "decorate.h"
>  #include "prio-queue.h"
>  #include "tree.h"
> +#include "ref-filter.c"

Did you mean "ref-filter.h"?

This broke the build here.  Is there some check that we can use to
prevent it happening again?  I don't think we ever intentionally
#include a .c file.

Thanks,
Jonathan


Re: [PATCH v6] Implement --first-parent for git rev-list --bisect

2018-08-28 Thread Junio C Hamano
Junio C Hamano  writes:

> Something like the following, perhaps?

Having said all that.

> +# See the drawing near the top --- e4 is in the middle of the first parent 
> chain
> +printf "%s\n" e4 |
> +test_output_expect_success '--bisect --first-parent' '
> + git rev-list --bisect --first-parent E ^F
> +'
> +
> +printf "%s\n" E e1 e2 e3 e4 e5 e6 e7 e8 |
> +test_output_expect_success "--first-parent" '
> + git rev-list --first-parent E ^F
> +'

The above two are probably not controversial.

> +test_output_expect_success '--bisect-vars --first-parent' '
> + git rev-list --bisect-vars --first-parent E ^F
> +' <<-EOF
> + bisect_rev='e5'
> + bisect_nr=4
> + bisect_good=4
> + bisect_bad=3
> + bisect_all=9
> + bisect_steps=2
> +EOF

Perhaps this one isn't either.


> +test_expect_success '--bisect-all --first-parent' '
> + cat >expect <<-EOF &&
> + $(git rev-parse tags/e5) (dist=4)
> + $(git rev-parse tags/e4) (dist=4)
> + $(git rev-parse tags/e6) (dist=3)
> + $(git rev-parse tags/e3) (dist=3)
> + $(git rev-parse tags/e7) (dist=2)
> + $(git rev-parse tags/e2) (dist=2)
> + $(git rev-parse tags/e8) (dist=1)
> + $(git rev-parse tags/e1) (dist=1)
> + $(git rev-parse tags/E) (dist=0)
> + EOF
> +
> + # Make sure we have the same entries, nothing more, nothing less
> + git rev-list --bisect-all --first-parent E ^F >actual &&
> + sort actual >actual.sorted &&
> + sort expect >expect.sorted &&
> + test_cmp expect.sorted actual.sorted &&

By sorting both, we attempt allow them to come out in any random
order when sorting done by --bisect-all gets tiebroken differently
between commits with the same dist value.  As Johannes said, this is
a bit too strict to insist that E *must* get dist 0, when the only
thing we may care about are e2 and e7 get the same dist value, which
is larger than the dist value given to E, etc.  If we wanted to ease
the over-strictness, we could omit " (dist=N)" from the 'expect' file,
and then

sed -e 's/ (dist=[0-9]*)$//' actual | sort >actual.sorted &&
sort expect >expect.sorted &&
test_cmp expect.sorted actual.sorted &&

to compare only the object names.

This over-strictness would have caught a bug in Git if we reverted
this new feature (i.e. "rev-list --first-parent --bisect-all" does
not refuse to work---it just does not do what we expect), which
gives distance of -1 in the output (!).  From that point of view, I
think it is also OK for the test to be spelling the values out like
the above.

> + # Make sure the entries are sorted in the dist order
> + sed -e "s/.*(dist=\([0-9]*\)).*/\1/" actual >actual.dists &&
> + sort -r -n actual.dists >actual.dists.sorted &&

I forgot to mention but I added "-n" here to make sure we
numerically sort the list of distance values.  Also you had a bogus
regexp to catch digits inherited from "something like this" I showed
in an older discussion (I think it is sufficient to say [0-9]* here).

The above makes sure that commits that have larger distance number
are listed earlier in the output, and we do not care if which one of
e4 or e5, both of which have distant number 4, is shown first---we'd
be happy as long as all the ones at distance 4 are shown before the
others at smaller distance number.  So with this in place, I think
we can omit the exact distance number from the earlier part of this
test, but as I said, it would have caught the bug that produces a
negative distance value in the output, so I am still on the fence,
leaning a bit toward being stricter than necessary.

> + test_cmp actual.dists.sorted actual.dists
> +'
> +
>  test_done


[PATCH v2 8/9] read-cache: use oideq() in ce_compare functions

2018-08-28 Thread Jeff King
These functions return the full oidcmp() value, but the
callers really only care whether it is non-zero. We can use
the more strict !oideq(), which a compiler may be able to
optimize further.

This does change the meaning of the return value subtly, but
it's unlikely that anybody would try to use them for
ordering. They're static-local in this file, and they
already return other error values that would confuse an
ordering (e.g., open() failure gives -1).

Signed-off-by: Jeff King 
---
 read-cache.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 421a7f4953..eb7cea6272 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -213,7 +213,7 @@ static int ce_compare_data(const struct cache_entry *ce, 
struct stat *st)
if (fd >= 0) {
struct object_id oid;
if (!index_fd(, fd, st, OBJ_BLOB, ce->name, 0))
-   match = oidcmp(, >oid);
+   match = !oideq(, >oid);
/* index_fd() closed the file descriptor already */
}
return match;
@@ -254,7 +254,7 @@ static int ce_compare_gitlink(const struct cache_entry *ce)
 */
if (resolve_gitlink_ref(ce->name, "HEAD", ) < 0)
return 0;
-   return oidcmp(, >oid);
+   return !oideq(, >oid);
 }
 
 static int ce_modified_check_fs(const struct cache_entry *ce, struct stat *st)
-- 
2.19.0.rc0.584.g84d5b2a847



[PATCH v2 5/9] convert "oidcmp() != 0" to "!oideq()"

2018-08-28 Thread Jeff King
This is the flip side of the previous two patches: checking
for a non-zero oidcmp() can be more strictly expressed as
inequality. Like those patches, we write "!= 0" in the
coccinelle transformation, which covers by isomorphism the
more common:

  if (oidcmp(E1, E2))

As with the previous two patches, this patch can be achieved
almost entirely by running "make coccicheck"; the only
differences are manual line-wrap fixes to match the original
code.

There is one thing to note for anybody replicating this,
though: coccinelle 1.0.4 seems to miss the case in
builtin/tag.c, even though it's basically the same as all
the others. Running with 1.0.7 does catch this, so
presumably it's just a coccinelle bug that was fixed in the
interim.

Signed-off-by: Jeff King 
---
 bisect.c   | 2 +-
 blame.c| 4 ++--
 builtin/fmt-merge-msg.c| 4 ++--
 builtin/merge.c| 2 +-
 builtin/pull.c | 2 +-
 builtin/rm.c   | 2 +-
 builtin/show-branch.c  | 4 ++--
 builtin/tag.c  | 2 +-
 bundle.c   | 2 +-
 commit-graph.c | 6 +++---
 commit.c   | 2 +-
 contrib/coccinelle/object_id.cocci | 6 ++
 diff-lib.c | 2 +-
 diff.c | 6 +++---
 diffcore-rename.c  | 2 +-
 dir.c  | 6 +++---
 fast-import.c  | 4 ++--
 fetch-pack.c   | 2 +-
 match-trees.c  | 2 +-
 notes.c| 2 +-
 read-cache.c   | 2 +-
 refs.c | 8 
 refs/files-backend.c   | 2 +-
 refs/packed-backend.c  | 2 +-
 refs/ref-cache.c   | 2 +-
 remote.c   | 2 +-
 sequencer.c| 8 
 sha1-file.c| 6 +++---
 submodule-config.c | 4 ++--
 t/helper/test-dump-cache-tree.c| 2 +-
 tree-diff.c| 2 +-
 31 files changed, 55 insertions(+), 49 deletions(-)

diff --git a/bisect.c b/bisect.c
index 41c56a665e..7c1d8f1a6d 100644
--- a/bisect.c
+++ b/bisect.c
@@ -595,7 +595,7 @@ static struct commit_list *skip_away(struct commit_list 
*list, int count)
 
for (i = 0; cur; cur = cur->next, i++) {
if (i == index) {
-   if (oidcmp(>item->object.oid, current_bad_oid))
+   if (!oideq(>item->object.oid, current_bad_oid))
return cur;
if (previous)
return previous;
diff --git a/blame.c b/blame.c
index c47d7050d9..d5f7b7237c 100644
--- a/blame.c
+++ b/blame.c
@@ -1832,7 +1832,7 @@ void setup_scoreboard(struct blame_scoreboard *sb,
 
sb->revs->children.name = "children";
while (c->parents &&
-  oidcmp(>object.oid, >final->object.oid)) {
+  !oideq(>object.oid, >final->object.oid)) {
struct commit_list *l = xcalloc(1, sizeof(*l));
 
l->item = c;
@@ -1842,7 +1842,7 @@ void setup_scoreboard(struct blame_scoreboard *sb,
c = c->parents->item;
}
 
-   if (oidcmp(>object.oid, >final->object.oid))
+   if (!oideq(>object.oid, >final->object.oid))
die(_("--reverse --first-parent together require range 
along first-parent chain"));
}
 
diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
index 4c82c234cb..268f0c20ca 100644
--- a/builtin/fmt-merge-msg.c
+++ b/builtin/fmt-merge-msg.c
@@ -78,9 +78,9 @@ static struct merge_parent *find_merge_parent(struct 
merge_parents *table,
 {
int i;
for (i = 0; i < table->nr; i++) {
-   if (given && oidcmp(>item[i].given, given))
+   if (given && !oideq(>item[i].given, given))
continue;
-   if (commit && oidcmp(>item[i].commit, commit))
+   if (commit && !oideq(>item[i].commit, commit))
continue;
return >item[i];
}
diff --git a/builtin/merge.c b/builtin/merge.c
index 57abff999b..8d85d31a61 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -1521,7 +1521,7 @@ int cmd_merge(int argc, const char **argv, const char 
*prefix)
 * HEAD^^" would be missed.
 */
common_one = get_merge_bases(head_commit, j->item);
-   if (oidcmp(_one->item->object.oid, 
>item->object.oid)) {
+   if (!oideq(_one->item->object.oid, 
>item->object.oid)) {
up_to_date = 0;
break;
}
diff --git a/builtin/pull.c 

[PATCH v2 9/9] show_dirstat: simplify same-content check

2018-08-28 Thread Jeff King
We use two nested conditionals to store a content_changed
variable, but only bother to look at the result once,
directly after we set it. We can drop the variable entirely
and just use a single "if".

This needless complexity is the result of 2ff3a80334 (Teach
--dirstat not to completely ignore rearranged lines within a
file, 2011-04-11). Before that, we held onto the
content_changed variable much longer.

While we're touching the condition, we can swap out oidcmp()
for !oideq(). Our coccinelle patches didn't previously find
this case because of the intermediate variable, but now it's
a simple boolean in a conditional.

Signed-off-by: Jeff King 
---
 diff.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 5d3219b600..605ba4b6b8 100644
--- a/diff.c
+++ b/diff.c
@@ -2933,16 +2933,11 @@ static void show_dirstat(struct diff_options *options)
struct diff_filepair *p = q->queue[i];
const char *name;
unsigned long copied, added, damage;
-   int content_changed;
 
name = p->two->path ? p->two->path : p->one->path;
 
-   if (p->one->oid_valid && p->two->oid_valid)
-   content_changed = oidcmp(>one->oid, >two->oid);
-   else
-   content_changed = 1;
-
-   if (!content_changed) {
+   if (p->one->oid_valid && p->two->oid_valid &&
+   oideq(>one->oid, >two->oid)) {
/*
 * The SHA1 has not changed, so pre-/post-content is
 * identical. We can therefore skip looking at the
@@ -2989,7 +2984,7 @@ static void show_dirstat(struct diff_options *options)
 * made to the preimage.
 * If the resulting damage is zero, we know that
 * diffcore_count_changes() considers the two entries to
-* be identical, but since content_changed is true, we
+* be identical, but since the oid changed, we
 * know that there must have been _some_ kind of change,
 * so we force all entries to have damage > 0.
 */
-- 
2.19.0.rc0.584.g84d5b2a847


[PATCH v2 6/9] convert "hashcmp() != 0" to "!hasheq()"

2018-08-28 Thread Jeff King
This rounds out the previous three patches, covering the
inequality logic for the "hash" variant of the functions.

As with the previous three, the accompanying code changes
are the mechanical result of applying the coccinelle patch;
see those patches for more discussion.

Signed-off-by: Jeff King 
---
 builtin/index-pack.c   | 4 ++--
 builtin/show-branch.c  | 2 +-
 builtin/unpack-objects.c   | 2 +-
 commit-graph.c | 2 +-
 contrib/coccinelle/object_id.cocci | 9 +
 http-walker.c  | 2 +-
 http.c | 2 +-
 pack-check.c   | 6 +++---
 pack-write.c   | 2 +-
 packfile.c | 2 +-
 read-cache.c   | 4 ++--
 sha1-file.c| 2 +-
 12 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index edcb0a3dca..2004e25da2 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1166,7 +1166,7 @@ static void parse_pack_objects(unsigned char *hash)
/* Check pack integrity */
flush();
the_hash_algo->final_fn(hash, _ctx);
-   if (hashcmp(fill(the_hash_algo->rawsz), hash))
+   if (!hasheq(fill(the_hash_algo->rawsz), hash))
die(_("pack is corrupted (SHA1 mismatch)"));
use(the_hash_algo->rawsz);
 
@@ -1280,7 +1280,7 @@ static void conclude_pack(int fix_thin_pack, const char 
*curr_pack, unsigned cha
fixup_pack_header_footer(output_fd, pack_hash,
 curr_pack, nr_objects,
 read_hash, 
consumed_bytes-the_hash_algo->rawsz);
-   if (hashcmp(read_hash, tail_hash) != 0)
+   if (!hasheq(read_hash, tail_hash))
die(_("Unexpected tail checksum for %s "
  "(disk corruption?)"), curr_pack);
}
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index 5f9432861b..65f4a4c83c 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -485,7 +485,7 @@ static void snarf_refs(int head, int remotes)
 static int rev_is_head(const char *head, const char *name,
   unsigned char *head_sha1, unsigned char *sha1)
 {
-   if (!head || (head_sha1 && sha1 && hashcmp(head_sha1, sha1)))
+   if (!head || (head_sha1 && sha1 && !hasheq(head_sha1, sha1)))
return 0;
skip_prefix(head, "refs/heads/", );
if (!skip_prefix(name, "refs/heads/", ))
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index ad438f5b41..80478808b3 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -579,7 +579,7 @@ int cmd_unpack_objects(int argc, const char **argv, const 
char *prefix)
if (fsck_finish(_options))
die(_("fsck error in pack objects"));
}
-   if (hashcmp(fill(the_hash_algo->rawsz), oid.hash))
+   if (!hasheq(fill(the_hash_algo->rawsz), oid.hash))
die("final sha1 did not match");
use(the_hash_algo->rawsz);
 
diff --git a/commit-graph.c b/commit-graph.c
index 7aed9f5371..64ce79420d 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -900,7 +900,7 @@ int verify_commit_graph(struct repository *r, struct 
commit_graph *g)
f = hashfd(devnull, NULL);
hashwrite(f, g->data, g->data_len - g->hash_len);
finalize_hashfile(f, checksum.hash, CSUM_CLOSE);
-   if (hashcmp(checksum.hash, g->data + g->data_len - g->hash_len)) {
+   if (!hasheq(checksum.hash, g->data + g->data_len - g->hash_len)) {
graph_report(_("the commit-graph file has incorrect checksum 
and is likely corrupt"));
verify_commit_graph_error = VERIFY_COMMIT_GRAPH_ERROR_HASH;
}
diff --git a/contrib/coccinelle/object_id.cocci 
b/contrib/coccinelle/object_id.cocci
index 4e1f1a7431..d8bdb48712 100644
--- a/contrib/coccinelle/object_id.cocci
+++ b/contrib/coccinelle/object_id.cocci
@@ -129,3 +129,12 @@ expression E1, E2;
 @@
 - oidcmp(E1, E2) != 0
 + !oideq(E1, E2)
+
+@@
+identifier f != hasheq;
+expression E1, E2;
+@@
+  f(...) {<...
+- hashcmp(E1, E2) != 0
++ !hasheq(E1, E2)
+  ...>}
diff --git a/http-walker.c b/http-walker.c
index 3a8edc7f2f..b3334bf657 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -543,7 +543,7 @@ static int fetch_object(struct walker *walker, unsigned 
char *sha1)
} else if (req->zret != Z_STREAM_END) {
walker->corrupt_object_found++;
ret = error("File %s (%s) corrupt", hex, req->url);
-   } else if (hashcmp(obj_req->oid.hash, req->real_sha1)) {
+   } else if (!hasheq(obj_req->oid.hash, req->real_sha1)) {
ret = error("File %s has bad hash", hex);
} else if (req->rename < 0) {
struct strbuf buf = STRBUF_INIT;
diff --git a/http.c b/http.c
index 4162860ee3..98ff122585 100644

[PATCH v2 7/9] convert hashmap comparison functions to oideq()

2018-08-28 Thread Jeff King
The comparison functions used for hashmaps don't care about
strict ordering; they only want to compare entries for
equality. Let's use the oideq() function instead, which can
potentially be better optimized. Note that unlike the
previous patches mass-converting calls like "!oidcmp()",
this patch could actually provide an improvement even with
the current implementation. Those comparison functions are
passed around as function pointers, so at compile-time the
compiler cannot realize that the caller (which is in another
file completely) will treat the return value as a boolean.

Note that this does change the return values in quite a
subtle way (it's still an int, but now the sign bit is
irrelevant for ordering). Because of their funny
hashmap-specific signature, it's unlikely that any of these
static functions would be reused for more generic ordering.
But to be double-sure, let's stop using "cmp" in their
names.

Calling them "eq" doesn't quite work either, because the
hashmap convention is actually _inverted_. "0" means "same",
and non-zero means "different". So I've called them "neq" by
convention here.

Signed-off-by: Jeff King 
---
 builtin/describe.c |  6 +++---
 oidmap.c   | 12 ++--
 patch-ids.c|  8 
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 1e7c75ba82..22c0541da5 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -62,7 +62,7 @@ static const char *prio_names[] = {
N_("head"), N_("lightweight"), N_("annotated"),
 };
 
-static int commit_name_cmp(const void *unused_cmp_data,
+static int commit_name_neq(const void *unused_cmp_data,
   const void *entry,
   const void *entry_or_key,
   const void *peeled)
@@ -70,7 +70,7 @@ static int commit_name_cmp(const void *unused_cmp_data,
const struct commit_name *cn1 = entry;
const struct commit_name *cn2 = entry_or_key;
 
-   return oidcmp(>peeled, peeled ? peeled : >peeled);
+   return !oideq(>peeled, peeled ? peeled : >peeled);
 }
 
 static inline struct commit_name *find_commit_name(const struct object_id 
*peeled)
@@ -596,7 +596,7 @@ int cmd_describe(int argc, const char **argv, const char 
*prefix)
return cmd_name_rev(args.argc, args.argv, prefix);
}
 
-   hashmap_init(, commit_name_cmp, NULL, 0);
+   hashmap_init(, commit_name_neq, NULL, 0);
for_each_rawref(get_name, NULL);
if (!hashmap_get_size() && !always)
die(_("No names found, cannot describe anything."));
diff --git a/oidmap.c b/oidmap.c
index d9fb19ba6a..b0841a0f58 100644
--- a/oidmap.c
+++ b/oidmap.c
@@ -1,14 +1,14 @@
 #include "cache.h"
 #include "oidmap.h"
 
-static int cmpfn(const void *hashmap_cmp_fn_data,
-const void *entry, const void *entry_or_key,
-const void *keydata)
+static int oidmap_neq(const void *hashmap_cmp_fn_data,
+ const void *entry, const void *entry_or_key,
+ const void *keydata)
 {
const struct oidmap_entry *entry_ = entry;
if (keydata)
-   return oidcmp(_->oid, (const struct object_id *) keydata);
-   return oidcmp(_->oid,
+   return !oideq(_->oid, (const struct object_id *) keydata);
+   return !oideq(_->oid,
  &((const struct oidmap_entry *) entry_or_key)->oid);
 }
 
@@ -21,7 +21,7 @@ static int hash(const struct object_id *oid)
 
 void oidmap_init(struct oidmap *map, size_t initial_size)
 {
-   hashmap_init(>map, cmpfn, NULL, initial_size);
+   hashmap_init(>map, oidmap_neq, NULL, initial_size);
 }
 
 void oidmap_free(struct oidmap *map, int free_entries)
diff --git a/patch-ids.c b/patch-ids.c
index 8f7c25d5db..960ea24054 100644
--- a/patch-ids.c
+++ b/patch-ids.c
@@ -28,14 +28,14 @@ int commit_patch_id(struct commit *commit, struct 
diff_options *options,
 /*
  * When we cannot load the full patch-id for both commits for whatever
  * reason, the function returns -1 (i.e. return error(...)). Despite
- * the "cmp" in the name of this function, the caller only cares about
+ * the "neq" in the name of this function, the caller only cares about
  * the return value being zero (a and b are equivalent) or non-zero (a
  * and b are different), and returning non-zero would keep both in the
  * result, even if they actually were equivalent, in order to err on
  * the side of safety.  The actual value being negative does not have
  * any significance; only that it is non-zero matters.
  */
-static int patch_id_cmp(const void *cmpfn_data,
+static int patch_id_neq(const void *cmpfn_data,
const void *entry,
const void *entry_or_key,
const void *unused_keydata)
@@ -53,7 +53,7 @@ static int patch_id_cmp(const void *cmpfn_data,
commit_patch_id(b->commit, opt, >patch_id, 0))
  

[PATCH v2 4/9] convert "hashcmp() == 0" to hasheq()

2018-08-28 Thread Jeff King
This is the partner patch to the previous one, but covering
the "hash" variants instead of "oid".  Note that our
coccinelle rule is slightly more complex to avoid triggering
the call in hasheq().

I didn't bother to add a new rule to convert:

  - hasheq(E1->hash, E2->hash)
  + oideq(E1, E2)

Since these are new functions, there won't be any such
existing callers. And since most of the code is already
using oideq, we're not likely to introduce new ones.

We might still see "!hashcmp(E1->hash, E2->hash)" from topics
in flight. But because our new rule comes after the existing
ones, that should first get converted to "!oidcmp(E1, E2)"
and then to "oideq(E1, E2)".

Signed-off-by: Jeff King 
---
 builtin/fetch.c|  2 +-
 cache.h|  8 
 contrib/coccinelle/object_id.cocci |  9 +
 http-walker.c  |  2 +-
 notes.c|  2 +-
 object.c   |  2 +-
 pack-objects.c |  2 +-
 packfile.c | 10 +-
 8 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 32b1d5d383..84e0e8080f 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -238,7 +238,7 @@ static int will_fetch(struct ref **head, const unsigned 
char *sha1)
 {
struct ref *rm = *head;
while (rm) {
-   if (!hashcmp(rm->old_oid.hash, sha1))
+   if (hasheq(rm->old_oid.hash, sha1))
return 1;
rm = rm->next;
}
diff --git a/cache.h b/cache.h
index d090f71706..d97db26bb6 100644
--- a/cache.h
+++ b/cache.h
@@ -1053,12 +1053,12 @@ static inline int oideq(const struct object_id *oid1, 
const struct object_id *oi
 
 static inline int is_null_sha1(const unsigned char *sha1)
 {
-   return !hashcmp(sha1, null_sha1);
+   return hasheq(sha1, null_sha1);
 }
 
 static inline int is_null_oid(const struct object_id *oid)
 {
-   return !hashcmp(oid->hash, null_sha1);
+   return hasheq(oid->hash, null_sha1);
 }
 
 static inline void hashcpy(unsigned char *sha_dst, const unsigned char 
*sha_src)
@@ -1095,7 +1095,7 @@ static inline void oidread(struct object_id *oid, const 
unsigned char *hash)
 
 static inline int is_empty_blob_sha1(const unsigned char *sha1)
 {
-   return !hashcmp(sha1, the_hash_algo->empty_blob->hash);
+   return hasheq(sha1, the_hash_algo->empty_blob->hash);
 }
 
 static inline int is_empty_blob_oid(const struct object_id *oid)
@@ -1105,7 +1105,7 @@ static inline int is_empty_blob_oid(const struct 
object_id *oid)
 
 static inline int is_empty_tree_sha1(const unsigned char *sha1)
 {
-   return !hashcmp(sha1, the_hash_algo->empty_tree->hash);
+   return hasheq(sha1, the_hash_algo->empty_tree->hash);
 }
 
 static inline int is_empty_tree_oid(const struct object_id *oid)
diff --git a/contrib/coccinelle/object_id.cocci 
b/contrib/coccinelle/object_id.cocci
index 548c02336d..d90ba8a040 100644
--- a/contrib/coccinelle/object_id.cocci
+++ b/contrib/coccinelle/object_id.cocci
@@ -114,3 +114,12 @@ expression E1, E2;
 @@
 - oidcmp(E1, E2) == 0
 + oideq(E1, E2)
+
+@@
+identifier f != hasheq;
+expression E1, E2;
+@@
+  f(...) {<...
+- hashcmp(E1, E2) == 0
++ hasheq(E1, E2)
+  ...>}
diff --git a/http-walker.c b/http-walker.c
index 7cdfb2f24c..3a8edc7f2f 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -483,7 +483,7 @@ static int fetch_object(struct walker *walker, unsigned 
char *sha1)
 
list_for_each(pos, head) {
obj_req = list_entry(pos, struct object_request, node);
-   if (!hashcmp(obj_req->oid.hash, sha1))
+   if (hasheq(obj_req->oid.hash, sha1))
break;
}
if (obj_req == NULL)
diff --git a/notes.c b/notes.c
index b3386d6c36..33d16c1ec3 100644
--- a/notes.c
+++ b/notes.c
@@ -147,7 +147,7 @@ static struct leaf_node *note_tree_find(struct notes_tree 
*t,
void **p = note_tree_search(t, , , key_sha1);
if (GET_PTR_TYPE(*p) == PTR_TYPE_NOTE) {
struct leaf_node *l = (struct leaf_node *) CLR_PTR_TYPE(*p);
-   if (!hashcmp(key_sha1, l->key_oid.hash))
+   if (hasheq(key_sha1, l->key_oid.hash))
return l;
}
return NULL;
diff --git a/object.c b/object.c
index 51c4594515..e54160550c 100644
--- a/object.c
+++ b/object.c
@@ -95,7 +95,7 @@ struct object *lookup_object(struct repository *r, const 
unsigned char *sha1)
 
first = i = hash_obj(sha1, r->parsed_objects->obj_hash_size);
while ((obj = r->parsed_objects->obj_hash[i]) != NULL) {
-   if (!hashcmp(sha1, obj->oid.hash))
+   if (hasheq(sha1, obj->oid.hash))
break;
i++;
if (i == r->parsed_objects->obj_hash_size)
diff --git a/pack-objects.c b/pack-objects.c
index 6ef87e5683..2bc7626997 100644
--- a/pack-objects.c
+++ 

[PATCH v2 2/9] introduce hasheq() and oideq()

2018-08-28 Thread Jeff King
The main comparison functions we provide for comparing
object ids are hashcmp() and oidcmp(). These are more
flexible than a strict equality check, since they also
express ordering. That makes them useful for sorting and
binary searching. However, it also makes them potentially
slower than a strict equality check. Consider this C code,
which is traditionally what our hashcmp has looked like:

  #include 
  int hashcmp(const unsigned char *a, const unsigned char *b)
  {
  return memcmp(a, b, 20);
  }

Compiling with "gcc -O2 -S -fverbose-asm", the generated
assembly shows that we actually call memcmp(). But if we
change this to a strict equality check:

  return !memcmp(a, b, 20);

we get a faster inline version:

  movq(%rdi), %rax# MEM[(void *)a_4(D)], MEM[(void *)a_4(D)]
  movq8(%rdi), %rdx   # MEM[(void *)a_4(D)], tmp101
  xorq(%rsi), %rax# MEM[(void *)b_5(D)], tmp94
  xorq8(%rsi), %rdx   # MEM[(void *)b_5(D)], tmp93
  orq %rax, %rdx  # tmp94, tmp93
  jne .L2 #,
  movl16(%rsi), %eax  # MEM[(void *)b_5(D)], tmp104
  cmpl%eax, 16(%rdi)  # tmp104, MEM[(void *)a_4(D)]
  je  .L5 #,

Obviously our hashcmp() doesn't include the "!". But because
it's an inline function, optimizing compilers are able to
see "!hashcmp(a,b)" in calling code and take advantage of
this case. So there has been no value thus far in
introducing a more restricted interface for doing strict
equality checks.

But as Git learns about more values for the_hash_algo, our
hashcmp() will grow more complicated and may even delegate
at runtime to functions optimized specifically for that hash
size. That breaks the inline connection we have, and the
compiler will have to assume that the caller really cares
about the sign and magnitude of the memcmp() result, even
though the vast majority don't.

We can solve that by introducing a hasheq() function (and
matching oideq() wrapper), which callers can use to make it
clear that they only care about equality. For now, the
implementation will literally be "!hashcmp()", but it frees
us up later to introduce code optimized specifically for the
equality check.

Signed-off-by: Jeff King 
---
 cache.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/cache.h b/cache.h
index 4d014541ab..f6d227fac7 100644
--- a/cache.h
+++ b/cache.h
@@ -1041,6 +1041,16 @@ static inline int oidcmp(const struct object_id *oid1, 
const struct object_id *o
return hashcmp(oid1->hash, oid2->hash);
 }
 
+static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
+{
+   return !hashcmp(sha1, sha2);
+}
+
+static inline int oideq(const struct object_id *oid1, const struct object_id 
*oid2)
+{
+   return hasheq(oid1->hash, oid2->hash);
+}
+
 static inline int is_null_sha1(const unsigned char *sha1)
 {
return !hashcmp(sha1, null_sha1);
-- 
2.19.0.rc0.584.g84d5b2a847



[PATCH v2 3/9] convert "oidcmp() == 0" to oideq()

2018-08-28 Thread Jeff King
Using the more restrictive oideq() should, in the long run,
give the compiler more opportunities to optimize these
callsites. For now, this conversion should be a complete
noop with respect to the generated code.

The result is also perhaps a little more readable, as it
avoids the "zero is equal" idiom. Since it's so prevalent in
C, I think seasoned programmers tend not to even notice it
anymore, but it can sometimes make for awkward double
negations (e.g., we can drop a few !!oidcmp() instances
here).

This patch was generated almost entirely by the included
coccinelle patch. This mechanical conversion should be
completely safe, because we check explicitly for cases where
oidcmp() is compared to 0, which is what oideq() is doing
under the hood. Note that we don't have to catch "!oidcmp()"
separately; coccinelle's standard isomorphisms make sure the
two are treated equivalently.

I say "almost" because I did hand-edit the coccinelle output
to fix up a few style violations (it mostly keeps the
original formatting, but sometimes unwraps long lines).

Signed-off-by: Jeff King 
---
 bisect.c   |  4 ++--
 blame.c|  4 ++--
 builtin/am.c   |  2 +-
 builtin/checkout.c |  2 +-
 builtin/describe.c |  4 ++--
 builtin/diff.c |  2 +-
 builtin/difftool.c |  4 ++--
 builtin/fast-export.c  |  2 +-
 builtin/fetch.c|  4 ++--
 builtin/fmt-merge-msg.c|  2 +-
 builtin/index-pack.c   |  4 ++--
 builtin/log.c  |  6 +++---
 builtin/merge-tree.c   |  2 +-
 builtin/merge.c|  4 ++--
 builtin/pack-objects.c |  4 ++--
 builtin/pull.c |  2 +-
 builtin/receive-pack.c |  4 ++--
 builtin/remote.c   |  2 +-
 builtin/replace.c  |  6 +++---
 builtin/unpack-objects.c   |  2 +-
 builtin/update-index.c |  4 ++--
 bulk-checkin.c |  2 +-
 cache-tree.c   |  2 +-
 cache.h|  4 ++--
 combine-diff.c |  4 ++--
 commit-graph.c |  2 +-
 connect.c  |  2 +-
 contrib/coccinelle/object_id.cocci |  6 ++
 diff-lib.c |  2 +-
 diff.c |  6 +++---
 diffcore-break.c   |  2 +-
 fast-import.c  |  6 +++---
 http-push.c|  2 +-
 log-tree.c |  6 +++---
 merge-recursive.c  |  4 ++--
 notes-merge.c  | 24 +++---
 notes.c|  4 ++--
 pack-write.c   |  2 +-
 read-cache.c   |  2 +-
 ref-filter.c   |  2 +-
 refs/files-backend.c   |  4 ++--
 remote.c   |  6 +++---
 revision.c |  2 +-
 sequencer.c| 32 +++---
 sha1-array.c   |  2 +-
 sha1-file.c|  4 ++--
 sha1-name.c|  2 +-
 submodule.c|  2 +-
 transport.c|  2 +-
 unpack-trees.c |  6 +++---
 wt-status.c| 10 +-
 xdiff-interface.c  |  2 +-
 52 files changed, 117 insertions(+), 111 deletions(-)

diff --git a/bisect.c b/bisect.c
index e1275ba79e..41c56a665e 100644
--- a/bisect.c
+++ b/bisect.c
@@ -807,7 +807,7 @@ static void check_merge_bases(int rev_nr, struct commit 
**rev, int no_checkout)
 
for (; result; result = result->next) {
const struct object_id *mb = >item->object.oid;
-   if (!oidcmp(mb, current_bad_oid)) {
+   if (oideq(mb, current_bad_oid)) {
handle_bad_merge_base();
} else if (0 <= oid_array_lookup(_revs, mb)) {
continue;
@@ -988,7 +988,7 @@ int bisect_next_all(const char *prefix, int no_checkout)
 
bisect_rev = >item->object.oid;
 
-   if (!oidcmp(bisect_rev, current_bad_oid)) {
+   if (oideq(bisect_rev, current_bad_oid)) {
exit_if_skipped_commits(tried, current_bad_oid);
printf("%s is the first %s commit\n", oid_to_hex(bisect_rev),
term_bad);
diff --git a/blame.c b/blame.c
index aca06f4b12..c47d7050d9 100644
--- a/blame.c
+++ b/blame.c
@@ -1457,14 +1457,14 @@ static void pass_blame(struct blame_scoreboard *sb, 
struct blame_origin *origin,
porigin = find(p, origin);
if (!porigin)
continue;
-   if (!oidcmp(>blob_oid, >blob_oid)) {
+   if (oideq(>blob_oid, >blob_oid)) {
 

[PATCH v2 1/9] coccinelle: use <...> for function exclusion

2018-08-28 Thread Jeff King
Sometimes we want to suppress a coccinelle transformation
inside a particular function. For example, in finding
conversions of hashcmp() to oidcmp(), we should not convert
the call in oidcmp() itself, since that would cause infinite
recursion. We write that like this:

  @@
  identifier f != oidcmp;
  expression E1, E2;
  @@
f(...) {...
  - hashcmp(E1->hash, E2->hash)
  + oidcmp(E1, E2)
...}

to match the interior of any function _except_ oidcmp().

Unfortunately, this doesn't catch all cases (e.g., the one
in sequencer.c that this patch fixes). The problem, as
explained by one of the Coccinelle developers in [1], is:

  For transformation, A ... B requires that B occur on every
  execution path starting with A, unless that execution path
  ends up in error handling code.  (eg, if (...) { ...
  return; }).  Here your A is the start of the function.  So
  you need a call to hashcmp on every path through the
  function, which fails when you add ifs.

  [...]

  Another issue with A ... B is that by default A and B
  should not appear in the matched region.  So your original
  rule matches only the case where every execution path
  contains exactly one call to hashcmp, not more than one.

One way to solve this is to put the pattern inside an
angle-bracket pattern like "<... P ...>", which allows zero
or more matches of P. That works (and is what this patch
does), but it has one drawback: it matches more than we care
about, and Coccinelle uses extra CPU. Here are timings for
"make coccicheck" before and after this patch:

  [before]
  real  1m27.122s
  user  7m34.451s
  sys   0m37.330s

  [after]
  real  2m18.040s
  user  10m58.310s
  sys   0m41.549s

That's not ideal, but it's more important for this to be
correct than to be fast. And coccicheck is already fairly
slow (and people don't run it for every single patch). So
it's an acceptable tradeoff.

There _is_ a better way to do it, which is to record the
position at which we find hashcmp(), and then check it
against the forbidden function list. Like:

  @@
  position p : script:python() { p[0].current_element != "oidcmp" };
  expression E1,E2;
  @@
  - hashcmp@p(E1->hash, E2->hash)
  + oidcmp(E1, E2)

This is only a little slower than the current code, and does
the right thing in all cases. Unfortunately, not all builds
of Coccinelle include python support (including the ones in
Debian). Requiring it may mean that fewer people can easily
run the tool, which is worse than it simply being a little
slower.

We may want to revisit this decision in the future if:

  - builds with python become more common

  - we find more uses for python support that tip the
cost-benefit analysis

But for now this patch sticks with the angle-bracket
solution, and converts all existing cocci patches. This
fixes only one missed case in the current code, though it
makes a much better difference for some new rules I'm adding
(converting "!hashcmp()" to "hasheq()" misses over half the
possible conversions using the old form).

[1] https://public-inbox.org/git/alpine.DEB.2.21.1808240652370.2344@hadrien/

Helped-by: Julia Lawall 
Signed-off-by: Jeff King 
---
 contrib/coccinelle/commit.cocci|  4 ++--
 contrib/coccinelle/object_id.cocci | 20 ++--
 sequencer.c|  2 +-
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/contrib/coccinelle/commit.cocci b/contrib/coccinelle/commit.cocci
index aec3345adb..c49aa558f0 100644
--- a/contrib/coccinelle/commit.cocci
+++ b/contrib/coccinelle/commit.cocci
@@ -15,10 +15,10 @@ expression c;
 identifier f !~ 
"^(get_commit_tree|get_commit_tree_in_graph_one|load_tree_for_commit)$";
 expression c;
 @@
-  f(...) {...
+  f(...) {<...
 - c->maybe_tree
 + get_commit_tree(c)
-  ...}
+  ...>}
 
 @@
 expression c;
diff --git a/contrib/coccinelle/object_id.cocci 
b/contrib/coccinelle/object_id.cocci
index 09afdbf994..5869979be7 100644
--- a/contrib/coccinelle/object_id.cocci
+++ b/contrib/coccinelle/object_id.cocci
@@ -20,10 +20,10 @@ expression E1;
 identifier f != oid_to_hex;
 expression E1;
 @@
-  f(...) {...
+  f(...) {<...
 - sha1_to_hex(E1->hash)
 + oid_to_hex(E1)
-  ...}
+  ...>}
 
 @@
 expression E1, E2;
@@ -35,10 +35,10 @@ expression E1, E2;
 identifier f != oid_to_hex_r;
 expression E1, E2;
 @@
-   f(...) {...
+   f(...) {<...
 - sha1_to_hex_r(E1, E2->hash)
 + oid_to_hex_r(E1, E2)
-  ...}
+  ...>}
 
 @@
 expression E1;
@@ -50,10 +50,10 @@ expression E1;
 identifier f != oidclr;
 expression E1;
 @@
-  f(...) {...
+  f(...) {<...
 - hashclr(E1->hash)
 + oidclr(E1)
-  ...}
+  ...>}
 
 @@
 expression E1, E2;
@@ -65,10 +65,10 @@ expression E1, E2;
 identifier f != oidcmp;
 expression E1, E2;
 @@
-  f(...) {...
+  f(...) {<...
 - hashcmp(E1->hash, E2->hash)
 + oidcmp(E1, E2)
-  ...}
+  ...>}
 
 @@
 expression E1, E2;
@@ -92,10 +92,10 @@ expression E1, E2;
 identifier f != oidcpy;
 expression E1, E2;
 @@
-  f(...) {...
+  f(...) {<...
 - hashcpy(E1->hash, E2->hash)
 + oidcpy(E1, E2)
-  ...}

[PATCH] doc: Don't echo sed command for manpage-base-url.xsl

2018-08-28 Thread Tim Schumacher
Previously, the sed command for generating manpage-base-url.xsl
was printed to the console when being run.

For the purpose of silencing it, define a $(QUIET) variable which
contains an '@' if verbose mode isn't enabled and which is empty
otherwise. This just silences the command invocation without doing
anything else.

Signed-off-by: Tim Schumacher 
---
 Documentation/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index a42dcfc74..45454e9b5 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -217,6 +217,7 @@ endif
 
 ifneq ($(findstring $(MAKEFLAGS),s),s)
 ifndef V
+   QUIET   = @
QUIET_ASCIIDOC  = @echo '   ' ASCIIDOC $@;
QUIET_XMLTO = @echo '   ' XMLTO $@;
QUIET_DB2TEXI   = @echo '   ' DB2TEXI $@;
@@ -344,7 +345,7 @@ $(OBSOLETE_HTML): %.html : %.txto asciidoc.conf
mv $@+ $@
 
 manpage-base-url.xsl: manpage-base-url.xsl.in
-   sed "s|@@MAN_BASE_URL@@|$(MAN_BASE_URL)|" $< > $@
+   $(QUIET)sed "s|@@MAN_BASE_URL@@|$(MAN_BASE_URL)|" $< > $@
 
 %.1 %.5 %.7 : %.xml manpage-base-url.xsl
$(QUIET_XMLTO)$(RM) $@ && \
-- 
2.19.0.rc0



Re: [PATCH 0/9] introducing oideq()

2018-08-28 Thread Jeff King
On Sun, Aug 26, 2018 at 08:56:21PM +, brian m. carlson wrote:

> > Due to the simplicity of the current code and our inlining, the compiler
> > can usually figure this out for now. So I wouldn't expect this patch to
> > actually improve performance right away. But as that discussion shows,
> > we are likely to take a performance hit as we move to more runtime
> > determination of the_hash_algo parameters. Having these callers use the
> > more strict form will potentially help us recover that.
> > 
> > So in that sense we _could_ simply punt on this series until then (and
> > it's certainly post-v2.19 material). But I think it's worth doing now,
> > simply from a readability/annotation standpoint. IMHO the resulting code
> > is more clear (though I've long since taught myself to read !foocmp() as
> > equality).
> 
> I would quite like to see this series picked up for v2.20.  If we want
> to minimize performance regressions with the SHA-256 work, I think it's
> important.

Thanks. One of the things I was worried about was causing unnecessary
conflicts with existing topics, including your work. But if everybody is
on board, I'd be happy to see this go in early in the next release cycle
(the longer we wait, the more annoying conflicts Junio has to resolve).

> Applying the following patch on top of this series causes gcc to inline
> both branches, which is pretty much the best we can expect.  I haven't
> benchmarked it, though, so I can't say what the actual performance
> consequence is.
> [...]
> diff --git a/cache.h b/cache.h
> index 3bb88ac6d0..1c182c6ef6 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1033,7 +1033,10 @@ static inline int oidcmp(const struct object_id *oid1, 
> const struct object_id *o
>  
>  static inline int hasheq(const unsigned char *sha1, const unsigned char 
> *sha2)
>  {
> - return !hashcmp(sha1, sha2);
> + if (the_hash_algo->rawsz == 32) {
> + return !memcmp(sha1, sha2, 32);
> + }
> + return !memcmp(sha1, sha2, 20);
>  }

Yeah, I can reproduce that, too.  I'm actually surprised that's enough
to convince the compiler to inline, based on my earlier tests. I wonder
if it is because it is more able to see the boolean nature of the
memcmp() (i.e., in the earlier versions it was not willing to carry that
information across the conditional boundary for some reason). Or perhaps
it is that this is written slightly differently from my earlier case,
where the fallback did not say "32" explicitly, but rather:

  return memcmp(sha1, sha2, the_hash_algo->rawsz);

For your form, we'd probably want to add a third case arm with a BUG(),
just in case.

> As for this series itself, other than the typos people have pointed out,
> it looks good to me.

Thanks. Here it is again with those typos fixed, but no changes to the
patches themselves.

 1:  7bcf611959 =  1:  7bcf611959 coccinelle: use <...> for function exclusion
 2:  6025ebe5d1 !  2:  0e90944f5a introduce hasheq() and oideq()
@@ -5,11 +5,10 @@
 The main comparison functions we provide for comparing
 object ids are hashcmp() and oidcmp(). These are more
 flexible than a strict equality check, since they also
-express ordering. That makes them them useful for sorting
-and binary searching. However, it also makes them
-potentially slower than a strict equality check. Consider
-this C code, which is traditionally what our hashcmp has
-looked like:
+express ordering. That makes them useful for sorting and
+binary searching. However, it also makes them potentially
+slower than a strict equality check. Consider this C code,
+which is traditionally what our hashcmp has looked like:
 
   #include 
   int hashcmp(const unsigned char *a, const unsigned char *b)
@@ -51,7 +50,7 @@
 though the vast majority don't.
 
 We can solve that by introducing a hasheq() function (and
-matching oideq() wrapper), which callers can use to make
+matching oideq() wrapper), which callers can use to make it
 clear that they only care about equality. For now, the
 implementation will literally be "!hashcmp()", but it frees
 us up later to introduce code optimized specifically for the
 3:  35a05a7e5c =  3:  4f1ea17e79 convert "oidcmp() == 0" to oideq()
 4:  b18fc3994d =  4:  6614cc71e4 convert "hashcmp() == 0" to hasheq()
 5:  0fbcfcf7cc =  5:  c3dd2b2c80 convert "oidcmp() != 0" to "!oideq()"
 6:  32e169b886 =  6:  c94914f1ef convert "hashcmp() != 0" to "!hasheq()"
 7:  b90051c38b =  7:  c773fa9ca4 convert hashmap comparison functions to 
oideq()
 8:  b1e6ad80f0 =  8:  ada5809fab read-cache: use oideq() in ce_compare 
functions
 9:  18be86e078 !  9:  62f24d63a4 show_dirstat: simplify same-content check
@@ -2,7 +2,7 @@
 
 show_dirstat: simplify same-content check
 
-We two nested conditionals to store a content_changed
+We use two 

[PATCH 5/9] worktree: disallow adding same path multiple times

2018-08-28 Thread Eric Sunshine
A given path should only ever be associated with a single registered
worktree. This invariant is enforced by refusing to create a new
worktree at a given path if that path already exists. For example:

$ git worktree add -q --detach foo
$ git worktree add -q --detach foo
fatal: 'foo' already exists

However, the check can be fooled, and the invariant broken, if the
path is missing. Continuing the example:

$ rm -fr foo
$ git worktree add -q --detach foo
$ git worktree list
...  eadebfe [master]
.../foo  eadebfe (detached HEAD)
.../foo  eadebfe (detached HEAD)

This "corruption" leads to the unfortunate situation in which the
worktree can not be removed:

$ git worktree remove foo
fatal: validation failed, cannot remove working tree: '.../foo'
does not point back to '.git/worktrees/foo'

Nor can the bogus entry be pruned:

$ git worktree prune -v
$ git worktree list
...  eadebfe [master]
.../foo  eadebfe (detached HEAD)
.../foo  eadebfe (detached HEAD)

without first deleting the worktree directory manually:

$ rm -fr foo
$ git worktree prune -v
Removing .../foo: gitdir file points to non-existent location
Removing .../foo1: gitdir file points to non-existent location
$ git worktree list
...  eadebfe [master]

or by manually deleting the worktree entry in .git/worktrees.

To address this problem, upgrade "git worktree add" validation to
allow worktree creation only if the given path is not already
associated with an existing worktree (even if the path itself is
non-existent), thus preventing such bogus worktree entries from being
created in the first place.

Reported-by: Jeff King 
Signed-off-by: Eric Sunshine 
---
 builtin/worktree.c  | 25 +
 t/t2025-worktree-add.sh |  7 +++
 2 files changed, 32 insertions(+)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 46c93771e8..1122f27b5f 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -221,8 +221,33 @@ static const char *worktree_basename(const char *path, int 
*olen)
 
 static void validate_worktree_add(const char *path, const struct add_opts 
*opts)
 {
+   struct worktree **worktrees;
+   struct worktree *wt;
+   int locked;
+
if (file_exists(path) && !is_empty_dir(path))
die(_("'%s' already exists"), path);
+
+   worktrees = get_worktrees(0);
+   /*
+* find_worktree()'s suffix matching may undesirably find the main
+* rather than a linked worktree (for instance, when the basenames
+* of the main worktree and the one being created are the same).
+* We're only interested in linked worktrees, so skip the main
+* worktree with +1.
+*/
+   wt = find_worktree(worktrees + 1, NULL, path);
+   if (!wt)
+   goto done;
+
+   locked = !!is_worktree_locked(wt);
+   if (locked)
+   die(_("'%s' is a missing but locked worktree;\nuse 'unlock' and 
'prune' or 'remove' to clear"), path);
+   else
+   die(_("'%s' is a missing but already registered worktree;\nuse 
'prune' or 'remove' to clear"), path);
+
+done:
+   free_worktrees(worktrees);
 }
 
 static int add_worktree(const char *path, const char *refname,
diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
index 07d292317c..67fccc6591 100755
--- a/t/t2025-worktree-add.sh
+++ b/t/t2025-worktree-add.sh
@@ -552,4 +552,11 @@ test_expect_success '"add" in bare repo invokes 
post-checkout hook' '
test_cmp hook.expect goozy/hook.actual
 '
 
+test_expect_success '"add" an existing but missing worktree' '
+   git worktree add --detach pneu &&
+   test_must_fail git worktree add --detach pneu &&
+   rm -fr pneu &&
+   test_must_fail git worktree add --detach pneu
+'
+
 test_done
-- 
2.19.0.rc1.350.ge57e33dbd1



[PATCH 4/9] worktree: prepare for more checks of whether path can become worktree

2018-08-28 Thread Eric Sunshine
Certain conditions must be met for a path to be a valid candidate as the
location of a new worktree; for instance, the path must not exist or
must be an empty directory. Although the number of conditions is small,
new conditions will soon be added so factor out the existing checks into
a separate function to avoid further bloating add_worktree().

Signed-off-by: Eric Sunshine 
---
 builtin/worktree.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 0affcb476c..46c93771e8 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -219,6 +219,12 @@ static const char *worktree_basename(const char *path, int 
*olen)
return name;
 }
 
+static void validate_worktree_add(const char *path, const struct add_opts 
*opts)
+{
+   if (file_exists(path) && !is_empty_dir(path))
+   die(_("'%s' already exists"), path);
+}
+
 static int add_worktree(const char *path, const char *refname,
const struct add_opts *opts)
 {
@@ -233,8 +239,7 @@ static int add_worktree(const char *path, const char 
*refname,
struct commit *commit = NULL;
int is_branch = 0;
 
-   if (file_exists(path) && !is_empty_dir(path))
-   die(_("'%s' already exists"), path);
+   validate_worktree_add(path, opts);
 
/* is 'refname' a branch or commit? */
if (!opts->detach && !strbuf_check_branch_ref(, refname) &&
-- 
2.19.0.rc1.350.ge57e33dbd1



[PATCH 9/9] worktree: delete .git/worktrees if empty after 'remove'

2018-08-28 Thread Eric Sunshine
For cleanliness, "git worktree prune" deletes the .git/worktrees
directory if it is empty after pruning is complete.

For consistency, make "git worktree remove " likewise delete
.git/worktrees if it is empty after the removal.

Signed-off-by: Eric Sunshine 
---
 builtin/worktree.c   |  8 +++-
 t/t2028-worktree-move.sh | 12 
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index a95fe67da6..c4abbde2b8 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -62,6 +62,11 @@ static int delete_git_dir(const char *id)
return ret;
 }
 
+static void delete_worktrees_dir_if_empty(void)
+{
+   rmdir(git_path("worktrees")); /* ignore failed removal */
+}
+
 static int prune_worktree(const char *id, struct strbuf *reason)
 {
struct stat st;
@@ -149,7 +154,7 @@ static void prune_worktrees(void)
}
closedir(dir);
if (!show_only)
-   rmdir(git_path("worktrees"));
+   delete_worktrees_dir_if_empty();
strbuf_release();
 }
 
@@ -918,6 +923,7 @@ static int remove_worktree(int ac, const char **av, const 
char *prefix)
 * from here.
 */
ret |= delete_git_dir(wt->id);
+   delete_worktrees_dir_if_empty();
 
free_worktrees(worktrees);
return ret;
diff --git a/t/t2028-worktree-move.sh b/t/t2028-worktree-move.sh
index 1b5079e8fa..33c0337733 100755
--- a/t/t2028-worktree-move.sh
+++ b/t/t2028-worktree-move.sh
@@ -173,4 +173,16 @@ test_expect_success 'remove locked worktree (force)' '
git worktree remove --force --force gumby
 '
 
+test_expect_success 'remove cleans up .git/worktrees when empty' '
+   git init moog &&
+   (
+   cd moog &&
+   test_commit bim &&
+   git worktree add --detach goom &&
+   test_path_exists .git/worktrees &&
+   git worktree remove goom &&
+   test_path_is_missing .git/worktrees
+   )
+'
+
 test_done
-- 
2.19.0.rc1.350.ge57e33dbd1



[PATCH 6/9] worktree: teach 'add' to respect --force for registered but missing path

2018-08-28 Thread Eric Sunshine
For safety, "git worktree add " will refuse to add a new
worktree at  if  is already associated with a worktree
entry, even if  is missing (for instance, has been deleted or
resides on non-mounted removable media or network share). The typical
way to re-create a worktree at  in such a situation is either to
prune all "broken" entries ("git worktree prune") or to selectively
remove the worktree entry manually ("git worktree remove ").

However, neither of these approaches ("prune" nor "remove") is
especially convenient, and they may be unsuitable for scripting when a
tool merely wants to re-use a worktree if it exists or create it from
scratch if it doesn't (much as a tool might use "mkdir -p" to re-use
or create a directory).

Therefore, teach 'add' to respect --force as a convenient way to
re-use a path already associated with a worktree entry if the path is
non-existent. For a locked worktree, require --force to be specified
twice.

Signed-off-by: Eric Sunshine 
---
 Documentation/git-worktree.txt |  8 ++--
 builtin/worktree.c | 10 --
 t/t2025-worktree-add.sh| 13 -
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-worktree.txt b/Documentation/git-worktree.txt
index 29a5b7e252..8537692f05 100644
--- a/Documentation/git-worktree.txt
+++ b/Documentation/git-worktree.txt
@@ -120,8 +120,12 @@ OPTIONS
 --force::
By default, `add` refuses to create a new working tree when
`` is a branch name and is already checked out by
-   another working tree and `remove` refuses to remove an unclean
-   working tree. This option overrides these safeguards.
+   another working tree, or if `` is already assigned to some
+   working tree but is missing (for instance, if `` was deleted
+   manually). This option overrides these safeguards. To add a missing but
+   locked working tree path, specify `--force` twice.
++
+`remove` refuses to remove an unclean working tree unless `--force` is used.
 
 -b ::
 -B ::
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 1122f27b5f..3eb2f89b0f 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -241,10 +241,16 @@ static void validate_worktree_add(const char *path, const 
struct add_opts *opts)
goto done;
 
locked = !!is_worktree_locked(wt);
+   if ((!locked && opts->force) || (locked && opts->force > 1)) {
+   if (delete_git_dir(wt->id))
+   die(_("unable to re-add worktree '%s'"), path);
+   goto done;
+   }
+
if (locked)
-   die(_("'%s' is a missing but locked worktree;\nuse 'unlock' and 
'prune' or 'remove' to clear"), path);
+   die(_("'%s' is a missing but locked worktree;\nuse 'add -f -f' 
to override, or 'unlock' and 'prune' or 'remove' to clear"), path);
else
-   die(_("'%s' is a missing but already registered worktree;\nuse 
'prune' or 'remove' to clear"), path);
+   die(_("'%s' is a missing but already registered worktree;\nuse 
'add -f' to override, or 'prune' or 'remove' to clear"), path);
 
 done:
free_worktrees(worktrees);
diff --git a/t/t2025-worktree-add.sh b/t/t2025-worktree-add.sh
index 67fccc6591..286bba35d8 100755
--- a/t/t2025-worktree-add.sh
+++ b/t/t2025-worktree-add.sh
@@ -556,7 +556,18 @@ test_expect_success '"add" an existing but missing 
worktree' '
git worktree add --detach pneu &&
test_must_fail git worktree add --detach pneu &&
rm -fr pneu &&
-   test_must_fail git worktree add --detach pneu
+   test_must_fail git worktree add --detach pneu &&
+   git worktree add --force --detach pneu
+'
+
+test_expect_success '"add" an existing locked but missing worktree' '
+   git worktree add --detach gnoo &&
+   git worktree lock gnoo &&
+   test_when_finished "git worktree unlock gnoo || :" &&
+   rm -fr gnoo &&
+   test_must_fail git worktree add --detach gnoo &&
+   test_must_fail git worktree add --force --detach gnoo &&
+   git worktree add --force --force --detach gnoo
 '
 
 test_done
-- 
2.19.0.rc1.350.ge57e33dbd1



[PATCH 0/9] worktree: fix bugs and broaden --force applicability

2018-08-28 Thread Eric Sunshine
This series started as a fix for a bug reported by Peff[1] in which the
"database" of worktrees could be corrupted (or at least become
internally inconsistent) by having multiple worktree entries associated
with the same path.

Peff's particular use-case for git-worktree is Documentation/doc-diff
which wants to re-use a worktree if it exists or create it anew if it
doesn't. Unfortunately, this is a bit more difficult than it should be
if the worktree directory is deleted manually (without pruning the
worktree entry) between script invocations. To simplify this use-case
for tools, it was suggested[2] that "git worktree add --force" could
deal with the problem of a registered-but-missing worktree (much as a
tool might rely upon "mkdir -p" to create or re-use an existing
directory). This series implements that proposal, as well.

Fixing the original bug revealed another existing bug, and after several
additional "while we're here" changes, the series ended up a bit longer
than expected.

[1]: https://public-inbox.org/git/20180821193556.ga...@sigill.intra.peff.net/
[2]: 
https://public-inbox.org/git/capig+ctghgbbo5vfzn+vp2vm00npkhuqm0douqo37arxrax...@mail.gmail.com/

Eric Sunshine (9):
  worktree: don't die() in library function find_worktree()
  worktree: move delete_git_dir() earlier in file for upcoming new
callers
  worktree: generalize delete_git_dir() to reduce code duplication
  worktree: prepare for more checks of whether path can become worktree
  worktree: disallow adding same path multiple times
  worktree: teach 'add' to respect --force for registered but missing
path
  worktree: teach 'move' to override lock when --force given twice
  worktree: teach 'remove' to override lock when --force given twice
  worktree: delete .git/worktrees if empty after 'remove'

 Documentation/git-worktree.txt |  12 +++-
 builtin/worktree.c | 113 ++---
 t/t2025-worktree-add.sh|  18 ++
 t/t2028-worktree-move.sh   |  44 +
 worktree.c |   6 +-
 5 files changed, 154 insertions(+), 39 deletions(-)

-- 
2.19.0.rc1.350.ge57e33dbd1



[PATCH 7/9] worktree: teach 'move' to override lock when --force given twice

2018-08-28 Thread Eric Sunshine
For consistency with "add -f -f", which allows a missing but locked
worktree path to be re-used, allow "move -f -f" to override a lock,
as well, as a convenience.

Signed-off-by: Eric Sunshine 
---
 Documentation/git-worktree.txt |  3 +++
 builtin/worktree.c | 13 +
 t/t2028-worktree-move.sh   | 14 ++
 3 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-worktree.txt b/Documentation/git-worktree.txt
index 8537692f05..d08b8d8e4f 100644
--- a/Documentation/git-worktree.txt
+++ b/Documentation/git-worktree.txt
@@ -125,6 +125,9 @@ OPTIONS
manually). This option overrides these safeguards. To add a missing but
locked working tree path, specify `--force` twice.
 +
+`move` refuses to move a locked working tree unless `--force` is specified
+twice.
++
 `remove` refuses to remove an unclean working tree unless `--force` is used.
 
 -b ::
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 3eb2f89b0f..354a6c0eb5 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -740,13 +740,17 @@ static void validate_no_submodules(const struct worktree 
*wt)
 
 static int move_worktree(int ac, const char **av, const char *prefix)
 {
+   int force = 0;
struct option options[] = {
+   OPT__FORCE(,
+N_("force move even if worktree is dirty or locked"),
+PARSE_OPT_NOCOMPLETE),
OPT_END()
};
struct worktree **worktrees, *wt;
struct strbuf dst = STRBUF_INIT;
struct strbuf errmsg = STRBUF_INIT;
-   const char *reason;
+   const char *reason = NULL;
char *path;
 
ac = parse_options(ac, av, prefix, options, worktree_usage, 0);
@@ -777,12 +781,13 @@ static int move_worktree(int ac, const char **av, const 
char *prefix)
 
validate_no_submodules(wt);
 
-   reason = is_worktree_locked(wt);
+   if (force < 2)
+   reason = is_worktree_locked(wt);
if (reason) {
if (*reason)
-   die(_("cannot move a locked working tree, lock reason: 
%s"),
+   die(_("cannot move a locked working tree, lock reason: 
%s\nuse 'move -f -f' to override or unlock first"),
reason);
-   die(_("cannot move a locked working tree"));
+   die(_("cannot move a locked working tree;\nuse 'move -f -f' to 
override or unlock first"));
}
if (validate_worktree(wt, , 0))
die(_("validation failed, cannot move working tree: %s"),
diff --git a/t/t2028-worktree-move.sh b/t/t2028-worktree-move.sh
index 60aba7c41a..9756ede8f1 100755
--- a/t/t2028-worktree-move.sh
+++ b/t/t2028-worktree-move.sh
@@ -98,6 +98,20 @@ test_expect_success 'move worktree to another dir' '
test_cmp expected2 actual2
 '
 
+test_expect_success 'move locked worktree (force)' '
+   test_when_finished "
+   git worktree unlock flump || :
+   git worktree remove flump || :
+   git worktree unlock ploof || :
+   git worktree remove ploof || :
+   " &&
+   git worktree add --detach flump &&
+   git worktree lock flump &&
+   test_must_fail git worktree move flump ploof" &&
+   test_must_fail git worktree move --force flump ploof" &&
+   git worktree move --force --force flump ploof
+'
+
 test_expect_success 'remove main worktree' '
test_must_fail git worktree remove .
 '
-- 
2.19.0.rc1.350.ge57e33dbd1



[PATCH 3/9] worktree: generalize delete_git_dir() to reduce code duplication

2018-08-28 Thread Eric Sunshine
prune_worktrees() and delete_git_dir() both remove worktree
administrative entries from .git/worktrees, and their implementations
are nearly identical. The only difference is that prune_worktrees() is
also capable of removing a bogus non-worktree-related file from
.git/worktrees.

Simplify by extending delete_git_dir() to handle the little bit of
extra functionality needed by prune_worktrees(), and drop the
effectively duplicate code from the latter.

Signed-off-by: Eric Sunshine 
---
 builtin/worktree.c | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index a8128289cc..0affcb476c 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -47,16 +47,17 @@ static int git_worktree_config(const char *var, const char 
*value, void *cb)
return git_default_config(var, value, cb);
 }
 
-static int delete_git_dir(struct worktree *wt)
+static int delete_git_dir(const char *id)
 {
struct strbuf sb = STRBUF_INIT;
-   int ret = 0;
+   int ret;
 
-   strbuf_addstr(, git_common_path("worktrees/%s", wt->id));
-   if (remove_dir_recursively(, 0)) {
+   strbuf_addstr(, git_common_path("worktrees/%s", id));
+   ret = remove_dir_recursively(, 0);
+   if (ret < 0 && errno == ENOTDIR)
+   ret = unlink(sb.buf);
+   if (ret)
error_errno(_("failed to delete '%s'"), sb.buf);
-   ret = -1;
-   }
strbuf_release();
return ret;
 }
@@ -130,10 +131,8 @@ static int prune_worktree(const char *id, struct strbuf 
*reason)
 static void prune_worktrees(void)
 {
struct strbuf reason = STRBUF_INIT;
-   struct strbuf path = STRBUF_INIT;
DIR *dir = opendir(git_path("worktrees"));
struct dirent *d;
-   int ret;
if (!dir)
return;
while ((d = readdir(dir)) != NULL) {
@@ -146,18 +145,12 @@ static void prune_worktrees(void)
printf("%s\n", reason.buf);
if (show_only)
continue;
-   git_path_buf(, "worktrees/%s", d->d_name);
-   ret = remove_dir_recursively(, 0);
-   if (ret < 0 && errno == ENOTDIR)
-   ret = unlink(path.buf);
-   if (ret)
-   error_errno(_("failed to remove '%s'"), path.buf);
+   delete_git_dir(d->d_name);
}
closedir(dir);
if (!show_only)
rmdir(git_path("worktrees"));
strbuf_release();
-   strbuf_release();
 }
 
 static int prune(int ac, const char **av, const char *prefix)
@@ -882,7 +875,7 @@ static int remove_worktree(int ac, const char **av, const 
char *prefix)
 * continue on even if ret is non-zero, there's no going back
 * from here.
 */
-   ret |= delete_git_dir(wt);
+   ret |= delete_git_dir(wt->id);
 
free_worktrees(worktrees);
return ret;
-- 
2.19.0.rc1.350.ge57e33dbd1



[PATCH 8/9] worktree: teach 'remove' to override lock when --force given twice

2018-08-28 Thread Eric Sunshine
For consistency with "add -f -f" and "move -f -f" which override
the lock on a worktree, allow "remove -f -f" to do so, as well, as a
convenience.

Signed-off-by: Eric Sunshine 
---
 Documentation/git-worktree.txt |  1 +
 builtin/worktree.c | 11 ++-
 t/t2028-worktree-move.sh   | 10 ++
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-worktree.txt b/Documentation/git-worktree.txt
index d08b8d8e4f..e2ee9fc21b 100644
--- a/Documentation/git-worktree.txt
+++ b/Documentation/git-worktree.txt
@@ -129,6 +129,7 @@ OPTIONS
 twice.
 +
 `remove` refuses to remove an unclean working tree unless `--force` is used.
+To remove a locked working tree, specify `--force` twice.
 
 -b ::
 -B ::
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 354a6c0eb5..a95fe67da6 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -875,13 +875,13 @@ static int remove_worktree(int ac, const char **av, const 
char *prefix)
int force = 0;
struct option options[] = {
OPT__FORCE(,
-N_("force removing even if the worktree is dirty"),
+N_("force removal even if worktree is dirty or 
locked"),
 PARSE_OPT_NOCOMPLETE),
OPT_END()
};
struct worktree **worktrees, *wt;
struct strbuf errmsg = STRBUF_INIT;
-   const char *reason;
+   const char *reason = NULL;
int ret = 0;
 
ac = parse_options(ac, av, prefix, options, worktree_usage, 0);
@@ -894,12 +894,13 @@ static int remove_worktree(int ac, const char **av, const 
char *prefix)
die(_("'%s' is not a working tree"), av[0]);
if (is_main_worktree(wt))
die(_("'%s' is a main working tree"), av[0]);
-   reason = is_worktree_locked(wt);
+   if (force < 2)
+   reason = is_worktree_locked(wt);
if (reason) {
if (*reason)
-   die(_("cannot remove a locked working tree, lock 
reason: %s"),
+   die(_("cannot remove a locked working tree, lock 
reason: %s\nuse 'remove -f -f' to override or unlock first"),
reason);
-   die(_("cannot remove a locked working tree"));
+   die(_("cannot remove a locked working tree;\nuse 'remove -f -f' 
to override or unlock first"));
}
if (validate_worktree(wt, , WT_VALIDATE_WORKTREE_MISSING_OK))
die(_("validation failed, cannot remove working tree: %s"),
diff --git a/t/t2028-worktree-move.sh b/t/t2028-worktree-move.sh
index 9756ede8f1..1b5079e8fa 100755
--- a/t/t2028-worktree-move.sh
+++ b/t/t2028-worktree-move.sh
@@ -163,4 +163,14 @@ test_expect_success 'proper error when worktree not found' 
'
done
 '
 
+test_expect_success 'remove locked worktree (force)' '
+   git worktree add --detach gumby &&
+   test_when_finished "git worktree remove gumby || :" &&
+   git worktree lock gumby &&
+   test_when_finished "git worktree unlock gumby || :" &&
+   test_must_fail git worktree remove gumby &&
+   test_must_fail git worktree remove --force gumby &&
+   git worktree remove --force --force gumby
+'
+
 test_done
-- 
2.19.0.rc1.350.ge57e33dbd1



[PATCH 1/9] worktree: don't die() in library function find_worktree()

2018-08-28 Thread Eric Sunshine
Callers don't expect library function find_worktree() to die(); they
expect it to return the named worktree if found, or NULL if not.
Although find_worktree() itself never invokes die(), it calls
real_pathdup() with 'die_on_error' incorrectly set to 'true', thus will
die() indirectly if the user-provided path is not to real_pathdup()'s
liking. This can be observed, for instance, with any git-worktree
command which searches for an existing worktree:

$ git worktree unlock foo
fatal: 'foo' is not a working tree
$ git worktree unlock foo/bar
fatal: Invalid path '.../foo': No such file or directory

The first error message is the expected one from "git worktree unlock"
not finding the specified worktree; the second is from find_worktree()
invoking real_pathdup() incorrectly and die()ing prematurely.

Aside from the inconsistent error message between the two cases, this
bug hasn't otherwise been a serious problem since existing callers all
die() anyhow when the worktree can't be found. However, that may not be
true of callers added in the future, so fix find_worktree() to avoid
die()ing.

Signed-off-by: Eric Sunshine 
---
 t/t2028-worktree-move.sh | 8 
 worktree.c   | 6 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/t/t2028-worktree-move.sh b/t/t2028-worktree-move.sh
index 5f7d45b7b7..60aba7c41a 100755
--- a/t/t2028-worktree-move.sh
+++ b/t/t2028-worktree-move.sh
@@ -141,4 +141,12 @@ test_expect_success 'NOT remove missing-but-locked 
worktree' '
test_path_is_dir .git/worktrees/gone-but-locked
 '
 
+test_expect_success 'proper error when worktree not found' '
+   for i in noodle noodle/bork
+   do
+   test_must_fail git worktree lock $i 2>err &&
+   test_i18ngrep "not a working tree" err || return 1
+   done
+'
+
 test_done
diff --git a/worktree.c b/worktree.c
index 97cda5f97b..b0d0b5426d 100644
--- a/worktree.c
+++ b/worktree.c
@@ -217,7 +217,11 @@ struct worktree *find_worktree(struct worktree **list,
 
if (prefix)
arg = to_free = prefix_filename(prefix, arg);
-   path = real_pathdup(arg, 1);
+   path = real_pathdup(arg, 0);
+   if (!path) {
+   free(to_free);
+   return NULL;
+   }
for (; *list; list++)
if (!fspathcmp(path, real_path((*list)->path)))
break;
-- 
2.19.0.rc1.350.ge57e33dbd1



[PATCH 2/9] worktree: move delete_git_dir() earlier in file for upcoming new callers

2018-08-28 Thread Eric Sunshine
This is a pure code movement to avoid having to forward-declare the
function when new callers are subsequently added.

Signed-off-by: Eric Sunshine 
---
 builtin/worktree.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 41e7714396..a8128289cc 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -47,6 +47,20 @@ static int git_worktree_config(const char *var, const char 
*value, void *cb)
return git_default_config(var, value, cb);
 }
 
+static int delete_git_dir(struct worktree *wt)
+{
+   struct strbuf sb = STRBUF_INIT;
+   int ret = 0;
+
+   strbuf_addstr(, git_common_path("worktrees/%s", wt->id));
+   if (remove_dir_recursively(, 0)) {
+   error_errno(_("failed to delete '%s'"), sb.buf);
+   ret = -1;
+   }
+   strbuf_release();
+   return ret;
+}
+
 static int prune_worktree(const char *id, struct strbuf *reason)
 {
struct stat st;
@@ -822,20 +836,6 @@ static int delete_git_work_tree(struct worktree *wt)
return ret;
 }
 
-static int delete_git_dir(struct worktree *wt)
-{
-   struct strbuf sb = STRBUF_INIT;
-   int ret = 0;
-
-   strbuf_addstr(, git_common_path("worktrees/%s", wt->id));
-   if (remove_dir_recursively(, 0)) {
-   error_errno(_("failed to delete '%s'"), sb.buf);
-   ret = -1;
-   }
-   strbuf_release();
-   return ret;
-}
-
 static int remove_worktree(int ac, const char **av, const char *prefix)
 {
int force = 0;
-- 
2.19.0.rc1.350.ge57e33dbd1



Trivial enhancement: All commands which require an author should accept --author

2018-08-28 Thread Ulrich Gemkow
Hello,

A trivial enhancement request:

All commands which require that the author is set (and complain if
it is not set) should accept the option --author.

At least the command stash does not accept this option. We are using
git version 2.17.1 (Ubuntu 18.04).

Thanks for the great work!

Best regards

-Ulrich

-- 
| Ulrich Gemkow
| University of Stuttgart
| Institute of Communication Networks and Computer Engineering (IKR)


Re: [PATCH v6] Implement --first-parent for git rev-list --bisect

2018-08-28 Thread Junio C Hamano
Junio C Hamano  writes:

> Johannes Schindelin  writes:
>
>> I would have preferred to reuse the already existing commits generated in
>> the `setup` phase rather than generating yet another batch, and to not
>> introduce an inconsistent way to present a commit graph (compare your
>> diagram with the one in
>> https://github.com/git/git/blob/v2.18.0/t/t6002-rev-list-bisect.sh#L64-L90
>> i.e. *in the same file*)
>
> As I already said in the previous round, I do agree with these.
> That is, ...
>
>>>  
>>> +# We generate the following commit graph:
>>> +#
>>> +#   B -- C
> ... the above graph construction should not be necessary.  An
> earlier part of t6002 would have already created a history of
> suitable shape to use for writing the following tests.

Something like the following, perhaps?  I have a feeling that use of
test-output-expect-success is trying to be too cute without making
the result easier to read, and also makes it harder to debug the
test script when it does not work as expected (e.g. you need to see
where the output from the actual command is stored by going to the
definition of the shell function), and would have preferred to see
these three tests written without it (and instead using explicit
'expect' vs 'actual' comparison), but at least the patch below shows
what I meant when I suggested updating the tests to reuse the
existing history (I do not speak for Johannes, but I am guessing we
are on the same page on this one).

Thanks.

 t/t6002-rev-list-bisect.sh | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/t/t6002-rev-list-bisect.sh b/t/t6002-rev-list-bisect.sh
index a661408038..d27d0087d6 100755
--- a/t/t6002-rev-list-bisect.sh
+++ b/t/t6002-rev-list-bisect.sh
@@ -263,4 +263,50 @@ test_expect_success 'rev-parse --bisect can default to 
good/bad refs' '
test_cmp expect.sorted actual.sorted
 '
 
+# See the drawing near the top --- e4 is in the middle of the first parent 
chain
+printf "%s\n" e4 |
+test_output_expect_success '--bisect --first-parent' '
+   git rev-list --bisect --first-parent E ^F
+'
+
+printf "%s\n" E e1 e2 e3 e4 e5 e6 e7 e8 |
+test_output_expect_success "--first-parent" '
+   git rev-list --first-parent E ^F
+'
+
+test_output_expect_success '--bisect-vars --first-parent' '
+   git rev-list --bisect-vars --first-parent E ^F
+' <<-EOF
+   bisect_rev='e5'
+   bisect_nr=4
+   bisect_good=4
+   bisect_bad=3
+   bisect_all=9
+   bisect_steps=2
+EOF
+
+test_expect_success '--bisect-all --first-parent' '
+   cat >expect <<-EOF &&
+   $(git rev-parse tags/e5) (dist=4)
+   $(git rev-parse tags/e4) (dist=4)
+   $(git rev-parse tags/e6) (dist=3)
+   $(git rev-parse tags/e3) (dist=3)
+   $(git rev-parse tags/e7) (dist=2)
+   $(git rev-parse tags/e2) (dist=2)
+   $(git rev-parse tags/e8) (dist=1)
+   $(git rev-parse tags/e1) (dist=1)
+   $(git rev-parse tags/E) (dist=0)
+   EOF
+
+   # Make sure we have the same entries, nothing more, nothing less
+   git rev-list --bisect-all --first-parent E ^F >actual &&
+   sort actual >actual.sorted &&
+   sort expect >expect.sorted &&
+   test_cmp expect.sorted actual.sorted &&
+   # Make sure the entries are sorted in the dist order
+   sed -e "s/.*(dist=\([1-9]*[0-9]\)).*/\1/" actual >actual.dists &&
+   sort -r -n actual.dists >actual.dists.sorted &&
+   test_cmp actual.dists.sorted actual.dists
+'
+
 test_done


Re: [PATCH 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-08-28 Thread Stefan Beller
On Tue, Aug 28, 2018 at 1:33 PM Derrick Stolee via GitGitGadget
 wrote:
>
> From: Derrick Stolee 
>
> The commit-graph feature is tested in isolation by
> t5318-commit-graph.sh and t6600-test-reach.sh, but there are many
> more interesting scenarios involving commit walks. Many of these
> scenarios are covered by the existing test suite, but we need to
> maintain coverage when the optional commit-graph structure is not
> present.
>
> To allow running the full test suite with the commit-graph present,
> add a new test environment variable, GIT_TEST_COMMIT_GRAPH. Similar
> to GIT_TEST_SPLIT_INDEX, this variable makes every Git command try
> to load the commit-graph when parsing commits, and writes the
> commit-graph file after every 'git commit' command.
>
> There are a few tests that rely on commits not existing in
> pack-files to trigger important events, so manually set
> GIT_TEST_COMMIT_GRAPH to false for the necessary commands.

So the plan is to turn on the commit graph for the whole test suite
excluding these selected tests?


> diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
> index 4984ca583d..73d5284a91 100755
> --- a/t/t0410-partial-clone.sh
> +++ b/t/t0410-partial-clone.sh
> @@ -181,7 +181,7 @@ test_expect_success 'rev-list stops traversal at missing 
> and promised commit' '
>
> git -C repo config core.repositoryformatversion 1 &&
> git -C repo config extensions.partialclone "arbitrary string" &&
> -   git -C repo rev-list --exclude-promisor-objects --objects bar >out &&
> +   GIT_TEST_COMMIT_GRAPH=0 git -C repo rev-list 
> --exclude-promisor-objects --objects bar >out &&


> +   GIT_TEST_COMMIT_GRAPH=0 &&
> +   test_must_fail git merge -m final G

This could go on the same line without the && in between, setting the
variable as a prefix.


Re: [PATCH 0/1] Define GIT_TEST_COMMIT_GRAPH for commit-graph test coverage

2018-08-28 Thread Stefan Beller
On Tue, Aug 28, 2018 at 1:33 PM Derrick Stolee via GitGitGadget
 wrote:
>
> The commit-graph (and multi-pack-index) features are optional data
> structures that can make Git operations faster. Since they are optional, we
> do not enable them in most Git tests. The commit-graph is tested in
> t5318-commit-graph.sh (and t6600-test-reach.sh in ds/reachable), but that
> one script cannot cover the data shapes present in the rest of the test
> suite.
>
> This patch introduces a new test environment variable, GIT_TEST_COMMIT_GRAPH
> . Similar to GIT_TEST_SPLIT_INDEX, it enables the commit-graph and writes it
> with every git commit command.

> Thanks, Duy, for pointing out this direction

Did you mean to cc Duy (instead of me)?
(I'll happily review the patch, too... just asking)

>
> A few tests needed to be modified. These are the same tests that were
> mentioned in my previous example patch [2].
>
> When this merges down, I'll create a CI build in VSTS that runs the test
> suite with this option enabled.
>
> Thanks, -Stolee
>
> [1]
> https://public-inbox.org/git/CACsJy8CKnXVJYkKM_W=N=Vq-TVXf+YCqZP_uP7B-dN_6xddB=g...@mail.gmail.com/
> Re: [PATCH 0/9] multi-pack-index cleanups (Discussing test environment
> variables)
>
> [2]
> https://public-inbox.org/git/20180718152244.45513-1-dsto...@microsoft.com/
> [PATCH] DO-NOT-MERGE: write and read commit-graph always
>
> Based-On: ds/commit-graph-with-grafts Cc: jna...@gmail.com
>
> Derrick Stolee (1):
>   commit-graph: define GIT_TEST_COMMIT_GRAPH
>
>  builtin/commit.c|  4 
>  commit-graph.c  |  5 +++--
>  commit-graph.h  |  2 ++
>  t/README|  4 
>  t/t0410-partial-clone.sh|  2 +-
>  t/t5307-pack-missing-commit.sh  | 10 --
>  t/t6011-rev-list-with-bad-commit.sh | 10 ++
>  t/t6024-recursive-merge.sh  |  9 ++---
>  8 files changed, 34 insertions(+), 12 deletions(-)
>
>
> base-commit: 829a321569d8e8f2c582aef9f0c990df976ab842
> Published-As: 
> https://github.com/gitgitgadget/git/releases/tags/pr-26%2Fderrickstolee%2Fshallow%2Ftest-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
> pr-26/derrickstolee/shallow/test-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/26
> --
> gitgitgadget


[PATCH 1/1] commit-graph: define GIT_TEST_COMMIT_GRAPH

2018-08-28 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The commit-graph feature is tested in isolation by
t5318-commit-graph.sh and t6600-test-reach.sh, but there are many
more interesting scenarios involving commit walks. Many of these
scenarios are covered by the existing test suite, but we need to
maintain coverage when the optional commit-graph structure is not
present.

To allow running the full test suite with the commit-graph present,
add a new test environment variable, GIT_TEST_COMMIT_GRAPH. Similar
to GIT_TEST_SPLIT_INDEX, this variable makes every Git command try
to load the commit-graph when parsing commits, and writes the
commit-graph file after every 'git commit' command.

There are a few tests that rely on commits not existing in
pack-files to trigger important events, so manually set
GIT_TEST_COMMIT_GRAPH to false for the necessary commands.

There is one test in t6024-recursive-merge.sh that relies on the
merge-base algorithm picking one of two ambiguous merge-bases, and
the commit-graph feature changes which merge-base is picked.

Signed-off-by: Derrick Stolee 
---
 builtin/commit.c|  4 
 commit-graph.c  |  5 +++--
 commit-graph.h  |  2 ++
 t/README|  4 
 t/t0410-partial-clone.sh|  2 +-
 t/t5307-pack-missing-commit.sh  | 10 --
 t/t6011-rev-list-with-bad-commit.sh | 10 ++
 t/t6024-recursive-merge.sh  |  9 ++---
 8 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 158e3f843a..a25e652102 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -33,6 +33,7 @@
 #include "sequencer.h"
 #include "mailmap.h"
 #include "help.h"
+#include "commit-graph.h"
 
 static const char * const builtin_commit_usage[] = {
N_("git commit [] [--] ..."),
@@ -1651,6 +1652,9 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
 "new_index file. Check that disk is not full and quota 
is\n"
 "not exceeded, and then \"git reset HEAD\" to recover."));
 
+   if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
+   write_commit_graph_reachable(get_object_directory(), 0);
+
rerere(0);
run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
run_commit_hook(use_editor, get_index_file(), "post-commit", NULL);
diff --git a/commit-graph.c b/commit-graph.c
index 4bd1a4abbf..5cae83fe03 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -237,8 +237,9 @@ static int prepare_commit_graph(struct repository *r)
return !!r->objects->commit_graph;
r->objects->commit_graph_attempted = 1;
 
-   if (repo_config_get_bool(r, "core.commitgraph", _value) ||
-   !config_value)
+   if (!git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
+   (repo_config_get_bool(r, "core.commitgraph", _value) ||
+   !config_value))
/*
 * This repository is not configured to use commit graphs, so
 * do not load one. (But report commit_graph_attempted anyway
diff --git a/commit-graph.h b/commit-graph.h
index 13d736cdde..cf9141f356 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -5,6 +5,8 @@
 #include "repository.h"
 #include "string-list.h"
 
+#define GIT_TEST_COMMIT_GRAPH "GIT_TEST_COMMIT_GRAPH"
+
 struct commit;
 
 char *get_commit_graph_filename(const char *obj_dir);
diff --git a/t/README b/t/README
index 8373a27fea..2b90c433f5 100644
--- a/t/README
+++ b/t/README
@@ -315,6 +315,10 @@ packs on demand. This normally only happens when the 
object size is
 over 2GB. This variable forces the code path on any object larger than
  bytes.
 
+GIT_TEST_COMMIT_GRAPH=, when true, forces the commit-graph to
+be written after every 'git commit' command, and overrides the
+'core.commitGraph' setting to true.
+
 Naming Tests
 
 
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 4984ca583d..73d5284a91 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -181,7 +181,7 @@ test_expect_success 'rev-list stops traversal at missing 
and promised commit' '
 
git -C repo config core.repositoryformatversion 1 &&
git -C repo config extensions.partialclone "arbitrary string" &&
-   git -C repo rev-list --exclude-promisor-objects --objects bar >out &&
+   GIT_TEST_COMMIT_GRAPH=0 git -C repo rev-list --exclude-promisor-objects 
--objects bar >out &&
grep $(git -C repo rev-parse bar) out &&
! grep $FOO out
 '
diff --git a/t/t5307-pack-missing-commit.sh b/t/t5307-pack-missing-commit.sh
index ae52a1882d..15df19a040 100755
--- a/t/t5307-pack-missing-commit.sh
+++ b/t/t5307-pack-missing-commit.sh
@@ -24,11 +24,17 @@ test_expect_success 'check corruption' '
 '
 
 test_expect_success 'rev-list notices corruption (1)' '
-   test_must_fail git rev-list HEAD
+   (
+   GIT_TEST_COMMIT_GRAPH=0 &&
+   test_must_fail git rev-list HEAD
+  

Re: [PATCH v4 4/6] tests: use shorter here-docs in chainlint.sed for AIX sed

2018-08-28 Thread Eric Sunshine
On Tue, Aug 28, 2018 at 4:14 PM Ævar Arnfjörð Bjarmason
 wrote:
> On Fri, Aug 24 2018, Eric Sunshine wrote:
> > Shortening the names makes them ugly and often unreadable. That's not
> > a complaint with this patch; just a general observation regarding
> > 8-byte limitation with this platform's "sed" (and POSIX). See a few
> > suggested improvements below, but probably not worth a re-roll.
> >> +:clssolo
> > ":endsolo"
>
> I was meaning to get to this with a re-roll, but since this is already
> in next & these label renames seem cosmetic, it seems better to do this
> after 2.19, i.e. the compiler doesn't care about the specific names, but
> shortening them to <=8 fixes the bug.

Sounds fine. There's no rush. Thanks.


Re: [PATCH v4 4/6] tests: use shorter here-docs in chainlint.sed for AIX sed

2018-08-28 Thread Ævar Arnfjörð Bjarmason


On Fri, Aug 24 2018, Eric Sunshine wrote:

> On Fri, Aug 24, 2018 at 11:20 AM Ævar Arnfjörð Bjarmason
>  wrote:
>> Improve the portability of chainlint by using shorter here-docs. On
>> AIX sed will complain about:
>>
>> sed: 0602-417 The label :hereslurp is greater than eight
>> characters
>
> Shortening the names makes them ugly and often unreadable. That's not
> a complaint with this patch; just a general observation regarding
> 8-byte limitation with this platform's "sed" (and POSIX). See a few
> suggested improvements below, but probably not worth a re-roll.
>
>> This, in combination with the previous fix to this file makes
>> GIT_TEST_CHAIN_LINT=1 (which is the default) working again on AIX
>> without issues, and the "gmake check-chainlint" test also passes.
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>
> FWIW,
> Acked-by: Eric Sunshine 
>
>> diff --git a/t/chainlint.sed b/t/chainlint.sed
>> -:checkchain
>> +:chkchn
>
> ":chkchain"
>
>>  # found incomplete line "...\" -- slurp up next line
>> -:incomplete
>> +:icmplte
>
> ":fold" (for "fold out NL")
>
>>  # found nested multi-line "(...\n...)" -- pass through untouched
>> -:nestslurp
>> +:nstslurp
>
> ":nesteat"
>
>> -:nestcontinue
>> +:nstcnt
>
> ":nestcont"
>
>> -:nestclose
>> +:nstclose
>
> ":nestend" or ":endnest"
>
>>  # found closing ")" on own line -- drop "suspect" from final line of 
>> subshell
>>  # since that line legitimately lacks "&&" and exit subshell loop
>> -:closesolo
>> +:clssolo
>
> ":endsolo"

I was meaning to get to this with a re-roll, but since this is already
in next & these label renames seem cosmetic, it seems better to do this
after 2.19, i.e. the compiler doesn't care about the specific names, but
shortening them to <=8 fixes the bug.


Re: [PATCH 02/21] read-cache.c: remove 'const' from index_has_changes()

2018-08-28 Thread Stefan Beller
> "r" it is! I forgot about it. But this is for local variable or
> argument names only right? The field name (in diff_options for
> example) should stay something more descriptive like repo, I think.

Yea I agree, we should have more descriptive things in long lived structs.

Thanks,
Stefan


Re: Git clean removing tracked files semi-regularly

2018-08-28 Thread Derrick Stolee

On 8/28/2018 2:29 PM, Brennan Conroy wrote:

I'm using windows. Have had it happen on 8.1 and 10.

"git status" shows "nothing to commit, working tree clean". I am using git 
version 2.17.1.windows.2


I tested on my machine and could not reproduce this issue. Could you 
find a full repro (first command 'git clone 
https://github.com/aspnet/SignalR') and create an Issue at 
https://github.com/git-for-windows/git/issues ?


Thanks,

-Stolee



Re: [PATCH 1/2] tests: fix non-portable "${var:-"str"}" construct

2018-08-28 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> On both AIX 7200-00-01-1543 and FreeBSD 11.2-RELEASE-p2 the
> "${var:-"str"}" syntax means something different than what it does
> under the bash or dash shells.
>
> Both will consider the start of the new unescaped quotes to be a new
> argument to test_expect_success, resulting in the following error:
>
> error: bug in the test script: 'git diff-tree initial # magic
> is (not' does not look like a prereq
>
> Fix this by removing the redundant quotes. There's no need for them,
> and the resulting code works under all the aforementioned shells.

Yup, there is no need for that inner dq pair in this particular case.

I was more worried about scripted Porcelains, like 

   : "${GIT_OBJECT_DIRECTORY="$(git rev-parse --git-path objects)"}"

which I think can safely lose the outer dq pair.  In t/ directory,
there is this one in test-lib-functions.sh which I do not offhand
know how these problematic shells would handle.

echo "#!${2-"$SHELL_PATH"}" &&

Other than the presence of these two that are not covered by this
patch, the patch itself looks good.  Thanks.

> fixes a regression in c2f1d3989 ("t4013: test new output from diff
> --abbrev --raw", 2017-12-03) first released with Git v2.16.0.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  t/t4013-diff-various.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> index f8d853595b..73f7038253 100755
> --- a/t/t4013-diff-various.sh
> +++ b/t/t4013-diff-various.sh
> @@ -140,7 +140,7 @@ do
>   expect="$TEST_DIRECTORY/t4013/diff.$test"
>   actual="$pfx-diff.$test"
>  
> - test_expect_success "git $cmd # magic is ${magic:-"(not used)"}" '
> + test_expect_success "git $cmd # magic is ${magic:-(not used)}" '
>   {
>   echo "$ git $cmd"
>   case "$magic" in


Re: GIT_TRACE doesn't show content filter files it's operating on

2018-08-28 Thread Stas Bekman
On 2018-08-27 06:13 PM, Stas Bekman wrote:
[...]
> I now know how get the filenames for "clean/smudge" filters. Can you
> please help with the same for "textconv". %f doesn't work - it gets
> stuck there waiting for stdin, the following seems to pass, but I'm not
> sure it's correct:

On a closer look this is not needed for "textconv" as it already logs
the filename in the GIT_TRACE=1 trace.

I have no more remaining questions, Thank you for your help, Jeff.

-- 

Stas Bekman   <'><   <'><
https://stasosphere.com  https://chestofbooks.com
https://experientialsexlab.com https://stason.org
https://stasosphere.com/experience-life/my-books


[PATCH 1/2] tests: fix non-portable "${var:-"str"}" construct

2018-08-28 Thread Ævar Arnfjörð Bjarmason
On both AIX 7200-00-01-1543 and FreeBSD 11.2-RELEASE-p2 the
"${var:-"str"}" syntax means something different than what it does
under the bash or dash shells.

Both will consider the start of the new unescaped quotes to be a new
argument to test_expect_success, resulting in the following error:

error: bug in the test script: 'git diff-tree initial # magic
is (not' does not look like a prereq

Fix this by removing the redundant quotes. There's no need for them,
and the resulting code works under all the aforementioned shells. This
fixes a regression in c2f1d3989 ("t4013: test new output from diff
--abbrev --raw", 2017-12-03) first released with Git v2.16.0.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t4013-diff-various.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index f8d853595b..73f7038253 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -140,7 +140,7 @@ do
expect="$TEST_DIRECTORY/t4013/diff.$test"
actual="$pfx-diff.$test"
 
-   test_expect_success "git $cmd # magic is ${magic:-"(not used)"}" '
+   test_expect_success "git $cmd # magic is ${magic:-(not used)}" '
{
echo "$ git $cmd"
case "$magic" in
-- 
2.19.0.rc0.228.g281dcd1b4d0



[PATCH 0/2] FreeBSD & AIX test portability fixes

2018-08-28 Thread Ævar Arnfjörð Bjarmason
This makes the vanilla test suite pass without errors on the FreeBSD
system noted in the commit messages.

On AIX things are still horribly broken, but this fixes one more issue
that also affects that system.

Ævar Arnfjörð Bjarmason (2):
  tests: fix non-portable "${var:-"str"}" construct
  tests: fix non-portable iconv invocation

 t/t0028-working-tree-encoding.sh | 6 +-
 t/t4013-diff-various.sh  | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

-- 
2.19.0.rc0.228.g281dcd1b4d0



[PATCH 2/2] tests: fix non-portable iconv invocation

2018-08-28 Thread Ævar Arnfjörð Bjarmason
The iconv that comes with a FreeBSD 11.2-RELEASE-p2 box I have access
to doesn't support the SHIFT-JIS encoding. Guard a test added in
e92d62253 ("convert: add round trip check based on
'core.checkRoundtripEncoding'", 2018-04-15) first released with Git
v2.18.0 with a prerequisite that checks for its availability.

The iconv command is in POSIX, and we have numerous tests
unconditionally relying on its ability to convert ASCII, UTF-8 and
UTF-16, but unconditionally relying on the presence of more obscure
encodings isn't portable.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t0028-working-tree-encoding.sh | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/t/t0028-working-tree-encoding.sh b/t/t0028-working-tree-encoding.sh
index 12b8eb963a..b0f4490e8e 100755
--- a/t/t0028-working-tree-encoding.sh
+++ b/t/t0028-working-tree-encoding.sh
@@ -203,7 +203,11 @@ test_expect_success 'error if encoding garbage is already 
in Git' '
test_i18ngrep "error: BOM is required" err.out
 '
 
-test_expect_success 'check roundtrip encoding' '
+test_lazy_prereq ICONV_SHIFT_JIS '
+   iconv -f UTF-8 -t SHIFT-JIS /dev/null
+'
+
+test_expect_success ICONV_SHIFT_JIS 'check roundtrip encoding' '
test_when_finished "rm -f roundtrip.shift roundtrip.utf16" &&
test_when_finished "git reset --hard HEAD" &&
 
-- 
2.19.0.rc0.228.g281dcd1b4d0



Re: Feature request: be able to pass arguments to difftool command

2018-08-28 Thread Junio C Hamano
"H.Merijn Brand"  writes:

> So, my wish would be to have an option, possibly using -- to pass
> additional command line arguments to git difftool, so that
>
>  $ git difftool $commit~1..$commit -- -m -v2
>
> would pass the arguments after -- transparantly to ccdiff (in my case)

At the syntax level passing any option after "--" would be a no
starter, as I would imagine that "git difftool $revs -- $paths"
should still be supported.

At the concept level, however, I can see why such a feature would be
useful.  Perhaps

$ git difftool --backend-option=-m --backend-option=-v2 HEAD
$ git mergetool --backend-option=--foo

with appropriate way(s) [*1*] to make it easier to type (and
implement) would be an acceptable avenue to pursue, I wonder?


[Footnote]

*1* There are various possible ways, not all of them are mutually
incompatible.

a. Give a short-form synonym, e.g. -X, to "--backend-option";

b. Assume that backend option always begins with a dash and add
   one when missing, e.g. -Xm becomes --backend-option=-m

c. Allow giving multiple backend options on a single option and
   split at whitespace, e.g. --backend-option="-m -v2"

d. Allow difftool.$toolname.opts configuration variable that is
   multi-valued, so you can say

git -c difftool.ccdiff.opts=-v2 -c difftool.ccdiff.opts=-m difftool

   (of course, not necessarily from the command line but the
   point is you could configure it)

Some of these (e.g. b, c) may not be desirable, though.



Re: [PATCH 00/21] Kill the_index part 4

2018-08-28 Thread Duy Nguyen
On Mon, Aug 27, 2018 at 7:32 PM Stefan Beller  wrote:
> > Besides some small conflicts on 'pu', like the previous part, it also
> > breaks 'pu' because of API changes. The fix is trivial though, just
> > prepend the_repository as the first argument for the broken function
> > calls.
>
> This sounds like a problem. I said the same when sending the
> object store lookup series, which ended via
> 3a2a1dc1707 (Merge branch 'sb/object-store-lookup', 2018-08-02)
> in master, but I do recall Junio being somewhat unhappy about it[1].
>
> By just adding the argument to the function, it might merge easily
> but not compile, which would have to be fixed up; and that object store
> series seemed to touch a lot of functions that were also used in series
> in-flight.

Which is why I state it clearly here so Junio could choose not to pick
this series up (I'm totally ok with that, I could wait until the dust
settles and send again).

Another option is me adding a patch that renames diff_setup() to
diff_setup2() or something that takes 'struct repository *', leaving
diff_setup() as a light wrapper around diff_setup2(). I would need to
wait until in-flight series are merged, then rename diff_setup2() back
to diff_setup(). Not sure if it's worth doing.

> >  diff.c | 259 +++--
>
> Ugh? That sounds like there is an interesting change coming.

Yeah. sequencer.c also needs struct repository in lots of places
(sha1-name.c comes second). But at least builtin/ code looks a lot
like it's supposed to be when 'struct repository *' is introduced, we
now have lots of function calls there that pass the_repository...
-- 
Duy


Re: [PATCH 02/21] read-cache.c: remove 'const' from index_has_changes()

2018-08-28 Thread Duy Nguyen
On Mon, Aug 27, 2018 at 8:37 PM Stefan Beller  wrote:
>
> On Sun, Aug 26, 2018 at 3:03 AM Nguyễn Thái Ngọc Duy  
> wrote:
> >
> > This function calls do_diff_cache() which eventually needs to set this
> > "istate" to unpack_options->src_index (*). This is an unfornate fact
>
> unfortunate
>
> > diff --git a/diff.c b/diff.c
>
> Unlike I thought in the cover letter, this is just adding the repository all
> over the place and not adding new code, despite the size.
>
> A cursory read seems to make sense, though I'd nitpick on the
> choice of the variable name as I used to use 'r' for the repository
> struct. I am not saying that is better, but we could think if we want to
> strive for some consistency eventually. (for example most strbufs are
> named 'sb', as they are temporary helpers with no explicit naming
> required. So maybe we could strive to name all "repository pass throughs"
> to be "repo" or "r").

"r" it is! I forgot about it. But this is for local variable or
argument names only right? The field name (in diff_options for
example) should stay something more descriptive like repo, I think.
-- 
Duy


Re: [PATCH] read-cache.c: optimize reading index format v4

2018-08-28 Thread Duy Nguyen
On Mon, Aug 27, 2018 at 9:36 PM Junio C Hamano  wrote:
> > PS. I notice that v4 does not pad to align entries at 4 byte boundary
> > like v2/v3. This could cause a slight slow down on x86 and segfault on
> > some other platforms.
>
> Care to elaborate?
>
> Long time ago, we used to mmap and read directly from the index file
> contents, requiring either an unaligned read or padded entries.  But
> that was eons ago and we first read and convert from on-disk using
> get_be32() etc. to in-core structure, so I am not sure what you mean
> by "segfault" here.
>

My bad. I saw this line

#define get_be16(p) ntohs(*(unsigned short *)(p))

and jumped to conclusion without realizing that block is for safe
unaligned access.

> > @@ -1898,7 +1884,8 @@ int do_read_index(struct index_state *istate, const 
> > char *path, int must_exist)
> >   struct cache_header *hdr;
> >   void *mmap;
> >   size_t mmap_size;
> > - struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
> > + const struct cache_entry *previous_ce = NULL;
> > + struct cache_entry *dummy_entry = NULL;
> >
> >   if (istate->initialized)
> >   return istate->cache_nr;
> > @@ -1936,11 +1923,10 @@ int do_read_index(struct index_state *istate, const 
> > char *path, int must_exist)
> >   istate->initialized = 1;
> >
> >   if (istate->version == 4) {
> > - previous_name = _name_buf;
> > + previous_ce = dummy_entry = 
> > make_empty_transient_cache_entry(0);
>
> I do like the idea of passing the previous ce around to tell the
> next one what the previous name was, but I would have preferred to
> see this done a bit more cleanly without requiring us to support "a
> dummy entry with name whose length is 0"; a real cache entry never
> has zero-length name, and our code may want to enforce it as a
> sanity check.
>
> I think we can just call create_from_disk() with NULL set to
> previous_ce in the first round; of course, the logic to assign the
> one we just created to previous_ce must check istate->version,
> instead of "is previous_ce NULL?" (which is an indirect way to check
> the same thing used in this patch).

Yeah I kinda hated dummy_entry too but the feeling wasn't strong
enough to move towards the index->version check. I guess I'm going to
do it now.
-- 
Duy


Re: Contributor Summit planning

2018-08-28 Thread Jonathan Nieder
Jonathan Nieder wrote:

> The current IRC experience might be a bit unrepresentative, due to
> https://freenode.net/news/spam-shake:

https://freenode.net/news/spambot-attack may be a better link.

Thanks,
Jonathan


Re: Contributor Summit planning

2018-08-28 Thread Jonathan Nieder
Hi Derrick,

Derrick Stolee wrote:

> A focused aside, since you brought up the online "standup": it seems the IRC
> channel has been less than ideal, with people trying to participate but
> having nickname issues or being muted. You also describe another issue: the
> timing. Having a real-time discussion has its benefits, but also it leaves
> many people out.

For me, the real-time element is the entire point.  If timezones are a
problem for some people, I'm happy to e.g. alternate with a different
hour.

The current IRC experience might be a bit unrepresentative, due to
https://freenode.net/news/spam-shake:

| As you may be aware there has been a prolonged spambot attack
| directed at freenode (and other IRC networks) in recent weeks,
| targeting a number of individuals involved with freenode and the
| wider IRC communities.

A kind person configured the channel to withstand this spam attack.
This involved users having to authenticate to Freenode using
https://freenode.net/kb/answer/registration#nickname-setup so that it
knows whether you are a spammer.  Sorry for the inconvenience.

My offer to +v anyone affected by the channel's current settings still
stands (just /msg me).  Zero people have taken me up on this offer so
far.

> One idea to try next time is to create a mailing list thread asking for
> statuses, and each person can chime in asynchronously and spawn a new
> discussion based on that status. Perhaps we can try that next time.

I don't want to discourage a good idea.  The logical extension of this
(not one thread but a whole list) reminds me of the Kernel Newbies
mailing list , which appears to work well
in that context.  Given my current time commitments, I wouldn't be
able to participate, but I would be happy to see other volunteers set
something like that up if interested.

The usual practice of sending email about current work on progress to
git@vger to get feedback also still works, and that is something I can
commit to continue to spend time on.

Thanks,
Jonathan


Re: [PATCH 2/2] submodule.c: warn about missing submodule git directories

2018-08-28 Thread Junio C Hamano
Stefan Beller  writes:

> This is the continuation of f2d48994dc1 (submodule.c: submodule_move_head
> works with broken submodules, 2017-04-18), which tones down the case of
> "broken submodule" in case of a missing git directory of the submodule to
> be only a warning.

After seeing this warning, as we do not do any remedial action in
this codepath, the user with a repository in this state will keep
seeing the 'missing' message.  Wouldn't we want to give a hint in
addition (e.g. 'you can run "git submodule update $name" to
recover', or something like that)?

The MOVE_HEAD_FORCE codepath that follows this hunk is, eh, already
forcing to correct the situation, so there is no need to touch that,
which makes sense, if I understand correctly.

> Signed-off-by: Stefan Beller 
> ---
>  submodule.c   | 16 
>  t/t2013-checkout-submodule.sh |  2 +-
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/submodule.c b/submodule.c
> index 50cbf5f13ed..689439a3d0c 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1641,6 +1641,22 @@ int submodule_move_head(const char *path,
>   } else {
>   char *gitdir = xstrfmt("%s/modules/%s",
>   get_git_common_dir(), sub->name);
> +
> + if (!is_git_directory(gitdir)) {
> + /*
> +  * It is safe to assume we could just clone
> +  * the submodule here, as we passed the
> +  * is_submodule_active test above (i.e. the
> +  * user is interested in this submodule.
> +  *
> +  * However as this code path is exercised
> +  * for operations that typically do not involve
> +  * network operations, let's not do that for 
> now.
> +  */
> + warning(_("Submodule '%s' missing"), path);
> + free(gitdir);
> + return 0;
> + }
>   connect_work_tree_and_git_dir(path, gitdir, 0);
>   free(gitdir);
>  
> diff --git a/t/t2013-checkout-submodule.sh b/t/t2013-checkout-submodule.sh
> index c69640fc341..82ef4576b91 100755
> --- a/t/t2013-checkout-submodule.sh
> +++ b/t/t2013-checkout-submodule.sh
> @@ -81,7 +81,7 @@ test_expect_success 'setup superproject with historic 
> submodule' '
>   test_must_be_empty super1/.gitmodules
>  '
>  
> -test_expect_failure 'checkout old state with deleted submodule' '
> +test_expect_success 'checkout old state with deleted submodule' '
>   test_when_finished "rm -rf super1 sub1 super1_clone" &&
>   git clone --recurse-submodules super1 super1_clone &&
>   git -C super1_clone checkout --recurse-submodules historic_state


Re: [PATCH v1 00/25] RFC: structured logging

2018-08-28 Thread Jeff Hostetler




On 8/28/2018 1:38 PM, Junio C Hamano wrote:

g...@jeffhostetler.com writes:


From: Jeff Hostetler 

This RFC patch series adds structured logging to git.  The motivation,
...

Jeff Hostetler (25):
   structured-logging: design document
   structured-logging: add STRUCTURED_LOGGING=1 to Makefile
   structured-logging: add structured logging framework
   structured-logging: add session-id to log events
   structured-logging: set sub_command field for branch command
   structured-logging: set sub_command field for checkout command
   structured-logging: t0420 basic tests
   structured-logging: add detail-event facility
   structured-logging: add detail-event for lazy_init_name_hash
   structured-logging: add timer facility
   structured-logging: add timer around do_read_index
   structured-logging: add timer around do_write_index
   structured-logging: add timer around wt-status functions
   structured-logging: add timer around preload_index
   structured-logging: t0420 tests for timers
   structured-logging: add aux-data facility
   structured-logging: add aux-data for index size
   structured-logging: add aux-data for size of sparse-checkout file
   structured-logging: t0420 tests for aux-data
   structured-logging: add structured logging to remote-curl
   structured-logging: add detail-events for child processes
   structured-logging: add child process classification
   structured-logging: t0420 tests for child process detail events
   structured-logging: t0420 tests for interacitve child_summary
   structured-logging: add config data facility



I noticed that Travis job has been failing with a trivially fixable
failure, so I'll push out today's 'pu' with the attached applied on
top.  This may become unapplicable to the code when issues raised in
recent reviews addressed, though.

  structured-logging.c | 6 ++
  1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/structured-logging.c b/structured-logging.c
index 0e3f79ee48..78abcd2e59 100644
--- a/structured-logging.c
+++ b/structured-logging.c
@@ -593,8 +593,7 @@ void slog_set_command_name(const char *command_name)
 * the cmd_() and/or it may be too early to force a
 * lazy load.
 */
-   if (my__command_name)
-   free(my__command_name);
+   free(my__command_name);
my__command_name = xstrdup(command_name);
  }
  
@@ -606,8 +605,7 @@ void slog_set_sub_command_name(const char *sub_command_name)

 * the cmd_() and/or it may be too early to force a
 * lazy load.
 */
-   if (my__sub_command_name)
-   free(my__sub_command_name);
+   free(my__sub_command_name);
my__sub_command_name = xstrdup(sub_command_name);
  }
  



Sorry about that.

Let me withdraw the current series.  I'm working on a new version that
addresses the comments on the mailing list.  It combines my logging
with a variation on the nested perf logging that Duy suggested and
the consolidation that you were talking about last week.

Jeff



Re: [PATCH v6] Implement --first-parent for git rev-list --bisect

2018-08-28 Thread Junio C Hamano
Johannes Schindelin  writes:

> Hi Tiago,
>
> On Tue, 28 Aug 2018, Tiago Botelho wrote:
>
>> This will enable users to implement bisecting on first parents
>> which can be useful for when the commits from a feature branch
>> that we want to merge are not always tested.
>
> This message is still lacking the explanation I asked for, namely for the
> lines:
>
>   @@ -329,6 +334,11 @@ static struct commit_list 
> *do_find_bisection(struct commit_list *list,
>   if (0 <= weight(p))
>   continue;
>   for (q = p->item->parents; q; q = q->next) {
>   +   if ((bisect_flags & BISECT_FIRST_PARENT)) {
>   +   if (weight(q) < 0)
>   +   q = NULL;
>   +   break;
>   +   }
>   if (q->item->object.flags & UNINTERESTING)
>   continue;
>   if (0 <= weight(q))

I've just finished scanning the discussion thread on public-inbox
for v5, v4, v3, v2 and the initial round of this series, but found
your comments only on the tests.  If you have a pointer that would
be great; it also is OK to say what kind of explanation is needed
for that addition again.  

FWIW I too was puzzled about the correctness of the added logic
above, especially the part that reads weight(q) before checking if
it is not UNINTERESTING, but I covered it on a separate message.

> I would have preferred to reuse the already existing commits generated in
> the `setup` phase rather than generating yet another batch, and to not
> introduce an inconsistent way to present a commit graph (compare your
> diagram with the one in
> https://github.com/git/git/blob/v2.18.0/t/t6002-rev-list-bisect.sh#L64-L90
> i.e. *in the same file*)

As I already said in the previous round, I do agree with these.
That is, ...

>> diff --git a/t/t6002-rev-list-bisect.sh b/t/t6002-rev-list-bisect.sh
>> index a66140803..1bc297de5 100755
>> --- a/t/t6002-rev-list-bisect.sh
>> +++ b/t/t6002-rev-list-bisect.sh
>> @@ -263,4 +263,62 @@ test_expect_success 'rev-parse --bisect can default to 
>> good/bad refs' '
>>  test_cmp expect.sorted actual.sorted
>>  '
>>  
>> +# We generate the following commit graph:
>> +#
>> +#   B -- C
>> +#  /  \
>> +# AFX
>> +#  \  /
>> +#   D - CC - EX
>> +
>> +test_expect_success 'setup' '
>> +  test_commit A &&
>> +  test_commit B &&
>> +  test_commit C &&
>> +  git reset --hard A &&
>> +  test_commit D &&
>> +  test_commit CC &&
>> +  test_commit EX &&
>> +  test_merge FX C
>> +'

... the above graph construction should not be necessary.  An
earlier part of t6002 would have already created a history of
suitable shape to use for writing the following tests.

>> +test_output_expect_success "--bisect --first-parent" 'git rev-list --bisect 
>> --first-parent FX ^A' <> +$(git rev-parse CC)
>> +EOF
>> +
>> +test_output_expect_success "--first-parent" 'git rev-list --first-parent FX 
>> ^A' <> +$(git rev-parse FX)
>> +$(git rev-parse EX)
>> +$(git rev-parse CC)
>> +$(git rev-parse D)
>> +EOF
>> +
>> +test_output_expect_success "--bisect-vars --first-parent" 'git rev-list 
>> --bisect-vars --first-parent FX ^A' <> +bisect_rev='$(git rev-parse CC)'
>> +bisect_nr=1
>> +bisect_good=1
>> +bisect_bad=1
>> +bisect_all=4
>> +bisect_steps=1
>> +EOF
>> +
>> +test_expect_success "--bisect-all --first-parent" '
>> +cat >expect <> +$(git rev-parse CC) (dist=2)
>> +$(git rev-parse EX) (dist=1)
>> +$(git rev-parse D) (dist=1)
>> +$(git rev-parse FX) (dist=0)
>> +EOF
>> +
>> +# Make sure we have the same entries, nothing more, nothing less
>> +git rev-list --bisect-all --first-parent FX ^A >actual &&
>> +  sort actual >actual.sorted &&
>> +  sort expect >expect.sorted &&
>> +  test_cmp expect.sorted actual.sorted &&
>> +  # Make sure the entries are sorted in the dist order
>> +  sed -e "s/.*(dist=\([1-9]*[0-9]\)).*/\1/" actual >actual.dists &&
>> +  sort -r actual.dists >actual.dists.sorted &&
>> +  test_cmp actual.dists.sorted actual.dists
>> +'
>> +
>>  test_done
>> -- 
>> 2.16.3
>> 
>> 


RE: Git clean removing tracked files semi-regularly

2018-08-28 Thread Brennan Conroy
I'm using windows. Have had it happen on 8.1 and 10.

"git status" shows "nothing to commit, working tree clean". I am using git 
version 2.17.1.windows.2

-Original Message-
From: brian m. carlson  
Sent: Monday, August 27, 2018 5:58 PM
To: Brennan Conroy 
Cc: git@vger.kernel.org
Subject: Re: Git clean removing tracked files semi-regularly

On Mon, Aug 27, 2018 at 07:06:59PM +, Brennan Conroy wrote:
> Hello, I work on a project that uses git. "git clean -xdf" is a common 
> command to run to clean up the environment, however sometimes this 
> command deletes the entire contents of two of the folders and all the 
> files it deletes are being tracked which according to the 
> documentation should not be deleted.
> 
> The project is located at https://github.com/aspnet/SignalR and the 
> two folders it likes to delete are 
> https://github.com/aspnet/SignalR/tree/release/2.2/clients/ts/signalr-
> protocol-msgpack and 
> https://github.com/aspnet/SignalR/tree/release/2.2/clients/ts/signalr
> 
> If you need me to collect git logs etc. please don't hesitate to ask!

It would be helpful to know more about your environment.  Is this only 
reproducible on certain operating systems (e.g. case-insensitive ones)?
What version of Git are you using?  What does "git status" say before you run 
git clean?
--
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


Re: [PATCH] add -p: coalesce hunks before testing applicability

2018-08-28 Thread Junio C Hamano
Jochen Sprickerhof  writes:

> When a hunk was split before being edited manually, it does not apply
> anymore cleanly. Apply coalesce_overlapping_hunks() first to make it
> work. Enable test for it as well.
>
> Signed-off-by: Jochen Sprickerhof 
> ---
>  git-add--interactive.perl  | 8 
>  t/t3701-add-interactive.sh | 2 +-
>  2 files changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
> index 36f38ced9..c9f434e4a 100755
> --- a/git-add--interactive.perl
> +++ b/git-add--interactive.perl
> @@ -1195,10 +1195,10 @@ sub edit_hunk_loop {
>   # delta from the original unedited hunk.
>   $hunk->{OFS_DELTA} and
>   $newhunk->{OFS_DELTA} += $hunk->{OFS_DELTA};
> - if (diff_applies($head,
> -  @{$hunks}[0..$ix-1],
> -  $newhunk,
> -  @{$hunks}[$ix+1..$#{$hunks}])) {
> + my @hunk = @{$hunks};
> + splice (@hunk, $ix, 1, $newhunk);
> + @hunk = coalesce_overlapping_hunks(@hunk);
> + if (diff_applies($head, @hunk)) {
>   $newhunk->{DISPLAY} = [color_diff(@{$newtext})];
>   return $newhunk;
>   }

OK.  I think I understand how this may be needed in certain forms of
edit.  I do not think coalesce would reliably work against arbitrary
kind of edit, but the function is called at the end of the loop of
the caller of this function anyway, so it is not like this is making
anything worse at all.  Let's queue it and try it out.

Thanks.

All of the following is a tangent for future/further work, and
should not be done as part of your patch.  While this change may
work around the immediate issue of diff_applies() returning "no", it
makes it feel a bit iffy to keep the interface to return $newhunk.

With the current interface, the function is saying the caller "here
is a text that shows a new hunk, when spliced back into the @hunk
array and coalesced together with others, would apply cleanly to the
current index".  But the corrected code is already doing a trial
splice with trial coalescing anyway, so perhaps it may make more
sense to update the interface into this function for the caller to
say "the user asks to edit $ix'th hunk in @$hunks" and the function
to answer "Here is the edited result in @$hunks; I've made sure it
would cleanly apply".

Incidentally, that would make it possible in the future to allow
edit_hunk_loop to be told "the user wants to edit this $ix'th hunk",
allow the editor to split that hunk into multiple hunks, and return
the result by stuffing them (not just a single $newhunk) into the
@hunk array.  The caller's loop could then further select or join
these hunks---such an interaction is impossible with the current
interface that allows edit_hunk_loop to take a single hunk and
return a single newhunk.

But again, all of the above is a tangent for future/further work,
and should not be done as part of your patch.


Re: [PATCH v1 00/25] RFC: structured logging

2018-08-28 Thread Junio C Hamano
g...@jeffhostetler.com writes:

> From: Jeff Hostetler 
>
> This RFC patch series adds structured logging to git.  The motivation,
> ...
>
> Jeff Hostetler (25):
>   structured-logging: design document
>   structured-logging: add STRUCTURED_LOGGING=1 to Makefile
>   structured-logging: add structured logging framework
>   structured-logging: add session-id to log events
>   structured-logging: set sub_command field for branch command
>   structured-logging: set sub_command field for checkout command
>   structured-logging: t0420 basic tests
>   structured-logging: add detail-event facility
>   structured-logging: add detail-event for lazy_init_name_hash
>   structured-logging: add timer facility
>   structured-logging: add timer around do_read_index
>   structured-logging: add timer around do_write_index
>   structured-logging: add timer around wt-status functions
>   structured-logging: add timer around preload_index
>   structured-logging: t0420 tests for timers
>   structured-logging: add aux-data facility
>   structured-logging: add aux-data for index size
>   structured-logging: add aux-data for size of sparse-checkout file
>   structured-logging: t0420 tests for aux-data
>   structured-logging: add structured logging to remote-curl
>   structured-logging: add detail-events for child processes
>   structured-logging: add child process classification
>   structured-logging: t0420 tests for child process detail events
>   structured-logging: t0420 tests for interacitve child_summary
>   structured-logging: add config data facility


I noticed that Travis job has been failing with a trivially fixable
failure, so I'll push out today's 'pu' with the attached applied on
top.  This may become unapplicable to the code when issues raised in
recent reviews addressed, though.

 structured-logging.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/structured-logging.c b/structured-logging.c
index 0e3f79ee48..78abcd2e59 100644
--- a/structured-logging.c
+++ b/structured-logging.c
@@ -593,8 +593,7 @@ void slog_set_command_name(const char *command_name)
 * the cmd_() and/or it may be too early to force a
 * lazy load.
 */
-   if (my__command_name)
-   free(my__command_name);
+   free(my__command_name);
my__command_name = xstrdup(command_name);
 }
 
@@ -606,8 +605,7 @@ void slog_set_sub_command_name(const char *sub_command_name)
 * the cmd_() and/or it may be too early to force a
 * lazy load.
 */
-   if (my__sub_command_name)
-   free(my__sub_command_name);
+   free(my__sub_command_name);
my__sub_command_name = xstrdup(sub_command_name);
 }
 
-- 
2.19.0-rc0-48-gb9dfa238d5




Re: [PATCH 0/1] Teach the builtin rebase about the builtin interactive rebase

2018-08-28 Thread Jonathan Nieder
Johannes Schindelin wrote:
> On Mon, 27 Aug 2018, Junio C Hamano wrote:
>> Johannes Schindelin  writes:
>>> Jonathan Nieder wrote:

 Please include this information in the commit message.  It's super
 helpful to find this kind of information about why a patch does what
 it does when encountering a patch later "in the wild" (in git log -S
 output).
[...]
>> I think what Jonathan finds helpful is the other half of the story
>
> I will await Jonathan's clarification.

Junio's understanding is correct.

More generally, I greatly appreciate the kind of motivation and
backstory that you write in cover letters, and I wish that more of
that would find its way into the commit messages instead.  Really I
wish (and don't take this the wrong way --- I am not asking you to
write it unless it's your own itch) that GitGitGadget would put the
cover letter in single-patch series after the "---" line in the
individual patches, since that would make it easier for reviewers to
point out what content from the cover letter would be useful to add to
the commit message.

That said, this is minor and not a reason to reroll this patch.  It was
more that I wanted to give the hint for later patches.

Thanks much and hope that helps,
Jonathan


  1   2   >