[PATCH v2 0/1] Some left-over add-on for bw/config-h

2018-11-14 Thread Johannes Schindelin via GitGitGadget
Back when bw/config-h was developed (and backported to Git for Windows), I
came up with a patch to use git_dir if commondir is NULL, and contributed
that as v1 of this patch. However, it was deemed a bug if that happens, so
let's instead detect that condition and report it.

Change since v1:

 * Be loud about this bug instead of papering over it.

Johannes Schindelin (1):
  config: report a bug if git_dir exists without commondir

 config.c | 2 ++
 1 file changed, 2 insertions(+)


base-commit: 8858448bb49332d353febc078ce4a3abcc962efe
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-78%2Fdscho%2Fbw%2Fconfig-h-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-78/dscho/bw/config-h-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/78

Range-diff vs v1:

 1:  a3854e4ed8 ! 1:  0767f98378 do_git_config_sequence(): fall back to git_dir 
if commondir is NULL
 @@ -1,8 +1,9 @@
  Author: Johannes Schindelin 
  
 -do_git_config_sequence(): fall back to git_dir if commondir is NULL
 +config: report a bug if git_dir exists without commondir
  
 -Just some defensive programming.
 +This did happen at some stage, and was fixed relatively quickly. Make
 +sure that we detect very quickly, too, should that happen again.
  
  Signed-off-by: Johannes Schindelin 
  
 @@ -14,7 +15,7 @@
if (opts->commondir)
repo_config = mkpathdup("%s/config", opts->commondir);
  + else if (opts->git_dir)
 -+ repo_config = mkpathdup("%s/config", opts->git_dir);
 ++ BUG("git_dir without commondir");
else
repo_config = NULL;
   

-- 
gitgitgadget


Re: [PATCH 0/1] Some left-over add-on for bw/config-h

2018-11-13 Thread Jeff King
On Wed, Nov 14, 2018 at 02:52:24PM +0900, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" 
> writes:
> 
> > Back when bw/config-h was developed (and backported to Git for Windows), I
> > came up with this patch. It seems to not be strictly necessary, but I like
> > the safety of falling back to the Git directory when no common directory is
> > configured (for whatever reason).
> 
> Shouldn't that be diagnosed as an error with BUG()?  That is, if we
> know there is the current repository, and if commondir is not set,
> isn't it a bug that we want to look into in the start-up sequence?
> 
> The phrase "for whatever reason" makes me wonder if this is truly
> "defensive programming", rather than just sweeping the bug under the
> rug.
> 
> FWIW, with this toy patch applied on top of this patch, all tests
> that I usually run seem to pass.

Heh, I independently made the same change after reading the patch and
came here to report similar results.

I think the key thing here is that git_dir is never meant to be used as
a source for config. It is there to let the "includeIf gitdir" directive
work. So it would indeed be surprising and a bug if we had one but not
the other.

-Peff


Re: [PATCH 0/1] Some left-over add-on for bw/config-h

2018-11-13 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" 
writes:

> Back when bw/config-h was developed (and backported to Git for Windows), I
> came up with this patch. It seems to not be strictly necessary, but I like
> the safety of falling back to the Git directory when no common directory is
> configured (for whatever reason).

Shouldn't that be diagnosed as an error with BUG()?  That is, if we
know there is the current repository, and if commondir is not set,
isn't it a bug that we want to look into in the start-up sequence?

The phrase "for whatever reason" makes me wonder if this is truly
"defensive programming", rather than just sweeping the bug under the
rug.

FWIW, with this toy patch applied on top of this patch, all tests
that I usually run seem to pass.

 config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config.c b/config.c
index c5726af14d..e667e2ebce 100644
--- a/config.c
+++ b/config.c
@@ -1669,7 +1669,7 @@ static int do_git_config_sequence(const struct 
config_options *opts,
if (opts->commondir)
repo_config = mkpathdup("%s/config", opts->commondir);
else if (opts->git_dir)
-   repo_config = mkpathdup("%s/config", opts->git_dir);
+   BUG("git-dir exists but no commondir?");
else
repo_config = NULL;
 


[PATCH 0/1] Some left-over add-on for bw/config-h

2018-11-13 Thread Johannes Schindelin via GitGitGadget
Back when bw/config-h was developed (and backported to Git for Windows), I
came up with this patch. It seems to not be strictly necessary, but I like
the safety of falling back to the Git directory when no common directory is
configured (for whatever reason).

Johannes Schindelin (1):
  do_git_config_sequence(): fall back to git_dir if commondir is NULL

 config.c | 2 ++
 1 file changed, 2 insertions(+)


base-commit: 8858448bb49332d353febc078ce4a3abcc962efe
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-78%2Fdscho%2Fbw%2Fconfig-h-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-78/dscho/bw/config-h-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/78
-- 
gitgitgadget


[PATCH v4 2/3] git.c: handle_alias: prepend alias info when first argument is -h

2018-10-09 Thread Rasmus Villemoes
Most git commands respond to -h anywhere in the command line, or at
least as a first and lone argument, by printing the usage
information. For aliases, we can provide a little more information that
might be useful in interpreting/understanding the following output by
prepending a line telling that the command is an alias, and for what.

When one invokes a simple alias, such as "cp = cherry-pick"
with -h, this results in

$ git cp -h
'cp' is aliased to 'cherry-pick'
usage: git cherry-pick [] ...
...

When the alias consists of more than one word, this provides the
additional benefit of informing the user which options are implicit in
using the alias, e.g. with "cp = cherry-pick -n":

$ git cp -h
'cp' is aliased to 'cherry-pick -n'
usage: git cherry-pick [] ...
...

For shell commands, we cannot know how it responds to -h, but printing
this line to stderr should not hurt, and can help in figuring out what
is happening in a case like

$ git sc -h
'sc' is aliased to '!somecommand'
somecommand: invalid option '-h'

Suggested-by: Jeff King 
Signed-off-by: Rasmus Villemoes 
---
 git.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/git.c b/git.c
index a6f4b44af5..0211c2d4c0 100644
--- a/git.c
+++ b/git.c
@@ -318,6 +318,9 @@ static int handle_alias(int *argcp, const char ***argv)
alias_command = (*argv)[0];
alias_string = alias_lookup(alias_command);
if (alias_string) {
+   if (*argcp > 1 && !strcmp((*argv)[1], "-h"))
+   fprintf_ln(stderr, _("'%s' is aliased to '%s'"),
+  alias_command, alias_string);
if (alias_string[0] == '!') {
struct child_process child = CHILD_PROCESS_INIT;
int nongit_ok;
-- 
2.19.1.4.g721af0fda3



[PATCH v3 2/3] git.c: handle_alias: prepend alias info when first argument is -h

2018-10-03 Thread Rasmus Villemoes
Most git commands respond to -h anywhere in the command line, or at
least as a first and lone argument, by printing the usage
information. For aliases, we can provide a little more information that
might be useful in interpreting/understanding the following output by
prepending a line telling that the command is an alias, and for what.

When one invokes a simple alias, such as "cp = cherry-pick"
with -h, this results in

$ git cp -h
'cp' is aliased to 'cherry-pick'
usage: git cherry-pick [] ...
...

When the alias consists of more than one word, this provides the
additional benefit of informing the user which options are implicit in
using the alias, e.g. with "cp = cherry-pick -n":

$ git cp -h
'cp' is aliased to 'cherry-pick -n'
usage: git cherry-pick [] ...
...

For shell commands, we cannot know how it responds to -h, but printing
this line to stderr should not hurt, and can help in figuring out what
is happening in a case like

$ git sc -h
'sc' is aliased to '!somecommand'
somecommand: invalid option '-h'

Suggested-by: Jeff King 
Signed-off-by: Rasmus Villemoes 
---
 git.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/git.c b/git.c
index a6f4b44af5..0211c2d4c0 100644
--- a/git.c
+++ b/git.c
@@ -318,6 +318,9 @@ static int handle_alias(int *argcp, const char ***argv)
alias_command = (*argv)[0];
alias_string = alias_lookup(alias_command);
if (alias_string) {
+   if (*argcp > 1 && !strcmp((*argv)[1], "-h"))
+   fprintf_ln(stderr, _("'%s' is aliased to '%s'"),
+  alias_command, alias_string);
if (alias_string[0] == '!') {
struct child_process child = CHILD_PROCESS_INIT;
int nongit_ok;
-- 
2.19.0



Re: [PATCH v2 2/3] git.c: handle_alias: prepend alias info when first argument is -h

2018-10-02 Thread Jeff King
On Mon, Oct 01, 2018 at 01:21:06PM +0200, Rasmus Villemoes wrote:

> Most git commands respond to -h anywhere in the command line, or at
> least as a first and lone argument, by printing the usage
> information. For aliases, we can provide a little more information that
> might be useful in interpreting/understanding the following output by
> prepending a line telling that the command is an alias, and for what.
> 
> When one invokes a simple alias, such as "cp = cherry-pick"
> with -h, this results in
> 
> $ git cp -h
> 'cp' is aliased to 'cherry-pick'
> usage: git cherry-pick [] ...
> ...
> 
> When the alias consists of more than one word, this provides the
> additional benefit of informing the user which options are implicit in
> using the alias, e.g. with "cp = cherry-pick -n":
> 
> $ git cp -h
> 'cp' is aliased to 'cherry-pick -n'
> usage: git cherry-pick [] ...
> ...
> 
> For shell commands, we cannot know how it responds to -h, but printing
> this line to stderr should not hurt, and can help in figuring out what
> is happening in a case like
> 
> $ git sc -h
> 'sc' is aliased to '!somecommand'
> somecommand: invalid option '-h'

Nicely explained.

> diff --git a/git.c b/git.c
> index a6f4b44af5..0211c2d4c0 100644
> --- a/git.c
> +++ b/git.c
> @@ -318,6 +318,9 @@ static int handle_alias(int *argcp, const char ***argv)
>   alias_command = (*argv)[0];
>   alias_string = alias_lookup(alias_command);
>   if (alias_string) {
> + if (*argcp > 1 && !strcmp((*argv)[1], "-h"))
> + fprintf_ln(stderr, _("'%s' is aliased to '%s'"),
> +alias_command, alias_string);
>   if (alias_string[0] == '!') {
>   struct child_process child = CHILD_PROCESS_INIT;
>   int nongit_ok;

And the implementation makes sense.

-Peff


[PATCH v2 2/3] git.c: handle_alias: prepend alias info when first argument is -h

2018-10-01 Thread Rasmus Villemoes
Most git commands respond to -h anywhere in the command line, or at
least as a first and lone argument, by printing the usage
information. For aliases, we can provide a little more information that
might be useful in interpreting/understanding the following output by
prepending a line telling that the command is an alias, and for what.

When one invokes a simple alias, such as "cp = cherry-pick"
with -h, this results in

$ git cp -h
'cp' is aliased to 'cherry-pick'
usage: git cherry-pick [] ...
...

When the alias consists of more than one word, this provides the
additional benefit of informing the user which options are implicit in
using the alias, e.g. with "cp = cherry-pick -n":

$ git cp -h
'cp' is aliased to 'cherry-pick -n'
usage: git cherry-pick [] ...
...

For shell commands, we cannot know how it responds to -h, but printing
this line to stderr should not hurt, and can help in figuring out what
is happening in a case like

$ git sc -h
'sc' is aliased to '!somecommand'
somecommand: invalid option '-h'

Suggested-by: Jeff King 
Signed-off-by: Rasmus Villemoes 
---
 git.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/git.c b/git.c
index a6f4b44af5..0211c2d4c0 100644
--- a/git.c
+++ b/git.c
@@ -318,6 +318,9 @@ static int handle_alias(int *argcp, const char ***argv)
alias_command = (*argv)[0];
alias_string = alias_lookup(alias_command);
if (alias_string) {
+   if (*argcp > 1 && !strcmp((*argv)[1], "-h"))
+   fprintf_ln(stderr, _("'%s' is aliased to '%s'"),
+  alias_command, alias_string);
if (alias_string[0] == '!') {
struct child_process child = CHILD_PROCESS_INIT;
int nongit_ok;
-- 
2.19.0



[PATCH v5 1/9] fetch: change "branch" to "reference" in --force -h output

2018-08-31 Thread Ævar Arnfjörð Bjarmason
The -h output has been referring to the --force command as forcing the
overwriting of local branches, but since "fetch" more generally
fetches all sorts of references in all refs/ namespaces, let's talk
about forcing the update of a a "reference" instead.

This wording was initially introduced in 8320199873 ("Rewrite
builtin-fetch option parsing to use parse_options().", 2007-12-04).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/fetch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213..b0706b3803 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -114,7 +114,7 @@ static struct option builtin_fetch_options[] = {
 N_("append to .git/FETCH_HEAD instead of overwriting")),
OPT_STRING(0, "upload-pack", _pack, N_("path"),
   N_("path to upload pack on remote end")),
-   OPT__FORCE(, N_("force overwrite of local branch"), 0),
+   OPT__FORCE(, N_("force overwrite of local reference"), 0),
OPT_BOOL('m', "multiple", ,
 N_("fetch from multiple remotes")),
OPT_SET_INT('t', "tags", ,
-- 
2.19.0.rc1.350.ge57e33dbd1



[PATCH v4 1/6] fetch: change "branch" to "reference" in --force -h output

2018-08-30 Thread Ævar Arnfjörð Bjarmason
The -h output has been referring to the --force command as forcing the
overwriting of local branches, but since "fetch" more generally
fetches all sorts of references in all refs/ namespaces, let's talk
about forcing the update of a a "reference" instead.

This wording was initially introduced in 8320199873 ("Rewrite
builtin-fetch option parsing to use parse_options().", 2007-12-04).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 builtin/fetch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 61bec5d213..b0706b3803 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -114,7 +114,7 @@ static struct option builtin_fetch_options[] = {
 N_("append to .git/FETCH_HEAD instead of overwriting")),
OPT_STRING(0, "upload-pack", _pack, N_("path"),
   N_("path to upload pack on remote end")),
-   OPT__FORCE(, N_("force overwrite of local branch"), 0),
+   OPT__FORCE(, N_("force overwrite of local reference"), 0),
OPT_BOOL('m', "multiple", ,
 N_("fetch from multiple remotes")),
OPT_SET_INT('t', "tags", ,
-- 
2.19.0.rc1.350.ge57e33dbd1



[PATCH v5 1/7] Add delta-islands.{c,h}

2018-08-16 Thread Christian Couder
From: Jeff King 

Hosting providers that allow users to "fork" existing
repos want those forks to share as much disk space as
possible.

Alternates are an existing solution to keep all the
objects from all the forks into a unique central repo,
but this can have some drawbacks. Especially when
packing the central repo, deltas will be created
between objects from different forks.

This can make cloning or fetching a fork much slower
and much more CPU intensive as Git might have to
compute new deltas for many objects to avoid sending
objects from a different fork.

Because the inefficiency primarily arises when an
object is deltified against another object that does
not exist in the same fork, we partition objects into
sets that appear in the same fork, and define
"delta islands". When finding delta base, we do not
allow an object outside the same island to be
considered as its base.

So "delta islands" is a way to store objects from
different forks in the same repo and packfile without
having deltas between objects from different forks.

This patch implements the delta islands mechanism in
"delta-islands.{c,h}", but does not yet make use of it.

A few new fields are added in 'struct object_entry'
in "pack-objects.h" though.

The documentation will follow in a patch that actually
uses delta islands in "builtin/pack-objects.c".

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 Makefile|   1 +
 delta-islands.c | 493 
 delta-islands.h |  11 ++
 pack-objects.h  |   4 +
 4 files changed, 509 insertions(+)
 create mode 100644 delta-islands.c
 create mode 100644 delta-islands.h

diff --git a/Makefile b/Makefile
index e3364a42a5..5967198c0a 100644
--- a/Makefile
+++ b/Makefile
@@ -847,6 +847,7 @@ LIB_OBJS += csum-file.o
 LIB_OBJS += ctype.o
 LIB_OBJS += date.o
 LIB_OBJS += decorate.o
+LIB_OBJS += delta-islands.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
 LIB_OBJS += diffcore-order.o
diff --git a/delta-islands.c b/delta-islands.c
new file mode 100644
index 00..e7123d44a3
--- /dev/null
+++ b/delta-islands.c
@@ -0,0 +1,493 @@
+#include "cache.h"
+#include "attr.h"
+#include "object.h"
+#include "blob.h"
+#include "commit.h"
+#include "tag.h"
+#include "tree.h"
+#include "delta.h"
+#include "pack.h"
+#include "tree-walk.h"
+#include "diff.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "progress.h"
+#include "refs.h"
+#include "khash.h"
+#include "pack-bitmap.h"
+#include "pack-objects.h"
+#include "delta-islands.h"
+#include "sha1-array.h"
+#include "config.h"
+
+KHASH_INIT(str, const char *, void *, 1, kh_str_hash_func, kh_str_hash_equal)
+
+static khash_sha1 *island_marks;
+static unsigned island_counter;
+static unsigned island_counter_core;
+
+static kh_str_t *remote_islands;
+
+struct remote_island {
+   uint64_t hash;
+   struct oid_array oids;
+};
+
+struct island_bitmap {
+   uint32_t refcount;
+   uint32_t bits[FLEX_ARRAY];
+};
+
+static uint32_t island_bitmap_size;
+
+/*
+ * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a copy
+ * of "old". Otherwise, the new bitmap is empty.
+ */
+static struct island_bitmap *island_bitmap_new(const struct island_bitmap *old)
+{
+   size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4);
+   struct island_bitmap *b = xcalloc(1, size);
+
+   if (old)
+   memcpy(b, old, size);
+
+   b->refcount = 1;
+   return b;
+}
+
+static void island_bitmap_or(struct island_bitmap *a, const struct 
island_bitmap *b)
+{
+   uint32_t i;
+
+   for (i = 0; i < island_bitmap_size; ++i)
+   a->bits[i] |= b->bits[i];
+}
+
+static int island_bitmap_is_subset(struct island_bitmap *self,
+   struct island_bitmap *super)
+{
+   uint32_t i;
+
+   if (self == super)
+   return 1;
+
+   for (i = 0; i < island_bitmap_size; ++i) {
+   if ((self->bits[i] & super->bits[i]) != self->bits[i])
+   return 0;
+   }
+
+   return 1;
+}
+
+#define ISLAND_BITMAP_BLOCK(x) (x / 32)
+#define ISLAND_BITMAP_MASK(x) (1 << (x % 32))
+
+static void island_bitmap_set(struct island_bitmap *self, uint32_t i)
+{
+   self->bits[ISLAND_BITMAP_BLOCK(i)] |= ISLAND_BITMAP_MASK(i);
+}
+
+static int island_bitmap_get(struct island_bitmap *self, uint32_t i)
+{
+   return (self->bits[ISLAND_BITMAP_BLOCK(i)] & ISLAND_BITMAP_MASK(i)) != 
0;
+}
+
+int in_same_island(const struct object_id *trg_oid, const struct object_id 
*src_oid)
+{
+   khiter_t trg_pos, src_pos;
+
+   /* If we aren't using islands

Re: [PATCH v4 1/7] Add delta-islands.{c,h}

2018-08-16 Thread Christian Couder
On Mon, Aug 13, 2018 at 9:00 PM, Jeff King  wrote:
> On Mon, Aug 13, 2018 at 05:33:59AM +0200, Christian Couder wrote:
>
>> >> + memcpy(_core, oid->hash, sizeof(uint64_t));
>> >> + rl->hash += sha_core;
>> >
>> > Hmm, so the first 64-bits of the oid of each ref that is part of
>> > this island is added together as a 'hash' for the island. And this
>> > is used to de-duplicate the islands? Any false positives? (does it
>> > matter - it would only affect performance, not correctness, right?)
>>
>> I would think that a false positive from pure chance is very unlikely.
>> We would need to approach billions of delta islands (as 2 to the power
>> 64/2 is in the order of billions) for the probability to be
>> significant. GitHub has less than 50 millions users and it is very
>> unlikely that a significant proportion of these users will fork the
>> same repo.
>>
>> Now if there is a false positive because two forks have exactly the
>> same refs, then it is not a problem if they are considered the same,
>> because they are actually the same.
>
> Right, the idea is to find such same-ref setups to avoid spending a
> pointless bit in the per-object bitmap. In the GitHub setup, it would be
> an indication that two people forked at exactly the same time, so they
> have the same refs and the same delta requirements. If one of them later
> updates, that relationship would change at the next repack.
>
> I don't know that we ever collected numbers for how often this happens.
> So let me see if I can dig some up.
>
> On our git/git repository network, it looks like we have ~14k forks, and
> ~4k are unique by this hashing scheme. So it really is saving us
> 10k-bits per bitmap. That's over 1k-byte per object in the worst case.
> There are ~24M objects (many times what is in git.git, but people push
> lots of random things to their forks), so that's saving us up to 24GB in
> RAM. Of course it almost certainly isn't that helpful in practice, since
> we copy-on-write the bitmaps to avoid the full cost per object. But I
> think it's fair to say it is helping (more numbers below).

[...]

> So all in all (and I'd emphasize this is extremely rough) I think it
> probably costs about 2GB for the feature in this particular case. But
> you need much more to repack at this size sanely anyway.

Thanks for the interesting numbers!


Re: [PATCH v4 1/7] Add delta-islands.{c,h}

2018-08-16 Thread Christian Couder
On Mon, Aug 13, 2018 at 8:11 PM, Jeff King  wrote:
> On Mon, Aug 13, 2018 at 01:17:18PM +0100, Ramsay Jones wrote:
>
>> >>> +struct island_bitmap {
>> >>> + uint32_t refcount;
>> >>> + uint32_t bits[];
>> >>
>> >> Use FLEX_ARRAY here? We are slowly moving toward requiring
>> >> certain C99 features, but I can't remember a flex array
>> >> weather-balloon patch.
>> >
>> > This was already discussed by Junio and Peff there:
>> >
>> > https://public-inbox.org/git/20180727130229.gb18...@sigill.intra.peff.net/
>>
>> That is a fine discussion, without a firm conclusion, but I don't
>> think you can simply do nothing here:
>>
>>   $ cat -n junk.c
>>1  #include 
>>2
>>3  struct island_bitmap {
>>4  uint32_t refcount;
>>5  uint32_t bits[];
>>6  };
>>7
>>   $ gcc --std=c89 --pedantic -c junk.c
>>   junk.c:5:11: warning: ISO C90 does not support flexible array members 
>> [-Wpedantic]
>> uint32_t bits[];
>>  ^~~~
>>   $ gcc --std=c99 --pedantic -c junk.c
>
> Right, whether we use the FLEX_ALLOC macros or not, this needs to be
> declared with FLEX_ARRAY, not an empty "[]".

Ok, it will use FLEX_ARRAY in the reroll I will send soon.

Thanks Ramsay and Peff!


[PATCHv4 5/6] compat/precompose_utf8.h: use more common include guard style

2018-08-15 Thread Elijah Newren
Reviewed-by: Jonathan Nieder 
Signed-off-by: Elijah Newren 
---
 compat/precompose_utf8.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
index a94e7c4342..6f843d3e1a 100644
--- a/compat/precompose_utf8.h
+++ b/compat/precompose_utf8.h
@@ -1,4 +1,6 @@
 #ifndef PRECOMPOSE_UNICODE_H
+#define PRECOMPOSE_UNICODE_H
+
 #include 
 #include 
 #include 
@@ -41,5 +43,4 @@ int precompose_utf8_closedir(PREC_DIR *dirp);
 #define DIR PREC_DIR
 #endif /* PRECOMPOSE_UNICODE_C */
 
-#define  PRECOMPOSE_UNICODE_H
 #endif /* PRECOMPOSE_UNICODE_H */
-- 
2.18.0.553.g74975b7909



Re: [PATCHv3 5/6] compat/precompose_utf8.h: use more common include guard style

2018-08-14 Thread Jonathan Nieder
Elijah Newren wrote:

> Signed-off-by: Elijah Newren 
> ---
>  compat/precompose_utf8.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Jonathan Nieder 

The more unusual style is less likely to be recognized by compilers, so
we can waste some I/O re-reading the header at compile time.

> --- a/compat/precompose_utf8.h
> +++ b/compat/precompose_utf8.h
> @@ -1,4 +1,6 @@
>  #ifndef PRECOMPOSE_UNICODE_H
> +#define PRECOMPOSE_UNICODE_H
> +
>  #include 
>  #include 
>  #include 

Not about this patch: these headers are #include-d in git-compat-util.h
which we assume to be already included first before anything else.  Can
we remove these redundant #includes?


Re: [PATCH v4 1/7] Add delta-islands.{c,h}

2018-08-13 Thread Jeff King
On Mon, Aug 13, 2018 at 05:33:59AM +0200, Christian Couder wrote:

> >> + memcpy(_core, oid->hash, sizeof(uint64_t));
> >> + rl->hash += sha_core;
> >
> > Hmm, so the first 64-bits of the oid of each ref that is part of
> > this island is added together as a 'hash' for the island. And this
> > is used to de-duplicate the islands? Any false positives? (does it
> > matter - it would only affect performance, not correctness, right?)
> 
> I would think that a false positive from pure chance is very unlikely.
> We would need to approach billions of delta islands (as 2 to the power
> 64/2 is in the order of billions) for the probability to be
> significant. GitHub has less than 50 millions users and it is very
> unlikely that a significant proportion of these users will fork the
> same repo.
> 
> Now if there is a false positive because two forks have exactly the
> same refs, then it is not a problem if they are considered the same,
> because they are actually the same.

Right, the idea is to find such same-ref setups to avoid spending a
pointless bit in the per-object bitmap. In the GitHub setup, it would be
an indication that two people forked at exactly the same time, so they
have the same refs and the same delta requirements. If one of them later
updates, that relationship would change at the next repack.

I don't know that we ever collected numbers for how often this happens.
So let me see if I can dig some up.

On our git/git repository network, it looks like we have ~14k forks, and
~4k are unique by this hashing scheme. So it really is saving us
10k-bits per bitmap. That's over 1k-byte per object in the worst case.
There are ~24M objects (many times what is in git.git, but people push
lots of random things to their forks), so that's saving us up to 24GB in
RAM. Of course it almost certainly isn't that helpful in practice, since
we copy-on-write the bitmaps to avoid the full cost per object. But I
think it's fair to say it is helping (more numbers below).

The distribution of buckets itself looks pretty power-law-ish from
eyeballing it. The biggest one has 92 forks, and then it trails off
quickly to smaller numbers, and then eventually a long tail of
singletons.

The numbers for torvalds/linux are similar. There are ~23k forks, only
~8k of which are unique. rails/rails has 12k/17k unique. So there's some
fluctuation, but the notion that there will be a bunch of identical
forks seems solid.

If you're curious what the actual memory usage is for a repack of
git/git, here are some rough numbers from eyeballing RSS via "top" while
watching the repack progress meter (I didn't want to do a full valgrind
run because it's S-L-O-W):

  - after loading the islands, we're at ~3GB RSS. Some of that is shared
(the mmap'd packed-refs file is 1GB by itself), but probably 1-1.5GB
is real heap, likely from the gigantic fork list.

  - we grow to around 6GB RSS before starting "counting objects". I
think this is largely prepare_revision_walk(), since we're making
"struct commit" objects for all the ref tips (and this is mmap-ing
packfiles, too, which counts against our RSS)

  - by the time "counting objects" is done, we're at ~10GB. We're not
yet running with Duy's memory-slimming for pack-objects, so it's
100+ bytes per object_entry. Which means we're probably only
spending a GB or so on island bitmaps, but at this point we've only
marked commits and root-trees.

  - By the time "propagating island marks" is done, we'll have marked
all the trees and blobs, and RSS is up to ~11GB.

So all in all (and I'd emphasize this is extremely rough) I think it
probably costs about 2GB for the feature in this particular case. But
you need much more to repack at this size sanely anyway.

If I were doing this patch series from scratch, I might implement the
fork de-duping and copy-on-write bits successively so I could take good
measurements of how much they help (and brag about it in the commit
messages). But I don't know if it's worth untangling now or not.

-Peff


Re: [PATCH v4 1/7] Add delta-islands.{c,h}

2018-08-13 Thread Jeff King
On Mon, Aug 13, 2018 at 01:17:18PM +0100, Ramsay Jones wrote:

> >>> +struct island_bitmap {
> >>> + uint32_t refcount;
> >>> + uint32_t bits[];
> >>
> >> Use FLEX_ARRAY here? We are slowly moving toward requiring
> >> certain C99 features, but I can't remember a flex array
> >> weather-balloon patch.
> > 
> > This was already discussed by Junio and Peff there:
> > 
> > https://public-inbox.org/git/20180727130229.gb18...@sigill.intra.peff.net/
> > 
> 
> That is a fine discussion, without a firm conclusion, but I don't
> think you can simply do nothing here:
> 
>   $ cat -n junk.c
>1  #include 
>2  
>3  struct island_bitmap {
>4  uint32_t refcount;
>5  uint32_t bits[];
>6  };
>7  
>   $ gcc --std=c89 --pedantic -c junk.c
>   junk.c:5:11: warning: ISO C90 does not support flexible array members 
> [-Wpedantic]
> uint32_t bits[];
>  ^~~~
>   $ gcc --std=c99 --pedantic -c junk.c

Right, whether we use the FLEX_ALLOC macros or not, this needs to be
declared with FLEX_ARRAY, not an empty "[]".

I'm fine either way on using the FLEX_ALLOC macros.

> >> ... Ah, OK, trg_ => target.
> > 
> > I am ok to replace "trg" with "target" (or maybe "dst"? or something
> > else) and "src" with "source" if you think it would make things
> > clearer.
> 
> If it had been dst_ (or target), I would not have had a 'huh?'
> moment, but it is not all that important.

FWIW, these are all inherited from try_delta(), etc, in the existing
code. I'm fine with using another term, but we should probably do it
universally. And if we do, probably "base" is a better name than "src",
since the direction depends on which part of the relationship you are
considering. I'm not sure what that makes "dst".

-Peff


[PATCHv3 5/6] compat/precompose_utf8.h: use more common include guard style

2018-08-13 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 compat/precompose_utf8.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
index a94e7c4342..6f843d3e1a 100644
--- a/compat/precompose_utf8.h
+++ b/compat/precompose_utf8.h
@@ -1,4 +1,6 @@
 #ifndef PRECOMPOSE_UNICODE_H
+#define PRECOMPOSE_UNICODE_H
+
 #include 
 #include 
 #include 
@@ -41,5 +43,4 @@ int precompose_utf8_closedir(PREC_DIR *dirp);
 #define DIR PREC_DIR
 #endif /* PRECOMPOSE_UNICODE_C */
 
-#define  PRECOMPOSE_UNICODE_H
 #endif /* PRECOMPOSE_UNICODE_H */
-- 
2.18.0.549.gd4454f3f9b



Re: [PATCH v4 1/7] Add delta-islands.{c,h}

2018-08-13 Thread Ramsay Jones



On 13/08/18 04:33, Christian Couder wrote:
> On Mon, Aug 13, 2018 at 3:14 AM, Ramsay Jones
[snip]
>> And neither the sha1 or str hash-maps are destroyed here.
>> (That is not necessarily a problem, of course! ;-) )
> 
> The instances are declared as static:
> 
>   static khash_sha1 *island_marks;
> 
>   static kh_str_t *remote_islands;
> 
> so it maybe ok.

Yes, that's fine.

> 
>>> +struct island_bitmap {
>>> + uint32_t refcount;
>>> + uint32_t bits[];
>>
>> Use FLEX_ARRAY here? We are slowly moving toward requiring
>> certain C99 features, but I can't remember a flex array
>> weather-balloon patch.
> 
> This was already discussed by Junio and Peff there:
> 
> https://public-inbox.org/git/20180727130229.gb18...@sigill.intra.peff.net/
> 

That is a fine discussion, without a firm conclusion, but I don't
think you can simply do nothing here:

  $ cat -n junk.c
   1#include 
   2
   3struct island_bitmap {
   4uint32_t refcount;
   5uint32_t bits[];
   6};
   7
  $ gcc --std=c89 --pedantic -c junk.c
  junk.c:5:11: warning: ISO C90 does not support flexible array members 
[-Wpedantic]
uint32_t bits[];
 ^~~~
  $ gcc --std=c99 --pedantic -c junk.c
  $ 
  
>>> +};
> 
>>> +int in_same_island(const struct object_id *trg_oid, const struct object_id 
>>> *src_oid)
>>
>> Hmm, what does the trg_ prefix stand for?
>>
>>> +{
>>> + khiter_t trg_pos, src_pos;
>>> +
>>> + /* If we aren't using islands, assume everything goes together. */
>>> + if (!island_marks)
>>> + return 1;
>>> +
>>> + /*
>>> +  * If we don't have a bitmap for the target, we can delta it
>>
>> ... Ah, OK, trg_ => target.
> 
> I am ok to replace "trg" with "target" (or maybe "dst"? or something
> else) and "src" with "source" if you think it would make things
> clearer.

If it had been dst_ (or target), I would not have had a 'huh?'
moment, but it is not all that important.

> 
>>> +static void add_ref_to_island(const char *island_name, const struct 
>>> object_id *oid)
>>> +{
>>> + uint64_t sha_core;
>>> + struct remote_island *rl = NULL;
>>> +
>>> + int hash_ret;
>>> + khiter_t pos = kh_put_str(remote_islands, island_name, _ret);
>>> +
>>> + if (hash_ret) {
>>> + kh_key(remote_islands, pos) = xstrdup(island_name);
>>> + kh_value(remote_islands, pos) = xcalloc(1, sizeof(struct 
>>> remote_island));
>>> + }
>>> +
>>> + rl = kh_value(remote_islands, pos);
>>> + oid_array_append(>oids, oid);
>>> +
>>> + memcpy(_core, oid->hash, sizeof(uint64_t));
>>> + rl->hash += sha_core;
>>
>> Hmm, so the first 64-bits of the oid of each ref that is part of
>> this island is added together as a 'hash' for the island. And this
>> is used to de-duplicate the islands? Any false positives? (does it
>> matter - it would only affect performance, not correctness, right?)
> 
> I would think that a false positive from pure chance is very unlikely.
> We would need to approach billions of delta islands (as 2 to the power
> 64/2 is in the order of billions) for the probability to be
> significant. GitHub has less than 50 millions users and it is very
> unlikely that a significant proportion of these users will fork the
> same repo.
> 
> Now if there is a false positive because two forks have exactly the
> same refs, then it is not a problem if they are considered the same,
> because they are actually the same.

Yep, good point.

ATB,
Ramsay Jones


Re: [PATCH v4 1/7] Add delta-islands.{c,h}

2018-08-12 Thread Christian Couder
On Mon, Aug 13, 2018 at 3:14 AM, Ramsay Jones
 wrote:
> On 12/08/18 06:11, Christian Couder wrote:

>> Because the inefficiency primarily arises when an
>> object is delitified against another object that does
>
> s/delitified/deltified/ ?

Ok, this will be in the next reroll if any.

>> not exist in the same fork, we partition objects into
>> sets that appear in the same fork, and define
>> "delta islands". When finding delta base, we do not
>> allow an object outside the same island to be
>> considered as its base.

>> --- /dev/null
>> +++ b/delta-islands.c
>> @@ -0,0 +1,493 @@
>> +#include "cache.h"
>> +#include "attr.h"
>> +#include "object.h"
>> +#include "blob.h"
>> +#include "commit.h"
>> +#include "tag.h"
>> +#include "tree.h"
>> +#include "delta.h"
>> +#include "pack.h"
>> +#include "tree-walk.h"
>> +#include "diff.h"
>> +#include "revision.h"
>> +#include "list-objects.h"
>> +#include "progress.h"
>> +#include "refs.h"
>> +#include "khash.h"
>
> I was wondering how many copies of the inline functions
> introduced by this header we had, so:
>
>   $ nm git | grep ' t ' | cut -d' ' -f3 | sort | uniq -c | sort -rn | grep kh_
> 3 kh_resize_sha1
> 3 kh_put_sha1
> 3 kh_init_sha1
> 3 kh_get_sha1
> 1 kh_resize_str
> 1 kh_resize_sha1_pos
> 1 kh_put_str
> 1 kh_put_sha1_pos
> 1 kh_init_str
> 1 kh_init_sha1_pos
> 1 kh_get_str
> 1 kh_get_sha1_pos
> 1 kh_destroy_sha1
>   $
>
> Looking at the individual object files, we see:
>
>   $ nm pack-bitmap-write.o | grep ' t ' | grep kh_
>   01cc t kh_get_sha1
>   01b7 t kh_init_sha1
>   085e t kh_put_sha1
>   0310 t kh_resize_sha1
>   $
>
> So, the two instances of the sha1 hash-map are never
> destroyed (kh_destroy_sha1 is not present in the object
> file).

This is interesting (even though it seems related to more code than
the current patch series).

As those hash maps are in 'struct bitmap_writer' and a static instance is used:

  static struct bitmap_writer writer;

it maybe ok.

>   $ nm pack-bitmap.o | grep ' t ' | grep kh_
>   02d9 t kh_destroy_sha1
>   032b t kh_get_sha1
>   0daa t kh_get_sha1_pos
>   02c4 t kh_init_sha1
>   0d95 t kh_init_sha1_pos
>   09bd t kh_put_sha1
>   1432 t kh_put_sha1_pos
>   046f t kh_resize_sha1
>   0eee t kh_resize_sha1_pos
>   $
>
> The sha1_pos hash-map is not destroyed here.

Yeah, maybe a line like:

kh_destroy_pos(b->ext_index.positions);

is missing from free_bitmap_index()?

Adding that should be in a separate patch from this series though.

>   $ nm delta-islands.o | grep ' t ' | grep kh_
>   02be t kh_get_sha1
>   0e52 t kh_get_str
>   02a9 t kh_init_sha1
>   0e3d t kh_init_str
>   0950 t kh_put_sha1
>   14e4 t kh_put_str
>   0402 t kh_resize_sha1
>   0f96 t kh_resize_str
>   $
>
> And neither the sha1 or str hash-maps are destroyed here.
> (That is not necessarily a problem, of course! ;-) )

The instances are declared as static:

  static khash_sha1 *island_marks;

  static kh_str_t *remote_islands;

so it maybe ok.

>> +struct island_bitmap {
>> + uint32_t refcount;
>> + uint32_t bits[];
>
> Use FLEX_ARRAY here? We are slowly moving toward requiring
> certain C99 features, but I can't remember a flex array
> weather-balloon patch.

This was already discussed by Junio and Peff there:

https://public-inbox.org/git/20180727130229.gb18...@sigill.intra.peff.net/

>> +};

>> +int in_same_island(const struct object_id *trg_oid, const struct object_id 
>> *src_oid)
>
> Hmm, what does the trg_ prefix stand for?
>
>> +{
>> + khiter_t trg_pos, src_pos;
>> +
>> + /* If we aren't using islands, assume everything goes together. */
>> + if (!island_marks)
>> + return 1;
>> +
>> + /*
>> +  * If we don't have a bitmap for the target, we can delta it
>
> ... Ah, OK, trg_ => target.

I am ok to replace "trg" with "target" (or maybe "dst"? or something
else) and "src" with "source" if you think it would make things
clearer.

>> +static void add_ref_to_island(const char *island_name, const struct 
>> object_id *oid)
>> +{
>> + uint64_t sha_core;
>> + struct remote_island *rl = NULL;
>> +
>> + int hash_ret;
>> + khiter_t pos = kh_put_str(remote_islands, island_name, _ret);
>> +
>> + if (hash_ret) {
>> + kh_key(remote_islands, pos) = xstrdup(island_name);
>> + kh_value(remote_islands, pos) = xcalloc(1, sizeof(struct 
>> remote_island));
>> + }
>> +
>> + rl = kh_value(remote_islands, pos);
>> + oid_array_append(>oids, oid);
>> +
>> + memcpy(_core, oid->hash, sizeof(uint64_t));
>> + rl->hash += sha_core;
>
> Hmm, so the first 64-bits of the oid of each ref that is part of
> this island is 

Re: [PATCH v4 1/7] Add delta-islands.{c,h}

2018-08-12 Thread Ramsay Jones



On 12/08/18 06:11, Christian Couder wrote:
> From: Jeff King 
> 
> Hosting providers that allow users to "fork" existing
> repos want those forks to share as much disk space as
> possible.
> 
> Alternates are an existing solution to keep all the
> objects from all the forks into a unique central repo,
> but this can have some drawbacks. Especially when
> packing the central repo, deltas will be created
> between objects from different forks.
> 
> This can make cloning or fetching a fork much slower
> and much more CPU intensive as Git might have to
> compute new deltas for many objects to avoid sending
> objects from a different fork.
> 
> Because the inefficiency primarily arises when an
> object is delitified against another object that does

s/delitified/deltified/ ?

> not exist in the same fork, we partition objects into
> sets that appear in the same fork, and define
> "delta islands". When finding delta base, we do not
> allow an object outside the same island to be
> considered as its base.
> 
> So "delta islands" is a way to store objects from
> different forks in the same repo and packfile without
> having deltas between objects from different forks.
> 
> This patch implements the delta islands mechanism in
> "delta-islands.{c,h}", but does not yet make use of it.
> 
> A few new fields are added in 'struct object_entry'
> in "pack-objects.h" though.
> 
> The documentation will follow in a patch that actually
> uses delta islands in "builtin/pack-objects.c".
> 
> Signed-off-by: Jeff King 
> Signed-off-by: Christian Couder 
> ---
>  Makefile|   1 +
>  delta-islands.c | 493 
>  delta-islands.h |  11 ++
>  pack-objects.h  |   4 +
>  4 files changed, 509 insertions(+)
>  create mode 100644 delta-islands.c
>  create mode 100644 delta-islands.h
> 
> diff --git a/Makefile b/Makefile
> index bc4fc8eeab..e7994888e8 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -841,6 +841,7 @@ LIB_OBJS += csum-file.o
>  LIB_OBJS += ctype.o
>  LIB_OBJS += date.o
>  LIB_OBJS += decorate.o
> +LIB_OBJS += delta-islands.o
>  LIB_OBJS += diffcore-break.o
>  LIB_OBJS += diffcore-delta.o
>  LIB_OBJS += diffcore-order.o
> diff --git a/delta-islands.c b/delta-islands.c
> new file mode 100644
> index 00..c0b0da6837
> --- /dev/null
> +++ b/delta-islands.c
> @@ -0,0 +1,493 @@
> +#include "cache.h"
> +#include "attr.h"
> +#include "object.h"
> +#include "blob.h"
> +#include "commit.h"
> +#include "tag.h"
> +#include "tree.h"
> +#include "delta.h"
> +#include "pack.h"
> +#include "tree-walk.h"
> +#include "diff.h"
> +#include "revision.h"
> +#include "list-objects.h"
> +#include "progress.h"
> +#include "refs.h"
> +#include "khash.h"

I was wondering how many copies of the inline functions
introduced by this header we had, so:

  $ nm git | grep ' t ' | cut -d' ' -f3 | sort | uniq -c | sort -rn | grep kh_
3 kh_resize_sha1
3 kh_put_sha1
3 kh_init_sha1
3 kh_get_sha1
1 kh_resize_str
1 kh_resize_sha1_pos
1 kh_put_str
1 kh_put_sha1_pos
1 kh_init_str
1 kh_init_sha1_pos
1 kh_get_str
1 kh_get_sha1_pos
1 kh_destroy_sha1
  $ 

Looking at the individual object files, we see:

  $ nm pack-bitmap-write.o | grep ' t ' | grep kh_
  01cc t kh_get_sha1
  01b7 t kh_init_sha1
  085e t kh_put_sha1
  0310 t kh_resize_sha1
  $ 

So, the two instances of the sha1 hash-map are never
destroyed (kh_destroy_sha1 is not present in the object
file).

  $ nm pack-bitmap.o | grep ' t ' | grep kh_
  02d9 t kh_destroy_sha1
  032b t kh_get_sha1
  0daa t kh_get_sha1_pos
  02c4 t kh_init_sha1
  0d95 t kh_init_sha1_pos
  09bd t kh_put_sha1
  1432 t kh_put_sha1_pos
  046f t kh_resize_sha1
  0eee t kh_resize_sha1_pos
  $ 

The sha1_pos hash-map is not destroyed here.

  $ nm delta-islands.o | grep ' t ' | grep kh_
  02be t kh_get_sha1
  0e52 t kh_get_str
  02a9 t kh_init_sha1
  0e3d t kh_init_str
  0950 t kh_put_sha1
  14e4 t kh_put_str
  0402 t kh_resize_sha1
  0f96 t kh_resize_str
  $ 

And neither the sha1 or str hash-maps are destroyed here.
(That is not necessarily a problem, of course! ;-) )
   
> +#include "pack-bitmap.h"
> +#include "pack-objects.h"
> +#include &q

[PATCH v4 1/7] Add delta-islands.{c,h}

2018-08-11 Thread Christian Couder
From: Jeff King 

Hosting providers that allow users to "fork" existing
repos want those forks to share as much disk space as
possible.

Alternates are an existing solution to keep all the
objects from all the forks into a unique central repo,
but this can have some drawbacks. Especially when
packing the central repo, deltas will be created
between objects from different forks.

This can make cloning or fetching a fork much slower
and much more CPU intensive as Git might have to
compute new deltas for many objects to avoid sending
objects from a different fork.

Because the inefficiency primarily arises when an
object is delitified against another object that does
not exist in the same fork, we partition objects into
sets that appear in the same fork, and define
"delta islands". When finding delta base, we do not
allow an object outside the same island to be
considered as its base.

So "delta islands" is a way to store objects from
different forks in the same repo and packfile without
having deltas between objects from different forks.

This patch implements the delta islands mechanism in
"delta-islands.{c,h}", but does not yet make use of it.

A few new fields are added in 'struct object_entry'
in "pack-objects.h" though.

The documentation will follow in a patch that actually
uses delta islands in "builtin/pack-objects.c".

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 Makefile|   1 +
 delta-islands.c | 493 
 delta-islands.h |  11 ++
 pack-objects.h  |   4 +
 4 files changed, 509 insertions(+)
 create mode 100644 delta-islands.c
 create mode 100644 delta-islands.h

diff --git a/Makefile b/Makefile
index bc4fc8eeab..e7994888e8 100644
--- a/Makefile
+++ b/Makefile
@@ -841,6 +841,7 @@ LIB_OBJS += csum-file.o
 LIB_OBJS += ctype.o
 LIB_OBJS += date.o
 LIB_OBJS += decorate.o
+LIB_OBJS += delta-islands.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
 LIB_OBJS += diffcore-order.o
diff --git a/delta-islands.c b/delta-islands.c
new file mode 100644
index 00..c0b0da6837
--- /dev/null
+++ b/delta-islands.c
@@ -0,0 +1,493 @@
+#include "cache.h"
+#include "attr.h"
+#include "object.h"
+#include "blob.h"
+#include "commit.h"
+#include "tag.h"
+#include "tree.h"
+#include "delta.h"
+#include "pack.h"
+#include "tree-walk.h"
+#include "diff.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "progress.h"
+#include "refs.h"
+#include "khash.h"
+#include "pack-bitmap.h"
+#include "pack-objects.h"
+#include "delta-islands.h"
+#include "sha1-array.h"
+#include "config.h"
+
+KHASH_INIT(str, const char *, void *, 1, kh_str_hash_func, kh_str_hash_equal)
+
+static khash_sha1 *island_marks;
+static unsigned island_counter;
+static unsigned island_counter_core;
+
+static kh_str_t *remote_islands;
+
+struct remote_island {
+   uint64_t hash;
+   struct oid_array oids;
+};
+
+struct island_bitmap {
+   uint32_t refcount;
+   uint32_t bits[];
+};
+
+static uint32_t island_bitmap_size;
+
+/*
+ * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a copy
+ * of "old". Otherwise, the new bitmap is empty.
+ */
+static struct island_bitmap *island_bitmap_new(const struct island_bitmap *old)
+{
+   size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4);
+   struct island_bitmap *b = xcalloc(1, size);
+
+   if (old)
+   memcpy(b, old, size);
+
+   b->refcount = 1;
+   return b;
+}
+
+static void island_bitmap_or(struct island_bitmap *a, const struct 
island_bitmap *b)
+{
+   uint32_t i;
+
+   for (i = 0; i < island_bitmap_size; ++i)
+   a->bits[i] |= b->bits[i];
+}
+
+static int island_bitmap_is_subset(struct island_bitmap *self,
+   struct island_bitmap *super)
+{
+   uint32_t i;
+
+   if (self == super)
+   return 1;
+
+   for (i = 0; i < island_bitmap_size; ++i) {
+   if ((self->bits[i] & super->bits[i]) != self->bits[i])
+   return 0;
+   }
+
+   return 1;
+}
+
+#define ISLAND_BITMAP_BLOCK(x) (x / 32)
+#define ISLAND_BITMAP_MASK(x) (1 << (x % 32))
+
+static void island_bitmap_set(struct island_bitmap *self, uint32_t i)
+{
+   self->bits[ISLAND_BITMAP_BLOCK(i)] |= ISLAND_BITMAP_MASK(i);
+}
+
+static int island_bitmap_get(struct island_bitmap *self, uint32_t i)
+{
+   return (self->bits[ISLAND_BITMAP_BLOCK(i)] & ISLAND_BITMAP_MASK(i)) != 
0;
+}
+
+int in_same_island(const struct object_id *trg_oid, const struct object_id 
*src_oid)
+{
+   khiter_t trg_pos, src_pos;
+
+   /* If we aren't using islands, assume ever

Re: [PATCH v3 2/8] Add delta-islands.{c,h}

2018-08-11 Thread Christian Couder
On Sat, Aug 11, 2018 at 4:12 PM, Jeff King  wrote:
> On Sat, Aug 11, 2018 at 12:32:32PM +0200, Christian Couder wrote:
>
>> Ok, I have made the following changes in the branch I will send next.
>>
>> diff --git a/delta-islands.c b/delta-islands.c
>> index 92137f2eca..22e4360810 100644
>> --- a/delta-islands.c
>> +++ b/delta-islands.c
>> @@ -322,8 +322,7 @@ static int island_config_callback(const char *k,
>> const char *v, void *cb)
>>
>> if (island_regexes_nr >= island_regexes_alloc) {
>> island_regexes_alloc = (island_regexes_alloc + 8) * 
>> 2;
>> -   island_regexes = xrealloc(island_regexes,
>> -   island_regexes_alloc * 
>> sizeof(regex_t));
>> +   REALLOC_ARRAY(island_regexes, island_regexes_alloc);
>> }
>
> I think this whole block could actually be ALLOC_GROW().

Yeah, thanks! The whole block will be replaced with the following in
the next reroll:

ALLOC_GROW(island_regexes, island_regexes_nr + 1, island_regexes_alloc);


[PATCHv2 5/6] compat/precompose_utf8.h: use more common include guard style

2018-08-11 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 compat/precompose_utf8.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
index a94e7c4342..6f843d3e1a 100644
--- a/compat/precompose_utf8.h
+++ b/compat/precompose_utf8.h
@@ -1,4 +1,6 @@
 #ifndef PRECOMPOSE_UNICODE_H
+#define PRECOMPOSE_UNICODE_H
+
 #include 
 #include 
 #include 
@@ -41,5 +43,4 @@ int precompose_utf8_closedir(PREC_DIR *dirp);
 #define DIR PREC_DIR
 #endif /* PRECOMPOSE_UNICODE_C */
 
-#define  PRECOMPOSE_UNICODE_H
 #endif /* PRECOMPOSE_UNICODE_H */
-- 
2.18.0.549.gd4454f3f9b



Re: [PATCH v3 2/8] Add delta-islands.{c,h}

2018-08-11 Thread Jeff King
On Sat, Aug 11, 2018 at 12:32:32PM +0200, Christian Couder wrote:

> Ok, I have made the following changes in the branch I will send next.
> 
> diff --git a/delta-islands.c b/delta-islands.c
> index 92137f2eca..22e4360810 100644
> --- a/delta-islands.c
> +++ b/delta-islands.c
> @@ -322,8 +322,7 @@ static int island_config_callback(const char *k,
> const char *v, void *cb)
> 
> if (island_regexes_nr >= island_regexes_alloc) {
> island_regexes_alloc = (island_regexes_alloc + 8) * 2;
> -   island_regexes = xrealloc(island_regexes,
> -   island_regexes_alloc * 
> sizeof(regex_t));
> +   REALLOC_ARRAY(island_regexes, island_regexes_alloc);
> }

I think this whole block could actually be ALLOC_GROW().

-Peff


Re: [PATCH v3 2/8] Add delta-islands.{c,h}

2018-08-11 Thread Christian Couder
On Sat, Aug 11, 2018 at 11:04 AM, SZEDER Gábor  wrote:
>> diff --git a/delta-islands.c b/delta-islands.c
>> new file mode 100644
>> index 00..448ddcbbe4
>> --- /dev/null
>> +++ b/delta-islands.c
>
>> +static void deduplicate_islands(void)
>> +{
>> + struct remote_island *island, *core = NULL, **list;
>> + unsigned int island_count, dst, src, ref, i = 0;
>> +
>> + island_count = kh_size(remote_islands);
>> + list = xmalloc(island_count * sizeof(struct remote_island *));
>
> Please use ALLOC_ARRAY here.

Ok, I have made the following changes in the branch I will send next.

diff --git a/delta-islands.c b/delta-islands.c
index 92137f2eca..22e4360810 100644
--- a/delta-islands.c
+++ b/delta-islands.c
@@ -322,8 +322,7 @@ static int island_config_callback(const char *k,
const char *v, void *cb)

if (island_regexes_nr >= island_regexes_alloc) {
island_regexes_alloc = (island_regexes_alloc + 8) * 2;
-   island_regexes = xrealloc(island_regexes,
-   island_regexes_alloc * sizeof(regex_t));
+   REALLOC_ARRAY(island_regexes, island_regexes_alloc);
}

if (*v != '^')
@@ -425,7 +424,7 @@ static void deduplicate_islands(void)
unsigned int island_count, dst, src, ref, i = 0;

island_count = kh_size(remote_islands);
-   list = xmalloc(island_count * sizeof(struct remote_island *));
+   ALLOC_ARRAY(list, island_count);

kh_foreach_value(remote_islands, island, {
list[i++] = island;

Thanks!


Re: [PATCH v3 2/8] Add delta-islands.{c,h}

2018-08-11 Thread SZEDER Gábor
> diff --git a/delta-islands.c b/delta-islands.c
> new file mode 100644
> index 00..448ddcbbe4
> --- /dev/null
> +++ b/delta-islands.c

> +static void deduplicate_islands(void)
> +{
> + struct remote_island *island, *core = NULL, **list;
> + unsigned int island_count, dst, src, ref, i = 0;
> +
> + island_count = kh_size(remote_islands);
> + list = xmalloc(island_count * sizeof(struct remote_island *));

Please use ALLOC_ARRAY here.

> +
> + kh_foreach_value(remote_islands, island, {
> + list[i++] = island;
> + });
> +
> + for (ref = 0; ref + 1 < island_count; ref++) {
> + for (src = ref + 1, dst = src; src < island_count; src++) {
> + if (list[ref]->hash == list[src]->hash)
> + continue;
> +
> + if (src != dst)
> + list[dst] = list[src];
> +
> + dst++;
> + }
> + island_count = dst;
> + }
> +
> + island_bitmap_size = (island_count / 32) + 1;
> + core = get_core_island();
> +
> + for (i = 0; i < island_count; ++i) {
> + mark_remote_island_1(list[i], core && list[i]->hash == 
> core->hash);
> + }
> +
> + free(list);
> +}


[PATCH 5/9] compat/precompose_utf8.h: use more common include guard style

2018-08-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 compat/precompose_utf8.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
index a94e7c4342..6f843d3e1a 100644
--- a/compat/precompose_utf8.h
+++ b/compat/precompose_utf8.h
@@ -1,4 +1,6 @@
 #ifndef PRECOMPOSE_UNICODE_H
+#define PRECOMPOSE_UNICODE_H
+
 #include 
 #include 
 #include 
@@ -41,5 +43,4 @@ int precompose_utf8_closedir(PREC_DIR *dirp);
 #define DIR PREC_DIR
 #endif /* PRECOMPOSE_UNICODE_C */
 
-#define  PRECOMPOSE_UNICODE_H
 #endif /* PRECOMPOSE_UNICODE_H */
-- 
2.18.0.549.gd4454f3f9b



[PATCH 7/9] sha1dc/sha1.h: add missing include

2018-08-10 Thread Elijah Newren
Signed-off-by: Elijah Newren 
---
 sha1dc/sha1.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sha1dc/sha1.h b/sha1dc/sha1.h
index 1e4e94be54..e9ca7a83ac 100644
--- a/sha1dc/sha1.h
+++ b/sha1dc/sha1.h
@@ -14,6 +14,7 @@ extern "C" {
 
 #ifndef SHA1DC_NO_STANDARD_INCLUDES
 #include 
+#include 
 #endif
 
 /* sha-1 compression function that takes an already expanded message, and 
additionally store intermediate states */
-- 
2.18.0.549.gd4454f3f9b



[PATCH v3 2/8] Add delta-islands.{c,h}

2018-08-09 Thread Christian Couder
From: Jeff King 

Hosting providers that allow users to "fork" existing
repos want those forks to share as much disk space as
possible.

Alternates are an existing solution to keep all the
objects from all the forks into a unique central repo,
but this can have some drawbacks. Especially when
packing the central repo, deltas will be created
between objects from different forks.

This can make cloning or fetching a fork much slower
and much more CPU intensive as Git might have to
compute new deltas for many objects to avoid sending
objects from a different fork.

Because the inefficiency primarily arises when an
object is delitified against another object that does
not exist in the same fork, we partition objects into
sets that appear in the same fork, and define
"delta islands". When finding delta base, we do not
allow an object outside the same island to be
considered as its base.

So "delta islands" is a way to store objects from
different forks in the same repo and packfile without
having deltas between objects from different forks.

This patch implements the delta islands mechanism in
"delta-islands.{c,h}", but does not yet make use of it.

A few new fields are added in 'struct object_entry'
in "pack-objects.h" though.

The documentation will follow in a patch that actually
uses delta islands in "builtin/pack-objects.c".

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 Makefile|   1 +
 delta-islands.c | 497 
 delta-islands.h |  11 ++
 pack-objects.h  |   4 +
 4 files changed, 513 insertions(+)
 create mode 100644 delta-islands.c
 create mode 100644 delta-islands.h

diff --git a/Makefile b/Makefile
index bc4fc8eeab..e7994888e8 100644
--- a/Makefile
+++ b/Makefile
@@ -841,6 +841,7 @@ LIB_OBJS += csum-file.o
 LIB_OBJS += ctype.o
 LIB_OBJS += date.o
 LIB_OBJS += decorate.o
+LIB_OBJS += delta-islands.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
 LIB_OBJS += diffcore-order.o
diff --git a/delta-islands.c b/delta-islands.c
new file mode 100644
index 00..448ddcbbe4
--- /dev/null
+++ b/delta-islands.c
@@ -0,0 +1,497 @@
+#include "cache.h"
+#include "attr.h"
+#include "object.h"
+#include "blob.h"
+#include "commit.h"
+#include "tag.h"
+#include "tree.h"
+#include "delta.h"
+#include "pack.h"
+#include "tree-walk.h"
+#include "diff.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "progress.h"
+#include "refs.h"
+#include "khash.h"
+#include "pack-bitmap.h"
+#include "pack-objects.h"
+#include "delta-islands.h"
+#include "sha1-array.h"
+#include "config.h"
+
+KHASH_INIT(str, const char *, void *, 1, kh_str_hash_func, kh_str_hash_equal)
+
+static khash_sha1 *island_marks;
+static unsigned island_counter;
+static unsigned island_counter_core;
+
+static kh_str_t *remote_islands;
+
+struct remote_island {
+   uint64_t hash;
+   struct oid_array oids;
+};
+
+struct island_bitmap {
+   uint32_t refcount;
+   uint32_t bits[];
+};
+
+static uint32_t island_bitmap_size;
+
+/*
+ * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a copy
+ * of "old". Otherwise, the new bitmap is empty.
+ */
+static struct island_bitmap *island_bitmap_new(const struct island_bitmap *old)
+{
+   size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4);
+   struct island_bitmap *b = xcalloc(1, size);
+
+   if (old)
+   memcpy(b, old, size);
+
+   b->refcount = 1;
+   return b;
+}
+
+static void island_bitmap_or(struct island_bitmap *a, const struct 
island_bitmap *b)
+{
+   uint32_t i;
+
+   for (i = 0; i < island_bitmap_size; ++i)
+   a->bits[i] |= b->bits[i];
+}
+
+static int island_bitmap_is_subset(struct island_bitmap *self,
+   struct island_bitmap *super)
+{
+   uint32_t i;
+
+   if (self == super)
+   return 1;
+
+   for (i = 0; i < island_bitmap_size; ++i) {
+   if ((self->bits[i] & super->bits[i]) != self->bits[i])
+   return 0;
+   }
+
+   return 1;
+}
+
+#define ISLAND_BITMAP_BLOCK(x) (x / 32)
+#define ISLAND_BITMAP_MASK(x) (1 << (x % 32))
+
+static void island_bitmap_set(struct island_bitmap *self, uint32_t i)
+{
+   self->bits[ISLAND_BITMAP_BLOCK(i)] |= ISLAND_BITMAP_MASK(i);
+}
+
+static int island_bitmap_get(struct island_bitmap *self, uint32_t i)
+{
+   return (self->bits[ISLAND_BITMAP_BLOCK(i)] & ISLAND_BITMAP_MASK(i)) != 
0;
+}
+
+int in_same_island(const struct object_id *trg_oid, const struct object_id 
*src_oid)
+{
+   khiter_t trg_pos, src_pos;
+
+   /* If we aren't using islands, assume ever

Re: [RFC PATCH 2/5] Add delta-islands.{c,h}

2018-08-06 Thread Duy Nguyen
On Mon, Aug 6, 2018 at 8:54 PM Christian Couder
 wrote:
> > Is it worth it? The "pahole" comment in this file is up to date. We
> > use 80 bytes per object. This series makes the struct 88 bytes (I've
> > just rerun pahole).
>
> Did you run it on V1 or on V2? I guess on V2, but then what do you
> think about converting the 'layer' field into a bit field, which might
> be simpler and save space?

V2. I kinda ignored the layer field because Jeff was suggesting that
it may not be needed. It may be better to just move it out though
because it's not that complicated and even if we have enough bits
left, they may be spread out that it's hard to reorder so that they're
grouped together and use can use them.

> > On linux repo with 12M objects, "git pack-objects
> > --all" needs extra 96MB memory even if this feature is not used. So
> > yes I still think it's worth moving these fields out of struct
> > object_entry.
>
> And what about the fields already in struct object_entry? While I am
> at it, I think I could move some of them too if it is really so worth

Way ahead of you. In v2.17.0 this struct is 136 bytes so going down to
80 bytes now is a big memory saving. I think I  squeezed everything
possible and was a bit too aggressive that I created a new performance
regression in v2.18.0 ;-)
-- 
Duy


Re: [RFC PATCH 2/5] Add delta-islands.{c,h}

2018-08-06 Thread Christian Couder
On Mon, Aug 6, 2018 at 5:53 PM, Duy Nguyen  wrote:
> On Sun, Aug 5, 2018 at 8:53 PM Christian Couder
>  wrote:
>>
>> As you can see the patch 6/6 (in the v2 of this patch series that I
>> just sent) moves `unsigned int tree_depth` from 'struct object_entry'
>> to 'struct packing_data'. I am not sure that I did it right and that
>> it is worth it though as it is a bit more complex.
>
> It is a bit more complex than I expected. But I think if you go with
> Jeff's suggestion (i.e. think of the new tree_depth array an extension
> of objects array) it's a bit simpler: you access both arrays using the
> same index, both arrays should have the same size and be realloc'd at
> the same time in packlist_alloc().

Ok, I will take a look at doing that to simplify things. Thanks to
Peff and you for that suggestion!

> Is it worth it? The "pahole" comment in this file is up to date. We
> use 80 bytes per object. This series makes the struct 88 bytes (I've
> just rerun pahole).

Did you run it on V1 or on V2? I guess on V2, but then what do you
think about converting the 'layer' field into a bit field, which might
be simpler and save space?

> On linux repo with 12M objects, "git pack-objects
> --all" needs extra 96MB memory even if this feature is not used. So
> yes I still think it's worth moving these fields out of struct
> object_entry.

And what about the fields already in struct object_entry? While I am
at it, I think I could move some of them too if it is really so worth
it.


Re: [RFC PATCH 2/5] Add delta-islands.{c,h}

2018-08-06 Thread Duy Nguyen
On Sun, Aug 5, 2018 at 8:53 PM Christian Couder
 wrote:
>
> On Sun, Jul 22, 2018 at 10:50 AM, Duy Nguyen  wrote:
> > On Sun, Jul 22, 2018 at 7:51 AM Christian Couder
> >  wrote:
>
> >> diff --git a/pack-objects.h b/pack-objects.h
> >> index edf74dabdd..8eecd67991 100644
> >> --- a/pack-objects.h
> >> +++ b/pack-objects.h
> >> @@ -100,6 +100,10 @@ struct object_entry {
> >> unsigned type_:TYPE_BITS;
> >> unsigned no_try_delta:1;
> >> unsigned in_pack_type:TYPE_BITS; /* could be delta */
> >> +
> >> +   unsigned int tree_depth; /* should be repositioned for packing? */
> >> +   unsigned char layer;
> >> +
> >
> > This looks very much like an optional feature. To avoid increasing
> > pack-objects memory usage for common case, please move this to struct
> > packing_data.
>
> As you can see the patch 6/6 (in the v2 of this patch series that I
> just sent) moves `unsigned int tree_depth` from 'struct object_entry'
> to 'struct packing_data'. I am not sure that I did it right and that
> it is worth it though as it is a bit more complex.

It is a bit more complex than I expected. But I think if you go with
Jeff's suggestion (i.e. think of the new tree_depth array an extension
of objects array) it's a bit simpler: you access both arrays using the
same index, both arrays should have the same size and be realloc'd at
the same time in packlist_alloc().

Is it worth it? The "pahole" comment in this file is up to date. We
use 80 bytes per object. This series makes the struct 88 bytes (I've
just rerun pahole). On linux repo with 12M objects, "git pack-objects
--all" needs extra 96MB memory even if this feature is not used. So
yes I still think it's worth moving these fields out of struct
object_entry.
-- 
Duy


Re: [RFC PATCH 2/5] Add delta-islands.{c,h}

2018-08-06 Thread Jeff King
On Sun, Aug 05, 2018 at 08:53:19PM +0200, Christian Couder wrote:

> - 'layer' is used in add_to_write_order() which is called from many
> places in add_descendants_to_write_order() and compute_write_order()
> for example like this:
> 
> for (s = DELTA_SIBLING(e); s; s = DELTA_SIBLING(s)) {
> add_to_write_order(wo, endp, s);
> }
> 
> So it seems to me that moving 'layer' to 'struct packing_data' would
> require changing the DELTA_SIBLING macros or adding a hack to easily
> find the index in an array corresponding to a 'struct object_entry'.

Right, that hack is how the various the existing memory-shrinking macros
work. Every "struct object_entry *" that we deal with _must_ be in the
list in "packing_data", so that we can compute an index as
"entry - to_pack.objects".

So I think Duy is just suggesting:

  void oe_set_layer(struct packing_data *pack,
struct object_entry *entry,
int layer)
  {
  if (!pack->layers)
  ALLOC_ARRAY(pack->layers, pack->nr_objects);
  pack->layers[entry - pack->objects] = layer;
  }

  int oe_get_layer(struct packing_data *pack,
   struct object_entry *entry)
  {
  if (!pack->layers)
  return 0;
  return pack->layers[entry - pack->nr_objects];
  }

That said, I wonder if need to cache the layer at all. We compute the
layers all at once in the new compute_pack_layers(). But it is just a
linear walk over the objects, asking for each "is this in the core
island?".

Could we just ask "is this in the core island?" when we consider each
object? That's a hash lookup each time we ask instead of looking at our
int, but I'd think we would only ask in linear proportion to the number
of objects. So there's no algorithmic speedup, though maybe a
constant-time one (for two layers, I'd expect we'd probably ask about
each object twice).

> - I don't see why it is different from other fields in 'struct
> object_entry' that are also used in compute_write_order(), for
> example:
> 
> uint32_t delta_idx;/* delta base object */
> uint32_t delta_child_idx; /* deltified objects who bases me */
> uint32_t delta_sibling_idx; /* other deltified objects who ... */
> 
> Would we also want to move these fields to 'struct packing_data'? Why
> or why not? If we want to move them, then it might be a good idea to
> do all the moving at the same time to make sure we get something
> coherent.

We actually use those multiple times. They're reset and used in
compute_write_order(), but I think we use them earlier as part of the
delta search (at the very least we use delta_idx; maybe no the
child/sibling stuff).

-Peff


Re: [RFC PATCH 2/5] Add delta-islands.{c,h}

2018-08-05 Thread Christian Couder
On Sun, Jul 22, 2018 at 10:50 AM, Duy Nguyen  wrote:
> On Sun, Jul 22, 2018 at 7:51 AM Christian Couder
>  wrote:

>> diff --git a/pack-objects.h b/pack-objects.h
>> index edf74dabdd..8eecd67991 100644
>> --- a/pack-objects.h
>> +++ b/pack-objects.h
>> @@ -100,6 +100,10 @@ struct object_entry {
>> unsigned type_:TYPE_BITS;
>> unsigned no_try_delta:1;
>> unsigned in_pack_type:TYPE_BITS; /* could be delta */
>> +
>> +   unsigned int tree_depth; /* should be repositioned for packing? */
>> +   unsigned char layer;
>> +
>
> This looks very much like an optional feature. To avoid increasing
> pack-objects memory usage for common case, please move this to struct
> packing_data.

As you can see the patch 6/6 (in the v2 of this patch series that I
just sent) moves `unsigned int tree_depth` from 'struct object_entry'
to 'struct packing_data'. I am not sure that I did it right and that
it is worth it though as it is a bit more complex.

About `unsigned char layer` I am even less sure that it's worth it for
the following reasons:

- 'struct object_entry' contains bit fields as its last members and
then the following comment:

/*
 * pahole results on 64-bit linux (gcc and clang)
 *
 *   size: 80, bit_padding: 20 bits, holes: 8 bits
 *
 * and on 32-bit (gcc)
 *
 *   size: 76, bit_padding: 20 bits, holes: 8 bits
 */

I am not sure if this comment is still up to date, but if it true
maybe there is enough space in the padding to add 'layer' as a 8 bit
bit field somewhere without increasing the size of 'struct
object_entry'.

- 'layer' is used in add_to_write_order() which is called from many
places in add_descendants_to_write_order() and compute_write_order()
for example like this:

for (s = DELTA_SIBLING(e); s; s = DELTA_SIBLING(s)) {
add_to_write_order(wo, endp, s);
}

So it seems to me that moving 'layer' to 'struct packing_data' would
require changing the DELTA_SIBLING macros or adding a hack to easily
find the index in an array corresponding to a 'struct object_entry'.

- I don't see why it is different from other fields in 'struct
object_entry' that are also used in compute_write_order(), for
example:

uint32_t delta_idx;/* delta base object */
uint32_t delta_child_idx; /* deltified objects who bases me */
uint32_t delta_sibling_idx; /* other deltified objects who ... */

Would we also want to move these fields to 'struct packing_data'? Why
or why not? If we want to move them, then it might be a good idea to
do all the moving at the same time to make sure we get something
coherent.

Thanks,
Christian.


[PATCH v2 2/6] Add delta-islands.{c,h}

2018-08-05 Thread Christian Couder
From: Jeff King 

Hosting providers that allow users to "fork" existing
repos want those forks to share as much disk space as
possible.

Alternates are an existing solution to keep all the
objects from all the forks into a unique central repo,
but this can have some drawbacks. Especially when
packing the central repo, deltas will be created
between objects from different forks.

This can make cloning or fetching a fork much slower
and much more CPU intensive as Git might have to
compute new deltas for many objects to avoid sending
objects from a different fork.

Because the inefficiency primarily arises when an
object is delitified against another object that does
not exist in the same fork, we partition objects into
sets that appear in the same fork, and define
"delta islands". When finding delta base, we do not
allow an object outside the same island to be
considered as its base.

So "delta islands" is a way to store objects from
different forks in the same repo and packfile without
having deltas between objects from different forks.

This patch implements the delta islands mechanism in
"delta-islands.{c,h}", but does not yet make use of it.

A few new fields are added in 'struct object_entry'
in "pack-objects.h" though.

The documentation will follow in a patch that actually
uses delta islands in "builtin/pack-objects.c".

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 Makefile|   1 +
 delta-islands.c | 487 
 delta-islands.h |  11 ++
 pack-objects.h  |   4 +
 4 files changed, 503 insertions(+)
 create mode 100644 delta-islands.c
 create mode 100644 delta-islands.h

diff --git a/Makefile b/Makefile
index 08e5c54549..2804298977 100644
--- a/Makefile
+++ b/Makefile
@@ -840,6 +840,7 @@ LIB_OBJS += csum-file.o
 LIB_OBJS += ctype.o
 LIB_OBJS += date.o
 LIB_OBJS += decorate.o
+LIB_OBJS += delta-islands.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
 LIB_OBJS += diffcore-order.o
diff --git a/delta-islands.c b/delta-islands.c
new file mode 100644
index 00..f7902a64ad
--- /dev/null
+++ b/delta-islands.c
@@ -0,0 +1,487 @@
+#include "cache.h"
+#include "attr.h"
+#include "object.h"
+#include "blob.h"
+#include "commit.h"
+#include "tag.h"
+#include "tree.h"
+#include "delta.h"
+#include "pack.h"
+#include "tree-walk.h"
+#include "diff.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "progress.h"
+#include "refs.h"
+#include "khash.h"
+#include "pack-bitmap.h"
+#include "pack-objects.h"
+#include "delta-islands.h"
+#include "sha1-array.h"
+#include "config.h"
+
+KHASH_INIT(str, const char *, void *, 1, kh_str_hash_func, kh_str_hash_equal)
+
+static khash_sha1 *island_marks;
+static unsigned island_counter;
+static unsigned island_counter_core;
+
+static kh_str_t *remote_islands;
+
+struct remote_island {
+   uint64_t hash;
+   struct oid_array oids;
+};
+
+struct island_bitmap {
+   uint32_t refcount;
+   uint32_t bits[];
+};
+
+static uint32_t island_bitmap_size;
+
+/*
+ * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a copy
+ * of "old". Otherwise, the new bitmap is empty.
+ */
+static struct island_bitmap *island_bitmap_new(const struct island_bitmap *old)
+{
+   size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4);
+   struct island_bitmap *b = xcalloc(1, size);
+
+   if (old)
+   memcpy(b, old, size);
+
+   b->refcount = 1;
+   return b;
+}
+
+static void island_bitmap_or(struct island_bitmap *a, const struct 
island_bitmap *b)
+{
+   uint32_t i;
+
+   for (i = 0; i < island_bitmap_size; ++i)
+   a->bits[i] |= b->bits[i];
+}
+
+static int island_bitmap_is_subset(struct island_bitmap *self,
+   struct island_bitmap *super)
+{
+   uint32_t i;
+
+   if (self == super)
+   return 1;
+
+   for (i = 0; i < island_bitmap_size; ++i) {
+   if ((self->bits[i] & super->bits[i]) != self->bits[i])
+   return 0;
+   }
+
+   return 1;
+}
+
+#define ISLAND_BITMAP_BLOCK(x) (x / 32)
+#define ISLAND_BITMAP_MASK(x) (1 << (x % 32))
+
+static void island_bitmap_set(struct island_bitmap *self, uint32_t i)
+{
+   self->bits[ISLAND_BITMAP_BLOCK(i)] |= ISLAND_BITMAP_MASK(i);
+}
+
+static int island_bitmap_get(struct island_bitmap *self, uint32_t i)
+{
+   return (self->bits[ISLAND_BITMAP_BLOCK(i)] & ISLAND_BITMAP_MASK(i)) != 
0;
+}
+
+int in_same_island(const struct object_id *trg_oid, const struct object_id 
*src_oid)
+{
+   khiter_t trg_pos, src_pos;
+
+   /* If we aren't using islands, assume ever

Re: [RFC PATCH 2/5] Add delta-islands.{c,h}

2018-07-27 Thread Jeff King
On Tue, Jul 24, 2018 at 09:47:29AM -0700, Junio C Hamano wrote:

> > +/*
> > + * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a 
> > copy
> > + * of "old". Otherwise, the new bitmap is empty.
> > + */
> > +static struct island_bitmap *island_bitmap_new(const struct island_bitmap 
> > *old)
> > +{
> > +   size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4);
> > +   struct island_bitmap *b = xcalloc(1, size);
> 
> Is one of the variants of flex array macros applicable here?

Hmm, maybe. The tricky thing about this bitmap (and this touches on your
"why another bitmap" question below) is that we have a _ton_ of them
(one per object). So we want to do what we can to keep memory usage
down. So there are two weird things:

 - these bitmaps do not know their own size. They are all exactly the
   same size (one bit per island), and we store the size only once in
   the whole program.

 - they are copy-on-write. This helps because there's a lot of
   locality to the maps within the object graph. But it also means
   sometimes we initialize empty, and sometimes from another array.

So I think it might work to do:

  if (old) {
/* copy bits from existing bitmap */
FLEX_ALLOC_MEM(b, bits, old->bits, st_mult(island_bitmap_size, 4));
  } else {
/* start with all-zero bits courtesy of xcalloc */
FLEX_ALLOC_MEM(b, bits, NULL, 0);
  }

That'd still waste an extra byte per struct (or 4, I guess, due to
padding), since we unconditionally NUL-terminate the flex-struct (since
they were really designed around string storage).

> > +static int island_bitmap_get(struct island_bitmap *self, uint32_t i)
> > +{
> > +   return (self->bits[ISLAND_BITMAP_BLOCK(i)] & ISLAND_BITMAP_MASK(i)) != 
> > 0;
> > +}
> > +
> 
> Not necessarily a complaint, but do we need another implementation
> of bitmap here, or the compressed bitmap used for the pack bitmap is
> unsuited for the purpose of this thing (e.g. perhaps it is overkill,
> as we won't be shooting for saving disk footprint of a bitmap that
> we are not going to save on disk anyway)?

Mostly it's about memory savings, as I discussed above. We _could_ use
ewah bitmaps instead of naive ones here. Those are bad for random access
to bits, but I think most of the operations are "is this bitmap a subset
of this other one" which can be done by walking them linearly.

I doubt the compression would be all that significant, though. They do
best when there's a lot of locality in the bitmap. Here our bits are the
islands, grouped and ordered by some subset of the refname. I don't
think there's any reason to think that refs/heads/a and refs/heads/b
would be correlated. OTOH, if most objects are in most islands (e.g.,
because the islands represent a bunch of forks of some "base" history),
then you might get a big run of 1's for those objects.

I don't think we really experimented with it. To be honest, I don't
recall how much measuring we did for the refcounted bitmaps, either. So
I'm not sure how much the copy-on-write saves (though my vague
recollection is that it was worth doing if you have a lot of islands).

> > +   /*
> > +* We process only trees, as commits and tags have already been handled
> > +* (and passed their marks on to root trees, as well. We must make sure
> > +* to process them in descending tree-depth order so that marks
> > +* propagate down the tree properly, even if a sub-tree is found in
> > +* multiple parent trees.
> > +*/
> > +   todo = xmalloc(to_pack->nr_objects * sizeof(*todo));
> > +   for (i = 0; i < to_pack->nr_objects; i++) {
> > +   if (oe_type(_pack->objects[i]) == OBJ_TREE)
> > +   todo[nr++] = _pack->objects[i];
> > +   }
> > +   qsort(todo, nr, sizeof(*todo), cmp_tree_depth);
> 
> Hmph, at this stage nobody actually seems to set tree_depth; I am
> wondering how a tree that is at the root in one commit but appears
> as a subdirectory in another commit is handled by this code.

The tree depth is set when we actually walk the graph collecting
objects, and we use the max depth at which we've seen it. That said,
this happens in show_object(), which I think we'd avoid entering twice
for the same object (due to list-objects setting the SEEN flag).

So I suspect that yes, you could have a case where a tree has multiple
depths, and we'd visit them topologically out of order. And that could
lead to objects reachable from that tree missing some of their islands.
Something like:

  - refs/remotes/123/master points to commit C which points to tree X,
which points to blob B

  - refs/remotes/456/master points to commit D which points to tree Y,
which points to tree X, which points to blob B

We mark C with island "123" and D with island "456", and ditto for their
root trees X and Y. Then we sort the trees by depth, with the intent
that Y would come before its child X. But due to the bogus depth, X may
come first.  We propagate the mark for 123 down 

Re: [RFC PATCH 2/5] Add delta-islands.{c,h}

2018-07-27 Thread Jeff King
On Sun, Jul 22, 2018 at 07:48:33AM +0200, Christian Couder wrote:

> + /*
> +  * We process only trees, as commits and tags have already been handled
> +  * (and passed their marks on to root trees, as well. We must make sure
> +  * to process them in descending tree-depth order so that marks
> +  * propagate down the tree properly, even if a sub-tree is found in
> +  * multiple parent trees.
> +  */
> + todo = xmalloc(to_pack->nr_objects * sizeof(*todo));

I was fiddling with "make coccicheck", and it looks like this code could
stand some modernization. This could use ALLOC_ARRAY().

> + for (i = 0; i < to_pack->nr_objects; i++) {
> + if (oe_type(_pack->objects[i]) == OBJ_TREE)
> + todo[nr++] = _pack->objects[i];
> + }
> + qsort(todo, nr, sizeof(*todo), cmp_tree_depth);

And this QSORT().

There are a few others, I won't list them all. The only tricky one I see
is:

> + free(tree->buffer);
> + tree->buffer = NULL;
> + tree->object.parsed = 0;

This suggests FREE_AND_NULL(), but I think it actually the whole block
should become a call to free_tree_buffer().

-Peff


Re: [RFC PATCH 2/5] Add delta-islands.{c,h}

2018-07-24 Thread Junio C Hamano
Christian Couder  writes:

> From: Jeff King 
>
> Hosting providers that allow users to "fork" existing
> repos want as much as possible those forks to use a
> small amount of disk space.

"... want those forks to consume as little amount of disk space as
possible?"  Or perhaps "... want those forks to share as much disk
space as possible"?

> Using alternates to keep all the objects from all the forks into a
> unique central repo is way to do that, but ...

s/is way to/is a way to/, I guess, but that makes it sound as if the
series invented a way to share objects without using alternates.

> it can have some
> drawbacks. Especially when packing the central repo, deltas will
> be created between objects from different forks.
>
> This can make cloning or fetching a fork much slower and much more
> CPU intensive as Git might have to compute new deltas for many
> objects to avoid sending objects from a different fork.
>
> Delta islands is a way to store objects from different forks in
> the same repo and packfile without having deltas between objects
> from different forks.

I think another paragraph between the above two is needed to briefly
outline the central idea (which would make it clear why that is
called "delta islands") is called for.  Perhaps

Because the inefficiency primarily arises when an object is
delitified against another object that does not exist in the
same fork, we partion objects into sets that appear in the
same fork, and define "delta islands".  When finding delta
base, we do not allow an object outside the same island to
be considered as its base.

or something along that line.  Perhaps that would even make the last
paragraph unnecessary.

> +struct island_bitmap {
> + uint32_t refcount;
> + uint32_t bits[];
> +};
> +
> +static uint32_t island_bitmap_size;
> +
> +/*
> + * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a copy
> + * of "old". Otherwise, the new bitmap is empty.
> + */
> +static struct island_bitmap *island_bitmap_new(const struct island_bitmap 
> *old)
> +{
> + size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4);
> + struct island_bitmap *b = xcalloc(1, size);

Is one of the variants of flex array macros applicable here?

> + if (old)
> + memcpy(b, old, size);
> +
> + b->refcount = 1;
> + return b;
> +}
> +
> +static void island_bitmap_or(struct island_bitmap *a, const struct 
> island_bitmap *b)
> +{
> + uint32_t i;
> +
> + for (i = 0; i < island_bitmap_size; ++i)
> + a->bits[i] |= b->bits[i];
> +}
> +
> +static int island_bitmap_is_subset(struct island_bitmap *self,
> + struct island_bitmap *super)
> +{
> + uint32_t i;
> +
> + if (self == super)
> + return 1;
> +
> + for (i = 0; i < island_bitmap_size; ++i) {
> + if ((self->bits[i] & super->bits[i]) != self->bits[i])
> + return 0;
> + }
> +
> + return 1;
> +}
> +#define ISLAND_BITMAP_BLOCK(x) (x / 32)
> +#define ISLAND_BITMAP_MASK(x) (1 << (x % 32))
> +
> +static void island_bitmap_set(struct island_bitmap *self, uint32_t i)
> +{
> + self->bits[ISLAND_BITMAP_BLOCK(i)] |= ISLAND_BITMAP_MASK(i);
> +}
> +
> +static int island_bitmap_get(struct island_bitmap *self, uint32_t i)
> +{
> + return (self->bits[ISLAND_BITMAP_BLOCK(i)] & ISLAND_BITMAP_MASK(i)) != 
> 0;
> +}
> +

Not necessarily a complaint, but do we need another implementation
of bitmap here, or the compressed bitmap used for the pack bitmap is
unsuited for the purpose of this thing (e.g. perhaps it is overkill,
as we won't be shooting for saving disk footprint of a bitmap that
we are not going to save on disk anyway)?

> +static int cmp_tree_depth(const void *va, const void *vb)
> +{
> + struct object_entry *a = *(struct object_entry **)va;
> + struct object_entry *b = *(struct object_entry **)vb;
> + return a->tree_depth - b->tree_depth;
> +}
> +
> +void resolve_tree_islands(int progress, struct packing_data *to_pack)
> +{
> + struct progress *progress_state = NULL;
> + struct object_entry **todo;
> + int nr = 0;
> + int i;
> +
> + if (!island_marks)
> + return;
> +
> + /*
> +  * We process only trees, as commits and tags have already been handled
> +  * (and passed their marks on to root trees, as well. We must make sure
> +  * to process them in descending tree-depth order so that marks
> +  * propagate down the tree properly, even if a sub-tree is found in
> +  * multiple parent trees.
> +  */
> + todo = xmalloc(to_pack->nr_objects * sizeof(*todo));
> + for (i = 0; i < to_pack->nr_objects; i++) {
> + if (oe_type(_pack->objects[i]) == OBJ_TREE)
> + todo[nr++] = _pack->objects[i];
> + }
> + qsort(todo, nr, sizeof(*todo), cmp_tree_depth);

Hmph, at this stage nobody actually seems to set tree_depth; I am
wondering 

Re: [RFC PATCH 2/5] Add delta-islands.{c,h}

2018-07-22 Thread Christian Couder
On Sun, Jul 22, 2018 at 10:50 AM, Duy Nguyen  wrote:
> On Sun, Jul 22, 2018 at 7:51 AM Christian Couder
>  wrote:

>> +pack.island::
>> +   A regular expression configuring a set of delta islands. See
>> +   "DELTA ISLANDS" in linkgit:git-pack-objects[1] for details.
>> +
>
> That section is not added until 3/5 though.

Yeah, so I guess it is better to move this hunk to 3/5 and keep
pack.island undocumented until the delta islands code is actually used
by pack-objects.

>> diff --git a/delta-islands.c b/delta-islands.c
>> new file mode 100644
>> index 00..645fe966c5
>> --- /dev/null
>> +++ b/delta-islands.c
>> @@ -0,0 +1,490 @@
>> +#include "builtin.h"
>
> A bit weird that builtin.h would be needed...

Yeah, I will get rid of that include in the next iteration.

>> +   if (progress)
>> +   progress_state = start_progress("Propagating island marks", 
>> nr);
>
> _() (same comment for other strings too)

Ok, the strings will be marked for translation in the next iteration.

>> diff --git a/pack-objects.h b/pack-objects.h
>> index edf74dabdd..8eecd67991 100644
>> --- a/pack-objects.h
>> +++ b/pack-objects.h
>> @@ -100,6 +100,10 @@ struct object_entry {
>> unsigned type_:TYPE_BITS;
>> unsigned no_try_delta:1;
>> unsigned in_pack_type:TYPE_BITS; /* could be delta */
>> +
>> +   unsigned int tree_depth; /* should be repositioned for packing? */
>> +   unsigned char layer;
>> +
>
> This looks very much like an optional feature. To avoid increasing
> pack-objects memory usage for common case, please move this to struct
> packing_data.

Ok, I will take a look at that.

Thanks for the review,
Christian.


Re: [RFC PATCH 2/5] Add delta-islands.{c,h}

2018-07-22 Thread Duy Nguyen
On Sun, Jul 22, 2018 at 7:51 AM Christian Couder
 wrote:
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index a32172a43c..f682e92a1a 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2542,6 +2542,10 @@ Note that changing the compression level will not 
> automatically recompress
>  all existing objects. You can force recompression by passing the -F option
>  to linkgit:git-repack[1].
>
> +pack.island::
> +   A regular expression configuring a set of delta islands. See
> +   "DELTA ISLANDS" in linkgit:git-pack-objects[1] for details.
> +

That section is not added until 3/5 though.

> diff --git a/delta-islands.c b/delta-islands.c
> new file mode 100644
> index 00..645fe966c5
> --- /dev/null
> +++ b/delta-islands.c
> @@ -0,0 +1,490 @@
> +#include "builtin.h"

A bit weird that builtin.h would be needed...

> +void resolve_tree_islands(int progress, struct packing_data *to_pack)
> +{
> +   struct progress *progress_state = NULL;
> +   struct object_entry **todo;
> +   int nr = 0;
> +   int i;
> +
> +   if (!island_marks)
> +   return;
> +
> +   /*
> +* We process only trees, as commits and tags have already been 
> handled
> +* (and passed their marks on to root trees, as well. We must make 
> sure
> +* to process them in descending tree-depth order so that marks
> +* propagate down the tree properly, even if a sub-tree is found in
> +* multiple parent trees.
> +*/
> +   todo = xmalloc(to_pack->nr_objects * sizeof(*todo));
> +   for (i = 0; i < to_pack->nr_objects; i++) {
> +   if (oe_type(_pack->objects[i]) == OBJ_TREE)
> +   todo[nr++] = _pack->objects[i];
> +   }
> +   qsort(todo, nr, sizeof(*todo), cmp_tree_depth);
> +
> +   if (progress)
> +   progress_state = start_progress("Propagating island marks", 
> nr);

_() (same comment for other strings too)

> diff --git a/pack-objects.h b/pack-objects.h
> index edf74dabdd..8eecd67991 100644
> --- a/pack-objects.h
> +++ b/pack-objects.h
> @@ -100,6 +100,10 @@ struct object_entry {
> unsigned type_:TYPE_BITS;
> unsigned no_try_delta:1;
> unsigned in_pack_type:TYPE_BITS; /* could be delta */
> +
> +   unsigned int tree_depth; /* should be repositioned for packing? */
> +   unsigned char layer;
> +

This looks very much like an optional feature. To avoid increasing
pack-objects memory usage for common case, please move this to struct
packing_data.

> unsigned preferred_base:1; /*
> * we do not pack this, but is available
> * to be used as the base object to delta
> --
> 2.18.0.237.gffdb1dbdaa
-- 
Duy


[RFC PATCH 2/5] Add delta-islands.{c,h}

2018-07-21 Thread Christian Couder
From: Jeff King 

Hosting providers that allow users to "fork" existing
repos want as much as possible those forks to use a
small amount of disk space.

Using alternates to keep all the objects from all
the forks into a unique central repo is way to do
that, but it can have some drawbacks. Especially when
packing the central repo, deltas will be created
between objects from different forks.

This can make cloning or fetching a fork much slower
and much more CPU intensive as Git might have to
compute new deltas for many objects to avoid sending
objects from a different fork.

Delta islands is a way to store objects from
different forks in the same repo and packfile without
having deltas between objects from different forks.

This patch implements the delta islands mechanism in
"delta-islands.{c,h}", but does not yet make use of it.

A few new fields are added in 'struct object_entry'
in "pack-objects.h" though.

The new "pack.island" config variable which can be
used to configure the different islands is described
a bit in "Documentation/config.txt", but the real
documentation will follow in a patch that actually
uses delta islands in "builtin/pack-objects.c".

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 Documentation/config.txt |   4 +
 Makefile |   1 +
 delta-islands.c  | 490 +++
 delta-islands.h  |  11 +
 pack-objects.h   |   4 +
 5 files changed, 510 insertions(+)
 create mode 100644 delta-islands.c
 create mode 100644 delta-islands.h

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a32172a43c..f682e92a1a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2542,6 +2542,10 @@ Note that changing the compression level will not 
automatically recompress
 all existing objects. You can force recompression by passing the -F option
 to linkgit:git-repack[1].
 
+pack.island::
+   A regular expression configuring a set of delta islands. See
+   "DELTA ISLANDS" in linkgit:git-pack-objects[1] for details.
+
 pack.deltaCacheSize::
The maximum memory in bytes used for caching deltas in
linkgit:git-pack-objects[1] before writing them out to a pack.
diff --git a/Makefile b/Makefile
index 08e5c54549..2804298977 100644
--- a/Makefile
+++ b/Makefile
@@ -840,6 +840,7 @@ LIB_OBJS += csum-file.o
 LIB_OBJS += ctype.o
 LIB_OBJS += date.o
 LIB_OBJS += decorate.o
+LIB_OBJS += delta-islands.o
 LIB_OBJS += diffcore-break.o
 LIB_OBJS += diffcore-delta.o
 LIB_OBJS += diffcore-order.o
diff --git a/delta-islands.c b/delta-islands.c
new file mode 100644
index 00..645fe966c5
--- /dev/null
+++ b/delta-islands.c
@@ -0,0 +1,490 @@
+#include "builtin.h"
+#include "cache.h"
+#include "attr.h"
+#include "object.h"
+#include "blob.h"
+#include "commit.h"
+#include "tag.h"
+#include "tree.h"
+#include "delta.h"
+#include "pack.h"
+#include "tree-walk.h"
+#include "diff.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "progress.h"
+#include "refs.h"
+#include "khash.h"
+#include "pack-bitmap.h"
+#include "pack-objects.h"
+#include "delta-islands.h"
+#include "sha1-array.h"
+#include "config.h"
+
+KHASH_INIT(str, const char *, void *, 1, kh_str_hash_func, kh_str_hash_equal)
+
+static khash_sha1 *island_marks;
+static unsigned island_counter;
+static unsigned island_counter_core;
+
+static kh_str_t *remote_islands;
+
+struct remote_island {
+   uint64_t hash;
+   struct oid_array oids;
+};
+
+struct island_bitmap {
+   uint32_t refcount;
+   uint32_t bits[];
+};
+
+static uint32_t island_bitmap_size;
+
+/*
+ * Allocate a new bitmap; if "old" is not NULL, the new bitmap will be a copy
+ * of "old". Otherwise, the new bitmap is empty.
+ */
+static struct island_bitmap *island_bitmap_new(const struct island_bitmap *old)
+{
+   size_t size = sizeof(struct island_bitmap) + (island_bitmap_size * 4);
+   struct island_bitmap *b = xcalloc(1, size);
+
+   if (old)
+   memcpy(b, old, size);
+
+   b->refcount = 1;
+   return b;
+}
+
+static void island_bitmap_or(struct island_bitmap *a, const struct 
island_bitmap *b)
+{
+   uint32_t i;
+
+   for (i = 0; i < island_bitmap_size; ++i)
+   a->bits[i] |= b->bits[i];
+}
+
+static int island_bitmap_is_subset(struct island_bitmap *self,
+   struct island_bitmap *super)
+{
+   uint32_t i;
+
+   if (self == super)
+   return 1;
+
+   for (i = 0; i < island_bitmap_size; ++i) {
+   if ((self->bits[i] & super->bits[i]) != self->bits[i])
+   return 0;
+  

Re: [PATCH v4 3/9] checkout.[ch]: move struct declaration into *.h

2018-06-01 Thread Ævar Arnfjörð Bjarmason


On Fri, Jun 01 2018, Junio C Hamano wrote:

> Thomas Gummerer  writes:
>
>> We seem to have plenty of structs defined in '.c' files, if they are
>> only needed there.  Its use also seems to be single purpose for the
>> callback data, so I'm a bit puzzled how having this in a header file
>> instead of the .c file would be helpful?
>>
>> I feel like having only the "public" part in the header file also
>> helps developers that are just looking for documentation of the
>> functions they are looking at, by having less things to go through,
>> that they wouldn't necessarily care about.
>
> Yup, sounds like a sensible criterion to choose between <*.h> and
> <*.c>.

Yes this makes sense. Thanks both. I misunderstood the idiom. Makes
sense in hindsight (& with both of your advices) only to put structs in
the *.h if it's actually used outside of that API. Will move this around
in v5.


Re: [PATCH v4 3/9] checkout.[ch]: move struct declaration into *.h

2018-05-31 Thread Junio C Hamano
Thomas Gummerer  writes:

> We seem to have plenty of structs defined in '.c' files, if they are
> only needed there.  Its use also seems to be single purpose for the
> callback data, so I'm a bit puzzled how having this in a header file
> instead of the .c file would be helpful?
>
> I feel like having only the "public" part in the header file also
> helps developers that are just looking for documentation of the
> functions they are looking at, by having less things to go through,
> that they wouldn't necessarily care about.

Yup, sounds like a sensible criterion to choose between <*.h> and
<*.c>.


Re: [PATCH v4 3/9] checkout.[ch]: move struct declaration into *.h

2018-05-31 Thread Thomas Gummerer
On 05/31, Ævar Arnfjörð Bjarmason wrote:
> Move the tracking_name_data struct used in checkout.c into its
> corresponding header file. This wasn't done in 7c85a87c54 ("checkout:
> factor out functions to new lib file", 2017-11-26) when checkout.[ch]
> were created, and is more consistent with the rest of the codebase.

We seem to have plenty of structs defined in '.c' files, if they are
only needed there.  Its use also seems to be single purpose for the
callback data, so I'm a bit puzzled how having this in a header file
instead of the .c file would be helpful?

I feel like having only the "public" part in the header file also
helps developers that are just looking for documentation of the
functions they are looking at, by having less things to go through,
that they wouldn't necessarily care about.

> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  checkout.c | 7 ---
>  checkout.h | 7 +++
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/checkout.c b/checkout.c
> index bdefc888ba..8d68f75ad1 100644
> --- a/checkout.c
> +++ b/checkout.c
> @@ -3,13 +3,6 @@
>  #include "refspec.h"
>  #include "checkout.h"
>  
> -struct tracking_name_data {
> - /* const */ char *src_ref;
> - char *dst_ref;
> - struct object_id *dst_oid;
> - int unique;
> -};
> -
>  static int check_tracking_name(struct remote *remote, void *cb_data)
>  {
>   struct tracking_name_data *cb = cb_data;
> diff --git a/checkout.h b/checkout.h
> index 4cd4cd1c23..04b52f9ffe 100644
> --- a/checkout.h
> +++ b/checkout.h
> @@ -3,6 +3,13 @@
>  
>  #include "cache.h"
>  
> +struct tracking_name_data {
> + /* const */ char *src_ref;
> + char *dst_ref;
> + struct object_id *dst_oid;
> + int unique;
> +};
> +
>  /*
>   * Check if the branch name uniquely matches a branch name on a remote
>   * tracking branch.  Return the name of the remote if such a branch
> -- 
> 2.17.0.290.gded63e768a
> 


[PATCH v4 3/9] checkout.[ch]: move struct declaration into *.h

2018-05-31 Thread Ævar Arnfjörð Bjarmason
Move the tracking_name_data struct used in checkout.c into its
corresponding header file. This wasn't done in 7c85a87c54 ("checkout:
factor out functions to new lib file", 2017-11-26) when checkout.[ch]
were created, and is more consistent with the rest of the codebase.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 checkout.c | 7 ---
 checkout.h | 7 +++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/checkout.c b/checkout.c
index bdefc888ba..8d68f75ad1 100644
--- a/checkout.c
+++ b/checkout.c
@@ -3,13 +3,6 @@
 #include "refspec.h"
 #include "checkout.h"
 
-struct tracking_name_data {
-   /* const */ char *src_ref;
-   char *dst_ref;
-   struct object_id *dst_oid;
-   int unique;
-};
-
 static int check_tracking_name(struct remote *remote, void *cb_data)
 {
struct tracking_name_data *cb = cb_data;
diff --git a/checkout.h b/checkout.h
index 4cd4cd1c23..04b52f9ffe 100644
--- a/checkout.h
+++ b/checkout.h
@@ -3,6 +3,13 @@
 
 #include "cache.h"
 
+struct tracking_name_data {
+   /* const */ char *src_ref;
+   char *dst_ref;
+   struct object_id *dst_oid;
+   int unique;
+};
+
 /*
  * Check if the branch name uniquely matches a branch name on a remote
  * tracking branch.  Return the name of the remote if such a branch
-- 
2.17.0.290.gded63e768a



Re: why do "git log -h" and "git show -h" print the same thing?

2018-05-27 Thread Thomas Gummerer
On 05/24, Stefan Beller wrote:
> On Thu, May 24, 2018 at 6:37 AM, Robert P. J. Day <rpj...@crashcourse.ca> 
> wrote:
> >
> >   maybe this is deliberate, but it's confusing that, with git 2.17.0,
> > the output of both "git log -h" and "git show -h" is exactly the same:
> >
> > $ git log -h
> > usage: git log [] [] [[--] ...]
> >or: git show [] ...
> >
> > -q, --quiet   suppress diff output
> > --source  show source
> > --use-mailmap Use mail map file
> > --decorate-refs 
> >   only decorate refs that match 
> > --decorate-refs-exclude 
> >   do not decorate refs that match 
> > --decorate[=...]  decorate options
> > -L <n,m:file> Process line range n,m in file, counting from 1
> > $
> >
> > is that what's *supposed* to happen?
> 
> I would think so, show is just "log -p" with the range clamped
> down to ^...

Hmm, that's not true though if the object is not a commit, from my
understanding?  While 'git show' does share some of its code with 'git
log', what you get as output may be quite different from what you'd
get with 'git log'.  So I feel like we're leaking an implementation
detail to the user here, and I can see how it is confusing especially
for new users.

So I do feel like there's some room for improvement here, by only
showing the help for the command that was actually used.  I would
welcome a patch to that extend, but obviously I'm not the authority
here :)

> It's been in the code like that for a couple years by now,
> e.g. see
> e66dc0cc4b1a6 log.c: fix translation markings, 2015-01-06


Re: why do "git log -h" and "git show -h" print the same thing?

2018-05-24 Thread Robert P. J. Day
On Thu, 24 May 2018, Stefan Beller wrote:

> On Thu, May 24, 2018 at 6:37 AM, Robert P. J. Day <rpj...@crashcourse.ca> 
> wrote:
> >
> >   maybe this is deliberate, but it's confusing that, with git 2.17.0,
> > the output of both "git log -h" and "git show -h" is exactly the same:
> >
> > $ git log -h
> > usage: git log [] [] [[--] ...]
> >or: git show [] ...
> >
> > -q, --quiet   suppress diff output
> > --source  show source
> > --use-mailmap Use mail map file
> > --decorate-refs 
> >   only decorate refs that match 
> > --decorate-refs-exclude 
> >   do not decorate refs that match 
> > --decorate[=...]  decorate options
> > -L <n,m:file> Process line range n,m in file, counting from 1
> > $
> >
> > is that what's *supposed* to happen?
>
> I would think so, show is just "log -p" with the range clamped
> down to ^...
>
> It's been in the code like that for a couple years by now,
> e.g. see
> e66dc0cc4b1a6 log.c: fix translation markings, 2015-01-06

  ah, very well, it just caught me by surprise.

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



Re: why do "git log -h" and "git show -h" print the same thing?

2018-05-24 Thread Stefan Beller
On Thu, May 24, 2018 at 6:37 AM, Robert P. J. Day <rpj...@crashcourse.ca> wrote:
>
>   maybe this is deliberate, but it's confusing that, with git 2.17.0,
> the output of both "git log -h" and "git show -h" is exactly the same:
>
> $ git log -h
> usage: git log [] [] [[--] ...]
>or: git show [] ...
>
> -q, --quiet   suppress diff output
> --source  show source
> --use-mailmap Use mail map file
> --decorate-refs 
>   only decorate refs that match 
> --decorate-refs-exclude 
>   do not decorate refs that match 
> --decorate[=...]  decorate options
> -L <n,m:file> Process line range n,m in file, counting from 1
> $
>
> is that what's *supposed* to happen?

I would think so, show is just "log -p" with the range clamped
down to ^...

It's been in the code like that for a couple years by now,
e.g. see
e66dc0cc4b1a6 log.c: fix translation markings, 2015-01-06


why do "git log -h" and "git show -h" print the same thing?

2018-05-24 Thread Robert P. J. Day

  maybe this is deliberate, but it's confusing that, with git 2.17.0,
the output of both "git log -h" and "git show -h" is exactly the same:

$ git log -h
usage: git log [] [] [[--] ...]
   or: git show [] ...

-q, --quiet   suppress diff output
--source  show source
--use-mailmap Use mail map file
--decorate-refs 
  only decorate refs that match 
--decorate-refs-exclude 
  do not decorate refs that match 
--decorate[=...]  decorate options
-L <n,m:file> Process line range n,m in file, counting from 1
$

is that what's *supposed* to happen?

rday

-- 


Robert P. J. Day Ottawa, Ontario, CANADA
  http://crashcourse.ca/dokuwiki

Twitter:   http://twitter.com/rpjday
LinkedIn:   http://ca.linkedin.com/in/rpjday



[PATCH 0/5] V2B: simplify convert.c/h

2017-12-31 Thread tboegi
From: Torsten Bögershausen <tbo...@web.de>

Simplify the convert.h/convert.c logic amd don't touch convert_to_git()
The rest is v2 from Lars

Lars Schneider (4):
  strbuf: add xstrdup_toupper()
  utf8: add function to detect prohibited UTF-16/32 BOM
  utf8: add function to detect a missing UTF-16/32 BOM
  convert: add support for 'checkout-encoding' attribute

Torsten Bögershausen (1):
  convert_to_git(): checksafe becomes an integer

 Documentation/gitattributes.txt |  59 +++
 apply.c |   4 +-
 convert.c   | 210 +---
 convert.h   |  19 ++--
 diff.c  |   4 +-
 environment.c   |   2 +-
 sha1_file.c |   8 +-
 strbuf.c|  13 +++
 strbuf.h|   1 +
 t/t0028-checkout-encoding.sh| 197 +
 utf8.c  |  37 +++
 utf8.h  |  25 +
 12 files changed, 549 insertions(+), 30 deletions(-)
 create mode 100755 t/t0028-checkout-encoding.sh

-- 
2.16.0.rc0.4.ga4e00d4fa4



Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-24 Thread René Scharfe
Am 24.10.2017 um 02:52 schrieb Junio C Hamano:
> René Scharfe <l@web.de> writes:
> 
>> Am 21.10.2017 um 14:18 schrieb Junio C Hamano:
>>> René Scharfe <l@web.de> writes:
>>>
>>>> FWIW, I use "-?" for that everywhere.  I have yet to find a command or
>>>> environment where it does something dangerous.
>>>
>>> Yeah, it would have made the world a better place if we made that
>>> choice back in 2008.  If we start a transition to make it so right
>>> now, we might be able to make the world a better place by 2022,
>>> perhaps.  I am not sure if the pain during the transition is worth
>>> it, though.
>>
>> "-?" works fine with builtins already -- they complain that the option
>> is unknown and then show the short help text.
> 
> Ah, I misunderstood what you meant, then.  I thought you were
> advocating to move the built-in short-help support to know about and
> explicitly react to "-?", and somehow forgot that it "works" more or
> less already.

I don't mean to advocate here -- it's more like trying to get the
accounting right.  A little bit of OCD perhaps?

> The fact that "-?" already works for most things is good, but the
> transition pain still remains, as what's costly is to transition
> people's expectation (i.e. "'-?' and not '-h' is the way to get
> short help from any subcommand"), not the implementation to fill the
> gaps for those that do not yet support '-?', I am afraid.

A minor cost for help-seeking users would be the extra error message at
the top of the short help text.

The main change would cause "git ls-remote -h" to show remote heads and
"git show-ref -h" to show HEAD.  Confused users would have to resort to
e.g. man, -help, --help, their search engine of choice, or -?.  I feel
this could be justified.  It would be different if e.g. reset started
to take -h as shorthand for --hard, as this would cause data loss.

The hard part would probably be allowing the execution of commands with
unknown arguments outside of repositories to show the help text, even
if they'd normally (with correct arguments) require one.  Converting
all commands from RUN_SETUP to RUN_SETUP_GENTLY seems painful.  Showing
help when a required repo is not found might be possible somehow.

With that I'm going to shut up, as I don't even use the affected
commands, nor can I imagine being in the place of someone impacted by
"git  -h" not showing a help text.

René


Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-23 Thread Junio C Hamano
René Scharfe <l@web.de> writes:

> Am 21.10.2017 um 14:18 schrieb Junio C Hamano:
>> René Scharfe <l@web.de> writes:
>> 
>>> FWIW, I use "-?" for that everywhere.  I have yet to find a command or
>>> environment where it does something dangerous.
>> 
>> Yeah, it would have made the world a better place if we made that
>> choice back in 2008.  If we start a transition to make it so right
>> now, we might be able to make the world a better place by 2022,
>> perhaps.  I am not sure if the pain during the transition is worth
>> it, though.
>
> "-?" works fine with builtins already -- they complain that the option
> is unknown and then show the short help text.

Ah, I misunderstood what you meant, then.  I thought you were
advocating to move the built-in short-help support to know about and
explicitly react to "-?", and somehow forgot that it "works" more or
less already.

The fact that "-?" already works for most things is good, but the
transition pain still remains, as what's costly is to transition
people's expectation (i.e. "'-?' and not '-h' is the way to get
short help from any subcommand"), not the implementation to fill the
gaps for those that do not yet support '-?', I am afraid.



Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-23 Thread René Scharfe
Am 21.10.2017 um 14:18 schrieb Junio C Hamano:
> René Scharfe <l@web.de> writes:
> 
>> FWIW, I use "-?" for that everywhere.  I have yet to find a command or
>> environment where it does something dangerous.
> 
> Yeah, it would have made the world a better place if we made that
> choice back in 2008.  If we start a transition to make it so right
> now, we might be able to make the world a better place by 2022,
> perhaps.  I am not sure if the pain during the transition is worth
> it, though.

"-?" works fine with builtins already -- they complain that the option
is unknown and then show the short help text.

If we removed the special handling for "-h" from parse-options it would
do the same for most commands, i.e. we'd get an additional error
message, followed by the short help text that we're used to see.  We
could check for "git grep -h" without any other arguments and show the
usage text in that case.  We could do the same for "git ls-remote -h"
as well, but I'm not sure we want to.

So the cost would be an extra error message in most cases, and possibly
the inability to show help with "git ls-remote -h".  That doesn't sound
very painful.

René


Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-21 Thread Junio C Hamano
René Scharfe  writes:

> FWIW, I use "-?" for that everywhere.  I have yet to find a command or
> environment where it does something dangerous.

Yeah, it would have made the world a better place if we made that
choice back in 2008.  If we start a transition to make it so right
now, we might be able to make the world a better place by 2022,
perhaps.  I am not sure if the pain during the transition is worth
it, though.


Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-21 Thread René Scharfe
Am 20.10.2017 um 07:35 schrieb Junio C Hamano:
> Jeff King <p...@peff.net> writes:
> 
>>  It seems weird and inconsistent to me that the meaning of "-h"
>> depends on the position and presence of other unrelated options.

The position is relevant with parse-options for *each* flag for a
different reason as well: You can't put a flag after a non-flag.  I
find that more annoying, as it slightly but noticeable slows down
adding a flag to a previous command by requiring to navigate to the
middle of the line.

(I guess I should train myself to use Meta-b for going back one word
more often..)

> I may be biased as every time I think about this one, the first one
> that comes to my mind is how "grep -h" (not "git grep", but GNU
> grep) behaves.  Yes, "-h" means something else, but by itself, the
> command does not make sense, and "ls-remote -h" is an exception to
> the rule: most sane commands do not have a sensible semantics when
> they take only "-h" and nothing else.  And even for ls-remote it is
> true only when you try to be extra lazy and do not say which remote
> you are asking.
> 
> And that is why I think "'cmd -h" and nothing else consistently
> gives help" may overall be positive in usability, than a rule that
> says "'cmd -h' is for short-help unless -h means something else for
> the command".

FWIW, I use "-?" for that everywhere.  I have yet to find a command or
environment where it does something dangerous.

René


Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-21 Thread Jeff King
On Fri, Oct 20, 2017 at 02:35:36PM +0900, Junio C Hamano wrote:

> I may be biased as every time I think about this one, the first one
> that comes to my mind is how "grep -h" (not "git grep", but GNU
> grep) behaves.  Yes, "-h" means something else, but by itself, the
> command does not make sense, and "ls-remote -h" is an exception to
> the rule: most sane commands do not have a sensible semantics when
> they take only "-h" and nothing else.  And even for ls-remote it is
> true only when you try to be extra lazy and do not say which remote
> you are asking.

Yeah, I have to admit "grep -h" is a lot more compelling, since it
matches normal grep. And I've used "grep -h" many times without thinking
about it, simply because the rule just falls out naturally there
(there's nothing possible to do without a regex argument, so we'd show
the usage either way).

> And that is why I think "'cmd -h" and nothing else consistently
> gives help" may overall be positive in usability, than a rule that
> says "'cmd -h' is for short-help unless -h means something else for
> the command".

Yeah, I was shooting more for "let's avoid assigning -h to things". But
that's not really possible if we want to be consistent with upstream
grep, which is probably more important.

I think you've convinced me.

-Peff


Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-19 Thread Junio C Hamano
Jeff King <p...@peff.net> writes:

>  It seems weird and inconsistent to me that the meaning of "-h"
> depends on the position and presence of other unrelated options. Maybe
> it's just me. I know _why_ it's that way, but this seems like one of
> those weird corners of the interface that end up confusing people and
> giving Git's interface the reputation of being full of mysterious
> inconsistencies. I suspect one of the reasons nobody has complained
> about it is that "ls-remote" is not commonly used, and "ls-remote -h"
> less so (I didn't even know it was there until this conversation).

Technically, yes, it may be weird.

I may be biased as every time I think about this one, the first one
that comes to my mind is how "grep -h" (not "git grep", but GNU
grep) behaves.  Yes, "-h" means something else, but by itself, the
command does not make sense, and "ls-remote -h" is an exception to
the rule: most sane commands do not have a sensible semantics when
they take only "-h" and nothing else.  And even for ls-remote it is
true only when you try to be extra lazy and do not say which remote
you are asking.

And that is why I think "'cmd -h" and nothing else consistently
gives help" may overall be positive in usability, than a rule that
says "'cmd -h' is for short-help unless -h means something else for
the command".


Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-19 Thread Jeff King
On Fri, Oct 20, 2017 at 10:04:23AM +0900, Junio C Hamano wrote:

> > Yuck. This "we only treat -h as special in certain cases" rule is
> > sufficiently magical that I don't think we want to advertise and lock
> > ourselves into it.
> 
> Hmph.  I think it is way too late to be worried about "locked into"
> the convention---hasn't the "git cmd -h" been with us forever?

I guess. I still find it pretty nasty (not that "-h" works for help, but
that it can override the normal usage).

> Besides, I personally feel that there is nothing magical in the rule
> that is "we always treat 'git $cmd -h' as asking for short help, but
> individual subcommand may choose to use -h for, perhaps to be
> compatible with other tools and conventions, in other situations".

 It seems weird and inconsistent to me that the meaning of "-h"
depends on the position and presence of other unrelated options. Maybe
it's just me. I know _why_ it's that way, but this seems like one of
those weird corners of the interface that end up confusing people and
giving Git's interface the reputation of being full of mysterious
inconsistencies. I suspect one of the reasons nobody has complained
about it is that "ls-remote" is not commonly used, and "ls-remote -h"
less so (I didn't even know it was there until this conversation).

-Peff


Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-19 Thread Junio C Hamano
Jeff King <p...@peff.net> writes:

> On Thu, Oct 19, 2017 at 08:26:47PM +0200, René Scharfe wrote:
>
>> diff --git a/Documentation/git-ls-remote.txt 
>> b/Documentation/git-ls-remote.txt
>> index 5f2628c8f8..82622e7fbc 100644
>> --- a/Documentation/git-ls-remote.txt
>> +++ b/Documentation/git-ls-remote.txt
>> @@ -29,6 +29,9 @@ OPTIONS
>>  These options are _not_ mutually exclusive; when given
>>  both, references stored in refs/heads and refs/tags are
>>  displayed.
>> ++
>> +*NOTE*: -h without any other parameters shows a short help text instead
>> +of any refs.
>
> Yuck. This "we only treat -h as special in certain cases" rule is
> sufficiently magical that I don't think we want to advertise and lock
> ourselves into it.

Hmph.  I think it is way too late to be worried about "locked into"
the convention---hasn't the "git cmd -h" been with us forever?

Besides, I personally feel that there is nothing magical in the rule
that is "we always treat 'git $cmd -h' as asking for short help, but
individual subcommand may choose to use -h for, perhaps to be
compatible with other tools and conventions, in other situations".


Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-19 Thread Jeff King
On Thu, Oct 19, 2017 at 08:26:47PM +0200, René Scharfe wrote:

> diff --git a/Documentation/git-ls-remote.txt b/Documentation/git-ls-remote.txt
> index 5f2628c8f8..82622e7fbc 100644
> --- a/Documentation/git-ls-remote.txt
> +++ b/Documentation/git-ls-remote.txt
> @@ -29,6 +29,9 @@ OPTIONS
>   These options are _not_ mutually exclusive; when given
>   both, references stored in refs/heads and refs/tags are
>   displayed.
> ++
> +*NOTE*: -h without any other parameters shows a short help text instead
> +of any refs.

Yuck. This "we only treat -h as special in certain cases" rule is
sufficiently magical that I don't think we want to advertise and lock
ourselves into it.

-Peff


Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-19 Thread René Scharfe
Am 18.10.2017 um 01:22 schrieb Junio C Hamano:
> René Scharfe <l@web.de> writes:
> 
>> Stop advertising -h as the short equivalent of --heads, because it's
>> used for showing a short help text for almost all other git commands.
>> Since the ba5f28bf79 (ls-remote: use parse-options api) it has only
>> been working when used together with other parameters anyway.
>>
>> Signed-off-by: Rene Scharfe <l@web.de>
>> ---
>> That would be step on the way towards more consistent command line
>> switches, in the same vein as d69155119 (t0012: test "-h" with
>> builtins).
> 
> Sorry, but I am not sure whom this and the other approach are trying
> to help.
> 
> The rule we have currently seems to be that "git cmd -h" (no other
> arguments) consistently gives a short help, and if a subcommand
> supports an option "-h" specific to it that is not about giving a
> short help, the caller needs to be aware of it to invoke the option,
> by making sure that there is some other arguments on the command
> line if "-h" form of that subcommand specific option is used, by
> doing e.g. "git ls-remote -h origin" or "git ls-remote --head".
> 
> I can see that this "alternative" approach makes it less convenent
> for folks who have followed that rule by hiding "-h" (with the
> intention of deprecating and possibly removing it in the future) and
> encouraging (and later foring) "--head" to be used instead.
> 
> The other approach burdens new users by changing the rule to "some
> subcommands that have their own '-h' option cannot be asked for a
> brief usage with 'git cmd -h'".  But the thing is, these new users
> who do not know which subcommands do have their own '-h' and which
> ones do not are the ones that benefit most from the consistent "'git
> cmd -h' with no other argument gets short help" rule.

They would help by aligning documentation and code, at a price.  But
of course there is another way to do that -- I just didn't see it the
other day.  We don't have to take anything away:

-- >8 --
Subject: [PATCH] ls-remote: document behavior of lone parameter -h

Reported-by: Thomas Rikl <tr...@online.de>
Analyzed-by: Martin Ågren <martin.ag...@gmail.com>
Analyzed-by: Junio C Hamano <gits...@pobox.com>
Signed-off-by: Rene Scharfe <l@web.de>
---
 Documentation/git-ls-remote.txt | 10 ++
 builtin/ls-remote.c |  5 -
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-ls-remote.txt b/Documentation/git-ls-remote.txt
index 5f2628c8f8..82622e7fbc 100644
--- a/Documentation/git-ls-remote.txt
+++ b/Documentation/git-ls-remote.txt
@@ -29,6 +29,9 @@ OPTIONS
These options are _not_ mutually exclusive; when given
both, references stored in refs/heads and refs/tags are
displayed.
++
+*NOTE*: -h without any other parameters shows a short help text instead
+of any refs.
 
 --refs::
Do not show peeled tags or pseudorefs like HEAD in the output.
@@ -89,6 +92,13 @@ EXAMPLES
f25a265a342aed6041ab0cc484224d9ca54b6f41refs/tags/v0.99.1
c5db5456ae3b0873fc659c19fafdde22313cc441refs/tags/v0.99.2
7ceca275d047c90c0c7d5afb13ab97efdf51bd6erefs/tags/v0.99.3
+   $ git ls-remote -hh
+   From https://git.kernel.org/pub/scm/git/git.git/
+   4c2224e83951a685185bb8c1f83b28e22fee0e27refs/heads/maint
+   5fe978a5381f1fbad26a80e682ddd2a401966740refs/heads/master
+   76aedb4517c834be2dc89efb5f9d15908e324422refs/heads/next
+   c781a84b5204fb294c9ccc79f8b3baceeb32c061refs/heads/pu
+   5d38b589ccc7a8355c62f1577865df5b8216c00drefs/heads/todo
 
 GIT
 ---
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index c4be98ab9e..0438dfec05 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -56,7 +56,10 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
   N_("path of git-upload-pack on the remote host"),
   PARSE_OPT_HIDDEN },
OPT_BIT('t', "tags", , N_("limit to tags"), REF_TAGS),
-   OPT_BIT('h', "heads", , N_("limit to heads"), REF_HEADS),
+   OPT_BIT(0, "heads", , N_("limit to heads"), REF_HEADS),
+   OPT_BIT('h', NULL, ,
+   N_("if only parameter: show this help text, otherwise 
limit to heads"),
+   REF_HEADS),
OPT_BIT(0, "refs", , N_("do not show peeled tags"), 
REF_NORMAL),
OPT_BOOL(0, "get-url", _url,
 N_("take url..insteadOf into account")),
-- 
2.14.2


Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-17 Thread Junio C Hamano
René Scharfe <l@web.de> writes:

> Stop advertising -h as the short equivalent of --heads, because it's
> used for showing a short help text for almost all other git commands.
> Since the ba5f28bf79 (ls-remote: use parse-options api) it has only
> been working when used together with other parameters anyway.
>
> Signed-off-by: Rene Scharfe <l@web.de>
> ---
> That would be step on the way towards more consistent command line
> switches, in the same vein as d69155119 (t0012: test "-h" with
> builtins).

Sorry, but I am not sure whom this and the other approach are trying
to help.

The rule we have currently seems to be that "git cmd -h" (no other
arguments) consistently gives a short help, and if a subcommand
supports an option "-h" specific to it that is not about giving a
short help, the caller needs to be aware of it to invoke the option,
by making sure that there is some other arguments on the command
line if "-h" form of that subcommand specific option is used, by
doing e.g. "git ls-remote -h origin" or "git ls-remote --head".

I can see that this "alternative" approach makes it less convenent
for folks who have followed that rule by hiding "-h" (with the
intention of deprecating and possibly removing it in the future) and
encouraging (and later foring) "--head" to be used instead.  

The other approach burdens new users by changing the rule to "some
subcommands that have their own '-h' option cannot be asked for a
brief usage with 'git cmd -h'".  But the thing is, these new users
who do not know which subcommands do have their own '-h' and which
ones do not are the ones that benefit most from the consistent "'git
cmd -h' with no other argument gets short help" rule.

When making a backward incompatible change, it is asking us to go to
those who are inconvenienced and say "I am sorry that we are making
things less convenient for you, but by doing so we can gain this
greater good which is ...".  I can explain how this and the other
approach make things less convenient for existing or new users, but
I am having a hard time formulating what the greater good is.

In short, I cannot sell this change to our users.  Please help me do
so if you want this (or the other) change made to our system.



Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-17 Thread Jonathan Nieder
René Scharfe wrote:

> Stop advertising -h as the short equivalent of --heads, because it's
> used for showing a short help text for almost all other git commands.
> Since the ba5f28bf79 (ls-remote: use parse-options api) it has only
> been working when used together with other parameters anyway.
>
> Signed-off-by: Rene Scharfe <l@web.de>
> ---
>  Documentation/git-ls-remote.txt | 1 -
>  builtin/ls-remote.c | 4 +++-
>  2 files changed, 3 insertions(+), 2 deletions(-)

Yes, I think this is a good step.
Reviewed-by: Jonathan Nieder <jrnie...@gmail.com>

I would be tempted to go even further and make "git ls-remote -h"
print a warning to stderr to encourage people to update their scripts
to use --heads instead.

Thanks.


Re: [Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-17 Thread Martin Ågren
On 17 October 2017 at 17:39, René Scharfe <l@web.de> wrote:
> Stop advertising -h as the short equivalent of --heads, because it's
> used for showing a short help text for almost all other git commands.
> Since the ba5f28bf79 (ls-remote: use parse-options api) it has only
> been working when used together with other parameters anyway.
>
> Signed-off-by: Rene Scharfe <l@web.de>
> ---
> That would be step on the way towards more consistent command line
> switches, in the same vein as d69155119 (t0012: test "-h" with
> builtins).

FWIW, my first inclination would be to go with a patch like this instead
of the other two (where you introduce an exception so that git ls-remote
-h does not display the usage). ba5f28bf79 was in 2.8.0. That's 18
months ago. Not an eternity, but still some time ago. Not fixing this
regression has an obvious downside, but there's also a downside to
adding the exception.

As long as a lone -h may give the usage or do something else entirely,
then -- put bluntly -- to know whether you can request the usage with
git foo -h without risk of messing up your repo, you'll need to look up
the usage some other way. At which point you've solved your original
problem, without -h.

Of course, we could promise that -h will give the usage or, in case of
historical wart(s), do something harmless. But it seems a bit awkward,
and might limit the perceived usefulness/safeness or -h.

> diff --git a/Documentation/git-ls-remote.txt b/Documentation/git-ls-remote.txt
> index 5f2628c8f8..898836a111 100644
> --- a/Documentation/git-ls-remote.txt
> +++ b/Documentation/git-ls-remote.txt
> @@ -21,7 +21,6 @@ commit IDs.
>
>  OPTIONS
>  ---
> --h::
>  --heads::
>  -t::
>  --tags::

Do we want to document -h as a deprecated alias, so that people have a
slightly larger chance of noticing and adapting?


[Alt. PATCH] ls-remote: deprecate -h as short for --heads

2017-10-17 Thread René Scharfe
Stop advertising -h as the short equivalent of --heads, because it's
used for showing a short help text for almost all other git commands.
Since the ba5f28bf79 (ls-remote: use parse-options api) it has only
been working when used together with other parameters anyway.

Signed-off-by: Rene Scharfe <l@web.de>
---
That would be step on the way towards more consistent command line
switches, in the same vein as d69155119 (t0012: test "-h" with
builtins).

 Documentation/git-ls-remote.txt | 1 -
 builtin/ls-remote.c | 4 +++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-ls-remote.txt b/Documentation/git-ls-remote.txt
index 5f2628c8f8..898836a111 100644
--- a/Documentation/git-ls-remote.txt
+++ b/Documentation/git-ls-remote.txt
@@ -21,7 +21,6 @@ commit IDs.
 
 OPTIONS
 ---
--h::
 --heads::
 -t::
 --tags::
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index c4be98ab9e..840deedbef 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -56,7 +56,9 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
   N_("path of git-upload-pack on the remote host"),
   PARSE_OPT_HIDDEN },
OPT_BIT('t', "tags", , N_("limit to tags"), REF_TAGS),
-   OPT_BIT('h', "heads", , N_("limit to heads"), REF_HEADS),
+   OPT_BIT(0, "heads", , N_("limit to heads"), REF_HEADS),
+   { OPTION_BIT, 'h', NULL, , NULL, N_("limit to heads"),
+   PARSE_OPT_NOARG | PARSE_OPT_HIDDEN, NULL, REF_HEADS },
OPT_BIT(0, "refs", , N_("do not show peeled tags"), 
REF_NORMAL),
OPT_BOOL(0, "get-url", _url,
 N_("take url..insteadOf into account")),
-- 
2.14.2


Re: Bug: git ls-remote -h / --head results differ in output

2017-10-15 Thread René Scharfe
Am 15.10.2017 um 13:08 schrieb Martin Ågren:
> On 15 October 2017 at 12:02, Thomas Rikl <tr...@online.de> wrote:
>> Example:
>>
>> tom1 ~/emacs/spacemacs/.emacs.d $ export LANG=en_US.utf8
>>
>> tom1 ~/emacs/spacemacs/.emacs.d $ git ls-remote -h
>> usage: git ls-remote [--heads] [--tags] [--refs] [--upload-pack=]
>>   [-q | --quiet] [--exit-code] [--get-url]
>>   [--symref] [ [...]]
>>
>>  -q, --quiet   do not print remote URL
>>  --upload-pack   path of git-upload-pack on the remote host
>>  -t, --tagslimit to tags
>>  -h, --heads   limit to heads
>>  --refsdo not show peeled tags
>>  --get-url take url..insteadOf into account
>>  --exit-code   exit with exit code 2 if no matching refs are
>> found
>>  --symref  show underlying ref in addition to the object
>> pointed by it
>>
>> tom1 ~/emacs/spacemacs/.emacs.d $ git ls-remote --head
>>  From https://github.com/syl20bnr/spacemacs
>> 07014deead544c51fa6a826e91fe2ef05bf04323 refs/heads/develop
>> 8e1af145480d53e8d32cdff2c83291889903164b refs/heads/master
>> 2450b7e276634ece07b6b7ec6ca6c287af86caf3 refs/heads/release-0.101
>> 8dadfc1494544bb152e80c2a436e43bc3713b389 refs/heads/release-0.102
>> d993a021847cde2c42865bab6afa8adbb2edda0b refs/heads/release-0.103
>> 44d4525543b1f2a385142721d0cb16cd3b0be580 refs/heads/release-0.104
>> 9f9faa404e3dec3e08cc73cf7b5a0439fc309800 refs/heads/release-0.105
>> 8e1af145480d53e8d32cdff2c83291889903164b refs/heads/release-0.200
>> tom1 ~/emacs/spacemacs/.emacs.d $ git --version
>> git version 2.14.2
>>
>> on archlinux: Linux achse 4.13.5-1-ARCH #1 SMP PREEMPT Fri Oct 6 09:58:47
>> CEST 2017 x86_64 GNU/Linux
> 
> The behavior you observed matches with the documentation in gitcli(7)
> and is arguably correct. A lone "-h" prints the help/usage. But I would
> agree that this can be confusing, especially when considering
> git-ls-remote(1) on its own, without any extra knowledge about what a
> lone -h should do.
> 
> So -h and --heads can only be used interchangeably if you have other
> arguments. To see this, you could, e.g., try "git ls-remote -h -h".
> 
> Some more details. This looks like ba5f28bf7 (ls-remote: use
> parse-options api, 2016-01-19). Before that, "-h" was handled internally
> in builtin/ls-files.c. After that it is handled in the general
> options-parsing machinery. See for example 5ad0d3d52 (parse-options:
> allow -h as a short option, 2015-11-17), which explicitly wants to print
> the usage-string if "-h" is given as the lone argument.
> 
> I'm not sure which is the best way forward here, or how many other
> commands could have similar pitfalls. For example, the "-h"-flag of git
> grep shouldn't be able to cause this problem.

The flag PARSE_OPT_NO_INTERNAL_HELP should be used to let git ls-remote
fully handle -h.

The same goes for git show-ref, but perhaps it's better to remove its
hidden option -h by now.

But stepping back a bit I wonder if the demure internal -h handler (that
only speaks up when nothing else is said) is a bit too subtle.
Reverting 5ad0d3d52 would make the need for PARSE_OPT_NO_INTERNAL_HELP
more obvious.

René


Re: Bug: git ls-remote -h / --head results differ in output

2017-10-15 Thread Martin Ågren
On 15 October 2017 at 12:02, Thomas Rikl <tr...@online.de> wrote:
> Example:
>
> tom1 ~/emacs/spacemacs/.emacs.d $ export LANG=en_US.utf8
>
> tom1 ~/emacs/spacemacs/.emacs.d $ git ls-remote -h
> usage: git ls-remote [--heads] [--tags] [--refs] [--upload-pack=]
>  [-q | --quiet] [--exit-code] [--get-url]
>  [--symref] [ [...]]
>
> -q, --quiet   do not print remote URL
> --upload-pack   path of git-upload-pack on the remote host
> -t, --tagslimit to tags
> -h, --heads   limit to heads
> --refsdo not show peeled tags
> --get-url take url..insteadOf into account
> --exit-code   exit with exit code 2 if no matching refs are
> found
> --symref  show underlying ref in addition to the object
> pointed by it
>
> tom1 ~/emacs/spacemacs/.emacs.d $ git ls-remote --head
> From https://github.com/syl20bnr/spacemacs
> 07014deead544c51fa6a826e91fe2ef05bf04323 refs/heads/develop
> 8e1af145480d53e8d32cdff2c83291889903164b refs/heads/master
> 2450b7e276634ece07b6b7ec6ca6c287af86caf3 refs/heads/release-0.101
> 8dadfc1494544bb152e80c2a436e43bc3713b389 refs/heads/release-0.102
> d993a021847cde2c42865bab6afa8adbb2edda0b refs/heads/release-0.103
> 44d4525543b1f2a385142721d0cb16cd3b0be580 refs/heads/release-0.104
> 9f9faa404e3dec3e08cc73cf7b5a0439fc309800 refs/heads/release-0.105
> 8e1af145480d53e8d32cdff2c83291889903164b refs/heads/release-0.200
> tom1 ~/emacs/spacemacs/.emacs.d $ git --version
> git version 2.14.2
>
> on archlinux: Linux achse 4.13.5-1-ARCH #1 SMP PREEMPT Fri Oct 6 09:58:47
> CEST 2017 x86_64 GNU/Linux

The behavior you observed matches with the documentation in gitcli(7)
and is arguably correct. A lone "-h" prints the help/usage. But I would
agree that this can be confusing, especially when considering
git-ls-remote(1) on its own, without any extra knowledge about what a
lone -h should do.

So -h and --heads can only be used interchangeably if you have other
arguments. To see this, you could, e.g., try "git ls-remote -h -h".

Some more details. This looks like ba5f28bf7 (ls-remote: use
parse-options api, 2016-01-19). Before that, "-h" was handled internally
in builtin/ls-files.c. After that it is handled in the general
options-parsing machinery. See for example 5ad0d3d52 (parse-options:
allow -h as a short option, 2015-11-17), which explicitly wants to print
the usage-string if "-h" is given as the lone argument.

I'm not sure which is the best way forward here, or how many other
commands could have similar pitfalls. For example, the "-h"-flag of git
grep shouldn't be able to cause this problem.

Martin


Bug: git ls-remote -h / --head results differ in output

2017-10-15 Thread Thomas Rikl

Example:

tom1 ~/emacs/spacemacs/.emacs.d $ export LANG=en_US.utf8

tom1 ~/emacs/spacemacs/.emacs.d $ git ls-remote -h
usage: git ls-remote [--heads] [--tags] [--refs] [--upload-pack=]
 [-q | --quiet] [--exit-code] [--get-url]
 [--symref] [ [...]]

    -q, --quiet   do not print remote URL
    --upload-pack   path of git-upload-pack on the remote host
    -t, --tags    limit to tags
    -h, --heads   limit to heads
    --refs    do not show peeled tags
    --get-url take url..insteadOf into account
    --exit-code   exit with exit code 2 if no matching refs are 
found
    --symref  show underlying ref in addition to the object 
pointed by it


tom1 ~/emacs/spacemacs/.emacs.d $ git ls-remote --head
From https://github.com/syl20bnr/spacemacs
07014deead544c51fa6a826e91fe2ef05bf04323 refs/heads/develop
8e1af145480d53e8d32cdff2c83291889903164b refs/heads/master
2450b7e276634ece07b6b7ec6ca6c287af86caf3 refs/heads/release-0.101
8dadfc1494544bb152e80c2a436e43bc3713b389 refs/heads/release-0.102
d993a021847cde2c42865bab6afa8adbb2edda0b refs/heads/release-0.103
44d4525543b1f2a385142721d0cb16cd3b0be580 refs/heads/release-0.104
9f9faa404e3dec3e08cc73cf7b5a0439fc309800 refs/heads/release-0.105
8e1af145480d53e8d32cdff2c83291889903164b refs/heads/release-0.200
tom1 ~/emacs/spacemacs/.emacs.d $ git --version
git version 2.14.2

on archlinux: Linux achse 4.13.5-1-ARCH #1 SMP PREEMPT Fri Oct 6 
09:58:47 CEST 2017 x86_64 GNU/Linux




Re: [PATCH 1/1] ls-remote: remove "-h" from help text

2017-06-28 Thread S. Fricke
Hi Jonathan,


> > This regression was fixed in 91a640ffb6d9 ("ls-remote: a lone "-h" is
> > [...]
> 
> Without this patch, I'm able to run
> 
>   git ls-remote -h .
> 
> This patch removes that support.  Intended?

*hihi* okay it was to counter-intuitive for me. "91a640ffb6d" talks about this
issue.

Thanks for enlightenment :-)
Silvio

-- 
-- S. Fricke  sil...@port1024.net --
   Diplom-Informatiker (FH)
   Linux-Development Matrix: @silvio:port1024.net   



Re: [PATCH 1/1] ls-remote: remove "-h" from help text

2017-06-27 Thread Jonathan Nieder
Hi,

Silvio Fricke wrote:

> This regression was fixed in 91a640ffb6d9 ("ls-remote: a lone "-h" is
> asking for help") and the wrong help text was introduced in ba5f28bf79ea
> ("ls-remote: use parse-options api").
> This patch removes the "-h" on the help text.
>
> Signed-off-by: Silvio Fricke <silvio.fri...@gmail.com>
> ---
>  builtin/ls-remote.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Without this patch, I'm able to run

git ls-remote -h .

This patch removes that support.  Intended?

Thanks and hope that helps,
Jonathan


[PATCH 1/1] ls-remote: remove "-h" from help text

2017-06-27 Thread Silvio Fricke
This regression was fixed in 91a640ffb6d9 ("ls-remote: a lone "-h" is
asking for help") and the wrong help text was introduced in ba5f28bf79ea
("ls-remote: use parse-options api").
This patch removes the "-h" on the help text.

Signed-off-by: Silvio Fricke <silvio.fri...@gmail.com>
---
 builtin/ls-remote.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index b2d7d5ce6..cd54f7e98 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -56,7 +56,7 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
   N_("path of git-upload-pack on the remote host"),
   PARSE_OPT_HIDDEN },
OPT_BIT('t', "tags", , N_("limit to tags"), REF_TAGS),
-   OPT_BIT('h', "heads", , N_("limit to heads"), REF_HEADS),
+   OPT_BIT(0, "heads", , N_("limit to heads"), REF_HEADS),
OPT_BIT(0, "refs", , N_("do not show peeled tags"), 
REF_NORMAL),
OPT_BOOL(0, "get-url", _url,
 N_("take url..insteadOf into account")),
-- 
2.13.2



Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-06-14 Thread Jeff King
On Tue, Jun 13, 2017 at 04:08:03PM -0700, Jonathan Nieder wrote:

> > +test_expect_success 'generate builtin list' '
> > +   git --list-builtins >builtins
> > +'
> > +
> > +while read builtin
> > +do
> > +   test_expect_success "$builtin can handle -h" '
> > +   test_expect_code 129 git $builtin -h >output 2>&1 &&
> > +   test_i18ngrep usage output
> > +   '
> > +done  
> I love this.  A few thoughts:
> 
> - I think the "generate builtin list" test should be marked as setup
>   so people know it can't be skipped.  I'll send a followup to do that.
> 
> - By the same token, if "generate builtin list" fails then these
>   commands afterward would fail in an unpleasant way.  Fortunately
>   that's straightforward to fix, too.

Be my guest if you want, but I don't think either of those is actually
worth spending any time on. It's a nice convenience when tests are
independent, but ultimately if any single test fails, the results of all
tests after it must be suspect. There are too many hidden dependencies
to treat it any other way.

So I have no problem with skipping tests while debugging for a quicker
turnaround, with the knowledge that what you're seeing _might_ be
slightly invalidated. But I don't think it's worth dumping developer
time into trying to make each block independent. That's what individual
scripts are for.

(I also think it's particularly worthless for this script; the whole
thing runs in ~500ms on my machine. It takes longer to type
GIT_TEST_SKIP).

> - This checks both stdout and stderr to verify that the usage appears
>   somewhere.  I would have expected commands to be consistent ---
>   should they always send usage to stdout, since the user requested it,
>   or to stderr, since that's what we have done historically?
> 
>   So I understand why this test is agnostic about that but I suspect
>   it's pointing to some inconsistency that could be fixed independently.

The difference is between parse-option's usage_with_options() and
usage(). The former sends "-h" to stdout and errors to stderr, which I
think is sensible. The latter only knows about stderr.

I think it would be reasonable to have follow the parse-options strategy
consistently. It will definitely take some tweaking, though. There are
lots of commands that respect "-h" only by hitting some default
unhandled case. So you'll not only have to have a stdout analog for
usage(), but you'll have to teach each command when to use each.

> - Do we plan to translate the "usage:" prefix some day?  I could go
>   both ways --- on one hand, having a predictable error:/usage:/fatal:
>   prefix may make life easier for e.g. GUI authors trying to show
>   different kinds of error messages in different ways, but on the other
>   hand, always using English for the prefix may be unfriendly to non
>   English speakers.
> 
>   In the past I had assumed it would never be translated but now I'm
>   not so sure.  What do you think?

I don't have an opinion either way. I know we don't translate it now,
but it has been discussed. I figured it couldn't hurt to use i18ngrep to
future-proof it.

> - Should we have something like this for non-builtins, too?

That would be nice, though you'll probably need some cooperation from
the Makefile to get the list. I was specifically worried about catching
the particular setup_git_directory() interaction with builtins here, so
it seemed like a good stopping point.

But as a general rule, making sure that everything responds sensibly to
"-h" seems like a good thing.

-Peff


Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-06-13 Thread Jonathan Nieder
Hi,

Jeff King wrote:

> --- a/t/t0012-help.sh
> +++ b/t/t0012-help.sh
> @@ -49,4 +49,16 @@ test_expect_success "--help does not work for guides" "
>   test_i18ncmp expect actual
>  "
> 
> +test_expect_success 'generate builtin list' '
> + git --list-builtins >builtins
> +'
> +
> +while read builtin
> +do
> + test_expect_success "$builtin can handle -h" '
> + test_expect_code 129 git $builtin -h >output 2>&1 &&
> + test_i18ngrep usage output
> + '
> +done 


Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-06-01 Thread Junio C Hamano
Jeff King  writes:

> If we don't mind a one-time pain, I think we can just convert the
> existing usage() to something more like usage_with_options(). The patch
> below does that (and teaches the latter to handle a NULL options field).
>
> The diff is ugly, and the conversion is mostly mechanical. But I think
> some sites can be improved. For example, look at the change in bundle.c,
> which now hands a set of strings rather than formatting the whole "or:"
> chain manually.
> ...
> With bonus points for actually describing each option (though at that
> point it may be as much work to just convert the thing to parse-options,
> which would also be fine with me).
>
> That's tedious work which would need attention paid to each individual
> command. So I'd probably do a single mechanical patch like this one
> (that keeps the output identical), and then let people fix each one up
> on top.

Yup.  Unlike my other hack, this does look more useful to me.  It
was kind of surprising that the change to parse-options is just a
single liner, but it's just "no input resulting in no output" ;-)

> diff --git a/git.c b/git.c
> index 1b8b7f51a..3496f8a23 100644
> --- a/git.c
> +++ b/git.c
> @@ -3,12 +3,14 @@
>  #include "help.h"
>  #include "run-command.h"
>  
> -const char git_usage_string[] =
> +const char *git_usage_string[] = {
>   "git [--version] [--help] [-C ] [-c name=value]\n"
>   "   [--exec-path[=]] [--html-path] [--man-path] 
> [--info-path]\n"
>   "   [-p | --paginate | --no-pager] [--no-replace-objects] 
> [--bare]\n"
>   "   [--git-dir=] [--work-tree=] 
> [--namespace=]\n"
> - "[]";
> + "[]",
> + NULL
> +};
>  
>  const char git_more_info_string[] =
>   N_("'git help -a' and 'git help -g' list available subcommands and 
> some\n"
> @@ -694,7 +696,7 @@ int cmd_main(int argc, const char **argv)
>   } else {
>   /* The user didn't specify a command; give them help */
>   commit_pager_choice();
> - printf("usage: %s\n\n", git_usage_string);
> + printf("usage: %s\n\n", git_usage_string[0]);
>   list_common_cmds_help();
>   printf("\n%s\n", _(git_more_info_string));
>   exit(1);


> diff --git a/parse-options.c b/parse-options.c
> index a23a1e67f..b0cc2a410 100644
> --- a/parse-options.c
> +++ b/parse-options.c
> @@ -600,6 +600,9 @@ static int usage_with_options_internal(struct 
> parse_opt_ctx_t *ctx,
>   usagestr++;
>   }
>  
> + if (!opts)
> + return PARSE_OPT_HELP;
> +
>   if (opts->type != OPTION_GROUP)
>   fputc('\n', outfile);



Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-06-01 Thread Jeff King
On Thu, Jun 01, 2017 at 02:54:08PM +0900, Junio C Hamano wrote:

> Junio C Hamano <gits...@pobox.com> writes:
> 
> > For now, I will mix this in when queuing the whole thing in 'pu', as
> > I hate to push out something that does not even work for me to the
> > general public.
> > 
> > -- >8 --
> > Subject: [PATCH] diff- and log- family: handle "git cmd -h" early
> > ...
> 
> And then the check_help_option() thing may look like this.  
> 
> I am not proud of the way it "unifies" the two styles of usage
> strings, obviously.

Heh, I came up with a similar hack. It is pretty nasty.

If we don't mind a one-time pain, I think we can just convert the
existing usage() to something more like usage_with_options(). The patch
below does that (and teaches the latter to handle a NULL options field).

The diff is ugly, and the conversion is mostly mechanical. But I think
some sites can be improved. For example, look at the change in bundle.c,
which now hands a set of strings rather than formatting the whole "or:"
chain manually.

I think we could take this even further and break some of the really
long entries (e.g., index-pack) into:

  static const char *index_pack_usage[] = {
"git index-pack ",
"git index-pack --stdin",
"",
"-v",
"-o ",
etc...
NULL
  };

With bonus points for actually describing each option (though at that
point it may be as much work to just convert the thing to parse-options,
which would also be fine with me).

That's tedious work which would need attention paid to each individual
command. So I'd probably do a single mechanical patch like this one
(that keeps the output identical), and then let people fix each one up
on top.

> One benefit this patch has is that it makes it easier to highlight
> what it does *not* touch.
> 
> $ git grep -A2 -E -e 'a(rg)?c [!=]= 2 .*strcmp.*-h'
> 
> shows there are somewhat curious construct
> 
>   if (argc != 2 || !strcmp(argv[1], "-h"))
>   usage(...);
> 
> left in the code.  Upon closer inspection, they all happen to be
> doing the right thing for their current set of options and
> arguments, but they are somewhat ugly.

Yeah, I noticed those too while looking for prior art in my "-h" series.
I think they're all sane. And in fact some of them are necessary (and I
think I even added one like what you quoted above) because the arguments
are not otherwise parsed.

So for something that calls parse_options() later, just:

  if (argc == 2 && !strcmp(argv[1], "-h"))
usage(...);

is sufficient to catch the single funny case, and parse_options()
catches the rest. But for some commands, this is all the parsing they
do. We could split that in two, like:

  /* early "-h" check */
  if (argc == 2 && !strcmp(argv[1], "-h"))
usage(...);
  /* now check for argument sanity */
  if (argc != 2)
usage(...);

That would make sense if we wanted to treat "-h" different from a normal
"huh?" response, but we don't currently. I don't know that it's worth
worrying about too much either way.

Anyway, here's the usage[] patch. I based it on the jk/consistent-h
branch, since I added a few new calls there. I won't be surprised if I
missed a NULL terminator somewhere, but it at least passes the big "-h"
loop in t0012. :)

---
diff --git a/builtin.h b/builtin.h
index 9e4a89816..c31fac64f 100644
--- a/builtin.h
+++ b/builtin.h
@@ -8,7 +8,7 @@
 
 #define DEFAULT_MERGE_LOG_LEN 20
 
-extern const char git_usage_string[];
+extern const char *git_usage_string[];
 extern const char git_more_info_string[];
 
 #define PRUNE_PACKED_DRY_RUN 01
diff --git a/builtin/blame.c b/builtin/blame.c
index 1043e5376..58cc1e8e8 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -30,10 +30,8 @@
 #include "dir.h"
 #include "progress.h"
 
-static char blame_usage[] = N_("git blame [] [] [] 
[--] ");
-
 static const char *blame_opt_usage[] = {
-   blame_usage,
+   N_("git blame [] [] [] [--] "),
"",
N_(" are documented in git-rev-list(1)"),
NULL
@@ -2865,7 +2863,7 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
if (parse_range_arg(range_list.items[range_i].string,
nth_line_cb, , lno, anchor,
, , sb.path))
-   usage(blame_usage);
+   usage(blame_opt_usage);
if (lno < top || ((lno || bottom) && lno < bottom))
die(Q_("file %s has only %lu line",
   "file %s has only %lu lines",
diff --git a/builtin/bu

Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-06-01 Thread Jeff King
On Thu, Jun 01, 2017 at 02:42:13PM +0900, Junio C Hamano wrote:

> For now, I will mix this in when queuing the whole thing in 'pu', as
> I hate to push out something that does not even work for me to the
> general public.
> 
> -- >8 --
> Subject: [PATCH] diff- and log- family: handle "git cmd -h" early

Yeah, I think this is a good step in the interim.

-Peff


Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-05-31 Thread Junio C Hamano
Junio C Hamano <gits...@pobox.com> writes:

> For now, I will mix this in when queuing the whole thing in 'pu', as
> I hate to push out something that does not even work for me to the
> general public.
> 
> -- >8 --
> Subject: [PATCH] diff- and log- family: handle "git cmd -h" early
> ...

And then the check_help_option() thing may look like this.  

I am not proud of the way it "unifies" the two styles of usage
strings, obviously.

One benefit this patch has is that it makes it easier to highlight
what it does *not* touch.

    $ git grep -A2 -E -e 'a(rg)?c [!=]= 2 .*strcmp.*-h'

shows there are somewhat curious construct

if (argc != 2 || !strcmp(argv[1], "-h"))
usage(...);

left in the code.  Upon closer inspection, they all happen to be
doing the right thing for their current set of options and
arguments, but they are somewhat ugly.


 builtin/am.c   |  3 +--
 builtin/branch.c   |  3 +--
 builtin/check-ref-format.c |  3 +--
 builtin/checkout-index.c   |  7 ---
 builtin/commit.c   |  6 ++
 builtin/diff-files.c   |  3 +--
 builtin/diff-index.c   |  3 +--
 builtin/diff-tree.c|  3 +--
 builtin/gc.c   |  3 +--
 builtin/index-pack.c   |  3 +--
 builtin/ls-files.c |  3 +--
 builtin/merge-ours.c   |  3 +--
 builtin/merge.c|  3 +--
 builtin/pack-redundant.c   |  3 +--
 builtin/rev-list.c |  3 +--
 builtin/update-index.c |  3 +--
 builtin/upload-archive.c   |  3 +--
 fast-import.c  |  3 +--
 git-compat-util.h  |  3 +++
 usage.c| 11 +++
 20 files changed, 36 insertions(+), 39 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 8881d73615..12b7298907 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -2307,8 +2307,7 @@ int cmd_am(int argc, const char **argv, const char 
*prefix)
OPT_END()
};
 
-   if (argc == 2 && !strcmp(argv[1], "-h"))
-   usage_with_options(usage, options);
+   check_help_option(argc, argv, usage, options);
 
git_config(git_am_config, NULL);
 
diff --git a/builtin/branch.c b/builtin/branch.c
index 83fcda43dc..8c4465f0e4 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -599,8 +599,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
filter.kind = FILTER_REFS_BRANCHES;
filter.abbrev = -1;
 
-   if (argc == 2 && !strcmp(argv[1], "-h"))
-   usage_with_options(builtin_branch_usage, options);
+   check_help_option(argc, argv, builtin_branch_usage, options);
 
git_config(git_branch_config, NULL);
 
diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index eac499450f..aab5776dd5 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -55,8 +55,7 @@ int cmd_check_ref_format(int argc, const char **argv, const 
char *prefix)
    int flags = 0;
const char *refname;
 
-   if (argc == 2 && !strcmp(argv[1], "-h"))
-   usage(builtin_check_ref_format_usage);
+   check_help_option(argc, argv, builtin_check_ref_format_usage, NULL);
 
if (argc == 3 && !strcmp(argv[1], "--branch"))
return check_ref_format_branch(argv[2]);
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 07631d0c9c..8dd28ae8ba 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -179,9 +179,10 @@ int cmd_checkout_index(int argc, const char **argv, const 
char *prefix)
OPT_END()
};
 
-   if (argc == 2 && !strcmp(argv[1], "-h"))
-   usage_with_options(builtin_checkout_index_usage,
-  builtin_checkout_index_options);
+   check_help_option(argc, argv,
+ builtin_checkout_index_usage,
+ builtin_checkout_index_options);
+
git_config(git_default_config, NULL);
prefix_length = prefix ? strlen(prefix) : 0;
 
diff --git a/builtin/commit.c b/builtin/commit.c
index 66c9ac587b..05c2f61e33 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1376,8 +1376,7 @@ int cmd_status(int argc, const char **argv, const char 
*prefix)
OPT_END(),
};
 
-   if (argc == 2 && !strcmp(argv[1], "-h"))
-   usage_with_options(builtin_status_usage, 
builtin_status_options);
+   check_help_option(argc, argv, builtin_status_usage, 
builtin_status_options);
 
status_init_config(, git_status_config);
argc = parse_options(argc, argv, prefix,
@@ -1661,8 +1660,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
struct ref_transaction *transaction;
struct strbuf err = STRBUF_INIT;
 
-   if (argc == 2 && !strcmp(argv[1], "-h"))
-  

Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-05-31 Thread Junio C Hamano
Junio C Hamano <gits...@pobox.com> writes:

> Heh, I found another ;-)  
>
> 95e98cd9 ("revision.c: use refs_for_each*() instead of
> for_each_*_submodule()", 2017-04-19), which is in the middle of
> Duy's nd/prune-in-worktree series, does this:
> ...
> when jk/consistent-h is merged into it and then "git diff-files -h"
> is run.
>
> I guess anything that calls setup_revisions() from the "git cmd -h"
> bypass need to be prepared with that
>
>   check_help_option(argc, argv, usage, options);
>
> thing.  Which is a bit sad, but I tend to agree with you that
> restructuring to make usage[] of everybody available to git.c
> is probably too noisy for the benefit it would give us.

For now, I will mix this in when queuing the whole thing in 'pu', as
I hate to push out something that does not even work for me to the
general public.

-- >8 --
Subject: [PATCH] diff- and log- family: handle "git cmd -h" early

"git $builtin -h" bypasses the usual repository setup and calls the
cmd_$builtin() function, expecting it to show the help text.

Unfortunately the commands in the log- and the diff- family want to
call into the revisions machinery, which by definition needs to have
a repository already discovered, before they can parse the options.

Handle the "git $builtin -h" special case very early in these
commands to work around potential issues.

Signed-off-by: Junio C Hamano <gits...@pobox.com>
---
 builtin/diff-files.c | 3 +++
 builtin/diff-index.c | 3 +++
 builtin/diff-tree.c  | 3 +++
 builtin/rev-list.c   | 3 +++
 4 files changed, 12 insertions(+)

diff --git a/builtin/diff-files.c b/builtin/diff-files.c
index 15c61fd8d1..6be1df684a 100644
--- a/builtin/diff-files.c
+++ b/builtin/diff-files.c
@@ -20,6 +20,9 @@ int cmd_diff_files(int argc, const char **argv, const char 
*prefix)
int result;
unsigned options = 0;
 
+   if (argc == 2 && !strcmp(argv[1], "-h"))
+   usage(diff_files_usage);
+
init_revisions(, prefix);
gitmodules_config();
git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
diff --git a/builtin/diff-index.c b/builtin/diff-index.c
index 1af373d002..02dd35ba45 100644
--- a/builtin/diff-index.c
+++ b/builtin/diff-index.c
@@ -17,6 +17,9 @@ int cmd_diff_index(int argc, const char **argv, const char 
*prefix)
int i;
int result;
 
+   if (argc == 2 && !strcmp(argv[1], "-h"))
+   usage(diff_cache_usage);
+
init_revisions(, prefix);
gitmodules_config();
git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 5ea1c14317..f633b10b08 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -105,6 +105,9 @@ int cmd_diff_tree(int argc, const char **argv, const char 
*prefix)
struct setup_revision_opt s_r_opt;
int read_stdin = 0;
 
+   if (argc == 2 && !strcmp(argv[1], "-h"))
+   usage(diff_tree_usage);
+
init_revisions(opt, prefix);
gitmodules_config();
git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 718c6059c9..b250c515b1 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -277,6 +277,9 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
int use_bitmap_index = 0;
const char *show_progress = NULL;
 
+   if (argc == 2 && !strcmp(argv[1], "-h"))
+   usage(rev_list_usage);
+
git_config(git_default_config, NULL);
init_revisions(, prefix);
revs.abbrev = DEFAULT_ABBREV;
-- 
2.13.0-513-g1c0955652f



Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-05-31 Thread Jeff King
On Thu, Jun 01, 2017 at 01:17:55PM +0900, Junio C Hamano wrote:

> Jeff King <p...@peff.net> writes:
> 
> > Anyway, the problem is sk/dash-is-previous, specifically fc5684b47
> > (revision.c: args starting with "-" might be a revision, 2017-02-25). It
> > looks like the revision parser used to just bail on "-h", because
> > revision.c would say "I don't recognize this" and then cmd_rev_list()
> > would similarly say "I don't recognize this" and call usage(). But now
> > we actually try to read it as a ref, which obviously requires being
> > inside a repository.
> 
> Heh, I found another ;-)  
> 
> 95e98cd9 ("revision.c: use refs_for_each*() instead of
> for_each_*_submodule()", 2017-04-19), which is in the middle of
> Duy's nd/prune-in-worktree series, does this:

Hrm, yeah. The problem is that handle_revision_pseudo_opt() initializes
the ref store at the top of the function, even if we don't see any
arguments that require us to use it (and obviously in the "-h" case, we
don't).

That's an implementation detail that we could fix, but I do think in
general that we should probably just declare it forbidden to call
setup_revisions() when the repo hasn't been discovered.

> I guess anything that calls setup_revisions() from the "git cmd -h"
> bypass need to be prepared with that
> 
>   check_help_option(argc, argv, usage, options);
> 
> thing.  Which is a bit sad, but I tend to agree with you that
> restructuring to make usage[] of everybody available to git.c
> is probably too noisy for the benefit it would give us.

The other options are:

  - reverting the "-h" magic in git.c. It really is the source of most
of this confusion, I think, because functions which assume RUN_SETUP
are having that assumption broken. But at the same time I do think
it makes "-h" a lot friendlier, and I'd prefer to keep it.

  - reverting the BUG() in setup_git_env(); this has been flushing out a
lot of bugs, and I think is worth keeping

I did look at writing something like check_help_option(). One of the
annoyances is that we have two different usage formats: one that's a
straight string for usage(), and one that's an array-of-strings for
parse_options(). We could probably unify those.

It doesn't actually save that much code, though. The real value is that
it abstracts the "did git.c decide to skip RUN_SETUP?" logic.

-Peff


Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-05-31 Thread Junio C Hamano
Jeff King <p...@peff.net> writes:

> Anyway, the problem is sk/dash-is-previous, specifically fc5684b47
> (revision.c: args starting with "-" might be a revision, 2017-02-25). It
> looks like the revision parser used to just bail on "-h", because
> revision.c would say "I don't recognize this" and then cmd_rev_list()
> would similarly say "I don't recognize this" and call usage(). But now
> we actually try to read it as a ref, which obviously requires being
> inside a repository.

Heh, I found another ;-)  

95e98cd9 ("revision.c: use refs_for_each*() instead of
for_each_*_submodule()", 2017-04-19), which is in the middle of
Duy's nd/prune-in-worktree series, does this:

#0  die (err=0x6128f8 "BUG: setup_git_env called without repository") at 
usage.c:114
#1  0x004f9467 in setup_git_env () at environment.c:172
#2  0x004f966c in get_git_dir () at environment.c:214
#3  0x0055113b in get_main_ref_store () at refs.c:1544
#4  0x00570ee0 in handle_revision_pseudo_opt (submodule=0x0, 
revs=0x7fffd6a0, argc=1, argv=0x7fffe180, flags=0x7fffc59c)
at revision.c:2110
#5  0x005716f5 in setup_revisions (argc=2, argv=0x7fffe178, 
revs=0x7fffd6a0, opt=0x0) at revision.c:2254
#6  0x0043074a in cmd_diff_files (argc=2, argv=0x7fffe178, 
prefix=0x0)
at builtin/diff-files.c:29
#7  0x00405907 in run_builtin (p=0x87ba00 <commands+672>, argc=2, 
argv=0x7fffe178) at git.c:376
#8  0x00405bb5 in handle_builtin (argc=2, argv=0x7fffe178) at 
git.c:584
#9  0x00405e04 in cmd_main (argc=2, argv=0x7fffe178) at git.c:683
#10 0x0000004a3364 in main (argc=2, argv=0x7fffe178) at common-main.c:43

when jk/consistent-h is merged into it and then "git diff-files -h"
is run.

I guess anything that calls setup_revisions() from the "git cmd -h"
bypass need to be prepared with that

  check_help_option(argc, argv, usage, options);

thing.  Which is a bit sad, but I tend to agree with you that
restructuring to make usage[] of everybody available to git.c
is probably too noisy for the benefit it would give us.




Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-05-30 Thread Junio C Hamano
Jeff King  writes:

> And the idea is that ranges like "-.." should work. TBH, I'm not sure
> how I feel about that, for exactly the reason that came up here: it
> makes it hard to syntactically differentiate the "-" shorthand from
> actual options. We do have @{-1} already for this purpose. I don't mind
> "-" as a shortcut for things like "git checkout -" or "git show -", but
> it feels like most of the benefit is lost when you're combining it with
> other operators.

All correct and that is why I haven't seriously considered merging
the topic further down (yet).  Things like -.. makes readers of the
commands go "Huh?", and "git tbdiff ..- -.." does not even work ;-)


Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-05-30 Thread Jeff King
[+cc Siddharth, so quoting copiously]

On Tue, May 30, 2017 at 11:27:56AM -0400, Jeff King wrote:

> > Travis seems to be seeing the same failure.  Curiously, the topic by
> > itself passes for me; iow, pu fails, pu^2 doesn't fail.
> > 
> > git.git/pu$ ./git rev-list -h
> > BUG: environment.c:173: setup_git_env called without repository
> > Aborted (core dumped)
> > 
> > Hmph...
> 
> Ah, OK, I can reproduce when merged with pu. Bisecting it was tricky.
> To see the problem, you need both my new test _and_ b1ef400ee
> (setup_git_env: avoid blind fall-back to ".git", 2016-10-20). The latter
> is only in v2.13, so topics forked from v2.12 need that commit applied.
> 
> Anyway, the problem is sk/dash-is-previous, specifically fc5684b47
> (revision.c: args starting with "-" might be a revision, 2017-02-25). It
> looks like the revision parser used to just bail on "-h", because
> revision.c would say "I don't recognize this" and then cmd_rev_list()
> would similarly say "I don't recognize this" and call usage(). But now
> we actually try to read it as a ref, which obviously requires being
> inside a repository.
> 
> Normally that's OK, but because of the "-h doesn't set up the repo"
> thing from 99caeed05, we may not have setup the repo, and so looking up
> refs is forbidden. The fix is probably to have revision.c explicitly
> recognize "-h" and bail on it as an unknown option (it can't handle
> the flag itself because only the caller knows the full usage()).
> 
> I do wonder, though, if there's any other downside to trying to look up
> other options as revisions (at least it wastes time doing nonsense
> revision lookups on options known only to cmd_rev_list()).  I'm not sure
> why that commit passes everything starting with a dash as a possible
> revision, rather than just "-".
> 
> I.e.:
> 
> diff --git a/revision.c b/revision.c
> index 5470c33ac..1e26c3951 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -2233,7 +2233,14 @@ int setup_revisions(int argc, const char **argv, 
> struct rev_info *revs, struct s
>   }
>   if (opts < 0)
>   exit(128);
> - maybe_opt = 1;
> + if (arg[1]) {
> + /* arg is an unknown option */
> + argv[left++] = arg;
> + continue;
> + } else {
> + /* special token "-" */
> + maybe_opt = 1;
> + }
>   }
>  
>  
> 
> I don't see anything in the commit message, but I didn't dig in the
> mailing list.

I think this line of reasoning comes from

  
http://public-inbox.org/git/20170206181026.GA4010@ubuntu-512mb-blr1-01.localdomain/

And the idea is that ranges like "-.." should work. TBH, I'm not sure
how I feel about that, for exactly the reason that came up here: it
makes it hard to syntactically differentiate the "-" shorthand from
actual options. We do have @{-1} already for this purpose. I don't mind
"-" as a shortcut for things like "git checkout -" or "git show -", but
it feels like most of the benefit is lost when you're combining it with
other operators.

-Peff


Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-05-30 Thread Jeff King
On Tue, May 30, 2017 at 10:23:54PM +0900, Junio C Hamano wrote:

> Jeff King <p...@peff.net> writes:
> 
> > Nope, I have those patches directly on your e83352ef23, and it passes. I
> > wonder if there's something funny between our environments. What does
> > the failure look like for you?
> 
> Travis seems to be seeing the same failure.  Curiously, the topic by
> itself passes for me; iow, pu fails, pu^2 doesn't fail.
> 
> git.git/pu$ ./git rev-list -h
> BUG: environment.c:173: setup_git_env called without repository
> Aborted (core dumped)
> 
> Hmph...

Ah, OK, I can reproduce when merged with pu. Bisecting it was tricky.
To see the problem, you need both my new test _and_ b1ef400ee
(setup_git_env: avoid blind fall-back to ".git", 2016-10-20). The latter
is only in v2.13, so topics forked from v2.12 need that commit applied.

Anyway, the problem is sk/dash-is-previous, specifically fc5684b47
(revision.c: args starting with "-" might be a revision, 2017-02-25). It
looks like the revision parser used to just bail on "-h", because
revision.c would say "I don't recognize this" and then cmd_rev_list()
would similarly say "I don't recognize this" and call usage(). But now
we actually try to read it as a ref, which obviously requires being
inside a repository.

Normally that's OK, but because of the "-h doesn't set up the repo"
thing from 99caeed05, we may not have setup the repo, and so looking up
refs is forbidden. The fix is probably to have revision.c explicitly
recognize "-h" and bail on it as an unknown option (it can't handle
the flag itself because only the caller knows the full usage()).

I do wonder, though, if there's any other downside to trying to look up
other options as revisions (at least it wastes time doing nonsense
revision lookups on options known only to cmd_rev_list()).  I'm not sure
why that commit passes everything starting with a dash as a possible
revision, rather than just "-".

I.e.:

diff --git a/revision.c b/revision.c
index 5470c33ac..1e26c3951 100644
--- a/revision.c
+++ b/revision.c
@@ -2233,7 +2233,14 @@ int setup_revisions(int argc, const char **argv, struct 
rev_info *revs, struct s
}
if (opts < 0)
exit(128);
-   maybe_opt = 1;
+   if (arg[1]) {
+   /* arg is an unknown option */
+   argv[left++] = arg;
+   continue;
+   } else {
+   /* special token "-" */
+   maybe_opt = 1;
+   }
}
 
 

I don't see anything in the commit message, but I didn't dig in the
mailing list. 


-Peff


Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-05-30 Thread Junio C Hamano
Jeff King <p...@peff.net> writes:

> Nope, I have those patches directly on your e83352ef23, and it passes. I
> wonder if there's something funny between our environments. What does
> the failure look like for you?

Travis seems to be seeing the same failure.  Curiously, the topic by
itself passes for me; iow, pu fails, pu^2 doesn't fail.

git.git/pu$ ./git rev-list -h
BUG: environment.c:173: setup_git_env called without repository
Aborted (core dumped)

Hmph...


Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-05-30 Thread Jeff King
On Tue, May 30, 2017 at 03:08:25PM +0900, Junio C Hamano wrote:

> Jeff King <p...@peff.net> writes:
> 
> > On Tue, May 30, 2017 at 03:03:18PM +0900, Junio C Hamano wrote:
> >
> >> > +test_expect_success 'generate builtin list' '
> >> > +git --list-builtins >builtins
> >> > +'
> >> > +
> >> > +while read builtin
> >> > +do
> >> > +    test_expect_success "$builtin can handle -h" '
> >> > +test_expect_code 129 git $builtin -h >output 2>&1 &&
> >> > +test_i18ngrep usage output
> >> > +'
> >> > +done  >> > +
> >> 
> >> These still seem to need further tweaks?
> >> 
> >> diff-files
> >> diff-index
> >> diff-tree
> >> rev-list
> >
> > How so? They pass the test for me, and the output for a manual run looks
> > sane.
> 
> Am I missing some patches (I have these 8) outside the series,
> perhaps?

Nope, I have those patches directly on your e83352ef23, and it passes. I
wonder if there's something funny between our environments. What does
the failure look like for you?

-Peff


Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-05-30 Thread Junio C Hamano
Jeff King <p...@peff.net> writes:

> On Tue, May 30, 2017 at 03:03:18PM +0900, Junio C Hamano wrote:
>
>> > +test_expect_success 'generate builtin list' '
>> > +  git --list-builtins >builtins
>> > +'
>> > +
>> > +while read builtin
>> > +do
>> > +  test_expect_success "$builtin can handle -h" '
>> > +  test_expect_code 129 git $builtin -h >output 2>&1 &&
>> > +  test_i18ngrep usage output
>> > +  '
>> > +done > > +
>> 
>> These still seem to need further tweaks?
>> 
>> diff-files
>> diff-index
>> diff-tree
>> rev-list
>
> How so? They pass the test for me, and the output for a manual run looks
> sane.

Am I missing some patches (I have these 8) outside the series,
perhaps?



Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-05-30 Thread Jeff King
On Tue, May 30, 2017 at 03:03:18PM +0900, Junio C Hamano wrote:

> > +test_expect_success 'generate builtin list' '
> > +   git --list-builtins >builtins
> > +'
> > +
> > +while read builtin
> > +do
> > +   test_expect_success "$builtin can handle -h" '
> > +   test_expect_code 129 git $builtin -h >output 2>&1 &&
> > +   test_i18ngrep usage output
> > +   '
> > +done  > +
> 
> These still seem to need further tweaks?
> 
> diff-files
> diff-index
> diff-tree
> rev-list

How so? They pass the test for me, and the output for a manual run looks
sane.

-Peff


Re: [PATCH 8/8] t0012: test "-h" with builtins

2017-05-30 Thread Junio C Hamano
Jeff King <p...@peff.net> writes:

> This patch just tests that "git foo -h" works for every
> builtin, where we see a 129 exit code (the normal code for
> our usage() helper), and that the word "usage" appears in
> the output.
>
> Signed-off-by: Jeff King <p...@peff.net>
> ---
>  t/t0012-help.sh | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/t/t0012-help.sh b/t/t0012-help.sh
> index 8faba2e8b..487b92a5d 100755
> --- a/t/t0012-help.sh
> +++ b/t/t0012-help.sh
> @@ -49,4 +49,16 @@ test_expect_success "--help does not work for guides" "
>   test_i18ncmp expect actual
>  "
>  
> +test_expect_success 'generate builtin list' '
> + git --list-builtins >builtins
> +'
> +
> +while read builtin
> +do
> + test_expect_success "$builtin can handle -h" '
> + test_expect_code 129 git $builtin -h >output 2>&1 &&
> + test_i18ngrep usage output
> + '
> +done  +

These still seem to need further tweaks?

diff-files
diff-index
diff-tree
rev-list



Re: [PATCH 0/8] consistent "-h" handling in builtins

2017-05-29 Thread Junio C Hamano
The series was pretty straight-forward.  Thanks for working on this.


Re: [PATCH 1/8] am: handle "-h" argument earlier

2017-05-29 Thread Junio C Hamano
Jeff King <p...@peff.net> writes:

> We can't easily move that setup to after the parse_options()
> call; the point is to set up defaults that are overwritten
> by the option parsing. Instead, we'll detect the "-h" case
> early and show the usage then. This matches the behavior of
> other builtins which have a similar setup-ordering issue
> (e.g., git-branch).

Thanks.  And this structure of the series is very much appreciated.


> Signed-off-by: Jeff King <p...@peff.net>
> ---
>  builtin/am.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/builtin/am.c b/builtin/am.c
> index 0f63dcab1..5ee146bfb 100644
> --- a/builtin/am.c
> +++ b/builtin/am.c
> @@ -2311,6 +2311,9 @@ int cmd_am(int argc, const char **argv, const char 
> *prefix)
>   OPT_END()
>   };
>  
> + if (argc == 2 && !strcmp(argv[1], "-h"))
> + usage_with_options(usage, options);
> +
>   git_config(git_am_config, NULL);
>  
>   am_state_init();


[PATCH 8/8] t0012: test "-h" with builtins

2017-05-29 Thread Jeff King
Since commit 99caeed05 (Let 'git  -h' show usage
without a git dir, 2009-11-09), the git wrapper handles "-h"
specially, skipping any repository setup but still calling
the builtin's cmd_foo() function. This means that every
cmd_foo() must be ready to handle this case, but we don't
have any systematic tests. This led to "git am -h" being
broken for some time without anybody noticing.

This patch just tests that "git foo -h" works for every
builtin, where we see a 129 exit code (the normal code for
our usage() helper), and that the word "usage" appears in
the output.

Signed-off-by: Jeff King <p...@peff.net>
---
 t/t0012-help.sh | 12 
 1 file changed, 12 insertions(+)

diff --git a/t/t0012-help.sh b/t/t0012-help.sh
index 8faba2e8b..487b92a5d 100755
--- a/t/t0012-help.sh
+++ b/t/t0012-help.sh
@@ -49,4 +49,16 @@ test_expect_success "--help does not work for guides" "
test_i18ncmp expect actual
 "
 
+test_expect_success 'generate builtin list' '
+   git --list-builtins >builtins
+'
+
+while read builtin
+do
+   test_expect_success "$builtin can handle -h" '
+   test_expect_code 129 git $builtin -h >output 2>&1 &&
+   test_i18ngrep usage output
+   '
+done 

[PATCH 5/8] submodule--helper: show usage for "-h"

2017-05-29 Thread Jeff King
Normal users shouldn't ever call submodule--helper, but it
doesn't hurt to give them a normal usage message if they try
"-h".

Signed-off-by: Jeff King <p...@peff.net>
---
The usage message isn't that helpful _either_, and I admit
my ulterior motive is just to make the test at the end of
this series pass. :)

I was tempted to actually dump the names from the commands
array to stdout, but this command really is an internal
helper. It's probably not worth spending time on such
niceties.

 builtin/submodule--helper.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 566a5b6a6..a78b003c2 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1222,9 +1222,8 @@ static struct cmd_struct commands[] = {
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
 {
int i;
-   if (argc < 2)
-   die(_("submodule--helper subcommand must be "
- "called with a subcommand"));
+       if (argc < 2 || !strcmp(argv[1], "-h"))
+   usage("git submodule--helper ");
 
for (i = 0; i < ARRAY_SIZE(commands); i++) {
if (!strcmp(argv[1], commands[i].cmd)) {
-- 
2.13.0.613.g11c956365



[PATCH 3/8] upload-archive: handle "-h" option early

2017-05-29 Thread Jeff King
Normally upload-archive forks off upload-archive--writer to
do the real work, and relays any errors back over the
sideband channel. This is a good thing when the command is
properly invoked remotely via ssh or git-daemon. But it's
confusing to curious users who try "git upload-archive -h".

Let's catch this invocation early and give a real usage
message, rather than spewing "-h does not appear to be a git
repository" amidst packet-lines. The chance of a false
positive due to a real client asking for the repo "-h" is
quite small.

Likewise, we'll catch "-h" in upload-archive--writer. People
shouldn't be invoking it manually, but it doesn't hurt to
give a sane message if they do.

Signed-off-by: Jeff King <p...@peff.net>
---
 builtin/upload-archive.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index cde06977b..84532ae9a 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -22,7 +22,7 @@ int cmd_upload_archive_writer(int argc, const char **argv, 
const char *prefix)
struct argv_array sent_argv = ARGV_ARRAY_INIT;
const char *arg_cmd = "argument ";
 
-   if (argc != 2)
+   if (argc != 2 || !strcmp(argv[1], "-h"))
usage(upload_archive_usage);
 
if (!enter_repo(argv[1], 0))
@@ -76,6 +76,9 @@ int cmd_upload_archive(int argc, const char **argv, const 
char *prefix)
 {
struct child_process writer = { argv };
 
+   if (argc == 2 && !strcmp(argv[1], "-h"))
+   usage(upload_archive_usage);
+
/*
 * Set up sideband subprocess.
 *
-- 
2.13.0.613.g11c956365



[PATCH 1/8] am: handle "-h" argument earlier

2017-05-29 Thread Jeff King
If the user provides "-h" on the command line, then our
parse_options() invocation will show a usage message and
quit. But if "-h" is the only argument, the git wrapper
behaves specially: it ignores our RUN_SETUP flag and calls
cmd_am() without having done repository setup at all.  This
is due to 99caeed05 (Let 'git  -h' show usage
without a git dir, 2009-11-09).

Before cmd_am() calls parse_options(), though, it runs a few
other setup functions. One of these is am_state_init(),
which uses git_pathdup() to set up the default rebase-apply
path. But calling git_pathdup() when we haven't done
repository setup will fall back to using ".git". That's
mostly harmless (since we won't use the value anyway), but
is forbidden since b1ef400eec ("setup_git_env: avoid blind
fall-back to ".git"", 2016-10-20), and we now BUG().

We can't easily move that setup to after the parse_options()
call; the point is to set up defaults that are overwritten
by the option parsing. Instead, we'll detect the "-h" case
early and show the usage then. This matches the behavior of
other builtins which have a similar setup-ordering issue
(e.g., git-branch).

Signed-off-by: Jeff King <p...@peff.net>
---
 builtin/am.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/builtin/am.c b/builtin/am.c
index 0f63dcab1..5ee146bfb 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -2311,6 +2311,9 @@ int cmd_am(int argc, const char **argv, const char 
*prefix)
OPT_END()
};
 
+   if (argc == 2 && !strcmp(argv[1], "-h"))
+   usage_with_options(usage, options);
+
git_config(git_am_config, NULL);
 
am_state_init();
-- 
2.13.0.613.g11c956365



[PATCH 0/8] consistent "-h" handling in builtins

2017-05-29 Thread Jeff King
On Mon, May 29, 2017 at 11:32:50AM -0400, Jeff King wrote:

> I'll try to put together patches in the next day or so. Comments welcome
> in the meantime.

So here they are. For those just joining us, the immediate problem is
that "git am -h" is broken (whether you're in a repo or not). That's
fixed by the first patch, which can go to "maint" separately (although
the rest are pretty unadventurous).

The rest of it cleans up "-h" handling for a variety of builtin
commands, and then adds a test to make sure they all behave sanely
(which unsurprisingly is how I found all the problems fixed by the
earlier patches).

  [1/8]: am: handle "-h" argument earlier
  [2/8]: credential: handle invalid arguments earlier
  [3/8]: upload-archive: handle "-h" option early
  [4/8]: remote-{ext,fd}: print usage message on invalid arguments
  [5/8]: submodule--helper: show usage for "-h"
  [6/8]: version: convert to parse-options
  [7/8]: git: add hidden --list-builtins option
  [8/8]: t0012: test "-h" with builtins

 builtin/am.c|  3 +++
 builtin/credential.c|  4 ++--
 builtin/remote-ext.c|  5 -
 builtin/remote-fd.c |  5 -
 builtin/submodule--helper.c |  5 ++---
 builtin/upload-archive.c|  5 -
 git.c   | 12 
 help.c  | 25 -
 t/t0012-help.sh | 12 
 9 files changed, 63 insertions(+), 13 deletions(-)

-Peff


Re: [PATCH v14 30/41] Move libified code from builtin/apply.c to apply.{c,h}

2016-09-06 Thread Stefan Beller
On Sun, Sep 4, 2016 at 1:18 PM, Christian Couder
<christian.cou...@gmail.com> wrote:
> As most of the apply code in builtin/apply.c has been libified by a number of
> previous commits, it can now be moved to apply.{c,h}, so that more code can
> use it.
>
> Helped-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>
> Helped-by: Ramsay Jones <ram...@ramsayjones.plus.com>
> Signed-off-by: Christian Couder <chrisc...@tuxfamily.org>
> ---
>  apply.c | 4731 ++
>  apply.h |   19 +
>  builtin/apply.c | 4733 
> +--
>  3 files changed, 4751 insertions(+), 4732 deletions(-)

So I wanted to review this patch, so I wrote a patch to review this patch. ;)
https://public-inbox.org/git/82367750-adea-6dee-198a-e39ac7a84...@gmail.com/T/#t

> +   if (!state->apply_in_reverse &&
> +   state->ws_error_action != nowarn_ws_error)
> +   check_whitespace(state, line, len, 
> patch->ws_rule);
> +   added++;
> +   newlines--;
> +   trailing = 0;
> +   break;
> +
> +   /*
> +* We allow "\ No newline at end of file". Depending
> + * on locale settings when the patch was produced we
> + * don't know what this line looks like. The only
> + * thing we do know is that it begins with "\ ".

The previous three lines are white space broken AFAICT. The seem to be
white space broken in the original location as well, so no need for a reroll
just for this. But in case you do a reroll, you may want to fix these
on the fly?

> +
> +int apply_option_parse_exclude(const struct option *opt,
> +  const char *arg, int unset)
> +
> +int apply_option_parse_include(const struct option *opt,
> +  const char *arg, int unset)
> +int apply_option_parse_p(const struct option *opt,
> +const char *arg,
> +int unset)

These three functions seem slightly different, not just moved.
Oh you removed the static key word!

Apart from the one minor nit and the removal of the static keyword,
the rest is just moved code.

Thanks,
Stefan


h

2016-09-06 Thread Chris B


Sent from my iPhone


Re: [PATCH v13 03/14] Move libified code from builtin/apply.c to apply.{c,h}

2016-09-01 Thread Christian Couder
On Wed, Aug 31, 2016 at 11:57 PM, Stefan Beller <sbel...@google.com> wrote:
> On Sat, Aug 27, 2016 at 11:45 AM, Christian Couder
> <christian.cou...@gmail.com> wrote:
>> As most of the apply code in builtin/apply.c has been libified by a number of
>> previous commits, it can now be moved to apply.{c,h}, so that more code can
>> use it.
>>
>> Helped-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>
>> Helped-by: Ramsay Jones <ram...@ramsayjones.plus.com>
>> Signed-off-by: Christian Couder <chrisc...@tuxfamily.org>
>> ---
>>  apply.c | 4731 
>> ++
>>  apply.h |   19 +
>>  builtin/apply.c | 4733 
>> +--
>
> I deduce by roughly the same line count in the .c files, it is just
> moving files over.
> (On my todo list I have an idea how to make reviewing patches like this 
> easier.)

Yeah, at one point I used some special options to try to get a smaller
diff, but it got lost along the way.

>>  3 files changed, 4751 insertions(+), 4732 deletions(-)
>>
>> diff --git a/apply.c b/apply.c
>> index 2eac3e3..7b96130 100644
>> --- a/apply.c
>> +++ b/apply.c
>> @@ -1,5 +1,23 @@
>> +/*
>> + * apply.c
>> + *
>> + * Copyright (C) Linus Torvalds, 2005
>
> We're very inconsistent with the intellectual property log.
> Sometimes we have that at the top of a file, sometimes we don't
> and rather point at the git history to find out who touched the code.
> I'd rather use the history instead of having a bunch of copyright lines.
> So maybe consider to drop this introductory comment as a
> {preparatory, follow up} cleanup?
>
> (This is also a nit that doesn't require a reroll on its own)

Yeah, if people are ok with it I may od it in a follow up patch.


Re: [PATCH v13 03/14] Move libified code from builtin/apply.c to apply.{c,h}

2016-08-31 Thread Stefan Beller
On Sat, Aug 27, 2016 at 11:45 AM, Christian Couder
<christian.cou...@gmail.com> wrote:
> As most of the apply code in builtin/apply.c has been libified by a number of
> previous commits, it can now be moved to apply.{c,h}, so that more code can
> use it.
>
> Helped-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>
> Helped-by: Ramsay Jones <ram...@ramsayjones.plus.com>
> Signed-off-by: Christian Couder <chrisc...@tuxfamily.org>
> ---
>  apply.c | 4731 ++
>  apply.h |   19 +
>  builtin/apply.c | 4733 
> +--

I deduce by roughly the same line count in the .c files, it is just
moving files over.
(On my todo list I have an idea how to make reviewing patches like this easier.)

>  3 files changed, 4751 insertions(+), 4732 deletions(-)
>
> diff --git a/apply.c b/apply.c
> index 2eac3e3..7b96130 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -1,5 +1,23 @@
> +/*
> + * apply.c
> + *
> + * Copyright (C) Linus Torvalds, 2005

We're very inconsistent with the intellectual property log.
Sometimes we have that at the top of a file, sometimes we don't
and rather point at the git history to find out who touched the code.
I'd rather use the history instead of having a bunch of copyright lines.
So maybe consider to drop this introductory comment as a
{preparatory, follow up} cleanup?

(This is also a nit that doesn't require a reroll on its own)


  1   2   3   >