Draft of Git Rev News edition 56

2019-10-23 Thread Christian Couder
Hi everyone!

A draft of a new Git Rev News edition is available here:

  https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-56.md

Everyone is welcome to contribute in any section either by editing the
above page on GitHub and sending a pull request, or by commenting on
this GitHub issue:

  https://github.com/git/git.github.io/issues/399

You can also reply to this email.

In general all kinds of contribution, for example proofreading,
suggestions for articles or links, help on the issues in GitHub, and
so on, are very much appreciated.

I tried to cc everyone who appears in this edition, but maybe I missed
some people, sorry about that.

Jakub, Markus, Gabriel and I plan to publish this edition on Saturday,
October 26th. Yeah, this is a bit unusual as we usually try to publish
on Wednesdays, sorry about that!

Thanks,
Christian.


Re: [GSoC] Follow-up post

2019-10-22 Thread Christian Couder
Hi Matheus,

On Mon, Oct 21, 2019 at 7:14 PM Matheus Tavares Bernardino
 wrote:
>
> I wrote a small follow-up post to talk about the conclusion of my GSoC
> project. I believe the main remaining tasks are now finally complete
> :) If you would be interested in taking a look, the post is at
> https://matheustavares.gitlab.io/posts/gsoc-follow-ups

Thank you for the follow-up blog post!

It is a great explanation of the work you did to come up with the
current mt/threaded-grep-in-object-store!

Best,
Christian.


Re: Outreachy Winter 2019

2019-10-22 Thread Christian Couder
Hi Karina,

Please see my answer below.

On Mon, Oct 21, 2019 at 6:59 PM Karina Saucedo
 wrote:
>
> Hello, my name is Karina and I'm an Outreachy applicant.
> I'm interested in applying to the project 'Add did you mean hints' and
> I was wondering how I can start contributing since there seem to be no
> issues on the GitHub page. Thank you!

Thank you for your introduction email and for your interest in Git!

Emily posted some interesting information on the Git mailing list that
you can see in the archives there:

https://public-inbox.org/git/20191007203654.ga20...@google.com/

We require Outreachy (and GSoC) applicants to work on a microproject
before they can be selected. There are microproject suggestions in
Emily's email and the following discussions.

Unfortunately we only have a web page that we prepared for the last
Google Summer of Code:

https://git.github.io/SoC-2019-Microprojects/

So it might not be up to date, but you can still find interesting
information there on top of what Emily posted.

Don't hesitate to ask if you have any further questions.

Best,
Christian.


Re: [PATCH v2 9/9] pack-objects: improve partial packfile reuse

2019-10-19 Thread Christian Couder
Hi Philip,

On Sat, Oct 19, 2019 at 5:30 PM Philip Oakley  wrote:

> On 19/10/2019 11:35, Christian Couder wrote:

> > +static void write_reused_pack_one(size_t pos, struct hashfile *out,
> > +   struct pack_window **w_curs)
> > +{
> > + off_t offset, next, cur;
> > + enum object_type type;
> > + unsigned long size;
>
> Is this a mem_sized size or a counter for less than 4GiB items?

What I can see is that `&size` is passed as the last argument to
unpack_object_header() below. And unpack_object_header() is declared in
packfile.h like this:

int unpack_object_header(struct packed_git *, struct pack_window **,
off_t *, unsigned long *);

since at least 336226c259 (packfile.h: drop extern from function
declarations, 2019-04-05)

So fixing this, if it needs to be fixed, should probably be part of a
separate topic fixing unpack_object_header().
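
For context, a standalone illustration of why an `unsigned long` size
raises eyebrows when thinking about the 4GiB limit (hypothetical example
code, not part of the series): on LLP64 platforms such as 64-bit Windows,
`unsigned long` stays 32 bits even though pointers and size_t are 64 bits,
so an object size of 4GiB or more would not fit.

    #include <stdio.h>

    /*
     * Illustration only: on LLP64 platforms (e.g. 64-bit Windows) the first
     * line prints 4, so an object size of 4GiB or more does not fit in an
     * "unsigned long", while size_t is still 8 bytes there.
     */
    int main(void)
    {
        printf("sizeof(unsigned long) = %zu\n", sizeof(unsigned long));
        printf("sizeof(size_t)        = %zu\n", sizeof(size_t));
        return 0;
    }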

> > +
> > + offset = reuse_packfile->revindex[pos].offset;
> > + next = reuse_packfile->revindex[pos + 1].offset;
> >
> > - if (reuse_packfile_offset < 0)
> > - reuse_packfile_offset = reuse_packfile->pack_size - 
> > the_hash_algo->rawsz;
> > + record_reused_object(offset, offset - hashfile_total(out));
> >
> > - total = to_write = reuse_packfile_offset - sizeof(struct pack_header);
> > + cur = offset;
> > + type = unpack_object_header(reuse_packfile, w_curs, &cur, &size);
> > + assert(type >= 0);

> > +static void try_partial_reuse(struct bitmap_index *bitmap_git,
> > +   size_t pos,
> > +   struct bitmap *reuse,
> > +   struct pack_window **w_curs)
> >   {
> > + struct revindex_entry *revidx;
> > + off_t offset;
> > + enum object_type type;
> > + unsigned long size;
>
> Is this mem_sized or a <4GiB size?

Again this `size` variable is passed as the last argument to
unpack_object_header() below.

...

> Apologies if these are dumb queries..

I think it's a valid concern, so it's certainly ok for you to ask
these questions.


Re: [PATCH v2 5/9] pack-bitmap: introduce bitmap_walk_contains()

2019-10-19 Thread Christian Couder
Hi Philip,

On Sat, Oct 19, 2019 at 5:25 PM Philip Oakley  wrote:
>
> Hi Christian,
> can I check one thing?

Yeah, sure! Thanks for taking a look at my patches!

> On 19/10/2019 11:35, Christian Couder wrote:

> > +int bitmap_walk_contains(struct bitmap_index *bitmap_git,
> > +  struct bitmap *bitmap, const struct object_id *oid)
> > +{
> > + int idx;
> Excuse my ignorance here...
>
> For the case on Windows (int/long 32 bit), is this return value
> guaranteed to be less than 2GiB, i.e. not a memory offset?
>
> I'm just thinking ahead to the resolution of the 4GiB file limit issue
> on Git-for-Windows (https://github.com/git-for-windows/git/pull/2179)

I understand your concern. Unfortunately, below we have:

idx = bitmap_position(bitmap_git, oid);

and bitmap_position() returns an int at least since 3ae5fa0768
(pack-bitmap: remove bitmap_git global variable, 2018-06-07)

So I think the fix would be much more involved than just changing the
type of the idx variable. It would likely involve modifying
bitmap_position(), and thus would probably best be addressed in a
separate patch series.

> > +
> > + if (!bitmap)
> > + return 0;
> > +
> > + idx = bitmap_position(bitmap_git, oid);
> > + return idx >= 0 && bitmap_get(bitmap, idx);
> > +}


[PATCH v2 8/9] builtin/pack-objects: introduce obj_is_packed()

2019-10-19 Thread Christian Couder
From: Jeff King 

Let's refactor the way we check if an object is packed by
introducing obj_is_packed(). This function is now a simple
wrapper around packlist_find(), but it will evolve in a
following commit.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 builtin/pack-objects.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4fcfcf6097..08898331ef 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2553,6 +2553,11 @@ static void ll_find_deltas(struct object_entry **list, 
unsigned list_size,
free(p);
 }
 
+static int obj_is_packed(const struct object_id *oid)
+{
+   return !!packlist_find(&to_pack, oid);
+}
+
 static void add_tag_chain(const struct object_id *oid)
 {
struct tag *tag;
@@ -2564,7 +2569,7 @@ static void add_tag_chain(const struct object_id *oid)
 * it was included via bitmaps, we would not have parsed it
 * previously).
 */
-   if (packlist_find(&to_pack, oid))
+   if (obj_is_packed(oid))
return;
 
tag = lookup_tag(the_repository, oid);
@@ -2588,7 +2593,7 @@ static int add_ref_tag(const char *path, const struct 
object_id *oid, int flag,
 
if (starts_with(path, "refs/tags/") && /* is a tag? */
!peel_ref(path, &peeled)&& /* peelable? */
-   packlist_find(&to_pack, &peeled))  /* object packed? */
+   obj_is_packed(&peeled)) /* object packed? */
add_tag_chain(oid);
return 0;
 }
-- 
2.24.0.rc0.9.gef620577e2



[PATCH v2 7/9] pack-objects: introduce pack.allowPackReuse

2019-10-19 Thread Christian Couder
From: Jeff King 

Let's make it possible to configure if we want pack reuse or not.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 Documentation/config/pack.txt | 4 
 builtin/pack-objects.c| 8 +++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/pack.txt b/Documentation/config/pack.txt
index 1d66f0c992..58323a351f 100644
--- a/Documentation/config/pack.txt
+++ b/Documentation/config/pack.txt
@@ -27,6 +27,10 @@ Note that changing the compression level will not 
automatically recompress
 all existing objects. You can force recompression by passing the -F option
 to linkgit:git-repack[1].
 
+pack.allowPackReuse::
+   When true, which is the default, Git will try to reuse parts
+   of existing packfiles when preparing new packfiles.
+
 pack.island::
An extended regular expression configuring a set of delta
islands. See "DELTA ISLANDS" in linkgit:git-pack-objects[1]
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f2c2703090..4fcfcf6097 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -96,6 +96,7 @@ static off_t reuse_packfile_offset;
 
 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
+static int allow_pack_reuse = 1;
 static enum {
WRITE_BITMAP_FALSE = 0,
WRITE_BITMAP_QUIET,
@@ -2699,6 +2700,10 @@ static int git_pack_config(const char *k, const char *v, 
void *cb)
use_bitmap_index_default = git_config_bool(k, v);
return 0;
}
+   if (!strcmp(k, "pack.allowpackreuse")) {
+   allow_pack_reuse = git_config_bool(k, v);
+   return 0;
+   }
if (!strcmp(k, "pack.threads")) {
delta_search_threads = git_config_int(k, v);
if (delta_search_threads < 0)
@@ -3030,7 +3035,8 @@ static void loosen_unused_packed_objects(void)
  */
 static int pack_options_allow_reuse(void)
 {
-   return pack_to_stdout &&
+   return allow_pack_reuse &&
+  pack_to_stdout &&
   allow_ofs_delta &&
   !ignore_packed_keep_on_disk &&
   !ignore_packed_keep_in_core &&
-- 
2.24.0.rc0.9.gef620577e2



[PATCH v2 1/9] builtin/pack-objects: report reused packfile objects

2019-10-19 Thread Christian Couder
From: Jeff King 

To see when packfile reuse kicks in or not, it is useful to
show reused packfile objects statistics in the output of
upload-pack.

Helped-by: James Ramsay 
Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 builtin/pack-objects.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5876583220..f2c2703090 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3509,7 +3509,9 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
if (progress)
fprintf_ln(stderr,
   _("Total %"PRIu32" (delta %"PRIu32"),"
-" reused %"PRIu32" (delta %"PRIu32")"),
-  written, written_delta, reused, reused_delta);
+" reused %"PRIu32" (delta %"PRIu32"),"
+" pack-reused %"PRIu32),
+  written, written_delta, reused, reused_delta,
+  reuse_packfile_objects);
return 0;
 }
-- 
2.24.0.rc0.9.gef620577e2



[PATCH v2 9/9] pack-objects: improve partial packfile reuse

2019-10-19 Thread Christian Couder
From: Jeff King 

Let's store the chunks of the packfile that we reuse in
a dynamic array of `struct reused_chunk`, and let's use
a reuse_packfile_bitmap to speed up reusing parts of
packfiles.

The dynamic array of `struct reused_chunk` is useful
because we need to know not just the number of zero bits,
but the accumulated offset due to missing objects. So
without the array we'd end up having to walk over the
revindex for that set of objects. The array is
basically caching those accumulated offsets (for the
parts we _do_ include), so we don't have to compute them
repeatedly.

The old code just tried to dump a whole segment of the
pack verbatim. That's faster than the traditional way of
actually adding objects to the packing list, but it
didn't kick in very often. This new code is really going
for a middle ground: do _some_ per-object work, but way
less than we'd traditionally do.

For instance, packing torvalds/linux on GitHub servers
just now reused 6.5M objects, but only needed ~50k
chunks.

Additional checks are added in have_duplicate_entry()
and obj_is_packed() to avoid duplicate objects in the
reuse bitmap. It was probably buggy to not have such a
check before.

If a client both asks for a tag by sha1 and specifies
"include-tag", we may end up including the tag in the reuse
bitmap (due to the first thing), and then later adding it to
the packlist (due to the second). This results in duplicate
objects in the pack, which git chokes on. We should notice
that we are already including it when doing the include-tag
portion, and avoid adding it to the packlist.

The simplest place to fix this is right in add_ref_tag,
where we could avoid peeling the tag at all if we know that
we are already including it. However, I've pushed the check
instead into have_duplicate_entry(). This fixes not only
this case, but also means that we cannot have any similar
problems lurking in other code.

No tests, because git does not actually exhibit this "ask
for it and also include-tag" behavior. We do one or the
other on clone, depending on whether --single-branch is set.
However, libgit2 does both.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 builtin/pack-objects.c | 214 -
 pack-bitmap.c  | 150 +
 pack-bitmap.h  |   3 +-
 3 files changed, 280 insertions(+), 87 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 08898331ef..f710f8dde3 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -92,7 +92,7 @@ static struct progress *progress_state;
 
 static struct packed_git *reuse_packfile;
 static uint32_t reuse_packfile_objects;
-static off_t reuse_packfile_offset;
+static struct bitmap *reuse_packfile_bitmap;
 
 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
@@ -785,57 +785,177 @@ static struct object_entry **compute_write_order(void)
return wo;
 }
 
-static off_t write_reused_pack(struct hashfile *f)
+/*
+ * Record the offsets needed in our reused packfile chunks due to
+ * "gaps" where we omitted some objects.
+ */
+static struct reused_chunk {
+   off_t start;
+   off_t offset;
+} *reused_chunks;
+static int reused_chunks_nr;
+static int reused_chunks_alloc;
+
+static void record_reused_object(off_t where, off_t offset)
 {
-   unsigned char buffer[8192];
-   off_t to_write, total;
-   int fd;
+   if (reused_chunks_nr && reused_chunks[reused_chunks_nr-1].offset == 
offset)
+   return;
 
-   if (!is_pack_valid(reuse_packfile))
-   die(_("packfile is invalid: %s"), reuse_packfile->pack_name);
+   ALLOC_GROW(reused_chunks, reused_chunks_nr + 1,
+  reused_chunks_alloc);
+   reused_chunks[reused_chunks_nr].start = where;
+   reused_chunks[reused_chunks_nr].offset = offset;
+   reused_chunks_nr++;
+}
 
-   fd = git_open(reuse_packfile->pack_name);
-   if (fd < 0)
-   die_errno(_("unable to open packfile for reuse: %s"),
- reuse_packfile->pack_name);
+/*
+ * Binary search to find the chunk that "where" is in. Note
+ * that we're not looking for an exact match, just the first
+ * chunk that contains it (which implicitly ends at the start
+ * of the next chunk).
+ */
+static off_t find_reused_offset(off_t where)
+{
+   int lo = 0, hi = reused_chunks_nr;
+   while (lo < hi) {
+   int mi = lo + ((hi - lo) / 2);
+   if (where == reused_chunks[mi].start)
+   return reused_chunks[mi].offset;
+   if (where < reused_chunks[mi].start)
+   hi = mi;
+   else
+   lo = mi + 1;
+   }
 
-   if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1)
-   die_errno(_("

[PATCH v2 3/9] ewah/bitmap: introduce bitmap_word_alloc()

2019-10-19 Thread Christian Couder
From: Jeff King 

In a following commit we will need to allocate a variable
number of bitmap words, instead of always 32, so let's add
bitmap_word_alloc() for this purpose.

We will also always access at least one word for each bitmap,
so we want to make sure that at least one is always
allocated.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 ewah/bitmap.c | 13 +
 ewah/ewok.h   |  1 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 52f1178db4..b5fed9621f 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -22,21 +22,26 @@
 #define EWAH_MASK(x) ((eword_t)1 << (x % BITS_IN_EWORD))
 #define EWAH_BLOCK(x) (x / BITS_IN_EWORD)
 
-struct bitmap *bitmap_new(void)
+struct bitmap *bitmap_word_alloc(size_t word_alloc)
 {
struct bitmap *bitmap = xmalloc(sizeof(struct bitmap));
-   bitmap->words = xcalloc(32, sizeof(eword_t));
-   bitmap->word_alloc = 32;
+   bitmap->words = xcalloc(word_alloc, sizeof(eword_t));
+   bitmap->word_alloc = word_alloc;
return bitmap;
 }
 
+struct bitmap *bitmap_new(void)
+{
+   return bitmap_word_alloc(32);
+}
+
 void bitmap_set(struct bitmap *self, size_t pos)
 {
size_t block = EWAH_BLOCK(pos);
 
if (block >= self->word_alloc) {
size_t old_size = self->word_alloc;
-   self->word_alloc = block * 2;
+   self->word_alloc = block ? block * 2 : 1;
REALLOC_ARRAY(self->words, self->word_alloc);
memset(self->words + old_size, 0x0,
(self->word_alloc - old_size) * sizeof(eword_t));
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 84b2a29faa..1b98b57c8b 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -172,6 +172,7 @@ struct bitmap {
 };
 
 struct bitmap *bitmap_new(void);
+struct bitmap *bitmap_word_alloc(size_t word_alloc);
 void bitmap_set(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_reset(struct bitmap *self);
-- 
2.24.0.rc0.9.gef620577e2



[PATCH v2 4/9] pack-bitmap: don't rely on bitmap_git->reuse_objects

2019-10-19 Thread Christian Couder
From: Jeff King 

We will no longer compute bitmap_git->reuse_objects in a
following commit, so we cannot rely on it anymore to
terminate the loop early; we have to iterate to the end.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 pack-bitmap.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index e07c798879..016d0319fc 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -622,7 +622,7 @@ static void show_objects_for_type(
enum object_type object_type,
show_reachable_fn show_reach)
 {
-   size_t pos = 0, i = 0;
+   size_t i = 0;
uint32_t offset;
 
struct ewah_iterator it;
@@ -630,13 +630,15 @@ static void show_objects_for_type(
 
struct bitmap *objects = bitmap_git->result;
 
-   if (bitmap_git->reuse_objects == bitmap_git->pack->num_objects)
-   return;
-
ewah_iterator_init(&it, type_filter);
 
-   while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
+   for (i = 0; i < objects->word_alloc &&
+   ewah_iterator_next(&filter, &it); i++) {
eword_t word = objects->words[i] & filter;
+   size_t pos = (i * BITS_IN_EWORD);
+
+   if (!word)
+   continue;
 
for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
struct object_id oid;
@@ -648,9 +650,6 @@ static void show_objects_for_type(
 
offset += ewah_bit_ctz64(word >> offset);
 
-   if (pos + offset < bitmap_git->reuse_objects)
-   continue;
-
entry = &bitmap_git->pack->revindex[pos + offset];
nth_packed_object_oid(&oid, bitmap_git->pack, 
entry->nr);
 
@@ -659,9 +658,6 @@ static void show_objects_for_type(
 
show_reach(&oid, object_type, 0, hash, 
bitmap_git->pack, entry->offset);
}
-
-   pos += BITS_IN_EWORD;
-   i++;
}
 }
 
-- 
2.24.0.rc0.9.gef620577e2



[PATCH v2 5/9] pack-bitmap: introduce bitmap_walk_contains()

2019-10-19 Thread Christian Couder
From: Jeff King 

We will use this helper function in a following commit to
tell us if an object is packed.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 pack-bitmap.c | 12 
 pack-bitmap.h |  3 +++
 2 files changed, 15 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 016d0319fc..8a51302a1a 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -826,6 +826,18 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index 
*bitmap_git,
return 0;
 }
 
+int bitmap_walk_contains(struct bitmap_index *bitmap_git,
+struct bitmap *bitmap, const struct object_id *oid)
+{
+   int idx;
+
+   if (!bitmap)
+   return 0;
+
+   idx = bitmap_position(bitmap_git, oid);
+   return idx >= 0 && bitmap_get(bitmap, idx);
+}
+
 void traverse_bitmap_commit_list(struct bitmap_index *bitmap_git,
 show_reachable_fn show_reachable)
 {
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 466c5afa09..6ab6033dbe 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -3,6 +3,7 @@
 
 #include "ewah/ewok.h"
 #include "khash.h"
+#include "pack.h"
 #include "pack-objects.h"
 
 struct commit;
@@ -53,6 +54,8 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
 int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data 
*mapping,
 kh_oid_map_t *reused_bitmaps, int show_progress);
 void free_bitmap_index(struct bitmap_index *);
+int bitmap_walk_contains(struct bitmap_index *,
+struct bitmap *bitmap, const struct object_id *oid);
 
 /*
  * After a traversal has been performed by prepare_bitmap_walk(), this can be
-- 
2.24.0.rc0.9.gef620577e2



[PATCH v2 6/9] csum-file: introduce hashfile_total()

2019-10-19 Thread Christian Couder
From: Jeff King 

We will need this helper function in a following commit
to give us total number of bytes fed to the hashfile so far.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 csum-file.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/csum-file.h b/csum-file.h
index a98b1eee53..f9cbd317fb 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -42,6 +42,15 @@ void hashflush(struct hashfile *f);
 void crc32_begin(struct hashfile *);
 uint32_t crc32_end(struct hashfile *);
 
+/*
+ * Returns the total number of bytes fed to the hashfile so far (including ones
+ * that have not been written out to the descriptor yet).
+ */
+static inline off_t hashfile_total(struct hashfile *f)
+{
+   return f->total + f->offset;
+}
+
 static inline void hashwrite_u8(struct hashfile *f, uint8_t data)
 {
hashwrite(f, &data, sizeof(data));
-- 
2.24.0.rc0.9.gef620577e2



[PATCH v2 2/9] packfile: expose get_delta_base()

2019-10-19 Thread Christian Couder
From: Jeff King 

In a following commit get_delta_base() will be used outside
packfile.c, so let's make it non static and declare it in
packfile.h.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 packfile.c | 10 +-
 packfile.h |  3 +++
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/packfile.c b/packfile.c
index 355066de17..81e66847bf 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1173,11 +1173,11 @@ const struct packed_git *has_packed_and_bad(struct 
repository *r,
return NULL;
 }
 
-static off_t get_delta_base(struct packed_git *p,
-   struct pack_window **w_curs,
-   off_t *curpos,
-   enum object_type type,
-   off_t delta_obj_offset)
+off_t get_delta_base(struct packed_git *p,
+struct pack_window **w_curs,
+off_t *curpos,
+enum object_type type,
+off_t delta_obj_offset)
 {
unsigned char *base_info = use_pack(p, w_curs, *curpos, NULL);
off_t base_offset;
diff --git a/packfile.h b/packfile.h
index fc7904ec81..ec536a4ae5 100644
--- a/packfile.h
+++ b/packfile.h
@@ -151,6 +151,9 @@ void *unpack_entry(struct repository *r, struct packed_git 
*, off_t, enum object
 unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned 
long len, enum object_type *type, unsigned long *sizep);
 unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, 
off_t);
 int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, 
unsigned long *);
+off_t get_delta_base(struct packed_git *p, struct pack_window **w_curs,
+off_t *curpos, enum object_type type,
+off_t delta_obj_offset);
 
 void release_pack_memory(size_t);
 
-- 
2.24.0.rc0.9.gef620577e2



[PATCH v2 0/9] Rewrite packfile reuse code

2019-10-19 Thread Christian Couder
This patch series is rewriting the code that tries to reuse existing
packfiles.

The code in this patch series was written by GitHub, and Peff nicely
provided it in the following discussion:

https://public-inbox.org/git/3e56b0fd-ebe8-4057-a93a-16ebb09fb...@jramsay.com.au/

The first version of this patch series was also discussed:

https://public-inbox.org/git/20190913130226.7449-1-chrisc...@tuxfamily.org/

Thanks to the reviewers!

According to Peff this new code is a lot smarter than what it
replaces. It allows "holes" in the chunks of packfile to be reused,
and skips over them. It rewrites OFS_DELTA offsets as it goes to
account for the holes. So it's basically a linear walk over the
packfile, but with the important distinction that we don't add those
objects to the object_entry array, which makes them very lightweight
(especially in memory use, but they also aren't considered bases for
finding new deltas, etc). It seems like a good compromise between the
cost to serve a clone and the quality of the resulting packfile.
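
To picture the OFS_DELTA rewriting mentioned above, here is a rough
standalone sketch (hypothetical code, not the code from the last patch):
an OFS_DELTA records the distance back to its base, and when some objects
between the delta and its base are omitted from the output, that distance
shrinks by the bytes skipped in between, which is what the reused-chunk
bookkeeping makes cheap to look up.

    #include <sys/types.h>

    /*
     * Hypothetical sketch, not the code from the last patch: adjust an
     * OFS_DELTA back-distance when objects between the delta and its base
     * were omitted. skipped_before() stands in for the accumulated-gap
     * lookup that the reused_chunk array provides in the real series.
     */
    static off_t skipped_before(off_t original_offset)
    {
        (void)original_offset;
        return 0; /* stub: pretend no gaps were recorded */
    }

    static off_t adjusted_ofs_delta(off_t delta_offset, off_t base_offset)
    {
        off_t fixup = skipped_before(delta_offset) - skipped_before(base_offset);
        /* original back-distance minus the bytes omitted in between */
        return (delta_offset - base_offset) - fixup;
    }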

Changes since the previous patch series are the following:

  - Rebased onto current master (v2.24.0-rc0), which means that
conflicts due to dropping the last argument of packlist_find() are
resolved.

  - Replaced "in a following patch" with "in a following commit" in
commit messages.

  - Squashed previous patches 3/10 and 4/10 into new patch 3/9 as
suggested by Peff and Jonathan.

  - Changed commit message of patch 3/9, so that the justification
should be correct now.

  - Also in patch 3/9, `block ? block * 2 : 1` is now used to compute
the number of words that should be allocated.

  - Changed commit message of patch 4/9 (previously 5/10), so that the
justification should be correct now.

  - Document pack.allowPackReuse in patch 7/9.

  - Also in patch 7/9 move using the `allow_pack_reuse` variable into
pack_options_allow_reuse() as suggested by Junio.

  - Improved commit message in patch 9/9 a lot by adding many comments
made by Peff about it.

  - In patch 9/9 removed `if (0) { ... }` code.

  - Also in patch 9/9 changed parameter name from
`struct bitmap **bitmap` to `struct bitmap **reuse_out` in
pack-bitmap.h as suggested by Jonathan.

I tried to use `git range-diff` to see if I could send its output to
compare this patch series with the previous one, but it looks like it
unfortunately thinks that new patch 7/9 is completely different than
previous patch 8/10, and also it shows a lot more changes from patch
10/10 to patch 9/9 than necessary for some reason. So I decided not to
send it, but if you want it anyway please tell me.

I have put Peff as the author of all the commits.

Jeff King (9):
  builtin/pack-objects: report reused packfile objects
  packfile: expose get_delta_base()
  ewah/bitmap: introduce bitmap_word_alloc()
  pack-bitmap: don't rely on bitmap_git->reuse_objects
  pack-bitmap: introduce bitmap_walk_contains()
  csum-file: introduce hashfile_total()
  pack-objects: introduce pack.allowPackReuse
  builtin/pack-objects: introduce obj_is_packed()
  pack-objects: improve partial packfile reuse

 Documentation/config/pack.txt |   4 +
 builtin/pack-objects.c| 235 +++---
 csum-file.h   |   9 ++
 ewah/bitmap.c |  13 +-
 ewah/ewok.h   |   1 +
 pack-bitmap.c | 178 +
 pack-bitmap.h |   6 +-
 packfile.c|  10 +-
 packfile.h|   3 +
 9 files changed, 349 insertions(+), 110 deletions(-)

-- 
2.24.0.rc0.9.gef620577e2



Re: [RFC PATCH 05/10] pack-bitmap: don't rely on bitmap_git->reuse_objects

2019-10-11 Thread Christian Couder
On Fri, Oct 11, 2019 at 1:44 AM Jonathan Tan  wrote:
>
> > As we now allocate 2 more words than necessary for each
> > bitmap to serve as marks telling us that we can stop
> > iterating over the words, we don't need to rely on
> > bitmap_git->reuse_objects to stop iterating over the words.
>
> As Peff states [1], this justification is probably incorrect as well.
> The actual justification seems to be that we will no longer compute
> reuse_objects (in a future patch), so we cannot rely on it anymore to
> terminate the loop early; we have to iterate to the end.
>
> [1] https://public-inbox.org/git/20191002155721.gd6...@sigill.intra.peff.net/

Ok, thanks! I will change the description.


Re: [RFC PATCH 04/10] ewah/bitmap: always allocate 2 more words

2019-10-11 Thread Christian Couder
On Fri, Oct 11, 2019 at 1:40 AM Jonathan Tan  wrote:
>
> > From: Jeff King 
> >
> > In a following patch we will allocate a variable number
> > of words in some bitmaps. When iterating over the words we
> > will need a mark to tell us when to stop iterating. Let's
> > always allocate 2 more words, that will always contain 0,
> > as that mark.
>
> [snip]
>
> >   if (block >= self->word_alloc) {
> >   size_t old_size = self->word_alloc;
> > - self->word_alloc = block * 2;
> > + self->word_alloc = (block + 1) * 2;
> >   REALLOC_ARRAY(self->words, self->word_alloc);
> >   memset(self->words + old_size, 0x0,
> >   (self->word_alloc - old_size) * sizeof(eword_t));
>
> This patch set was mentioned as needing more thorough review in "What's
> Cooking" [1], so I thought I'd give it a try.

Thanks!

> As Peff said [2], the
> justification in the commit message looks incorrect. He suggests that it
> is most likely because "block" might be 0 (which is possible because a
> previous patch eliminated the minimum of 32), which makes sense to me.

Ok, I will try to come up with a better justification, though Peff said
that he would take another look at this series and I'd rather wait
until he has done that.

> In any case, the next patch does not use 0 as a sentinel mark. Iteration
> stops when word_alloc is reached anyway, and since this is a regular
> bitmap, 0 is a valid word and cannot be used as a sentinel. (Maybe 0 is
> a valid word in a compressed EWAH bitmap too...not sure about that.)

Yeah I misread this. Hopefully Peff can shed some light on this.

> I think this should be squashed with patch 3, adding to that commit
> message "since word_alloc might be 0, we need to change the growth
> function". (Or just make the minimum word_alloc be 1 or 32 or something
> positive, if that's possible.)

Yeah, thank you for the suggestion. I still wonder why 2 is added
instead of just 1 though.
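
For reference, here is a tiny standalone illustration (hypothetical code,
not the ewah implementation) of the `block == 0` case being discussed:
with the old minimum of 32 gone, a plain `block * 2` growth computes 0
words for the very first bit, while the `block ? block * 2 : 1` form that
the v2 series uses in patch 3/9 grows to at least one word.

    #include <stdio.h>

    /* naive growth: stays at 0 when block == 0 */
    static size_t grow_naive(size_t block) { return block * 2; }

    /* growth used in v2's patch 3/9: always at least one word */
    static size_t grow_v2(size_t block) { return block ? block * 2 : 1; }

    int main(void)
    {
        size_t block;
        for (block = 0; block < 4; block++)
            printf("block=%zu: naive=%zu, v2=%zu\n",
                   block, grow_naive(block), grow_v2(block));
        return 0;
    }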


Re: [RFC PATCH 10/10] pack-objects: improve partial packfile reuse

2019-10-11 Thread Christian Couder
On Fri, Oct 11, 2019 at 1:59 AM Jonathan Tan  wrote:
>
> I'm going to start with pack-bitmap.h, then builtin/pack-objects.c.
>
> >  int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
> >  struct packed_git **packfile,
> > -uint32_t *entries, off_t *up_to);
> > +uint32_t *entries,
> > +struct bitmap **bitmap);
>
> Makes sense. The existing code determines if the given packfile is
> suitable, and if yes, the span of the packfile to use. With this patch,
> we resolve to use the given packfile, and want to compute which objects
> to use, storing the computation result in an uncompressed bitmap.
> (Resolving to use the given packfile is not that much different from
> existing behavior, as far as I know, since we currently only consider
> one packfile at most anyway.)
>
> So replacing the out param makes sense, although a more descriptive name
> than "bitmap" would be nice.

Yeah, in pack-bitmap.c this argument is called "reuse_out", so the
same name could be used in pack-bitmap.h too. I changed that in my
current version.

> > +/*
> > + * Record the offsets needed in our reused packfile chunks due to
> > + * "gaps" where we omitted some objects.
> > + */
> > +static struct reused_chunk {
> > + off_t start;
> > + off_t offset;
> > +} *reused_chunks;
> > +static int reused_chunks_nr;
> > +static int reused_chunks_alloc;
>
> This makes sense - offsets may be different when we omit objects from
> the packfile. I think this can be computed by calculating the number of
> zero bits between the current object's index and the nth object prior
> (where n is the offset) in the bitmap resulting from
> reuse_partial_packfile_from_bitmap() above, thus eliminating the need
> for this array, but I haven't tested it.

Thanks for the idea. I will let others comment on that though before
maybe trying to implement it. I think it could come as a further
optimization in a following patch if it makes things faster or reduces
memory usage.

> > + if (0) {
> > + off_t expected_size = cur - offset;
> > +
> > + if (len + ofs_len < expected_size) {
> > + unsigned max_pad = (len >= 4) ? 9 : 5;
> > + header[len - 1] |= 0x80;
> > + while (len < max_pad && len + ofs_len 
> > < expected_size)
> > + header[len++] = 0x80;
> > + header[len - 1] &= 0x7F;
> > + }
> > + }
>
> This if(0) should be deleted.

Yeah, good eyes. I removed it on my current version.

> > @@ -1002,6 +1132,10 @@ static int have_duplicate_entry(const struct 
> > object_id *oid,
> >  {
> >   struct object_entry *entry;
> >
> > + if (reuse_packfile_bitmap &&
> > + bitmap_walk_contains(bitmap_git, reuse_packfile_bitmap, oid))
> > + return 1;
>
> Hmm...why did we previously not need to check the reuse information, but
> we do now? I gave the code a cursory glance but couldn't find the
> answer.
>
> > @@ -2571,7 +2706,9 @@ static void ll_find_deltas(struct object_entry 
> > **list, unsigned list_size,
> >
> >  static int obj_is_packed(const struct object_id *oid)
> >  {
> > - return !!packlist_find(&to_pack, oid, NULL);
> > + return packlist_find(&to_pack, oid, NULL) ||
> > + (reuse_packfile_bitmap &&
> > +  bitmap_walk_contains(bitmap_git, reuse_packfile_bitmap, 
> > oid));
>
> Same question here - why do we need to check the reuse information?

Maybe a reuse bitmap makes it cheap enough to check that information?

Thank you for the review,
Christian.


Re: Raise your hand to Ack jk/code-of-conduct if your Ack fell thru cracks

2019-10-09 Thread Christian Couder
On Wed, Oct 9, 2019 at 2:20 AM Junio C Hamano  wrote:
>
> For reference, here is the CoC the patch wants to add (there is no
> such topic yet locally, nor a single patch that can be made into
> such a topic, so there isn't exactly something people can Ack on
> yet. So here is a "preview" of what we would see once such a series
> lands).

Acked-by: Christian Couder 

Thanks to everyone involved,
Christian.


Re: [RFC PATCH 10/10] pack-objects: improve partial packfile reuse

2019-10-02 Thread Christian Couder
On Thu, Oct 3, 2019 at 4:06 AM Junio C Hamano  wrote:
>
> Jeff King  writes:
>
> > Hmm, I see the early parts of this graduated to 'next'. I'm not sure
> > everything there is completely correct, though. E.g. I'm not sure of the
> > reasoning in df75281e78 (ewah/bitmap: always allocate 2 more words,
> > 2019-09-13).

Yeah, when I prepared the series I wondered why we allocate 2 more
words instead of just 1 more, but I forgot to ask that when sending
it.

> > I'm sorry for being so slow on giving it a more careful review. I was
> > traveling for work, then playing catch-up, and am now going on vacation.
> > So it might be a little while yet.
>
> Thanks for a status update.  I do not mind moving this topic much
> slower than other topics at all (if somebody is actively working on
> it, I do not even mind reverting the merge and requeuing an updated
> series, but I do not think that is the case here).

I think the series requires at least documenting pack.allowPackReuse,
which is introduced in d35b73c5e9 (pack-objects: introduce
pack.allowPackReuse, 2019-09-13). I was planning to send an additional
patch to do that, but if you prefer I can add the documentation to the
same commit that introduces the config variable and resend everything.

> It would give me
> much more confidence in the topic if we can collectively promise
> ourselves that we'll give it a good review before we let it graduate
> to 'master'.

Yeah, a review from Peff could be especially insightful as the code
comes from GitHub.


Re: [PATCH v4] promisor-remote: skip move_to_tail when no-op

2019-09-30 Thread Christian Couder
On Tue, Oct 1, 2019 at 12:03 AM Emily Shaffer  wrote:
>
> Previously, when promisor_remote_move_to_tail() is called for a
> promisor_remote which is currently the final element in promisors, a
> cycle is created in the promisors linked list. This cycle leads to a
> double free later on in promisor_remote_clear() when the final element
> of the promisors list is removed: promisors is set to promisors->next (a
> no-op, as promisors->next == promisors); the previous value of promisors
> is free()'d; then the new value of promisors (which is equal to the
> previous value of promisors) is also free()'d. This double-free error
> was unrecoverable for the user without removing the filter or re-cloning
> the repo and hoping to miss this edge case.
>
> Now, when promisor_remote_move_to_tail() would be a no-op, just do a
> no-op. In cases of promisor_remote_move_to_tail() where r is not already
> at the tail of the list, it works as before.

Yeah, thank you Emily and Peff for finding and fixing this! The fix
and the test look good to me.
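
For readers who want to picture the failure mode, here is a minimal
standalone sketch (hypothetical list code, not the actual promisor-remote
implementation) of the pattern described above: with a tail pointer that
points at the last element's `next` field, re-appending the element that
is already last writes it into its own `next` field, which is exactly the
cycle the commit message describes; the early return is the fix.

    #include <stddef.h>

    struct item {
        struct item *next;
    };

    static struct item *list;
    static struct item **list_tail = &list; /* points at the tail's "next" slot */

    /*
     * Move "r" (whose predecessor is "previous", or NULL if r is the head)
     * to the tail of the list.
     */
    static void move_to_tail(struct item *r, struct item *previous)
    {
        if (!r->next)
            return; /* already the tail; without this, "*list_tail = r" sets r->next = r */

        if (previous)
            previous->next = r->next;
        else
            list = r->next;
        r->next = NULL;
        *list_tail = r;
        list_tail = &r->next;
    }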


Number of Outreachy interns and co-mentors

2019-09-26 Thread Christian Couder
Hi Peff and everyone,

On https://www.outreachy.org/apply/project-selection/#git it looks
like we will only have 1 intern as the title of our section is "Git -
1 intern". I wonder if it's because only funding for 1 intern has been
secured or if there is another reason.

Also I am not sure how people can register that they are ok to
co-mentor one of the projects, but it looks like the person who
registered a project can invite co-mentors. So I think it would be
nice if people who would be ok to co-mentor could tell us which
projects they would be ok to co-mentor, so that we all know and
they can be invited to co-mentor on the site.

It would be great if we could have 2 (co-)mentors per project and it
would likely help getting funding for all the interns.

Thanks,
Christian.


[ANNOUNCE] Git Rev News edition 55

2019-09-25 Thread Christian Couder
Hi everyone,

The 55th edition of Git Rev News is now published:

  https://git.github.io/rev_news/2019/09/25/edition-55/

Thanks a lot to Emily Shaffer, James Ramsay, Pratik Karki and Thomas
Ferris who contributed this month!

Enjoy,
Christian, Jakub, Markus and Gabriel.

PS: An issue for the next edition is already opened and contributions
are welcome:
https://github.com/git/git.github.io/issues/399


Draft of Git Rev News edition 55

2019-09-23 Thread Christian Couder
Hi everyone!

A draft of a new Git Rev News edition is available here:

  https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-55.md

Everyone is welcome to contribute in any section either by editing the
above page on GitHub and sending a pull request, or by commenting on
this GitHub issue:

  https://github.com/git/git.github.io/issues/394

You can also reply to this email.

In general all kinds of contribution, for example proofreading,
suggestions for articles or links, help on the issues in GitHub, and
so on, are very much appreciated.

In this edition I try to cover a few more topics than usual, though
the articles are shorter. Feel free to improve on any article if you
think that there are important things that could be added to them.

I tried to cc everyone who appears in this edition, but maybe I missed
some people, sorry about that.

Jakub, Markus, Gabriel and I plan to publish this edition late on
Wednesday, September 25th.

Thanks,
Christian.


Re: Git in Outreachy December 2019?

2019-09-23 Thread Christian Couder
On Mon, Sep 23, 2019 at 3:35 PM Emily Shaffer  wrote:
>
> On Fri, Sep 20, 2019 at 10:04:48AM -0700, Jonathan Tan wrote:
>
> > I'm new to Outreachy and programs like this, so does anyone have an
> > opinion on my draft proposal below? It does not have any immediate
> > user-facing benefit, but it does have a definite end point.
>
> I'd appreciate similar opinion if anybody has it - and I'd also really
> feel more comfortable with a co-mentor.

First, as the deadline is tomorrow, I think it is important to submit
projects to Outreachy even if they are not perfect and even if we
would like a co-mentor (which is also my case, by the way).

We can hopefully improve the projects after the deadline or perhaps
drop them if we cannot find enough co-mentors or if we don't agree
with the goal or find it too difficult.

> """
> "Did You Mean..?"
>
> There are some situations where it's fairly clear what a user meant to
> do, even though they did not do that thing correctly. For example, if a
> user runs `git commit` with tracked, modified, unstaged files in their
> worktree, but no staged files at all, it's fairly likely that they
> simply forgot to add the files they wanted. In this case, the error
> message is slightly obtuse:
>
> $ git commit
> On branch master
> Changes not staged for commit:
> modified:   foo.txt
>
> no changes added to commit
>
>
> Since we have an idea of what the user _meant_ to do, we can offer
> something more like:
>
> $ git commit
> On branch master
> Changes not staged for commit:
> modified:   foo.txt
>
> Stage listed changes and continue? [Y/n]
>
> While the above case is a good starting place, other similar cases can
> be added afterwards if time permits. These helper prompts should be
> enabled/disabled via a config option so that people who are used to
> their current workflow won't be impacted.
> """

I agree that it might help. There could be significant discussion
about what the UI should be, though. For example, maybe we could just
promote `git commit -p` in the tutorials instead of doing the above,
or have a commit.patch config option if we don't have one already.

Thanks,
Christian.


Re: Git in Outreachy December 2019?

2019-09-23 Thread Christian Couder
On Mon, Sep 23, 2019 at 10:53 AM Jonathan Tan  wrote:
>
> > Prospective mentors need to sign up on that site, and should propose a
> > project they'd be willing to mentor.

Yeah, you are very welcome to sign up soon if you haven't already done
so, as I think the deadline is really close.

> > I'm happy to discuss possible projects if anybody has an idea but isn't
> > sure how to develop it into a proposal.
>
> I'm new to Outreachy and programs like this, so does anyone have an
> opinion on my draft proposal below? It does not have any immediate
> user-facing benefit, but it does have a definite end point.

No need for user-facing benefits. Refactoring or improving the code in
other useful ways are very good subjects (as I already said in my
reply to Emily and Dscho).

> Also let me know if an Outreachy proposal should have more detail, etc.
>
> Refactor "git index-pack" logic into library code
>
> Currently, whenever any Git code needs a pack to be indexed, it
> needs to spawn a new "git index-pack" process, passing command-line
> arguments and communicating with it using file descriptors (standard
> input and output), much like an end-user would if invoking "git
> index-pack" directly. Refactor the pack indexing logic into library
> code callable from other Git code, make "git index-pack" a thin
> wrapper around that library code, and (to demonstrate that the
> refactoring works) change fetch-pack.c to use the library code
> instead of spawning the "git index-pack" process.
>
> This allows the pack indexing code to communicate with its callers
> with the full power of C (structs, callbacks, etc.) instead of being
> restricted to command-line arguments and file descriptors. It also
> simplifies debugging in that there will no longer be 2
> inter-communicating processes to deal with, only 1.

I think this is really great, both the idea and the description! No
need for more details.

Thanks,
Christian.


Re: Git in Outreachy December 2019?

2019-09-17 Thread Christian Couder
Hi Emily and Dscho,

On Tue, Sep 17, 2019 at 1:28 PM Johannes Schindelin
 wrote:
> On Mon, 16 Sep 2019, Emily Shaffer wrote:
>
> > Jonathan Tan, Jonathan Nieder, Josh Steadmon and I met on Friday to
> > talk about projects and we came up with a trimmed list; not sure what
> > more needs to be done to make them into fully-fledged proposals.
>
> Thank you for doing this!

Yeah, great!

> > For starter microprojects, we came up with:
> >
> >  - cleanup a test script (although we need to identify particularly
> >which ones and what counts as "clean")
> >  - moving doc from documentation/technical/api-* to comments in the
> >appropriate header instead
> >  - teach a command which currently handles its own argv how to use
> >parse-options instead
> >  - add a user.timezone option which Git can use if present rather than
> >checking system local time
>
> Nice projects, all. There are a couple more ideas on
> https://github.com/gitgitgadget/git/issues, they could probably use some
> tagging.

Thanks! Maybe we should have a page with Outreachy microprojects on
https://git.github.io/

I will see if I find the time to create one soon with the above information.

> > For the longer projects, we came up with a few more:

[...]

> >- git-bisect.sh
>
> That would be my top recommendation, especially given how much effort
> Tanushree put in last winter to make this conversion to C so much more
> achievable than before.

I just added a project about it in the Outreachy system. I would have
added the link, but Outreachy asks us not to share the link publicly. I
am willing to co-mentor it (or maybe mentor alone if no one else wants
to co-mentor). Anyone willing to co-mentor can register on the
Outreachy website.

Thanks for the other suggestions by the way.

> Converting shell/Perl scripts into built-in C never looks as much fun as
> open-ended projects with lots of playing around, but the advantage of
> the former is that they can be easily structured, offer a lot of
> opportunity for learning, and they are ultimately more rewarding because
> the goals are much better defined than many other projects'.

I agree. Outreachy also suggests avoiding projects that have to be
discussed a lot or are not clear enough.

> >  - reduce/eliminate use of fetch_if_missing global

I like this one!

> > It might make sense to only focus on scoping the ones we feel most
> > interested in. We came up with a pretty big list because we had some
> > other programs in mind, so I suppose it's not necessary to develop all
> > of them for this program.

I agree, as I don't think we will have enough mentors or co-mentors for
a large number of projects.

Best,
Christian.


[RFC PATCH 00/10] Rewrite packfile reuse code

2019-09-13 Thread Christian Couder
This patch series is rewriting the code that tries to reuse existing
packfiles.

The code in this patch series was written by GitHub, and Peff nicely
provided it in the following discussion:

https://public-inbox.org/git/3e56b0fd-ebe8-4057-a93a-16ebb09fb...@jramsay.com.au/

This is an RFC patch series that, for now, mostly just tries to split
the code into separate commits. If this split is considered ok, then
commit messages will be improved and some documentation will be added
(especially documentation for pack.allowPackReuse). Perhaps performance
test results will also be provided.

Most of the changes are in the last patch (10/10) and I haven't found
a good way to split them into several patches. Ideas are welcome. In
each of the other preparatory patches there is a small change that
might make sense separately.

According to Peff this new code is a lot smarter than what it
replaces. It allows "holes" in the chunks of packfile to be reused,
and skips over them. It rewrites OFS_DELTA offsets as it goes to
account for the holes. So it's basically a linear walk over the
packfile, but with the important distinction that we don't add those
objects to the object_entry array, which makes them very lightweight
(especially in memory use, but they also aren't considered bases for
finding new deltas, etc). It seems like a good compromise between the
cost to serve a clone and the quality of the resulting packfile.

I have put Peff as the author of all the commits.

Jeff King (10):
  builtin/pack-objects: report reused packfile objects
  packfile: expose get_delta_base()
  ewah/bitmap: introduce bitmap_word_alloc()
  ewah/bitmap: always allocate 2 more words
  pack-bitmap: don't rely on bitmap_git->reuse_objects
  pack-bitmap: introduce bitmap_walk_contains()
  csum-file: introduce hashfile_total()
  pack-objects: introduce pack.allowPackReuse
  builtin/pack-objects: introduce obj_is_packed()
  pack-objects: improve partial packfile reuse

 builtin/pack-objects.c | 248 +
 csum-file.h|   9 ++
 ewah/bitmap.c  |  13 ++-
 ewah/ewok.h|   1 +
 pack-bitmap.c  | 178 -
 pack-bitmap.h  |   6 +-
 packfile.c |  10 +-
 packfile.h |   3 +
 8 files changed, 358 insertions(+), 110 deletions(-)

-- 
2.23.0.46.gd213b4aca1.dirty



[RFC PATCH 09/10] builtin/pack-objects: introduce obj_is_packed()

2019-09-13 Thread Christian Couder
From: Jeff King 

Let's refactor the way we check if an object is packed by
introducing obj_is_packed(). This function is now a simple
wrapper around packlist_find(), but it will evolve in a
following commit.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 builtin/pack-objects.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 1664969c97..5ee0b3092d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2569,6 +2569,11 @@ static void ll_find_deltas(struct object_entry **list, 
unsigned list_size,
free(p);
 }
 
+static int obj_is_packed(const struct object_id *oid)
+{
+   return !!packlist_find(&to_pack, oid, NULL);
+}
+
 static void add_tag_chain(const struct object_id *oid)
 {
struct tag *tag;
@@ -2580,7 +2585,7 @@ static void add_tag_chain(const struct object_id *oid)
 * it was included via bitmaps, we would not have parsed it
 * previously).
 */
-   if (packlist_find(&to_pack, oid, NULL))
+   if (obj_is_packed(oid))
return;
 
tag = lookup_tag(the_repository, oid);
@@ -2604,7 +2609,7 @@ static int add_ref_tag(const char *path, const struct 
object_id *oid, int flag,
 
if (starts_with(path, "refs/tags/") && /* is a tag? */
!peel_ref(path, &peeled)&& /* peelable? */
-   packlist_find(&to_pack, &peeled, NULL))  /* object packed? */
+   obj_is_packed(&peeled)) /* object packed? */
add_tag_chain(oid);
return 0;
 }
-- 
2.23.0.46.gd213b4aca1.dirty



[RFC PATCH 04/10] ewah/bitmap: always allocate 2 more words

2019-09-13 Thread Christian Couder
From: Jeff King 

In a following patch we will allocate a variable number
of words in some bitmaps. When iterating over the words we
will need a mark to tell us when to stop iterating. Let's
always allocate 2 more words, that will always contain 0,
as that mark.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 ewah/bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 143dc71419..eac05485f1 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -41,7 +41,7 @@ void bitmap_set(struct bitmap *self, size_t pos)
 
if (block >= self->word_alloc) {
size_t old_size = self->word_alloc;
-   self->word_alloc = block * 2;
+   self->word_alloc = (block + 1) * 2;
REALLOC_ARRAY(self->words, self->word_alloc);
memset(self->words + old_size, 0x0,
(self->word_alloc - old_size) * sizeof(eword_t));
-- 
2.23.0.46.gd213b4aca1.dirty



[RFC PATCH 06/10] pack-bitmap: introduce bitmap_walk_contains()

2019-09-13 Thread Christian Couder
From: Jeff King 

We will use this helper function in a following patch to
tell us if an object is packed.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 pack-bitmap.c | 12 
 pack-bitmap.h |  3 +++
 2 files changed, 15 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index b2422fed8f..1833971dc7 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -828,6 +828,18 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index 
*bitmap_git,
return 0;
 }
 
+int bitmap_walk_contains(struct bitmap_index *bitmap_git,
+struct bitmap *bitmap, const struct object_id *oid)
+{
+   int idx;
+
+   if (!bitmap)
+   return 0;
+
+   idx = bitmap_position(bitmap_git, oid);
+   return idx >= 0 && bitmap_get(bitmap, idx);
+}
+
 void traverse_bitmap_commit_list(struct bitmap_index *bitmap_git,
 show_reachable_fn show_reachable)
 {
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 00de3ec8e4..5425767f0f 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -3,6 +3,7 @@
 
 #include "ewah/ewok.h"
 #include "khash.h"
+#include "pack.h"
 #include "pack-objects.h"
 
 struct commit;
@@ -53,6 +54,8 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *,
 int rebuild_existing_bitmaps(struct bitmap_index *, struct packing_data 
*mapping,
 kh_oid_map_t *reused_bitmaps, int show_progress);
 void free_bitmap_index(struct bitmap_index *);
+int bitmap_walk_contains(struct bitmap_index *,
+struct bitmap *bitmap, const struct object_id *oid);
 
 /*
  * After a traversal has been performed by prepare_bitmap_walk(), this can be
-- 
2.23.0.46.gd213b4aca1.dirty



[RFC PATCH 07/10] csum-file: introduce hashfile_total()

2019-09-13 Thread Christian Couder
From: Jeff King 

We will need this helper function in a following patch
to give us total number of bytes fed to the hashfile so far.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 csum-file.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/csum-file.h b/csum-file.h
index a98b1eee53..f9cbd317fb 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -42,6 +42,15 @@ void hashflush(struct hashfile *f);
 void crc32_begin(struct hashfile *);
 uint32_t crc32_end(struct hashfile *);
 
+/*
+ * Returns the total number of bytes fed to the hashfile so far (including ones
+ * that have not been written out to the descriptor yet).
+ */
+static inline off_t hashfile_total(struct hashfile *f)
+{
+   return f->total + f->offset;
+}
+
 static inline void hashwrite_u8(struct hashfile *f, uint8_t data)
 {
hashwrite(f, &data, sizeof(data));
-- 
2.23.0.46.gd213b4aca1.dirty



[RFC PATCH 02/10] packfile: expose get_delta_base()

2019-09-13 Thread Christian Couder
From: Jeff King 

In a following commit get_delta_base() will be used outside
packfile.c, so let's make it non static and declare it in
packfile.h.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 packfile.c | 10 +-
 packfile.h |  3 +++
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/packfile.c b/packfile.c
index fc43a6c52c..5a6e8d54f1 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1191,11 +1191,11 @@ const struct packed_git *has_packed_and_bad(struct 
repository *r,
return NULL;
 }
 
-static off_t get_delta_base(struct packed_git *p,
-   struct pack_window **w_curs,
-   off_t *curpos,
-   enum object_type type,
-   off_t delta_obj_offset)
+off_t get_delta_base(struct packed_git *p,
+struct pack_window **w_curs,
+off_t *curpos,
+enum object_type type,
+off_t delta_obj_offset)
 {
unsigned char *base_info = use_pack(p, w_curs, *curpos, NULL);
off_t base_offset;
diff --git a/packfile.h b/packfile.h
index 3e98910bdd..8049202f4c 100644
--- a/packfile.h
+++ b/packfile.h
@@ -151,6 +151,9 @@ void *unpack_entry(struct repository *r, struct packed_git 
*, off_t, enum object
 unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned 
long len, enum object_type *type, unsigned long *sizep);
 unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, 
off_t);
 int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, 
unsigned long *);
+off_t get_delta_base(struct packed_git *p, struct pack_window **w_curs,
+off_t *curpos, enum object_type type,
+off_t delta_obj_offset);
 
 void release_pack_memory(size_t);
 
-- 
2.23.0.46.gd213b4aca1.dirty



[RFC PATCH 05/10] pack-bitmap: don't rely on bitmap_git->reuse_objects

2019-09-13 Thread Christian Couder
From: Jeff King 

As we now allocate 2 more words than necessary for each
bitmap to serve as marks telling us that we can stop
iterating over the words, we don't need to rely on
bitmap_git->reuse_objects to stop iterating over the words.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 pack-bitmap.c | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index ed2befaac6..b2422fed8f 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -622,7 +622,7 @@ static void show_objects_for_type(
enum object_type object_type,
show_reachable_fn show_reach)
 {
-   size_t pos = 0, i = 0;
+   size_t i = 0;
uint32_t offset;
 
struct ewah_iterator it;
@@ -630,13 +630,15 @@ static void show_objects_for_type(
 
struct bitmap *objects = bitmap_git->result;
 
-   if (bitmap_git->reuse_objects == bitmap_git->pack->num_objects)
-   return;
-
ewah_iterator_init(&it, type_filter);
 
-   while (i < objects->word_alloc && ewah_iterator_next(&filter, &it)) {
+   for (i = 0; i < objects->word_alloc &&
+   ewah_iterator_next(&filter, &it); i++) {
eword_t word = objects->words[i] & filter;
+   size_t pos = (i * BITS_IN_EWORD);
+
+   if (!word)
+   continue;
 
for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
struct object_id oid;
@@ -648,9 +650,6 @@ static void show_objects_for_type(
 
offset += ewah_bit_ctz64(word >> offset);
 
-   if (pos + offset < bitmap_git->reuse_objects)
-   continue;
-
entry = &bitmap_git->pack->revindex[pos + offset];
nth_packed_object_oid(&oid, bitmap_git->pack, 
entry->nr);
 
@@ -659,9 +658,6 @@ static void show_objects_for_type(
 
show_reach(&oid, object_type, 0, hash, 
bitmap_git->pack, entry->offset);
}
-
-   pos += BITS_IN_EWORD;
-   i++;
}
 }
 
-- 
2.23.0.46.gd213b4aca1.dirty
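
The index arithmetic used in the loop above can be summed up by the
following hedged one-liner (an illustration, not code from the patch):

#include "ewah/ewok.h"

/*
 * Bit 'offset' of result word 'word_idx' corresponds to the object at
 * position word_idx * BITS_IN_EWORD + offset in the pack's revindex.
 */
static inline size_t bit_to_revindex_pos(size_t word_idx, uint32_t offset)
{
        return word_idx * BITS_IN_EWORD + offset;
}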



[RFC PATCH 01/10] builtin/pack-objects: report reused packfile objects

2019-09-13 Thread Christian Couder
From: Jeff King 

To see whether packfile reuse kicks in or not, it is useful to
show statistics about reused packfile objects in the output of
upload-pack.

Helped-by: James Ramsay 
Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 builtin/pack-objects.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 76ce906946..c11b2ea8d4 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3525,7 +3525,9 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
if (progress)
fprintf_ln(stderr,
   _("Total %"PRIu32" (delta %"PRIu32"),"
-" reused %"PRIu32" (delta %"PRIu32")"),
-  written, written_delta, reused, reused_delta);
+" reused %"PRIu32" (delta %"PRIu32"),"
+" pack-reused %"PRIu32),
+  written, written_delta, reused, reused_delta,
+  reuse_packfile_objects);
return 0;
 }
-- 
2.23.0.46.gd213b4aca1.dirty



[RFC PATCH 08/10] pack-objects: introduce pack.allowPackReuse

2019-09-13 Thread Christian Couder
From: Jeff King 

Let's make it possible to configure whether we want pack reuse or not.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 builtin/pack-objects.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c11b2ea8d4..1664969c97 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -96,6 +96,7 @@ static off_t reuse_packfile_offset;
 
 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
+static int allow_pack_reuse = 1;
 static enum {
WRITE_BITMAP_FALSE = 0,
WRITE_BITMAP_QUIET,
@@ -2719,6 +2720,10 @@ static int git_pack_config(const char *k, const char *v, 
void *cb)
sparse = git_config_bool(k, v);
return 0;
}
+   if (!strcmp(k, "pack.allowpackreuse")) {
+   allow_pack_reuse = git_config_bool(k, v);
+   return 0;
+   }
if (!strcmp(k, "pack.threads")) {
delta_search_threads = git_config_int(k, v);
if (delta_search_threads < 0)
@@ -3063,7 +3068,8 @@ static int get_object_list_from_bitmap(struct rev_info 
*revs)
if (!(bitmap_git = prepare_bitmap_walk(revs)))
return -1;
 
-   if (pack_options_allow_reuse() &&
+   if (allow_pack_reuse &&
+   pack_options_allow_reuse() &&
!reuse_partial_packfile_from_bitmap(
bitmap_git,
&reuse_packfile,
-- 
2.23.0.46.gd213b4aca1.dirty



[RFC PATCH 10/10] pack-objects: improve partial packfile reuse

2019-09-13 Thread Christian Couder
From: Jeff King 

Let's store the chunks of the packfile that we reuse in
a dynamic array of `struct reused_chunk`, and let's use
a reuse_packfile_bitmap to speed up reusing parts of
packfiles.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 builtin/pack-objects.c | 227 +
 pack-bitmap.c  | 150 +++
 pack-bitmap.h  |   3 +-
 3 files changed, 293 insertions(+), 87 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5ee0b3092d..6822ac1026 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -92,7 +92,7 @@ static struct progress *progress_state;
 
 static struct packed_git *reuse_packfile;
 static uint32_t reuse_packfile_objects;
-static off_t reuse_packfile_offset;
+static struct bitmap *reuse_packfile_bitmap;
 
 static int use_bitmap_index_default = 1;
 static int use_bitmap_index = -1;
@@ -785,57 +785,189 @@ static struct object_entry **compute_write_order(void)
return wo;
 }
 
-static off_t write_reused_pack(struct hashfile *f)
+/*
+ * Record the offsets needed in our reused packfile chunks due to
+ * "gaps" where we omitted some objects.
+ */
+static struct reused_chunk {
+   off_t start;
+   off_t offset;
+} *reused_chunks;
+static int reused_chunks_nr;
+static int reused_chunks_alloc;
+
+static void record_reused_object(off_t where, off_t offset)
 {
-   unsigned char buffer[8192];
-   off_t to_write, total;
-   int fd;
+   if (reused_chunks_nr && reused_chunks[reused_chunks_nr-1].offset == 
offset)
+   return;
 
-   if (!is_pack_valid(reuse_packfile))
-   die(_("packfile is invalid: %s"), reuse_packfile->pack_name);
+   ALLOC_GROW(reused_chunks, reused_chunks_nr + 1,
+  reused_chunks_alloc);
+   reused_chunks[reused_chunks_nr].start = where;
+   reused_chunks[reused_chunks_nr].offset = offset;
+   reused_chunks_nr++;
+}
 
-   fd = git_open(reuse_packfile->pack_name);
-   if (fd < 0)
-   die_errno(_("unable to open packfile for reuse: %s"),
- reuse_packfile->pack_name);
+/*
+ * Binary search to find the chunk that "where" is in. Note
+ * that we're not looking for an exact match, just the first
+ * chunk that contains it (which implicitly ends at the start
+ * of the next chunk).
+ */
+static off_t find_reused_offset(off_t where)
+{
+   int lo = 0, hi = reused_chunks_nr;
+   while (lo < hi) {
+   int mi = lo + ((hi - lo) / 2);
+   if (where == reused_chunks[mi].start)
+   return reused_chunks[mi].offset;
+   if (where < reused_chunks[mi].start)
+   hi = mi;
+   else
+   lo = mi + 1;
+   }
 
-   if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1)
-   die_errno(_("unable to seek in reused packfile"));
+   /*
+* The first chunk starts at zero, so we can't have gone below
+* there.
+*/
+   assert(lo);
+   return reused_chunks[lo-1].offset;
+}
 
-   if (reuse_packfile_offset < 0)
-   reuse_packfile_offset = reuse_packfile->pack_size - 
the_hash_algo->rawsz;
+static void write_reused_pack_one(size_t pos, struct hashfile *out,
+ struct pack_window **w_curs)
+{
+   off_t offset, next, cur;
+   enum object_type type;
+   unsigned long size;
 
-   total = to_write = reuse_packfile_offset - sizeof(struct pack_header);
+   offset = reuse_packfile->revindex[pos].offset;
+   next = reuse_packfile->revindex[pos + 1].offset;
 
-   while (to_write) {
-   int read_pack = xread(fd, buffer, sizeof(buffer));
+   record_reused_object(offset, offset - hashfile_total(out));
 
-   if (read_pack <= 0)
-   die_errno(_("unable to read from reused packfile"));
+   cur = offset;
+   type = unpack_object_header(reuse_packfile, w_curs, &cur, &size);
+   assert(type >= 0);
 
-   if (read_pack > to_write)
-   read_pack = to_write;
+   if (type == OBJ_OFS_DELTA) {
+   off_t base_offset;
+   off_t fixup;
+
+   unsigned char header[MAX_PACK_OBJECT_HEADER];
+   unsigned len;
+
+   base_offset = get_delta_base(reuse_packfile, w_curs, &cur, 
type, offset);
+   assert(base_offset != 0);
+
+   /* Convert to REF_DELTA if we must... */
+   if (!allow_ofs_delta) {
+   int base_pos = find_revindex_position(reuse_packfile, 
base_offset);
+   const unsigned char *base_sha1 =
+   nth_packed_object_sha1(reuse_packfile,
+ 

[RFC PATCH 03/10] ewah/bitmap: introduce bitmap_word_alloc()

2019-09-13 Thread Christian Couder
From: Jeff King 

In a following patch we will need to allocate a variable
number of bitmap words, instead of always 32, so let's add
bitmap_word_alloc() for this purpose.

Signed-off-by: Jeff King 
Signed-off-by: Christian Couder 
---
 ewah/bitmap.c | 11 ---
 ewah/ewok.h   |  1 +
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/ewah/bitmap.c b/ewah/bitmap.c
index 52f1178db4..143dc71419 100644
--- a/ewah/bitmap.c
+++ b/ewah/bitmap.c
@@ -22,14 +22,19 @@
 #define EWAH_MASK(x) ((eword_t)1 << (x % BITS_IN_EWORD))
 #define EWAH_BLOCK(x) (x / BITS_IN_EWORD)
 
-struct bitmap *bitmap_new(void)
+struct bitmap *bitmap_word_alloc(size_t word_alloc)
 {
struct bitmap *bitmap = xmalloc(sizeof(struct bitmap));
-   bitmap->words = xcalloc(32, sizeof(eword_t));
-   bitmap->word_alloc = 32;
+   bitmap->words = xcalloc(word_alloc, sizeof(eword_t));
+   bitmap->word_alloc = word_alloc;
return bitmap;
 }
 
+struct bitmap *bitmap_new(void)
+{
+   return bitmap_word_alloc(32);
+}
+
 void bitmap_set(struct bitmap *self, size_t pos)
 {
size_t block = EWAH_BLOCK(pos);
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 84b2a29faa..1b98b57c8b 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -172,6 +172,7 @@ struct bitmap {
 };
 
 struct bitmap *bitmap_new(void);
+struct bitmap *bitmap_word_alloc(size_t word_alloc);
 void bitmap_set(struct bitmap *self, size_t pos);
 int bitmap_get(struct bitmap *self, size_t pos);
 void bitmap_reset(struct bitmap *self);
-- 
2.23.0.46.gd213b4aca1.dirty
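
As a hedged example of how a caller might use the new allocator (the
sizing below is only an assumption based on the "2 more words" wording
in patch 05/10, not code from this series):

#include "ewah/ewok.h"

/*
 * Allocate a bitmap able to hold one bit per object of a pack, plus
 * two extra zeroed words that can serve as end markers.
 */
static struct bitmap *bitmap_for_num_objects(uint32_t num_objects)
{
        size_t words = num_objects / BITS_IN_EWORD + 1;

        return bitmap_word_alloc(words + 2);
}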



Re: Git in Outreachy December 2019?

2019-09-05 Thread Christian Couder
On Wed, Sep 4, 2019 at 9:41 PM Jeff King  wrote:
>
> Funding is still up in the air, but in the meantime I've tentatively
> signed us up (we have until the 24th to have the funding committed).
> Next we need mentors to submit projects, as well as first-time
> contribution micro-projects.

Great! Thanks for signing up and for all the information!


Re: Git in Outreachy December 2019?

2019-08-31 Thread Christian Couder
On Tue, Aug 27, 2019 at 7:17 AM Jeff King  wrote:
>
> Do we have interested mentors for the next round of Outreachy?

I am interested to co-mentor.

> The deadline for Git to apply to the program is September 5th. The
> deadline for mentors to have submitted project descriptions is September
> 24th. Intern applications would start on October 1st.
>
> If there are mentors who want to participate, I can handle the project
> application and can start asking around for funding.

That would be really nice, thank you!


[ANNOUNCE] Git Rev News edition 54

2019-08-21 Thread Christian Couder
Hi everyone,

The 54th edition of Git Rev News is now published:

  https://git.github.io/rev_news/2019/08/21/edition-54/

Thanks a lot to Elijah Newren, Jeff Hostetler, Andrew Ardill and
Jean-Noël Avila who contributed this month!

Enjoy,
Christian, Jakub, Markus and Gabriel.


Re: [GSoC] My project blog

2019-08-20 Thread Christian Couder
Hi Matheus,

On Tue, Aug 20, 2019 at 1:28 PM Olga Telezhnaya
 wrote:
>
> On Tue, Aug 20, 2019 at 07:59, Matheus Tavares Bernardino
> :
> >
> > I just posted the penultimate report on my project:
> > https://matheustavares.gitlab.io/posts/going-for-a-too-big-step This
> > week I’ve been working on a v2 of threaded git-grep w/ parallel
> > inflation, to allow threads when grepping submodules. I also tried
> > some more optimizations along the way.
>
> Thank you for great blog post! You have done so many things, it is impressive.

Yeah, great blog post again!

> I absolutely agree with your plans to structure the job and split it
> to smaller pieces, it should definitely help. Moreover, if you want,
> you can make something like a documentation and delegate some parts.

I also agree with the above. I think it would be really nice for
example if we could have something in Documentation/technical/
describing how multi-threading in `git grep` works and how it could be
improved. (You can of course copy paste from your blog.)

As for static function-scope variables, I wonder if we could one day
have some .h and .c files containing only thread-safe functions.
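
For example (a contrived sketch, not actual git-grep code), a function
that keeps its result in a static function-scope buffer is unsafe to
call from several grep threads at once, while letting the caller own
the storage avoids the problem:

#include <stdio.h>

/* Not thread-safe: every caller shares the same buffer. */
static const char *mode_to_string_unsafe(unsigned int mode)
{
        static char buf[16];

        snprintf(buf, sizeof(buf), "%06o", mode);
        return buf;
}

/* Thread-safe variant: the caller provides the buffer. */
static void mode_to_string(unsigned int mode, char *buf, size_t len)
{
        snprintf(buf, len, "%06o", mode);
}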

> I hope you enjoyed this summer :) Thank you for your readiness to
> continue contributing after GSoC. It is not mandatory, you do not have
> to finish the project, but if you want - all the community will be so
> glad to see you as the contributor.

Thank you for your work and for considering continuing after the GSoC,
Christian.


Draft of Git Rev News edition 54

2019-08-19 Thread Christian Couder
Hi everyone!

A draft of a new Git Rev News edition is available here:

  https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-54.md

Everyone is welcome to contribute in any section either by editing the
above page on GitHub and sending a pull request, or by commenting on
this GitHub issue:

  https://github.com/git/git.github.io/issues/387

You can also reply to this email.

In general all kinds of contribution, for example proofreading,
suggestions for articles or links, help on the issues in GitHub, and
so on, are very much appreciated.

I tried to cc everyone who appears in this edition, but maybe I missed
some people, sorry about that.

Jakub, Markus, Gabriel and me plan to publish this edition late on
Wednesday August 21st.

Thanks,
Christian.


Re: [GSoC] My project blog

2019-08-13 Thread Christian Couder
Hi Matheus,

On Tue, Aug 13, 2019 at 6:06 AM Matheus Tavares Bernardino
 wrote:
>
> I just posted a new update on my project:
> https://matheustavares.gitlab.io/posts/simplified-version-of-parallel-inflation

Great blog post as usual!

> This week I sent a simplified version of the series I was working on
> last week (which would still take some time to finish). And I'm
> working to improve it with support to --textconv and
> --recurse-submodules in the added optimizations. I've been also
> working on a patch to make git-grep stop adding submodules to the
> alternates list.

I don't know what refactoring you think is needed on your patch:

https://github.com/matheustavares/git/commit/05536a503b3315d36ad13276af7728adf674ed51

I think it's worth sending to the list with or without addressing the
small nits that I commented on.

Thanks,
Christian.


Re: [GSoC] My project blog

2019-08-06 Thread Christian Couder
Hi Matheus,

On Tue, Aug 6, 2019 at 4:54 AM Matheus Tavares Bernardino
 wrote:
>
> Here's my report from last week:
> https://matheustavares.gitlab.io/posts/week-11-wip-grep-protecting-textconv-and-submodules

Thank you for another great report!

> I'm working to protect the operations I left behind on the first
> version of the patchset[1]. And for that, I used a lot of the code Duy
> provided[2] me as an example in the early days of this project. The
> race conditions are now majorly gone, but the patches still need some
> refactoring and there are still some problems to overcome.

This looks good to me! I think you are now way past the point where
you know much more about this than me, but anyway the following might
give you some ideas or perhaps help you in some other ways:

  - About "Try to remove the obj_read_lock", which is the last point
on your TODO list, it doesn't look necessary before you submit a patch
series, unless there is no performance improvement without it. You
could, as far as I can tell, do it in a later patch series.

  - I guess "--recurse-submodules" is not used very often with git
grep. For example I never use it on the Git repo even when I perhaps
should. So if it would simplify things to just disable the
improvements you made when "--recurse-submodules" is used, then you
might consider doing that. You could later send another patch series
that would enable your improvements when "--recurse-submodules" is
used.

Thanks,
Christian.


Re: RFC - Git Developer Blog

2019-08-05 Thread Christian Couder
On Tue, Aug 6, 2019 at 5:35 AM Junio C Hamano  wrote:
>
> Emily Shaffer  writes:
>
> > In backchannels recently there has been some discussion about the idea
> > of a Git-project-blessed blog written by Git contributors, generally
> > covering usability tips or overviews of the internals of Git which the
> > general public tend to find confusing.
> > ...
> > The idea is that we could cover high level topics stringing together
> > multiple components or giving power user advice, which we can't really
> > do with the manpages.
> >
> > Thoughts?
>
> Interesting.
>
> I recall that I used to do the "Fun with ..." series back when I was
> more into use-case-exploration mode; writing those articles was fun,
> but it took a lot of time and quite an effort, so I stopped after
> writing enough.

Linux-Kongress (LK) was a conference organized by the German Unix User
Group from 1994 to 2010. They asked authors of accepted submissions
for the technical sessions to provide a paper for publication in the
conference proceedings. As I wanted to give a presentation at LK 2009,
I wrote a paper about git bisect and asked kind souls (like Junio) to
review it on the mailing list. A bit later it was included in the Git
documentation and is still there:

https://git-scm.com/docs/git-bisect-lk2009.html

The git bisect man page links to it, but I must say that it tends to
be a bit obsolete as it has not really been updated.

There are also "guides" that are part of the actual documentation and
can be listed using `git help -g`.

> Making it a group effort may help by allowing writers and reviewers
> to encourage each other.

When Git Rev News was started I thought that there could be such a
group effort to encourage each other to publish articles in it, but I
must say that outside the group of editors (currently Jakub, Markus,
Gabriel and me) it hasn't happened much.

Each month though there are a small number of people helping on
smaller things like short news, typos, releases, etc. And people who
are interviewed are doing a great job when they accept to be
interviewed.

Maybe it's also not clear that we could accept other kinds of articles
than just articles focused on what happens on the mailing list. I
think we have generally tried to highlight articles by Git developers
that were published on their blogs or their company's blog though.

In any case if you or others would like to join the editor group to
focus on other kinds of articles, or just to help a bit, you are most
welcome!

I would also be ok to change the form of Git Rev News if there is an
official blog. For example the "Discussions" section could become
something like "Featured articles" with links to articles on the blog.


Re: [GSoC][PATCH] grep: fix worktree case in submodules

2019-07-30 Thread Christian Couder
On Tue, Jul 30, 2019 at 10:04 PM Junio C Hamano  wrote:
>
> Matheus Tavares  writes:

> > @@ -598,7 +599,8 @@ static int grep_tree(struct grep_opt *opt, const struct 
> > pathspec *pathspec,
> >   free(data);
> >   } else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
> >   hit |= grep_submodule(opt, pathspec, &entry.oid,
> > -   base->buf, base->buf + tn_len);
> > +   base->buf, base->buf + tn_len,
> > +   1); /* ignored */
>
> The trailing comment is misleading.  In the context of reviewing
> this patch, we can probably tell it applies only to that "1", but
> if you read only the postimage, the "ignored" comment looks as if
> the call itself is somehow ignored by somebody unspecified.  It is
> not clear at all that it is only about the final parameter.
>
> If you must...
>
> hit |= grep_submodule(opt, pathspec, &entry.oid,
>   base->buf, base->buf + tn_len,
>   1 /* ignored */);

Yeah, I suggested adding an "/* ignored */" comment, but I was indeed
thinking about something like this.

> ... is a reasonable way to write it.


Re: Where do I send patches for git-gui?

2019-07-25 Thread Christian Couder
Hi Pratyush,

On Wed, Jul 24, 2019 at 11:43 PM Pratyush Yadav  wrote:
>
> I have a quick little feature to add to git-gui, and I'm wondering where
> should I discuss it and send patches. The git-gui repo [0] has no readme
> I can see that would point me in the right direction. Googling around
> didn't get me anything either.
>
> Should I send it here on this list or is it somewhere else?

It seems to me that people have been sending patches to git-gui to
this mailing list indeed.

According to the following discussions:

  - https://public-inbox.org/git/xmqqbm36w7hl@gitster-ct.c.googlers.com/
  - 
https://public-inbox.org/git/nycvar.qro.7.76.6.1905272135280.28...@tvgsbejvaqbjf.bet/

Pat Thoyts (in CC) used to be the git-gui maintainer but we haven't
been hearing from him for a long time, so we are looking for a new
maintainer.





> Also, is the project even actively maintained any more? The last commit
> was in 2017.


[ANNOUNCE] Git Rev News edition 53

2019-07-24 Thread Christian Couder
Hi everyone,

The 53rd edition of Git Rev News is now published:

  https://git.github.io/rev_news/2019/07/24/edition-53/

Thanks a lot to Jaime Rivas and Carlo Marcelo Arenas Belón who
contributed this month!

Enjoy,
Christian, Jakub, Markus and Gabriel.


Draft of Git Rev News edition 53

2019-07-22 Thread Christian Couder
Hi everyone!

A draft of a new Git Rev News edition is available here:

  https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-53.md

Everyone is welcome to contribute in any section either by editing the
above page on GitHub and sending a pull request, or by commenting on
this GitHub issue:

  https://github.com/git/git.github.io/issues/384

You can also reply to this email.

In general all kinds of contribution, for example proofreading,
suggestions for articles or links, help on the issues in GitHub, and
so on, are very much appreciated.

I tried to cc everyone who appears in this edition, but maybe I missed
some people, sorry about that.

Jakub, Markus, Gabriel and me plan to publish this edition late on Wednesday
July 24th.

Thanks,
Christian.


Re: What's cooking in git.git (Jul 2019, #03; Fri, 12)

2019-07-17 Thread Christian Couder
On Mon, Jul 15, 2019 at 1:16 AM Matthew DeVore  wrote:
>
> On Fri, Jul 12, 2019 at 02:02:52PM -0700, Junio C Hamano wrote:
> > * md/list-objects-filter-combo (2019-06-28) 10 commits
> >  - list-objects-filter-options: make parser void
> >  - list-objects-filter-options: clean up use of ALLOC_GROW
> >  - list-objects-filter-options: allow mult. --filter
> >  - strbuf: give URL-encoding API a char predicate fn
> >  - list-objects-filter-options: make filter_spec a string_list
> >  - list-objects-filter-options: move error check up
> >  - list-objects-filter: implement composite filters
> >  - list-objects-filter-options: always supply *errbuf
> >  - list-objects-filter: put omits set in filter struct
> >  - list-objects-filter: encapsulate filter components
> >
> >  The list-objects-filter API (used to create a sparse/lazy clone)
> >  learned to take a combined filter specification.
> >
> >  There is a bit of interaction with cc/multi-promisor topic, whose
> >  conflict resolution I have no confidence in X-<.  Extra sets of
> >  eyes are appreciated.
>
> Sorry for the delay. I was on vacation and then catching up for a week after I
> got back. I uploaded a merged commit here:
>
> https://github.com/matvore/git/tree/filts
>
> And the merged file itself (only this one had conflicts) is here:
>
> https://github.com/matvore/git/blob/filts/list-objects-filter.c

The merge and the explanations behind it all look good to me.

Thanks for your work on this!
Christian.


Re: [GSoC] My project blog

2019-07-14 Thread Christian Couder
Hi Matheus,

On Sun, Jul 14, 2019 at 9:38 AM Matheus Tavares Bernardino
 wrote:
>
> I just posted a new update about my GSoC project here:
> https://matheustavares.gitlab.io/posts/week-8-a-working-parallel-inflation
> Please, feel free to leave any comments. I've finally solved the race
> condition problem and now we have a working parallel inflation :) But
> there's still some tasks to be done on it before I have a proper
> patchset.

Great that you solved the race condition!

I agree with the tasks you list in the next steps. I would even
suggest that, right after splitting the changes into logical patches,
you ask your mentors (Olga and me) to review it, and then send it as
an RFC patch series to the mailing list. When you do that though,
please list the other tasks and limitations of your patch series in
the cover letter.

Thanks,
Christian.


[RFC PATCH 4/5] test-oidmap: add 'get_all' subcommand

2019-07-07 Thread Christian Couder
Let's make it possible to test oidmap_get_next() by adding a 'get_all'
subcommand that calls oidmap_get() once first and then oidmap_get_next()
several times until there is no more entry with the same oid key.

Signed-off-by: Christian Couder 
---
 t/helper/test-oidmap.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/t/helper/test-oidmap.c b/t/helper/test-oidmap.c
index c19b41aa4f..5c4897ce59 100644
--- a/t/helper/test-oidmap.c
+++ b/t/helper/test-oidmap.c
@@ -95,6 +95,23 @@ int cmd__oidmap(int argc, const char **argv)
/* print result */
puts(entry ? entry->name : "NULL");
 
+   } else if (!strcmp("get_all", cmd) && p1) {
+
+   if (get_oid(p1, &oid)) {
+   printf("Unknown oid: %s\n", p1);
+   continue;
+   }
+
+   /* lookup entry in oidmap */
+   entry = oidmap_get(&map, &oid);
+
+   /* print result */
+   puts(entry ? entry->name : "NULL");
+
+   if (entry)
+   while ((entry = oidmap_get_next(&map, entry)))
+   puts(entry ? entry->name : "NULL");
+
} else if (!strcmp("remove", cmd) && p1) {
 
if (get_oid(p1, &oid)) {
-- 
2.22.0.514.g3228928bce.dirty



[RFC PATCH 5/5] t0016: add 'add' and 'get_all' subcommand test

2019-07-07 Thread Christian Couder
Let's add a test case to test both the 'add' and 'get_all' subcommand
from "test-oidmap.c", and through them oidmap_add() and
oidmap_get_next() from "oidmap.{c,h}".

Signed-off-by: Christian Couder 
---
 t/t0016-oidmap.sh | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/t/t0016-oidmap.sh b/t/t0016-oidmap.sh
index bbe719e950..1d3196d624 100755
--- a/t/t0016-oidmap.sh
+++ b/t/t0016-oidmap.sh
@@ -67,6 +67,32 @@ Unknown oid: invalidOid
 
 '
 
+test_expect_success 'add and get_all' '
+
+test_oidmap "add one 1
+add one un
+add two 2
+add two deux
+add three 3
+get_all two
+get_all four
+get_all invalidOid
+get_all three
+get_all one" "1
+un
+2
+deux
+3
+deux
+2
+NULL
+Unknown oid: invalidOid
+3
+un
+1"
+
+'
+
 test_expect_success 'remove' '
 
 test_oidmap "put one 1
-- 
2.22.0.514.g3228928bce.dirty



[RFC PATCH 3/5] test-oidmap: add back proper 'add' subcommand

2019-07-07 Thread Christian Couder
Let's make it possible to test oidmap_add() by adding an 'add'
subcommand.

Signed-off-by: Christian Couder 
---
 t/helper/test-oidmap.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/t/helper/test-oidmap.c b/t/helper/test-oidmap.c
index 0acf99931e..c19b41aa4f 100644
--- a/t/helper/test-oidmap.c
+++ b/t/helper/test-oidmap.c
@@ -65,6 +65,23 @@ int cmd__oidmap(int argc, const char **argv)
puts(entry ? entry->name : "NULL");
free(entry);
 
+   } else if (!strcmp("add", cmd) && p1 && p2) {
+
+   if (get_oid(p1, &oid)) {
+   printf("Unknown oid: %s\n", p1);
+   continue;
+   }
+
+   /* create entry with oid_key = p1, name_value = p2 */
+   FLEX_ALLOC_STR(entry, name, p2);
+   oidcpy(&entry->entry.oid, &oid);
+
+   /* add entry */
+   oidmap_add(&map, entry);
+
+   /* print added entry */
+   puts(entry->name);
+
} else if (!strcmp("get", cmd) && p1) {
 
if (get_oid(p1, &oid)) {
-- 
2.22.0.514.g3228928bce.dirty



[RFC PATCH 0/5] oidmap: handle entries with the same key

2019-07-07 Thread Christian Couder
This is an RFC patch series that is not intended to be merged for now,
as it looks like we don't need oidmaps that can handle several entries
with the same key yet.

As I needed this for my work on reftable, I thought that I might as
well post it, to get early feedback and to avoid duplicating work in
case someone else needs it before I start sending my reftable work
(hopefully in a few months).

Christian Couder (5):
  oidmap: add oidmap_add()
  oidmap: add oidmap_get_next()
  test-oidmap: add back proper 'add' subcommand
  test-oidmap: add 'get_all' subcommand
  t0016: add 'add' and 'get_all' subcommand test

 oidmap.c   | 20 
 oidmap.h   | 12 
 t/helper/test-oidmap.c | 34 ++
 t/t0016-oidmap.sh  | 26 ++
 4 files changed, 92 insertions(+)

-- 
2.22.0.514.g3228928bce.dirty
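
For completeness, here is a hedged sketch of how the new API
(oidmap_add() together with oidmap_get()/oidmap_get_next()) is meant to
be used; the entry type and helper names are illustrative, only the
oidmap calls come from this series:

#include "cache.h"
#include "oidmap.h"

struct name_entry_ex {
        struct oidmap_entry entry; /* must be the first member */
        char name[FLEX_ARRAY];
};

/* Record a (possibly duplicate) name for an oid. */
static void add_name(struct oidmap *map, const struct object_id *oid,
                     const char *value)
{
        struct name_entry_ex *e;

        FLEX_ALLOC_STR(e, name, value);
        oidcpy(&e->entry.oid, oid);
        oidmap_add(map, e);
}

/* Print every name recorded for 'oid'. */
static void show_names(struct oidmap *map, const struct object_id *oid)
{
        struct name_entry_ex *e = oidmap_get(map, oid);

        for (; e; e = oidmap_get_next(map, e))
                puts(e->name);
}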



[RFC PATCH 2/5] oidmap: add oidmap_get_next()

2019-07-07 Thread Christian Couder
For now "oidmap.h" gives us no way to get all the entries that have the
same oid key from an oidmap, as oidmap_get() will always return the first
entry. So let's add oidmap_get_next() for this purpose.

Signed-off-by: Christian Couder 
---
 oidmap.c | 8 
 oidmap.h | 6 ++
 2 files changed, 14 insertions(+)

diff --git a/oidmap.c b/oidmap.c
index bfb290ee01..9cf9dfd533 100644
--- a/oidmap.c
+++ b/oidmap.c
@@ -32,6 +32,14 @@ void *oidmap_get(const struct oidmap *map, const struct 
object_id *key)
return hashmap_get_from_hash(&map->map, oidhash(key), key);
 }
 
+void *oidmap_get_next(const struct oidmap *map, const void *entry)
+{
+   if (!map->map.cmpfn)
+   return NULL;
+
+   return hashmap_get_next(&map->map, entry);
+}
+
 void *oidmap_remove(struct oidmap *map, const struct object_id *key)
 {
struct hashmap_entry entry;
diff --git a/oidmap.h b/oidmap.h
index 21d929ad79..5aad22784a 100644
--- a/oidmap.h
+++ b/oidmap.h
@@ -49,6 +49,12 @@ void oidmap_free(struct oidmap *map, int free_entries);
 void *oidmap_get(const struct oidmap *map,
 const struct object_id *key);
 
+/*
+ * Returns the next equal oidmap entry, or NULL if not found. This can be
+ * used to iterate over duplicate entries (see `oidmap_add`).
+ */
+void *oidmap_get_next(const struct oidmap *map, const void *entry);
+
 /*
  * Adds an oidmap entry. This allows to add duplicate entries (i.e.
  * separate values with the same oid key).
-- 
2.22.0.514.g3228928bce.dirty



[RFC PATCH 1/5] oidmap: add oidmap_add()

2019-07-07 Thread Christian Couder
We will need to have more than one entry with the same oid key
in an oidmap, which is not supported yet, as oidmap_put()
replaces an existing entry with the same oid key. So let's add
oidmap_add().

Signed-off-by: Christian Couder 
---
 oidmap.c | 12 
 oidmap.h |  6 ++
 2 files changed, 18 insertions(+)

diff --git a/oidmap.c b/oidmap.c
index 6d6e840d03..bfb290ee01 100644
--- a/oidmap.c
+++ b/oidmap.c
@@ -43,6 +43,17 @@ void *oidmap_remove(struct oidmap *map, const struct 
object_id *key)
return hashmap_remove(&map->map, &entry, key);
 }
 
+void oidmap_add(struct oidmap *map, void *entry)
+{
+   struct oidmap_entry *to_put = entry;
+
+   if (!map->map.cmpfn)
+   oidmap_init(map, 0);
+
+   hashmap_entry_init(&to_put->internal_entry, oidhash(&to_put->oid));
+   hashmap_add(&map->map, to_put);
+}
+
 void *oidmap_put(struct oidmap *map, void *entry)
 {
struct oidmap_entry *to_put = entry;
@@ -53,3 +64,4 @@ void *oidmap_put(struct oidmap *map, void *entry)
hashmap_entry_init(&to_put->internal_entry, oidhash(&to_put->oid));
return hashmap_put(&map->map, to_put);
 }
+
diff --git a/oidmap.h b/oidmap.h
index 7a939461ff..21d929ad79 100644
--- a/oidmap.h
+++ b/oidmap.h
@@ -49,6 +49,12 @@ void oidmap_free(struct oidmap *map, int free_entries);
 void *oidmap_get(const struct oidmap *map,
 const struct object_id *key);
 
+/*
+ * Adds an oidmap entry. This allows to add duplicate entries (i.e.
+ * separate values with the same oid key).
+ */
+void oidmap_add(struct oidmap *map, void *entry);
+
 /*
  * Adds or replaces an oidmap entry.
  *
-- 
2.22.0.514.g3228928bce.dirty



[PATCH 1/2] test-oidmap: remove 'add' subcommand

2019-06-29 Thread Christian Couder
The 'add' subcommand is useless as it is mostly identical
to the 'put' subcommand, so let's remove it.

Helped-by: Derrick Stolee 
Signed-off-by: Christian Couder 
---

This and 2/2 follow this discussion about test coverage:

https://public-inbox.org/git/CAP8UFD3VFdCUwDBTb9en22FO7HnWc4vgQ4h0hhariCB=om4...@mail.gmail.com/

 t/helper/test-oidmap.c | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/t/helper/test-oidmap.c b/t/helper/test-oidmap.c
index 7036588175..0acf99931e 100644
--- a/t/helper/test-oidmap.c
+++ b/t/helper/test-oidmap.c
@@ -47,21 +47,7 @@ int cmd__oidmap(int argc, const char **argv)
if (p1)
p2 = strtok(NULL, DELIM);
 
-   if (!strcmp("add", cmd) && p1 && p2) {
-
-   if (get_oid(p1, &oid)) {
-   printf("Unknown oid: %s\n", p1);
-   continue;
-   }
-
-   /* create entry with oidkey from p1, value = p2 */
-   FLEX_ALLOC_STR(entry, name, p2);
-   oidcpy(&entry->entry.oid, &oid);
-
-   /* add to oidmap */
-   oidmap_put(&map, entry);
-
-   } else if (!strcmp("put", cmd) && p1 && p2) {
+   if (!strcmp("put", cmd) && p1 && p2) {
 
if (get_oid(p1, &oid)) {
printf("Unknown oid: %s\n", p1);
-- 
2.22.0.464.g81fc6c5d4a.dirty



[PATCH 2/2] t0016: add 'remove' subcommand test

2019-06-29 Thread Christian Couder
Testing the 'remove' subcommand was forgotten when t0016
was created. Let's fix that.

Helped-by: Derrick Stolee 
Signed-off-by: Christian Couder 
---
 t/t0016-oidmap.sh | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/t/t0016-oidmap.sh b/t/t0016-oidmap.sh
index af17264ce3..bbe719e950 100755
--- a/t/t0016-oidmap.sh
+++ b/t/t0016-oidmap.sh
@@ -67,6 +67,24 @@ Unknown oid: invalidOid
 
 '
 
+test_expect_success 'remove' '
+
+test_oidmap "put one 1
+put two 2
+put three 3
+remove one
+remove two
+remove invalidOid
+remove four" "NULL
+NULL
+NULL
+1
+2
+Unknown oid: invalidOid
+NULL"
+
+'
+
 test_expect_success 'iterate' '
 
 test_oidmap "put one 1
-- 
2.22.0.464.g81fc6c5d4a.dirty



Re: Git Test Coverage Report (Thurs. June 27)

2019-06-28 Thread Christian Couder
On Thu, Jun 27, 2019 at 7:35 PM Derrick Stolee  wrote:
>
> Here are some interesting sections I found when examining the test coverage
> report. I am only highlighting these sections because they seem to include
> non-trivial logic. In some cases, maybe the code isn't needed.
>
> On 6/27/2019 1:05 PM, Derrick Stolee wrote:
> > promisor-remote.c
> > db27dca5 25) die(_("Remote with no URL"));
> > 48de3158 61) warning(_("promisor remote name cannot begin with '/': %s"),
> > 48de3158 63) return NULL;
> > faf2abf4 93) previous->next = r->next;
> > 4ca9474e 108) return git_config_string(&core_partial_clone_filter_default,
> > fa3d1b63 139) return 0;
> > 9e27beaa 202) static int remove_fetched_oids(struct repository *repo,
> > 9e27beaa 206) int i, remaining_nr = 0;
> > 9e27beaa 207) int *remaining = xcalloc(oid_nr, sizeof(*remaining));
> > 9e27beaa 208) struct object_id *old_oids = *oids;
> > 9e27beaa 211) for (i = 0; i < oid_nr; i++)
> > 9e27beaa 212) if (oid_object_info_extended(repo, &old_oids[i], NULL,
> > 9e27beaa 214) remaining[i] = 1;
> > 9e27beaa 215) remaining_nr++;
> > 9e27beaa 218) if (remaining_nr) {
> > 9e27beaa 219) int j = 0;
> > 9e27beaa 220) new_oids = xcalloc(remaining_nr, sizeof(*new_oids));
> > 9e27beaa 221) for (i = 0; i < oid_nr; i++)
> > 9e27beaa 222) if (remaining[i])
> > 9e27beaa 223) oidcpy(&new_oids[j++], &old_oids[i]);
> > 9e27beaa 224) *oids = new_oids;
> > 9e27beaa 225) if (to_free)
> > 9e27beaa 226) free(old_oids);
> > 9e27beaa 229) free(remaining);
> > 9e27beaa 231) return remaining_nr;
> > 9e27beaa 248) if (remaining_nr == 1)
> > 9e27beaa 249) continue;
> > 9e27beaa 250) remaining_nr = remove_fetched_oids(repo, &remaining_oids,
> > 9e27beaa 252) if (remaining_nr) {
> > 9e27beaa 253) to_free = 1;
> > 9e27beaa 254) continue;
> > 9e27beaa 262) free(remaining_oids);
>
> Christian: this section continues to be untested, but I think you were
> working on creating tests for this.

Yeah, I am planning to work on this soon.

> > t/helper/test-oidmap.c
> > 11510dec 52) if (get_oid(p1, &oid)) {
> > 11510dec 53) printf("Unknown oid: %s\n", p1);
> > 11510dec 54) continue;
> > 11510dec 58) FLEX_ALLOC_STR(entry, name, p2);
> > 11510dec 59) oidcpy(&entry->entry.oid, &oid);
> > 11510dec 62) oidmap_put(&map, entry);
>
> Christian: this block looks like the test-oidmap helper never uses the "add"
> subcommand. Is that correct?

Yeah, I initially copied it from hashmap, but then I realized that it
was nearly identical to the "put" subcommand, so not worth testing
separately. I should have removed it and will do it soon.

> > 11510dec 97) if (get_oid(p1, &oid)) {
> > 11510dec 98) printf("Unknown oid: %s\n", p1);
> > 11510dec 99) continue;
> > 11510dec 103) entry = oidmap_remove(&map, &oid);
> > 11510dec 106) puts(entry ? entry->name : "NULL");
> > 11510dec 107) free(entry);
>
> Similarly, this block means we are not using the "remove" subcommand.

Yeah, it looks like I forgot to implement a test for that subcommand.
Will add it soon.

Thanks,
Christian.


[ANNOUNCE] Git Rev News edition 52

2019-06-28 Thread Christian Couder
Hi everyone,

The 52nd edition of Git Rev News is now published:

  https://git.github.io/rev_news/2019/06/28/edition-52/

Thanks a lot to Jeff Hostetler, David Pursehouse and Johannes
Schindelin who contributed this month!

Enjoy,
Christian, Jakub, Markus and Gabriel.


Re: [PATCH] promisor-remote.h: fix an 'hdr-check' warning

2019-06-26 Thread Christian Couder
Hi Ramsay,

On Thu, Jun 27, 2019 at 12:14 AM Ramsay Jones
 wrote:
>
> If you need to re-roll your 'cc/multi-promisor' branch, could you please
> squash this into the relevant patch (commit 9e27beaa23, "promisor-remote:
> implement promisor_remote_get_direct()", 2019-06-25).

Sure, I have integrated it into my current development branch.

> [No, this is not the same as the April patch! :-D ]

Glad to hear that :-D

Thanks,
Christian.


Draft of Git Rev News edition 52

2019-06-26 Thread Christian Couder
Hi everyone!

A draft of a new Git Rev News edition is available here:

  https://github.com/git/git.github.io/blob/master/rev_news/drafts/edition-52.md

Everyone is welcome to contribute in any section either by editing the
above page on GitHub and sending a pull request, or by commenting on
this GitHub issue:

  https://github.com/git/git.github.io/issues/381

You can also reply to this email.

In general all kinds of contribution, for example proofreading,
suggestions for articles or links, help on the issues in GitHub, and
so on, are very much appreciated.

I tried to cc everyone who appears in this edition, but maybe I missed
some people, sorry about that.

Jakub, Markus, Gabriel and me plan to publish this edition on Friday
June 28th. Yeah, this is late and unusual, sorry about that!

Thanks,
Christian.


Re: [PATCH v5 04/16] promisor-remote: implement promisor_remote_get_direct()

2019-06-25 Thread Christian Couder
On Fri, May 31, 2019 at 7:10 AM Christian Couder
 wrote:

> On Thu, May 30, 2019 at 7:21 PM Derrick Stolee  wrote:
> >
> > On 4/9/2019 12:11 PM, Christian Couder wrote:

> > > +{
> > > + int i, missing_nr = 0;
> > > + int *missing = xcalloc(oid_nr, sizeof(*missing));
> > > + struct object_id *old_oids = *oids;
> > > + struct object_id *new_oids;
> > > + int old_fetch_if_missing = fetch_if_missing;
> > > +
> > > + fetch_if_missing = 0;
> >
> > This global 'fetch_if_missing' swap seems very fragile. I'm guessing you 
> > are using
> > it to prevent a loop when calling oid_object_info_extended() below. Can you 
> > instead
> > pass a flag to the method that disables the fetch_if_missing behavior?
>
> If such a flag existed when I wrote the function I would certainly
> have used it, as I also dislike this kind of messing with a global
> (and globals in general).
>
> I will see if I can do something about it according to what you
> suggest later in this thread.

In the V6 patch series I just sent, the new
OBJECT_INFO_SKIP_FETCH_OBJECT flag that you introduced is used.
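
To make the difference concrete, here is a hedged sketch of a
flag-based lookup (the function name is made up, the
oid_object_info_extended() call is the real API):

#include "object-store.h"

/*
 * Count how many of the given oids are still missing locally, without
 * letting the lookup itself trigger a lazy fetch.
 */
static int count_missing(struct repository *r,
                         const struct object_id *oids, int oid_nr)
{
        int i, missing = 0;

        for (i = 0; i < oid_nr; i++)
                if (oid_object_info_extended(r, &oids[i], NULL,
                                             OBJECT_INFO_SKIP_FETCH_OBJECT))
                        missing++;
        return missing;
}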

> > > + for (i = 0; i < oid_nr; i++)
> > > + if (oid_object_info_extended(the_repository, &old_oids[i], 
> > > NULL, 0)) {
> >
> > A use of "the_repository" this deep in new code is asking for a refactor 
> > later to remove it.
> > Please try to pass a "struct repository *r" through your methods so we 
> > minimize references
> > to the_repository (and the amount of work required to remove them later).
>
> Ok, I will take a look at that.

A "struct repository *r" is passed in V6. I forgot to mention that in
the cover letter.

> > > + missing_nr = remove_fetched_oids(&missing_oids, 
> > > missing_nr, to_free);
> >
> > Here is the one call, and after this assignment "missing_nr" does mean the 
> > number of missing objects.
> > However, I do think this could be clarified by using remaining_nr and 
> > remaining_oids.
>
> Ok, I will take a look at using "remaining_nr" and "remaining_oids".

Done in V6 too.

> > > + if (missing_nr) {
> > > + to_free = 1;
> > > + continue;
> > > + }
> >
> > Now this block took a bit to grok. You use to_free in the if(to_free) 
> > free(missing_oids); below.
> > But it also changes the behavior of remove_fetched_oids(). This means that 
> > the first time
> > remove_fetched_oids() will preserve the list (because it is the input list) 
> > but all later
> > calls will free the newly-created intermediate list. This checks out.
> >
> > What is confusing to me: is there any reason that missing_nr would be zero 
> > in this situation?
>
> I don't think so but I will check again, and maybe add a comment.

Actually missing_nr, or now remaining_nr, would be 0 if all the
promised objects have been fetched.

> > > + }
> > > + res = 0;
> > > + break;
> > > + }
> > > +
> > > + if (to_free)
> > > + free(missing_oids);
> > > +
> > > + return res;
> > > +}
> >
> > While the test coverage report brought this patch to my attention, it does 
> > seem correct.
> > I still think a test exposing this method would be good, especially one 
> > that requires
> > a fetch_objects() call to multiple remotes to really exercise the details 
> > of remove_fetched_oids().
>
> Yeah, I would like to actually test it. I will take another look at
> what can be done to test this. Perhaps I will look at what can be done
> to still get some objects when fetching from a promisor/partial clone
> remote even when it doesn't have all of the objects we request.

I haven't improved test coverage or looked at how we could better
handle a partial fetch. I plan to look at that soon.

Thanks,
Christian.


[PATCH v6 06/15] promisor-remote: use repository_format_partial_clone

2019-06-25 Thread Christian Couder
A remote specified using the extensions.partialClone config
option should be considered a promisor remote too.

For simplicity and to make things predictable, this promisor
remote should be either always the last one we try to get
objects from, or the first one. So it should always be either
at the end of the promisor remote list, or at its start.

We decided to make it the last one we try, because it is
likely that someone using many promisor remotes is doing so
because the other promisor remotes are better for some reason
(maybe they are closer or faster for some kind of objects)
than the origin, and the origin is likely to be the remote
specified by extensions.partialClone.

This justification is not very strong, but one choice had to
be made, and anyway the long term plan should be to make the
order somehow fully configurable.

Signed-off-by: Christian Couder 
---
 promisor-remote.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/promisor-remote.c b/promisor-remote.c
index 763d98aedd..6a8856f475 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -40,6 +40,18 @@ static struct promisor_remote *promisor_remote_lookup(const 
char *remote_name,
return NULL;
 }
 
+static void promisor_remote_move_to_tail(struct promisor_remote *r,
+struct promisor_remote *previous)
+{
+   if (previous)
+   previous->next = r->next;
+   else
+   promisors = r->next ? r->next : r;
+   r->next = NULL;
+   *promisors_tail = r;
+   promisors_tail = &r->next;
+}
+
 static int promisor_remote_config(const char *var, const char *value, void 
*data)
 {
const char *name;
@@ -76,6 +88,17 @@ static void promisor_remote_init(void)
initialized = 1;
 
git_config(promisor_remote_config, NULL);
+
+   if (repository_format_partial_clone) {
+   struct promisor_remote *o, *previous;
+
+   o = promisor_remote_lookup(repository_format_partial_clone,
+  &previous);
+   if (o)
+   promisor_remote_move_to_tail(o, previous);
+   else
+   promisor_remote_new(repository_format_partial_clone);
+   }
 }
 
 static void promisor_remote_clear(void)
-- 
2.22.0.229.ga13d9ffdf7.dirty



[PATCH v6 13/15] Remove fetch-object.{c,h} in favor of promisor-remote.{c,h}

2019-06-25 Thread Christian Couder
As fetch_objects() is now used only in promisor-remote.c
and shouldn't be used outside it, let's move it into
promisor-remote.c, make it static there, and remove
fetch-object.{c,h}.

Signed-off-by: Christian Couder 
---
 Makefile  |  1 -
 fetch-object.c| 43 ---
 fetch-object.h|  9 -
 promisor-remote.c | 40 +++-
 4 files changed, 39 insertions(+), 54 deletions(-)
 delete mode 100644 fetch-object.c
 delete mode 100644 fetch-object.h

diff --git a/Makefile b/Makefile
index 049bc8cfd4..9b0baa7239 100644
--- a/Makefile
+++ b/Makefile
@@ -880,7 +880,6 @@ LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec-cmd.o
 LIB_OBJS += fetch-negotiator.o
-LIB_OBJS += fetch-object.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += fsmonitor.o
diff --git a/fetch-object.c b/fetch-object.c
deleted file mode 100644
index eac4d448ef..00
--- a/fetch-object.c
+++ /dev/null
@@ -1,43 +0,0 @@
-#include "cache.h"
-#include "packfile.h"
-#include "pkt-line.h"
-#include "strbuf.h"
-#include "transport.h"
-#include "fetch-object.h"
-
-static int fetch_refs(const char *remote_name, struct ref *ref)
-{
-   struct remote *remote;
-   struct transport *transport;
-   int original_fetch_if_missing = fetch_if_missing;
-   int res;
-
-   fetch_if_missing = 0;
-   remote = remote_get(remote_name);
-   if (!remote->url[0])
-   die(_("Remote with no URL"));
-   transport = transport_get(remote, remote->url[0]);
-
-   transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
-   transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1");
-   res = transport_fetch_refs(transport, ref);
-   fetch_if_missing = original_fetch_if_missing;
-
-   return res;
-}
-
-int fetch_objects(const char *remote_name, const struct object_id *oids,
- int oid_nr)
-{
-   struct ref *ref = NULL;
-   int i;
-
-   for (i = 0; i < oid_nr; i++) {
-   struct ref *new_ref = alloc_ref(oid_to_hex(&oids[i]));
-   oidcpy(&new_ref->old_oid, &oids[i]);
-   new_ref->exact_oid = 1;
-   new_ref->next = ref;
-   ref = new_ref;
-   }
-   return fetch_refs(remote_name, ref);
-}
diff --git a/fetch-object.h b/fetch-object.h
deleted file mode 100644
index 7bcc7cadb0..00
--- a/fetch-object.h
+++ /dev/null
@@ -1,9 +0,0 @@
-#ifndef FETCH_OBJECT_H
-#define FETCH_OBJECT_H
-
-struct object_id;
-
-int fetch_objects(const char *remote_name, const struct object_id *oids,
- int oid_nr);
-
-#endif
diff --git a/promisor-remote.c b/promisor-remote.c
index 826890f7b8..92c4c12c1c 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -2,7 +2,45 @@
 #include "object-store.h"
 #include "promisor-remote.h"
 #include "config.h"
-#include "fetch-object.h"
+#include "transport.h"
+
+static int fetch_refs(const char *remote_name, struct ref *ref)
+{
+   struct remote *remote;
+   struct transport *transport;
+   int original_fetch_if_missing = fetch_if_missing;
+   int res;
+
+   fetch_if_missing = 0;
+   remote = remote_get(remote_name);
+   if (!remote->url[0])
+   die(_("Remote with no URL"));
+   transport = transport_get(remote, remote->url[0]);
+
+   transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
+   transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1");
+   res = transport_fetch_refs(transport, ref);
+   fetch_if_missing = original_fetch_if_missing;
+
+   return res;
+}
+
+static int fetch_objects(const char *remote_name,
+const struct object_id *oids,
+int oid_nr)
+{
+   struct ref *ref = NULL;
+   int i;
+
+   for (i = 0; i < oid_nr; i++) {
+   struct ref *new_ref = alloc_ref(oid_to_hex(&oids[i]));
+   oidcpy(&new_ref->old_oid, &oids[i]);
+   new_ref->exact_oid = 1;
+   new_ref->next = ref;
+   ref = new_ref;
+   }
+   return fetch_refs(remote_name, ref);
+}
 
 static struct promisor_remote *promisors;
 static struct promisor_remote **promisors_tail = &promisors;
-- 
2.22.0.229.ga13d9ffdf7.dirty
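
As a hedged illustration of the caller side after this series (the
wrapper name is made up, and the has_promisor_remote() and
promisor_remote_get_direct() signatures are assumed from the patches
above):

#include "cache.h"
#include "promisor-remote.h"

/*
 * Try to make a single missing object available locally by asking the
 * configured promisor remotes for it, in their configured order.
 * Returns 0 if some promisor remote provided the object.
 */
static int fetch_missing_object(struct repository *r,
                                const struct object_id *oid)
{
        if (!has_promisor_remote())
                return -1;
        return promisor_remote_get_direct(r, oid, 1);
}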



[PATCH v6 07/15] Use promisor_remote_get_direct() and has_promisor_remote()

2019-06-25 Thread Christian Couder
Instead of using the repository_format_partial_clone global
and fetch_objects() directly, let's use has_promisor_remote()
and promisor_remote_get_direct().

This way all the configured promisor remotes will be taken
into account, not only the one specified by
extensions.partialClone.

Also when cloning or fetching using a partial clone filter,
remote.origin.promisor will be set to "true" instead of
setting extensions.partialClone to "origin". This makes it
possible to use many promisor remotes just by fetching from
them.

Signed-off-by: Christian Couder 
---
 builtin/cat-file.c|  5 +++--
 builtin/fetch.c   | 11 ++-
 builtin/gc.c  |  3 ++-
 builtin/index-pack.c  |  8 
 builtin/repack.c  |  3 ++-
 cache-tree.c  |  3 ++-
 connected.c   |  3 ++-
 diff.c|  9 -
 list-objects-filter-options.c | 28 +++-
 packfile.c|  3 ++-
 sha1-file.c   | 15 ---
 t/t5601-clone.sh  |  2 +-
 t/t5616-partial-clone.sh  |  2 +-
 unpack-trees.c|  8 
 14 files changed, 56 insertions(+), 47 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 0f092382e1..85ae10bf0b 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -15,6 +15,7 @@
 #include "sha1-array.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "promisor-remote.h"
 
 struct batch_options {
int enabled;
@@ -523,8 +524,8 @@ static int batch_objects(struct batch_options *opt)
if (opt->all_objects) {
struct object_cb_data cb;
 
-   if (repository_format_partial_clone)
-   warning("This repository has extensions.partialClone 
set. Some objects may not be loaded.");
+   if (has_promisor_remote())
+   warning("This repository uses promisor remotes. Some 
objects may not be loaded.");
 
cb.opt = opt;
cb.expand = &data;
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 4ba63d5ac6..f74bd78144 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -23,6 +23,7 @@
 #include "packfile.h"
 #include "list-objects-filter-options.h"
 #include "commit-reach.h"
+#include "promisor-remote.h"
 
 static const char * const builtin_fetch_usage[] = {
N_("git fetch [] [ [...]]"),
@@ -1460,7 +1461,7 @@ static inline void fetch_one_setup_partial(struct remote 
*remote)
 * If no prior partial clone/fetch and the current fetch DID NOT
 * request a partial-fetch, do a normal fetch.
 */
-   if (!repository_format_partial_clone && !filter_options.choice)
+   if (!has_promisor_remote() && !filter_options.choice)
return;
 
/*
@@ -1468,7 +1469,7 @@ static inline void fetch_one_setup_partial(struct remote 
*remote)
 * on this repo and remember the given filter-spec as the default
 * for subsequent fetches to this remote.
 */
-   if (!repository_format_partial_clone && filter_options.choice) {
+   if (!has_promisor_remote() && filter_options.choice) {
partial_clone_register(remote->name, &filter_options);
return;
}
@@ -1477,7 +1478,7 @@ static inline void fetch_one_setup_partial(struct remote 
*remote)
 * We are currently limited to only ONE promisor remote and only
 * allow partial-fetches from the promisor remote.
 */
-   if (strcmp(remote->name, repository_format_partial_clone)) {
+   if (!promisor_remote_find(remote->name)) {
if (filter_options.choice)
die(_("--filter can only be used with the remote "
  "configured in extensions.partialClone"));
@@ -1611,7 +1612,7 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
if (depth || deepen_since || deepen_not.nr)
deepen = 1;
 
-   if (filter_options.choice && !repository_format_partial_clone)
+   if (filter_options.choice && !has_promisor_remote())
die("--filter can only be used when extensions.partialClone is 
set");
 
if (all) {
@@ -1645,7 +1646,7 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
}
 
if (remote) {
-   if (filter_options.choice || repository_format_partial_clone)
+   if (filter_options.choice || has_promisor_remote())
fetch_one_setup_partial(remote);
result = fetch_one(remote, argc, argv, prune_tags_ok);
} else {
diff --git a/builtin/gc.c b/builtin/gc.c
index 8943bcc300..824a8832b5 100644
--- a/builti

[PATCH v6 08/15] promisor-remote: parse remote.*.partialclonefilter

2019-06-25 Thread Christian Couder
This makes it possible to specify a different partial clone
filter for each promisor remote.

Signed-off-by: Christian Couder 
---
 builtin/fetch.c   |  2 +-
 list-objects-filter-options.c | 27 +++
 list-objects-filter-options.h |  3 ++-
 promisor-remote.c | 15 +++
 promisor-remote.h |  5 -
 t/t0410-partial-clone.sh  |  2 +-
 t/t5601-clone.sh  |  1 +
 t/t5616-partial-clone.sh  |  2 +-
 8 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index f74bd78144..13d8133130 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1491,7 +1491,7 @@ static inline void fetch_one_setup_partial(struct remote 
*remote)
 * the config.
 */
if (!filter_options.choice)
-   partial_clone_get_default_filter_spec(&filter_options);
+   partial_clone_get_default_filter_spec(&filter_options, 
remote->name);
return;
 }
 
diff --git a/list-objects-filter-options.c b/list-objects-filter-options.c
index b0de7d3c17..28c571f922 100644
--- a/list-objects-filter-options.c
+++ b/list-objects-filter-options.c
@@ -30,6 +30,9 @@ static int gently_parse_list_objects_filter(
 {
const char *v0;
 
+   if (!arg)
+   return 0;
+
if (filter_options->choice) {
if (errbuf) {
strbuf_addstr(
@@ -148,6 +151,7 @@ void partial_clone_register(
const struct list_objects_filter_options *filter_options)
 {
char *cfg_name;
+   char *filter_name;
 
/* Check if it is already registered */
if (!promisor_remote_find(remote)) {
@@ -162,27 +166,26 @@ void partial_clone_register(
/*
 * Record the initial filter-spec in the config as
 * the default for subsequent fetches from this remote.
-*
-* TODO: record it into remote..partialclonefilter
 */
-   core_partial_clone_filter_default =
-   xstrdup(filter_options->filter_spec);
-   git_config_set("core.partialclonefilter",
-  core_partial_clone_filter_default);
+   filter_name = xstrfmt("remote.%s.partialclonefilter", remote);
+   git_config_set(filter_name, filter_options->filter_spec);
+   free(filter_name);
 
/* Make sure the config info are reset */
promisor_remote_reinit();
 }
 
 void partial_clone_get_default_filter_spec(
-   struct list_objects_filter_options *filter_options)
+   struct list_objects_filter_options *filter_options,
+   const char *remote)
 {
+   struct promisor_remote *promisor = promisor_remote_find(remote);
+
/*
 * Parse default value, but silently ignore it if it is invalid.
 */
-   if (!core_partial_clone_filter_default)
-   return;
-   gently_parse_list_objects_filter(filter_options,
-core_partial_clone_filter_default,
-NULL);
+   if (promisor)
+   gently_parse_list_objects_filter(filter_options,
+promisor->partial_clone_filter,
+NULL);
 }
diff --git a/list-objects-filter-options.h b/list-objects-filter-options.h
index c54ffb..8deaa287b5 100644
--- a/list-objects-filter-options.h
+++ b/list-objects-filter-options.h
@@ -87,6 +87,7 @@ void partial_clone_register(
const char *remote,
const struct list_objects_filter_options *filter_options);
 void partial_clone_get_default_filter_spec(
-   struct list_objects_filter_options *filter_options);
+   struct list_objects_filter_options *filter_options,
+   const char *remote);
 
 #endif /* LIST_OBJECTS_FILTER_OPTIONS_H */
diff --git a/promisor-remote.c b/promisor-remote.c
index 6a8856f475..826890f7b8 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -75,6 +75,21 @@ static int promisor_remote_config(const char *var, const 
char *value, void *data
free(remote_name);
return 0;
}
+   if (!strcmp(subkey, "partialclonefilter")) {
+   struct promisor_remote *r;
+   char *remote_name = xmemdupz(name, namelen);
+
+   r = promisor_remote_lookup(remote_name, NULL);
+   if (!r)
+   r = promisor_remote_new(remote_name);
+
+   free(remote_name);
+
+   if (!r)
+   return 0;
+
+   return git_config_string(&r->partial_clone_filter, var, value);
+   }
 
return 0;
 }
diff --git a/promisor-remote.h b/promisor-remote.h
index 4048e0..838cb092f3 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -5,10 +5,13 @@ struct object_id;
 
 /*
  * Promisor remote linked list
- * Its information come from remote.XXX config entries.
+

[PATCH v6 09/15] builtin/fetch: remove unique promisor remote limitation

2019-06-25 Thread Christian Couder
As the infrastructure for more than one promisor remote
has been introduced in previous patches, we can remove
code that forbids the registration of more than one
promisor remote.

Signed-off-by: Christian Couder 
---
 builtin/fetch.c | 20 +---
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 13d8133130..5657d054ec 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1465,26 +1465,16 @@ static inline void fetch_one_setup_partial(struct 
remote *remote)
return;
 
/*
-* If this is the FIRST partial-fetch request, we enable partial
-* on this repo and remember the given filter-spec as the default
-* for subsequent fetches to this remote.
+* If this is a partial-fetch request, we enable partial on
+* this repo if not already enabled and remember the given
+* filter-spec as the default for subsequent fetches to this
+* remote.
 */
-   if (!has_promisor_remote() && filter_options.choice) {
+   if (filter_options.choice) {
partial_clone_register(remote->name, &filter_options);
return;
}
 
-   /*
-* We are currently limited to only ONE promisor remote and only
-* allow partial-fetches from the promisor remote.
-*/
-   if (!promisor_remote_find(remote->name)) {
-   if (filter_options.choice)
-   die(_("--filter can only be used with the remote "
- "configured in extensions.partialClone"));
-   return;
-   }
-
/*
 * Do a partial-fetch from the promisor remote using either the
 * explicitly given filter-spec or inherit the filter-spec from
-- 
2.22.0.229.ga13d9ffdf7.dirty
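
As an illustration of the change above: once this limitation is removed, a
partial fetch with an explicit filter can be done from any configured
remote, and doing so registers that remote as a promisor remote. A rough
sketch (the "repo" and "server2" names and the URL are made up, and the
server side is assumed to have uploadpack.allowfilter enabled):

    git -C repo remote add server2 "file://$(pwd)/server2"
    # partial fetch from a remote that is not yet a promisor remote
    git -C repo fetch --filter=blob:none server2
    # the remote has now been registered as a promisor remote
    git -C repo config remote.server2.promisor    # prints "true"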



[PATCH v6 00/15] Many promisor remotes

2019-06-25 Thread Christian Couder
  - Patch 12/15 (remote: add promisor and partial clone config to the doc)

These documentation patches explain how things can work with more than
one promisor remote.

  - Patch 13/15 (Remove fetch-object.{c,h} in favor of promisor-remote.{c,h})
  - Patch 14/15 (Move repository_format_partial_clone to promisor-remote.c)
  - Patch 15/15 (Move core_partial_clone_filter_default to promisor-remote.c)

These patches try to hide the old features (fetch_objects(),
repository_format_partial_clone and core_partial_clone_filter_default)
that only made sense for the general code to use when there was only
one promisor remote. This ensures that there will be compilation
errors rather than bugs or test failures if the old features are used
in the old-fashioned way.

Links
~

This patch series on GitHub:

V6: https://github.com/chriscool/git/commits/many-promisor-remotes
V5: https://github.com/chriscool/git/commits/many-promisor-remotes68
V4: https://github.com/chriscool/git/commits/many-promisor-remotes58
V3: https://github.com/chriscool/git/commits/many-promisor-remotes40
V2: https://github.com/chriscool/git/commits/many-promisor-remotes35
V1: https://github.com/chriscool/git/commits/many-promisor-remotes17

On the mailing list:

V5: https://public-inbox.org/git/20190409161116.30256-1-chrisc...@tuxfamily.org/
V4: https://public-inbox.org/git/20190401164045.17328-1-chrisc...@tuxfamily.org/
V3: https://public-inbox.org/git/20190312132959.11764-1-chrisc...@tuxfamily.org/
V2: https://public-inbox.org/git/20190122144212.15119-1-chrisc...@tuxfamily.org/
V1: https://public-inbox.org/git/20181211052746.16218-1-chrisc...@tuxfamily.org/

This patch series is a follow up from the discussions related to
the remote odb V4 patch series:

https://public-inbox.org/git/20180802061505.2983-1-chrisc...@tuxfamily.org/

Especially in:

https://public-inbox.org/git/CAP8UFD3nrhjANwNDqTwx5ZtnZNcnbAFqUN=u=lrvzuh4+3w...@mail.gmail.com/

I said that I would like to work on things in the following order:

  1) Teaching partial clone to attempt to fetch missing objects from
multiple remotes instead of only one using the order in the config.

  2) Simplifying the protocol for fetching missing objects so that it
can be satisfied by a lighter weight object storage system than a full
Git server.

  3) Making it possible to explicitly define an order in which the
remotes are accessed.

  4) Making the criteria for what objects can be missing more
aggressive, so that I can "git add" a large file and work with it
using Git without even having a second copy of that object in my local
object store.

And this patch series is about the 1).

The previous remote odb patch series on GitHub:

V5: https://github.com/chriscool/git/commits/remote-odb
V4: https://github.com/chriscool/git/commits/remote-odb5
V3: https://github.com/chriscool/git/commits/remote-odb3
V2: https://github.com/chriscool/git/commits/remote-odb2
V1: https://github.com/chriscool/git/commits/remote-odb1

Discussions related to previous versions of the odb patch series:

V4: https://public-inbox.org/git/20180802061505.2983-1-chrisc...@tuxfamily.org/
V3: https://public-inbox.org/git/20180713174959.16748-1-chrisc...@tuxfamily.org/
V2: https://public-inbox.org/git/20180630083542.20347-1-chrisc...@tuxfamily.org/
V1: https://public-inbox.org/git/20180623121846.19750-1-chrisc...@tuxfamily.org/

Christian Couder (15):
  t0410: remove pipes after git commands
  fetch-object: make functions return an error code
  Add initial support for many promisor remotes
  promisor-remote: implement promisor_remote_get_direct()
  promisor-remote: add promisor_remote_reinit()
  promisor-remote: use repository_format_partial_clone
  Use promisor_remote_get_direct() and has_promisor_remote()
  promisor-remote: parse remote.*.partialclonefilter
  builtin/fetch: remove unique promisor remote limitation
  t0410: test fetching from many promisor remotes
  partial-clone: add multiple remotes in the doc
  remote: add promisor and partial clone config to the doc
  Remove fetch-object.{c,h} in favor of promisor-remote.{c,h}
  Move repository_format_partial_clone to promisor-remote.c
  Move core_partial_clone_filter_default to promisor-remote.c

 Documentation/config/remote.txt   |   8 +
 Documentation/technical/partial-clone.txt | 117 +++---
 Makefile  |   2 +-
 builtin/cat-file.c|   5 +-
 builtin/fetch.c   |  29 +--
 builtin/gc.c  |   3 +-
 builtin/index-pack.c  |   8 +-
 builtin/repack.c  |   3 +-
 cache-tree.c  |   3 +-
 cache.h   |   2 -
 config.c  |   5 -
 connected.c   |   3 +-
 diff.c|   9 +-
 environment.c |   2 -
 fetch-object.c|  40 
 fetch-object.

[PATCH v6 15/15] Move core_partial_clone_filter_default to promisor-remote.c

2019-06-25 Thread Christian Couder
Now that we can have a different default partial clone filter for
each promisor remote, let's hide core_partial_clone_filter_default
as a static in promisor-remote.c to avoid it being used for
anything other than managing backward compatibility.

Signed-off-by: Christian Couder 
---
 cache.h   | 1 -
 config.c  | 5 -
 environment.c | 1 -
 promisor-remote.c | 5 +
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/cache.h b/cache.h
index e34b9e66d2..a4d7f84eeb 100644
--- a/cache.h
+++ b/cache.h
@@ -961,7 +961,6 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION 0
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
-extern const char *core_partial_clone_filter_default;
 extern int repository_format_worktree_config;
 
 /*
diff --git a/config.c b/config.c
index 296a6d9cc4..317b226bc8 100644
--- a/config.c
+++ b/config.c
@@ -1344,11 +1344,6 @@ static int git_default_core_config(const char *var, const char *value, void *cb)
return 0;
}
 
-   if (!strcmp(var, "core.partialclonefilter")) {
-   return git_config_string(&core_partial_clone_filter_default,
-var, value);
-   }
-
if (!strcmp(var, "core.usereplacerefs")) {
read_replace_refs = git_config_bool(var, value);
return 0;
diff --git a/environment.c b/environment.c
index 8855d2fc11..efa072680a 100644
--- a/environment.c
+++ b/environment.c
@@ -31,7 +31,6 @@ int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
-const char *core_partial_clone_filter_default;
 int repository_format_worktree_config;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
diff --git a/promisor-remote.c b/promisor-remote.c
index 31d51bb50e..9bc296cdde 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -5,6 +5,7 @@
 #include "transport.h"
 
 static char *repository_format_partial_clone;
+static const char *core_partial_clone_filter_default;
 
 void set_repository_format_partial_clone(char *partial_clone)
 {
@@ -103,6 +104,10 @@ static int promisor_remote_config(const char *var, const char *value, void *data
int namelen;
const char *subkey;
 
+   if (!strcmp(var, "core.partialclonefilter"))
+   return git_config_string(&core_partial_clone_filter_default,
+var, value);
+
if (parse_config_key(var, "remote", &name, &namelen, &subkey) < 0)
return 0;
 
-- 
2.22.0.229.ga13d9ffdf7.dirty



[PATCH v6 01/15] t0410: remove pipes after git commands

2019-06-25 Thread Christian Couder
Let's not run a git command, especially one with "verify" in its
name, upstream of a pipe, because the pipe will hide the git
command's exit code.

While at it, let's also avoid a useless `cat` command piping
into `sed`.

Helped-by: SZEDER Gábor 
Signed-off-by: Christian Couder 
---
 t/t0410-partial-clone.sh | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 5bd892f2f7..3559313bd0 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -166,8 +166,9 @@ test_expect_success 'fetching of missing objects' '
# associated packfile contains the object
ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
test_line_count = 1 promisorlist &&
-   IDX=$(cat promisorlist | sed "s/promisor$/idx/") &&
-   git verify-pack --verbose "$IDX" | grep "$HASH"
+   IDX=$(sed "s/promisor$/idx/" promisorlist) &&
+   git verify-pack --verbose "$IDX" >out &&
+   grep "$HASH" out
 '
 
test_expect_success 'fetching of missing objects works with ref-in-want enabled' '
@@ -514,8 +515,9 @@ test_expect_success 'fetching of missing objects from an HTTP server' '
# associated packfile contains the object
ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
test_line_count = 1 promisorlist &&
-   IDX=$(cat promisorlist | sed "s/promisor$/idx/") &&
-   git verify-pack --verbose "$IDX" | grep "$HASH"
+   IDX=$(sed "s/promisor$/idx/" promisorlist) &&
+   git verify-pack --verbose "$IDX" >out &&
+   grep "$HASH" out
 '
 
 test_done
-- 
2.22.0.229.ga13d9ffdf7.dirty



[PATCH v6 05/15] promisor-remote: add promisor_remote_reinit()

2019-06-25 Thread Christian Couder
From: Christian Couder 

We will need to reinitialize the promisor remote configuration
as we will make some changes to it in a later commit.

Signed-off-by: Christian Couder 
---
 promisor-remote.c | 22 --
 promisor-remote.h |  1 +
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/promisor-remote.c b/promisor-remote.c
index b79a84ce3a..763d98aedd 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -67,10 +67,10 @@ static int promisor_remote_config(const char *var, const char *value, void *data
return 0;
 }
 
+static int initialized;
+
 static void promisor_remote_init(void)
 {
-   static int initialized;
-
if (initialized)
return;
initialized = 1;
@@ -78,6 +78,24 @@ static void promisor_remote_init(void)
git_config(promisor_remote_config, NULL);
 }
 
+static void promisor_remote_clear(void)
+{
+   while (promisors) {
+   struct promisor_remote *r = promisors;
+   promisors = promisors->next;
+   free(r);
+   }
+
+   promisors_tail = &promisors;
+}
+
+void promisor_remote_reinit(void)
+{
+   initialized = 0;
+   promisor_remote_clear();
+   promisor_remote_init();
+}
+
 struct promisor_remote *promisor_remote_find(const char *remote_name)
 {
promisor_remote_init();
diff --git a/promisor-remote.h b/promisor-remote.h
index ed4ecead36..4048e0 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -12,6 +12,7 @@ struct promisor_remote {
const char name[FLEX_ARRAY];
 };
 
+extern void promisor_remote_reinit(void);
 extern struct promisor_remote *promisor_remote_find(const char *remote_name);
 extern int has_promisor_remote(void);
 extern int promisor_remote_get_direct(struct repository *repo,
-- 
2.22.0.229.ga13d9ffdf7.dirty



[PATCH v6 12/15] remote: add promisor and partial clone config to the doc

2019-06-25 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 Documentation/config/remote.txt | 8 
 1 file changed, 8 insertions(+)

diff --git a/Documentation/config/remote.txt b/Documentation/config/remote.txt
index 6c4cad83a2..a8e6437a90 100644
--- a/Documentation/config/remote.txt
+++ b/Documentation/config/remote.txt
@@ -76,3 +76,11 @@ remote.<name>.pruneTags::
 +
 See also `remote.<name>.prune` and the PRUNING section of
 linkgit:git-fetch[1].
+
+remote.<name>.promisor::
+   When set to true, this remote will be used to fetch promisor
+   objects.
+
+remote.<name>.partialclonefilter::
+   The filter that will be applied when fetching from this
+   promisor remote.
-- 
2.22.0.229.ga13d9ffdf7.dirty
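
To make the two options above concrete, a repository could end up with
several promisor remotes, each with its own default filter. The remote
names, URL and filter-specs below are invented for illustration; normally
`git clone --filter=...` or `git fetch --filter=...` writes most of this
configuration for you:

    # "origin" was set up by an initial partial clone
    git config remote.origin.promisor true
    git config remote.origin.partialclonefilter blob:none

    # a second promisor remote added by hand
    git remote add blobs https://example.com/big-blobs.git
    git config remote.blobs.promisor true
    git config remote.blobs.partialclonefilter blob:limit=1m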



[PATCH v6 11/15] partial-clone: add multiple remotes in the doc

2019-06-25 Thread Christian Couder
While at it, let's remove a reference to ODB effort as the ODB
effort has been replaced by directly enhancing partial clone
and promisor remote features.

Signed-off-by: Christian Couder 
---
 Documentation/technical/partial-clone.txt | 117 --
 1 file changed, 84 insertions(+), 33 deletions(-)

diff --git a/Documentation/technical/partial-clone.txt b/Documentation/technical/partial-clone.txt
index 896c7b3878..210373e258 100644
--- a/Documentation/technical/partial-clone.txt
+++ b/Documentation/technical/partial-clone.txt
@@ -30,12 +30,20 @@ advance* during clone and fetch operations and thereby reduce download
 times and disk usage.  Missing objects can later be "demand fetched"
 if/when needed.
 
+A remote that can later provide the missing objects is called a
+promisor remote, as it promises to send the objects when
+requested. Initialy Git supported only one promisor remote, the origin
+remote from which the user cloned and that was configured in the
+"extensions.partialClone" config option. Later support for more than
+one promisor remote has been implemented.
+
 Use of partial clone requires that the user be online and the origin
-remote be available for on-demand fetching of missing objects.  This may
-or may not be problematic for the user.  For example, if the user can
-stay within the pre-selected subset of the source tree, they may not
-encounter any missing objects.  Alternatively, the user could try to
-pre-fetch various objects if they know that they are going offline.
+remote or other promisor remotes be available for on-demand fetching
+of missing objects.  This may or may not be problematic for the user.
+For example, if the user can stay within the pre-selected subset of
+the source tree, they may not encounter any missing objects.
+Alternatively, the user could try to pre-fetch various objects if they
+know that they are going offline.
 
 
 Non-Goals
@@ -100,18 +108,18 @@ or commits that reference missing trees.
 Handling Missing Objects
 
 
-- An object may be missing due to a partial clone or fetch, or missing due
-  to repository corruption.  To differentiate these cases, the local
-  repository specially indicates such filtered packfiles obtained from the
-  promisor remote as "promisor packfiles".
+- An object may be missing due to a partial clone or fetch, or missing
+  due to repository corruption.  To differentiate these cases, the
+  local repository specially indicates such filtered packfiles
+  obtained from promisor remotes as "promisor packfiles".
 +
 These promisor packfiles consist of a ".promisor" file with
 arbitrary contents (like the ".keep" files), in addition to
 their ".pack" and ".idx" files.
 
 - The local repository considers a "promisor object" to be an object that
-  it knows (to the best of its ability) that the promisor remote has promised
-  that it has, either because the local repository has that object in one of
+  it knows (to the best of its ability) that promisor remotes have promised
+  that they have, either because the local repository has that object in one of
   its promisor packfiles, or because another promisor object refers to it.
 +
 When Git encounters a missing object, Git can see if it is a promisor object
@@ -123,12 +131,12 @@ expensive-to-modify list of missing objects.[a]
 - Since almost all Git code currently expects any referenced object to be
   present locally and because we do not want to force every command to do
   a dry-run first, a fallback mechanism is added to allow Git to attempt
-  to dynamically fetch missing objects from the promisor remote.
+  to dynamically fetch missing objects from promisor remotes.
 +
 When the normal object lookup fails to find an object, Git invokes
-fetch-object to try to get the object from the server and then retry
-the object lookup.  This allows objects to be "faulted in" without
-complicated prediction algorithms.
+promisor_remote_get_direct() to try to get the object from a promisor
+remote and then retry the object lookup.  This allows objects to be
+"faulted in" without complicated prediction algorithms.
 +
 For efficiency reasons, no check as to whether the missing object is
 actually a promisor object is performed.
@@ -157,8 +165,7 @@ and prefetch those objects in bulk.
 +
 We are not happy with this global variable and would like to remove it,
 but that requires significant refactoring of the object code to pass an
-additional flag.  We hope that concurrent efforts to add an ODB API can
-encompass this.
+additional flag.
 
 
 Fetching Missing Objects
@@ -182,21 +189,63 @@ has been updated to not use any object flags when the corresponding argument
   though they are not necessary.
 
 
+Using many promisor remotes
+---
+
+Many promisor remotes can be configured and used.
+
+This allows for example a us

[PATCH v6 14/15] Move repository_format_partial_clone to promisor-remote.c

2019-06-25 Thread Christian Couder
Now that we have has_promisor_remote() and can use many
promisor remotes, let's hide repository_format_partial_clone
as a static in promisor-remote.c to avoid it being used
for anything other than managing backward compatibility.

Signed-off-by: Christian Couder 
---
 cache.h   | 1 -
 environment.c | 1 -
 promisor-remote.c | 7 +++
 promisor-remote.h | 6 ++
 setup.c   | 3 ++-
 5 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index bf20337ef4..e34b9e66d2 100644
--- a/cache.h
+++ b/cache.h
@@ -961,7 +961,6 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION 0
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
-extern char *repository_format_partial_clone;
 extern const char *core_partial_clone_filter_default;
 extern int repository_format_worktree_config;
 
diff --git a/environment.c b/environment.c
index 89af47cb85..8855d2fc11 100644
--- a/environment.c
+++ b/environment.c
@@ -31,7 +31,6 @@ int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
-char *repository_format_partial_clone;
 const char *core_partial_clone_filter_default;
 int repository_format_worktree_config;
 const char *git_commit_encoding;
diff --git a/promisor-remote.c b/promisor-remote.c
index 92c4c12c1c..31d51bb50e 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -4,6 +4,13 @@
 #include "config.h"
 #include "transport.h"
 
+static char *repository_format_partial_clone;
+
+void set_repository_format_partial_clone(char *partial_clone)
+{
+   repository_format_partial_clone = xstrdup_or_null(partial_clone);
+}
+
 static int fetch_refs(const char *remote_name, struct ref *ref)
 {
struct remote *remote;
diff --git a/promisor-remote.h b/promisor-remote.h
index 838cb092f3..8200dfc940 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -22,4 +22,10 @@ extern int promisor_remote_get_direct(struct repository *repo,
  const struct object_id *oids,
  int oid_nr);
 
+/*
+ * This should be used only once from setup.c to set the value we got
+ * from the extensions.partialclone config option.
+ */
+extern void set_repository_format_partial_clone(char *partial_clone);
+
 #endif /* PROMISOR_REMOTE_H */
diff --git a/setup.c b/setup.c
index 8dcb4631f7..25a3038277 100644
--- a/setup.c
+++ b/setup.c
@@ -4,6 +4,7 @@
 #include "dir.h"
 #include "string-list.h"
 #include "chdir-notify.h"
+#include "promisor-remote.h"
 
 static int inside_git_dir = -1;
 static int inside_work_tree = -1;
@@ -478,7 +479,7 @@ static int check_repository_format_gently(const char *gitdir, struct repository_
}
 
repository_format_precious_objects = candidate->precious_objects;
-   repository_format_partial_clone = xstrdup_or_null(candidate->partial_clone);
+   set_repository_format_partial_clone(candidate->partial_clone);
repository_format_worktree_config = candidate->worktree_config;
string_list_clear(&candidate->unknown_extensions, 0);
 
-- 
2.22.0.229.ga13d9ffdf7.dirty



[PATCH v6 10/15] t0410: test fetching from many promisor remotes

2019-06-25 Thread Christian Couder
From: Christian Couder 

This shows that it is now possible to fetch objects from more
than one promisor remote, and that fetching from a new
promisor remote can configure it as one.

Helped-by: SZEDER Gábor 
Signed-off-by: Christian Couder 
---
 t/t0410-partial-clone.sh | 49 +++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index 3082eff2bf..2498e72a34 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -183,8 +183,55 @@ test_expect_success 'fetching of missing objects works with ref-in-want enabled'
grep "git< fetch=.*ref-in-want" trace
 '
 
+test_expect_success 'fetching of missing objects from another promisor remote' '
+   git clone "file://$(pwd)/server" server2 &&
+   test_commit -C server2 bar &&
+   git -C server2 repack -a -d --write-bitmap-index &&
+   HASH2=$(git -C server2 rev-parse bar) &&
+
+   git -C repo remote add server2 "file://$(pwd)/server2" &&
+   git -C repo config remote.server2.promisor true &&
+   git -C repo cat-file -p "$HASH2" &&
+
+   git -C repo fetch server2 &&
+   rm -rf repo/.git/objects/* &&
+   git -C repo cat-file -p "$HASH2" &&
+
+   # Ensure that the .promisor file is written, and check that its
+   # associated packfile contains the object
+   ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
+   test_line_count = 1 promisorlist &&
+   IDX=$(sed "s/promisor$/idx/" promisorlist) &&
+   git verify-pack --verbose "$IDX" >out &&
+   grep "$HASH2" out
+'
+
+test_expect_success 'fetching of missing objects configures a promisor remote' '
+   git clone "file://$(pwd)/server" server3 &&
+   test_commit -C server3 baz &&
+   git -C server3 repack -a -d --write-bitmap-index &&
+   HASH3=$(git -C server3 rev-parse baz) &&
+   git -C server3 config uploadpack.allowfilter 1 &&
+
+   rm repo/.git/objects/pack/pack-*.promisor &&
+
+   git -C repo remote add server3 "file://$(pwd)/server3" &&
+   git -C repo fetch --filter="blob:none" server3 $HASH3 &&
+
+   test_cmp_config -C repo true remote.server3.promisor &&
+
+   # Ensure that the .promisor file is written, and check that its
+   # associated packfile contains the object
+   ls repo/.git/objects/pack/pack-*.promisor >promisorlist &&
+   test_line_count = 1 promisorlist &&
+   IDX=$(sed "s/promisor$/idx/" promisorlist) &&
+   git verify-pack --verbose "$IDX" >out &&
+   grep "$HASH3" out
+'
+
 test_expect_success 'fetching of missing blobs works' '
-   rm -rf server repo &&
+   rm -rf server server2 repo &&
+   rm -rf server server3 repo &&
test_create_repo server &&
test_commit -C server foo &&
git -C server repack -a -d --write-bitmap-index &&
-- 
2.22.0.229.ga13d9ffdf7.dirty



[PATCH v6 02/15] fetch-object: make functions return an error code

2019-06-25 Thread Christian Couder
From: Christian Couder 

The callers of the fetch_object() and fetch_objects() might
be interested in knowing if these functions succeeded or not.

Signed-off-by: Christian Couder 
Signed-off-by: Junio C Hamano 
---
 fetch-object.c | 13 -
 fetch-object.h |  4 ++--
 sha1-file.c|  4 ++--
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/fetch-object.c b/fetch-object.c
index 4266548800..eac4d448ef 100644
--- a/fetch-object.c
+++ b/fetch-object.c
@@ -5,11 +5,12 @@
 #include "transport.h"
 #include "fetch-object.h"
 
-static void fetch_refs(const char *remote_name, struct ref *ref)
+static int fetch_refs(const char *remote_name, struct ref *ref)
 {
struct remote *remote;
struct transport *transport;
int original_fetch_if_missing = fetch_if_missing;
+   int res;
 
fetch_if_missing = 0;
remote = remote_get(remote_name);
@@ -19,12 +20,14 @@ static void fetch_refs(const char *remote_name, struct ref *ref)
 
transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1");
transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1");
-   transport_fetch_refs(transport, ref);
+   res = transport_fetch_refs(transport, ref);
fetch_if_missing = original_fetch_if_missing;
+
+   return res;
 }
 
-void fetch_objects(const char *remote_name, const struct object_id *oids,
-  int oid_nr)
+int fetch_objects(const char *remote_name, const struct object_id *oids,
+ int oid_nr)
 {
struct ref *ref = NULL;
int i;
@@ -36,5 +39,5 @@ void fetch_objects(const char *remote_name, const struct object_id *oids,
new_ref->next = ref;
ref = new_ref;
}
-   fetch_refs(remote_name, ref);
+   return fetch_refs(remote_name, ref);
 }
diff --git a/fetch-object.h b/fetch-object.h
index d6444caa5a..7bcc7cadb0 100644
--- a/fetch-object.h
+++ b/fetch-object.h
@@ -3,7 +3,7 @@
 
 struct object_id;
 
-void fetch_objects(const char *remote_name, const struct object_id *oids,
-  int oid_nr);
+int fetch_objects(const char *remote_name, const struct object_id *oids,
+ int oid_nr);
 
 #endif
diff --git a/sha1-file.c b/sha1-file.c
index 888b6024d5..819d32cdb8 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1381,8 +1381,8 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
!already_retried && r == the_repository &&
!(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
/*
-* TODO Investigate having fetch_object() return
-* TODO error/success and stopping the music here.
+* TODO Investigate checking fetch_object() return
+* TODO value and stopping on error here.
 * TODO Pass a repository struct through fetch_object,
 * such that arbitrary repositories work.
 */
-- 
2.22.0.229.ga13d9ffdf7.dirty



[PATCH v6 04/15] promisor-remote: implement promisor_remote_get_direct()

2019-06-25 Thread Christian Couder
From: Christian Couder 

This is implemented for now by calling fetch_objects(). It fetches
from all the promisor remotes.

Helped-by: Ramsay Jones 
Helped-by: Derrick Stolee 
Signed-off-by: Christian Couder 
---
 promisor-remote.c | 67 +++
 promisor-remote.h |  5 
 2 files changed, 72 insertions(+)

diff --git a/promisor-remote.c b/promisor-remote.c
index c249b80e02..b79a84ce3a 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -1,6 +1,8 @@
 #include "cache.h"
+#include "object-store.h"
 #include "promisor-remote.h"
 #include "config.h"
+#include "fetch-object.h"
 
 static struct promisor_remote *promisors;
 static struct promisor_remote **promisors_tail = &promisors;
@@ -90,3 +92,68 @@ int has_promisor_remote(void)
 {
return !!promisor_remote_find(NULL);
 }
+
+static int remove_fetched_oids(struct repository *repo,
+  struct object_id **oids,
+  int oid_nr, int to_free)
+{
+   int i, remaining_nr = 0;
+   int *remaining = xcalloc(oid_nr, sizeof(*remaining));
+   struct object_id *old_oids = *oids;
+   struct object_id *new_oids;
+
+   for (i = 0; i < oid_nr; i++)
+   if (oid_object_info_extended(repo, &old_oids[i], NULL,
+OBJECT_INFO_SKIP_FETCH_OBJECT)) {
+   remaining[i] = 1;
+   remaining_nr++;
+   }
+
+   if (remaining_nr) {
+   int j = 0;
+   new_oids = xcalloc(remaining_nr, sizeof(*new_oids));
+   for (i = 0; i < oid_nr; i++)
+   if (remaining[i])
+   oidcpy(&new_oids[j++], &old_oids[i]);
+   *oids = new_oids;
+   if (to_free)
+   free(old_oids);
+   }
+
+   free(remaining);
+
+   return remaining_nr;
+}
+
+int promisor_remote_get_direct(struct repository *repo,
+  const struct object_id *oids,
+  int oid_nr)
+{
+   struct promisor_remote *r;
+   struct object_id *remaining_oids = (struct object_id *)oids;
+   int remaining_nr = oid_nr;
+   int to_free = 0;
+   int res = -1;
+
+   promisor_remote_init();
+
+   for (r = promisors; r; r = r->next) {
+   if (fetch_objects(r->name, remaining_oids, remaining_nr) < 0) {
+   if (remaining_nr == 1)
+   continue;
+   remaining_nr = remove_fetched_oids(repo, &remaining_oids,
+remaining_nr, to_free);
+   if (remaining_nr) {
+   to_free = 1;
+   continue;
+   }
+   }
+   res = 0;
+   break;
+   }
+
+   if (to_free)
+   free(remaining_oids);
+
+   return res;
+}
diff --git a/promisor-remote.h b/promisor-remote.h
index 01dcdf4dc7..ed4ecead36 100644
--- a/promisor-remote.h
+++ b/promisor-remote.h
@@ -1,6 +1,8 @@
 #ifndef PROMISOR_REMOTE_H
 #define PROMISOR_REMOTE_H
 
+struct object_id;
+
 /*
  * Promisor remote linked list
  * Its information come from remote.XXX config entries.
@@ -12,5 +14,8 @@ struct promisor_remote {
 
 extern struct promisor_remote *promisor_remote_find(const char *remote_name);
 extern int has_promisor_remote(void);
+extern int promisor_remote_get_direct(struct repository *repo,
+ const struct object_id *oids,
+ int oid_nr);
 
 #endif /* PROMISOR_REMOTE_H */
-- 
2.22.0.229.ga13d9ffdf7.dirty



[PATCH v6 03/15] Add initial support for many promisor remotes

2019-06-25 Thread Christian Couder
From: Christian Couder 

The promisor-remote.{c,h} files will contain functions to
manage many promisor remotes.

We expect that there will not be a lot of promisor remotes,
so it is ok to use a simple linked list to manage them.

Helped-by: Jeff King 
Helped-by: SZEDER Gábor 
Signed-off-by: Christian Couder 
---
 Makefile  |  1 +
 promisor-remote.c | 92 +++
 promisor-remote.h | 16 +
 3 files changed, 109 insertions(+)
 create mode 100644 promisor-remote.c
 create mode 100644 promisor-remote.h

diff --git a/Makefile b/Makefile
index f58bf14c7b..049bc8cfd4 100644
--- a/Makefile
+++ b/Makefile
@@ -944,6 +944,7 @@ LIB_OBJS += preload-index.o
 LIB_OBJS += pretty.o
 LIB_OBJS += prio-queue.o
 LIB_OBJS += progress.o
+LIB_OBJS += promisor-remote.o
 LIB_OBJS += prompt.o
 LIB_OBJS += protocol.o
 LIB_OBJS += quote.o
diff --git a/promisor-remote.c b/promisor-remote.c
new file mode 100644
index 00..c249b80e02
--- /dev/null
+++ b/promisor-remote.c
@@ -0,0 +1,92 @@
+#include "cache.h"
+#include "promisor-remote.h"
+#include "config.h"
+
+static struct promisor_remote *promisors;
+static struct promisor_remote **promisors_tail = &promisors;
+
+static struct promisor_remote *promisor_remote_new(const char *remote_name)
+{
+   struct promisor_remote *r;
+
+   if (*remote_name == '/') {
+   warning(_("promisor remote name cannot begin with '/': %s"),
+   remote_name);
+   return NULL;
+   }
+
+   FLEX_ALLOC_STR(r, name, remote_name);
+
+   *promisors_tail = r;
+   promisors_tail = &r->next;
+
+   return r;
+}
+
+static struct promisor_remote *promisor_remote_lookup(const char *remote_name,
+ struct promisor_remote 
**previous)
+{
+   struct promisor_remote *r, *p;
+
+   for (p = NULL, r = promisors; r; p = r, r = r->next)
+   if (!strcmp(r->name, remote_name)) {
+   if (previous)
+   *previous = p;
+   return r;
+   }
+
+   return NULL;
+}
+
+static int promisor_remote_config(const char *var, const char *value, void *data)
+{
+   const char *name;
+   int namelen;
+   const char *subkey;
+
+   if (parse_config_key(var, "remote", &name, &namelen, &subkey) < 0)
+   return 0;
+
+   if (!strcmp(subkey, "promisor")) {
+   char *remote_name;
+
+   if (!git_config_bool(var, value))
+   return 0;
+
+   remote_name = xmemdupz(name, namelen);
+
+   if (!promisor_remote_lookup(remote_name, NULL))
+   promisor_remote_new(remote_name);
+
+   free(remote_name);
+   return 0;
+   }
+
+   return 0;
+}
+
+static void promisor_remote_init(void)
+{
+   static int initialized;
+
+   if (initialized)
+   return;
+   initialized = 1;
+
+   git_config(promisor_remote_config, NULL);
+}
+
+struct promisor_remote *promisor_remote_find(const char *remote_name)
+{
+   promisor_remote_init();
+
+   if (!remote_name)
+   return promisors;
+
+   return promisor_remote_lookup(remote_name, NULL);
+}
+
+int has_promisor_remote(void)
+{
+   return !!promisor_remote_find(NULL);
+}
diff --git a/promisor-remote.h b/promisor-remote.h
new file mode 100644
index 00..01dcdf4dc7
--- /dev/null
+++ b/promisor-remote.h
@@ -0,0 +1,16 @@
+#ifndef PROMISOR_REMOTE_H
+#define PROMISOR_REMOTE_H
+
+/*
+ * Promisor remote linked list
+ * Its information come from remote.XXX config entries.
+ */
+struct promisor_remote {
+   struct promisor_remote *next;
+   const char name[FLEX_ARRAY];
+};
+
+extern struct promisor_remote *promisor_remote_find(const char *remote_name);
+extern int has_promisor_remote(void);
+
+#endif /* PROMISOR_REMOTE_H */
-- 
2.22.0.229.ga13d9ffdf7.dirty



Re: git bisect should return 1 when the first bad commit is found

2019-06-24 Thread Christian Couder
On Mon, Jun 24, 2019 at 4:51 AM Jeff King  wrote:
>
> On Sun, Jun 23, 2019 at 01:32:16PM -0700, Pedro Larroy wrote:
>
> > Thanks for your answer.
> >
> > I was expecting the HEAD to point to the first bad commit.
> >
> > In mercurial, the exit status tells you information about the
> > bisection process:  https://www.mercurial-scm.org/repo/hg/help/bisect

It's not clear from the above URL how that differs from what git bisect
does. I only found "Returns 0 on success" there which is not very
explicit, and we could argue that it's also what git bisect does.

> > Sure one can parse stdout, it's just more tedious than just checking
> > the return code and having the HEAD left to the original bad commit.

The git bisect documentation says:

   Eventually there will be no more revisions left to inspect, and the
   command will print out a description of the first bad commit. The
   reference refs/bisect/bad will be left pointing at that commit.

So you just need to parse stdout to detect that it found the first bad
commit, and then you can use refs/bisect/bad.

If the return code was used, how would you distinguish between a
failure in the command (for example if you give bad information to
`git bisect good` or `git bisect bad`) and the fact that it has not
yet found the first bad commit? Anyway you would need to add some
logic for that.
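
For example, a calling script could detect the end of the bisection from
the output and then read refs/bisect/bad. A rough sketch ($good, $bad and
test-script.sh are placeholders):

    git bisect start "$bad" "$good"
    if git bisect run ./test-script.sh | grep -q "is the first bad commit"
    then
        first_bad=$(git rev-parse refs/bisect/bad)
        echo "first bad commit: $first_bad"
    fi
    git bisect reset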

> I think it might be nice for Git to write a well-known refname (like
> BISECT_RESULT or similar) so that you can refer to that instead of
> having to read stdout (whether by machine or by a user
> cutting-and-pasting). And I cannot offhand think of a particular reason
> why that could not just be HEAD (instead of something bisect-specific)
> after the bisect finishes.

If it were HEAD, it would mean that git bisect would potentially
have to do one more checkout so that HEAD points to the first bad
commit. This checkout would sometimes be useless, so it's more
efficient to use something like refs/bisect/bad rather than HEAD.

> We do not promise any particular value in HEAD now. The only downside
> would be the minor cost to checkout the working tree of the known-bad
> commit if we are not already there.

Though we might not explicitly promise in the doc that HEAD will stay
at the last commit that was tested, I think that's something people
can expect from the way we describe how bisect works. So I don't think
it would be a good idea to change our behavior.


[PATCH v2] doc: improve usage string in MyFirstContribution

2019-06-21 Thread Christian Couder
We implement a command called git-psuh which accepts arguments, so let's
show that it accepts arguments in the doc and the usage string.

While at it, we need to prepare "a NULL-terminated array of usage strings",
not just "a NULL-terminated usage string".

Helped-by: Emily Shaffer 
Helped-by: Eric Sunshine 
Signed-off-by: Christian Couder 
---

The only change compared to the previous version is that "[<arg>...]" is
used instead of "<arg>..." in the synopsis and help message as discussed
with Emily, Eric and Junio.

 Documentation/MyFirstContribution.txt | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/Documentation/MyFirstContribution.txt b/Documentation/MyFirstContribution.txt
index 895b7cfd4f..f8670379c0 100644
--- a/Documentation/MyFirstContribution.txt
+++ b/Documentation/MyFirstContribution.txt
@@ -428,7 +428,7 @@ git-psuh - Delight users' typo with a shy horse
 SYNOPSIS
 
 [verse]
-'git-psuh'
+'git-psuh [<arg>...]'
 
 DESCRIPTION
 ---
@@ -491,14 +491,16 @@ Take a look at `Documentation/technical/api-parse-options.txt`. This is a handy
 tool for pulling out options you need to be able to handle, and it takes a
 usage string.
 
-In order to use it, we'll need to prepare a NULL-terminated usage string and a
-`builtin_psuh_options` array. Add a line to `#include "parse-options.h"`.
+In order to use it, we'll need to prepare a NULL-terminated array of usage
+strings and a `builtin_psuh_options` array.
 
-At global scope, add your usage:
+Add a line to `#include "parse-options.h"`.
+
+At global scope, add your array of usage strings:
 
 
 static const char * const psuh_usage[] = {
-   N_("git psuh"),
+   N_("git psuh [...]"),
NULL,
 };
 
-- 
2.22.0.206.geb73f2e638



Re: [PATCH 0/2] Add OBJECT_INFO_NO_FETCH_IF_MISSING flag

2019-06-21 Thread Christian Couder
On Thu, Jun 20, 2019 at 10:52 PM Junio C Hamano  wrote:
>
> Christian Couder  writes:
>
> > In a review[1] of my "many promisor remotes" patch series[2] and in
> > the following thread, it was suggested that a flag should be passed to
> > tell oid_object_info_extended() that it should not fetch objects from
> > promisor remotes if they are missing, instead of using the ugly
> > fetch_if_missing global.
> >
> > It looks like the OBJECT_INFO_FOR_PREFETCH flag already exists but
> > unfortunately conflates 2 separate things.
> >
> > This patch series introduces OBJECT_INFO_NO_FETCH_IF_MISSING to
> > disambiguate the different meanings and then uses it instead of
> > OBJECT_INFO_FOR_PREFETCH where it makes sense.
> >
> > 1: 
> > https://public-inbox.org/git/b4d69d2b-dc0d-fffb-2909-c54060fe9...@gmail.com/
> > 2: 
> > https://public-inbox.org/git/20190409161116.30256-1-chrisc...@tuxfamily.org/
>
> What commit did you base these two patches on?

They were based on master at v2.22.0 when I worked on them, but I
didn't send them right away. And I didn't rebase them before I later
sent them.


Re: [PATCH] doc: improve usage string in MyFirstContribution

2019-06-21 Thread Christian Couder
On Thu, Jun 20, 2019 at 11:29 PM Emily Shaffer  wrote:
>
> On Thu, Jun 20, 2019 at 12:13:15AM +0200, Christian Couder wrote:
> >  SYNOPSIS
> >  
> >  [verse]
> > -'git-psuh'
> > +'git-psuh <arg>...'
>
> It doesn't require 1 or more args - you can run it with no args. So it
> might be better suited to state the args as optional:
>
>   'git psuh [arg]...'
>
> >  
> >  static const char * const psuh_usage[] = {
> > - N_("git psuh"),
> > + N_("git psuh ..."),
>
> ...and here too.

Yeah sure, I will resend soon with such changes.


Re: [PATCH 2/2] sha1-file: use OBJECT_INFO_NO_FETCH_IF_MISSING

2019-06-20 Thread Christian Couder
On Thu, Jun 20, 2019 at 2:39 PM Derrick Stolee  wrote:
>
> On 6/20/2019 4:50 AM, Jeff King wrote:
> > On Thu, Jun 20, 2019 at 10:30:26AM +0200, Christian Couder wrote:
> >
> >> Currently the OBJECT_INFO_FOR_PREFETCH flag is used to check
> >> if we should fetch objects from promisor remotes when we
> >> haven't found them elsewhere.
> >>
> >> Now that OBJECT_INFO_NO_FETCH_IF_MISSING exists, let's use
> >> it instead to be more correct in case this new flag is ever
> >> used without OBJECT_INFO_QUICK.
> >
> > I said earlier that this one would need to be tweaked for the new
> > upstream name. But actually, I think it is not necessary after Stolee's
> > patch.
>
> Yes, I believe that 31f5256c82  does an equivalent thing to the
> combination of these patches.

Yeah, I agree. Thanks Stolee for having already fixed that, and sorry
for bothering everyone with this.

Christian.


[PATCH 1/2] object-store: introduce OBJECT_INFO_NO_FETCH_IF_MISSING

2019-06-20 Thread Christian Couder
The fetch_if_missing global variable should be replaced as much
as possible by passing a flag to oid_object_info_extended().

The existing OBJECT_INFO_FOR_PREFETCH unfortunately conflates
both a "no fetch if missing" meaning with OBJECT_INFO_QUICK
which is about retrying packed storage.

Let's disambiguate that by adding a new explicit
OBJECT_INFO_NO_FETCH_IF_MISSING flag.

Signed-off-by: Christian Couder 
---
 object-store.h | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/object-store.h b/object-store.h
index 272e01e452..02c1795d50 100644
--- a/object-store.h
+++ b/object-store.h
@@ -277,10 +277,15 @@ struct object_info {
 #define OBJECT_INFO_IGNORE_LOOSE 16
 /*
  * Do not attempt to fetch the object if missing (even if fetch_is_missing is
- * nonzero). This is meant for bulk prefetching of missing blobs in a partial
+ * nonzero).
+ */
+#define OBJECT_INFO_NO_FETCH_IF_MISSING 32
+/*
+ * This is meant for bulk prefetching of missing blobs in a partial
  * clone. Implies OBJECT_INFO_QUICK.
  */
-#define OBJECT_INFO_FOR_PREFETCH (32 + OBJECT_INFO_QUICK)
+#define OBJECT_INFO_FOR_PREFETCH \
+  (OBJECT_INFO_NO_FETCH_IF_MISSING + OBJECT_INFO_QUICK)
 
 int oid_object_info_extended(struct repository *r,
 const struct object_id *,
-- 
2.22.0



[PATCH 2/2] sha1-file: use OBJECT_INFO_NO_FETCH_IF_MISSING

2019-06-20 Thread Christian Couder
Currently the OBJECT_INFO_FOR_PREFETCH flag is used to check
if we should fetch objects from promisor remotes when we
haven't found them elsewhere.

Now that OBJECT_INFO_NO_FETCH_IF_MISSING exists, let's use
it instead to be more correct in case this new flag is ever
used without OBJECT_INFO_QUICK.

Signed-off-by: Christian Couder 
---
 sha1-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index ed5c50dac4..2116ff1e70 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -1379,7 +1379,7 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
/* Check if it is a missing object */
if (fetch_if_missing && repository_format_partial_clone &&
!already_retried && r == the_repository &&
-   !(flags & OBJECT_INFO_FOR_PREFETCH)) {
+   !(flags & OBJECT_INFO_NO_FETCH_IF_MISSING)) {
/*
 * TODO Investigate having fetch_object() return
 * TODO error/success and stopping the music here.
-- 
2.22.0



[PATCH 0/2] Add OBJECT_INFO_NO_FETCH_IF_MISSING flag

2019-06-20 Thread Christian Couder
In a review[1] of my "many promisor remotes" patch series[2] and in
the following thread, it was suggested that a flag should be passed to
tell oid_object_info_extended() that it should not fetch objects from
promisor remotes if they are missing, instead of using the ugly
fetch_if_missing global.

It looks like the OBJECT_INFO_FOR_PREFETCH flag already exists but
unfortunately conflates 2 separate things.

This patch series introduces OBJECT_INFO_NO_FETCH_IF_MISSING to
disambiguate the different meanings and then uses it instead of
OBJECT_INFO_FOR_PREFETCH where it makes sense.

1: https://public-inbox.org/git/b4d69d2b-dc0d-fffb-2909-c54060fe9...@gmail.com/
2: https://public-inbox.org/git/20190409161116.30256-1-chrisc...@tuxfamily.org/

Christian Couder (2):
  object-store: introduce OBJECT_INFO_NO_FETCH_IF_MISSING
  sha1-file: use OBJECT_INFO_NO_FETCH_IF_MISSING

 object-store.h | 9 +++--
 sha1-file.c| 2 +-
 2 files changed, 8 insertions(+), 3 deletions(-)

-- 
2.22.0



Re: [PATCH v4 0/4] Test oidmap

2019-06-19 Thread Christian Couder
On Thu, Jun 20, 2019 at 12:09 AM Jeff King  wrote:
>
> On Wed, Jun 19, 2019 at 05:42:13PM -0400, Jeff King wrote:
>
> > I do think that sha1hash() will eventually go away in favor of
> > oidhash(), but we can approach that separately, and convert oidmap along
> > with everyone else.

Yeah, deprecating it is a different topic.

> > It looks like we are close to being able to do that now. Grepping for
> > sha1hash shows just about everybody dereferencing an oid object, except
> > for the call in pack-objects.c. And skimming the callers there,
> > they all appear to have an oid object, too.
>
> Actually, it does get a little tangled, as our khash implementation also
> uses sha1hash (though IMHO that should become oidhash, too). There's
> also some preparatory work needed in pack-objects, which I've pushed up
> to:
>
>   https://github.com/peff/git jk/drop-sha1hash
>
> I got interrupted and may try to resume this cleanup later; mostly I
> wanted to post something now so you did not duplicate what I'd already
> done.

Thanks for your work on that,
Christian.


[PATCH] doc: improve usage string in MyFirstContribution

2019-06-19 Thread Christian Couder
We implement a command called git-psuh which accepts arguments, so let's
show that it accepts arguments in the doc and the usage string.

While at it, we need to prepare "a NULL-terminated array of usage strings",
not just "a NULL-terminated usage string".

Signed-off-by: Christian Couder 
---
 Documentation/MyFirstContribution.txt | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/Documentation/MyFirstContribution.txt b/Documentation/MyFirstContribution.txt
index 895b7cfd4f..7512a35ba3 100644
--- a/Documentation/MyFirstContribution.txt
+++ b/Documentation/MyFirstContribution.txt
@@ -428,7 +428,7 @@ git-psuh - Delight users' typo with a shy horse
 SYNOPSIS
 
 [verse]
-'git-psuh'
+'git-psuh <arg>...'
 
 DESCRIPTION
 ---
@@ -491,14 +491,16 @@ Take a look at `Documentation/technical/api-parse-options.txt`. This is a handy
 tool for pulling out options you need to be able to handle, and it takes a
 usage string.
 
-In order to use it, we'll need to prepare a NULL-terminated usage string and a
-`builtin_psuh_options` array. Add a line to `#include "parse-options.h"`.
+In order to use it, we'll need to prepare a NULL-terminated array of usage
+strings and a `builtin_psuh_options` array.
 
-At global scope, add your usage:
+Add a line to `#include "parse-options.h"`.
+
+At global scope, add your array of usage strings:
 
 
 static const char * const psuh_usage[] = {
-   N_("git psuh"),
+   N_("git psuh ..."),
NULL,
 };
 
-- 
2.22.0



Re: 'git interpret-trailers' is tripped by comment characters other than '#'

2019-06-17 Thread Christian Couder
On Mon, Jun 17, 2019 at 7:31 PM Junio C Hamano  wrote:
>
> Christian Couder  writes:
>
> > On Mon, Jun 17, 2019 at 6:33 AM Masahiro Yamada
> >  wrote:
> >>
> >> On Sat, Jun 15, 2019 at 5:41 PM Christian Couder
> >>  wrote:
> >> >
> >> > > I do wonder if the trailer code is correct to always respect it, 
> >> > > though.
> >> > > For example, in "git log" output we'd expect to see commit messages 
> >> > > from
> >> > > people with all sorts of config. I suppose the point is that their
> >> > > comment characters wouldn't make it into the commit object at all, so
> >> > > the right answer there is probably not to look for comment characters 
> >> > > at
> >> > > all.
> >> >
> >> > Would you suggest an option, maybe called `--ignore-comments` to ignore 
> >> > them?
> >>
> >> Since 'git interpret-trailers' already ignores lines starting with '#',
> >> is this option true by default?
> >
> > Sorry, I should have suggested something called --unstrip-comments or
> > --ignore-comment-char that would make 'git interpret-trailers' stop
> > stripping lines that start with the comment character.
>
> So, to summarize:
>
>  - As the traditional behaviour is to strip comment, using the
>hardcoded definition of the comment char, i.e. '#', we do not
>switch the default.  Instead, a new command line option makes
>it pretend there is no comment char and nothing get stripped.

Yeah, that's the idea of --unstrip-comments (or
--ignore-comment-char). I am ok with preparing and sending a patch to
add that, though it is not urgent and it would be nice if we could
agree with the name first.

>  - But the core.commentchar that does not override hardcoded
>definition is a bug, so we'd fix that along the lines of what
>Peff's patch outlined.

Yeah, not sure if Peff wants to resend his patch with a proper commit
message. I would be ok with doing it if he prefers.

Thanks,
Christian.


Re: 'git interpret-trailers' is tripped by comment characters other than '#'

2019-06-16 Thread Christian Couder
On Mon, Jun 17, 2019 at 6:33 AM Masahiro Yamada
 wrote:
>
> On Sat, Jun 15, 2019 at 5:41 PM Christian Couder
>  wrote:
> >
> > > I do wonder if the trailer code is correct to always respect it, though.
> > > For example, in "git log" output we'd expect to see commit messages from
> > > people with all sorts of config. I suppose the point is that their
> > > comment characters wouldn't make it into the commit object at all, so
> > > the right answer there is probably not to look for comment characters at
> > > all.
> >
> > Would you suggest an option, maybe called `--ignore-comments` to ignore 
> > them?
>
> Since 'git interpret-trailers' already ignores lines starting with '#',
> is this option true by default?

Sorry, I should have suggested something called --unstrip-comments or
--ignore-comment-char that would make 'git interpret-trailers' stop
stripping lines that start with the comment character.
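
To make that concrete, here is a rough illustration (msg.txt and the
trailer are made up): today a body line starting with '#' is stripped
from the output, and with the config loading fix discussed in this thread
a custom core.commentChar should be handled the same way:

    # msg.txt contains a body line starting with '#' and one starting with ';'
    git interpret-trailers --trailer "Acked-by: Bob <bob@example.com>" msg.txt

    # once interpret-trailers reads the config, ';' lines are treated as
    # comments too
    git -c core.commentChar=";" interpret-trailers \
        --trailer "Acked-by: Bob <bob@example.com>" msg.txt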


[PATCH v4 4/4] test-hashmap: remove 'hash' command

2019-06-15 Thread Christian Couder
If hashes like strhash() are updated, for example to use a different
hash algorithm, we should not have to be updating t0011 to change out
the hashes.

As long as hashmap can store and retrieve values, and performs well,
we should not care what the values of the hashes are. Let's just
focus on the externally visible behavior instead.

Suggested-by: Jeff King 
Signed-off-by: Christian Couder 
---
 t/helper/test-hashmap.c | 9 +
 t/t0011-hashmap.sh  | 9 -
 2 files changed, 1 insertion(+), 17 deletions(-)

diff --git a/t/helper/test-hashmap.c b/t/helper/test-hashmap.c
index 23d2b172fe..aaf17b0ddf 100644
--- a/t/helper/test-hashmap.c
+++ b/t/helper/test-hashmap.c
@@ -173,14 +173,7 @@ int cmd__hashmap(int argc, const char **argv)
p2 = strtok(NULL, DELIM);
}
 
-   if (!strcmp("hash", cmd) && p1) {
-
-   /* print results of different hash functions */
-   printf("%u %u %u %u\n",
-  strhash(p1), memhash(p1, strlen(p1)),
-  strihash(p1), memihash(p1, strlen(p1)));
-
-   } else if (!strcmp("add", cmd) && p1 && p2) {
+   if (!strcmp("add", cmd) && p1 && p2) {
 
/* create entry with key = p1, value = p2 */
entry = alloc_test_entry(hash, p1, p2);
diff --git a/t/t0011-hashmap.sh b/t/t0011-hashmap.sh
index 3f1f505e89..9c96b3e3b1 100755
--- a/t/t0011-hashmap.sh
+++ b/t/t0011-hashmap.sh
@@ -9,15 +9,6 @@ test_hashmap() {
test_cmp expect actual
 }
 
-test_expect_success 'hash functions' '
-
-test_hashmap "hash key1" "2215982743 2215982743 116372151 116372151" &&
-test_hashmap "hash key2" "2215982740 2215982740 116372148 116372148" &&
-test_hashmap "hash fooBarFrotz" "1383912807 1383912807 3189766727 3189766727" 
&&
-test_hashmap "hash foobarfrotz" "2862305959 2862305959 3189766727 3189766727"
-
-'
-
 test_expect_success 'put' '
 
 test_hashmap "put key1 value1
-- 
2.22.0.3.g82edbe9b01.dirty



[PATCH v4 1/4] t/helper: add test-oidmap.c

2019-06-15 Thread Christian Couder
This new helper is very similar to "test-hashmap.c" and will help
test how `struct oidmap` from oidmap.{c,h} can be used.

Helped-by: SZEDER Gábor 
Helped-by: Jeff King 
Signed-off-by: Christian Couder 
---
 Makefile   |   1 +
 t/helper/test-oidmap.c | 126 +
 t/helper/test-tool.c   |   1 +
 t/helper/test-tool.h   |   1 +
 4 files changed, 129 insertions(+)
 create mode 100644 t/helper/test-oidmap.c

diff --git a/Makefile b/Makefile
index 8a7e235352..5efc7700ed 100644
--- a/Makefile
+++ b/Makefile
@@ -727,6 +727,7 @@ TEST_BUILTINS_OBJS += test-lazy-init-name-hash.o
 TEST_BUILTINS_OBJS += test-match-trees.o
 TEST_BUILTINS_OBJS += test-mergesort.o
 TEST_BUILTINS_OBJS += test-mktemp.o
+TEST_BUILTINS_OBJS += test-oidmap.o
 TEST_BUILTINS_OBJS += test-online-cpus.o
 TEST_BUILTINS_OBJS += test-parse-options.o
 TEST_BUILTINS_OBJS += test-path-utils.o
diff --git a/t/helper/test-oidmap.c b/t/helper/test-oidmap.c
new file mode 100644
index 00..7036588175
--- /dev/null
+++ b/t/helper/test-oidmap.c
@@ -0,0 +1,126 @@
+#include "test-tool.h"
+#include "cache.h"
+#include "oidmap.h"
+#include "strbuf.h"
+
+/* key is an oid and value is a name (could be a refname for example) */
+struct test_entry {
+   struct oidmap_entry entry;
+   char name[FLEX_ARRAY];
+};
+
+#define DELIM " \t\r\n"
+
+/*
+ * Read stdin line by line and print result of commands to stdout:
+ *
+ * hash oidkey -> sha1hash(oidkey)
+ * put oidkey namevalue -> NULL / old namevalue
+ * get oidkey -> NULL / namevalue
+ * remove oidkey -> NULL / old namevalue
+ * iterate -> oidkey1 namevalue1\noidkey2 namevalue2\n...
+ *
+ */
+int cmd__oidmap(int argc, const char **argv)
+{
+   struct strbuf line = STRBUF_INIT;
+   struct oidmap map = OIDMAP_INIT;
+
+   setup_git_directory();
+
+   /* init oidmap */
+   oidmap_init(&map, 0);
+
+   /* process commands from stdin */
+   while (strbuf_getline(&line, stdin) != EOF) {
+   char *cmd, *p1 = NULL, *p2 = NULL;
+   struct test_entry *entry;
+   struct object_id oid;
+
+   /* break line into command and up to two parameters */
+   cmd = strtok(line.buf, DELIM);
+   /* ignore empty lines */
+   if (!cmd || *cmd == '#')
+   continue;
+
+   p1 = strtok(NULL, DELIM);
+   if (p1)
+   p2 = strtok(NULL, DELIM);
+
+   if (!strcmp("add", cmd) && p1 && p2) {
+
+   if (get_oid(p1, &oid)) {
+   printf("Unknown oid: %s\n", p1);
+   continue;
+   }
+
+   /* create entry with oidkey from p1, value = p2 */
+   FLEX_ALLOC_STR(entry, name, p2);
+   oidcpy(&entry->entry.oid, &oid);
+
+   /* add to oidmap */
+   oidmap_put(&map, entry);
+
+   } else if (!strcmp("put", cmd) && p1 && p2) {
+
+   if (get_oid(p1, &oid)) {
+   printf("Unknown oid: %s\n", p1);
+   continue;
+   }
+
+   /* create entry with oid_key = p1, name_value = p2 */
+   FLEX_ALLOC_STR(entry, name, p2);
+   oidcpy(&entry->entry.oid, &oid);
+
+   /* add / replace entry */
+   entry = oidmap_put(&map, entry);
+
+   /* print and free replaced entry, if any */
+   puts(entry ? entry->name : "NULL");
+   free(entry);
+
+   } else if (!strcmp("get", cmd) && p1) {
+
+   if (get_oid(p1, &oid)) {
+   printf("Unknown oid: %s\n", p1);
+   continue;
+   }
+
+   /* lookup entry in oidmap */
+   entry = oidmap_get(&map, &oid);
+
+   /* print result */
+   puts(entry ? entry->name : "NULL");
+
+   } else if (!strcmp("remove", cmd) && p1) {
+
+   if (get_oid(p1, &oid)) {
+   printf("Unknown oid: %s\n", p1);
+   continue;
+   }
+
+   /* remove entry from oidmap */
+   entry = oidmap_remove(&map, &oid);
+
+   /* print result and free entry*/
+   puts(entry ? entry->name : "NULL");
+  

[PATCH v4 0/4] Test oidmap

2019-06-15 Thread Christian Couder
Unlike hashmap that has t/helper/test-hashmap.c and t/t0011-hashmap.sh
oidmap has no specific test. The goal of this small patch series is to
change that and also improve oidmap a bit while at it.

Changes compared to V3 are the following:

  - removed "hash" command in test-oidmap.c and "hash" test in
t0016-oidmap.sh as suggested by Peff,

  - added patch 4/4 which does the same as above in test-hashmap.c and
t0011-hashmap.sh as suggested by Peff.

Previous versions on the mailing list:

V3: https://public-inbox.org/git/20190612232425.12149-1-chrisc...@tuxfamily.org/
V2: https://public-inbox.org/git/20190611082325.28878-1-chrisc...@tuxfamily.org/
V1: https://public-inbox.org/git/20190609044907.32477-1-chrisc...@tuxfamily.org/

This patch series on GitHub:

https://github.com/chriscool/git/commits/oidmap

Christian Couder (4):
  t/helper: add test-oidmap.c
  t: add t0016-oidmap.sh
  oidmap: use sha1hash() instead of static hash() function
  test-hashmap: remove 'hash' command

 Makefile|   1 +
 oidmap.c|  13 +
 t/helper/test-hashmap.c |   9 +--
 t/helper/test-oidmap.c  | 126 
 t/helper/test-tool.c|   1 +
 t/helper/test-tool.h|   1 +
 t/t0011-hashmap.sh  |   9 ---
 t/t0016-oidmap.sh   |  84 +++
 8 files changed, 217 insertions(+), 27 deletions(-)
 create mode 100644 t/helper/test-oidmap.c
 create mode 100755 t/t0016-oidmap.sh

-- 
2.22.0.3.g82edbe9b01.dirty



[PATCH v4 3/4] oidmap: use sha1hash() instead of static hash() function

2019-06-15 Thread Christian Couder
From: Christian Couder 

Get rid of the static hash() function in oidmap.c which is redundant
with sha1hash(). Use sha1hash() directly instead.

Let's be more consistent and not use several hash functions doing
nearly exactly the same thing.

Signed-off-by: Christian Couder 
---
 oidmap.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/oidmap.c b/oidmap.c
index b0841a0f58..01c206aaef 100644
--- a/oidmap.c
+++ b/oidmap.c
@@ -12,13 +12,6 @@ static int oidmap_neq(const void *hashmap_cmp_fn_data,
  &((const struct oidmap_entry *) entry_or_key)->oid);
 }
 
-static int hash(const struct object_id *oid)
-{
-   int hash;
-   memcpy(&hash, oid->hash, sizeof(hash));
-   return hash;
-}
-
 void oidmap_init(struct oidmap *map, size_t initial_size)
 {
hashmap_init(&map->map, oidmap_neq, NULL, initial_size);
@@ -36,7 +29,7 @@ void *oidmap_get(const struct oidmap *map, const struct object_id *key)
if (!map->map.cmpfn)
return NULL;
 
-   return hashmap_get_from_hash(&map->map, hash(key), key);
+   return hashmap_get_from_hash(&map->map, sha1hash(key->hash), key);
 }
 
 void *oidmap_remove(struct oidmap *map, const struct object_id *key)
@@ -46,7 +39,7 @@ void *oidmap_remove(struct oidmap *map, const struct object_id *key)
if (!map->map.cmpfn)
oidmap_init(map, 0);
 
-   hashmap_entry_init(&entry, hash(key));
+   hashmap_entry_init(&entry, sha1hash(key->hash));
return hashmap_remove(&map->map, &entry, key);
 }
 
@@ -57,6 +50,6 @@ void *oidmap_put(struct oidmap *map, void *entry)
if (!map->map.cmpfn)
oidmap_init(map, 0);
 
-   hashmap_entry_init(&to_put->internal_entry, hash(&to_put->oid));
+   hashmap_entry_init(&to_put->internal_entry, sha1hash(to_put->oid.hash));
return hashmap_put(&map->map, to_put);
 }
-- 
2.22.0.3.g82edbe9b01.dirty



[PATCH v4 2/4] t: add t0016-oidmap.sh

2019-06-15 Thread Christian Couder
From: Christian Couder 

Add actual tests for operations using `struct oidmap` from oidmap.{c,h}.

Helped-by: SZEDER Gábor 
Helped-by: Jeff King 
Signed-off-by: Christian Couder 
---
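Once this is applied (and git and the test helpers rebuilt), the new script
can be run on its own from the t/ directory in the usual way, for example:

    cd t
    ./t0016-oidmap.sh -v

or through the test Makefile with 'make T=t0016-oidmap.sh'.
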
 t/t0016-oidmap.sh | 84 +++
 1 file changed, 84 insertions(+)
 create mode 100755 t/t0016-oidmap.sh

diff --git a/t/t0016-oidmap.sh b/t/t0016-oidmap.sh
new file mode 100755
index 0000000000..af17264ce3
--- /dev/null
+++ b/t/t0016-oidmap.sh
@@ -0,0 +1,84 @@
+#!/bin/sh
+
+test_description='test oidmap'
+. ./test-lib.sh
+
+# This purposefully is very similar to t0011-hashmap.sh
+
+test_oidmap () {
+   echo "$1" | test-tool oidmap $3 >actual &&
+   echo "$2" >expect &&
+   test_cmp expect actual
+}
+
+
+test_expect_success 'setup' '
+
+   test_commit one &&
+   test_commit two &&
+   test_commit three &&
+   test_commit four
+
+'
+
+test_expect_success 'put' '
+
+test_oidmap "put one 1
+put two 2
+put invalidOid 4
+put three 3" "NULL
+NULL
+Unknown oid: invalidOid
+NULL"
+
+'
+
+test_expect_success 'replace' '
+
+test_oidmap "put one 1
+put two 2
+put three 3
+put invalidOid 4
+put two deux
+put one un" "NULL
+NULL
+NULL
+Unknown oid: invalidOid
+2
+1"
+
+'
+
+test_expect_success 'get' '
+
+test_oidmap "put one 1
+put two 2
+put three 3
+get two
+get four
+get invalidOid
+get one" "NULL
+NULL
+NULL
+2
+NULL
+Unknown oid: invalidOid
+1"
+
+'
+
+test_expect_success 'iterate' '
+
+test_oidmap "put one 1
+put two 2
+put three 3
+iterate" "NULL
+NULL
+NULL
+$(git rev-parse two) 2
+$(git rev-parse one) 1
+$(git rev-parse three) 3"
+
+'
+
+test_done
-- 
2.22.0.3.g82edbe9b01.dirty



Re: 'git interpret-trailers' is tripped by comment characters other than '#'

2019-06-15 Thread Christian Couder
On Fri, Jun 14, 2019 at 5:10 PM Jeff King  wrote:
>
> On Fri, Jun 14, 2019 at 08:35:04PM +0900, Masahiro Yamada wrote:
>
> > Perhaps, 'git interpret-trailers' should be changed
> > to recognize core.commentChar ?
>
> It looks like the trailer code does respect it, but the
> interpret-trailers program never loads the config. Does the patch below
> make your problem go away?

That seems to me like the right analysis, and the right fix too.
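
For what it's worth, a quick way to see the symptom (an untested sketch,
assuming a repository where core.commentChar has been set to ';') would be
something like:

    git config core.commentChar ";"
    printf '%s\n' "subject" "" "; a commented-out line" |
    git interpret-trailers --trailer "Reviewed-by: A U Thor <author@example.com>"

As long as the config is not loaded, the trailer code keeps assuming '#' as
the comment character, so the ';' line is not recognized as a comment (and a
literal '#' line in the message would wrongly be treated as one).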

> I do wonder if the trailer code is correct to always respect it, though.
> For example, in "git log" output we'd expect to see commit messages from
> people with all sorts of config. I suppose the point is that their
> comment characters wouldn't make it into the commit object at all, so
> the right answer there is probably not to look for comment characters at
> all.

Would you suggest adding an option, maybe called `--ignore-comments`, to ignore them?

Thanks,
Christian.


Re: [PATCH 2/3] t: add t0016-oidmap.sh

2019-06-14 Thread Christian Couder
On Fri, Jun 14, 2019 at 12:22 AM Junio C Hamano  wrote:
>
> Jeff King  writes:
>
> >> > I know there are testing philosophies that go to this level of
> >> > white-box testing, but I don't think we usually do in Git. A unit
> >> > test of oidmap's externally visible behavior seems like the right
> >> > level to me.
> >>
> >> That's a good point...  but then why does 't0011-hashmap.sh' do it in
> >> the first place?  As far as I understood this t0016 mainly follows
> >> suit of t0011.
> >
> > I'd make the same argument against t0011. :)
>
> Yeah, I tend to agree.  It is not a good excuse that somebody else
> alerady has made a mistake.

Ok, I will remove the "hash" test in t0016 and the corresponding code
in test-oidmap.c.

> > I think there it at least made a little more sense because we truly are
> > hashing ourselves, rather than just copying out some sha1 bytes. But I
> > think I'd still argue that if I updated strhash() to use a different
> > hash, I should not have to be updating t0011 to change out the hashes.
>
> True, too.

I will also send an additional patch to remove the similar code in
t0011-hashmap.sh and test-hashmap.c.

