Re: [PATCH v3 08/28] shallow.c: add mark_new_shallow_refs()

2013-11-26 Thread Junio C Hamano
Duy Nguyen  writes:

> On Tue, Nov 26, 2013 at 5:20 AM, Junio C Hamano  wrote:
>> Hmph.  the use of ->util field in this patch feels that it was
>> something commit-slab data structure was invented to solve.
>
> Good stuff! Thanks.
>
>>> + if (c->util == NULL)
>>> + c->util = bitmap;
>>> + else {
>>> + /*
>>> +  * Deliberately leak a lot in commit->util
>>> +  * because there can be many pointers to the
>>> +  * same bitmap. Probably should allocate in a
>>> +  * pool and free the whole pool at the end.
>>> +  */
>>
>> ... or perhaps make the bitmap into
>>
>> struct {
>> int refcnt;
>> uint32_t bits[FLEX_ARRAY];
>> }
>>
>> and refcnt them?
>
> I still prefer memory pools so I just need to do a few free() than
> walking through all the commits again and refcnt-- or free() them.

Fair enough.

> Sorry to break the patches this way and lose the overall call flow.
> It's just too big to put all into one patch. 13/28 is the one that put
> the pieces together but basically
>
>  1. receive the remote's .git/shallow
>  2. call remote_reachable_shallow_points() to exclude our shallow commits
>  3. get the pack and install it (or unpack it)
>  4. call this function to determine what new ref needs new shallow
> commits from the result of #2

Thanks for a roadmap.  Will find time to re-read the thing with it.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 08/28] shallow.c: add mark_new_shallow_refs()

2013-11-26 Thread Duy Nguyen
On Tue, Nov 26, 2013 at 5:20 AM, Junio C Hamano  wrote:
> Hmph.  the use of ->util field in this patch feels that it was
> something commit-slab data structure was invented to solve.

Good stuff! Thanks.

>> + if (c->util == NULL)
>> + c->util = bitmap;
>> + else {
>> + /*
>> +  * Deliberately leak a lot in commit->util
>> +  * because there can be many pointers to the
>> +  * same bitmap. Probably should allocate in a
>> +  * pool and free the whole pool at the end.
>> +  */
>
> ... or perhaps make the bitmap into
>
> struct {
> int refcnt;
> uint32_t bits[FLEX_ARRAY];
> }
>
> and refcnt them?

I still prefer memory pools so I just need to do a few free() than
walking through all the commits again and refcnt-- or free() them.

>> + /*
>> +  * Quick check to see if we may need to add new shallow
>> +  * roots. Go through the list of root candidates and check if
>> +  * they exist (either in current repo, or in the new pack, we
>> +  * can't distinguish).
>> +  *
>> +  * 1) If none of the new roots exist, the pack must connect to
>> +  *the main object graph, which is already guarded by
>> +  *current repo's shallow roots and we will not need to
>> +  *consider adding new shallow roots, so we can exit early.
>> +  *
>> +  * 2) The pack may connect to some existing object islands in
>> +  *current repo then add shallow roots to plug loose ends
>> +  *from those islands. In that case, new shallow roots must
>> +  *also exist in the repo as this stage (old objects plus
>> +  *the new pack).
>> +  *
>> +  * 3) The last, easiest case, is the pack contains some
>> +  *shallow roots, which may be used to tie up loose ends of
>> +  *some new refs, or redundanty (tying up loose ends of new
>> +  *object islands)
>> +  */
>> + for (i = 0;i < shallow->nr; i++)
>> + if (has_sha1_file(shallow->array[i]))
>> + break;
>> + if (i == shallow->nr)
>> + /*
>> +  * this is the first and also the common case, where
>> +  * the new pack does not carry any new shallow
>> +  * points. No need to to the expensive commit traverse
>> +  * dance below.
>> +  */
>> + return 0;
>
> I am Confused.
>
> The loop only made sure that all the elements in the array[] is
> still missing (or, ... is this function supposed to be called before
> installing a new pack???  It is unclear.  But if new objects were
> unpacked while receiving, then there is no "not install the new
> objects and check" possible, so I'd assume this is called after
> receiving and registering a new pack to the object store).

Yes it's called after installing the new packs (or after unpacking).
I'll mention about this.

> But then, can it be that you had N-1 "shallow points" originally,
> the pack has a reference to a new missing commit, and the array has
> N "shallow points" in total?  Or is the caller expected to call this
> function with shallow pointing at a pre-transfer shallow points?

Another thing I did not mention is all shallow commits we have are
already filtered out by remove_reachable_shallow_points. By the time
you get here, the array only contains the shallow commits we don't
have that may or may not be referenced by a something (oh yeah I did
not handle tags!) in the new pack. So if all of them are missing (i.e.
the new pack does not reference to any of them), they're useless and
can be thrown away.

Sorry to break the patches this way and lose the overall call flow.
It's just too big to put all into one patch. 13/28 is the one that put
the pieces together but basically

 1. receive the remote's .git/shallow
 2. call remote_reachable_shallow_points() to exclude our shallow commits
 3. get the pack and install it (or unpack it)
 4. call this function to determine what new ref needs new shallow
commits from the result of #2

>> + /*
>> +  * Prepare the commit graph to track what refs can reach what
>> +  * (new) shallow points.
>> +  */
>> + nr = get_max_object_index();
>
> Hmph. At this point (again, there is no description on where in the
> overall sequence of events this function is designed to be called,
> so it is impossible to review the logic), is it expected that we
> have seen all the objects under the sun and marked them in a
> specific way?

No. At this point we just make sure to have clean flags in commit
objects. paint_down() is the one that walks through and marks them as
seen.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 08/28] shallow.c: add mark_new_shallow_refs()

2013-11-25 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> When we receive a pack and the shallow points from another repository,
> we may need to add more shallow points to current repo to make sure no
> commits point to nowhere. But usually we don't want to do so because
> (in future) new shallow points invalidate pack bitmaps and we need to
> rebuild them again, which is not cheap.
>
> So the default way is we allow ref updates that do not introduce new
> shallow points and mark the others. If the user is fine with new
> shallow point addition, we accept the marked refs.
>
> But even so we do not blindly accept all shallow points provided. Some
> of them might not point to any commits in the new pack. Some might
> even do, but those might be unreachable object islands. Only shallow
> points that are reachable from old and new refs can stay.
>
> The way it's implemented is paint down from each ref, attach a bitmap
> to each commit where one bit represents one ref. In order to avoid
> allocating new bitmap for every commit, we try to reuse the same
> bitmap for parent commits if possible. This reduces allocation and
> leaks deliberately because it's hard to keep/time consuming track how
> many pointers to the same buffer.
>
> In a typical push or fetch, the new pack should not carry new shallow
> roots. If the current repo does not have any commit islands refered by
> the sender's shallow roots either, this function is just a few
> has_sha1_file(). So quite cheap.
>
> Once the sender diverts from that path (or the receiver detects
> shallow roots attached to commit islands from 
> remove_reachable_shallow_points),
> it'll be a lot more expensive. Pack bitmaps won't help this kind of
> commit traversal, but commit cache might.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  commit.h  |   4 ++
>  shallow.c | 219 
> ++
>  2 files changed, 223 insertions(+)

Hmph.  the use of ->util field in this patch feels that it was
something commit-slab data structure was invented to solve.


> diff --git a/shallow.c b/shallow.c
> index a974d2d..c92a1dc 100644
> --- a/shallow.c
> +++ b/shallow.c
> @@ -4,6 +4,8 @@
>  #include "pkt-line.h"
>  #include "remote.h"
>  #include "refs.h"
> +#include "diff.h"
> +#include "revision.h"
>  
>  static int is_shallow = -1;
>  static struct stat shallow_stat;
> @@ -280,3 +282,220 @@ void remove_reachable_shallow_points(struct 
> extra_have_objects *out,
>   }
>   free(ca.commits);
>  }
> +
> +static int paint_down(const unsigned char *sha1, int id, int nr_bits, int 
> quick)
> +{
> + int hit_bottom = 0;
> + unsigned int i, nr;
> + struct commit_list *head = NULL;
> + int bitmap_nr = (nr_bits + 31) / 32;
> + int bitmap_size = bitmap_nr * sizeof(uint32_t);
> + uint32_t *tmp = xmalloc(bitmap_size);
> + uint32_t *bitmap = xcalloc(bitmap_size, sizeof(uint32_t));
> + bitmap[id / 32] |= (1 << (id % 32));
> + commit_list_insert(lookup_commit(sha1), &head);
> + while (head) {
> + struct commit_list *p;
> + struct commit *c = head->item;
> + uint32_t *c_util = c->util;
> +
> + p = head;
> + head = head->next;
> + free(p);
> +
> + if (c->object.flags & (SEEN | UNINTERESTING))
> + continue;
> + else
> + c->object.flags |= SEEN;
> +
> + if (c->util == NULL)
> + c->util = bitmap;
> + else {
> + /*
> +  * Deliberately leak a lot in commit->util
> +  * because there can be many pointers to the
> +  * same bitmap. Probably should allocate in a
> +  * pool and free the whole pool at the end.
> +  */

... or perhaps make the bitmap into

struct {
int refcnt;
uint32_t bits[FLEX_ARRAY];
}

and refcnt them?


> + memcpy(tmp, c_util, bitmap_size);
> + for (i = 0; i < bitmap_nr; i++)
> + tmp[i] |= bitmap[i];
> + if (memcmp(tmp, c_util, bitmap_size)) {
> + c->util = xmalloc(bitmap_size);
> + memcpy(c->util, tmp, bitmap_size);
> + }
> + }
> +
> + if (c->object.flags & BOTTOM) {
> + hit_bottom = 1;
> + if (quick) {
> + free_commit_list(head);
> + break;
> + } else
> + continue;
> + }
> +
> + if (parse_commit(c))
> + die("unable to parse commit %s",
> + sha1_to_hex(c->object.sha1));
> +
> + for (p = c->parents; p; p = p->next) {
> + if (p->item->object.flags & SEE

[PATCH v3 08/28] shallow.c: add mark_new_shallow_refs()

2013-11-24 Thread Nguyễn Thái Ngọc Duy
When we receive a pack and the shallow points from another repository,
we may need to add more shallow points to current repo to make sure no
commits point to nowhere. But usually we don't want to do so because
(in future) new shallow points invalidate pack bitmaps and we need to
rebuild them again, which is not cheap.

So the default way is we allow ref updates that do not introduce new
shallow points and mark the others. If the user is fine with new
shallow point addition, we accept the marked refs.

But even so we do not blindly accept all shallow points provided. Some
of them might not point to any commits in the new pack. Some might
even do, but those might be unreachable object islands. Only shallow
points that are reachable from old and new refs can stay.

The way it's implemented is paint down from each ref, attach a bitmap
to each commit where one bit represents one ref. In order to avoid
allocating new bitmap for every commit, we try to reuse the same
bitmap for parent commits if possible. This reduces allocation and
leaks deliberately because it's hard to keep/time consuming track how
many pointers to the same buffer.

In a typical push or fetch, the new pack should not carry new shallow
roots. If the current repo does not have any commit islands refered by
the sender's shallow roots either, this function is just a few
has_sha1_file(). So quite cheap.

Once the sender diverts from that path (or the receiver detects
shallow roots attached to commit islands from remove_reachable_shallow_points),
it'll be a lot more expensive. Pack bitmaps won't help this kind of
commit traversal, but commit cache might.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 commit.h  |   4 ++
 shallow.c | 219 ++
 2 files changed, 223 insertions(+)

diff --git a/commit.h b/commit.h
index 98044e6..e1fd587 100644
--- a/commit.h
+++ b/commit.h
@@ -194,6 +194,7 @@ extern struct commit_list *get_octopus_merge_bases(struct 
commit_list *in);
 #define INFINITE_DEPTH 0x7fff
 
 struct extra_have_objects;
+struct ref;
 extern int register_shallow(const unsigned char *sha1);
 extern int unregister_shallow(const unsigned char *sha1);
 extern int for_each_commit_graft(each_commit_graft_fn, void *);
@@ -209,6 +210,9 @@ extern char *setup_temporary_shallow(void);
 extern void advertise_shallow_grafts(int);
 extern void remove_reachable_shallow_points(struct extra_have_objects *out,
const struct extra_have_objects 
*in);
+extern int mark_new_shallow_refs(const struct extra_have_objects *ref,
+int *ref_status, uint32_t **used,
+const struct extra_have_objects *shallow);
 
 int is_descendant_of(struct commit *, struct commit_list *);
 int in_merge_bases(struct commit *, struct commit *);
diff --git a/shallow.c b/shallow.c
index a974d2d..c92a1dc 100644
--- a/shallow.c
+++ b/shallow.c
@@ -4,6 +4,8 @@
 #include "pkt-line.h"
 #include "remote.h"
 #include "refs.h"
+#include "diff.h"
+#include "revision.h"
 
 static int is_shallow = -1;
 static struct stat shallow_stat;
@@ -280,3 +282,220 @@ void remove_reachable_shallow_points(struct 
extra_have_objects *out,
}
free(ca.commits);
 }
+
+static int paint_down(const unsigned char *sha1, int id, int nr_bits, int 
quick)
+{
+   int hit_bottom = 0;
+   unsigned int i, nr;
+   struct commit_list *head = NULL;
+   int bitmap_nr = (nr_bits + 31) / 32;
+   int bitmap_size = bitmap_nr * sizeof(uint32_t);
+   uint32_t *tmp = xmalloc(bitmap_size);
+   uint32_t *bitmap = xcalloc(bitmap_size, sizeof(uint32_t));
+   bitmap[id / 32] |= (1 << (id % 32));
+   commit_list_insert(lookup_commit(sha1), &head);
+   while (head) {
+   struct commit_list *p;
+   struct commit *c = head->item;
+   uint32_t *c_util = c->util;
+
+   p = head;
+   head = head->next;
+   free(p);
+
+   if (c->object.flags & (SEEN | UNINTERESTING))
+   continue;
+   else
+   c->object.flags |= SEEN;
+
+   if (c->util == NULL)
+   c->util = bitmap;
+   else {
+   /*
+* Deliberately leak a lot in commit->util
+* because there can be many pointers to the
+* same bitmap. Probably should allocate in a
+* pool and free the whole pool at the end.
+*/
+   memcpy(tmp, c_util, bitmap_size);
+   for (i = 0; i < bitmap_nr; i++)
+   tmp[i] |= bitmap[i];
+   if (memcmp(tmp, c_util, bitmap_size)) {
+   c->util = xmalloc(bitmap_size);
+   memcpy(c->util, tmp, bitmap_size);
+