Re: [PATCH 3/6] fetch-pack: in protocol v2, enqueue commons first

2018-06-05 Thread Jonathan Tan
On Tue, Jun 5, 2018 at 4:30 PM, Jonathan Nieder wrote:
> I get lost in the above description.  I suspect it's doing a good job
> of describing the code, instead of answering the question I really
> have about what is broken and what behavior we want instead.
>
> E.g. are there some commands that I can run to trigger the unnecessary
> "have" lines?  That would make it easier for me to understand the rest
> and whether this is a good approach for suppressing them.
>
> It's possible I should be skipping to the test, but a summary in the
> commit message would make life easier for lazy people like me. :)

OK, I'll start the commit message by describing a situation in which
these redundant "have" lines appear. (It will be the same situation as
the one in the test.)

> This is subtle.  My instinct would be to assume that the purpose of
> everything_local is just to determine whether we need to send a fetch
> request, but it appears we also want to rely on a side effect from it.
>
> Could everything_local get a function comment to describe what side
> effects we will be counting on from it?

You're right that there's a side effect in everything_local. In the
next version of this series, I'll add a preparatory patch that splits
it into a few functions so that we can see what happens more clearly.

> nit: this adds the new test as last in the script.  Is there some
> logical earlier place in the file it can go instead?  That way, the
> file stays organized and concurrent patches that modify the same test
> script are less likely to conflict.

Good point. I'll find a place.

>> + rm -rf server client &&
>> + git init server &&
>> + test_commit -C server aref_both_1 &&
>> + git -C server tag -d aref_both_1 &&
>> + test_commit -C server aref_both_2 &&
>
> What does aref stand for?

"A ref", "a" as in "one". I'll find a better name (probably just
"both_1" and "both_2").

>> +
>> + # The ref name that only the server has must be a prefix of all the
>> + # others, to ensure that the client has the same information regardless
>> + # of whether protocol v0 (which does not have ref prefix filtering) or
>> + # protocol v2 (which does) is used.
>
> must or else what?  Maybe:
>
> # In this test, the ref name that only the server has is a prefix of
> # all other refs. This ensures that the client has the same information
> # regardless of [etc]

Thanks - I'll use your suggestion.

>> + git clone server client &&
>> + test_commit -C server aref &&
>> + test_commit -C client aref_client &&
>> +
>> + # In both protocol v0 and v2, ensure that the parent of aref_both_2 is
>> + # not sent as a "have" line.
>
> Why shouldn't it be sent as a "have" line?  E.g. does another "have"
> line make it redundant?

The server's ref advertisement makes it redundant. I'll explain this
more clearly in the next version of this patch.

>> +
>> + rm -f trace &&
>> + cp -r client clientv0 &&
>> + GIT_TRACE_PACKET="$(pwd)/trace" git -C clientv0 \
>> + fetch origin aref &&
>> + grep "have $(git -C client rev-parse aref_client)" trace &&
>> + grep "have $(git -C client rev-parse aref_both_2)" trace &&
>
> nit: can make this more robust by doing
>
> aref_client=$(git -C client rev-parse aref_client) &&
> aref_both_2=$(git -C client rev-parse aref_both_2) &&
>
> since this way if the git command fails, the test fails.

Will do. Thanks for your comments.
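
The difference between the two forms can be seen in a small shell sketch. Here failing_rev_parse is a hypothetical stand-in for a "git rev-parse" call that fails, and the other function names are illustrative only.

```shell
# Hypothetical stand-in for a failing "git rev-parse": no output, status 1.
failing_rev_parse () {
	return 1
}

# Form used in the patch: the substitution is embedded in an argument.
# Its exit status is discarded, so only grep's status is seen.
embedded_form () {
	printf 'have 1234abcd\n' | grep -q "have $(failing_rev_parse)"
}

# Suggested form: a plain assignment takes the exit status of the
# substitution, so the && chain stops before grep runs.
assigned_form () {
	oid=$(failing_rev_parse) &&
	printf 'have 1234abcd\n' | grep -q "have $oid"
}
```

In this sketch embedded_form succeeds even though the inner command failed, because the empty substitution leaves the pattern "have ", which still matches. assigned_form returns a non-zero status, which is what lets the surrounding && chain in a test fail.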


Re: [PATCH 3/6] fetch-pack: in protocol v2, enqueue commons first

2018-06-05 Thread Jonathan Nieder
Hi,

Jonathan Tan wrote:

> In do_fetch_pack_v2(), rev_list_insert_ref_oid() is invoked before
> everything_local(). This means that if we have a commit that is both our
> ref and their ref, it would be enqueued by rev_list_insert_ref_oid() as
> SEEN, and since it is thus already SEEN, everything_local() would not
> enqueue it.
>
> If everything_local() were invoked first, as it is in do_fetch_pack()
> for protocol v0, then everything_local() would enqueue it with
> COMMON_REF | SEEN. The addition of COMMON_REF ensures that its parents
> are not sent as "have" lines.
>
> Change the order in do_fetch_pack_v2() to be consistent with
> do_fetch_pack(), and to avoid sending unnecessary "have" lines.

I get lost in the above description.  I suspect it's doing a good job
of describing the code, instead of answering the question I really
have about what is broken and what behavior we want instead.

E.g. are there some commands that I can run to trigger the unnecessary
"have" lines?  That would make it easier for me to understand the rest
and whether this is a good approach for suppressing them.

It's possible I should be skipping to the test, but a summary in the
commit message would make life easier for lazy people like me. :)

[...]
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -1372,14 +1372,14 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
>   for_each_ref(clear_marks, NULL);
>   marked = 1;
>  
> - for_each_ref(rev_list_insert_ref_oid, NULL);
> - for_each_cached_alternate(insert_one_alternate_object);
> -
>   /* Filter 'ref' by 'sought' and those that aren't local */
>   if (everything_local(args, , sought, nr_sought))
>   state = FETCH_DONE;
>   else
>   state = FETCH_SEND_REQUEST;
> +
> + for_each_ref(rev_list_insert_ref_oid, NULL);
> + for_each_cached_alternate(insert_one_alternate_object);

This is subtle.  My instinct would be to assume that the purpose of
everything_local is just to determine whether we need to send a fetch
request, but it appears we also want to rely on a side effect from it.

Could everything_local get a function comment to describe what side
effects we will be counting on from it?

>   break;
>   case FETCH_SEND_REQUEST:
>   if (send_fetch_request(fd[1], args, ref, ,
> diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> index 0680dec80..ad6a50ad6 100755
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -808,6 +808,41 @@ test_expect_success 'fetch with --filter=blob:limit=0' '
>   fetch_filter_blob_limit_zero server server
>  '
>  
> +test_expect_success 'use ref advertisement to prune "have" lines sent' '

nit: this adds the new test as last in the script.  Is there some
logical earlier place in the file it can go instead?  That way, the
file stays organized and concurrent patches that modify the same test
script are less likely to conflict.

> + rm -rf server client &&
> + git init server &&
> + test_commit -C server aref_both_1 &&
> + git -C server tag -d aref_both_1 &&
> + test_commit -C server aref_both_2 &&

What does aref stand for?

> +
> + # The ref name that only the server has must be a prefix of all the
> + # others, to ensure that the client has the same information regardless
> + # of whether protocol v0 (which does not have ref prefix filtering) or
> + # protocol v2 (which does) is used.

must or else what?  Maybe:

# In this test, the ref name that only the server has is a prefix of
# all other refs. This ensures that the client has the same information
# regardless of [etc]

> + git clone server client &&
> + test_commit -C server aref &&
> + test_commit -C client aref_client &&
> +
> + # In both protocol v0 and v2, ensure that the parent of aref_both_2 is
> + # not sent as a "have" line.

Why shouldn't it be sent as a "have" line?  E.g. does another "have"
line make it redundant?

> +
> + rm -f trace &&
> + cp -r client clientv0 &&
> + GIT_TRACE_PACKET="$(pwd)/trace" git -C clientv0 \
> + fetch origin aref &&
> + grep "have $(git -C client rev-parse aref_client)" trace &&
> + grep "have $(git -C client rev-parse aref_both_2)" trace &&

nit: can make this more robust by doing

aref_client=$(git -C client rev-parse aref_client) &&
aref_both_2=$(git -C client rev-parse aref_both_2) &&

since this way if the git command fails, the test fails.

> + ! grep "have $(git -C client rev-parse aref_both_2^)" trace &&

Nice.

Thanks for a pleasant read,
Jonathan


[PATCH 3/6] fetch-pack: in protocol v2, enqueue commons first

2018-06-04 Thread Jonathan Tan
In do_fetch_pack_v2(), rev_list_insert_ref_oid() is invoked before
everything_local(). This means that if we have a commit that is both our
ref and their ref, it would be enqueued by rev_list_insert_ref_oid() as
SEEN, and since it is thus already SEEN, everything_local() would not
enqueue it.

If everything_local() were invoked first, as it is in do_fetch_pack()
for protocol v0, then everything_local() would enqueue it with
COMMON_REF | SEEN. The addition of COMMON_REF ensures that its parents
are not sent as "have" lines.

Change the order in do_fetch_pack_v2() to be consistent with
do_fetch_pack(), and to avoid sending unnecessary "have" lines.

Signed-off-by: Jonathan Tan 
---
 fetch-pack.c  |  6 +++---
 t/t5500-fetch-pack.sh | 35 +++
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 2d090f612..192771a8f 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1372,14 +1372,14 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args,
for_each_ref(clear_marks, NULL);
marked = 1;
 
-   for_each_ref(rev_list_insert_ref_oid, NULL);
-   for_each_cached_alternate(insert_one_alternate_object);
-
/* Filter 'ref' by 'sought' and those that aren't local */
if (everything_local(args, , sought, nr_sought))
state = FETCH_DONE;
else
state = FETCH_SEND_REQUEST;
+
+   for_each_ref(rev_list_insert_ref_oid, NULL);
+   for_each_cached_alternate(insert_one_alternate_object);
break;
case FETCH_SEND_REQUEST:
if (send_fetch_request(fd[1], args, ref, ,
diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 0680dec80..ad6a50ad6 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -808,6 +808,41 @@ test_expect_success 'fetch with --filter=blob:limit=0' '
fetch_filter_blob_limit_zero server server
 '
 
+test_expect_success 'use ref advertisement to prune "have" lines sent' '
+   rm -rf server client &&
+   git init server &&
+   test_commit -C server aref_both_1 &&
+   git -C server tag -d aref_both_1 &&
+   test_commit -C server aref_both_2 &&
+
+   # The ref name that only the server has must be a prefix of all the
+   # others, to ensure that the client has the same information regardless
+   # of whether protocol v0 (which does not have ref prefix filtering) or
+   # protocol v2 (which does) is used.
+   git clone server client &&
+   test_commit -C server aref &&
+   test_commit -C client aref_client &&
+
+   # In both protocol v0 and v2, ensure that the parent of aref_both_2 is
+   # not sent as a "have" line.
+
+   rm -f trace &&
+   cp -r client clientv0 &&
+   GIT_TRACE_PACKET="$(pwd)/trace" git -C clientv0 \
+   fetch origin aref &&
+   grep "have $(git -C client rev-parse aref_client)" trace &&
+   grep "have $(git -C client rev-parse aref_both_2)" trace &&
+   ! grep "have $(git -C client rev-parse aref_both_2^)" trace &&
+
+   rm -f trace &&
+   cp -r client clientv2 &&
+   GIT_TRACE_PACKET="$(pwd)/trace" git -C clientv2 -c protocol.version=2 \
+   fetch origin aref &&
+   grep "have $(git -C client rev-parse aref_client)" trace &&
+   grep "have $(git -C client rev-parse aref_both_2)" trace &&
+   ! grep "have $(git -C client rev-parse aref_both_2^)" trace
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.17.0.768.g1526ddbba1.dirty