Re: [PATCH] fetch-pack: don't resend known-common refs in find_common

2014-12-05 Thread Shawn Pearce
On Sun, Oct 26, 2014 at 8:39 AM, Dennis Kaarsemaker
den...@kaarsemaker.net wrote:
 On Wed, Oct 22, 2014 at 10:11:40AM -0700, Junio C Hamano wrote:
 Dennis Kaarsemaker den...@kaarsemaker.net writes:

  On di, 2014-10-21 at 10:56 -0700, Junio C Hamano wrote:
  Dennis Kaarsemaker den...@kaarsemaker.net writes:
 
   By not clearing the request buffer in stateless-rpc mode, fetch-pack
   would keep sending already known-common commits, leading to ever bigger
   http requests, eventually getting too large for git-http-backend to
   handle properly without filling up the pipe buffer in inflate_request.
   ---
   I'm still not quite sure whether this is the right thing to do, but make
   test still passes :) The new testcase demonstrates the problem, when
   running t5551 with EXPENSIVE, this test will hang without the patch to
   fetch-pack.c and succeed otherwise.
 
  IIUC, because stateless is just that, i.e. the server-end does not
  keep track of what is already known, not telling what is known to be
  common in each request would fundamentally break the protocol.  Am I
  mistaken?
 
  That sounds plausible, but why then does the fetch complete with this
  line removed, and why does 'make test' still pass?

 The fetch-pack program tries to help the upload-pack program(s)
 running on the other end find what nodes in the graph both
 repositories have in common by sending what the repository on its
 end has.  Some commits may not be known by the other side (e.g. your
 new commits that haven't been pushed there that are made on a branch
 forked from the common history), and some others may be known
 (i.e. you drilled down the history from the tips of your refs and
 reached a commit that you fetched from the common history
 previously).  The latter are ACKed by the upload-pack process and
 are remembered to be re-sent to the _next_ incarnation of the
 upload-pack process when stateless RPC is in use.

 With your patch, you stop telling the upload-pack process what these
 two programs already found to be common in their exchange.  After
 the first exchange, fetch-pack and upload-pack may have noticed that
 both ends have version 2.0, but because you do not convey that fact
 to the other side, the new incarnation of upload-pack may end up
 deciding that the version 1.9 is the newest common commit between
 the two, and sending commits between 1.9 and 2.0.

 If you imagine an extreme case, it would be easy to see why the
 fetch completes and make test passes are not sufficient to say
 anything about this change.  Even if you break the protocol in in a
 way different from your patch, by not sending any have, such a
 butchered fetch-pack will become fetch everything from scratch,
 aka clone.  The end result will still have correct history and
 fetch completes would be true.

 But I'd prefer deferring a more detailed analysis/explanation to
 Shawn, as stateless RPC is his creation.

Junio's statement above about the world is correct.

 Thanks for the explanation, that makes it quite clear that this approach
 is wrong. The patch below (apologies for the formatting, I'm not quite
 sure how to use format-patch in this situation) does it differently: by
 buffering upload-pack's output until it has read all the input, the new
 test still succeeds and again 'make test' passes.

This probably introduces latency into the traditional bidirectional
multi_ack protocol.

 @@ -384,15 +385,19 @@ static int get_common_commits(void)
 if (multi_ack == 2  got_common
  !got_other  ok_to_give_up()) {
 sent_ready = 1;
 -   packet_write(1, ACK %s ready\n, last_hex);
 +   packet_buf_write(resp_buf, ACK %s ready\n, 
 last_hex);
 }
 if (have_obj.nr == 0 || multi_ack)
 -   packet_write(1, NAK\n);
 +   packet_buf_write(resp_buf, NAK\n);

By buffering and delaying these when !stateless_rpc we defer telling
the peer in a multi_ack exchange. That delays the peer from cutting
off its communication by about 1RTT.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fetch-pack: don't resend known-common refs in find_common

2014-10-26 Thread Dennis Kaarsemaker
On Wed, Oct 22, 2014 at 10:11:40AM -0700, Junio C Hamano wrote:
 Dennis Kaarsemaker den...@kaarsemaker.net writes:
 
  On di, 2014-10-21 at 10:56 -0700, Junio C Hamano wrote:
  Dennis Kaarsemaker den...@kaarsemaker.net writes:
  
   By not clearing the request buffer in stateless-rpc mode, fetch-pack
   would keep sending already known-common commits, leading to ever bigger
   http requests, eventually getting too large for git-http-backend to
   handle properly without filling up the pipe buffer in inflate_request.
   ---
   I'm still not quite sure whether this is the right thing to do, but make
   test still passes :) The new testcase demonstrates the problem, when
   running t5551 with EXPENSIVE, this test will hang without the patch to
   fetch-pack.c and succeed otherwise.
  
  IIUC, because stateless is just that, i.e. the server-end does not
  keep track of what is already known, not telling what is known to be
  common in each request would fundamentally break the protocol.  Am I
  mistaken?
 
  That sounds plausible, but why then does the fetch complete with this
  line removed, and why does 'make test' still pass?
 
 The fetch-pack program tries to help the upload-pack program(s)
 running on the other end find what nodes in the graph both
 repositories have in common by sending what the repository on its
 end has.  Some commits may not be known by the other side (e.g. your
 new commits that haven't been pushed there that are made on a branch
 forked from the common history), and some others may be known
 (i.e. you drilled down the history from the tips of your refs and
 reached a commit that you fetched from the common history
 previously).  The latter are ACKed by the upload-pack process and
 are remembered to be re-sent to the _next_ incarnation of the
 upload-pack process when stateless RPC is in use.
 
 With your patch, you stop telling the upload-pack process what these
 two programs already found to be common in their exchange.  After
 the first exchange, fetch-pack and upload-pack may have noticed that
 both ends have version 2.0, but because you do not convey that fact
 to the other side, the new incarnation of upload-pack may end up
 deciding that the version 1.9 is the newest common commit between
 the two, and sending commits between 1.9 and 2.0.
 
 If you imagine an extreme case, it would be easy to see why the
 fetch completes and make test passes are not sufficient to say
 anything about this change.  Even if you break the protocol in in a
 way different from your patch, by not sending any have, such a
 butchered fetch-pack will become fetch everything from scratch,
 aka clone.  The end result will still have correct history and
 fetch completes would be true.
 
 But I'd prefer deferring a more detailed analysis/explanation to
 Shawn, as stateless RPC is his creation.

Thanks for the explanation, that makes it quite clear that this approach
is wrong. The patch below (apologies for the formatting, I'm not quite
sure how to use format-patch in this situation) does it differently: by
buffering upload-pack's output until it has read all the input, the new
test still succeeds and again 'make test' passes. 

---
 t/t5551-http-fetch-smart.sh | 32 
 upload-pack.c   | 29 +++--
 2 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index 6cbc12d..2aac237 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -245,5 +245,37 @@ test_expect_success EXPENSIVE 'clone the 50,000 tag repo 
to check OS command lin
)
 '
 
+test_expect_success EXPENSIVE 'create 50,000 more tags' '
+   (
+   cd $HTTPD_DOCUMENT_ROOT_PATH/repo.git 
+   for i in `test_seq 50001 10`
+   do
+   echo commit refs/heads/too-many-refs-again
+   echo mark :$i
+   echo committer git g...@example.com $i +
+   echo data 0
+   echo M 644 inline bla.txt
+   echo data 4
+   echo bla
+   # make every commit dangling by always
+   # rewinding the branch after each commit
+   echo reset refs/heads/too-many-refs-again
+   echo from :50001
+   done | git fast-import --export-marks=marks 
+
+   # now assign tags to all the dangling commits we created above
+   tag=$(perl -e print \bla\ x 30) 
+   sed -e s|^:\([^ ]*\) \(.*\)$|\2 refs/tags/$tag-\1| marks 
packed-refs
+   )
+'
+
+test_expect_success EXPENSIVE 'fetch the new tags' '
+   (
+   cd too-many-refs 
+   git fetch --tags 
+   test $(git for-each-ref refs/tags | wc -l) = 10
+   )
+'
+
 stop_httpd
 test_done
diff --git a/upload-pack.c b/upload-pack.c
index ac9ac15..3d76750 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -373,6 +373,7 @@ static int get_common_commits(void)
int got_common = 0;

Re: [PATCH] fetch-pack: don't resend known-common refs in find_common

2014-10-26 Thread Dennis Kaarsemaker
On Wed, Oct 22, 2014 at 05:07:31PM +0700, Duy Nguyen wrote:
 On Wed, Oct 22, 2014 at 2:41 PM, Dennis Kaarsemaker
 den...@kaarsemaker.net wrote:
  I see two options:
 
  * Turning that interaction into a more cooperative process, with a
select/poll loop
  * Make upload-pack buffer its entire response when run in stateless_rpc
mode until it has consumed all of the request
 
 Or add a helper daemon and support stateful smart http. Or maybe
 that's what you meant in the first option.

No, I meant that get-http-backend should have a select/poll loop that
can read from and write to git-upload-pack at the same time, but option
two was easier to implement :)

Stateful smart http seems to be against the design goals of smart http if I
interpret Documentation/technical/http-protocol.txt correctly though, so
that doesn't seem to be the right approach.
-- 
Dennis Kaarsemaker den...@kaarsemaker.net
http://twitter.com/seveas
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fetch-pack: don't resend known-common refs in find_common

2014-10-22 Thread Dennis Kaarsemaker
On di, 2014-10-21 at 10:56 -0700, Junio C Hamano wrote:
 Dennis Kaarsemaker den...@kaarsemaker.net writes:
 
  By not clearing the request buffer in stateless-rpc mode, fetch-pack
  would keep sending already known-common commits, leading to ever bigger
  http requests, eventually getting too large for git-http-backend to
  handle properly without filling up the pipe buffer in inflate_request.
  ---
  I'm still not quite sure whether this is the right thing to do, but make
  test still passes :) The new testcase demonstrates the problem, when
  running t5551 with EXPENSIVE, this test will hang without the patch to
  fetch-pack.c and succeed otherwise.
 
 IIUC, because stateless is just that, i.e. the server-end does not
 keep track of what is already known, not telling what is known to be
 common in each request would fundamentally break the protocol.  Am I
 mistaken?

That sounds plausible, but why then does the fetch complete with this
line removed, and why does 'make test' still pass? I tried to understand
the protocol, but the documentation has TODO's in some critical
places :)

And if that's true, it means the inflate_request / upload-pack
interaction should be fixed, so more than 64k (current linux pipe buffer
size) of uncompressed data is supported. I see two options:

* Turning that interaction into a more cooperative process, with a
  select/poll loop
* Make upload-pack buffer its entire response when run in stateless_rpc
  mode until it has consumed all of the request

The latter sounds easier to do, but not being very familiar with the
protocol, I may have missed something obvious.

-- 
Dennis Kaarsemaker
http://www.kaarsemaker.net

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fetch-pack: don't resend known-common refs in find_common

2014-10-22 Thread Duy Nguyen
On Wed, Oct 22, 2014 at 2:41 PM, Dennis Kaarsemaker
den...@kaarsemaker.net wrote:
 I see two options:

 * Turning that interaction into a more cooperative process, with a
   select/poll loop
 * Make upload-pack buffer its entire response when run in stateless_rpc
   mode until it has consumed all of the request

Or add a helper daemon and support stateful smart http. Or maybe
that's what you meant in the first option.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fetch-pack: don't resend known-common refs in find_common

2014-10-22 Thread Junio C Hamano
Dennis Kaarsemaker den...@kaarsemaker.net writes:

 On di, 2014-10-21 at 10:56 -0700, Junio C Hamano wrote:
 Dennis Kaarsemaker den...@kaarsemaker.net writes:
 
  By not clearing the request buffer in stateless-rpc mode, fetch-pack
  would keep sending already known-common commits, leading to ever bigger
  http requests, eventually getting too large for git-http-backend to
  handle properly without filling up the pipe buffer in inflate_request.
  ---
  I'm still not quite sure whether this is the right thing to do, but make
  test still passes :) The new testcase demonstrates the problem, when
  running t5551 with EXPENSIVE, this test will hang without the patch to
  fetch-pack.c and succeed otherwise.
 
 IIUC, because stateless is just that, i.e. the server-end does not
 keep track of what is already known, not telling what is known to be
 common in each request would fundamentally break the protocol.  Am I
 mistaken?

 That sounds plausible, but why then does the fetch complete with this
 line removed, and why does 'make test' still pass?

The fetch-pack program tries to help the upload-pack program(s)
running on the other end find what nodes in the graph both
repositories have in common by sending what the repository on its
end has.  Some commits may not be known by the other side (e.g. your
new commits that haven't been pushed there that are made on a branch
forked from the common history), and some others may be known
(i.e. you drilled down the history from the tips of your refs and
reached a commit that you fetched from the common history
previously).  The latter are ACKed by the upload-pack process and
are remembered to be re-sent to the _next_ incarnation of the
upload-pack process when stateless RPC is in use.

With your patch, you stop telling the upload-pack process what these
two programs already found to be common in their exchange.  After
the first exchange, fetch-pack and upload-pack may have noticed that
both ends have version 2.0, but because you do not convey that fact
to the other side, the new incarnation of upload-pack may end up
deciding that the version 1.9 is the newest common commit between
the two, and sending commits between 1.9 and 2.0.

If you imagine an extreme case, it would be easy to see why the
fetch completes and make test passes are not sufficient to say
anything about this change.  Even if you break the protocol in in a
way different from your patch, by not sending any have, such a
butchered fetch-pack will become fetch everything from scratch,
aka clone.  The end result will still have correct history and
fetch completes would be true.

But I'd prefer deferring a more detailed analysis/explanation to
Shawn, as stateless RPC is his creation.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fetch-pack: don't resend known-common refs in find_common

2014-10-21 Thread Junio C Hamano
Dennis Kaarsemaker den...@kaarsemaker.net writes:

 By not clearing the request buffer in stateless-rpc mode, fetch-pack
 would keep sending already known-common commits, leading to ever bigger
 http requests, eventually getting too large for git-http-backend to
 handle properly without filling up the pipe buffer in inflate_request.
 ---
 I'm still not quite sure whether this is the right thing to do, but make
 test still passes :) The new testcase demonstrates the problem, when
 running t5551 with EXPENSIVE, this test will hang without the patch to
 fetch-pack.c and succeed otherwise.

IIUC, because stateless is just that, i.e. the server-end does not
keep track of what is already known, not telling what is known to be
common in each request would fundamentally break the protocol.  Am I
mistaken?


  fetch-pack.c|  1 -
  t/t5551-http-fetch-smart.sh | 32 
  2 files changed, 32 insertions(+), 1 deletion(-)

 diff --git a/fetch-pack.c b/fetch-pack.c
 index 655ee64..258245c 100644
 --- a/fetch-pack.c
 +++ b/fetch-pack.c
 @@ -410,7 +410,6 @@ static int find_common(struct fetch_pack_args *args,
*/
   const char *hex = 
 sha1_to_hex(result_sha1);
   packet_buf_write(req_buf, 
 have %s\n, hex);
 - state_len = req_buf.len;
   }
   mark_common(commit, 0, 1);
   retval = 0;
 diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
 index 6cbc12d..2aac237 100755
 --- a/t/t5551-http-fetch-smart.sh
 +++ b/t/t5551-http-fetch-smart.sh
 @@ -245,5 +245,37 @@ test_expect_success EXPENSIVE 'clone the 50,000 tag repo 
 to check OS command lin
   )
  '
  
 +test_expect_success EXPENSIVE 'create 50,000 more tags' '
 + (
 + cd $HTTPD_DOCUMENT_ROOT_PATH/repo.git 
 + for i in `test_seq 50001 10`
 + do
 + echo commit refs/heads/too-many-refs-again
 + echo mark :$i
 + echo committer git g...@example.com $i +
 + echo data 0
 + echo M 644 inline bla.txt
 + echo data 4
 + echo bla
 + # make every commit dangling by always
 + # rewinding the branch after each commit
 + echo reset refs/heads/too-many-refs-again
 + echo from :50001
 + done | git fast-import --export-marks=marks 
 +
 + # now assign tags to all the dangling commits we created above
 + tag=$(perl -e print \bla\ x 30) 
 + sed -e s|^:\([^ ]*\) \(.*\)$|\2 refs/tags/$tag-\1| marks 
 packed-refs
 + )
 +'
 +
 +test_expect_success EXPENSIVE 'fetch the new tags' '
 + (
 + cd too-many-refs 
 + git fetch --tags 
 + test $(git for-each-ref refs/tags | wc -l) = 10
 + )
 +'
 +
  stop_httpd
  test_done
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html