Re: [PATCH] fetch-pack: don't resend known-common refs in find_common
On Sun, Oct 26, 2014 at 8:39 AM, Dennis Kaarsemaker den...@kaarsemaker.net wrote: On Wed, Oct 22, 2014 at 10:11:40AM -0700, Junio C Hamano wrote: Dennis Kaarsemaker den...@kaarsemaker.net writes: On di, 2014-10-21 at 10:56 -0700, Junio C Hamano wrote: Dennis Kaarsemaker den...@kaarsemaker.net writes: By not clearing the request buffer in stateless-rpc mode, fetch-pack would keep sending already known-common commits, leading to ever bigger http requests, eventually getting too large for git-http-backend to handle properly without filling up the pipe buffer in inflate_request. --- I'm still not quite sure whether this is the right thing to do, but make test still passes :) The new testcase demonstrates the problem, when running t5551 with EXPENSIVE, this test will hang without the patch to fetch-pack.c and succeed otherwise. IIUC, because stateless is just that, i.e. the server-end does not keep track of what is already known, not telling what is known to be common in each request would fundamentally break the protocol. Am I mistaken? That sounds plausible, but why then does the fetch complete with this line removed, and why does 'make test' still pass? The fetch-pack program tries to help the upload-pack program(s) running on the other end find what nodes in the graph both repositories have in common by sending what the repository on its end has. Some commits may not be known by the other side (e.g. your new commits that haven't been pushed there that are made on a branch forked from the common history), and some others may be known (i.e. you drilled down the history from the tips of your refs and reached a commit that you fetched from the common history previously). The latter are ACKed by the upload-pack process and are remembered to be re-sent to the _next_ incarnation of the upload-pack process when stateless RPC is in use. With your patch, you stop telling the upload-pack process what these two programs already found to be common in their exchange. After the first exchange, fetch-pack and upload-pack may have noticed that both ends have version 2.0, but because you do not convey that fact to the other side, the new incarnation of upload-pack may end up deciding that the version 1.9 is the newest common commit between the two, and sending commits between 1.9 and 2.0. If you imagine an extreme case, it would be easy to see why the fetch completes and make test passes are not sufficient to say anything about this change. Even if you break the protocol in in a way different from your patch, by not sending any have, such a butchered fetch-pack will become fetch everything from scratch, aka clone. The end result will still have correct history and fetch completes would be true. But I'd prefer deferring a more detailed analysis/explanation to Shawn, as stateless RPC is his creation. Junio's statement above about the world is correct. Thanks for the explanation, that makes it quite clear that this approach is wrong. The patch below (apologies for the formatting, I'm not quite sure how to use format-patch in this situation) does it differently: by buffering upload-pack's output until it has read all the input, the new test still succeeds and again 'make test' passes. This probably introduces latency into the traditional bidirectional multi_ack protocol. @@ -384,15 +385,19 @@ static int get_common_commits(void) if (multi_ack == 2 got_common !got_other ok_to_give_up()) { sent_ready = 1; - packet_write(1, ACK %s ready\n, last_hex); + packet_buf_write(resp_buf, ACK %s ready\n, last_hex); } if (have_obj.nr == 0 || multi_ack) - packet_write(1, NAK\n); + packet_buf_write(resp_buf, NAK\n); By buffering and delaying these when !stateless_rpc we defer telling the peer in a multi_ack exchange. That delays the peer from cutting off its communication by about 1RTT. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fetch-pack: don't resend known-common refs in find_common
On Wed, Oct 22, 2014 at 10:11:40AM -0700, Junio C Hamano wrote: Dennis Kaarsemaker den...@kaarsemaker.net writes: On di, 2014-10-21 at 10:56 -0700, Junio C Hamano wrote: Dennis Kaarsemaker den...@kaarsemaker.net writes: By not clearing the request buffer in stateless-rpc mode, fetch-pack would keep sending already known-common commits, leading to ever bigger http requests, eventually getting too large for git-http-backend to handle properly without filling up the pipe buffer in inflate_request. --- I'm still not quite sure whether this is the right thing to do, but make test still passes :) The new testcase demonstrates the problem, when running t5551 with EXPENSIVE, this test will hang without the patch to fetch-pack.c and succeed otherwise. IIUC, because stateless is just that, i.e. the server-end does not keep track of what is already known, not telling what is known to be common in each request would fundamentally break the protocol. Am I mistaken? That sounds plausible, but why then does the fetch complete with this line removed, and why does 'make test' still pass? The fetch-pack program tries to help the upload-pack program(s) running on the other end find what nodes in the graph both repositories have in common by sending what the repository on its end has. Some commits may not be known by the other side (e.g. your new commits that haven't been pushed there that are made on a branch forked from the common history), and some others may be known (i.e. you drilled down the history from the tips of your refs and reached a commit that you fetched from the common history previously). The latter are ACKed by the upload-pack process and are remembered to be re-sent to the _next_ incarnation of the upload-pack process when stateless RPC is in use. With your patch, you stop telling the upload-pack process what these two programs already found to be common in their exchange. After the first exchange, fetch-pack and upload-pack may have noticed that both ends have version 2.0, but because you do not convey that fact to the other side, the new incarnation of upload-pack may end up deciding that the version 1.9 is the newest common commit between the two, and sending commits between 1.9 and 2.0. If you imagine an extreme case, it would be easy to see why the fetch completes and make test passes are not sufficient to say anything about this change. Even if you break the protocol in in a way different from your patch, by not sending any have, such a butchered fetch-pack will become fetch everything from scratch, aka clone. The end result will still have correct history and fetch completes would be true. But I'd prefer deferring a more detailed analysis/explanation to Shawn, as stateless RPC is his creation. Thanks for the explanation, that makes it quite clear that this approach is wrong. The patch below (apologies for the formatting, I'm not quite sure how to use format-patch in this situation) does it differently: by buffering upload-pack's output until it has read all the input, the new test still succeeds and again 'make test' passes. --- t/t5551-http-fetch-smart.sh | 32 upload-pack.c | 29 +++-- 2 files changed, 51 insertions(+), 10 deletions(-) diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index 6cbc12d..2aac237 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -245,5 +245,37 @@ test_expect_success EXPENSIVE 'clone the 50,000 tag repo to check OS command lin ) ' +test_expect_success EXPENSIVE 'create 50,000 more tags' ' + ( + cd $HTTPD_DOCUMENT_ROOT_PATH/repo.git + for i in `test_seq 50001 10` + do + echo commit refs/heads/too-many-refs-again + echo mark :$i + echo committer git g...@example.com $i + + echo data 0 + echo M 644 inline bla.txt + echo data 4 + echo bla + # make every commit dangling by always + # rewinding the branch after each commit + echo reset refs/heads/too-many-refs-again + echo from :50001 + done | git fast-import --export-marks=marks + + # now assign tags to all the dangling commits we created above + tag=$(perl -e print \bla\ x 30) + sed -e s|^:\([^ ]*\) \(.*\)$|\2 refs/tags/$tag-\1| marks packed-refs + ) +' + +test_expect_success EXPENSIVE 'fetch the new tags' ' + ( + cd too-many-refs + git fetch --tags + test $(git for-each-ref refs/tags | wc -l) = 10 + ) +' + stop_httpd test_done diff --git a/upload-pack.c b/upload-pack.c index ac9ac15..3d76750 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -373,6 +373,7 @@ static int get_common_commits(void) int got_common = 0;
Re: [PATCH] fetch-pack: don't resend known-common refs in find_common
On Wed, Oct 22, 2014 at 05:07:31PM +0700, Duy Nguyen wrote: On Wed, Oct 22, 2014 at 2:41 PM, Dennis Kaarsemaker den...@kaarsemaker.net wrote: I see two options: * Turning that interaction into a more cooperative process, with a select/poll loop * Make upload-pack buffer its entire response when run in stateless_rpc mode until it has consumed all of the request Or add a helper daemon and support stateful smart http. Or maybe that's what you meant in the first option. No, I meant that get-http-backend should have a select/poll loop that can read from and write to git-upload-pack at the same time, but option two was easier to implement :) Stateful smart http seems to be against the design goals of smart http if I interpret Documentation/technical/http-protocol.txt correctly though, so that doesn't seem to be the right approach. -- Dennis Kaarsemaker den...@kaarsemaker.net http://twitter.com/seveas -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fetch-pack: don't resend known-common refs in find_common
On di, 2014-10-21 at 10:56 -0700, Junio C Hamano wrote: Dennis Kaarsemaker den...@kaarsemaker.net writes: By not clearing the request buffer in stateless-rpc mode, fetch-pack would keep sending already known-common commits, leading to ever bigger http requests, eventually getting too large for git-http-backend to handle properly without filling up the pipe buffer in inflate_request. --- I'm still not quite sure whether this is the right thing to do, but make test still passes :) The new testcase demonstrates the problem, when running t5551 with EXPENSIVE, this test will hang without the patch to fetch-pack.c and succeed otherwise. IIUC, because stateless is just that, i.e. the server-end does not keep track of what is already known, not telling what is known to be common in each request would fundamentally break the protocol. Am I mistaken? That sounds plausible, but why then does the fetch complete with this line removed, and why does 'make test' still pass? I tried to understand the protocol, but the documentation has TODO's in some critical places :) And if that's true, it means the inflate_request / upload-pack interaction should be fixed, so more than 64k (current linux pipe buffer size) of uncompressed data is supported. I see two options: * Turning that interaction into a more cooperative process, with a select/poll loop * Make upload-pack buffer its entire response when run in stateless_rpc mode until it has consumed all of the request The latter sounds easier to do, but not being very familiar with the protocol, I may have missed something obvious. -- Dennis Kaarsemaker http://www.kaarsemaker.net -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fetch-pack: don't resend known-common refs in find_common
On Wed, Oct 22, 2014 at 2:41 PM, Dennis Kaarsemaker den...@kaarsemaker.net wrote: I see two options: * Turning that interaction into a more cooperative process, with a select/poll loop * Make upload-pack buffer its entire response when run in stateless_rpc mode until it has consumed all of the request Or add a helper daemon and support stateful smart http. Or maybe that's what you meant in the first option. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fetch-pack: don't resend known-common refs in find_common
Dennis Kaarsemaker den...@kaarsemaker.net writes: On di, 2014-10-21 at 10:56 -0700, Junio C Hamano wrote: Dennis Kaarsemaker den...@kaarsemaker.net writes: By not clearing the request buffer in stateless-rpc mode, fetch-pack would keep sending already known-common commits, leading to ever bigger http requests, eventually getting too large for git-http-backend to handle properly without filling up the pipe buffer in inflate_request. --- I'm still not quite sure whether this is the right thing to do, but make test still passes :) The new testcase demonstrates the problem, when running t5551 with EXPENSIVE, this test will hang without the patch to fetch-pack.c and succeed otherwise. IIUC, because stateless is just that, i.e. the server-end does not keep track of what is already known, not telling what is known to be common in each request would fundamentally break the protocol. Am I mistaken? That sounds plausible, but why then does the fetch complete with this line removed, and why does 'make test' still pass? The fetch-pack program tries to help the upload-pack program(s) running on the other end find what nodes in the graph both repositories have in common by sending what the repository on its end has. Some commits may not be known by the other side (e.g. your new commits that haven't been pushed there that are made on a branch forked from the common history), and some others may be known (i.e. you drilled down the history from the tips of your refs and reached a commit that you fetched from the common history previously). The latter are ACKed by the upload-pack process and are remembered to be re-sent to the _next_ incarnation of the upload-pack process when stateless RPC is in use. With your patch, you stop telling the upload-pack process what these two programs already found to be common in their exchange. After the first exchange, fetch-pack and upload-pack may have noticed that both ends have version 2.0, but because you do not convey that fact to the other side, the new incarnation of upload-pack may end up deciding that the version 1.9 is the newest common commit between the two, and sending commits between 1.9 and 2.0. If you imagine an extreme case, it would be easy to see why the fetch completes and make test passes are not sufficient to say anything about this change. Even if you break the protocol in in a way different from your patch, by not sending any have, such a butchered fetch-pack will become fetch everything from scratch, aka clone. The end result will still have correct history and fetch completes would be true. But I'd prefer deferring a more detailed analysis/explanation to Shawn, as stateless RPC is his creation. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fetch-pack: don't resend known-common refs in find_common
Dennis Kaarsemaker den...@kaarsemaker.net writes: By not clearing the request buffer in stateless-rpc mode, fetch-pack would keep sending already known-common commits, leading to ever bigger http requests, eventually getting too large for git-http-backend to handle properly without filling up the pipe buffer in inflate_request. --- I'm still not quite sure whether this is the right thing to do, but make test still passes :) The new testcase demonstrates the problem, when running t5551 with EXPENSIVE, this test will hang without the patch to fetch-pack.c and succeed otherwise. IIUC, because stateless is just that, i.e. the server-end does not keep track of what is already known, not telling what is known to be common in each request would fundamentally break the protocol. Am I mistaken? fetch-pack.c| 1 - t/t5551-http-fetch-smart.sh | 32 2 files changed, 32 insertions(+), 1 deletion(-) diff --git a/fetch-pack.c b/fetch-pack.c index 655ee64..258245c 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -410,7 +410,6 @@ static int find_common(struct fetch_pack_args *args, */ const char *hex = sha1_to_hex(result_sha1); packet_buf_write(req_buf, have %s\n, hex); - state_len = req_buf.len; } mark_common(commit, 0, 1); retval = 0; diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh index 6cbc12d..2aac237 100755 --- a/t/t5551-http-fetch-smart.sh +++ b/t/t5551-http-fetch-smart.sh @@ -245,5 +245,37 @@ test_expect_success EXPENSIVE 'clone the 50,000 tag repo to check OS command lin ) ' +test_expect_success EXPENSIVE 'create 50,000 more tags' ' + ( + cd $HTTPD_DOCUMENT_ROOT_PATH/repo.git + for i in `test_seq 50001 10` + do + echo commit refs/heads/too-many-refs-again + echo mark :$i + echo committer git g...@example.com $i + + echo data 0 + echo M 644 inline bla.txt + echo data 4 + echo bla + # make every commit dangling by always + # rewinding the branch after each commit + echo reset refs/heads/too-many-refs-again + echo from :50001 + done | git fast-import --export-marks=marks + + # now assign tags to all the dangling commits we created above + tag=$(perl -e print \bla\ x 30) + sed -e s|^:\([^ ]*\) \(.*\)$|\2 refs/tags/$tag-\1| marks packed-refs + ) +' + +test_expect_success EXPENSIVE 'fetch the new tags' ' + ( + cd too-many-refs + git fetch --tags + test $(git for-each-ref refs/tags | wc -l) = 10 + ) +' + stop_httpd test_done -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html