Re: GIT clone fails, how to proceed?
Hi,

On 23.06.2013 15:55, Willy Tarreau wrote:
> Guys, I found a workaround which seems to be working quite well at the
> moment. For some reason the kernel seems to ignore the max TCP window
> size when GSO is enabled on the interface, resulting in hundreds of kB
> in flight which take ages to recover in case of losses => haproxy sees
> nothing move and finally times out. Disabling GSO on that interface
> completely fixed the issue, now the socket's send queues are reasonable
> and match the configuration and I've not seen a timeout for the last
> hour. There were always a few per hour previously that I always attributed
> to the clients!
>
> So I think it's really fixed now.

I can confirm that. Thanks a lot.

> Cheers,
> Willy

thomas
Re: GIT clone fails, how to proceed?
Guys,

I found a workaround which seems to be working quite well at the moment. For some reason the kernel seems to ignore the max TCP window size when GSO is enabled on the interface, resulting in hundreds of kB in flight which take ages to recover in case of losses => haproxy sees nothing move and finally times out.

Disabling GSO on that interface completely fixed the issue. Now the socket's send queues are reasonable and match the configuration, and I've not seen a timeout for the last hour. There were always a few per hour previously that I always attributed to the clients!

So I think it's really fixed now.

Cheers,
Willy
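
For anyone wanting to try the same workaround, here is a minimal sketch. It assumes the standard `ethtool` utility and a hypothetical interface name `eth0`; neither is named in the message above, and the helper names and the sample text are made up for illustration. The first helper reads the `generic-segmentation-offload` line from `ethtool -k` output, the second only builds the command that would turn GSO off (it does not run it).

```python
# Sketch only: interface name, helper names and SAMPLE are assumptions,
# not taken from the thread.

def gso_enabled(ethtool_k_output):
    """Return True if `ethtool -k <iface>` output reports GSO as on."""
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name == "generic-segmentation-offload":
            words = value.split()
            # The value may be followed by a tag such as "[fixed]".
            return bool(words) and words[0] == "on"
    raise ValueError("generic-segmentation-offload not found in output")


def disable_gso_command(iface):
    """Build (but do not run) the command that disables GSO on iface."""
    return ["ethtool", "-K", iface, "gso", "off"]


# Hypothetical excerpt of `ethtool -k eth0` output, for illustration.
SAMPLE = """Features for eth0:
tcp-segmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
"""

if __name__ == "__main__":
    if gso_enabled(SAMPLE):
        print(" ".join(disable_gso_command("eth0")))
```

The built command needs root privileges to take effect, and the setting is not persistent across reboots unless it is added to the interface configuration.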
Re: GIT clone fails, how to proceed?
Hi Lukas,

OK it's a kernel issue on my reverse proxy. Look below, haproxy detected a timeout after 30s of idle (fd 14 faces the client, fd 15 the server):

epoll_wait(0, 0x1aebdd8, 0xc8, 0) = 0
gettimeofday({1371978241, 119519}, NULL) = 0
recv(15, "-\nR\216+f\213%G\3539\"\270\246{9\3037\272\317N\215\0\226\333;\334\320y\374Z.'"..., 8030, 0) = 8030
send(14, "-\nR\216+f\213%G\3539\"\270\246{9\3037\272\317N\215\0\226\333;\334\320y\374Z.'"..., 8030, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE) = 4280
gettimeofday({1371978241, 120296}, NULL) = 0
epoll_wait(0, 0x1aebdd8, 0xc8, 0) = 0
gettimeofday({1371978241, 120635}, NULL) = 0
recv(15, "A\23,A\17\221\234k\271!\313C\245\267a Pp\316\204-9\342E\360\3438\255\322\247(-J"..., 4280, 0) = 4280
send(14, "i.\302#t\35\300\354~G\312\2606\266\201\376\254}~\362\372l_\226\31\5\210{\344\361\10`\30"..., 3750, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(0, 0x3, 0xe, 0xa2028) = 0

==> buffers are full for fd #14. Nothing happens on this FD for the next 30 seconds, until we decide it's over and close the connection:

epoll_wait(0, 0x1aebdd8, 0xc8, 0x93)= 0
gettimeofday({1371978271, 122894}, NULL) = 0
setsockopt(14, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
close(14) = 0
shutdown(15, 1 /* send */) = 0
close(15) = 0
sendto(10, "<134>Jun 23 11:04:31 haproxy[1153"..., 318, MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(514), sin_addr=inet_addr("10.8.1.2")}, 16) = 318

Now the problem is that the capture taken on the same side shows a different story. This happens after the recovery from some losses:

10:03:47.918722 88.191.124.161.45154 > 62.212.114.60.81: . [tcp sum ok] 3500432327:3500432327(0) ack 3270212996 win 398 (DF) (ttl 53, id 27788, len 52)
10:03:48.001500 62.212.114.60.81 > 88.191.124.161.45154: . [tcp sum ok] 3270212996:3270213496(500) ack 3500432327 win 1500 (DF) (ttl 128, id 24160, len 552)
10:03:48.025513 62.212.114.60.81 > 88.191.124.161.45154: . [tcp sum ok] 3270213496:3270213996(500) ack 3500432327 win 1500 (DF) (ttl 128, id 24161, len 552)
10:03:48.063247 88.191.124.161.45154 > 62.212.114.60.81: . [tcp sum ok] 3500432327:3500432327(0) ack 3270213496 win 405 (DF) (ttl 53, id 27789, len 52)
10:03:48.086935 88.191.124.161.45154 > 62.212.114.60.81: . [tcp sum ok] 3500432327:3500432327(0) ack 3270213996 win 413 (DF) (ttl 53, id 27790, len 52)
10:03:48.344722 62.212.114.60.81 > 88.191.124.161.45154: R [tcp sum ok] 3270224496:3270224496(0) ack 3500432327 win 1500 (DF) (ttl 128, id 24183, len 52)

When the RST happens, all bytes were acked, so for sure the writes should have retriggered.

It's probably time to upgrade this kernel. Now that I'm thinking about it, I believe that the issues started when I switched to use this machine :-/

Best regards,
Willy
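
The "30 seconds of idle" reading can be checked from the two `gettimeofday()` results in the trace: the last one before `send()` returned EAGAIN and the one taken right before the connection was closed. A small sketch of that arithmetic (the variable and function names are mine, only the numbers come from the trace):

```python
from datetime import timedelta


def timeval(sec, usec):
    """Turn a gettimeofday() {sec, usec} pair into a timedelta since the epoch."""
    return timedelta(seconds=sec, microseconds=usec)


# Last timestamp seen before send(14, ...) returned EAGAIN.
last_activity = timeval(1371978241, 120635)
# Timestamp taken just before setsockopt(SO_LINGER) and close(14).
timeout_seen = timeval(1371978271, 122894)

idle_seconds = (timeout_seen - last_activity).total_seconds()

if __name__ == "__main__":
    print(f"{idle_seconds:.6f}")  # prints 30.002259
```

That is just over 30 s with no event on fd 14, which is consistent with a 30 s timeout firing rather than with an error reported on the socket.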
Re: GIT clone fails, how to proceed?
On Sun, Jun 23, 2013 at 10:54:00AM +0200, Lukas Tribus wrote:
> Still fails here:
>
> lukas@ubuntuvm:~/haproxy-test$ time git clone
> http://git.1wt.eu/git/haproxy.git/
> Cloning into 'haproxy'...
> error: Unable to get pack file
> http://git.1wt.eu/git/haproxy.git/objects/pack/pack-815835d1b2e20e0ad9d028756813b078cdf8f9c2.pack
> transfer closed with 233372 bytes remaining to read
> error: Unable to find 84d23dab089a4313913e22c1b0c60cc2b48216f0 under
> http://git.1wt.eu/git/haproxy.git
> Cannot obtain needed blob 84d23dab089a4313913e22c1b0c60cc2b48216f0
> while processing commit 0a3dd74c9cd24ab77178c9ccc65c577a91648cef.
> error: Fetch failed.
>
> real 15m37.691s
> user 0m0.012s
> sys 0m0.084s
> lukas@ubuntuvm:~/haproxy-test$

Yes I noticed it in the logs during your test. I managed to reproduce it now. That's strange, the server-side haproxy detects an error and closes (not a timeout) while network traces show that it's the first one to close. I'll have to retry using strace. I suspect some timeout issue reported by the kernel. I don't even have tcp keep-alives though :-/

Thanks for the test!
Willy
RE: GIT clone fails, how to proceed?
Hi Willy,

> I've just put the cache into maintenance so that connections will go
> directly to the origin, if you want to retry. It will be even slower
> but probably worth a try.

Still fails here:

lukas@ubuntuvm:~/haproxy-test$ time git clone http://git.1wt.eu/git/haproxy.git/
Cloning into 'haproxy'...
error: Unable to get pack file http://git.1wt.eu/git/haproxy.git/objects/pack/pack-815835d1b2e20e0ad9d028756813b078cdf8f9c2.pack
transfer closed with 233372 bytes remaining to read
error: Unable to find 84d23dab089a4313913e22c1b0c60cc2b48216f0 under http://git.1wt.eu/git/haproxy.git
Cannot obtain needed blob 84d23dab089a4313913e22c1b0c60cc2b48216f0
while processing commit 0a3dd74c9cd24ab77178c9ccc65c577a91648cef.
error: Fetch failed.

real 15m37.691s
user 0m0.012s
sys 0m0.084s
lukas@ubuntuvm:~/haproxy-test$

Regards,
Lukas
Re: GIT clone fails, how to proceed?
Hi Lukas,

On Sun, Jun 23, 2013 at 09:46:34AM +0200, Lukas Tribus wrote:
> Hi,
>
> > I find it strange that the 'normal' git repository (though slow) is
> > unable to clone correctly. But I guess that's not so important if there
> > is a good workaround / secondary up to date repository.
>
> I agree, slow is one thing, not working is another thing.
>
> Willy, can you take a look why cloning from git.1wt.eu fails?
>
>
> lukas@ubuntuvm:~/haproxy-test$ git clone http://git.1wt.eu/git/haproxy.git/
> Cloning into 'haproxy'...
> error: Unable to get pack file
> http://git.1wt.eu/git/haproxy.git/objects/pack/pack-ad332087a4ea5a65ac90791a6d55f57f2efb57d3.pack
> transfer closed with 272368 bytes remaining to read
> error: Unable to find 85eb3ee8610b7a8389e78b3f342f6101467d31c3 under
> http://git.1wt.eu/git/haproxy.git
> Cannot obtain needed blob 85eb3ee8610b7a8389e78b3f342f6101467d31c3
> while processing commit 0a3dd74c9cd24ab77178c9ccc65c577a91648cef.
> error: Fetch failed.
> lukas@ubuntuvm:~/haproxy-test$

We have this report from time to time with no clear explanation :-(

Here it seems the problem was a bit clearer. When you download from git.1wt.eu, you pass via a cache (formilux.org) so that git packs are retrieved faster. There is one haproxy in front of this cache which reports this:

2013-06-23T07:42:16+02:00/86 127.0.0.1 haproxy[29509]: XX.XXX.XX.XX:39265 [23/Jun/2013:07:41:44.662] public cache-1wt/cache 45/0/0/2079/32021 200 51531 - - SDNI 9/9/5/5/0 0/0 {git.1wt.eu} "GET /git/haproxy.git/objects/pack/pack-ad332087a4ea5a65ac90791a6d55f57f2efb57d3.pack HTTP/1.1"

And on the site on the other side I'm seeing this:

Jun 23 09:42:16 rpx2 haproxy[1153]: 88.191.124.161:40531 [23/Jun/2013:09:41:44.857] http-in www/www 3/0/1/13/31816 200 220007 - - cD-- 1/1/1/1/0 0/0 {git.1wt.eu:81|git/1.7.9.5|XX.XXX.XX.XX, 1|||} {|323599|application/octet-st} "GET /git/haproxy.git/objects/pack/pack-ad332087a4ea5a65ac90791a6d55f57f2efb57d3.pack HTTP/1.1"

So it seems to me that it is the cache in the middle which tends to hang on some connections. And probably, once the connection aborts, the broken object is stored truncated in the cache.

I've just put the cache into maintenance so that connections will go directly to the origin, if you want to retry. It will be even slower but probably worth a try.

Regards,
Willy
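
For readers not used to these logs, the interesting part is the four-character termination field (`SDNI` on the front haproxy, `cD--` on the origin side). Below is a small sketch that decodes its first two characters. The tables are a partial transcription from my reading of the HAProxy documentation on session state at disconnection, so treat them as an assumption and check the manual for your version; the function name is made up.

```python
# Partial tables, transcribed from memory of the HAProxy documentation
# (section on session state at disconnection). Not exhaustive.
CAUSE = {
    "C": "session aborted by the client",
    "S": "session aborted or refused by the server",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
    "-": "normal session completion",
}
STATE = {
    "R": "waiting for the client request",
    "Q": "waiting in the queue",
    "C": "connecting to the server",
    "H": "waiting for the server response headers",
    "D": "data transfer phase",
    "L": "sending last data to the client",
    "-": "normal session completion",
}


def decode_termination(flags):
    """Decode the first two characters of a termination field like 'cD--'."""
    if len(flags) < 2:
        raise ValueError("expected at least two characters")
    return (CAUSE.get(flags[0], "unknown"), STATE.get(flags[1], "unknown"))


if __name__ == "__main__":
    for flags in ("SDNI", "cD--"):
        print(flags, "->", decode_termination(flags))
```

Read this way, the origin-side proxy hit a client-side timeout during the data phase (`cD`), and the front proxy then saw its server, the cache, abort during the data phase (`SD`), which fits the suspicion that something between the two stops moving data.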
RE: GIT clone fails, how to proceed?
Hi,

> I find it strange that the 'normal' git repository (though slow) is
> unable to clone correctly. But I guess that's not so important if there
> is a good workaround / secondary up to date repository.

I agree, slow is one thing, not working is another thing.

Willy, can you take a look why cloning from git.1wt.eu fails?

lukas@ubuntuvm:~/haproxy-test$ git clone http://git.1wt.eu/git/haproxy.git/
Cloning into 'haproxy'...
error: Unable to get pack file http://git.1wt.eu/git/haproxy.git/objects/pack/pack-ad332087a4ea5a65ac90791a6d55f57f2efb57d3.pack
transfer closed with 272368 bytes remaining to read
error: Unable to find 85eb3ee8610b7a8389e78b3f342f6101467d31c3 under http://git.1wt.eu/git/haproxy.git
Cannot obtain needed blob 85eb3ee8610b7a8389e78b3f342f6101467d31c3
while processing commit 0a3dd74c9cd24ab77178c9ccc65c577a91648cef.
error: Fetch failed.
lukas@ubuntuvm:~/haproxy-test$

Thanks,
Lukas
Re: GIT clone fails, how to proceed?
Hi Lukas,

Thanks, that works indeed. Maybe it's worth mentioning this URL on the website's main page, where the links to "Latest versions" are also present?

I find it strange that the 'normal' git repository (though slow) is unable to clone correctly. But I guess that's not so important if there is a good workaround / secondary up to date repository.

PiBa-NL

On 22-6-2013 1:22, Lukas Tribus wrote:
> Hi!
>
>> When trying to clone the repository it always seems to fail. (there have
>> been more reports of this in emails/irc of other users..)
>> Also it seems to take ages before it fails..
>
> I'm using the formilux mirror, which is mentioned in the README:
>
> git clone http://master.formilux.org/git/people/willy/haproxy.git/
>
> It's up-to-date, reliable and fast.
>
> Lukas
RE: GIT clone fails, how to proceed?
Hi!

> When trying to clone the repository it always seems to fail. (there have
> been more reports of this in emails/irc of other users..)
> Also it seems to take ages before it fails..

I'm using the formilux mirror, which is mentioned in the README:

git clone http://master.formilux.org/git/people/willy/haproxy.git/

It's up-to-date, reliable and fast.

Lukas