Re: GIT clone fails, how to proceed?

2013-06-23 Thread Thomas Heil
Hi,

On 23.06.2013 15:55, Willy Tarreau wrote:
> Guys, I found a workaround which seems to be working quite well at the
> moment. For some reason the kernel seems to ignore the max TCP window
> size when GSO is enabled on the interface, resulting in hundreds of kB
> in flight which take ages to recover in case of losses => haproxy sees
> nothing move and finally times out. Disabling GSO on that interface
> completely fixed the issue, now the socket's send queues are reasonable
> and match the configuration and I've not seen a timeout for the last
> hour. There were always a few per hour previously that I always attributed
> to the clients!
>
> So I think it's really fixed now.
I can confirm that.
Thanks a lot.
> Cheers,
> Willy
>
>
>

thomas



Re: GIT clone fails, how to proceed?

2013-06-23 Thread Willy Tarreau
Guys, I found a workaround which seems to be working quite well at the
moment. For some reason the kernel seems to ignore the max TCP window
size when GSO is enabled on the interface, resulting in hundreds of kB
in flight which take ages to recover in case of losses => haproxy sees
nothing move and finally times out. Disabling GSO on that interface
completely fixed the issue, now the socket's send queues are reasonable
and match the configuration and I've not seen a timeout for the last
hour. There were always a few per hour previously that I always attributed
to the clients!

So I think it's really fixed now.

Cheers,
Willy
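
For anyone wanting to try the same workaround: on Linux, GSO is normally inspected with `ethtool -k <iface>` and switched off with `ethtool -K <iface> gso off`. The sketch below only illustrates that check and is not what was run on the proxy. The interface name `eth0`, the helper names and the sample text (a hand-written excerpt in the usual `ethtool -k` layout) are all assumptions.

```python
import re

# Hand-written excerpt in the usual `ethtool -k` layout (assumed, not from the thread).
SAMPLE = """\
Features for eth0:
tcp-segmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
"""


def gso_enabled(ethtool_output):
    """Return True/False for the GSO state in `ethtool -k` output, None if not listed."""
    match = re.search(
        r"^\s*generic-segmentation-offload:\s*(on|off)\b",
        ethtool_output,
        re.MULTILINE,
    )
    if match is None:
        return None
    return match.group(1) == "on"


def disable_gso_command(iface):
    """Build (but do not run) the command that turns GSO off on `iface`."""
    return ["ethtool", "-K", iface, "gso", "off"]


if __name__ == "__main__":
    print("GSO enabled:", gso_enabled(SAMPLE))
    print("to disable:", " ".join(disable_gso_command("eth0")))
```

The command is built rather than executed because changing offload settings needs root and affects live traffic, and the setting does not persist across reboots unless it is added to the interface configuration.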




Re: GIT clone fails, how to proceed?

2013-06-23 Thread Willy Tarreau
Hi Lukas,

OK it's a kernel issue on my reverse proxy. Look below, haproxy detected a
timeout after 30s of idle :

(fd 14 faces the client, fd 15 the server)

epoll_wait(0, 0x1aebdd8, 0xc8, 0)   = 0
gettimeofday({1371978241, 119519}, NULL) = 0
recv(15, 
"-\nR\216+f\213%G\3539\"\270\246{9\3037\272\317N\215\0\226\333;\334\320y\374Z.'"...,
 8030, 0) = 8030
send(14, 
"-\nR\216+f\213%G\3539\"\270\246{9\3037\272\317N\215\0\226\333;\334\320y\374Z.'"...,
 8030, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE) = 4280
gettimeofday({1371978241, 120296}, NULL) = 0
epoll_wait(0, 0x1aebdd8, 0xc8, 0)   = 0
gettimeofday({1371978241, 120635}, NULL) = 0
recv(15, "A\23,A\17\221\234k\271!\313C\245\267a 
Pp\316\204-9\342E\360\3438\255\322\247(-J"..., 4280, 0) = 4280
send(14, 
"i.\302#t\35\300\354~G\312\2606\266\201\376\254}~\362\372l_\226\31\5\210{\344\361\10`\30"...,
 3750, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE) = -1 EAGAIN (Resource temporarily 
unavailable)
epoll_ctl(0, 0x3, 0xe, 0xa2028) = 0

==> buffers are full for fd #14. Nothing happens on this FD for the
next 30 seconds, until we decide it's over and close the connection :

epoll_wait(0, 0x1aebdd8, 0xc8, 0x93)= 0
gettimeofday({1371978271, 122894}, NULL) = 0
setsockopt(14, SOL_SOCKET, SO_LINGER, {onoff=1, linger=0}, 8) = 0
close(14)   = 0
shutdown(15, 1 /* send */)  = 0
close(15)   = 0
sendto(10, "<134>Jun 23 11:04:31 haproxy[1153"..., 318, 
MSG_DONTWAIT|MSG_NOSIGNAL, {sa_family=AF_INET, sin_port=htons(514), 
sin_addr=inet_addr("10.8.1.2")}, 16) = 318


Now the problem is that the capture taken on the same side shows a different
story : this happens after the recovery from some losses :

10:03:47.918722 88.191.124.161.45154 > 62.212.114.60.81: . [tcp sum ok] 
3500432327:3500432327(0) ack 3270212996 win 398  (DF) (ttl 53, id 27788, len 52)
10:03:48.001500 62.212.114.60.81 > 88.191.124.161.45154: . [tcp sum ok] 
3270212996:3270213496(500) ack 3500432327 win 1500  (DF) (ttl 128, id 24160, len 552)
10:03:48.025513 62.212.114.60.81 > 88.191.124.161.45154: . [tcp sum ok] 
3270213496:3270213996(500) ack 3500432327 win 1500  (DF) (ttl 128, id 24161, len 552)
10:03:48.063247 88.191.124.161.45154 > 62.212.114.60.81: . [tcp sum ok] 
3500432327:3500432327(0) ack 3270213496 win 405  (DF) (ttl 53, id 27789, len 52)
10:03:48.086935 88.191.124.161.45154 > 62.212.114.60.81: . [tcp sum ok] 
3500432327:3500432327(0) ack 3270213996 win 413  (DF) (ttl 53, id 27790, len 52)
10:03:48.344722 62.212.114.60.81 > 88.191.124.161.45154: R [tcp sum ok] 
3270224496:3270224496(0) ack 3500432327 win 1500  (DF) (ttl 128, id 24183, len 52)

When the RST happens, all bytes were acked, so for sure the writes should
have retriggered. It's probably time to upgrade this kernel. Now that I'm
thinking about it, I believe that the issues started when I switched to use
this machine :-/

Best regards,
Willy
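
The `send()` returning `-1 EAGAIN` in the trace above is what a non-blocking socket reports when the kernel's send buffer is full. The writer then has to wait for the socket to become writable again, which normally happens once the peer has consumed data. Here is a minimal sketch of that behaviour, assuming a local socket pair instead of a real TCP connection through the proxy. The helper names are hypothetical and the buffer sizes depend on the system.

```python
import select
import socket


def fill_send_buffer(sock, chunk=b"x" * 4096, limit=64 * 1024 * 1024):
    """Send without blocking until the kernel refuses more data; return bytes queued."""
    sock.setblocking(False)
    queued = 0
    while queued < limit:
        try:
            queued += sock.send(chunk)
        except BlockingIOError:
            # Same condition as the "-1 EAGAIN" seen in the strace output.
            return queued
    raise RuntimeError("send buffer never filled")


def drain(sock):
    """Read everything currently pending on the socket; return bytes read."""
    sock.setblocking(False)
    total = 0
    while True:
        try:
            data = sock.recv(65536)
        except BlockingIOError:
            return total
        if not data:
            return total
        total += len(data)


def is_writable(sock):
    """Poll once, without waiting, for write readiness."""
    return bool(select.select([], [sock], [], 0)[1])


if __name__ == "__main__":
    writer, reader = socket.socketpair()
    try:
        queued = fill_send_buffer(writer)
        print("queued before EAGAIN:", queued)
        print("writable while full:", is_writable(writer))
        print("drained by peer:", drain(reader))
        print("writable after drain:", is_writable(writer))
    finally:
        writer.close()
        reader.close()
```

In the failing case from the trace, the equivalent of the last step never happened: the peer had acked everything, yet no write event was delivered before the 30 s timeout.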




Re: GIT clone fails, how to proceed?

2013-06-23 Thread Willy Tarreau
On Sun, Jun 23, 2013 at 10:54:00AM +0200, Lukas Tribus wrote:
> Still fails here:
> 
> lukas@ubuntuvm:~/haproxy-test$ time git clone 
> http://git.1wt.eu/git/haproxy.git/
> Cloning into 'haproxy'...
> error: Unable to get pack file 
> http://git.1wt.eu/git/haproxy.git/objects/pack/pack-815835d1b2e20e0ad9d028756813b078cdf8f9c2.pack
> transfer closed with 233372 bytes remaining to read
> error: Unable to find 84d23dab089a4313913e22c1b0c60cc2b48216f0 under 
> http://git.1wt.eu/git/haproxy.git
> Cannot obtain needed blob 84d23dab089a4313913e22c1b0c60cc2b48216f0
> while processing commit 0a3dd74c9cd24ab77178c9ccc65c577a91648cef.
> error: Fetch failed.
> 
> real    15m37.691s
> user    0m0.012s
> sys 0m0.084s
> lukas@ubuntuvm:~/haproxy-test$

Yes I noticed it in the logs during your test. I managed to reproduce it
now. That's strange, the server-side haproxy detects an error and closes
(not a timeout) while network traces show that it's the first one to close.

I'll have to retry using strace. I suspect some timeout issue reported
by the kernel. I don't even have tcp keep-alives though :-/

Thanks for the test!
Willy




RE: GIT clone fails, how to proceed?

2013-06-23 Thread Lukas Tribus
Hi Willy,


> I've just put the cache into maintenance so that connections will go
> directly to the origin, if you want to retry. It will be even slower
> but probably worth a try.


Still fails here:

lukas@ubuntuvm:~/haproxy-test$ time git clone http://git.1wt.eu/git/haproxy.git/
Cloning into 'haproxy'...
error: Unable to get pack file 
http://git.1wt.eu/git/haproxy.git/objects/pack/pack-815835d1b2e20e0ad9d028756813b078cdf8f9c2.pack
transfer closed with 233372 bytes remaining to read
error: Unable to find 84d23dab089a4313913e22c1b0c60cc2b48216f0 under 
http://git.1wt.eu/git/haproxy.git
Cannot obtain needed blob 84d23dab089a4313913e22c1b0c60cc2b48216f0
while processing commit 0a3dd74c9cd24ab77178c9ccc65c577a91648cef.
error: Fetch failed.

real    15m37.691s
user    0m0.012s
sys 0m0.084s
lukas@ubuntuvm:~/haproxy-test$



Regards,

Lukas 


Re: GIT clone fails, how to proceed?

2013-06-23 Thread Willy Tarreau
Hi Lukas,

On Sun, Jun 23, 2013 at 09:46:34AM +0200, Lukas Tribus wrote:
> Hi,
> 
> > I find it strange that the 'normal' git repository (though slow) is
> > unable to clone correctly. But I guess that's not so important if there
> > is a good workaround / secondary up-to-date repository.
> 
> I agree, slow is one thing, not working is another thing.
> 
> Willy, can you take a look why cloning from git.1wt.eu fails?
> 
> 
> lukas@ubuntuvm:~/haproxy-test$ git clone http://git.1wt.eu/git/haproxy.git/
> Cloning into 'haproxy'...
> error: Unable to get pack file 
> http://git.1wt.eu/git/haproxy.git/objects/pack/pack-ad332087a4ea5a65ac90791a6d55f57f2efb57d3.pack
> transfer closed with 272368 bytes remaining to read
> error: Unable to find 85eb3ee8610b7a8389e78b3f342f6101467d31c3 under 
> http://git.1wt.eu/git/haproxy.git
> Cannot obtain needed blob 85eb3ee8610b7a8389e78b3f342f6101467d31c3
> while processing commit 0a3dd74c9cd24ab77178c9ccc65c577a91648cef.
> error: Fetch failed.
> lukas@ubuntuvm:~/haproxy-test$

We have this report from time to time with no clear explanation :-(
Here it seems the problem was a bit clearer.

When you download from git.1wt.eu, you pass via a cache (formilux.org)
so that git packs are retrieved faster.

There is one haproxy in front of this cache which reports this :

2013-06-23T07:42:16+02:00/86 127.0.0.1 haproxy[29509]: XX.XXX.XX.XX:39265 
[23/Jun/2013:07:41:44.662] public cache-1wt/cache 45/0/0/2079/32021 200 51531 - 
- SDNI 9/9/5/5/0 0/0 {git.1wt.eu} \"GET 
/git/haproxy.git/objects/pack/pack-ad332087a4ea5a65ac90791a6d55f57f2efb57d3.pack
HTTP/1.1\"

And on the site on the other side I'm seeing this :

Jun 23 09:42:16 rpx2 haproxy[1153]: 88.191.124.161:40531 
[23/Jun/2013:09:41:44.857] http-in www/www 3/0/1/13/31816 200 220007 - - cD-- 
1/1/1/1/0 0/0 {git.1wt.eu:81|git/1.7.9.5|XX.XXX.XX.XX, 1|||} 
{|323599|application/octet-st} "GET 
/git/haproxy.git/objects/pack/pack-ad332087a4ea5a65ac90791a6d55f57f2efb57d3.pack
 HTTP/1.1" 

So it seems to me like this is the cache in the middle which tends to
hang on some connections. And probably that once the connection aborts,
the broken object is stored truncated in the cache.

I've just put the cache into maintenance so that connections will go
directly to the origin, if you want to retry. It will be even slower
but probably worth a try.

Regards,
Willy
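
For reading the two log lines above, the fields of interest are the five timers and the four-character termination state that follows the status code and byte count. A rough sketch of pulling them out is given below. It assumes the haproxy HTTP log layout of that period (timers `Tq/Tw/Tc/Tr/Tt` in milliseconds) and decodes only a few of the documented state codes. The regular expression and helper names are illustrative, not part of haproxy.

```python
import re

# Timers, status, bytes, two cookie fields, then the 4-character termination state.
LOG_RE = re.compile(
    r" (?P<timers>-?\d+/-?\d+/-?\d+/-?\d+/\+?-?\d+)"
    r" (?P<status>-?\d+) (?P<bytes>\+?\d+)"
    r" \S+ \S+ (?P<state>[A-Za-z-]{4}) "
)

# Partial tables: only some of the codes described in the haproxy documentation.
FIRST = {
    "C": "aborted by the client",
    "S": "aborted or refused by the server",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
    "-": "normal completion",
}
SECOND = {
    "H": "while waiting for response headers",
    "D": "during the data phase",
    "-": "after the transfer",
}

RPX2_LINE = (
    'Jun 23 09:42:16 rpx2 haproxy[1153]: 88.191.124.161:40531 '
    '[23/Jun/2013:09:41:44.857] http-in www/www 3/0/1/13/31816 200 220007 - - cD-- '
    '1/1/1/1/0 0/0 {git.1wt.eu:81|git/1.7.9.5|XX.XXX.XX.XX, 1|||} '
    '{|323599|application/octet-st} "GET /git/haproxy.git/objects/pack/'
    'pack-ad332087a4ea5a65ac90791a6d55f57f2efb57d3.pack HTTP/1.1"'
)


def parse_line(line):
    """Extract timers, status, byte count and termination state, or None if no match."""
    match = LOG_RE.search(" ".join(line.split()))  # undo mail line wrapping first
    if match is None:
        return None
    names = ("Tq", "Tw", "Tc", "Tr", "Tt")
    timers = dict(zip(names, (int(v.lstrip("+")) for v in match["timers"].split("/"))))
    return {
        "timers": timers,
        "status": int(match["status"]),
        "bytes": int(match["bytes"].lstrip("+")),
        "state": match["state"],
    }


def describe_state(state):
    """Give a rough reading of the first two termination-state characters."""
    return FIRST.get(state[0], "unknown") + " " + SECOND.get(state[1], "unknown")


if __name__ == "__main__":
    info = parse_line(RPX2_LINE)
    print(info["timers"], info["bytes"], info["state"], describe_state(info["state"]))
```

Read this way, `cD--` on the rpx2 side is a client-side timeout during the data phase, with a total time of just under 32 s, while `SDNI` on the cache side begins with a server-side abort during the data phase.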




RE: GIT clone fails, how to proceed?

2013-06-23 Thread Lukas Tribus
Hi,


> I find it strange that the 'normal' git repository (though slow) is
> unable to clone correctly. But I guess that's not so important if there
> is a good workaround / secondary up-to-date repository.

I agree, slow is one thing, not working is another thing.

Willy, can you take a look why cloning from git.1wt.eu fails?


lukas@ubuntuvm:~/haproxy-test$ git clone http://git.1wt.eu/git/haproxy.git/
Cloning into 'haproxy'...
error: Unable to get pack file 
http://git.1wt.eu/git/haproxy.git/objects/pack/pack-ad332087a4ea5a65ac90791a6d55f57f2efb57d3.pack
transfer closed with 272368 bytes remaining to read
error: Unable to find 85eb3ee8610b7a8389e78b3f342f6101467d31c3 under 
http://git.1wt.eu/git/haproxy.git
Cannot obtain needed blob 85eb3ee8610b7a8389e78b3f342f6101467d31c3
while processing commit 0a3dd74c9cd24ab77178c9ccc65c577a91648cef.
error: Fetch failed.
lukas@ubuntuvm:~/haproxy-test$




Thanks,

Lukas 


Re: GIT clone fails, how to proceed?

2013-06-22 Thread PiBa-NL

Hi Lukas,

Thanks, that works indeed.

Maybe it's worth mentioning this URL on the website's main page, where the
links to "Latest versions" are also present?
I find it strange that the 'normal' git repository (though slow) is
unable to clone correctly. But I guess that's not so important if there
is a good workaround / secondary up-to-date repository.


PiBa-NL

On 22-6-2013 1:22, Lukas Tribus wrote:

Hi!


When trying to clone the repository it always seems to fail. (there have
been more reports of this in emails/irc of other users..)
Also it seems to take ages before it fails..

I'm using the formilux mirror, which is mentioned in the README:
git clone http://master.formilux.org/git/people/willy/haproxy.git/

It's up-to-date, reliable and fast.


Lukas   





RE: GIT clone fails, how to proceed?

2013-06-21 Thread Lukas Tribus
Hi!

> When trying to clone the repository it always seems to fail. (there have
> been more reports of this in emails/irc of other users..)
> Also it seems to take ages before it fails..

I'm using the formilux mirror, which is mentioned in the README:
git clone http://master.formilux.org/git/people/willy/haproxy.git/

It's up-to-date, reliable and fast.


Lukas
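
If a script needs to cope with the primary repository failing, one option is to try the URLs in order and stop at the first successful clone. This is only a sketch: the two URLs are the ones quoted in this thread and may no longer be valid, and the function name and the injectable `run` parameter are assumptions made for illustration.

```python
import subprocess

# URLs as quoted in this thread; they may have changed since.
MIRRORS = [
    "http://git.1wt.eu/git/haproxy.git/",
    "http://master.formilux.org/git/people/willy/haproxy.git/",
]


def clone_with_fallback(urls, dest, run=subprocess.run):
    """Try `git clone` on each URL in turn; return the URL that worked, or None."""
    for url in urls:
        result = run(["git", "clone", url, dest])
        if result.returncode == 0:
            return url
    return None


if __name__ == "__main__":
    used = clone_with_fallback(MIRRORS, "haproxy")
    print("cloned from:", used)
```

The `run` parameter exists only so the logic can be exercised without network access; by default it is `subprocess.run`.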