Re: http-reuse always, work quite well

2016-10-27 Thread Willy Tarreau
Hi Brendan,

On Sat, Oct 22, 2016 at 11:39:51AM -0400, Brendan Kearney wrote:
> On 10/22/2016 02:08 AM, Willy Tarreau wrote:
> 
> > You're welcome. Please note that the reuse mechanism is not perfect and
> > can still be improved. So do not hesitate to report any issue you find,
> > we definitely need real-world feedback like this. I cannot promise that
> > every issue will be fixed, but at least we need to consider them and see
> > what can be done.
> > 
> > Cheers,
> > Willy
> > 
> I have http interception in place, using iptables/DNAT to redirect traffic
> to haproxy and load balance to 2 squid instances. I was using aggressive
> mode http-reuse and it seemed to provide a better streaming experience for
> Roku/Sling.

In fact it can be better for all use cases since you save one round trip
for most requests by reusing an already-established connection.

> After a period of time, the performance degraded and the
> experience was worse than the original state. Buffering, lag and pixelation
> are the symptoms. I did not try the always mode, and turned http-reuse
> off for the interception I am doing. The issue has cleared since.

That's a useful report. I don't see an obvious reason here. Hmmm, thinking
about it, there could be one in fact. Maybe you ended up with too many
connections on your squids? But even in that situation, aggressive would
still ensure that some of the requests are properly delivered. Another
possibility is that some TIME_WAIT connections have been accumulating on
the haproxy machine in the direction of the squid servers, because
aggressive will still result in closing some connections when they cannot
be stolen. What OS are you using? On Linux it's known to close cleanly
(SO_LINGER works well); I don't know about other OSes. Or do you happen to
have a firewall between the Linux box and squid which would sometimes catch
the reset, with squid occasionally not getting it? That's an issue which
sometimes affects http-server-close as well.
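
For example, something like this would show whether TIME_WAIT entries are
piling up towards the squids (a sketch assuming your squids listen on the
usual 3128 port; adjust to your setup):

  ss -t state time-wait '( dport = :3128 )' | wc -l

If that count keeps growing while the streaming degrades, source port
exhaustion towards the squids becomes a plausible explanation.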

> While interception and transparent proxying seem to be problematic, explicit
> proxying and internal http have both seen a marked improvement in
> performance.

I'm now thinking that transparent mode could indeed be an issue in this
case, because you can sometimes be forced to close an existing connection
from a given ip:port source, and when the client speaks again it has to
immediately reopen one. I'd also say that in terms of logging it's a bit
ugly to reuse someone else's connection, because a request that reaches
your squid can appear to come from client A while in fact client A was
only the first one to cause the connection to be established and client B
is the one sending the next request.

Do you use "usesrc client" or "usesrc clientip"? The former will force
the same ip:port to be reused and it can indeed cause some trouble due
to the other side not always getting the close in time. With the latter,
the source port is dynamically allocated so the ports should rotate.
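
For reference, the difference is only the usesrc argument on the backend's
"source" line. A minimal sketch with placeholder names and addresses, not
a drop-in config:

  backend squid_farm
      mode http
      http-reuse aggressive
      # "usesrc clientip" spoofs only the client's IP; source ports are
      # allocated dynamically so they rotate across connections.
      # "usesrc client" would clone the client's ip:port pair as well.
      source 0.0.0.0 usesrc clientip
      server squid1 192.168.0.11:3128 check
      server squid2 192.168.0.12:3128 check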

> No scientific collection of data has been done, but page load
> times have been noticeably improved. I may move from aggressive to always
> for these backends.

In my experience, Squid is quite good at keeping connections open, so
yes, it might work well without causing trouble to users.

Willy



Re: http-reuse always, work quite well

2016-10-25 Thread Pavlos Parissis
On 22/10/2016 08:08 AM, Willy Tarreau wrote:
> Hi Pavlos,
> 
> On Fri, Oct 21, 2016 at 03:01:52PM +0200, Pavlos Parissis wrote:
>>> I'm not surprised that always works better, but my point is that if it's
>>> much better it can be useful to stay with it, but if it's only 1% better
>>> it's not worth it.
>>>
>>
>> It is way better:-), see Marcin's response.
> 
> Ah sorry, I missed it. Indeed it looks much better, but we don't have
> the reference (no reuse) on this graph.

I will try to rerun the test tomorrow; it runs on production servers with
real user traffic :-)

> If no reuse shows 10 times
> higher average times, then it means "safe" reuse brings a 10-times
> improvement and "always" brings 20 times, so it's a matter of choice.
> However, if safe does approximately the same as no reuse, then "always"
> is almost certainly needed.
> 
> while "always" is optimal, strictly speaking it's
> not very clean if the clients are not always willing to retry a failed
> first request, and browsers typically fall into that category. A real
> world case can be a request dequeued to a connection that just closes.

 What is the response of HAProxy to clients in this case? HTTP 50N?
>>>
>>> No, the client-side connection will simply be aborted so that the client
>>> can decide whether to retry or not.
>>
>> Connection will be aborted by haproxy sending TCP RST?
> 
> As much as possible yes. The principle is to let the client retry the
> request (since it is the only one knowing whether it's safe or not).
> 
>>> I'd suggest a rule of thumb (maybe this should be added to the doc): watch
>>> your logs over a long period. If you don't see queue timeouts, nor request
>>> timeouts, it's probably safe enough to use "always".
>>
>> Which field in the log do we need to watch? Tq?
> 
> Tw (time spent waiting in the queue), Tc (time spent getting a connection),
> and of course the termination flags: everything with a C or Q as the second
> char needs to be analysed.
> 

I looked at the logs for a period of 11 hours and found zero occurrences of C
or Q. I also didn't notice any change in the Tw and Tc timers. I will keep an
eye on it.

>>> Each time you notice
>>> one of them, there is a small risk of impacting another client. It's not
>>> rocket science but the risks depend on the same parameters.
>>
>>
>> Thanks a lot for yet another rich-in-information reply.
> 
> You're welcome. Please note that the reuse mechanism is not perfect and
> can still be improved. So do not hesitate to report any issue you find,
> we definitely need real-world feedback like this. I cannot promise that
> every issue will be fixed, but at least we need to consider them and see
> what can be done.
> 

Acked, I will report any issues we may find.

Thanks,
Pavlos





Re: http-reuse always, work quite well

2016-10-22 Thread Brendan Kearney

On 10/22/2016 02:08 AM, Willy Tarreau wrote:


You're welcome. Please note that the reuse mechanism is not perfect and
can still be improved. So do not hesitate to report any issue you find,
we definitely need real-world feedback like this. I cannot promise that
every issue will be fixed, but at least we need to consider them and see
what can be done.

Cheers,
Willy

I have http interception in place, using iptables/DNAT to redirect
traffic to haproxy and load balance to 2 squid instances. I was using
aggressive mode http-reuse and it seemed to provide a better streaming
experience for Roku/Sling. After a period of time, the performance
degraded and the experience was worse than the original state. Buffering,
lag and pixelation are the symptoms. I did not try the always mode, and
turned http-reuse off for the interception I am doing. The issue has
cleared since.
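
For context, the interception itself is a single NAT rule, roughly like
this, with the interface, address and port being placeholders rather than
my exact rule:

  # redirect LAN port-80 traffic to the haproxy listener on this box
  iptables -t nat -A PREROUTING -i lan0 -p tcp --dport 80 \
      -j DNAT --to-destination 192.168.0.1:3129

haproxy listens on that port and balances the intercepted traffic across
the two squids.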


While interception and transparent proxying seem to be problematic,
explicit proxying and internal http have both seen a marked improvement
in performance. No scientific collection of data has been done, but page
load times have been noticeably improved. I may move from aggressive to
always for these backends.


Keep up the good work, and thanks for some really great software,

Brendan




Re: http-reuse always, work quite well

2016-10-22 Thread Willy Tarreau
Hi Pavlos,

On Fri, Oct 21, 2016 at 03:01:52PM +0200, Pavlos Parissis wrote:
> > I'm not surprised that always works better, but my point is that if it's
> > much better it can be useful to stay with it, but if it's only 1% better
> > it's not worth it.
> > 
> 
> It is way better:-), see Marcin's response.

Ah sorry, I missed it. Indeed it looks much better, but we don't have
the reference (no reuse) on this graph. If no reuse shows 10 times
higher average times, then it means "safe" reuse brings a 10-times
improvement and "always" brings 20 times, so it's a matter of choice.
However, if safe does approximately the same as no reuse, then "always"
is almost certainly needed.

> >>> while "always" is optimal, strictly speaking it's
> >>> not very clean if the clients are not always willing to retry a failed
> >>> first request, and browsers typically fall into that category. A real
> >>> world case can be a request dequeued to a connection that just closes.
> >>
> >> What is the response of HAProxy to clients in this case? HTTP 50N?
> > 
> > No, the client-side connection will simply be aborted so that the client
> > can decide whether to retry or not.
> 
> Connection will be aborted by haproxy sending TCP RST?

As much as possible yes. The principle is to let the client retry the
request (since it is the only one knowing whether it's safe or not).

> > I'd suggest a rule of thumb (maybe this should be added to the doc): watch
> > your logs over a long period. If you don't see queue timeouts, nor request
> > timeouts, it's probably safe enough to use "always".
> 
> Which field in the log do we need to watch? Tq?

Tw (time spent waiting in the queue), Tc (time spent getting a connection),
and of course the termination flags: everything with a C or Q as the second
char needs to be analysed.
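
If it helps, something along these lines can count the suspect lines; a
rough sketch assuming the default HTTP log format and log location, where
the termination state is a 4-character field and we match a C or Q in its
second position:

  grep -cE ' [A-Za-z-][CQ][A-Za-z-]{2} ' /var/log/haproxy.log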

> > Each time you notice
> > one of them, there is a small risk of impacting another client. It's not
> > rocket science but the risks depend on the same parameters.
> 
> 
> Thanks a lot for yet another rich-in-information reply.

You're welcome. Please note that the reuse mechanism is not perfect and
can still be improved. So do not hesitate to report any issue you find,
we definitely need real-world feedback like this. I cannot promise that
every issue will be fixed, but at least we need to consider them and see
what can be done.

Cheers,
Willy



Re: http-reuse always, work quite well

2016-10-21 Thread Pavlos Parissis
On 21/10/2016 08:14 AM, Willy Tarreau wrote:
> Hi Pavlos,
> 
> On Wed, Oct 19, 2016 at 08:28:34AM +0200, Pavlos Parissis wrote:
>>> That's really great, thanks for the feedback. Have you tried the other
>>> http-reuse options ?
>>
>> A workmate did the experimentation on http-reuse and I only know that
>> 'always' worked better for us. I will ask him to provide some details.
> 
> I'm not surprised that always works better, but my point is that if it's
> much better it can be useful to stay with it, but if it's only 1% better
> it's not worth it.
> 

It is way better:-), see Marcin's response.

>>> while "always" is optimal, strictly speaking it's
>>> not very clean if the clients are not always willing to retry a failed
>>> first request, and browsers typically fall into that category. A real
>>> world case can be a request dequeued to a connection that just closes.
>>
>> What is the response of HAProxy to clients in this case? HTTP 50N?
> 
> No, the client-side connection will simply be aborted so that the client
> can decide whether to retry or not.

Connection will be aborted by haproxy sending TCP RST?

> Sometimes even the first request of
> the connection will benefit from a retry, but normally only subsequent
> requests are supposed to be retried.
> 
>>> In theory in your case, "aggressive" should do the same as "always",
>>> though since you know your applications it will not improve anything.
>>> However if "safe" works well enough (even if it causes a few more
>>> connections), you should instead use it. If the gains are minimal,
>>> then you'll have to compare :-)
>>>
>>> Oh, last thing, "always" is generally fine when connecting to a static
>>> server or a cache because you don't break end-user browsing session in
>>> the rare case an error happens.
>>>
>>
>> Fortunately, our applications don't suffer from this. Applications don't
>> store locally any session information. If a browser sends 10 HTTP requests
>> over 4 TCP connections, those HTTP requests will always be delivered to
>> different servers. Furthermore, we use uWSGI, which also load balances
>> requests across a set of processes which do not share any information.
> 
> OK, but what I meant is that if a request fails on the application side, it
> generally has some impact on the user's browsing session. A POST which
> returns an error, some automatic filling of a list not being performed,
> etc. In browsers nowadays it's hard to force a retry of a failed action,
> and reloading the page is not always an option. For a static server, if
> an image fails to load, that's just a minor issue and the user can always
> right-click on it, select "view image" and reload it.
> 
> I'd suggest a rule of thumb (maybe this should be added to the doc): watch
> your logs over a long period. If you don't see queue timeouts, nor request
> timeouts, it's probably safe enough to use "always".

Which field in the log do we need to watch? Tq?

> Each time you notice
> one of them, there is a small risk of impacting another client. It's not
> rocket science but the risks depend on the same parameters.


Thanks a lot for yet another rich-in-information reply.
Pavlos





Re: http-reuse always, work quite well

2016-10-21 Thread Willy Tarreau
Hi Pavlos,

On Wed, Oct 19, 2016 at 08:28:34AM +0200, Pavlos Parissis wrote:
> > That's really great, thanks for the feedback. Have you tried the other
> > http-reuse options ?
> 
> A workmate did the experimentation on http-reuse and I only know that 'always'
> worked better for us. I will ask him to provide some details.

I'm not surprised that always works better, but my point is that if it's
much better it can be useful to stay with it, but if it's only 1% better
it's not worth it.

> > while "always" is optimal, strictly speaking it's
> > not very clean if the clients are not always willing to retry a failed
> > first request, and browsers typically fall into that category. A real
> > world case can be a request dequeued to a connection that just closes.
> 
> What is the response of HAProxy to clients in this case? HTTP 50N?

No, the client-side connection will simply be aborted so that the client
can decide whether to retry or not. Sometimes even the first request of
the connection will benefit from a retry, but normally only subsequent
requests are supposed to be retried.

> > In theory in your case, "aggressive" should do the same as "always",
> > though since you know your applications it will not improve anything.
> > However if "safe" works well enough (even if it causes a few more
> > connections), you should instead use it. If the gains are minimal,
> > then you'll have to compare :-)
> > 
> > Oh, last thing, "always" is generally fine when connecting to a static
> > server or a cache because you don't break end-user browsing session in
> > the rare case an error happens.
> > 
> 
> Fortunately, our applications don't suffer from this. Applications don't
> store locally any session information. If a browser sends 10 HTTP requests
> over 4 TCP connections, those HTTP requests will always be delivered to
> different servers. Furthermore, we use uWSGI, which also load balances
> requests across a set of processes which do not share any information.

OK, but what I meant is that if a request fails on the application side, it
generally has some impact on the user's browsing session. A POST which
returns an error, some automatic filling of a list not being performed,
etc. In browsers nowadays it's hard to force a retry of a failed action,
and reloading the page is not always an option. For a static server, if
an image fails to load, that's just a minor issue and the user can always
right-click on it, select "view image" and reload it.

I'd suggest a rule of thumb (maybe this should be added to the doc): watch
your logs over a long period. If you don't see queue timeouts, nor request
timeouts, it's probably safe enough to use "always". Each time you notice
one of them, there is a small risk of impacting another client. It's not
rocket science but the risks depend on the same parameters.

Cheers,
Willy



Re: http-reuse always, work quite well

2016-10-19 Thread Pavlos Parissis
On 15/10/2016 09:31 AM, Willy Tarreau wrote:
> Hi Pavlos,
> 
> On Fri, Oct 14, 2016 at 04:33:20PM +0200, Pavlos Parissis wrote:
>> Hi,
>>
>> I just want to drop a note and mention that http-reuse works very well
>> for us:
>>
>> % ss -t state established '( sport = :http )'|wc -l
>> 2113
>>
>> % ss -t state established '( dport = :http )'| wc -l
>> 408
>>
>> As you can see, connections established to backend servers are far fewer
>> than the connections from clients. In the attached graph you can see the
>> 24-hour pattern doesn't influence the number of connections to the
>> backend that much.
> 
> That's really great, thanks for the feedback. Have you tried the other
> http-reuse options ?

A workmate did the experimentation on http-reuse and I only know that 'always'
worked better for us. I will ask him to provide some details.

> while "always" is optimal, strictly speaking it's
> not very clean if the clients are not always willing to retry a failed
> first request, and browsers typically fall into that category. A real
> world case can be a request dequeued to a connection that just closes.

What is the response of HAProxy to clients in this case? HTTP 50N?

> Of course this very rarely happens, but "very rarely" is not something
> your end users will accept when they face it :-)
> 

True, very much true.

> In theory in your case, "aggressive" should do the same as "always",
> though since you know your applications it will not improve anything.
> However if "safe" works well enough (even if it causes a few more
> connections), you should instead use it. If the gains are minimal,
> then you'll have to compare :-)
> 
> Oh, last thing, "always" is generally fine when connecting to a static
> server or a cache because you don't break end-user browsing session in
> the rare case an error happens.
> 

Fortunately, our applications don't suffer from this. Applications don't
store locally any session information. If a browser sends 10 HTTP requests
over 4 TCP connections, those HTTP requests will always be delivered to
different servers. Furthermore, we use uWSGI, which also load balances
requests across a set of processes which do not share any information.

Cheers,
Pavlos





Re: http-reuse always, work quite well

2016-10-15 Thread Willy Tarreau
Hi Pavlos,

On Fri, Oct 14, 2016 at 04:33:20PM +0200, Pavlos Parissis wrote:
> Hi,
> 
> I just want to drop a note and mention that http-reuse works very well
> for us:
> 
> % ss -t state established '( sport = :http )'|wc -l
> 2113
> 
> % ss -t state established '( dport = :http )'| wc -l
> 408
> 
> As you can see, connections established to backend servers are far fewer
> than the connections from clients. In the attached graph you can see the
> 24-hour pattern doesn't influence the number of connections to the
> backend that much.

That's really great, thanks for the feedback. Have you tried the other
http-reuse options? While "always" is optimal, strictly speaking it's
not very clean if the clients are not always willing to retry a failed
first request, and browsers typically fall into that category. A real
world case can be a request dequeued to a connection that just closes.
Of course this very rarely happens, but "very rarely" is not something
your end users will accept when they face it :-)

In theory in your case, "aggressive" should do the same as "always",
though since you know your applications it will not improve anything.
However if "safe" works well enough (even if it causes a few more
connections), you should instead use it. If the gains are minimal,
then you'll have to compare :-)

Oh, last thing: "always" is generally fine when connecting to a static
server or a cache, because you don't break the end user's browsing session
in the rare case an error happens.
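
To make the comparison easy, switching between modes is a one-line change
per backend; a minimal sketch with placeholder names and addresses:

  backend app
      mode http
      # pick one of: never (the default), safe, aggressive, always
      http-reuse safe
      server app1 10.0.0.11:8080 check
      server app2 10.0.0.12:8080 check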

Thanks for the feedback!
Willy



Re: http-reuse always, work quite well

2016-10-14 Thread Pavlos Parissis
On 14/10/2016 08:49 PM, Aleksandar Lazic wrote:
> Hi
> 
> On 14-10-2016 16:33, Pavlos Parissis wrote:
>> Hi,
>>
>> I just want to drop a note and mention that http-reuse works very well
>> for us:
>>
>> % ss -t state established '( sport = :http )'|wc -l
>> 2113
>>
>> % ss -t state established '( dport = :http )'| wc -l
>> 408
>>
>> As you can see, connections established to backend servers are far fewer
>> than the connections from clients. In the attached graph you can see the
>> 24-hour pattern doesn't influence the number of connections to the
>> backend that much.
> 
> Great ;-)
> 
> Just out of curiosity, which version do you have in place that works so well?
> 

We use HAPEE 1.6, which is HAProxy 1.6.9 plus some patches from 1.7.

You should expect the same behavior from HAProxy.

Cheers,
Pavlos





Re: http-reuse always, work quite well

2016-10-14 Thread Aleksandar Lazic

Hi

On 14-10-2016 16:33, Pavlos Parissis wrote:

Hi,

I just want to drop a note and mention that http-reuse works very well
for us:

% ss -t state established '( sport = :http )'|wc -l
2113

% ss -t state established '( dport = :http )'| wc -l
408

As you can see, connections established to backend servers are far fewer
than the connections from clients. In the attached graph you can see the
24-hour pattern doesn't influence the number of connections to the
backend that much.


Great ;-)

Just out of curiosity, which version do you have in place that works so well?

Thank you very much for your answer.

Best regards
Aleks