Re: Backend Server UP/Down Debugging?

2009-08-26 Thread Willy Tarreau
Hi Krzysztof,

glad to get news from you !

On Thu, Aug 27, 2009 at 08:45:23AM +0200, Krzysztof Oledzki wrote:
(...)
> >right now it's not archived. I would like to keep a local copy of
> >the last request sent and response received which caused a state
> >change, but that's not implemented yet. I wanted to clean up the
> >stats socket first, but now I realize that we could keep at least
> >some info (eg: HTTP status, timeout, ...) in the server struct
> >itself and report it in the log. Nothing of that is performed right
> >now, so you may have to tcpdump at best :-(
> 
> As always, I have a patch for that, solving it nearly exactly like you 
> described it. ;)

excellent. I was at least twice tempted to do it but lacked the time for
it! When the stats socket processing looks better, I hope to quickly add
a "show health" entry reporting all health check status.

> However for the last half year I have been rather silent, 
> mostly because it is very important time in my private life, so I think 
> I'm partially excused. ;)

Hey, you're completely excused, private life is more important than code :-)

> I know that there are some unfinished tasks (acl 
> for example) so I'll try to push ASAP, maybe starting from the easier 
> patches, like these. The rest will have to wait until I get back from 
> honeymoon.

No problem, your patches will be welcome as usual !

> >One trick you can do to make this experience better is to set a "port" 
> >or "addr" option on your server line to run the checks on a different 
> >ip/port combination. That way, you can filter that on tcpdump and only 
> >get the checks.
> 
> Or to use something like:
> 
> echo -e "GET  HTTP/1.0\r\nhost: \r\n"|nc  |less -S

that's an option too, indeed.

Best regards,
Willy
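
For anyone who would rather script the manual check than type the nc one-liner, here is a minimal Python sketch of the same idea. The address, port, URI and Host values are hypothetical placeholders (the arguments of the original command did not survive in the archive), and this is only an illustration of probing a backend by hand, not how haproxy runs its own checks.

```python
import socket
from typing import Tuple


def build_check_request(uri: str, host: str) -> bytes:
    """Build a bare HTTP/1.0 request, similar to the echo | nc one-liner."""
    return f"GET {uri} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def parse_status_line(raw: bytes) -> Tuple[str, int, str]:
    """Split the first response line into (version, status code, reason)."""
    first_line = raw.split(b"\r\n", 1)[0].decode("iso-8859-1")
    version, _, rest = first_line.partition(" ")
    code, _, reason = rest.partition(" ")
    return version, int(code), reason


def manual_check(addr: str, port: int, uri: str, host: str,
                 timeout: float = 5.0) -> Tuple[str, int, str]:
    """Send one request to a backend and return its parsed status line."""
    with socket.create_connection((addr, port), timeout=timeout) as sock:
        sock.sendall(build_check_request(uri, host))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HTTP/1.0: the server closes when it is done
                break
            chunks.append(data)
    return parse_status_line(b"".join(chunks))
```

Calling something like `manual_check("192.0.2.10", 80, "/", "www.example.com")` (hypothetical values) against a flapping server from a loop would show the status code the server returns at the moment it misbehaves.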




Re: Backend Server UP/Down Debugging?

2009-08-26 Thread Krzysztof Oledzki



On Thu, 27 Aug 2009, Willy Tarreau wrote:

> Hi,

Hi,

> On Wed, Aug 26, 2009 at 02:00:42PM -0700, Jonah Horowitz wrote:
> > I’m watching my servers on the back end and occasionally they flap.  I’m 
> > wondering if there is a way to see why they are taken out of service.  I’d 
> > like to see the actual response, or at least a HTTP status code.
>
> right now it's not archived. I would like to keep a local copy of
> the last request sent and response received which caused a state
> change, but that's not implemented yet. I wanted to clean up the
> stats socket first, but now I realize that we could keep at least
> some info (eg: HTTP status, timeout, ...) in the server struct
> itself and report it in the log. Nothing of that is performed right
> now, so you may have to tcpdump at best :-(

As always, I have a patch for that, solving it nearly exactly like you 
described it. ;) However for the last half year I have been rather silent, 
mostly because it is very important time in my private life, so I think 
I'm partially excused. ;) I know that there are some unfinished tasks (acl 
for example) so I'll try to push ASAP, maybe starting from the easier 
patches, like these. The rest will have to wait until I get back from 
honeymoon.

> One trick you can do to make this experience better is to set a "port" 
> or "addr" option on your server line to run the checks on a different 
> ip/port combination. That way, you can filter that on tcpdump and only 
> get the checks.

Or to use something like:

echo -e "GET  HTTP/1.0\r\nhost: \r\n"|nc  |less -S

Best regards,

Krzysztof Olędzki

Re: TCP log format question

2009-08-26 Thread Willy Tarreau
Hello,

On Wed, Aug 26, 2009 at 06:24:16PM +0400, Dmitry Sivachenko wrote:
> Hello!
> 
> I am running haproxy-1.4-dev2 with the following
> configuration (excerpt):
> 
> global
>     log /var/run/log local0
>     user www
>     group www
>     daemon
> defaults
>     log global
>     mode tcp
>     balance roundrobin
>     maxconn 2000
>     option abortonclose
>     option allbackups
>     option httplog
>     option dontlog-normal
>     option dontlognull
>     option redispatch
>     option tcplog

I'm seeing that you have both "tcplog" and "httplog". Since they
both add a set of flags, the union of both is enabled which means
httplog to me. I should add a check for this so that tcplog disables
httplog.

> In my log file I see the following lines:
> Aug 26 18:19:50 balancer0-00 haproxy[66301]: A.B.C.D:28689 
> [26/Aug/2009:18:19:50.034] M-front M-native/ms1 -1/1/0/-1/3 -1 339 - - CD-- 
> 0/0/0/0/0 0/0 ""
> 
> 1) What does "" mean? I see no description of that field in
> documentation of TCP log format.

this is because of "option httplog".

> 2) Why *all* requests are being logged? 
> (note option dontlog-normal in default section).
> How should I change configuration to log only important events
> (errors) and do not log the fact connection was made and served?

Hmmm dontlog-normal only works in HTTP mode. Could you please
explain what type of normal connections you would want to log
and what type you would not want to log ? It could help making
a choice of implementation of dontlog-normal for tcplog.

Regards,
Willy
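
To see why the sample line looks like an HTTP log entry rather than a TCP one, it can help to split it into fields. The sketch below is a rough Python parser; the field names are my reading of haproxy's HTTP log format and should be treated as assumptions to check against the documentation for your version.

```python
import re

# Field names are assumptions based on the documented HTTP log layout.
LOG_RE = re.compile(
    r'(?P<client>\S+) \[(?P<accept_date>[^\]]+)\] '
    r'(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<timers>\S+) (?P<status>-?\d+) (?P<bytes_read>\d+) '
    r'(?P<req_cookie>\S+) (?P<resp_cookie>\S+) (?P<term_state>\S+) '
    r'(?P<conns>\S+) (?P<queues>\S+) "(?P<request>[^"]*)"'
)


def parse_http_log(line):
    """Return a dict of fields from one httplog-style line, or None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Slash-separated groups become lists of integers for easier checks.
    for key in ("timers", "conns", "queues"):
        fields[key] = [int(value) for value in fields[key].split("/")]
    fields["status"] = int(fields["status"])
    fields["bytes_read"] = int(fields["bytes_read"])
    return fields


# The line quoted in the message above, joined back into a single string.
sample = (
    'Aug 26 18:19:50 balancer0-00 haproxy[66301]: A.B.C.D:28689 '
    '[26/Aug/2009:18:19:50.034] M-front M-native/ms1 -1/1/0/-1/3 -1 339 '
    '- - CD-- 0/0/0/0/0 0/0 ""'
)
fields = parse_http_log(sample)
print(fields["status"], fields["timers"], fields["term_state"])
```

The status of -1 and the -1 timers are what one would expect when HTTP-level fields are logged for a connection that never carried a parsed HTTP request, which fits the explanation that "option httplog" is in effect.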




Re: Backend Server UP/Down Debugging?

2009-08-26 Thread Willy Tarreau
Hi,

On Wed, Aug 26, 2009 at 02:00:42PM -0700, Jonah Horowitz wrote:
> I’m watching my servers on the back end and occasionally they flap.  I’m 
> wondering if there is a way to see why they are taken out of service.  I’d 
> like to see the actual response, or at least a HTTP status code.

right now it's not archived. I would like to keep a local copy of
the last request sent and response received which caused a state
change, but that's not implemented yet. I wanted to clean up the
stats socket first, but now I realize that we could keep at least
some info (eg: HTTP status, timeout, ...) in the server struct
itself and report it in the log. Nothing of that is performed right
now, so you may have to tcpdump at best :-(

One trick you can do to make this experience better is to set a
"port" or "addr" option on your server line to run the checks on
a different ip/port combination. That way, you can filter that on
tcpdump and only get the checks.

Willy




Re: round robin

2009-08-26 Thread Willy Tarreau
On Wed, Aug 26, 2009 at 03:39:44PM +0200, Johan Duflost wrote:
> Hello Willy,
> 
> Ok I can understand this issue but when session stickiness is enabled, all 
> the objects should be retrieved from the same server, shouldn't they?

yes you're right, they should.

> So it's probably not our problem.

Then check your server's downtime on the stats page. It's possible that
one server sometimes goes down and does not receive requests. Also
check the "Redis" column, which will indicate if a server sometimes
does not respond to connection requests, causing them to be retried
on the other server.

Willy




Backend Server UP/Down Debugging?

2009-08-26 Thread Jonah Horowitz
I’m watching my servers on the back end and occasionally they flap.  I’m 
wondering if there is a way to see why they are taken out of service.  I’d like 
to see the actual response, or at least a HTTP status code.

 

Jonah Horowitz · Monitoring Manager · jhorow...@looksmart.net 
 


 



Re: HAProxy randomly returning 502s when balancing IIS

2009-08-26 Thread Miguel Pilar Vilagran
Hello,


On 8/26/09 3:45 AM, "Willy Tarreau" wrote:

> Hello,
>
> On Tue, Aug 25, 2009 at 03:26:57PM -0400, Miguel Pilar Vilagran wrote:
> > Hello all,
> >
> > I have tried to diagnose our HAProxy install to see if we can get it solved. 
> > It seems that the HAProxy will randomly return 502s for existing items in the 
> > end servers.
> >
> > The configuration is as follows:
> > Ubuntu Server 9.04 ( Linux valkyrie 2.6.28-15-server #49-Ubuntu SMP Tue Aug 
> > 18 20:09:37 UTC 2009 x86_64 GNU/Linux )
> > HAProxy 1.3.20 (compiled from source with `make TARGET=linux26 USE_PCRE=1` 
> > after `apt-get build-dep haproxy` )
> >
> > HAProxy Config can be found here: http://pastebin.com/f9014ec9
> >
> > I am using HAProxy to load balance a varnish install on two servers which 
> > share virtual IPs through spread and wackamole. HAProxy runs on the same 
> > server running varnish and then load balances the web servers.
> >
> > The http path is:
> > (browser) -> [ (HAProxy:public) -> (Varnish) -> (HAProxy:varnish_front) ] -> 
> > (IIS Cluster)
> >
> > Where everything between square brackets is (optimally) in the same host. 
> > Varnish will only talk with the local HAProxy, but HAProxy will fail over to 
> > the other varnish in the cluster (this allows for varnish to fail on a single 
> > host, which is necessary because varnish has a fairly long restart time).
> >
> > All the servers are located in the same VLAN inside the same blade enclosure 
> > and average latency is 0.150ms, .026ms mdev.
> >
> > A snippet of the log (filtered for 502 responses) can be found here: 
> > http://pastebin.com/f6db9a351
> >
> > Previously (haproxy 1.3.15.5, the ubuntu default) the 502s ended with SH- but 
> > now they are SL--.
>
> Interesting. I think this means that the server aborts before sending the whole
> response. If that's so, 1.3.20 should not log it SL (L means last packet of the
> data part). Please try to get a tcpdump trace between haproxy and IIS, since
> the behaviour is random, it is the only way to understand better what is
> happening. Use tcpdump -s0 to get full packets capture.
>
> Regards,
> Willy


The dump has been sent to Willy directly. If anyone else needs it let me know.

--
Miguel Pilar


TCP log format question

2009-08-26 Thread Dmitry Sivachenko
Hello!

I am running haproxy-1.4-dev2 with the following
configuration (excerpt):

global
    log /var/run/log local0
    user www
    group www
    daemon
defaults
    log global
    mode tcp
    balance roundrobin
    maxconn 2000
    option abortonclose
    option allbackups
    option httplog
    option dontlog-normal
    option dontlognull
    option redispatch
    option tcplog
    retries 2

frontend M-front
    bind 0.0.0.0:17306
    mode tcp
    acl M-acl nbsrv(M-native) ge 5
    use_backend M-native if M-acl
    default_backend M-foreign

backend M-native
    mode tcp
    server ms1 ms1:17306 check maxconn 100 maxqueue 1 weight 100
    server ms2 ms2:17306 check maxconn 100 maxqueue 1 weight 100
    <...>

backend M-foreign
    mode tcp
    server ms3 ms3:17306 check maxconn 100 maxqueue 1 weight 100
    server ms4 ms4:17306 check maxconn 100 maxqueue 1 weight 100

Note that both frontend and 2 backends are running in TCP mode.

In my log file I see the following lines:
Aug 26 18:19:50 balancer0-00 haproxy[66301]: A.B.C.D:28689 
[26/Aug/2009:18:19:50.034] M-front M-native/ms1 -1/1/0/-1/3 -1 339 - - CD-- 
0/0/0/0/0 0/0 ""

1) What does "" mean? I see no description of that field in
documentation of TCP log format.
2) Why *all* requests are being logged? 
(note option dontlog-normal in default section).
How should I change configuration to log only important events
(errors) and do not log the fact connection was made and served?

Thanks in advance!



Re: round robin

2009-08-26 Thread Johan Duflost

Hello Willy,

Ok I can understand this issue but when session stickiness is enabled, all 
the objects should be retrieved from the same server, shouldn't they?

So it's probably not our problem.

Thanks.

Johan


- Original Message - 
From: "Willy Tarreau" 

To: "Johan Duflost" 
Cc: "Angelo Höngens" ; 
Sent: Saturday, August 22, 2009 7:51 AM
Subject: Re: round robin



On Fri, Aug 21, 2009 at 12:19:01PM +0200, Johan Duflost wrote:

Hello Willy,

Thank you for your answer.
I will check the stats.
Is this "resonance problem" specific to haproxy?


No not at all, it's specific to the round-robin
algorithm instead, which may be implemented in many
LBs. The most common way to observe this is when you
have two identical servers, an LB in round-robin mode,
and a first page with an even number of objects. You
try to log in on the servers through the LB and notice
you're always logging on to the same server. You can
hit reload as many times as you want, once your browser
has emitted all requests the round-robin pointer goes
back to the first server. This is very annoying when
doing benchmarks, because it tends to create unwanted
stickiness which makes you think that things might work
when they may not. Once the load starts and requests
flow in parallel, the effect totally disappears, of
course.

Willy
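
The resonance effect described above can be reproduced with a few lines of Python. This is a simplified sketch under stated assumptions: requests arrive strictly one after another, the balancer is a plain round-robin pointer, and the server names and request counts are made up for illustration.

```python
from itertools import cycle


def first_server_per_page_load(servers, requests_per_load, loads):
    """Return the server that handles the first request of each page load."""
    pointer = cycle(servers)  # plain round-robin over the server list
    firsts = []
    for _ in range(loads):
        handled = [next(pointer) for _ in range(requests_per_load)]
        firsts.append(handled[0])
    return firsts


# Total requests per load is a multiple of the server count:
# the pointer is back on the first server at every reload.
print(first_server_per_page_load(["srv1", "srv2"], 4, 5))

# Odd number of requests per load: the first request alternates.
print(first_server_per_page_load(["srv1", "srv2"], 5, 4))
```

With sequential traffic from a single browser the first case always starts on the same server, which is the unwanted stickiness mentioned above. Once requests from several clients interleave, the per-load count no longer lines up with the pointer and the effect disappears.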















Re: HAProxy randomly returning 502s when balancing IIS

2009-08-26 Thread Willy Tarreau
Hello,

On Tue, Aug 25, 2009 at 03:26:57PM -0400, Miguel Pilar Vilagran wrote:
> Hello all,
> 
> I have tried to diagnose our HAProxy install to see if we can get it solved. 
> It seems that the HAProxy will randomly return 502s for existing items in the 
> end servers.
> 
> The configuration is as follows:
> Ubuntu Server 9.04 ( Linux valkyrie 2.6.28-15-server #49-Ubuntu SMP Tue Aug 
> 18 20:09:37 UTC 2009 x86_64 GNU/Linux )
> HAProxy 1.3.20 (compiled from source with `make TARGET=linux26 USE_PCRE=1` 
> after `apt-get build-dep haproxy` )
> 
> HAProxy Config can be found here: http://pastebin.com/f9014ec9
> 
> I am using HAProxy to load balance a varnish install on two servers which 
> share virtual IPs through spread and wackamole. HAProxy runs on the same 
> server running varnish and then load balances the web servers.
> 
> The http path is:
> (browser) -> [ (HAProxy:public) -> (Varnish) -> (HAProxy:varnish_front) ] -> 
> (IIS Cluster)
> 
> Where everything between square brackets is (optimally) in the same host. 
> Varnish will only talk with the local HAProxy, but HAProxy will fail over to 
> the other varnish in the cluster (this allows for varnish to fail on a single 
> host, which is necessary because varnish has a fairly long restart time).
> 
> All the servers are located in the same VLAN inside the same blade enclosure 
> and average latency is 0.150ms, .026ms mdev.
> 
> A snippet of the log (filtered for 502 responses) can be found here: 
> http://pastebin.com/f6db9a351
> 
> Previously (haproxy 1.3.15.5, the ubuntu default) the 502s ended with SH- but 
> now they are SL--.

Interesting. I think this means that the server aborts before sending the whole
response. If that's so, 1.3.20 should not log it SL (L means last packet of the
data part). Please try to get a tcpdump trace between haproxy and IIS, since
the behaviour is random, it is the only way to understand better what is
happening. Use tcpdump -s0 to get full packets capture.

Regards,
Willy




Re: reqrep always replaces, even when position \x is not found?

2009-08-26 Thread Willy Tarreau
Hi,

On Mon, Aug 24, 2009 at 12:10:41PM +0100, Pedro Mata-Mouros Fonseca wrote:
> Greetings,
> 
> Sorry for the "double" posting, let me just try to simplify the  
> exposition of the problem. I'm trying to rewrite a URI using regular  
> expression placeholders, but I'm having a problem that when a  
> placeholder doesn't match anything it still gets replaced in the  
> destination with a '1'. This is the configuration directive (basically  
> if the URI is rss2, rewrite to RSS2; but if it is rss only, rewrite to  
> RSS):
> 
> reqrep ^([^\ ]*)\ /rss(2*)(.*)   \1\ /Endpoint/RSS\2\3
> 
> Calling rss2 correctly rewrites the URI:
> => /rss2?u=username correctly rewrites to /Endpoint/RSS2?u=username
> 
> However calling rss (without the '2') triggers a wrong replacement:
> => /rss?u=username incorrectly rewrites to /Endpoint/RSS1?u=username,  
> instead of /Endpoint/RSS?u=username
> 
> This is the content of the log:
> 10.134.15.124 - - [21/Aug/2009:12:08:33 +0100] "GET /Endpoint/RSS2? 
> u=username HTTP/1.1" 404 209
> 10.134.15.124 - - [21/Aug/2009:12:09:05 +0100] "GET /Endpoint/RSS1? 
> u=username HTTP/1.1" 404 209
> 
> Many thanks again.

For me it works well :
  GET /Endpoint/RSS?u=username HTTP/1.0

So it looks like a bug in the regex lib. My version was built with PCRE
and with glibc's standard regex library. What OS/regex lib are you using ?
A lot of those libs are buggy, and before I discovered PCRE, I had tried
several alternatives with pretty unfortunate experiences !

Regards,
Willy
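
As a cross-check of what the rule is expected to produce, the same pattern can be tried in another regex engine. The sketch below uses Python's `re` module, which is not the library haproxy links against, so it only illustrates the intended behaviour: when `(2*)` matches nothing, the back-reference should expand to an empty string. The haproxy-escaped space `\ ` is written as a plain space here.

```python
import re

# Same expression as the reqrep rule, without haproxy's space escaping.
PATTERN = re.compile(r"^([^ ]*) /rss(2*)(.*)")
REPLACEMENT = r"\1 /Endpoint/RSS\2\3"


def rewrite(request_line):
    """Apply the rewrite to one HTTP request line."""
    return PATTERN.sub(REPLACEMENT, request_line)


print(rewrite("GET /rss2?u=username HTTP/1.0"))
print(rewrite("GET /rss?u=username HTTP/1.0"))
```

Both lines come out with `/Endpoint/RSS2?...` and `/Endpoint/RSS?...` respectively, so a stray `1` in the output points at the regex library in use rather than at the expression itself.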




Re: Compilation of haproxy-1.4-dev2 on FreeBSD

2009-08-26 Thread Willy Tarreau
Hello,

On Mon, Aug 24, 2009 at 03:11:06PM +0400, Dmitry Sivachenko wrote:
> Hello!
> 
> Please consider the following patches. They are required to
> compile haproxy-1.4-dev2 on FreeBSD.
> 
> Summary:
> 1) include  before 
> 2) Use IPPROTO_TCP instead of SOL_TCP
> (they are both defined as 6, TCP protocol number)

Will apply it, thanks !

Regards,
Willy
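
The point about IPPROTO_TCP can be illustrated from Python, whose `socket` module exposes the same constant. This is only a sketch of the portable spelling for setting a TCP-level option; it says nothing about the actual haproxy patch, and the choice of TCP_NODELAY is just an example option.

```python
import socket

# IPPROTO_TCP is the TCP protocol number and is the portable option level.
level = socket.IPPROTO_TCP
print(level)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    # Set a TCP-level option using IPPROTO_TCP rather than SOL_TCP.
    sock.setsockopt(level, socket.TCP_NODELAY, 1)
    nodelay = sock.getsockopt(level, socket.TCP_NODELAY)

print(nodelay != 0)
```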