Re: Some thoughts about redispatch
Hi Dmitry,

On Mon, May 26, 2014 at 06:28:33PM +0400, Dmitry Sivachenko wrote: On 26 May 2014, at 18:21, Willy Tarreau w...@1wt.eu wrote: I think it definitely makes some sense. Probably not in its exact form, but as something to work on. In fact, I think we should only apply the 1s retry delay when remaining on the same server, and avoid remaining on the same server as much as possible. For hashes or when there's a single server, we have no choice, but when doing round robin, for example, we can pick another one. This is especially true for static servers or ad servers, where the fastest response time is preferred over sticking to the same server.

Yes, that was exactly my point. In many situations it is better to ask another server immediately to get the fastest response, rather than trying to stick to the same server as much as possible.

So I worked a bit on this subject. It's far from obvious. The problem is that at the moment we decide on the 1s delay before a retry, we don't know whether we'll end up on the same server or not. Thus I'm thinking about this:

- if the connection is persistent (cookie, etc.), apply the current retry mechanism, as we absolutely don't want to break application sessions;
- otherwise, redispatch starting on the first retry as you suggest. But then we have two possibilities for the delay before reconnecting. If the server farm has more than one server and the balance algorithm is neither a hash nor "first", we don't apply the delay, because we expect to land on a different server with high probability. Otherwise we keep the delay, because we're almost certain to land on the same server.

This way it continues to silently mask occasional server restarts, and is optimally efficient in stateless farms when there's a possibility to quickly pick another server. Do you see any other point that needs specific care?

Regards,
Willy
Re: Some thoughts about redispatch
On 28 May 2014, at 11:13, Willy Tarreau w...@1wt.eu wrote: Hi Dmitry, So I worked a bit on this subject. It's far from obvious. The problem is that at the moment we decide on the 1s delay before a retry, we don't know whether we'll end up on the same server or not. Thus I'm thinking about this:

- if the connection is persistent (cookie, etc.), apply the current retry mechanism, as we absolutely don't want to break application sessions;

I agree.

- otherwise, redispatch starting on the first retry as you suggest. But then we have two possibilities for the delay before reconnecting. If the server farm has more than one server and the balance algorithm is neither a hash nor "first", we don't apply the delay, because we expect to land on a different server with high probability. Otherwise we keep the delay, because we're almost certain to land on the same server. This way it continues to silently mask occasional server restarts, and is optimally efficient in stateless farms when there's a possibility to quickly pick another server. Do you see any other point that needs specific care?

I would export that magic 1 second as a configuration parameter (with 0 meaning no delay). After all, we could fail to connect not only because of a server restart, but also because a switch or a router dropped a packet. Other than that, sounds good. Thanks!
Re: Some thoughts about redispatch
On 28 May 2014, at 12:49, Willy Tarreau w...@1wt.eu wrote: On Wed, May 28, 2014 at 12:35:17PM +0400, Dmitry Sivachenko wrote: I would export that magic 1 second as a configuration parameter (with 0 meaning no delay).

I'm not sure we need to add another tunable just for this.

Okay.

After all, we could fail to connect not only because of a server restart, but also because a switch or a router dropped a packet.

No, because a dropped packet is already handled by the TCP stack. Here the haproxy retry is really about retrying after an explicit failure (the server responded that the port was closed). Also, the typical TCP retransmit interval for dropped packets in the network stack is 3s, so we're already three times as fast as the TCP stack. I don't think it's reasonable to always kill this delay when retrying on the same server. We used to have that in the past, and people were complaining that we were hammering servers for no reason, since there's little chance that a server which is not started will suddenly be ready in the next 100 microseconds.

I mean that with timeout connect=100ms (a good value for a local network, IMO), we are far from the TCP retransmit timeout, and if a switch drops a packet (it drops randomly), it can transmit the next one even if we retry immediately.

If we have a tunable (with a default of 1 second), people will have more freedom in some situations.
Re: Some thoughts about redispatch
On Wed, May 28, 2014 at 12:54:47PM +0400, Dmitry Sivachenko wrote: I mean that with timeout connect=100ms (a good value for a local network, IMO), we are far from the TCP retransmit timeout, and if a switch drops a packet (it drops randomly), it can transmit the next one even if we retry immediately. If we have a tunable (with a default of 1 second), people will have more freedom in some situations.

OK, but then you make an interesting point with your very low timeout connect. What about using the min of timeout connect and 1s then? Thus you can simply use your lower timeout connect as this new timeout. Would that be OK for you?

Willy
Re: Some thoughts about redispatch
On 28 May 2014, at 13:06, Willy Tarreau w...@1wt.eu wrote: OK, but then you make an interesting point with your very low timeout connect. What about using the min of timeout connect and 1s then? Thus you can simply use your lower timeout connect as this new timeout. Would that be OK for you?

Sounds reasonable (provided we are talking only about redispatch to the same server, not to another one).
Re: Some thoughts about redispatch
On Wed, May 28, 2014 at 01:11:47PM +0400, Dmitry Sivachenko wrote: On 28 May 2014, at 13:06, Willy Tarreau w...@1wt.eu wrote: OK, but then you make an interesting point with your very low timeout connect. What about using the min of timeout connect and 1s then? Thus you can simply use your lower timeout connect as this new timeout. Would that be OK for you? Sounds reasonable (provided we are talking only about redispatch to the same server, not to another one).

Of course :-) I'll try to sketch something like this then.

Willy
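The rule the thread converges on (no delay when the retry will be redispatched to a different server; otherwise wait min(timeout connect, 1s)) can be sketched as follows. This is only an illustration of the idea being discussed, not actual HAProxy code; the function and parameter names are made up:

```python
def reconnect_delay_ms(timeout_connect_ms: int, same_server: bool) -> int:
    """Delay before reconnecting after an explicit connection failure.

    same_server: True when the retry is bound to land on the same server
    (persistent session, hash/'first' balancing, or a single-server farm).
    """
    if not same_server:
        # Another server will very likely be picked: retry immediately.
        return 0
    # Same server: wait a bit to mask a server restart, but never longer
    # than the user's own connect timeout.
    return min(timeout_connect_ms, 1000)
```

With a very low timeout connect (e.g. 100ms on a local network, as Dmitry uses), the delay shrinks accordingly, which is the point of taking the min.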
Re: Some thoughts about redispatch
On 28 May 2014, at 11:13, Willy Tarreau w...@1wt.eu wrote: - otherwise, redispatch starting on the first retry as you suggest. But then we have two possibilities for the delay before reconnecting. If the server farm has more than one server and the balance algorithm is neither a hash nor "first", we don't apply the delay, because we expect to land on a different server with high probability.

BTW, I thought that with option redispatch we would *always* retry on another server (if there are several servers configured in the backend and the balance algorithm is leastconn or round-robin). Why do you say "with high probability" here?
Re: Some thoughts about redispatch
On Wed, May 28, 2014 at 01:24:28PM +0400, Dmitry Sivachenko wrote: On 28 May 2014, at 11:13, Willy Tarreau w...@1wt.eu wrote: BTW, I thought that with option redispatch we would *always* retry on another server (if there are several servers configured in the backend and the balance algorithm is leastconn or round-robin). Why do you say "with high probability" here?

Because:
- some deterministic algorithms (typically hashes) will definitely end up on the same server;
- some almost-deterministic algorithms (like first) will very likely end up on the same server;
- some other algorithms (like leastconn) may end up on the same server, if that server still has fewer connections than the others;
- the other servers might have a zero weight, preventing us from using them;
- the farm size can always change (due to health checks).

Willy
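The distinction Willy draws can be illustrated with a minimal configuration sketch (backend, server names and addresses below are hypothetical): a round-robin farm where a redispatched retry can land elsewhere, versus a hash-based farm where it deterministically lands on the same server.

```
backend stateless_farm
    # Non-deterministic: a redispatched retry will very likely
    # pick a different server.
    balance roundrobin
    option redispatch
    retries 3
    server srv1 10.0.0.1:80 check
    server srv2 10.0.0.2:80 check

backend sticky_farm
    # Deterministic: the same source IP always hashes to the same
    # server, so a redispatch lands on it again.
    balance source
    option redispatch
    retries 3
    server srv1 10.0.0.1:80 check
    server srv2 10.0.0.2:80 check
```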
use_backend
Hi all, I created a lot of ACLs to select which server a request needs to go to. The issue I'm facing now is that I want to redirect my own requests (based on IP) to one specific server. Ideally this would be:

acl goto_server1 hdr_beg(host) -i abc.
acl goto_server2 hdr_beg(host) -i def.
acl goto_test_server src 1.2.3.4
use_backend TestServer if goto_test_server and ( goto_server1 or goto_server2 )
use_backend Server1 if goto_server1 or goto_server2

This way my own IP would still be redirected to the errors if the server is not available. Thanks!
Re: use_backend
On Wed, May 28, 2014 at 2:03 PM, Steven Van Ingelgem ste...@vaningelgem.be wrote: [...]

Hi Steven,

There is no way to use parentheses in HAProxy when writing rules. I would split the content switching related to your TestServer in two:

acl goto_server1 hdr_beg(host) -i abc.
acl goto_server2 hdr_beg(host) -i def.
acl goto_test_server src 1.2.3.4
use_backend TestServer if goto_test_server goto_server1
use_backend TestServer if goto_test_server goto_server2
use_backend Server1 if goto_server1 || goto_server2

Or you can specify a dedicated ACL for the test server URLs:

acl goto_server1 hdr_beg(host) -i abc.
acl goto_server2 hdr_beg(host) -i def.
acl goto_servertest_url hdr_beg(host) -i abc. def.
acl goto_test_server src 1.2.3.4
use_backend TestServer if goto_test_server goto_servertest_url
use_backend Server1 if goto_server1 || goto_server2

Baptiste
Re: use_backend
How many entries can I add to one ACL? I split one of the ACLs into 14 lines, with each line holding about 40 items. I did that so a human could still read the configuration file, but does that matter to HAProxy? Thanks

On 28 May 2014 14:11, Baptiste bed...@gmail.com wrote: [...]
HTTPS Redirects to HTTP
I have an HAProxy server set up with a compiled 1.5-dev25 version of HAProxy. I need SSL, and since SSL isn't available in 1.4, I compiled 1.5. I have everything working, but I noticed something peculiar and wasn't sure whether this was expected behavior. Below is my SSL haproxy.cfg file, along with a wget that I performed against my webserver. It appears to initially redirect HTTPS to HTTP, which then rewrites the connection back to HTTPS. Again, is this expected behavior, or is something in my config incorrect? Thanks!

global
    daemon
    log 127.0.0.1 local2
    maxconn 4096
    user haproxy
    group haproxy
    chroot /var/chroot/haproxy

defaults
    log global
    mode http
    retries 3
    option httplog
    option dontlognull
    option redispatch
    timeout server 5
    timeout client 5
    timeout connect 5000

frontend http_in
    bind *:80
    default_backend portalbackend

frontend https_in
    reqadd X-Forwarded-Proto:\ https
    bind *:443 ssl crt /etc/haproxy/haproxy.crt
    default_backend portalbackend

backend portalbackend
    balance leastconn
    redirect scheme https if !{ ssl_fc }
    option httpchk GET /login.jsp
    option forwardfor
    option http-server-close
    server node1 ip1:8080 check inter 5000
    server node2 ip2:8080 check inter 5000

07:53:18 ~$ wget https://haproxy --no-check-certificate
--2014-05-28 07:59:55-- https://haproxy/
Resolving haproxy... 192.168.8.213
Connecting to haproxy|192.168.8.213|:443... connected.
WARNING: cannot verify haproxy's certificate, issued by '/CN=www.exceliance.fr': Self-signed certificate encountered.
WARNING: certificate common name 'www.exceliance.fr' doesn't match requested host name 'haproxy'.
HTTP request sent, awaiting response... 302 Found
Location: http://haproxy/login.jsp [following]
--2014-05-28 07:59:55-- http://haproxy/login.jsp
Connecting to haproxy|192.168.8.213|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://haproxy/login.jsp [following]
--2014-05-28 07:59:55-- https://haproxy/login.jsp
Reusing existing connection to haproxy:443.
HTTP request sent, awaiting response... 200 OK
Length: 7327 (7.2K) [text/html]
Saving to: 'index.html.1'
100%[=] 7,327 --.-K/s in 0s
2014-05-28 07:59:55 (81.3 MB/s) - 'index.html.1' saved [7327/7327]
Re: use_backend
On Wed, May 28, 2014 at 2:15 PM, Steven Van Ingelgem ste...@vaningelgem.be wrote: [...]

Please, stop top posting!

You can add as many entries as you want. You can even load entries from a file; that way, it will be even more human readable :)

Baptiste
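The file-based approach Baptiste mentions uses the `-f` flag of HAProxy's acl keyword, which loads patterns from a file, one per line. A small sketch (file path and ACL names are only examples, not from the thread):

```
# haproxy.cfg
acl goto_server1 hdr_beg(host) -i -f /etc/haproxy/server1_hosts.lst
use_backend Server1 if goto_server1
```

```
# /etc/haproxy/server1_hosts.lst -- one pattern per line
abc.
def.
```

This keeps long host lists out of the main configuration file and lets you maintain them separately.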
Questions about TCP NO DELAY and nbproc
Hi, I have two questions. I am having a lot of problems with 500 errors from HAProxy, and I am wondering if these could be two culprits:

Is there an equivalent method for disabling the Nagle algorithm in TCP mode? I've looked everywhere, and it seems that TCP_NODELAY is not a flag within HAProxy; only HTTP mode seems to include the option.

Could nbproc possibly have a negative effect as opposed to a beneficial one? Is it possible that by setting nbproc to four, we're actually creating problems with scalability and with the number of concurrent working connections?

I can post pieces of my haproxy.cfg if it helps explain how I'm building out the load balancing. I feel like somewhere in my config there's something incorrectly tuned that's causing connection problems. Any help would be greatly appreciated. Thanks!

Jon
Re: HTTPS Redirects to HTTP
On Wed, May 28, 2014 at 3:00 PM, Souda Burger soudabur...@gmail.com wrote: [...]

Hi Souda,

The first 302 seems to be sent by your application server, which does not seem to take your X-Forwarded-Proto header into account.

Baptiste
Re: Questions about TCP NO DELAY and nbproc
On Wed, May 28, 2014 at 3:31 PM, Jon Bogaty j...@magnetic.com wrote: [...]

Hi Jon,

Please post at least your HAProxy version, how you built/installed it, your configuration, etc.; logs showing the errors are welcome too. Note that HAProxy is not supposed to generate any 500 errors (only 502, 503, 504).

Baptiste
Re: Questions about TCP NO DELAY and nbproc
Hi Baptiste, I'm sorry, I should clarify: I meant 504. It's really quite prevalent, at least 4/10 at times, sometimes 8/10. I'm using: HA-Proxy version 1.4.24 2013/06/17. This is more or less the entirety of the configuration:

global
    user nobody
    group nobody
    daemon
    nbproc 4
    maxconn 204800
    tune.bufsize 16384 # 16k
    tune.rcvbuf.server 141312 # 128k

defaults
    log global
    option tcplog
    option dontlognull
    mode http
    backlog 32768
    maxconn 204800
    timeout connect 120ms # how long to try to connect to a backend
    timeout queue 120ms # how long a request can wait for a backend before 503ing
    timeout server 120ms # how long to wait for a response from a backend before 503ing
    timeout client 6ms # how long to wait for data from clients (exchanges)
    timeout http-keep-alive 6ms # how long to keep keepalive sessions when inactive
    option abortonclose
    no option forceclose
    option http-no-delay
    option nolinger

frontend openx
    bind *:9010
    default_backend bidder9010

backend bidder9010
    balance roundrobin
    server bid001 10.1.1.50:9010 weight 1 maxconn 51200 check
    server bid002 10.1.1.112:9010 weight 1 maxconn 51200 check
    server bid003 10.1.1.113:9010 weight 1 maxconn 51200 check
    server bid004 10.1.1.114:9010 weight 1 maxconn 51200 check
    server bid005 10.1.1.115:9010 weight 1 maxconn 51200 check
    server bid007 10.1.1.117:9010 weight 1 maxconn 51200 check
    server bid008 10.1.1.118:9010 weight 1 maxconn 51200 check
    server bid009 10.1.1.119:9010 weight 1 maxconn 51200 check
    server bid010 10.1.1.120:9010 weight 1 maxconn 51200 check
    server bid011 10.1.1.127:9010 weight 1 maxconn 51200 check
    server bid012 10.1.1.128:9010 weight 1 maxconn 51200 check
    server bid013 10.1.1.126:9010 weight 1 maxconn 51200 check
    server bid014 10.1.1.203:9010 weight 1 maxconn 51200 check
    server bid015 10.1.1.204:9010 weight 1 maxconn 51200 check
    server bid016 10.1.1.205:9010 weight 1 maxconn 51200 check

Basically, HAProxy balances a set of those bidder backends from port 9010 to 9080. Does that clarify things?
On Wed, May 28, 2014 at 9:40 AM, Baptiste bed...@gmail.com wrote: [...]
Re: HTTPS Redirects to HTTP
Baptiste, Thanks for the heads up. Just to make sure I understand: you're saying that my balanced application server, in this case a Tomcat pair, needs to account for the header modification, and it does not appear to be doing that currently? If so, thanks for your help; I can take that to my developers!

On Wed, May 28, 2014 at 8:45 AM, Baptiste bed...@gmail.com wrote: [...]
Re: HTTPS Redirects to HTTP
On Wed, May 28, 2014 at 3:57 PM, Souda Burger soudabur...@gmail.com wrote: [...] Yes, this is what I meant. Your application should read this header and write the redirect accordingly. The first 302 response should be https://haproxy/login.jsp. Or you could use HAProxy to rewrite it on the fly, but it's a dirty workaround. Baptiste
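As an illustration of what the application side needs to do, here is a minimal, hypothetical sketch (not tied to any particular framework) of building a redirect Location from the forwarded protocol instead of the scheme the app server's own listener saw, which is plain HTTP behind HAProxy:

```python
def redirect_location(host, path, headers):
    """Build an absolute Location for a redirect.

    Prefer the scheme the client used against the load balancer
    (X-Forwarded-Proto) over the scheme of the app server's own
    connector, which only ever sees plain HTTP behind the proxy.
    """
    scheme = headers.get("X-Forwarded-Proto", "http")
    return "%s://%s%s" % (scheme, host, path)

# Behind the https_in frontend above, the header is set to "https",
# so the application emits an https:// Location instead of http://.
redirect_location("haproxy", "/login.jsp",
                  {"X-Forwarded-Proto": "https"})
# -> "https://haproxy/login.jsp"
```

For Tomcat specifically, the RemoteIpValve (with its protocolHeader attribute set to X-Forwarded-Proto) is the usual way to make the container itself aware of the original scheme, so redirects come out right without application changes.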
Re: HTTPS Redirects to HTTP
Baptiste, Thanks for your help again. How would you recommend rewriting with HAProxy to do that on the fly? If you've got something that should work that's already written, that's easier than me trying to piece things together from different sources. On Wed, May 28, 2014 at 9:00 AM, Baptiste bed...@gmail.com wrote: [...]
Re: Questions about TCP NO DELAY and nbproc
On Wed, May 28, 2014 at 3:56 PM, Jon Bogaty j...@magnetic.com wrote: Hi Baptiste, I'm sorry, I should clarify, I meant 504. It's really quite prevalent, at least 4/10 at times, sometimes 8/10... I'm using: HA-Proxy version 1.4.24 2013/06/17 This is more or less the way the entirety of the configuration is: global user nobody group nobody daemon nbproc 4 maxconn 204800 tune.bufsize 16384 # 16k tune.rcvbuf.server 141312 # 128k defaults log global option tcplog option dontlognull mode http backlog 32768 maxconn 204800 timeout connect 120ms # how long to try to connect to a backend timeout queue 120ms# how long a request can wait for a backend before 503ing timeout server 120ms # how long to wait for response from backend before 503ing timeout client 6ms# how long to wait for data from clients (exchanges) timeout http-keep-alive 6ms # how long to keep keepalive sessions when inactive option abortonclose no option forceclose option http-no-delay option nolinger frontend openx bind *:9010 default_backend bidder9010 backend bidder9010 balance roundrobin server bid001 10.1.1.50:9010 weight 1 maxconn 51200 check server bid002 10.1.1.112:9010 weight 1 maxconn 51200 check server bid003 10.1.1.113:9010 weight 1 maxconn 51200 check server bid004 10.1.1.114:9010 weight 1 maxconn 51200 check server bid005 10.1.1.115:9010 weight 1 maxconn 51200 check server bid007 10.1.1.117:9010 weight 1 maxconn 51200 check server bid008 10.1.1.118:9010 weight 1 maxconn 51200 check server bid009 10.1.1.119:9010 weight 1 maxconn 51200 check server bid010 10.1.1.120:9010 weight 1 maxconn 51200 check server bid011 10.1.1.127:9010 weight 1 maxconn 51200 check server bid012 10.1.1.128:9010 weight 1 maxconn 51200 check server bid013 10.1.1.126:9010 weight 1 maxconn 51200 check server bid014 10.1.1.203:9010 weight 1 maxconn 51200 check server bid015 10.1.1.204:9010 weight 1 maxconn 51200 check server bid016 10.1.1.205:9010 weight 1 maxconn 51200 check Basically haproxy balances a set of those bidder 
backends from port 9010 to 9080... Does that clarify things? On Wed, May 28, 2014 at 9:40 AM, Baptiste bed...@gmail.com wrote: On Wed, May 28, 2014 at 3:31 PM, Jon Bogaty j...@magnetic.com wrote: Hi, I have two questions... I am having a lot of problems with 500 errors from haproxy and I am wondering if these could be two culprits: Is there an equivalent method for disabling the Nagle algorithm in TCP mode? I've looked everywhere and it seems that TCP NO DELAY is not a flag within haproxy. Only http mode seems to include the option. Could nbproc possibly have a negative effect as opposed to a beneficial one? Is it possible that by setting nbproc to four we're actually creating problems with scalability and with the number of concurrent working connections? I can post pieces of my haproxy.cfg if it helps explain how I'm building out the load balancing. I feel like somewhere in my config there's something incorrectly tuned that's causing connection problems. Any help would be greatly appreciated. Thanks! Jon Hi Jon, Please post at least your HAProxy version, how you built/installed it, etc... configuration, logs showing the errors are welcome too. Note that HAProxy is not supposed to generate any 500 errors (only 502, 503, 504) Baptiste Could you please turn on option httplog and remove the tcplog option? 504 means the server did not answer fast enough (longer than the timeout server). Just increase the timeout server a bit and see what happens. We usually set it to a few seconds (less than 20). Baptiste
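One detail worth keeping in mind alongside this advice: HAProxy timeouts given without a unit are interpreted as milliseconds, so "timeout server 120ms" really is 120 milliseconds. A rough sketch of the change Baptiste suggests (the values are illustrative, not a recommendation for this workload):

```
defaults
    timeout connect 5s    # time allowed to establish the TCP connection to a server
    timeout server  15s   # time to wait for the server's response; exceeded -> 504
    timeout client  30s   # client-side inactivity timeout
```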
Re: HTTPS Redirects to HTTP
On Wed, May 28, 2014 at 4:02 PM, Souda Burger soudabur...@gmail.com wrote: [...] Look for rspirep in the documentation; there is an example about the Location header. Baptiste
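For the record, a hypothetical sketch of such an on-the-fly rewrite with rspirep (untested; ssl_fc is true when the frontend connection came in over SSL):

```
backend portalbackend
    # Rewrite an http:// Location header sent by the server into https://
    # when the client connected over SSL. A workaround only; fixing the
    # application to honor X-Forwarded-Proto remains the clean solution.
    rspirep ^Location:\ http://(.*)  Location:\ https://\1  if { ssl_fc }
```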
Re: HAProxy Hang during initial connection
Hi John, On Tue, May 27, 2014 at 08:08:27PM +, JDzialo John wrote: Hi Willy, Here is a capture of all traffic btwn the two servers using the host option. Thank you. Basically traffic goes from haproxy to a web farm in a round robin fashion. These individual web servers are accessing a single file server, (we are in the process of splitting this file server into multiple servers to spread the load). This one file server is getting slammed all day with requests and that may be the root of the problem but have not found the smoking gun necessary to prove it. Should I also get a capture btwn our web farm and the file server? I have also cc'd our network administrator to help come down to a solution. Let me know what you think and if there is anything else I can provide. As always thank you so much for your help. You have been a great help to me in narrowing down issues. I spent some time reading the captures and found nothing abnormal in them. Do you have any indication of a faulty session or request ? Also I noticed that you took the captures on the server itself and that the server has TSO enabled since we're seeing large frames. It would be possible that there's a bug in the network driver or NIC causing some frames to be lost for example. Maybe the same trace taken on the haproxy server at the same time would reveal some extra information. Note, you don't need to post the whole file to the list, there are about 800 people who are probably not interested in receiving this 6MB file :-) Either you can put it on a public server, or you can simply send it privately to me. Thanks, Willy
Re: HTTPS Redirects to HTTP
Sounds good, thanks! On Wed, May 28, 2014 at 9:05 AM, Baptiste bed...@gmail.com wrote: [...]
Re: Questions about TCP NO DELAY and nbproc
Brilliant Baptiste, thank you. I've set up proper logging and a longer timeout: global user nobody group nobody daemon nbproc 4 maxconn 204800 log /dev/log local0 info log /dev/log local0 notice tune.bufsize 16384 # 16k tune.rcvbuf.server 141312 # 128k defaults log global option httplog option dontlognull mode http backlog 32768 maxconn 204800 timeout connect 120ms # how long to try to connect to a backend timeout queue 120ms # how long a request can wait for a backend before 503ing timeout server 15s # how long to wait for response from backend before 503ing timeout client 6ms # how long to wait for data from clients (exchanges) timeout http-keep-alive 6ms # how long to keep keepalive sessions when inactive option abortonclose no option forceclose option http-no-delay option nolinger I'm down to 1 or 2 504s in 1000... It's weird though, it doesn't seem to make a difference whether I go to 10s or 15s, I still get these last one or two pesky 504s. Anything else I could be missing? On Wed, May 28, 2014 at 10:04 AM, Baptiste bed...@gmail.com wrote: [...]
Re: Questions about TCP NO DELAY and nbproc
On Wed, May 28, 2014 at 4:47 PM, Jon Bogaty j...@magnetic.com wrote: [...]
Re: Questions about TCP NO DELAY and nbproc
Thanks for all your help Baptiste, I'll keep poking. :) On Wed, May 28, 2014 at 11:02 AM, Baptiste bed...@gmail.com wrote: [...]
[ANNOUNCE] haproxy-1.5-dev26 (and hopefully last)
Hi all,

So with the completed agent-check updates and the last batch of merged fixes, I think we're ready. I'm emitting dev26 so that everyone can test and report any unexpected annoyances and regressions before we issue -final, and in an attempt to avoid releasing 1.5.1 the same day as 1.5.0. The changes in this version are quite small. Conrad Hoffmann fixed a possible CPU loop when using the systemd wrapper with multiple processes when reloading. Sasha Pachev fixed a possible buffer overflow when using regexes which add more than the reserved space to the request or response (very unlikely, so low risk, but definitely a bug). Thierry Fournier ensured that the str match is used by default for ACLs built from string samples. Olivier Burgard added time-frame filtering to halog. Dirkjan Bussink fixed some minor memory leaks on the error path. Cyril Bonté cleaned up some minor issues on the stats page. And I updated the agent-check to support multiple actions in a response, and added a few stats for SSL key generations and SSL cache lookups/misses.

Concerning the agent-check updates, the agent can now act on these 3 dimensions:
- weight (eg: based on CPU load)
- service operational status up/down (complementary to health checks)
- service administrative status (ready/drain/maint)

Now it's possible to easily write an agent which only acts on these individual statuses without acting on the other ones if needed. Also, all statuses can be forced on the CLI and stats page, and health/agent checks can be stopped. I think that by -final we'll include Rémi's work on making the dhparam size configurable, and Dmitry's proposal to improve the retry/redispatch behaviour to speed up switching to another server whenever possible. Let's say that without any remaining issues to work on during the upcoming two weeks, we'll release.
Feedback welcome as usual,
Willy

---
Usual links below :
Site index : http://haproxy.1wt.eu/
Sources : http://haproxy.1wt.eu/download/1.5/src/devel/
Changelog : http://haproxy.1wt.eu/download/1.5/src/CHANGELOG
Cyril's HTML doc : http://cbonte.github.com/haproxy-dconv/configuration-1.5.html

And the changelog :

2014/05/28 : 1.5-dev26
- BUG/MEDIUM: polling: fix possible CPU hogging of worker processes after receiving SIGUSR1.
- BUG/MINOR: stats: fix a typo on a closing tag for a server tracking another one
- OPTIM: stats: avoid the calculation of a useless link on tracking servers in maintenance
- MINOR: fix a few memory usage errors
- CONTRIB: halog: Filter input lines by date and time through timestamp
- MINOR: ssl: SSL_CTX_set_options() and SSL_CTX_set_mode() take a long, not an int
- BUG/MEDIUM: regex: fix risk of buffer overrun in exp_replace()
- MINOR: acl: set str as default match for strings
- DOC: Add some precisions about acl default matching method
- MEDIUM: acl: strenghten the option parser to report invalid options
- BUG/MEDIUM: config: a stats-less config crashes in 1.5-dev25
- BUG/MINOR: checks: tcp-check must not stop on '\0' for binary checks
- MINOR: stats: improve alignment of color codes to save one line of header
- MINOR: checks: simplify and improve reporting of state changes when using log-health-checks
- MINOR: server: remove the SRV_DRAIN flag which can always be deduced
- MINOR: server: use functions to detect state changes and to update them
- MINOR: server: create srv_was_usable() from srv_is_usable() and use a pointer
- BUG/MINOR: stats: do not report 100% in the thottle column when server is draining
- BUG/MAJOR: config: don't free valid regex memory
- BUG/MEDIUM: session: don't clear CF_READ_NOEXP if analysers are not called
- BUG/MINOR: stats: tracking servers may incorrectly report an inherited DRAIN status
- MEDIUM: proxy: make timeout parser a bit stricter
- REORG/MEDIUM: server: split server state and flags in two different variables
- REORG/MEDIUM: server: move the maintenance bits out of the server state
- MAJOR: server: use states instead of flags to store the server state
- REORG: checks: put the functions in the appropriate files !
- MEDIUM: server: properly support and propagate the maintenance status
- MEDIUM: server: allow multi-level server tracking
- CLEANUP: checks: rename the server_status_printf function
- MEDIUM: checks: simplify server up/down/nolb transitions
- MAJOR: checks: move health checks changes to set_server_check_status()
- MINOR: server: make the status reporting function support a reason
- MINOR: checks: simplify health check reporting functions
- MINOR: server: implement srv_set_stopped()
- MINOR: server: implement srv_set_running()
- MINOR: server: implement srv_set_stopping()
- MEDIUM: checks: simplify failure notification using srv_set_stopped()
- MEDIUM: checks: simplify success notification using srv_set_running()
- MEDIUM:
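As a hedged illustration of the agent-check behaviour described in the announcement (backend name, server address and agent port are invented for the example; `agent-check`, `agent-port` and `agent-inter` are the 1.5 keywords), a backend enabling the agent might look like this:

```
backend app
    balance roundrobin
    # The agent may answer e.g. "up", "down", "ready", "drain",
    # "maint" or a weight such as "75%" -- and, with the dev26
    # updates, several of these combined in a single response.
    server app1 10.0.0.1:80 check agent-check agent-port 9999 agent-inter 5s
```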
100% CPU usage
Hey, I had asked earlier about fixing problems with 504 errors by increasing timeouts, which helped a great deal. The problem is CPU usage is up as high as 100% very frequently, which is worrying me. Is it possible that something else needs to be scaled down with the increase to the queue and server timeouts?

global
    user nobody
    group nobody
    daemon
    maxconn 204800
    tune.bufsize 16384        # 16k
    tune.rcvbuf.server 141312 # 128k

defaults
    mode http
    backlog 32768
    maxconn 204800
    timeout connect 120ms        # how long to try to connect to a backend
    timeout queue 10s            # how long a request can wait for a backend before 503ing
    timeout server 15s           # how long to wait for response from backend before 503ing
    timeout client 60s           # how long to wait for data from clients (exchanges)
    timeout http-keep-alive 60s  # how long to keep keepalive sessions when inactive
    option abortonclose
    no option forceclose
    option http-no-delay
    option nolinger

frontend openx
    bind *:9010
    default_backend bidder9010

backend bidder9010
    balance roundrobin
    server bid001 10.1.1.50:9010 weight 1 maxconn 51200 check
    server bid002 10.1.1.112:9010 weight 1 maxconn 51200 check
    server bid003 10.1.1.113:9010 weight 1 maxconn 51200 check
    server bid004 10.1.1.114:9010 weight 1 maxconn 51200 check
    server bid005 10.1.1.115:9010 weight 1 maxconn 51200 check
    server bid006 10.1.1.116:9010 weight 1 maxconn 51200 check
    server bid007 10.1.1.117:9010 weight 1 maxconn 51200 check
    server bid008 10.1.1.118:9010 weight 1 maxconn 51200 check
    server bid009 10.1.1.119:9010 weight 1 maxconn 51200 check
    server bid010 10.1.1.120:9010 weight 1 maxconn 51200 check
    server bid011 10.1.1.127:9010 weight 1 maxconn 51200 check
    server bid012 10.1.1.128:9010 weight 1 maxconn 51200 check
    server bid013 10.1.1.126:9010 weight 1 maxconn 51200 check
    server bid014 10.1.1.203:9010 weight 1 maxconn 51200 check
    server bid015 10.1.1.204:9010 weight 1 maxconn 51200 check
    server bid016 10.1.1.205:9010 weight 1 maxconn 51200 check

Thanks so much for all your help folks.
The only thing I can think of is that, with far fewer connections being rejected now, maybe haproxy is becoming more overloaded?
Re: 100% CPU usage
2014-05-28 18:56 GMT+02:00 Jon Bogaty j...@magnetic.com: Hey, I had asked earlier about fixing problems with 504 errors by increasing timeouts, which helped a great deal. The problem is CPU usage is up as high as 100% very frequently, which is worrying me. Is it possible that something else needs to be scaled down with the increase to the queue and server timeouts?

What is the hit rate reached when hitting 100% CPU? Depending on various factors, you can expect HAProxy to scale up to 10k requests/sec on a single core.

Olivier
RE: 100% CPU usage
Hi,

Hey, I had asked earlier about fixing problems with 504 errors by increasing timeouts, which helped a great deal. The problem is CPU usage is up as high as 100% very frequently, which is worrying me.

Haproxy (userspace) or system (kernel)? Does haproxy stop responding to requests, or is it still forwarding traffic? There are some important fixes in 1.4.25; I would suggest an upgrade, although there is no guarantee that it will fix the problem.

timeout connect 120ms

The suggestion is to configure slightly more than 3 seconds for this.

option abortonclose
no option forceclose
option http-no-delay
option nolinger

Do you really need those? These are some very specific parameters that are usually not required. What I would like to say is: there is a reason they are not configured this way by default; are you fully aware of the consequences of those parameters? If one of the parameters somehow triggers a bug, it needs to be fixed, but a sub-optimal configuration can easily lead to increased load.

Regards,
Lukas
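A hedged sketch of what Lukas suggests, as a config fragment. The exact values are illustrative, not from his mail: slightly more than 3 seconds covers one kernel SYN retransmit before haproxy gives up, and the `retries`/`option redispatch` lines are a common companion I've added for context, not part of his advice:

```
defaults
    mode http
    timeout connect 3100ms  # just above 3s: allows one SYN retransmit
    retries 3               # retry a failed connection a few times
    option redispatch       # allow trying another server on retry
```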
Re: 100% CPU usage
Yeah, I was taking a look through the stats after the question Olivier asked, and it doesn't seem like haproxy is actually handling anywhere near enough sessions to produce 100% CPU... It's in userspace running as nobody and does still seem to respond; it's just odd. https://www.dropbox.com/s/i95ucdj8am18ut1/haproxy%20stats.tiff I don't seem to be approaching any sort of threshold. The specific options were set because of the peculiar nature of the traffic I need to balance: we basically have connections that live for hours or more, with many very fast requests. There are some questions about whether we really need abortonclose, and why we disabled forceclose — could those be wrong for what we need here?
Re: [ANNOUNCE] haproxy-1.5-dev26 (and hopefully last)
On 28 May 2014 at 18:11 +0200, Willy Tarreau w...@1wt.eu wrote: Feedback welcome as usual,

When compiling with -Werror=format-security (which is a common setting on Debian-based distributions), we get:

src/dumpstats.c:3059:4: error: format not a string literal and no format arguments [-Werror=format-security]
  chunk_appendf(&trash, srv_hlt_st[1]); /* DOWN (agent) */
  ^

srv_hlt_st[1] is "DOWN %s/%s", so this is not even a false positive. I suppose this should be srv_hlt_st[0], but then it's better to just write "DOWN" (since that avoids the warning). It leads me to the next chunk of code:

chunk_appendf(&trash, srv_hlt_st[state],
              (ref->state != SRV_ST_STOPPED) ? (ref->check.health - ref->check.rise + 1) : (ref->check.health),
              (ref->state != SRV_ST_STOPPED) ? (ref->check.fall) : (ref->check.rise));

Not all members of srv_hlt_st have "%s/%s". I cannot say for sure how chunk_appendf works. Is it the caller or the callee that cleans up? I suppose that because of the varargs (...), it is automatically the caller, so the additional arguments are harmless.
--
panic("esp: what could it be... I wonder...");
2.2.16 /usr/src/linux/drivers/scsi/esp.c
Re: Rewrite domain.com to other domain.com/dir/subdir
On Wed, May 28, 2014 at 2:49 AM, Matt . yamakasi@gmail.com wrote: I'm still struggling here and also looking at whether Varnish can accomplish it.

What have you tried, and what part of that is not working as you expect?

I think HA proxy is the way as I also use it for normal loadbalancing, but this is another chapter for sure... Any help is welcome!

-Bryan
Re: Rewrite domain.com to other domain.com/dir/subdir
The normal redirect is working, but converting it to a rewrite is where I'm stuck. Should I use an ACL upfront that looks in the map and do an if on that, or is the ACL not needed at all? As I was busy looking at how Varnish can accomplish this (using a MySQL database), I need to check this again; I know I was stuck at that part already because the various examples do the same rewrites in different ways.
Re: [ANNOUNCE] haproxy-1.5-dev26 (and hopefully last)
On Wed, May 28, 2014 at 08:43:10PM +0200, Vincent Bernat wrote: When compiling with -Werror=format-security (which is a common setting on Debian-based distributions), we get:

src/dumpstats.c:3059:4: error: format not a string literal and no format arguments [-Werror=format-security]
  chunk_appendf(&trash, srv_hlt_st[1]); /* DOWN (agent) */
  ^

I'm getting the same error when building against Fedora rawhide.

Ryan
RE: HAProxy Hang during initial connection
Hi Willy,

Thanks, I'll send future traces to you directly. I understand the hatred of bulky email files! So I think I found the problem but would love your take on it. Our web applications and services in our haproxy backend are using keepalive in their connection headers. I understand that in haproxy v1.4 keepalives are OK from the client side but not from the server side, correct? So I added the http-server-close option on our haproxy web service server and it appears to have stopped this random half-loaded data stream issue. Can you explain how having keepalive coming from the server-side application connection headers could cause this issue? Could you give a brief description of what happens when haproxy receives a keepalive header but does not have the http-server-close option set? Thanks as always for your help!

-----Original Message----- From: Willy Tarreau [mailto:w...@1wt.eu] Sent: Wednesday, May 28, 2014 10:15 AM To: JDzialo John Cc: haproxy@formilux.org; AZabrecky Allan Subject: Re: HAProxy Hang during initial connection

Hi John, On Tue, May 27, 2014 at 08:08:27PM +, JDzialo John wrote: Hi Willy, Here is a capture of all traffic between the two servers using the host option. Thank you. Basically traffic goes from haproxy to a web farm in a round-robin fashion. These individual web servers are accessing a single file server (we are in the process of splitting this file server into multiple servers to spread the load). This one file server is getting slammed all day with requests, and that may be the root of the problem, but we have not found the smoking gun necessary to prove it. Should I also get a capture between our web farm and the file server? I have also cc'd our network administrator to help come to a solution. Let me know what you think and if there is anything else I can provide. As always, thank you so much for your help. You have been a great help to me in narrowing down issues.

I spent some time reading the captures and found nothing abnormal in them.
Do you have any indication of a faulty session or request? Also, I noticed that you took the captures on the server itself and that the server has TSO enabled, since we're seeing large frames. It's possible that there's a bug in the network driver or NIC causing some frames to be lost, for example. Maybe the same trace taken on the haproxy server at the same time would reveal some extra information. Note, you don't need to post the whole file to the list; there are about 800 people who are probably not interested in receiving this 6MB file :-) Either you can put it on a public server, or you can simply send it privately to me.

Thanks,
Willy
Issue with ssl_c_sha1
Hi, I am trying to extract the SHA1 hash of the client certificate and pass it to the backend server. My configuration has this line:

http-request set-header X-SSL-Client-SHA1 %{+Q}[ssl_c_sha1]

However, this does not seem to produce a string of the form "aabbcc..." as in the examples I've seen on the web. Instead, it appears to write the raw SHA1 hash bytes. The downstream server, node.js, appears to treat these values as UTF-8 strings. This is the version I am running:

./haproxy --version
HA-Proxy version 1.5-dev25-a339395 2014/05/10
Copyright 2000-2014 Willy Tarreau w...@1wt.eu

What am I doing wrong? Ideally I would like to get the SHA1 hash as a hex string.

Thanks,
-aydan
Re: [ANNOUNCE] haproxy-1.5-dev26 (and hopefully last)
Hi Vincent,

On Wed, May 28, 2014 at 08:43:10PM +0200, Vincent Bernat wrote: When compiling with -Werror=format-security (which is a common setting on Debian-based distributions), we get: src/dumpstats.c:3059:4: error: format not a string literal and no format arguments [-Werror=format-security] chunk_appendf(&trash, srv_hlt_st[1]); /* DOWN (agent) */ ^ srv_hlt_st[1] is "DOWN %s/%s", so this is not even a false positive. I suppose this should be srv_hlt_st[0], but then it's better to just write "DOWN" (since that avoids the warning).

Huh, no, here it's "DOWN (agent)". We don't even have %s; the only possible arguments are %d. Could you please double-check? Maybe you had local changes, I don't know, but I'm a bit confused.

It leads me to the next chunk of code: chunk_appendf(&trash, srv_hlt_st[state], (ref->state != SRV_ST_STOPPED) ? (ref->check.health - ref->check.rise + 1) : (ref->check.health), (ref->state != SRV_ST_STOPPED) ? (ref->check.fall) : (ref->check.rise)); Not all members of srv_hlt_st have "%s/%s". I cannot say for sure how chunk_appendf works. Is it the caller or the callee that cleans up? I suppose that because of the varargs (...), it is automatically the caller, so the additional arguments are harmless.

They're %d/%d, not %s/%s. The extra args are ignored when not used by the format string, just like printf does. In fact, chunk_appendf() does nothing special, it just uses vsnprintf(), which itself only parses the format arguments and pops them from the stack when needed.

Hoping this helps,
Willy
Re: [ANNOUNCE] haproxy-1.5-dev26 (and hopefully last)
On 28 May 2014 at 22:59 +0200, Willy Tarreau w...@1wt.eu wrote: Huh, no, here it's "DOWN (agent)". We don't even have %s; the only possible arguments are %d. Could you please double-check? Maybe you had local changes, I don't know, but I'm a bit confused.

You are right, I was looking at the wrong place in dumpstats.c. So, no bug, but the compiler is still not happy. What about providing an additional argument to chunk_appendf to let it know that this is handled correctly?
--
/* Identify the flock of penguins. */
2.2.16 /usr/src/linux/arch/alpha/kernel/setup.c
Re: [ANNOUNCE] haproxy-1.5-dev26 (and hopefully last)
On Wed, May 28, 2014 at 11:04:45PM +0200, Vincent Bernat wrote: You are right, I was looking at the wrong place in dumpstats.c. So, no bug, but the compiler is still not happy. What about providing an additional argument to chunk_appendf to let it know that this is handled correctly?

I'm really not fond of adding bugs on purpose to hide compiler bugs, because they tend to be fixed by the casual reader in the worst possible way... We've had our lot of gcc workarounds already, and each time it ended up in a spiral. I just tried here on 4.7 with the same flag and got the same result. I tried to force const in addition to static on the type declarations and it still fails, so we're clearly in front of a compiler bug. Not a big one, but an invalid check (or an excessive one, I don't know). Indeed, there's absolutely nothing wrong about writing:

const char *hello = "hello world\n";
printf(hello);

And when hello is a const, there's no risk that it will be modified at runtime, so basically the check is wrong here if it does not check the real definition of the static element. Do you have any idea how other programs usually deal with this strange check, if Debian always uses that flag?

Willy
Re: HAProxy Hang during initial connection
Hi John,

On Wed, May 28, 2014 at 07:54:20PM +, JDzialo John wrote:

OK so the client is not a browser, right? When you're using option httpclose only, haproxy just modifies the headers to add "close" to the request and to the response, but does not perform any active close. I observed in the distant past (8 years ago) that some servers would not honor the close and would expect the client to close after they got the complete response. That's when I added option forceclose, to close the server-side connection once the server starts to respond. By now we have a much more complete message parser which knows where the end is and which actively closes the connections as soon as you enable http-server-close or http-keep-alive (the latter is not available in 1.4). So what you found now is that your server ignores the close and expects the client to close, while the client expects the same from the server. Additionally, it's possible that your client uses excess buffering and does not get the whole response until it sees the real close (possibly the one caused by haproxy's timeout).
Usually these situations are detectable in haproxy's logs because you get very large total transfer times for a small server response time, and you notice that all transfer times are very close to the client or server timeout. Then you know that the connection remained open for that amount of time. Anyway, please use http-server-close or forceclose; it will do what you need by enforcing the close. I hope you get a clearer picture now.

Regards,
Willy
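What Willy recommends could look like this in a 1.4 configuration (a sketch; section placement and the commented alternative are illustrative):

```
defaults
    mode http
    # actively close the server-side connection once the response is
    # complete, instead of merely adding "Connection: close" headers
    option http-server-close
    # or, to actively close both sides:
    # option forceclose
```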
Re: Rewrite domain.com to other domain.com/dir/subdir
On Wed, May 28, 2014 at 11:57 AM, Matt . yamakasi@gmail.com wrote: The normal redirect is working, but converting it to a rewrite is where I'm stuck. Should I use an ACL upfront that looks in the map and do an if on that, or is the ACL not needed at all?

The example in the reqirep section of the documentation seems to mostly do what you're asking. http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#reqirep Does that not work? This will rewrite foo.com/baz.jpg -> newdomain.com/com/foo/baz.jpg

reqirep ^Host:\ foo.com Host:\ newdomain.com
reqirep ^GET\ /(.*) GET\ /com/foo/\1

-Bryan
Re: Rewrite domain.com to other domain.com/dir/subdir
Hi Bryan,

Yes, I got up to that part, but about the search in the map: do I need to do this twice?

2014-05-28 23:28 GMT+02:00 Bryan Talbot bryan.tal...@playnext.com: The example in the reqirep section of the documentation seems to mostly do what you're asking. http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#reqirep Does that not work?
Re: Issue with ssl_c_sha1
Hi,

On Wed, May 28, 2014 at 08:47:11PM +, Yumerefendi, Aydan wrote: What am I doing wrong? Ideally I would like to get the SHA1 hash as a hex string.

Indeed, the doc says it's binary, so if you want it in hex, you just need to chain the hex converter:

http-request set-header X-SSL-Client-SHA1 %{+Q}[ssl_c_sha1,hex]

The binary form is more suited to stick tables, for example, as it takes half the space. Do you think we could improve the doc one way or another to make this easier to find? Maybe with more examples? Do not hesitate to suggest adaptations or even patches!

Regards,
Willy
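Both forms side by side, as a sketch (the first header name is invented for contrast; the second line is Willy's suggestion in 1.5 syntax):

```
# raw binary digest -- compact, well suited to stick-tables
http-request set-header X-SSL-Client-SHA1-Bin %{+Q}[ssl_c_sha1]
# hex string of the form "AABBCC...", via the hex converter
http-request set-header X-SSL-Client-SHA1 %{+Q}[ssl_c_sha1,hex]
```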
Re: [ANNOUNCE] haproxy-1.5-dev26 (and hopefully last)
On 28 May 2014 at 23:16 +0200, Willy Tarreau w...@1wt.eu wrote: Indeed, there's absolutely nothing wrong about writing: const char *hello = "hello world\n"; printf(hello); And when hello is a const, there's no risk that it will be modified at runtime, so basically the check is wrong here if it does not check the real definition of the static element. Do you have any idea how other programs usually deal with this strange check, if Debian always uses that flag?

Usually, printf("%s", blah) (which is far better than my first proposed solution). const char *hello means hello is a pointer to a const char. You may want to say const char * const hello. But gcc doesn't seem to handle it either (but clang does).
--
panic("sun_82072_fd_inb: How did I get here?");
2.2.16 /usr/src/linux/include/asm-sparc/floppy.h
Re: [ANNOUNCE] haproxy-1.5-dev26 (and hopefully last)
On Thu, May 29, 2014 at 12:28:41AM +0200, Vincent Bernat wrote: Usually, printf("%s", blah) (which is far better than my first proposed solution).

Yes, but this is for a different thing: it's for when blah is a string and not a format. Here blah is a format.
And it seems that this check was done precisely to forbid using a variable format, but at the same time it does not check the format, so it's useless at best, and wrong at worst.

const char *hello means hello is a pointer to a const char. You may want to say const char * const hello. But gcc doesn't seem to handle it either (but clang does).

Yes it does, but it doesn't change its verdict. The test is really bogus I think:

const char fmt[] = "blah";              printf(fmt);    => OK
const char *fmt = "blah";               printf(fmt);    => KO
const char * const fmt = "blah";        printf(fmt);    => KO
const char fmt[][5] = { "blah" };       printf(fmt[0]); => KO

It's the difference between the first one and the last one which makes me say the test is bogus, because they are exactly the same. And the worst thing is that I guess they added this check for people who mistakenly use printf(string). And as usual, they don't provide an easy way to say "don't worry, it's not an error, it's on purpose"... This compiler is becoming more and more irritating; soon we'll have more lines of workarounds than useful lines of code. Worse in fact, the workaround is simple: it consists in removing the __attribute__((printf)) on the declaration line of chunk_appendf(), and thus *really* opening the door to real scary bugs. OK, so I'll add a dummy argument to shut it up :-(

Willy
Re: [ANNOUNCE] haproxy-1.5-dev26 (and hopefully last)
On 29 May 2014 at 01:04 +0200, Willy Tarreau w...@1wt.eu wrote: This compiler is becoming more and more irritating; soon we'll have more lines of workarounds than useful lines of code.

Well, this is something which has existed for a long time. At least 5 years.

Worse in fact, the workaround is simple: it consists in removing the __attribute__((printf)) on the declaration line of chunk_appendf(), and thus *really* opening the door to real scary bugs. OK, so I'll add a dummy argument to shut it up :-(

Or you could declare unsafe_chunk_appendf without the attribute, make chunk_appendf call this one, and hope the compiler will optimize the indirection away. But that's quite complex when you only need to add a dummy argument.
--
Each module should do one thing well.
            - The Elements of Programming Style (Kernighan & Plauger)