Re: 1.6.0 Error: Cannot Create Listening Socket for Frontend and Stats,Proxies
On Tue, Oct 20, 2015 at 12:54:48AM +0530, Susheel Jalali wrote: > Dear HAProxy Developers: > > The following error message appears with HAProxy 1.6.0 after start and > then the load balancer stops. No haproxy.pid is getting created. The > same configuration works seamlessly with HAProxy 1.5.14 on the same > server. We are seeking insights into what we could be missing in our > configuration? > > The port numbers below are dedicated to this HAProxy instance and only > one HAProxy instance is running. > > /var/log/messages > > Frontend: Cannot create listening socket (0.0.0.0:) > Frontend: Cannot create listening socket (0.0.0.0:) > Proxy for stats: Cannot create listening socket () This sounds like either another process is listening on the same ports, or that these are privileged ports and you're not starting it as root. Try to start it by hand in foreground with "-db", you'll see all the messages, maybe you'll see some warnings that you're missing here. Willy
Re: [call to comment] HAProxy's DNS resolution default query type
Hi all, Thanks a lot for your feedback. Really valuable. I'll discuss with Willy the best approach for the change. Baptiste On Mon, Oct 19, 2015 at 11:50 PM, Andrew Hayworth wrote: > Hi all - > > Just to chime in, we just got bit by this in production. Our dns > resolver (unbound) does not follow CNAMES -> A records when you send > an ANY query type. This is by design, so I can't just configure it > differently (and ripping out our DNS resolver is not immediately > feasible). > > I therefore vote to stop sending the ANY query type, and instead rely > on A and AAAA queries. I don't have any comments on behavior regarding > NX behavior. > > NB: There is also support amongst some bigger internet companies to > fully deprecate this query type: > https://blog.cloudflare.com/deprecating-dns-any-meta-query-type/ > > On Thu, Oct 15, 2015 at 12:49 PM, Lukas Tribus wrote: >>> I second this opinion. Removing ANY altogether would be the best case. >>> >>> In reality, I think it should use the OS's resolver libraries which >>> in turn will honor whatever the admin has configured for preference >>> order at the base OS level. >>> >>> >>> As a sysadmin, one should reasonably expect that tweaking the >>> preference knob at the OS level should affect most (and ideally, all) >>> applications they are running rather than having to manually fiddle >>> knobs at the OS and various application levels. >>> If there is some discussion and *good* reasons to ignore the OS >>> defaults, I feel this should likely be an *optional* config option >>> in haproxy.cfg ie "use OS resolver, unless specifically told not to >>> for $reason) >> >> Its exactly like you are saying. >> >> I don't think there is any doubt that HAproxy will bypass OS level >> resolvers, since you are statically configuring DNS server IPs in the >> haproxy configuration file. >> >> When you don't configure any resolvers, HAproxy does use libc's >> gethostbyname() or getaddrinfo(), but both are fundamentally broken. 
>> >> Thats why some applications have to implement there own resolvers >> (including nginx). >> >> First of all the OS resolver doesn't provide the TTL value. So you would >> have to guess or use fixed TTL values. Second, both calls are blocking, >> which is a big no-go for any event-loop based application (for this >> reason, it can only be queried at startup, not while the application >> is running). >> >> Just configure a hostname without resolver parameters, and haproxy >> will resolve your hostnames at startup via OS (and then maintain those >> IP's). >> >> >> Applications either have to implement a resolver on their own (haproxy, >> nginx), or use yet another external library, like getdnsapi [1]. >> >> >> The point is: there is a reason for this implementation, and you can >> fallback to OS resolvers without any problems (just with their drawbacks). >> >> >> >> >> Regards, >> >> Lukas >> >> >> [1] https://getdnsapi.net/ >> > > > > -- > - Andrew Hayworth
Re: [PATCH] MINOR: cli: ability to set per-server maxconn
Hi Andrew, On Mon, Oct 19, 2015 at 02:23:39PM -0500, Andrew Hayworth wrote: > In another thread "Dynamically change server maxconn possible?", > someone raised the possibility of setting a per-server maxconn via the > stats socket. I believe the below patch implements this functionality. > > I'd appreciate any feedback, since I'm not really familiar with this > part of the code. However, I've tested it by curling slow endpoints > (the nginx echo_sleep module, specifically) and can confirm that NOSRV > is returned appropriately according to whatever maxconn settings are > set via the socket. Thanks. Normally you also need to try to dequeue pending connections when changing the value, because if you increase the limit, you need to open the door for new connections. After changing the value, you normally need something like this : if (may_dequeue_tasks(srv, srv->proxy)) process_srv_queue(srv); Regards, Willy
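To illustrate why the queue has to be re-examined after the limit changes, here is a toy model in C. It is only a sketch: struct toy_server and the toy_* helpers are made up for this example and are not HAProxy's struct server, may_dequeue_tasks() or process_srv_queue(). They only mimic the idea that raising maxconn must let queued connections start.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a server; NOT HAProxy's struct server. */
struct toy_server {
    int maxconn;  /* concurrent connection limit, 0 means unlimited */
    int served;   /* connections currently being served */
    int queued;   /* connections waiting in the server queue */
};

/* Hypothetical helper: non-zero when a queued connection may start now. */
int toy_may_dequeue(const struct toy_server *s)
{
    return s->queued > 0 && (s->maxconn == 0 || s->served < s->maxconn);
}

/* Hypothetical helper: start queued connections while free slots remain.
 * Returns the number of connections that left the queue. */
int toy_process_queue(struct toy_server *s)
{
    int started = 0;

    while (toy_may_dequeue(s)) {
        s->queued--;
        s->served++;
        started++;
    }
    return started;
}

/* Change the limit, then drain the queue right away so that a raised
 * limit "opens the door" for connections that were already waiting. */
int toy_set_maxconn(struct toy_server *s, int maxconn)
{
    assert(s != NULL && maxconn >= 0);
    s->maxconn = maxconn;
    return toy_process_queue(s);
}
```

Without the toy_process_queue() call in toy_set_maxconn(), the waiting connections would stay queued until some unrelated event happened to wake the queue up. Lowering the limit starts nothing and leaves already-served connections alone.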
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
Hey Willy, Recursors are not required to recurse when serving an ANY query. ANY query means that you ask a server (either recursor or auth) for everything it has on label x. If it has a CNAME on that label just returning that is a valid response (just like would happen if you queried for the CNAME type at label x). However when you ask for an A or AAAA record a recursor is required to follow the CNAME. Welcome to the wonderful world of DNS which doesn't really make sense anymore to anyone ;). Like said in the other mailthread, ANY queries are just a very unreliable way to get the records/types you want. Just asking for the actual types, if necessary in multiple queries, is the way to go. DNS is (usually) fast enough that the one extra query really shouldn't matter that much. -Robin- On 10/20/2015 8:49 AM, Willy Tarreau wrote: Hi Andrew, On Mon, Oct 19, 2015 at 05:39:58PM -0500, Andrew Hayworth wrote: The ANY query type is weird, and some resolvers don't 'do the legwork' of resolving useful things like CNAMEs. Given that upstream resolver behavior is not always under the control of the HAProxy administrator, we should not use the ANY query type. Rather, we should use A or AAAA according to either the explicit preferences of the operator, or the implicit default (AAAA/IPv6). But how does that fix the problem for you ? In your example below, the server clearly doesn't provide any A nor AAAA in the response so asking it for A or AAAA should not work either if it doesn't recurse, am I wrong ? PRODUCTION! ahaywo...@secret-hostname.com:~$ dig @10.11.12.53 ANY api.somestartup.io ; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> @10.11.12.53 ANY api.somestartup.io ; (1 server found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62454 ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0 ;; QUESTION SECTION: ;api.somestartup.io. IN ANY ;; ANSWER SECTION: api.somestartup.io. 20 IN CNAME api-somestartup-production.ap-southeast-2.elb.amazonaws.com. (...) 
I fear that such a change will prevent CNAMEs from working for many users where the DNS servers work fine, and will not necessarily fix the problems for other people. Regards, willy
Re: 1.6.0 Error: Cannot Create Listening Socket for Frontend and Stats,Proxies
Dear Willy, Thank you for your insights. As you advised, below is the output of haproxy -f …cfg -db -V. We are starting HAProxy as root. There is no other application running on this server, which is dedicated to the load balancer. ‘netstat -apon’ suggests that these ports are not used by any other system process. HAProxy 1.6.0 was compiled from source in the server environment: CentOS 7.1, and dynamic loading of (Lua 5.3.1, PCRE 8.32, OpenSSL 1.0.1e, zlib 1.2.7) HAProxy 1.5.14 runs smoothly on this same server and with the same configuration. HAProxy configuration (same for both 1.5.14 and 1.6.0) is given below. Any insights / pointers for us to further investigate and resolve this issue would be appreciated. +++ Debug output of HAProxy 1.6.0 +++ Available polling systems : epoll : pref=300, test result OK poll : pref=200, test result OK select : pref=150, test result FAILED Total: 3 (2 usable), will use epoll. [ALERT] 292/025305 (4402) : Starting frontend webapps-frontend: cannot create listening socket [0.0.0.0:80] [ALERT] 292/025305 (4402) : Starting frontend webapps-frontend: cannot create listening socket [0.0.0.0:443] [ALERT] 292/025305 (4402) : Starting proxy haproxystats: cannot create listening socket [Server_IP:] Using epoll() as the polling mechanism. ++ Haproxy configuration ++ global log 127.0.0.1 local2 pidfile /var/run/haproxy.pid user haproxy group haproxy #daemon debug chroot /var/log/haproxy/ stats socket /var/log/haproxy/haproxy.stats defaults mode http option abortonclose option http-server-close [….] frontend webapps-frontend bind *:80 name xxx bind *:443 name yyy ssl crt /path/to/server.pem [….] listen haproxystats bind Server_IP: [….] Thank you. 
Sincerely, -- -- Susheel Jalali Coscend Communications Solutions susheel.jal...@coscend.com www.Coscend.com On 10/20/15 12:29, Willy Tarreau wrote: > On Tue, Oct 20, 2015 at 12:54:48AM +0530, Susheel Jalali wrote: >> Dear HAProxy Developers: >> >> The following error message appears with HAProxy 1.6.0 after start and >> then the load balancer stops. No haproxy.pid is getting created. The >> same configuration works seamlessly with HAProxy 1.5.14 on the same >> server. We are seeking insights into what we could be missing in our >> configuration? >> >> The port numbers below are dedicated to this HAProxy instance and only >> one HAProxy instance is running. >> >> /var/log/messages >> >> Frontend: Cannot create listening socket (0.0.0.0:) >> Frontend: Cannot create listening socket (0.0.0.0:) >> Proxy for stats: Cannot create listening socket () > > This sounds like either another process is listening on the same ports, > or that these are privileged ports and you're not starting it as root. > > Try to start it by hand in foreground with "-db", you'll see all the > messages, maybe you'll see some warnings that you're missing here. > > Willy >
RE: 1.6.0 Error: Cannot Create Listening Socket for Frontend and Stats,Proxies
> Dear Willy, > > Thank you for your insights. As you advised, below is the output of > haproxy -f …cfg -db -V. Can you run this through strace (strace haproxy -f …cfg -db -V) and provide the output. Also, if you have the strace output of a successful startup of 1.5.14 for comparison, that would be very helpful as well. Regards, Lukas
Re: Re: haproxy 1.6.0 crashes
Hi Rémi, On Tue, Oct 20, 2015 at 10:39:16AM +0200, Remi Gacogne wrote: > Hi, > > On 10/19/2015 05:01 PM, Willy Tarreau wrote: > >> [1] https://www.mail-archive.com/haproxy@formilux.org/msg19962.html > >> [2] https://www.mail-archive.com/haproxy@formilux.org/msg19995.html > > > > Regarding the second one, maybe Rémi's review could help. I noticed that > > you used gen_ssl_ctx_ptr_index = -1 which is the same value used for > > dh_params. Based on its name it makes me think it's a position in an > > array, so I'm not sure whether we can make them collide for example. I > > really don't know this API at all. > > I am not familiar with the generated certificate cache, so I will just > comment on the use of SSL_CTX_set_ex_data() and SSL_CTX_get_ex_data() in > the second patch, at least for now. Thank you, that was the part giving me some doubts. > The value of gen_ssl_ctx_ptr_index is correctly initialized to -1, and a > valid index is then obtained by calling SSL_CTX_get_ex_new_index() in > the __ssl_sock_init() constructor, so there should not be any collision > with other indexes. OK great then! > My only minor remark is that, though unlikely, > SSL_CTX_get_ex_new_index() might return -1 in case of error. In the DH > code we handle this by checking that ssl_dh_ptr_index is != -1 before > using it, and I think it would be better to do the same check before > using gen_ssl_ctx_ptr_index. Good catch indeed. Thanks for your insights. Best regards, Willy
Re: 1.6.0 Error: Cannot Create Listening Socket for Frontend and Stats,Proxies
On Tue, Oct 20, 2015 at 10:54:58AM +0200, Lukas Tribus wrote: > > Dear Willy, > > > > Thank you for your insights. As you advised, below is the output of > > haproxy -f …cfg -db -V. > > Can you run this through strace (strace haproxy -f …cfg -db -V) and > provide the output. > > Also, if you have the strace output of a successful startup of 1.5.14 for > comparison, that would be very helpful as well. Yes definitely. Actually I'm seeing one difference between the two versions, it's the introduction of namespaces in 1.6.0. If it was built with support for namespaces and they are not supported in the operating system, I'm not seeing how my_socketat() can recover in case setns() returns -1, which happens when default_namespace = -1, which is the default case before initialization : #ifdef CONFIG_HAP_NS if (default_namespace < 0 || (ns && setns(ns->fd, CLONE_NEWNET) == -1)) return -1; #endif That said we're not seeing any error there so I'm having doubts. Let's wait for strace output then. Willy
Re: Re: haproxy 1.6.0 crashes
Hi, On 10/19/2015 05:01 PM, Willy Tarreau wrote: >> [1] https://www.mail-archive.com/haproxy@formilux.org/msg19962.html >> [2] https://www.mail-archive.com/haproxy@formilux.org/msg19995.html > > Regarding the second one, maybe Rémi's review could help. I noticed that > you used gen_ssl_ctx_ptr_index = -1 which is the same value used for > dh_params. Based on its name it makes me think it's a position in an > array, so I'm not sure whether we can make them collide for example. I > really don't know this API at all. I am not familiar with the generated certificate cache, so I will just comment on the use of SSL_CTX_set_ex_data() and SSL_CTX_get_ex_data() in the second patch, at least for now. The value of gen_ssl_ctx_ptr_index is correctly initialized to -1, and a valid index is then obtained by calling SSL_CTX_get_ex_new_index() in the __ssl_sock_init() constructor, so there should not be any collision with other indexes. My only minor remark is that, though unlikely, SSL_CTX_get_ex_new_index() might return -1 in case of error. In the DH code we handle this by checking that ssl_dh_ptr_index is != -1 before using it, and I think it would be better to do the same check before using gen_ssl_ctx_ptr_index. Kind regards, Remi
Re: haproxy 1.6.0 crashes
Le 19/10/2015 17:01, Willy Tarreau a écrit : On Mon, Oct 19, 2015 at 03:06:44PM +0200, Christopher Faulet wrote: OK so the unused objects in the tree have a refcount of 1 while the used ones have 2 or more, thus the refcount is always valid. Good that also means we must not test if the tree is null or not in ssl_sock_close(), we must always free the ssl_ctx as long as it was dynamically created, so that its refcount decreases, otherwise it keeps increasing upon every reuse. No. Maybe my explanation was not really clear. The SSL_CTX refcount is not exposed. It is an internal parameter. So, it is not incremented when the SSL_CTX is pushed in the cache tree. The call to SSL_set_SSL_CTX increases the refcount and the call to SSL_free decrements it (when the connection is closed). And, of course, the call to SSL_CTX_free decrements it too. The SSL_CTX object is released when the refcount reaches 0. For a SSL_CTX object, SSL_CTX_free must be called once. When it is evicted from the cache tree (or when the tree is destroyed) _OR_ when the connection is closed if there is no cache tree. If we always release SSL_CTX objects when the SSL connection is closed, we will have undefined references for cached objects, leading to a segfault. OK, I understood the opposite, which is that we kept a refcount for each user (cache and/or sessions). But then how do we know that an SSL_CTX is still in use when we want to evict it from the cache and that we must not free it ? Is it just the fact that between the moment it's picked from the cache using ssl_sock_get_generated_cert() and the moment it's associated to a session using SSL_set_SSL_CTX() it's not possible to yield and destroy the cached object so no race is possible here ? If so I'm fine with it for now (though it will become "fun" when we start to play with threads), I just want to be certain we're not overlooking this part as well. 
This is not an issue because when we get (or create) a SSL_CTX object then it is associated to a session, without any interruption. So it cannot be evicted from the cache in the middle. After this step, the refcount is >= 2, so, if the SSL_CTX object is evicted from the cache, the refcount is decremented and the SSL_CTX is not released. It will be automatically released with the closure of the last SSL connection using it. But, now this works for a non-threaded environment. Is there any plan to add thread support? If yes, this feature will not work. Also that raises another point : if the issue is related to SSL_CTX_free() being called on static contexts, then to me it means that these contexts were not properly refcounted when assigned to the SSL. Don't you think that we shouldn't instead do something like the following to properly refcount any context attached to an SSL and ensure that the SSL_CTX_free() can always be performed regardless of parallel activities in the LRU tree or anything else ? /* Alloc a new SSL session ctx */ conn->xprt_ctx = SSL_new(objt_server(conn->target)->ssl_ctx.ctx); + SSL_set_SSL_CTX(conn->xprt_ctx, objt_server(conn->target)->ssl_ctx.ctx); This last call will have no effect. Because the SSL_CTX is the same, this function returns immediately. Note that if the context changes, the refcount of the old one is decremented. But there is no issue here, because the static contexts are only released when HAProxy is stopped. Here is live cycle of static contexts: - HAProxy is started, static contexts are initialized by calling SSL_CTX_new (refcount is set to 1). - SSL connections use these contexts. SSL_new or SSL_set_SSL_CTX are called to assign a context to a SSL object. The refcount is incremented by 1 each time. When a SSL connection is closed, a call to SSL_free is done to release the SSL object and the refcount of the associated context is decremented. So the refcount is always greater or equal to 1. 
- HAPRoxy is stopped, all connections are closed, and finally, static contexts are freed by calling SSL_CTX_free. The refcount is equal to 1, so when SSL_CTX_free is called, it reaches 0 and the contexts is freed. The refcount is not incremented when a SSL_CTX object is pushed in the cache. There is no way to manually increment or decrement it. So, we must really know if the SSL_CTX object was cached or not when the SSL connection is closed. I'm having an issue here as well since the LRU's destroy callback is set to SSL_CTX_free. This we start with a non-null refcount. I'm sorry if I am not clear, but the problem I'm having could be described like this : - two sets of entities can use a shared resource at any instant : cache and SSL sessions ; - each of them uses SSL_CTX_free() at release time to release the object ; - SSL_CTX_free() takes care of the refcount to know if it must free or not, which means that these two entities above are each responsible for one refcount point ; - the
Re: 1.6.0 Error: Cannot Create Listening Socket for Frontend and Stats,Proxies
On Tue, Oct 20, 2015 at 11:20:12AM +0200, Willy Tarreau wrote: > On Tue, Oct 20, 2015 at 10:54:58AM +0200, Lukas Tribus wrote: > > > Dear Willy, > > > > > > Thank you for your insights. As you advised, below is the output of > > > haproxy -f …cfg -db -V. > > > > Can you run this through strace (strace haproxy -f …cfg -db -V) and > > provide the output. > > > > Also, if you have the strace output of a successful startup of 1.5.14 for > > comparison, that would be very helpful as well. > > Yes definitely. Actually I'm seeing one difference between the two versions, > it's the introduction of namespaces in 1.6.0. If it was built with support > for namespaces and they are not supported in the operating system, I'm not > seeing how my_socketat() can recover in case setns() returns -1, which > happens when default_namespace = -1, which is the default case before > initialization : > > #ifdef CONFIG_HAP_NS > if (default_namespace < 0 || > (ns && setns(ns->fd, CLONE_NEWNET) == -1)) > return -1; > #endif OK it's clear there's a bug here in my opinion because default_namespace is *only* initialized if there are explicit namespaces. I could reproduce the issue here, you simply need to build with USE_NS=1 and to declare no namespace anywhere. Here's a proposed fix which works for me. Please confirm. Willy diff --git a/src/namespace.c b/src/namespace.c index a22f1a5..f1e81df 100644 --- a/src/namespace.c +++ b/src/namespace.c @@ -97,14 +97,13 @@ int my_socketat(const struct netns_entry *ns, int domain, int type, int protocol int sock; #ifdef CONFIG_HAP_NS - if (default_namespace < 0 || - (ns && setns(ns->fd, CLONE_NEWNET) == -1)) + if (default_namespace >= 0 && ns && setns(ns->fd, CLONE_NEWNET) == -1) return -1; #endif sock = socket(domain, type, protocol); #ifdef CONFIG_HAP_NS - if (ns && setns(default_namespace, CLONE_NEWNET) == -1) { + if (default_namespace >= 0 && ns && setns(default_namespace, CLONE_NEWNET) == -1) { close(sock); return -1; }
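The effect of the patch on the check done before socket() can be written down as two small predicates. This is only a sketch of the condition, not the real my_socketat(): the function names are invented for the example and setns() is replaced by an integer parameter so that the logic can be exercised without any namespace support.

```c
#include <assert.h>

/* Outcome of the check done before socket(): 0 = go on, -1 = give up.
 *   default_ns : value of default_namespace (-1 until a namespace is set up)
 *   has_ns     : non-zero when the caller passed a namespace entry
 *   setns_rc   : what setns() would return if it were called */
int check_before_fix(int default_ns, int has_ns, int setns_rc)
{
    if (default_ns < 0 || (has_ns && setns_rc == -1))
        return -1;
    return 0;
}

/* Same check with the patched condition: setns() is only attempted, and
 * can only fail the call, once default_namespace has been initialized. */
int check_after_fix(int default_ns, int has_ns, int setns_rc)
{
    if (default_ns >= 0 && has_ns && setns_rc == -1)
        return -1;
    return 0;
}

/* The reported case: built with USE_NS=1, no namespace declared anywhere,
 * so default_namespace is still -1 and no namespace entry is passed. */
void namespace_check_demo(void)
{
    assert(check_before_fix(-1, 0, 0) == -1); /* every socket is refused */
    assert(check_after_fix(-1, 0, 0) == 0);   /* plain socket() is reached */
}
```

When default_namespace is valid the two versions behave the same, so only builds that have namespace support compiled in but no namespace configured see a change.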
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
Hi Robin, [merging your reply and Lukas'] On Tue, Oct 20, 2015 at 08:59:27AM +0200, Robin Geuze wrote: > Hey Willy, > > Recursors are not required to recurse when serving an ANY query. ANY > query means that you ask a server (either recursor or auth) for > everything it has on label x. If it has a CNAME on that label just > returning that is a valid response (just like would happen if you > queried for the CNAME type at label x). However when you ask for an A or > AAAA record a recursor is required to follow the CNAME. Welcome to the > wonderful world of DNS which doesn't really make sense anymore to anyone ;). I didn't know the server was required to follow the CNAME, that's the info I was missing. Then of course it definitely makes sense. > Like said in the other mailthread, ANY queries are just a very > unreliable way to get the records/types you want. Just asking for the > actual types, if necessary in multiple queries, is the way to go. DNS is > (usually) fast enough that the one extra query really shouldn't matter > that much. Lukas: > I don't think this is CNAME specific. ANY will just return what the > recursor has in the cache. If it isn't in the cache, ANY won't make > the recursor ask upstream DNS servers, only A and AAAA (or MX or > any other real qtype) will. OK so you can get a random response based on whatever someone else asked this server in the past. This is very useful info. We'll discuss this with Baptiste, and very likely Andrew's patch will be taken as-is. Willy
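For readers who want to see what "asking for the actual type" means on the wire, here is a small sketch in C that encodes a DNS question section with an explicit A or AAAA qtype. It is not HAProxy's resolver code: the family_pref enum and both function names are assumptions made for this example. Only the numeric values (A = 1, AAAA = 28, ANY = 255, class IN = 1) come from the DNS specifications.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QTYPE_A     1    /* IPv4 address record */
#define QTYPE_AAAA  28   /* IPv6 address record */
#define QTYPE_ANY   255  /* the "give me everything" meta type discussed above */
#define QCLASS_IN   1

/* Hypothetical address family preference, for illustration only. */
enum family_pref { PREF_IPV4, PREF_IPV6 };

/* Pick a concrete qtype from the preference; QTYPE_ANY is never returned. */
uint16_t qtype_for(enum family_pref pref)
{
    return pref == PREF_IPV4 ? QTYPE_A : QTYPE_AAAA;
}

/* Encode the question section (QNAME, QTYPE, QCLASS) for `name` into `out`.
 * Returns the number of bytes written, or -1 on a bad name or short buffer. */
int encode_question(const char *name, uint16_t qtype, uint8_t *out, size_t outlen)
{
    size_t pos = 0;
    const char *label = name;

    assert(name != NULL && out != NULL);
    while (*label) {
        const char *dot = strchr(label, '.');
        size_t len = dot ? (size_t)(dot - label) : strlen(label);

        if (len == 0 || len > 63 || pos + 1 + len > outlen)
            return -1;
        out[pos++] = (uint8_t)len;        /* length-prefixed label */
        memcpy(out + pos, label, len);
        pos += len;
        label += len;
        if (*label == '.')
            label++;
    }
    if (pos + 5 > outlen)
        return -1;
    out[pos++] = 0;                       /* root label ends the name */
    out[pos++] = (uint8_t)(qtype >> 8);   /* QTYPE, network byte order */
    out[pos++] = (uint8_t)(qtype & 0xff);
    out[pos++] = 0;                       /* QCLASS high byte */
    out[pos++] = QCLASS_IN;
    return (int)pos;
}
```

If both families are wanted, the caller would send two such questions in two queries, one per qtype, which is the "one extra query" Robin mentions.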
Re: New 1.6 features overview?
On 20/10/2015 12:49 PM, SL wrote: > Hi, > > New 1.6 features look interesting from the news item. Is there a > comprehensive description of the new features anywhere? > > Thanks > > S There is this: http://blog.haproxy.com/2015/10/14/whats-new-in-haproxy-1-6/ Cheers, Pavlos
New 1.6 features overview?
Hi, New 1.6 features look interesting from the news item. Is there a comprehensive description of the new features anywhere? Thanks S
Re: haproxy 1.6.0 crashes
On Tue, Oct 20, 2015 at 02:14:37PM +0200, Christopher Faulet wrote: > Le 20/10/2015 14:07, Willy Tarreau a écrit : > >On Tue, Oct 20, 2015 at 01:59:52PM +0200, Willy Tarreau wrote: > >>Then my understanding is that we should instead proceed differently : > >> - the cert is generated. It gets a refcount = 1. > >> - we assign it to the SSL. Its refcount becomes two. > >> - we try to insert it into the tree. The tree will handle its freeing > >> using SSL_CTX_free() during eviction. > >> - if we can't insert into the tree because the tree is disabled, then > >> we have to call SSL_CTX_free() ourselves, then we'd rather do it > >> immediately. It will more closely mimmick the case where the cert > >> is added to the tree and immediately evicted by concurrent activity > >> on the cache. > >> - we never have to call SSL_CTX_free() during ssl_sock_close() because > >> the SSL session only relies on openssl doing the right thing based on > >> the refcount only. > >> - thus we never need to know how the cert was created since the > >> SSL_CTX_free() is either guaranteed or already done for generated > >> certs, and this protects other ones against any accidental call to > >> SSL_CTX_free() without having to track where the cert comes from. > > > >This patch does this, and based on my understanding of your explanations, > >it should do the right thing and be safe all the time. What's your opinion > >? > > > > Yes, it should work and it avoids keeping extra info on generated > certificates. Good idea ! Thanks. Do you have a easy reproducer for the issue with the certs ? I tried a little bit but probably didn't test the proper sequence. Willy
Re: haproxy 1.6.0 crashes
On Tue, Oct 20, 2015 at 01:59:52PM +0200, Willy Tarreau wrote: > Then my understanding is that we should instead proceed differently : > - the cert is generated. It gets a refcount = 1. > - we assign it to the SSL. Its refcount becomes two. > - we try to insert it into the tree. The tree will handle its freeing > using SSL_CTX_free() during eviction. > - if we can't insert into the tree because the tree is disabled, then > we have to call SSL_CTX_free() ourselves, then we'd rather do it > immediately. It will more closely mimmick the case where the cert > is added to the tree and immediately evicted by concurrent activity > on the cache. > - we never have to call SSL_CTX_free() during ssl_sock_close() because > the SSL session only relies on openssl doing the right thing based on > the refcount only. > - thus we never need to know how the cert was created since the > SSL_CTX_free() is either guaranteed or already done for generated > certs, and this protects other ones against any accidental call to > SSL_CTX_free() without having to track where the cert comes from. This patch does this, and based on my understanding of your explanations, it should do the right thing and be safe all the time. What's your opinion ? 
Thanks, Willy diff --git a/src/ssl_sock.c b/src/ssl_sock.c index 5319532..4eed2ea 100644 --- a/src/ssl_sock.c +++ b/src/ssl_sock.c @@ -1201,9 +1201,13 @@ ssl_sock_generate_certificate(const char *servername, struct bind_conf *bind_con ssl_ctx = ssl_sock_do_create_cert(servername, serial, bind_conf, ssl); lru64_commit(lru, ssl_ctx, cacert, 0, (void (*)(void *))SSL_CTX_free); } + SSL_set_SSL_CTX(ssl, ssl_ctx); } - else + else { ssl_ctx = ssl_sock_do_create_cert(servername, serial, bind_conf, ssl); + SSL_set_SSL_CTX(ssl, ssl_ctx); + SSL_CTX_free(ssl_ctx); + } return ssl_ctx; } @@ -1271,7 +1275,6 @@ static int ssl_sock_switchctx_cbk(SSL *ssl, int *al, struct bind_conf *s) if (s->generate_certs && (ctx = ssl_sock_generate_certificate(servername, s, ssl))) { /* switch ctx */ - SSL_set_SSL_CTX(ssl, ctx); return SSL_TLSEXT_ERR_OK; } return (s->strict_sni ? @@ -3123,13 +3126,6 @@ static int ssl_sock_from_buf(struct connection *conn, struct buffer *buf, int fl static void ssl_sock_close(struct connection *conn) { if (conn->xprt_ctx) { -#ifdef SSL_CTRL_SET_TLSEXT_HOSTNAME - if (!ssl_ctx_lru_tree && objt_listener(conn->target)) { - SSL_CTX *ctx = SSL_get_SSL_CTX(conn->xprt_ctx); - if (ctx != objt_listener(conn->target)->bind_conf->default_ctx) - SSL_CTX_free(ctx); - } -#endif SSL_free(conn->xprt_ctx); conn->xprt_ctx = NULL; sslconns--;
Re: haproxy 1.6.0 crashes
Le 20/10/2015 14:07, Willy Tarreau a écrit : On Tue, Oct 20, 2015 at 01:59:52PM +0200, Willy Tarreau wrote: Then my understanding is that we should instead proceed differently : - the cert is generated. It gets a refcount = 1. - we assign it to the SSL. Its refcount becomes two. - we try to insert it into the tree. The tree will handle its freeing using SSL_CTX_free() during eviction. - if we can't insert into the tree because the tree is disabled, then we have to call SSL_CTX_free() ourselves, then we'd rather do it immediately. It will more closely mimmick the case where the cert is added to the tree and immediately evicted by concurrent activity on the cache. - we never have to call SSL_CTX_free() during ssl_sock_close() because the SSL session only relies on openssl doing the right thing based on the refcount only. - thus we never need to know how the cert was created since the SSL_CTX_free() is either guaranteed or already done for generated certs, and this protects other ones against any accidental call to SSL_CTX_free() without having to track where the cert comes from. This patch does this, and based on my understanding of your explanations, it should do the right thing and be safe all the time. What's your opinion ? Yes, it should work and it avoids keeping extra info on generated certificates. Good idea ! -- Christopher Faulet
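The sequence agreed on above can be replayed with a toy reference counter. This is a sketch only: struct toy_ctx and the toy_* functions are invented for the example and merely imitate the refcount behaviour described in this thread for SSL_CTX_new(), SSL_set_SSL_CTX(), SSL_CTX_free() and SSL_free(). No OpenSSL call is made.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy context with an OpenSSL-like reference count; NOT the real SSL_CTX. */
struct toy_ctx {
    int refcount;
    int *freed;  /* set to 1 when the context memory is released */
};

/* Models context creation: the creator holds the first reference. */
struct toy_ctx *toy_ctx_new(int *freed_flag)
{
    struct toy_ctx *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    c->refcount = 1;
    c->freed = freed_flag;
    *freed_flag = 0;
    return c;
}

/* Models attaching the context to a session: one more reference. */
void toy_attach(struct toy_ctx *c)
{
    c->refcount++;
}

/* Models dropping one reference, whether by the cache's destroy callback,
 * by the creator, or by a session being closed. */
void toy_release(struct toy_ctx *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0) {
        *c->freed = 1;
        free(c);
    }
}

/* The proposed sequence for a generated certificate. Returns the refcount
 * left once the session is attached, or -1 on allocation failure. */
int toy_generate(struct toy_ctx **out, int *freed_flag, int cache_enabled)
{
    struct toy_ctx *c = toy_ctx_new(freed_flag);

    if (!c)
        return -1;
    toy_attach(c);        /* creator + session: refcount is 2 */
    if (!cache_enabled)
        toy_release(c);   /* no cache to own it: drop the creator's ref now */
    *out = c;
    return c->refcount;
}
```

With the cache disabled the session holds the only remaining reference, so closing it frees the context. With the cache enabled the context survives until both the eviction and the session close have happened, in either order, which is why the close path no longer needs to know where the context came from.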
Re: haproxy 1.6.0 crashes
Hi Christopher,

On Tue, Oct 20, 2015 at 01:32:57PM +0200, Christopher Faulet wrote:
> >But then how do we know that an SSL_CTX is still in use when we want to
> >evict it from the cache and that we must not free it ? Is it just the
> >fact that between the moment it's picked from the cache using
> >ssl_sock_get_generated_cert() and the moment it's associated to a session
> >using SSL_set_SSL_CTX() it's not possible to yield and destroy the cached
> >object so no race is possible here ? If so I'm fine with it for now (though
> >it will become "fun" when we start to play with threads), I just want to
> >be certain we're not overlooking this part as well.
>
> This is not an issue because when we get (or create) a SSL_CTX object
> then it is associated to a session, without any interruption. So it
> cannot be evicted from the cache in the middle.
> After this step, the refcount is >= 2, so, if the SSL_CTX object is
> evicted from the cache, the refcount is decremented and the SSL_CTX is
> not released. It will be automatically released with the closure of the
> last SSL connection using it.

OK.

> But, now this works for a non-threaded environment. Is there any plan to
> add thread support? If yes, this feature will not work.

Sure, we expected to be able to make some progress towards this in 1.6, so now this is postponed to 1.7. We're at least trying not to make the situation worse than it currently is :-)

> >Also that raises another point : if the issue is related to SSL_CTX_free()
> >being called on static contexts, then to me it means that these contexts
> >were not properly refcounted when assigned to the SSL. Don't you think
> >that we should instead do something like the following to properly
> >refcount any context attached to an SSL and ensure that the SSL_CTX_free()
> >can always be performed regardless of parallel activities in the LRU tree
> >or anything else ?
> >
> >  /* Alloc a new SSL session ctx */
> >  conn->xprt_ctx = SSL_new(objt_server(conn->target)->ssl_ctx.ctx);
> >+ SSL_set_SSL_CTX(conn->xprt_ctx, objt_server(conn->target)->ssl_ctx.ctx);
>
> This last call will have no effect. Because the SSL_CTX is the same,
> this function returns immediately. Note that if the context changes, the
> refcount of the old one is decremented.

OK so the cert's refcount is already incremented by SSL_new(), then I don't understand why the SSL_CTX_free() fails in ssl_sock_close() since it should only decrease the refcount from what I understand. Or maybe it is just because we're not allowed to call it twice (which makes sense to me) ?

> But there is no issue here, because the static contexts are only
> released when HAProxy is stopped.

Sure but I thought we should ensure that all SSL_CTX are properly refcounted instead of handling some of them one way and the other ones another way. I mean, if openssl provides refcounting for us, better use it globally than just for certain certs.

> Here is the life cycle of static contexts:
>
> - HAProxy is started, static contexts are initialized by calling
> SSL_CTX_new (refcount is set to 1).

OK.

> - SSL connections use these contexts. SSL_new or SSL_set_SSL_CTX are
> called to assign a context to a SSL object. The refcount is incremented
> by 1 each time. When a SSL connection is closed, a call to SSL_free is
> done to release the SSL object and the refcount of the associated
> context is decremented. So the refcount is always greater or equal to 1.

OK.

> - HAProxy is stopped, all connections are closed, and finally, static
> contexts are freed by calling SSL_CTX_free. The refcount is equal to 1,
> so when SSL_CTX_free is called, it reaches 0 and the context is freed.

OK. My understanding here (and what you already explained) is that it's really freed once it reaches zero, either upon SSL_CTX_free() or upon SSL_free(). So for certs belonging to the cache, in fact we have one call to SSL_CTX_free() then one to many calls to SSL_free().

> >I'm having an issue here as well since the LRU's destroy callback is set
> >to SSL_CTX_free. Thus we start with a non-null refcount. I'm sorry if I am
> >not clear, but the problem I'm having could be described like this :
> >
> > - two sets of entities can use a shared resource at any instant : cache
> >   and SSL sessions ;
> > - each of them uses SSL_CTX_free() at release time to release the
> >   object ;
> > - SSL_CTX_free() takes care of the refcount to know if it must free or
> >   not, which means that these two entities above are each responsible
> >   for one refcount point ;
> > - the SSL_CTX_free() called by the cache is unconditional when the
> >   object is evicted from the cache ;
> > - the SSL_CTX_free() is only done if the cache is enabled ;
>
> This last step is wrong. SSL_CTX_free is only done if the cache is
> _DISABLED_ or NULL if you prefer. SSL_CTX_free is called when
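The reference-counting life cycle discussed in this message can be sketched as a small stand-alone C model. This is only an illustration under stated assumptions: `struct ctx` and the `ctx_*` helpers are hypothetical stand-ins for SSL_CTX, SSL_CTX_new(), SSL_new()/SSL_set_SSL_CTX() and SSL_CTX_free()/SSL_free(); no OpenSSL code is called.

```c
#include <assert.h>

/* Hypothetical stand-in for SSL_CTX: only the reference count is modelled. */
struct ctx {
    int refcount;   /* number of owners still holding the object */
    int freed;      /* becomes 1 once the last reference is dropped */
};

/* Models SSL_CTX_new(): the creator (here, the cache) owns one reference. */
static void ctx_init(struct ctx *c)
{
    c->refcount = 1;
    c->freed = 0;
}

/* Models SSL_new() or SSL_set_SSL_CTX(): a session takes one reference. */
static void ctx_attach_session(struct ctx *c)
{
    assert(!c->freed);
    c->refcount++;
}

/* Models SSL_CTX_free(), and the release performed by SSL_free():
 * drop one reference and destroy the object when the count reaches zero. */
static void ctx_release(struct ctx *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0)
        c->freed = 1;
}

/* Scenario from the thread: a cached context is used by `sessions`
 * connections, then evicted from the cache, then the connections close.
 * Returns 1 if the context survived the eviction and was freed only
 * when the last connection closed. */
static int cached_ctx_scenario(int sessions)
{
    struct ctx c;
    int i, survived_eviction;

    assert(sessions >= 1);
    ctx_init(&c);                       /* reference held by the cache */
    for (i = 0; i < sessions; i++)
        ctx_attach_session(&c);         /* one reference per connection */

    ctx_release(&c);                    /* eviction: cache drops its reference */
    survived_eviction = !c.freed;

    for (i = 0; i < sessions; i++)
        ctx_release(&c);                /* connections close one by one */

    return survived_eviction && c.freed && c.refcount == 0;
}
```

In this model each owner (cache or session) is responsible for exactly one reference, which is the invariant both participants describe above.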
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
Hi Andrew,

There is a bug repeated twice in your code. In both dns_reset_resolution() and trigger_resolution(), you use "resolution->resolver_family_priority" before it is set. This may lead to using the value left in resolution->resolver_family_priority by the last resolution, which may be different from the server's one. Please move the line "resolution->resolver_family_priority = s->resolver_family_priority;" before using the value stored in it.

Apart from this, it looks good.

Baptiste

On Tue, Oct 20, 2015 at 12:39 AM, Andrew Hayworth wrote:
> The ANY query type is weird, and some resolvers don't 'do the legwork'
> of resolving useful things like CNAMEs. Given that upstream resolver
> behavior is not always under the control of the HAProxy administrator,
> we should not use the ANY query type. Rather, we should use A or AAAA
> according to either the explicit preferences of the operator, or the
> implicit default (AAAA/IPv6).
>
> - Andrew Hayworth
>
> From 8ed172424cbd79197aacacd1fd89ddcfa46e213d Mon Sep 17 00:00:00 2001
> From: Andrew Hayworth
> Date: Mon, 19 Oct 2015 22:29:51 +
> Subject: [PATCH] MEDIUM: dns: Don't use the ANY query type
>
> Basically, it's ill-defined and shouldn't really be used going forward.
> We can't guarantee that resolvers will do the 'legwork' for us and
> actually resolve CNAMES when we request the ANY query-type. Case in point
> (obfuscated, clearly):
>
> PRODUCTION! ahaywo...@secret-hostname.com:~$
> dig @10.11.12.53 ANY api.somestartup.io
>
> ; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> @10.11.12.53 ANY api.somestartup.io
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62454
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;api.somestartup.io. IN ANY
>
> ;; ANSWER SECTION:
> api.somestartup.io. 20 IN CNAME api-somestartup-production.ap-southeast-2.elb.amazonaws.com.
>
> ;; AUTHORITY SECTION:
> somestartup.io. 166687 IN NS ns-1254.awsdns-28.org.
> somestartup.io. 166687 IN NS ns-1884.awsdns-43.co.uk.
> somestartup.io. 166687 IN NS ns-440.awsdns-55.com.
> somestartup.io. 166687 IN NS ns-577.awsdns-08.net.
>
> ;; Query time: 1 msec
> ;; SERVER: 10.11.12.53#53(10.11.12.53)
> ;; WHEN: Mon Oct 19 22:02:29 2015
> ;; MSG SIZE rcvd: 242
>
> HAProxy can't handle that response correctly.
>
> Rather than try to build in support for resolving CNAMEs presented
> without an A record in an answer section (which may be a valid
> improvement further on), this change just skips ANY record types
> altogether. A and AAAA are much more well-defined and predictable.
>
> Notably, this commit preserves the implicit "Prefer IPV6 behavior."
> ---
> include/types/dns.h | 3 ++-
> src/checks.c | 6 +-
> src/dns.c | 6 +-
> src/server.c | 18 +++---
> 4 files changed, 19 insertions(+), 14 deletions(-)
>
> diff --git a/include/types/dns.h b/include/types/dns.h
> index f8edb73..ea1a9f9 100644
> --- a/include/types/dns.h
> +++ b/include/types/dns.h
> @@ -161,7 +161,8 @@ struct dns_resolution {
>   unsigned int last_status_change; /* time of the latest DNS resolution status change */
>   int query_id; /* DNS query ID dedicated for this resolution */
>   struct eb32_node qid; /* ebtree query id */
> - int query_type; /* query type to send. By default DNS_RTYPE_ANY */
> + int query_type;
> + /* query type to send. By default DNS_RTYPE_A or DNS_RTYPE_AAAA depending on resolver_family_priority */
>   int status; /* status of the resolution being processed RSLV_STATUS_* */
>   int step; /* */
>   int try; /* current resolution try */
> diff --git a/src/checks.c b/src/checks.c
> index ade2428..d3cd567 100644
> --- a/src/checks.c
> +++ b/src/checks.c
> @@ -2214,7 +2214,11 @@ int trigger_resolution(struct server *s)
>   resolution->query_id = query_id;
>   resolution->qid.key = query_id;
>   resolution->step = RSLV_STEP_RUNNING;
> - resolution->query_type = DNS_RTYPE_ANY;
> + if (resolution->resolver_family_priority == AF_INET) {
> + resolution->query_type = DNS_RTYPE_A;
> + } else {
> + resolution->query_type = DNS_RTYPE_AAAA;
> + }
>   resolution->try = resolvers->resolve_retries;
>   resolution->try_cname = 0;
>   resolution->nb_responses = 0;
> diff --git a/src/dns.c b/src/dns.c
> index 7f71ac7..53b65ab 100644
> --- a/src/dns.c
> +++ b/src/dns.c
> @@ -102,7 +102,11 @@ void dns_reset_resolution(struct dns_resolution *resolution)
>   resolution->qid.key = 0;
>
>   /* default values */
> - resolution->query_type = DNS_RTYPE_ANY;
> + if (resolution->resolver_family_priority == AF_INET) {
> + resolution->query_type = DNS_RTYPE_A;
> + } else {
> + resolution->query_type = DNS_RTYPE_AAAA;
> + }
>
>   /* the second
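The ordering problem raised in the review can be illustrated with a minimal, self-contained sketch. The struct and function names below (`resolution_sketch`, `server_sketch`, `reset_resolution_for`) are hypothetical and are not HAProxy's real types; only the order of the two assignments is the point.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* DNS record type codes: A is 1 and AAAA is 28. */
enum { RTYPE_A = 1, RTYPE_AAAA = 28 };

/* Hypothetical, reduced versions of the structures discussed above. */
struct server_sketch {
    int resolver_family_priority;   /* AF_INET or AF_INET6 */
};

struct resolution_sketch {
    int resolver_family_priority;   /* may hold a stale value from a previous run */
    int query_type;
};

/* Copy the server's preference first, and only then derive the query type.
 * Doing it in the opposite order would pick the query type from whatever
 * value the previous resolution left behind. */
static void reset_resolution_for(struct resolution_sketch *r,
                                 const struct server_sketch *s)
{
    assert(r != NULL && s != NULL);

    r->resolver_family_priority = s->resolver_family_priority;

    if (r->resolver_family_priority == AF_INET)
        r->query_type = RTYPE_A;
    else
        r->query_type = RTYPE_AAAA;
}
```

With a stale AF_INET6 value in the resolution and a server that prefers AF_INET, this ordering selects an A query, which is the behaviour the review asks for.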
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
Hi Andrew,

On Mon, Oct 19, 2015 at 05:39:58PM -0500, Andrew Hayworth wrote:
> The ANY query type is weird, and some resolvers don't 'do the legwork'
> of resolving useful things like CNAMEs. Given that upstream resolver
> behavior is not always under the control of the HAProxy administrator,
> we should not use the ANY query type. Rather, we should use A or AAAA
> according to either the explicit preferences of the operator, or the
> implicit default (AAAA/IPv6).

But how does that fix the problem for you ? In your example below, the server clearly doesn't provide any A nor AAAA in the response so asking it for A or AAAA should not work either if it doesn't recurse, am I wrong ?

> PRODUCTION! ahaywo...@secret-hostname.com:~$
> dig @10.11.12.53 ANY api.somestartup.io
>
> ; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> @10.11.12.53 ANY api.somestartup.io
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62454
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;api.somestartup.io. IN ANY
>
> ;; ANSWER SECTION:
> api.somestartup.io. 20 IN CNAME api-somestartup-production.ap-southeast-2.elb.amazonaws.com.
(...)

I fear that such a change will prevent CNAMEs from working for many users where the DNS servers work fine, and will not necessarily fix the problems for other people.

Regards,
willy
RE: [PATCH] MEDIUM: dns: Don't use the ANY query type
> Hi Andrew,
>
> On Mon, Oct 19, 2015 at 05:39:58PM -0500, Andrew Hayworth wrote:
>> The ANY query type is weird, and some resolvers don't 'do the legwork'
>> of resolving useful things like CNAMEs. Given that upstream resolver
>> behavior is not always under the control of the HAProxy administrator,
>> we should not use the ANY query type. Rather, we should use A or AAAA
>> according to either the explicit preferences of the operator, or the
>> implicit default (AAAA/IPv6).
>
> But how does that fix the problem for you ? In your example below,
> the server clearly doesn't provide any A nor AAAA in the response
> so asking it for A or AAAA should not work either if it doesn't
> recurse, am I wrong ?

I don't think this is CNAME specific. ANY will just return what the recursor has in the cache. If it isn't in the cache, ANY won't make the recursor ask upstream DNS servers, only A and AAAA (or MX or any other real qtype) will.

Just switching away from ANY is not enough, we still need to fall back from AAAA to A and vice versa on NX responses for single-homed nodes.

Lukas
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
On Tue, Oct 20, 2015 at 07:36:20PM +0200, Lukas Tribus wrote:
> Hi,
>
> >> A simple option in the resolvers section to instruct HAProxy to not
> >> forgive on NX and failover to next family:
> >> option on-nx-try-next-family
> >
> > I personally find this confusing from the user's point of view.
>
> Agreed, we should have good and safe defaults, and address corner
> cases with additional options, not the other way around.

Definitely. We knew this situation up to and including 1.4, with http working in tunnel mode by default and not doing the right thing by default. Many of the problem reports were caused by the fact that newcomers didn't know such specificities.

> > When you know that you can only use IPv4 to join the next server, I
> > think this :
> >
> > server remote1 remote1.mydomain check v4only
> >
> > is more obvious than this :
> >
> > option on-nx-try-next-family
> > server remote1 remote1.mydomain check prefer-ipv4
>
> Actually I think "v4only" would be "prefer-ipv4" without
> on-nx-try-next-family, right? Anyway, I agree.

Yes that's it.

> Without automatic AF fallback and without ANY queries, the
> "prefer" keyword actually is restricting, and not preferring.

I would simply ignore prefer when v[46]only is set.

> > Also, it covers the case where some servers are known to support both
> > protocols while others are limited. This allows for example to join
> > the same remote server over two possible families behind a DSL line
> > which uses a random IP address after each reconnection :
> >
> > server home-v4 home-v4.mydomain check v4only
> > server home-v6 home-v6.mydomain check v6only
> >
> > And since we already have v4only/v6only on bind lines, the analogy
> > seems easy to remember.
>
> The behavior with v4only or v6only is quite obvious, we just query that
> particular address family, but let me clarify: you are implying that
> without v4only/v6only keyword, we query one address family and then
> fallback to the other address family in case we get a NX response, right?

Yes that's it. And in this case it's prefer that gives the ordering.

> I think that's a good solution.

Thanks :-)

> Question: are we still talking about 1.6 here? It seems we have to
> make some intrusive changes that may break configurations (but they
> seem mandatory to get consistent and predictable behavior).

I don't know. I'm always only focused on the combination of user-visible changes and risks of bugs (which are user-visible changes btw). So if we can do it without breaking too much code, then it can be backported. What we have now is something which is apparently insufficient to some users so we can improve the situation. I wouldn't want to remove prefer-* or change the options behaviour or whatever for example.

> By the amount of people that already hit the ANY issue (3 or more?),
> I would say we better break a small number of configurations between
> 1.6 and 1.6.1,

Normally we don't break them. Currently prefer can pick any of two families after a response to an ANY request, which is what is still currently being done. It only doesn't retry after it tries a specific family. The only difference will be that if a config reports NX for, say, A, then today it doesn't retry and will cause the server to fail while after the change it will allow the server to continue in v6 or to fail as well. So with this change we will be very close to the current behaviour, and offer everyone the option to fix their preference and make them restrictions.

> than having to deal with the fallout of the ANY issue
> (because the ANY removal changes resolve-prefer behavior as well)
> for the time that 1.6 is supported.

Absolutely. We had such related discussions with Baptiste during the design and for many such choices, it was hard to get responses from the users asking for the feature (and several of them were disagreeing on a number of possibilities). It's common in fact, people want something "very simple" and overlook the hidden complexities since corner cases are just for others, they don't happen for them. So we decided to go the modest route and see if anything required to be enforced. I still think it was the best option.

Thanks very much for sharing your opinion, that definitely helps!

Willy
RE: [PATCH] MEDIUM: dns: Don't use the ANY query type
> I don't know. I'm always only focused on the combination of user-visible > changes and risks of bugs (which are user-visible changes btw). So if we > can do it without breaking too much code, then it can be backported. What > we have now is something which is apparently insufficient to some users > so we can improve the situation. I wouldn't want to remove prefer-* or > change the options behavior or whatever for example. Ok, if we don't remove existing prefer-* keywords a 1.6 backport sounds possible without user visible breakage, great. lukas
RE: [PATCH] MEDIUM: dns: Don't use the ANY query type
Hi,

>> A simple option in the resolvers section to instruct HAProxy to not
>> forgive on NX and failover to next family:
>> option on-nx-try-next-family
>
> I personally find this confusing from the user's point of view.

Agreed, we should have good and safe defaults, and address corner cases with additional options, not the other way around.

> When you know that you can only use IPv4 to join the next server, I
> think this :
>
> server remote1 remote1.mydomain check v4only
>
> is more obvious than this :
>
> option on-nx-try-next-family
> server remote1 remote1.mydomain check prefer-ipv4

Actually I think "v4only" would be "prefer-ipv4" without on-nx-try-next-family, right? Anyway, I agree.

Without automatic AF fallback and without ANY queries, the "prefer" keyword actually is restricting, and not preferring.

> Also, it covers the case where some servers are known to support both
> protocols while others are limited. This allows for example to join
> the same remote server over two possible families behind a DSL line
> which uses a random IP address after each reconnection :
>
> server home-v4 home-v4.mydomain check v4only
> server home-v6 home-v6.mydomain check v6only
>
> And since we already have v4only/v6only on bind lines, the analogy
> seems easy to remember.

The behavior with v4only or v6only is quite obvious, we just query that particular address family, but let me clarify: you are implying that without v4only/v6only keyword, we query one address family and then fallback to the other address family in case we get a NX response, right?

I think that's a good solution.

Question: are we still talking about 1.6 here? It seems we have to make some intrusive changes that may break configurations (but they seem mandatory to get consistent and predictable behavior).

By the amount of people that already hit the ANY issue (3 or more?), I would say we better break a small number of configurations between 1.6 and 1.6.1, than having to deal with the fallout of the ANY issue (because the ANY removal changes resolve-prefer behavior as well) for the time that 1.6 is supported.

Regards,
Lukas
TCP raw socket data compress with haproxy
Hi,

We want to use zlib to compress/uncompress TCP data between TCP sessions. There is only compression code for HTTP but not for TCP. I did some research and ran into the problem of the lack of a chunk size. Is there any sample or development for this scenario?

We are using stunnel currently. I think it uses a high-level protocol for compress and uncompress on top of TCP.

Thanks for all your help.
Re: TCP raw socket data compress with haproxy
Hi,

On Tue, Oct 20, 2015 at 06:39:17PM +0300, Tufan Gürsu wrote:
> Hi,
>
> We want to use zlib to compress/uncompress tcp data between tcp session.
> There is only compression code for http but not for tcp. I did some
> research and I encountered problem of the lack of chunk size.
> Is there any sample or development for this scenario?
> We are using stunnel currently. I think it uses high level protocol for
> compress and uncompress on top of TCP.

There's nothing planned regarding this and no standard way to achieve it either. Also, using zlib to compress live streams is really not a good idea considering how slow it is compared to better-suited algorithms such as LZO, LZ4, zstd, snappy, etc. that can be up to 20 times faster.

Regards,
Willy
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
On Tue, Oct 20, 2015 at 06:26:38PM +0200, Baptiste wrote:
> > Also, we will have to address the issue that a server may just use
> > a single address-family, therefore we have to fallback between A
> > and AAAA, because a NX on a AAAA query doesn't mean there are no
> > A records.
>
> Hi Lukas,
>
> I do agree on this point.
> A simple option in the resolvers section to instruct HAProxy to not
> forgive on NX and failover to next family:
> option on-nx-try-next-family

I personally find this confusing from the user's point of view. My translation of what it does is to allow cross-family requests instead of limiting requests to the preferred family. Thus it seems to me that users will have to carefully configure both this option and the prefer field to select the required family. In practice I guess most people will simply want "v4only" or "v6only" as alternatives to "prefer-ipv4" or "prefer-ipv6".

When you know that you can only use IPv4 to join the next server, I think this :

server remote1 remote1.mydomain check v4only

is more obvious than this :

option on-nx-try-next-family
server remote1 remote1.mydomain check prefer-ipv4

Also, it covers the case where some servers are known to support both protocols while others are limited. This allows for example to join the same remote server over two possible families behind a DSL line which uses a random IP address after each reconnection :

server home-v4 home-v4.mydomain check v4only
server home-v6 home-v6.mydomain check v6only

And since we already have v4only/v6only on bind lines, the analogy seems easy to remember.

Willy
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
> Also, we will have to address the issue that a server may just use
> a single address-family, therefore we have to fallback between A
> and AAAA, because a NX on a AAAA query doesn't mean there are no
> A records.

Hi Lukas,

I do agree on this point.
A simple option in the resolvers section to instruct HAProxy to not forgive on NX and failover to next family:
option on-nx-try-next-family

The magic should happen in snr_resolution_error_cb().

Baptiste
Re: 1.6.0 Error: Cannot Create Listening Socket for Frontend and Stats,Proxies
Dear Willy and Lukas,

Thank you for your guidance. Upon implementing your insights, here is the summed up result:

(1) With Willy’s patch HAProxy starts
(2) But we have to remove listen haproxystats block, as it still cannot create listening socket for this listen proxy.

Detailed information you have requested is below without and with Willy’s patch. Your further guidance to fix the item (2) above would be appreciated. Thank you.

---

@Willy: HAProxy is running on two of our servers that have CentOS 7.1 (Server1) and CentOS 7 (Server2), which support namespace. Checked with: ip netns add namespace1 and then ip netns list --> This listed namespace1

WITHOUT Willy’s patch

@Lukas and Willy
We installed HAProxy 1.5.14 and 1.6.0 on both Server1 and Server2.
1.5.14 starts and functions normally on both.
1.6.0 fails after start without creating pid.

ATTACHED are strace output from running the following on the same server with the same configuration (including port numbers):
HAProxy 1.5.14
HAProxy 1.6.0

WITH Willy’s Patch

@Willy
We patched your fixns.diff to namespace.c
Running haproxy in debug mode gives:
Oct 20 14:03:35 localhost haproxy: Starting haproxy: [ALERT] 292/140335 (7337) : Starting proxy haproxystats: cannot bind socket []

Upon removing ‘listen haproxystats’ block from HAProxy configuration (see below), it starts normally

++ Haproxy configuration ++
global
    log 127.0.0.1 local2
    pidfile /var/run/haproxy.pid
    user haproxy
    group haproxy
    #daemon
    debug
    chroot /var/log/haproxy/
    stats socket /var/log/haproxy/haproxy.stats

defaults
    mode http
    option abortonclose
    option http-server-close
    [….]

frontend webapps-frontend
    bind *:80 name xxx
    bind *:443 name yyy ssl crt /path/to/server.pem
    [….]

listen haproxystats
    bind Server_IP:
    [….]

Thank you.

Sincerely,
--
Susheel Jalali
Coscend Communications Solutions
susheel.jal...@coscend.com
www.Coscend.com

On 10/20/15 15:02, Willy Tarreau wrote:
> On Tue, Oct 20, 2015 at 11:20:12AM +0200, Willy Tarreau wrote:
>> On Tue, Oct 20, 2015 at 10:54:58AM +0200, Lukas Tribus wrote:
>>>> Dear Willy,
>>>>
>>>> Thank you for your insights. As you advised, below is the output of
>>>> haproxy -f ?cfg -db -V.
>>>
>>> Can you run this through strace (strace haproxy -f ?cfg -db -V) and
>>> provide the output.
>>>
>>> Also, if you have the strace output of a successful startup of 1.5.14 for
>>> comparison, that would be very helpful as well.
>>
>> Yes definitely. Actually I'm seeing one difference between the two versions,
>> it's the introduction of namespaces in 1.6.0. If it was built with support
>> for namespaces and they are not supported in the operating system, I'm not
>> seeing how my_socketat() can recover in case setns() returns -1, which
>> happens when default_namespace = -1, which is the default case before
>> initialization :
>>
>> #ifdef CONFIG_HAP_NS
>>     if (default_namespace < 0 ||
>>         (ns && setns(ns->fd, CLONE_NEWNET) == -1))
>>         return -1;
>> #endif
>
> OK it's clear there's a bug here in my opinion because default_namespace
> is *only* initialized if there are explicit namespaces. I could reproduce
> the issue here, you simply need to build with USE_NS=1 and to declare no
> namespace anywhere. Here's a proposed fix which works for me. Please
> confirm.
>
> Willy
>

[Attachment: strace.haproxy-1.15.4 (binary data)]
[Attachment: strace.haproxy-1.6.0 (binary data)]
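To make the quoted condition easier to follow, here is a small pure-logic model of it in C. It calls neither socket() nor setns(). The first function mirrors the guard quoted above; the second is only one plausible relaxed guard, written as an assumption for illustration, and is not claimed to be the patch that was actually posted (the patch itself is not included in this thread). All names are hypothetical.

```c
#include <assert.h>

/* Inputs of the model:
 *   default_namespace : -1 until namespaces are initialised, >= 0 afterwards
 *   has_ns            : 1 if the listener asked for a specific namespace
 *   setns_ok          : 1 if switching to that namespace would succeed
 * Each function returns 1 when socket creation would be refused. */

/* Mirrors the condition quoted in the thread: it refuses every socket
 * as soon as default_namespace is still -1, even with no namespace in use. */
static int original_guard_fails(int default_namespace, int has_ns, int setns_ok)
{
    assert(has_ns == 0 || has_ns == 1);
    return default_namespace < 0 || (has_ns && !setns_ok);
}

/* Hypothetical relaxed guard (an assumption, not the merged fix): only
 * consider the namespace switch when namespaces were initialised and the
 * listener requested one; otherwise fall through to a plain socket(). */
static int relaxed_guard_fails(int default_namespace, int has_ns, int setns_ok)
{
    assert(has_ns == 0 || has_ns == 1);
    return default_namespace >= 0 && has_ns && !setns_ok;
}
```

With a build that supports namespaces but a configuration that declares none (default_namespace is -1, has_ns is 0), the original guard refuses the socket while the relaxed one lets it through, which matches the symptom reported for 1.6.0.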
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
On Tue, Oct 20, 2015 at 9:09 PM, Lukas Tribus wrote:
>> I don't know. I'm always only focused on the combination of user-visible
>> changes and risks of bugs (which are user-visible changes btw). So if we
>> can do it without breaking too much code, then it can be backported. What
>> we have now is something which is apparently insufficient to some users
>> so we can improve the situation. I wouldn't want to remove prefer-* or
>> change the options behavior or whatever for example.
>
> Ok, if we don't remove existing prefer-* keywords a 1.6 backport sounds
> possible without user visible breakage, great.
>
> lukas

Ok, just to make it clear, let me write a few conf examples:
- server home-v4 home-v4.mydomain check resolve-prefer ipv4
  => A then AAAA (failover on NX)
- server home-v4 home-v4.mydomain check v4only
  => A only (stop on NX)

If both 'resolve-prefer ipv[46]' and 'v[46]only' are set, in whatever combination, then v[46]only applies, but configuration parsing may return a warning.

So we don't break compatibility with the current code and way of working!

Brilliant guys :)

Baptiste
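The rules summarised in this message can be sketched as a small C function that turns the two keywords into an ordered list of query types. Everything here is hypothetical (the enum names and `build_query_plan` do not exist in HAProxy); it only models the proposal as stated: the preference gives the order, the next type is tried after an NX response, and v4only/v6only overrides the preference.

```c
#include <assert.h>
#include <stddef.h>

/* DNS record type codes: A is 1 and AAAA is 28. */
enum { RTYPE_A = 1, RTYPE_AAAA = 28 };

/* Hypothetical representation of the server keywords discussed above. */
enum family_pref { PREFER_V4, PREFER_V6 };           /* resolve-prefer ipv4|ipv6 */
enum family_only { ONLY_NONE, ONLY_V4, ONLY_V6 };    /* v4only | v6only | unset */

/* Fills plan[] with the record types to query, in order, and returns how
 * many entries are valid. The second entry, when present, is meant to be
 * tried only after the first one returned NX. */
static int build_query_plan(enum family_pref prefer, enum family_only only,
                            int plan[2])
{
    assert(plan != NULL);

    /* v4only / v6only wins over any preference: one family, stop on NX. */
    if (only == ONLY_V4) {
        plan[0] = RTYPE_A;
        return 1;
    }
    if (only == ONLY_V6) {
        plan[0] = RTYPE_AAAA;
        return 1;
    }

    /* Otherwise the preference only decides which family is tried first. */
    if (prefer == PREFER_V4) {
        plan[0] = RTYPE_A;
        plan[1] = RTYPE_AAAA;
    } else {
        plan[0] = RTYPE_AAAA;
        plan[1] = RTYPE_A;
    }
    return 2;
}
```

For example, `resolve-prefer ipv4` alone yields A then AAAA, while `resolve-prefer ipv6` combined with `v4only` yields A only.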
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
Hi Andrew,

I've updated your patch quickly so Willy can integrate it.
I've also updated the commit message to follow Lukas' recommendations.

Baptiste

On Tue, Oct 20, 2015 at 2:26 PM, Baptiste wrote:
> Hi Andrew,
>
> There is a bug repeated twice in your code.
> In both dns_reset_resolution() and trigger_resolution(), you use
> "resolution->resolver_family_priority" before it is set. This
> may lead to using the last resolution->resolver_family_priority, which
> may be different from the server's one.
> Please move the line "resolution->resolver_family_priority =
> s->resolver_family_priority;" before using the value stored in it.
>
> Apart from this, it looks good.
>
> Baptiste
>
>
> On Tue, Oct 20, 2015 at 12:39 AM, Andrew Hayworth wrote:
>> The ANY query type is weird, and some resolvers don't 'do the legwork'
>> of resolving useful things like CNAMEs. Given that upstream resolver
>> behavior is not always under the control of the HAProxy administrator,
>> we should not use the ANY query type. Rather, we should use A or AAAA
>> according to either the explicit preferences of the operator, or the
>> implicit default (AAAA/IPv6).
>>
>> - Andrew Hayworth
>>
>> From 8ed172424cbd79197aacacd1fd89ddcfa46e213d Mon Sep 17 00:00:00 2001
>> From: Andrew Hayworth
>> Date: Mon, 19 Oct 2015 22:29:51 +
>> Subject: [PATCH] MEDIUM: dns: Don't use the ANY query type
>>
>> Basically, it's ill-defined and shouldn't really be used going forward.
>> We can't guarantee that resolvers will do the 'legwork' for us and
>> actually resolve CNAMES when we request the ANY query-type. Case in point
>> (obfuscated, clearly):
>>
>> PRODUCTION! ahaywo...@secret-hostname.com:~$
>> dig @10.11.12.53 ANY api.somestartup.io
>>
>> ; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> @10.11.12.53 ANY api.somestartup.io
>> ; (1 server found)
>> ;; global options: +cmd
>> ;; Got answer:
>> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62454
>> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0
>>
>> ;; QUESTION SECTION:
>> ;api.somestartup.io. IN ANY
>>
>> ;; ANSWER SECTION:
>> api.somestartup.io. 20 IN CNAME api-somestartup-production.ap-southeast-2.elb.amazonaws.com.
>>
>> ;; AUTHORITY SECTION:
>> somestartup.io. 166687 IN NS ns-1254.awsdns-28.org.
>> somestartup.io. 166687 IN NS ns-1884.awsdns-43.co.uk.
>> somestartup.io. 166687 IN NS ns-440.awsdns-55.com.
>> somestartup.io. 166687 IN NS ns-577.awsdns-08.net.
>>
>> ;; Query time: 1 msec
>> ;; SERVER: 10.11.12.53#53(10.11.12.53)
>> ;; WHEN: Mon Oct 19 22:02:29 2015
>> ;; MSG SIZE rcvd: 242
>>
>> HAProxy can't handle that response correctly.
>>
>> Rather than try to build in support for resolving CNAMEs presented
>> without an A record in an answer section (which may be a valid
>> improvement further on), this change just skips ANY record types
>> altogether. A and AAAA are much more well-defined and predictable.
>>
>> Notably, this commit preserves the implicit "Prefer IPV6 behavior."
>> ---
>> include/types/dns.h | 3 ++-
>> src/checks.c | 6 +-
>> src/dns.c | 6 +-
>> src/server.c | 18 +++---
>> 4 files changed, 19 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/types/dns.h b/include/types/dns.h
>> index f8edb73..ea1a9f9 100644
>> --- a/include/types/dns.h
>> +++ b/include/types/dns.h
>> @@ -161,7 +161,8 @@ struct dns_resolution {
>>   unsigned int last_status_change; /* time of the latest DNS resolution status change */
>>   int query_id; /* DNS query ID dedicated for this resolution */
>>   struct eb32_node qid; /* ebtree query id */
>> - int query_type; /* query type to send. By default DNS_RTYPE_ANY */
>> + int query_type;
>> + /* query type to send. By default DNS_RTYPE_A or DNS_RTYPE_AAAA depending on resolver_family_priority */
>>   int status; /* status of the resolution being processed RSLV_STATUS_* */
>>   int step; /* */
>>   int try; /* current resolution try */
>> diff --git a/src/checks.c b/src/checks.c
>> index ade2428..d3cd567 100644
>> --- a/src/checks.c
>> +++ b/src/checks.c
>> @@ -2214,7 +2214,11 @@ int trigger_resolution(struct server *s)
>>   resolution->query_id = query_id;
>>   resolution->qid.key = query_id;
>>   resolution->step = RSLV_STEP_RUNNING;
>> - resolution->query_type = DNS_RTYPE_ANY;
>> + if (resolution->resolver_family_priority == AF_INET) {
>> + resolution->query_type = DNS_RTYPE_A;
>> + } else {
>> + resolution->query_type = DNS_RTYPE_AAAA;
>> + }
>>   resolution->try = resolvers->resolve_retries;
>>   resolution->try_cname = 0;
>>   resolution->nb_responses = 0;
>> diff --git a/src/dns.c b/src/dns.c
>> index 7f71ac7..53b65ab 100644
>> --- a/src/dns.c
>> +++ b/src/dns.c
>> @@ -102,7
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
On Tue, Oct 20, 2015 at 10:07:16PM +0200, Baptiste wrote: > Hi Andrew, > > I've updated your patch quickly so Willy can integrate it. > I've also updated the commit message to follow Lukas recommendations. Thanks Baptiste, I've merged it and backported it to 1.6. I'm tempted to issue 1.6.1 right now as we have a number of fixes pending already. Susheel's remaining issue is quite strange and I'm actually not convinced we'll get rid of it that quickly. Willy
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
On Tue, Oct 20, 2015 at 10:20:50PM +0200, Baptiste wrote:
> On Tue, Oct 20, 2015 at 9:09 PM, Lukas Tribus wrote:
> >> I don't know. I'm always only focused on the combination of user-visible
> >> changes and risks of bugs (which are user-visible changes btw). So if we
> >> can do it without breaking too much code, then it can be backported. What
> >> we have now is something which is apparently insufficient to some users
> >> so we can improve the situation. I wouldn't want to remove prefer-* or
> >> change the options behavior or whatever for example.
> >
> > Ok, if we don't remove existing prefer-* keywords a 1.6 backport sounds
> > possible without user visible breakage, great.
> >
> > lukas
>
> Ok, just to make it clear, let me write a few conf examples:
> - server home-v4 home-v4.mydomain check resolve-prefer ipv4
> => A then AAAA (failover on NX)
> - server home-v4 home-v4.mydomain check v4only
> => A only (stop on NX)
>
> If both 'resolve-prefer ipv[46]' and 'v[46]only' are set, whatever
> combination, then, v[46]only applies, but configuration parsing may
> return a warning.

Yes, but please avoid the warning, it makes it inconvenient to edit configs. You may for example have "resolve-prefer ipv4" in the default-server directive, and having it warn because one of your servers has v4only is annoying.

BTW, the v4only and resolve-prefer should also be used during the initial resolving phase performed by getaddrinfo() but that's for a future patch :-)

Willy
Re: 1.6.0 Error: Cannot Create Listening Socket for Frontend and Stats,Proxies
Hi Susheel,

On Wed, Oct 21, 2015 at 01:29:33AM +0530, Susheel Jalali wrote:
> Dear Willy and Lukas,
>
> Thank you for your guidance. Upon implementing your insights, here is
> the summed up result:
>
> (1) With Willy's patch HAProxy starts

OK thanks for confirming this.

> (2) But we have to remove listen haproxystats block, as it still cannot
> create listening socket for this listen proxy.

This is *really* strange.

> Detailed information you have requested is below without and with
> Willy's patch. Your further guidance to fix the item (2) above would be
> appreciated. Thank you.
>
> ---
>
> @Willy: HAProxy is running on two of our servers that have CentOS 7.1
> (Server1) and CentOS 7 (Server2), which support namespace. Checked
> with: ip netns add namespace1 and then ip netns list --> This listed
> namespace1
>
>
> WITHOUT Willy's patch
>
> @Lukas and Willy
> We installed HAProxy 1.5.14 and 1.6.0 on both Server1 and Server2.
> 1.5.14 starts and functions normally on both.
> 1.6.0 fails after start without creating pid.
>
> ATTACHED are strace output from running the following on the same server
> with the same configuration (including port numbers):
> HAProxy 1.5.14
> HAProxy 1.6.0

Could you please also do the same on the patched version ? The unpatched one clearly shows the bug I fixed (ie: the socket syscall is not even called). But with the patch we must see it and I have no idea why it could fail at all, since it does exactly the same thing as the original socket() call did.

Do you have any option on the "bind" line of the haproxystats listener ?

Thanks,
Willy
[ANNOUNCE] haproxy-1.6.1
Hi all,

we've got rid of all the reported bugs since 1.6.0 so it's the right time
for a new release so that those who got burnt by these bugs can play with
fire again... just kidding, it should be much better now.

The changelog is very small, which is a very good thing for one week after
a dot-zero release, really! In 1.5, we fixed 7 bugs in 5 days, here it's 3
in 7 days. The most impacting bugs were the segfault when 2 crts were on
the same bind line, and the bug with namespaces preventing from binding if
no namespace was declared at all. The rest concerns DNS adjustments to
better query servers and avoid the ANY query type, and a few build fixes.

We still have Susheel's report under investigation, as nothing obvious
could cause his isolated binding error, but we don't yet have all the
elements to evaluate it. It could also be a side effect of an unclean
rebuild or something like this, and I couldn't manage to reproduce it.

The full changelog for 1.6.1 is here :
    - DOC: specify that stats socket doc (section 9.2) is in management
    - BUILD: install only relevant and existing documentation
    - CLEANUP: don't ignore debian/ directory if present
    - BUG/MINOR: dns: parsing error of some DNS response
    - BUG/MEDIUM: namespaces: don't fail if no namespace is used
    - BUG/MAJOR: ssl: free the generated SSL_CTX if the LRU cache is disable
    - MEDIUM: dns: Don't use the ANY query type

Usual URLs below :
    Site index       : http://www.haproxy.org/
    Sources          : http://www.haproxy.org/download/1.6/src/
    Git repository   : http://git.haproxy.org/git/haproxy-1.6.git/
    Git Web browsing : http://git.haproxy.org/?p=haproxy-1.6.git
    Changelog        : http://www.haproxy.org/download/1.6/src/CHANGELOG
    Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/

Continue to deeply test and to carefully deploy, observe and enjoy.

Willy
Re: [PATCH] MEDIUM: dns: Don't use the ANY query type
Oh wonderful - something's come up that would have blocked me from
working on this until next week, so thank you very much for updating
it for me!

On Tue, Oct 20, 2015 at 3:07 PM, Baptiste wrote:
> Hi Andrew,
>
> I've updated your patch quickly so Willy can integrate it.
> I've also updated the commit message to follow Lukas's recommendations.
>
> Baptiste
>
> On Tue, Oct 20, 2015 at 2:26 PM, Baptiste wrote:
>> Hi Andrew,
>>
>> There is a bug repeated twice in your code.
>> In both dns_reset_resolution() and trigger_resolution(), you use
>> "resolution->resolver_family_priority" before it is set. This
>> may lead to using the last resolution->resolver_family_priority, which
>> may be different from the server's one.
>> Please move the line "resolution->resolver_family_priority =
>> s->resolver_family_priority;" before using the value stored in it.
>>
>> Apart from this, it looks good.
>>
>> Baptiste
>>
>>
>> On Tue, Oct 20, 2015 at 12:39 AM, Andrew Hayworth
>> wrote:
>>> The ANY query type is weird, and some resolvers don't 'do the legwork'
>>> of resolving useful things like CNAMEs. Given that upstream resolver
>>> behavior is not always under the control of the HAProxy administrator,
>>> we should not use the ANY query type. Rather, we should use A or
>>> AAAA according to either the explicit preferences of the operator, or the
>>> implicit default (AAAA/IPv6).
>>>
>>> - Andrew Hayworth
>>>
>>> From 8ed172424cbd79197aacacd1fd89ddcfa46e213d Mon Sep 17 00:00:00 2001
>>> From: Andrew Hayworth
>>> Date: Mon, 19 Oct 2015 22:29:51 +
>>> Subject: [PATCH] MEDIUM: dns: Don't use the ANY query type
>>>
>>> Basically, it's ill-defined and shouldn't really be used going forward.
>>> We can't guarantee that resolvers will do the 'legwork' for us and
>>> actually resolve CNAMES when we request the ANY query-type. Case in point
>>> (obfuscated, clearly):
>>>
>>> PRODUCTION! ahaywo...@secret-hostname.com:~$
>>> dig @10.11.12.53 ANY api.somestartup.io
>>>
>>> ; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> @10.11.12.53 ANY api.somestartup.io
>>> ; (1 server found)
>>> ;; global options: +cmd
>>> ;; Got answer:
>>> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62454
>>> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 0
>>>
>>> ;; QUESTION SECTION:
>>> ;api.somestartup.io. IN ANY
>>>
>>> ;; ANSWER SECTION:
>>> api.somestartup.io. 20 IN CNAME
>>> api-somestartup-production.ap-southeast-2.elb.amazonaws.com.
>>>
>>> ;; AUTHORITY SECTION:
>>> somestartup.io. 166687 IN NS
>>> ns-1254.awsdns-28.org.
>>> somestartup.io. 166687 IN NS
>>> ns-1884.awsdns-43.co.uk.
>>> somestartup.io. 166687 IN NS
>>> ns-440.awsdns-55.com.
>>> somestartup.io. 166687 IN NS
>>> ns-577.awsdns-08.net.
>>>
>>> ;; Query time: 1 msec
>>> ;; SERVER: 10.11.12.53#53(10.11.12.53)
>>> ;; WHEN: Mon Oct 19 22:02:29 2015
>>> ;; MSG SIZE rcvd: 242
>>>
>>> HAProxy can't handle that response correctly.
>>>
>>> Rather than try to build in support for resolving CNAMEs presented
>>> without an A record in an answer section (which may be a valid
>>> improvement further on), this change just skips ANY record types
>>> altogether. A and AAAA are much more well-defined and predictable.
>>>
>>> Notably, this commit preserves the implicit "Prefer IPV6 behavior."
>>> ---
>>> include/types/dns.h | 3 ++-
>>> src/checks.c        | 6 +-
>>> src/dns.c           | 6 +-
>>> src/server.c        | 18 +++---
>>> 4 files changed, 19 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/include/types/dns.h b/include/types/dns.h
>>> index f8edb73..ea1a9f9 100644
>>> --- a/include/types/dns.h
>>> +++ b/include/types/dns.h
>>> @@ -161,7 +161,8 @@ struct dns_resolution {
>>> unsigned int last_status_change; /* time of the latest DNS
>>> resolution status change */
>>> int query_id; /* DNS query ID dedicated for this resolution */
>>> struct eb32_node qid; /* ebtree query id */
>>> - int query_type; /* query type to send. By default DNS_RTYPE_ANY */
>>> + int query_type;
>>> + /* query type to send. By default DNS_RTYPE_A or DNS_RTYPE_AAAA
>>> depending on resolver_family_priority */
>>> int status; /* status of the resolution being processed RSLV_STATUS_* */
>>> int step; /* */
>>> int try; /* current resolution try */
>>> diff --git a/src/checks.c b/src/checks.c
>>> index ade2428..d3cd567 100644
>>> --- a/src/checks.c
>>> +++ b/src/checks.c
>>> @@ -2214,7 +2214,11 @@ int trigger_resolution(struct server *s)
>>> resolution->query_id = query_id;
>>> resolution->qid.key = query_id;
>>> resolution->step = RSLV_STEP_RUNNING;
>>> - resolution->query_type = DNS_RTYPE_ANY;
>>> + if (resolution->resolver_family_priority == AF_INET) {
>>> +
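To make the failure mode in the dig output concrete, here is a minimal sketch in Python (not HAProxy's parser) of why a CNAME-only answer is useless to a client that only looks for address records. The Record tuple layout and the function usable_addresses are hypothetical, and the second answer set, including the 192.0.2.10 documentation address, is invented purely for contrast.

```python
from typing import List, Tuple

# (owner name, ttl, record type, record data), a simplified view of one
# resource record from the ANSWER section.
Record = Tuple[str, int, str, str]


def usable_addresses(answers: List[Record]) -> List[str]:
    """Keep only the data of A/AAAA records, ignoring everything else."""
    return [rdata for (_name, _ttl, rtype, rdata) in answers
            if rtype in ("A", "AAAA")]


# ANSWER section from the ANY query quoted above: a lone CNAME.
any_answer: List[Record] = [
    ("api.somestartup.io.", 20, "CNAME",
     "api-somestartup-production.ap-southeast-2.elb.amazonaws.com."),
]

# Hypothetical answer to an A query against a resolver that follows the
# CNAME; the address is a placeholder from the documentation range.
a_answer: List[Record] = any_answer + [
    ("api-somestartup-production.ap-southeast-2.elb.amazonaws.com.",
     20, "A", "192.0.2.10"),
]
```

The ANY response yields no address at all, while an A (or AAAA) query gives the resolver a reason to chase the CNAME and return something a server line can use.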
Re: haproxy 1.6.0 crashes
On Tue, Oct 20, 2015 at 03:00:42PM +0200, Christopher Faulet wrote:
> On 20/10/2015 14:41, Willy Tarreau wrote:
> >On Tue, Oct 20, 2015 at 02:14:37PM +0200, Christopher Faulet wrote:
> >>On 20/10/2015 14:07, Willy Tarreau wrote:
> >>>On Tue, Oct 20, 2015 at 01:59:52PM +0200, Willy Tarreau wrote:
> >>>>Then my understanding is that we should instead proceed differently :
> >>>>  - the cert is generated. It gets a refcount = 1.
> >>>>  - we assign it to the SSL. Its refcount becomes two.
> >>>>  - we try to insert it into the tree. The tree will handle its freeing
> >>>>    using SSL_CTX_free() during eviction.
> >>>>  - if we can't insert into the tree because the tree is disabled, then
> >>>>    we have to call SSL_CTX_free() ourselves, then we'd rather do it
> >>>>    immediately. It will more closely mimic the case where the cert
> >>>>    is added to the tree and immediately evicted by concurrent activity
> >>>>    on the cache.
> >>>>  - we never have to call SSL_CTX_free() during ssl_sock_close() because
> >>>>    the SSL session only relies on openssl doing the right thing based on
> >>>>    the refcount only.
> >>>>  - thus we never need to know how the cert was created since the
> >>>>    SSL_CTX_free() is either guaranteed or already done for generated
> >>>>    certs, and this protects other ones against any accidental call to
> >>>>    SSL_CTX_free() without having to track where the cert comes from.
> >>>
> >>>This patch does this, and based on my understanding of your explanations,
> >>>it should do the right thing and be safe all the time. What's your
> >>>opinion ?
> >>>
> >>
> >>Yes, it should work and it avoids keeping extra info on generated
> >>certificates. Good idea !
> >
> >Thanks. Do you have an easy reproducer for the issue with the certs ?
> >I tried a little bit but probably didn't test the proper sequence.
> >
>
> Of course. Here is a little config file:
>
> 
> global
>     tune.ssl.default-dh-param 2048
>     daemon
>
> listen ssl_server
>     mode tcp
>     bind 127.0.0.1:4443 ssl crt srv1.test.com.pem crt srv2.test.com.pem
>
>     timeout connect 5000
>     timeout client 3
>     timeout server 3
>
>     server srv A.B.C.D:80
> 
>
> You just need to generate 2 SSL certificates with 2 CN (here
> srv1.test.com and srv2.test.com).
>
> Then, by doing SSL requests with the first CN, there is no problem. But
> with the second CN, it should segfault on the 2nd request.
>
> openssl s_client -connect 127.0.0.1:4443 -servername srv1.test.com // OK
> openssl s_client -connect 127.0.0.1:4443 -servername srv1.test.com // OK
>
> But,
>
> openssl s_client -connect 127.0.0.1:4443 -servername srv2.test.com // OK
> openssl s_client -connect 127.0.0.1:4443 -servername srv2.test.com // KO

Marvellous, thank you :-)

Willy
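The reference-counting scheme quoted above can be modelled in a few lines. The following Python sketch is only a toy model of the bookkeeping, not the actual fix and not OpenSSL's API: GeneratedCtx, LruCache, attach_generated_cert and close_session are hypothetical names, free() stands in for SSL_CTX_free() dropping one reference, and close_session() models OpenSSL releasing the session's own reference when the connection goes away.

```python
from collections import OrderedDict


class GeneratedCtx:
    """Stand-in for a generated SSL_CTX with a reference count."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.refcount = 1      # reference held by whoever generated it
        self.freed = False

    def up_ref(self) -> None:
        self.refcount += 1

    def free(self) -> None:
        """Drop one reference; the object dies when the count hits zero."""
        assert not self.freed, "double free"
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


class LruCache:
    """Tiny LRU; capacity 0 models a disabled cache."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[str, GeneratedCtx]" = OrderedDict()

    def insert(self, ctx: GeneratedCtx) -> bool:
        if self.capacity == 0:
            return False                      # cache disabled
        old = self.entries.pop(ctx.name, None)
        if old is not None:
            old.free()                        # replaced entry is released
        self.entries[ctx.name] = ctx          # cache now owns one reference
        while len(self.entries) > self.capacity:
            _, evicted = self.entries.popitem(last=False)
            evicted.free()                    # freeing happens on eviction
        return True


def attach_generated_cert(cache: LruCache, name: str) -> GeneratedCtx:
    ctx = GeneratedCtx(name)   # refcount = 1
    ctx.up_ref()               # assigned to the SSL session: refcount = 2
    if not cache.insert(ctx):
        ctx.free()             # no cache to own it: release immediately
    return ctx


def close_session(ctx: GeneratedCtx) -> None:
    ctx.free()                 # models OpenSSL dropping the session's ref
```

In both paths the close handler never needs to know whether the certificate was generated: with the cache disabled the context dies with the session, and with the cache enabled it dies once it has been both evicted and released by its last session.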
Re: haproxy 1.6.0 crashes
So I can confirm with your reproducer that it's OK now. I've merged the
proposed fix with copies of your long detailed analysis. Thanks for being
so patient in explaining it to me :-)

We'll have to wait for the last pending DNS fixes and I'll emit 1.6.1 so
that we get rid of these annoying early bugs.

See you tomorrow,
Willy
Re: haproxy 1.6.0 crashes
On 20/10/2015 14:41, Willy Tarreau wrote:
> On Tue, Oct 20, 2015 at 02:14:37PM +0200, Christopher Faulet wrote:
>> On 20/10/2015 14:07, Willy Tarreau wrote:
>>> On Tue, Oct 20, 2015 at 01:59:52PM +0200, Willy Tarreau wrote:
>>>> Then my understanding is that we should instead proceed differently :
>>>>   - the cert is generated. It gets a refcount = 1.
>>>>   - we assign it to the SSL. Its refcount becomes two.
>>>>   - we try to insert it into the tree. The tree will handle its freeing
>>>>     using SSL_CTX_free() during eviction.
>>>>   - if we can't insert into the tree because the tree is disabled, then
>>>>     we have to call SSL_CTX_free() ourselves, then we'd rather do it
>>>>     immediately. It will more closely mimic the case where the cert
>>>>     is added to the tree and immediately evicted by concurrent activity
>>>>     on the cache.
>>>>   - we never have to call SSL_CTX_free() during ssl_sock_close() because
>>>>     the SSL session only relies on openssl doing the right thing based on
>>>>     the refcount only.
>>>>   - thus we never need to know how the cert was created since the
>>>>     SSL_CTX_free() is either guaranteed or already done for generated
>>>>     certs, and this protects other ones against any accidental call to
>>>>     SSL_CTX_free() without having to track where the cert comes from.
>>>
>>> This patch does this, and based on my understanding of your explanations,
>>> it should do the right thing and be safe all the time. What's your
>>> opinion ?
>>
>> Yes, it should work and it avoids keeping extra info on generated
>> certificates. Good idea !
>
> Thanks. Do you have an easy reproducer for the issue with the certs ?
> I tried a little bit but probably didn't test the proper sequence.

Of course. Here is a little config file:

global
    tune.ssl.default-dh-param 2048
    daemon

listen ssl_server
    mode tcp
    bind 127.0.0.1:4443 ssl crt srv1.test.com.pem crt srv2.test.com.pem

    timeout connect 5000
    timeout client 3
    timeout server 3

    server srv A.B.C.D:80

You just need to generate 2 SSL certificates with 2 CN (here
srv1.test.com and srv2.test.com).

Then, by doing SSL requests with the first CN, there is no problem. But
with the second CN, it should segfault on the 2nd request.

openssl s_client -connect 127.0.0.1:4443 -servername srv1.test.com // OK
openssl s_client -connect 127.0.0.1:4443 -servername srv1.test.com // OK

But,

openssl s_client -connect 127.0.0.1:4443 -servername srv2.test.com // OK
openssl s_client -connect 127.0.0.1:4443 -servername srv2.test.com // KO

--
Christopher Faulet