Re: Observations about reloads and DNS SRV records

2018-08-21 Thread Francis Lavalliere
This is related to the Discourse thread below (bringing the discussion to the
mailing list instead of Discourse):

https://discourse.haproxy.org/t/config-reload-with-dynamic-service-discovery-via-dns/2625/10


Here are the findings from my end:



1 - State file / invalid port range conversion

In the HAProxy state file, the port is written as an unsigned long value.

i.e.: port 32777 is actually written as 4294934537

112 defaultback_failsaife 1 varnish1 10.100.20.78 2 0 1 1 203910 15 3 4 6 0
0 0 ip-10-100-20-78.node.aws-us-east-1.consul 4294934537 _tcp_.varnish
.service.consul
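
For illustration only: the dumped value is consistent with the 16-bit port
being sign-extended on its way into the state file. Below is a minimal
standalone C sketch of that arithmetic, not HAProxy code.

#include <stdio.h>
#include <stdint.h>

/* Minimal sketch (not HAProxy code): a port above 32767 becomes negative when
 * read through a signed 16-bit type, and printing it through a 32-bit
 * unsigned conversion yields the large value seen in the state file. */
int main(void)
{
	uint16_t port = 32777;                               /* real service port */
	int16_t as_signed = (int16_t)port;                   /* -32759 */
	unsigned int dumped = (unsigned int)(int)as_signed;  /* 4294934537 */

	printf("port %u is dumped as %u\n", port, dumped);
	return 0;
}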


To allow a seamless reload, I had to comment out the following lines in
src/server.c:


if (port > USHRT_MAX) {
	chunk_appendf(msg, ", invalid srv_port value '%s'", port_str);
	port_str = NULL;
}
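
A less invasive local workaround than removing the check might be to mask the
value back down to 16 bits. This is only a sketch of that idea, with a
hypothetical helper name, and not a proper fix for the underlying dump bug.

#include <limits.h>

/* Sketch only, hypothetical helper (not the upstream fix): recover the
 * original 16-bit port from a state-file value that was sign-extended when
 * it was written, instead of rejecting it. */
static unsigned int recover_dumped_port(unsigned long dumped)
{
	if (dumped > USHRT_MAX)
		dumped &= 0xffff;          /* e.g. 4294934537 & 0xffff == 32777 */
	return (unsigned int)dumped;
}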


2 - Backend responds with 503 only after reload

To make it functional, in the patch I had to comment out the following lines:

/*
// prepare DNS resolution for this server (but hasn't this already been
// done by the server-template function?)
res = srv_prepare_for_resolution(srv, fqdn);
if (res == -1) {
	ha_alert("could not allocate memory for DNS resolution for server '%s'\n", srv->id);
	chunk_appendf(msg, ", can't allocate memory for DNS resolution for server '%s'", srv->id);
	HA_SPIN_UNLOCK(SERVER_LOCK, &srv->lock);
	goto out;
}
*/
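
For what it's worth, instead of commenting the whole block out, a guard along
these lines might have the same effect. This is only a sketch: the
!srv->hostname condition is an assumption for illustration, based on the
observation below in the thread that srv->hostname is what the server-template
handling sets; it is not the upstream fix.

/* Sketch only: skip re-preparing DNS resolution when it has already been set
 * up (srv->hostname already populated). The condition is an assumption for
 * illustration, not the upstream fix. */
if (!srv->hostname) {
	res = srv_prepare_for_resolution(srv, fqdn);
	if (res == -1) {
		ha_alert("could not allocate memory for DNS resolution for server '%s'\n", srv->id);
		chunk_appendf(msg, ", can't allocate memory for DNS resolution for server '%s'", srv->id);
		HA_SPIN_UNLOCK(SERVER_LOCK, &srv->lock);
		goto out;
	}
}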


Re: Observations about reloads and DNS SRV records

2018-07-04 Thread Tait Clarridge
Hey Baptiste,

I’ll try it out next week when I get back (currently on vacation) and let
you know.

Thanks!
Tait
On Tue., Jul. 3, 2018 at 06:24 Baptiste  wrote:

> Hi,
>
> Actually, the problem was deeper than my first thought.
> In its current state, statefile and SRV records are simply not compatible.
> I had to add a new field in the state file format to add support to this.
>
> Could you please confirm the patch attached fixes your issues?
>
> Baptiste
>
>
>
> On Mon, Jun 25, 2018 at 11:48 AM, Baptiste  wrote:
>
>> Hi,
>>
>> Forget the backend id, it's the wrong answer to that problem.
>> I was investigating another potential issue, but this does not fix the
>> original problem reported here.
>>
>> Here is the answer I delivered today on discourse, where other people
>> have also reported the same issue:
>>
>>Just to let you know that I think I found the cause of the issue but I
>> don’t have a fix yet.
>>I’ll come back to you this week with more info and hopefully a fix.
>>The issue seems to be in srv_init_addr(), because srv->hostname is not
>> set (null).
>>
>> Baptiste
>>
>>
>>
>


Re: Observations about reloads and DNS SRV records

2018-07-03 Thread Baptiste
Hi,

Actually, the problem was deeper than my first thought.
In its current state, statefile and SRV records are simply not compatible.
I had to add a new field in the state file format to add support to this.

Could you please confirm the patch attached fixes your issues?

Baptiste



On Mon, Jun 25, 2018 at 11:48 AM, Baptiste  wrote:

> Hi,
>
> Forget the backend id, it's the wrong answer to that problem.
> I was investigating another potential issue, but this does not fix the
> original problem reported here.
>
> Here is the answer I delivered today on discourse, where other people have
> also reported the same issue:
>
>Just to let you know that I think I found the cause of the issue but I
> don’t have a fix yet.
>I’ll come back to you this week with more info and hopefully a fix.
>The issue seems to be in srv_init_addr(), because srv->hostname is not
> set (null).
>
> Baptiste
>
>
>
From 6899b19b9686b6dadc65b89adfb32c8792004663 Mon Sep 17 00:00:00 2001
From: Baptiste Assmann 
Date: Mon, 2 Jul 2018 17:00:54 +0200
Subject: [PATCH 1/2] BUG/MEDIUM: dns: fix incompatibility between SRV
 resolution and server state file

The server state file has no indication that a server is currently managed
by a DNS SRV resolution.
Thus, both features (DNS SRV resolution and server state), when used
together, do not provide the expected behavior: a smooth experience.

This patch introduces the "SRV record name" in the server state file and
loads and applies it, if found, wherever required.
---
 include/types/server.h |  7 ++++---
 src/proxy.c            | 10 ++++++++--
 src/server.c           | 45 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/include/types/server.h b/include/types/server.h
index 0cd20c0..29b88f8 100644
--- a/include/types/server.h
+++ b/include/types/server.h
@@ -126,10 +126,11 @@ enum srv_initaddr {
 "bk_f_forced_id " \
 "srv_f_forced_id "\
 "srv_fqdn "   \
-"srv_port"
+"srv_port"\
+"srvrecord"
 
-#define SRV_STATE_FILE_MAX_FIELDS 19
-#define SRV_STATE_FILE_NB_FIELDS_VERSION_1 18
+#define SRV_STATE_FILE_MAX_FIELDS 20
+#define SRV_STATE_FILE_NB_FIELDS_VERSION_1 19
 #define SRV_STATE_LINE_MAXLEN 512
 
 /* server flags -- 32 bits */
diff --git a/src/proxy.c b/src/proxy.c
index c262966..c1c41ba 100644
--- a/src/proxy.c
+++ b/src/proxy.c
@@ -1429,6 +1429,7 @@ static int dump_servers_state(struct stream_interface *si, struct chunk *buf)
 	char srv_addr[INET6_ADDRSTRLEN + 1];
 	time_t srv_time_since_last_change;
 	int bk_f_forced_id, srv_f_forced_id;
+	char *srvrecord;
 
 	/* we don't want to report any state if the backend is not enabled on this process */
 	if (px->bind_proc && !(px->bind_proc & pid_bit))
@@ -1458,18 +1459,23 @@ static int dump_servers_state(struct stream_interface *si, struct chunk *buf)
 		bk_f_forced_id = px->options & PR_O_FORCED_ID ? 1 : 0;
 		srv_f_forced_id = srv->flags & SRV_F_FORCED_ID ? 1 : 0;
 
+		srvrecord = NULL;
+		if (srv->srvrq && srv->srvrq->name)
+			srvrecord = srv->srvrq->name;
+
 		chunk_appendf(buf,
 "%d %s "
 "%d %s %s "
 "%d %d %d %d %ld "
 "%d %d %d %d %d "
-"%d %d %s %u"
+"%d %d %s %u %s"
 "\n",
 px->uuid, px->id,
 srv->puid, srv->id, srv_addr,
 srv->cur_state, srv->cur_admin, srv->uweight, srv->iweight, (long int)srv_time_since_last_change,
 srv->check.status, srv->check.result, srv->check.health, srv->check.state, srv->agent.state,
-bk_f_forced_id, srv_f_forced_id, srv->hostname ? srv->hostname : "-", srv->svc_port);
+bk_f_forced_id, srv_f_forced_id, srv->hostname ? srv->hostname : "-", srv->svc_port,
+srvrecord ? srvrecord : "-");
 		if (ci_putchk(si_ic(si), ) == -1) {
 			si_applet_cant_put(si);
 			return 0;
diff --git a/src/server.c b/src/server.c
index 277d140..cb13793 100644
--- a/src/server.c
+++ b/src/server.c
@@ -2678,6 +2678,7 @@ static void srv_update_state(struct server *srv, int version, char **params)
 	const char *fqdn;
 	const char *port_str;
 	unsigned int port;
+	char *srvrecord;
 
 	fqdn = NULL;
 	port = 0;
@@ -2701,6 +2702,7 @@ static void srv_update_state(struct server *srv, int version, char **params)
 			 * srv_f_forced_id:  params[12]
 			 * srv_fqdn: params[13]
 			 * srv_port: params[14]
+			 * srvrecord:params[15]
 			 */
 
 			/* validating srv_op_state */
@@ -2833,6 +2835,13 @@ static void srv_update_state(struct server *srv, int version, char **params)
 }
 			}
 
+			/* SRV record
+			 * NOTE: in HAProxy, SRV records must start with an underscore '_'
+			 */
+			srvrecord = params[15];
+			if (srvrecord && *srvrecord != '_')
+srvrecord = NULL;
+
 			/* don't apply anything if one error has been detected */
 			if (msg->len)
 goto out;
@@ -2965,6 +2974,41 @@ static void srv_update_state(struct server *srv, int version, char **params)
 	}
 }
 			}
+			/* If 

Re: Observations about reloads and DNS SRV records

2018-06-25 Thread Baptiste
Hi,

Forget the backend id, it's the wrong answer to that problem.
I was investigating another potential issue, but this does not fix the
original problem reported here.

Here is the answer I delivered today on discourse, where other people have
also reported the same issue:

   Just to let you know that I think I found the cause of the issue but I
don’t have a fix yet.
   I’ll come back to you this week with more info and hopefully a fix.
   The issue seems to be in srv_init_addr(), because srv->hostname is not
set (null).

Baptiste


Re: Observations about reloads and DNS SRV records

2018-06-22 Thread Baptiste
Hi Tait,

There is a first thing you can do to prevent this from happening, but this
might be complicated to implement.
If you force a backend id in the configuration file, the problem won't
exist anymore...

  backend be
server-template srv 5 _http._tcp.srv.tld resolvers dnsserver
resolve-prefer ipv4 check
id 5

  backend be2
server-template srv 5 _http._tcp.srv.tld resolvers dnsserver
resolve-prefer ipv4 check
id 4

When I change the order of those 2 backends, my config reloads properly.

Note that I'm currently investigating a way to make this happen without
enforcing the backend id.

Baptiste


Re: Observations about reloads and DNS SRV records

2018-06-10 Thread Tait Clarridge
On Sun, Jun 10, 2018 at 3:22 AM Baptiste  wrote:

>
> Thanks for giving me this idea!
> I wrote a quick and inflexible one here:
> https://github.com/bedis/dnsserver
> So feel free to contribute to it or write your own :)
>
>
> I'm going to use it to troubleshoot the issue you reported. That said,
> nothing is better than other real DNS servers (bind / unbound / powerdns
> and others) for real production.
>

Looks good! My application is using the same DNS library (miekg/dns) which
seems to be the most well-rounded and mature for go. If I think of anything
to add I'll open a PR but it is currently exactly how I am returning my
results.


>
> Baptiste
>


Re: Observations about reloads and DNS SRV records

2018-06-10 Thread Tait Clarridge
Hey Baptiste,

On Sun, Jun 10, 2018 at 3:19 AM Baptiste  wrote:

>
> ==> while writing this mail, I am able to reproduce the issue I think:
> - start HAProxy with SRV records
> - dump servers state
> - update haproxy conf to prevent dns resolution at runtime
> - reload haproxy
> ==> my servers are now "unconfigured"... (no IP, no FQDN, nothing from the
> servers state file)
>
>

That is exactly the issue I was seeing; my apologies for not describing it
properly when I opened this thread.


>>
> 2. Additional record responses from the nameserver are not parsed
>> - This just means that any servers that are populated from the SRV
>> records require a second round of querying for each of the hosts after the
>> fqdn is stored. It might be more efficient if these records are also parsed
>> but I can see that it might be pretty challenging in the current DNS
>> resolver
>> - Only reason I thought of this was to try to reduce the time it
>> takes to populate the backend servers with addresses in an effort to lessen
>> the effects of #1
>>
>>
> I'll work on this one as soon as I've fixed the bug above.
>

Great! The first one will go a long way in making reloads pretty seamless
for DNS, at least for my use case :).

Thanks again,
Tait


Re: Observations about reloads and DNS SRV records

2018-06-10 Thread Baptiste
>
> I'm a little swamped with other work at the moment, but when I get a
> chance I would be able to provide a DNS server (written in Go) that returns
> additional records to test with if that helps.
>

Thanks for giving me this idea!
I wrote a quick and inflexible one here:  https://github.com/bedis/dnsserver
So feel free to contribute to it or write your own :)

I'm going to use it to troubleshoot the issue you reported. That said,
nothing is better than other real DNS servers (bind / unbound / powerdns
and others) for real production.

Baptiste


Re: Observations about reloads and DNS SRV records

2018-06-10 Thread Baptiste
Hi Tait,

1. Reloading with SRV records ignores server-state-file
>

Can you tell me more about this one? How do you see that?
I mean, I have a conf similar to yours and I can see HAProxy parsing the
server state file (and returning a bunch of warnings I'm working on, about
backend name and ID mismatches).

==> while writing this mail, I am able to reproduce the issue I think:
- start HAProxy with SRV records
- dump servers state
- update haproxy conf to prevent dns resolution at runtime
- reload haproxy
==> my servers are now "unconfigured"... (no IP, no FQDN, nothing from the
servers state file)


- While this is not a huge deal, it does mean that the backend in
> question becomes unavailable when the proxy is reloaded until the SRV and
> subsequent A records are resolved
>

Well, this is the role of the server state file: populating server info at
configuration parsing time so you don't have to wait until first DNS
resolution to get servers' information.
==> this might be caused by the bug found above.


2. Additional record responses from the nameserver are not parsed
> - This just means that any servers that are populated from the SRV
> records require a second round of querying for each of the hosts after the
> fqdn is stored. It might be more efficient if these records are also parsed
> but I can see that it might be pretty challenging in the current DNS
> resolver
> - Only reason I thought of this was to try to reduce the time it
> takes to populate the backend servers with addresses in an effort to lessen
> the effects of #1
>
>
I'll work on this one as soon as I've fixed the bug above.

Baptiste


Re: Observations about reloads and DNS SRV records

2018-06-07 Thread Tait Clarridge
Hi Baptiste, thanks for the response.

On Wed, Jun 6, 2018 at 6:32 PM Baptiste  wrote:

>
> This should not happen and it's a known issue that we're working on.
>
>
Excellent, figured you guys were probably already aware of it. Let me know
if I can assist in testing.


>
> Actually, I tested many DNS servers and some of them simply did not send
> the additional records when they could not fit in the response (too small
> payload for the number of SRV records).
> Technically, we could try to use additional records if available and then
> fall back to the current way of working if none are found.
>
>
True, not a lot do and I don't have a lot of hosts. I assumed it already
parsed them based on reading
https://www.haproxy.com/documentation/aloha/9-5/traffic-management/lb-layer7/dns-srv-records/
and
took that into account when writing my DNS service.

I'm a little swamped with other work at the moment, but when I get a chance
I would be able to provide a DNS server (written in Go) that returns
additional records to test with if that helps.

>
>
>> I'm happy with the workaround I'll be pursuing for now where my SD service
>> (that originally was going to be a resolver and populate via SRV records)
>> is going to write all the backend definitions to disk so this is not a
>> pressing issue, just thought I'd share the limitations I discovered. My
>> knowledge of C (and the internal workings of HAproxy) is not great
>> otherwise this would probably be a patch submission for #1 :)
>>
>> Tait
>>
>>
> I'll check that for you. (In the meantime, please keep answering
> Aleksandar's emails; the more info I have, the better.)
>
> Baptiste
>

Thanks again, I don't have a lot of time to do any testing right now but
hope to soon.


Re: Observations about reloads and DNS SRV records

2018-06-06 Thread Baptiste
Hi Tait

A few comments inline:

1. Reloading with SRV records ignores server-state-file
> - While this is not a huge deal, it does mean that the backend in
> question becomes unavailable when the proxy is reloaded until the SRV and
> subsequent A records are resolved
>
- I understand that the point of these records is to dynamically build
> the backend, and populating servers from outdated and potentially invalid
> information is not ideal, but it might be nice to seed the backend before
> the first SRV query returns
>

This should not happen and it's a known issue that we're working on.



> - Below is the related config since it is very likely that I have
> misconfigured something :)
>
> globals
> ...
> server-state-base /var/lib/haproxy
> server-state-file state
>
> defaults
> ...
> load-server-state-from-file global
> default-server init-addr last,libc,none
>
> ...
>
> resolvers sd
> nameserver sd 127.0.0.1:
>
> backend dynamic
> server-template srv 5 _http._tcp.serviceA.tait.testing resolvers sd
> resolve-prefer ipv4 check
>
> 2. Additional record responses from the nameserver are not parsed
>

This is true, by design.


> - This just means that any servers that are populated from the SRV
> records require a second round of querying for each of the hosts after the
> fqdn is stored. It might be more efficient if these records are also parsed
> but I can see that it might be pretty challenging in the current DNS
> resolver
> - Only reason I thought of this was to try to reduce the time it
> takes to populate the backend servers with addresses in an effort to lessen
> the effects of #1
>

Actually, I tested many DNS servers and some of them simply did not send the
additional records when they could not fit in the response (too small
payload for the number of SRV records).
Technically, we could try to use additional records if available and then
fall back to the current way of working if none are found.



> I'm happy with the workaround I'll be pursuing for now where my SD service
> (that originally was going to be a resolver and populate via SRV records)
> is going to write all the backend definitions to disk so this is not a
> pressing issue, just thought I'd share the limitations I discovered. My
> knowledge of C (and the internal workings of HAproxy) is not great
> otherwise this would probably be a patch submission for #1 :)
>
> Tait
>
>
I'll check that for you. (In the meantime, please keep answering Aleksandar's
emails; the more info I have, the better.)

Baptiste


Re: Observations about reloads and DNS SRV records

2018-06-06 Thread Tait Clarridge
On Wed, Jun 6, 2018 at 11:35 AM Aleksandar Lazic  wrote:

> Hi Tait.
>
> On 06/06/2018 11:16, Tait Clarridge wrote:
> >I've been testing DNS service discovery and the use of SRV records and have
> >a few thoughts on a couple of things that I noticed.
>
> There were a lot of changes in this area in the recent versions of haproxy;
> have you tested the setup with the latest haproxy 1.8 release?
>
> Could you please be so kind as to send us the output of
>
> haproxy -vv
>
>
Hey Aleksandar,

I'm on 1.8.8 so I will try to give it a shot with 1.8.9 later this
afternoon. I've added the output below.

HA-Proxy version 1.8.8 2018/04/19
Copyright 2000-2018 Willy Tarreau 

Build options :
  TARGET  = linux2628
  CPU = generic
  CC  = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
-fwrapv -fno-strict-overflow -Wno-format-truncation -Wno-null-dereference
-Wno-unused-label
  OPTIONS = USE_LINUX_TPROXY=1 USE_CRYPT_H=1 USE_GETADDRINFO=1 USE_ZLIB=1
USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_SYSTEMD=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.0h-fips  27 Mar 2018
Running on OpenSSL version : OpenSSL 1.1.0g-fips  2 Nov 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT
IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.42 2018-03-20
Running on PCRE version : 8.41 2017-07-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"),
deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
[SPOE] spoe
[COMP] compression
[TRACE] trace

Thanks,
Tait


Re: Observations about reloads and DNS SRV records

2018-06-06 Thread Aleksandar Lazic

Hi Tait.

On 06/06/2018 11:16, Tait Clarridge wrote:

I've been testing DNS service discovery and the use of SRV records and have
a few thoughts on a couple of things that I noticed.


There were a lot of changes in this area in the recent versions of haproxy; have
you tested the setup with the latest haproxy 1.8 release?

Could you please be so kind as to send us the output of

haproxy -vv


1. Reloading with SRV records ignores server-state-file
   - While this is not a huge deal, it does mean that the backend in
question becomes unavailable when the proxy is reloaded until the SRV and
subsequent A records are resolved
   - I understand that the point of these records is to dynamically build
the backend, and populating servers from outdated and potentially invalid
information is not ideal, but it might be nice to seed the backend before
the first SRV query returns
   - Below is the related config since it is very likely that I have
misconfigured something :)

globals
   ...
   server-state-base /var/lib/haproxy
   server-state-file state

defaults
   ...
   load-server-state-from-file global
   default-server init-addr last,libc,none

...

resolvers sd
   nameserver sd 127.0.0.1:

backend dynamic
   server-template srv 5 _http._tcp.serviceA.tait.testing resolvers sd
resolve-prefer ipv4 check

2. Additional record responses from the nameserver are not parsed
   - This just means that any servers that are populated from the SRV
records require a second round of querying for each of the hosts after the
fqdn is stored. It might be more efficient if these records are also parsed
but I can see that it might be pretty challenging in the current DNS
resolver
   - Only reason I thought of this was to try to reduce the time it
takes to populate the backend servers with addresses in an effort to lessen
the effects of #1

I'm happy with the workaround I'll be pursuing for now where my SD service
(that originally was going to be a resolver and populate via SRV records)
is going to write all the backend definitions to disk so this is not a
pressing issue, just thought I'd share the limitations I discovered. My
knowledge of C (and the internal workings of HAproxy) is not great
otherwise this would probably be a patch submission for #1 :)

Tait




Observations about reloads and DNS SRV records

2018-06-06 Thread Tait Clarridge
I've been testing DNS service discovery and the use of SRV records and have
a few thoughts on a couple of things that I noticed.

1. Reloading with SRV records ignores server-state-file
- While this is not a huge deal, it does mean that the backend in
question becomes unavailable when the proxy is reloaded until the SRV and
subsequent A records are resolved
- I understand that the point of these records is to dynamically build
the backend, and populating servers from outdated and potentially invalid
information is not ideal, but it might be nice to seed the backend before
the first SRV query returns
- Below is the related config since it is very likely that I have
misconfigured something :)

globals
...
server-state-base /var/lib/haproxy
server-state-file state

defaults
...
load-server-state-from-file global
default-server init-addr last,libc,none

...

resolvers sd
nameserver sd 127.0.0.1:

backend dynamic
server-template srv 5 _http._tcp.serviceA.tait.testing resolvers sd
resolve-prefer ipv4 check

2. Additional record responses from the nameserver are not parsed
- This just means that any servers that are populated from the SRV
records require a second round of querying for each of the hosts after the
fqdn is stored. It might be more efficient if these records are also parsed
but I can see that it might be pretty challenging in the current DNS
resolver
- Only reason I thought of this was to try to reduce the time it
takes to populate the backend servers with addresses in an effort to lessen
the effects of #1

I'm happy with the workaround I'll be pursuing for now where my SD service
(that originally was going to be a resolver and populate via SRV records)
is going to write all the backend definitions to disk so this is not a
pressing issue, just thought I'd share the limitations I discovered. My
knowledge of C (and the internal workings of HAproxy) is not great
otherwise this would probably be a patch submission for #1 :)

Tait