Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-12 Thread Pi Ba
Using poll (startup with -dk) the request works properly.

On Fri, 13 Apr 2018 at 08:29, Willy Tarreau wrote:

> Hi Pieter,
>
> On Fri, Apr 13, 2018 at 01:12:41AM +0200, PiBa-NL wrote:
> > To clarify, the issue is not that haproxy uses CPU by looping; the issue
> > is that haproxy prevents the page from loading in the browser. The 'fix'
> > on the old version, after the commit introducing the issue, was to call
> > the EV_SET write delete *less* often. Or maybe my understanding of what
> > it does is just wrong :).
>
> Ah, sorry, maybe I'm mixing multiple problems, I understood it was looping.
> So if it's indeed missing events, it's different. Maybe we have an issue
> with a sequence of enable/disable on the polling, causing it to be disabled
> and not re-enabled sometimes. If you can test with "-dk" to disable kqueue
> and switch to poll, that will be great because I'll be able to test under
> the same conditions. And if it doesn't fail it will tell us it's really
> related to the kqueue poller.
>
> Willy
>


Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-12 Thread Willy Tarreau
Hi Pieter,

On Fri, Apr 13, 2018 at 01:12:41AM +0200, PiBa-NL wrote:
> To clarify, the issue is not that haproxy uses CPU by looping; the issue is
> that haproxy prevents the page from loading in the browser. The 'fix' on the
> old version, after the commit introducing the issue, was to call the EV_SET
> write delete *less* often. Or maybe my understanding of what it does is just
> wrong :).

Ah, sorry, maybe I'm mixing multiple problems, I understood it was looping.
So if it's indeed missing events, it's different. Maybe we have an issue
with a sequence of enable/disable on the polling, causing it to be disabled
and not re-enabled sometimes. If you can test with "-dk" to disable kqueue
and switch to poll, that will be great because I'll be able to test under
the same conditions. And if it doesn't fail it will tell us it's really
related to the kqueue poller.

Willy



Problem installing 1.8.7 -- systemd changes

2018-04-12 Thread Shawn Heisey
I have a script on my system that I use to handle compiling and 
installing a new haproxy version.  That script has 
"EXTRA=haproxy-systemd-wrapper" on the line that does the install.


It looks like that's no longer part of haproxy, and that the systemd 
service definition (included in contrib) just calls haproxy directly. 
USE_SYSTEMD must be set on the compile command when using the contrib 
systemd script, and the systemd development library must be present.
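
For reference, a minimal sketch of such a build for 1.8; the TARGET value
and paths are illustrative assumptions, not taken from this thread:

    # compile with systemd support (requires the systemd development library)
    make TARGET=linux2628 USE_SYSTEMD=1
    make install

    # generate and install the unit file shipped in contrib/systemd
    make -C contrib/systemd
    cp contrib/systemd/haproxy.service /lib/systemd/system/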


But I don't see anything in CHANGELOG or README that talks about it at 
all, and definitely nothing saying that the wrapper has been removed, 
and nothing mentioning the new compile requirements.


Was there somewhere else that I should have looked for information about 
how to get 1.8 working with systemd?


Thanks,
Shawn




Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-12 Thread PiBa-NL

Hi Willy,

And a second mail, as I just thought of one extra thing you wrote that 
maybe I misunderstand, or perhaps I confused you with a small remark about 
CPU usage in my earlier mail (that was a side effect of my other earlier, 
but totally wrong, code change..).

I'm suspecting we could have something wrong with the polled_mask, maybe
sometimes it's removed too early somewhere, preventing the delete(write)
from being performed, which would explain why it loops.

To clarify, the issue is not that haproxy uses CPU by looping; the issue 
is that haproxy prevents the page from loading in the browser. The 'fix' 
on the old version, after the commit introducing the issue, was to call 
the EV_SET write delete *less* often. Or maybe my understanding of what 
it does is just wrong :).


On 13-4-2018 at 0:57, PiBa-NL wrote:

Hi Willy,

On 13-4-2018 at 0:22, Willy Tarreau wrote:

I'm suspecting we could have something wrong with the polled_mask, maybe
sometimes it's removed too early somewhere, preventing the delete(write)
from being performed, which would explain why it loops.

By the way you must really not try to debug an
old version but stick to the latest fixes.
Okay, testing from now on with current master; I just thought it would be 
easier to backtrack if I knew what particular new/missing event would 
possibly cause it. And it could have been simpler to find a fix just 
after the problem was introduced, but it seems it ain't that simple :).


I'm seeing two things that could be of interest to test :
   - remove the two "if (fdtab[fd].polled_mask & tid_bit)" conditions
 to delete the events. It will slightly inflate the list of events
 but not that much. If it fixes the problem it means that the
 polled_mask is sometimes wrong. Please do that with the updated
 master.
Removing the 'if polled_mask' does not fix the issue; in fact that 
makes it worse. The "srvrep[0007:0008]: HTTP/1.1 401 Unauthorized" is 
also not shown anymore without those checks..


   - switch to poll() just to see if you have the same so that we can
 figure if only the kqueue code triggers the issue. poll() doesn't
 rely on polled_mask at all.

Using poll (startup with -dk) the request works properly.


Many thanks for your tests.
Willy


Regards,

PiBa-NL (Pieter)



Regards,

PiBa-NL (Pieter)




Health Checks not run before attempting to use backend

2018-04-12 Thread Dave Chiluk
Hi, we're evaluating haproxy for use as the load balancer in front of our
Mesos cluster.  What we are finding is that even though we have requested
the check option on the server line, haproxy attempts to send traffic to
the server on startup, until the first health check completes.

server slot1 10.40.40.2:7070 check inter 1000 rise 3 fall 3 maxconn 32

This is because we are adding servers to haproxy as they are started in
Mesos, but before our backend application itself is ready to serve
connections.  This results in spurious 503s being handed to clients as we
add backends via the admin socket or restart haproxy.  I looked into
possibly forcing a health check during the cfgparse constructors, but that
seems like it would require some significant rearchitecting.

Is there a way to force haproxy to not use a backend until it passes a
health check?  I'm also worried about the side effects this might cause as
requests start to queue up in haproxy.

Thanks,
Dave
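
A commonly used workaround, sketched here under assumed names (the backend
label and socket path are illustrative, not from this thread): add the
server administratively disabled, then enable it via the runtime API once
the application is known to be up, so haproxy never routes to it beforehand.

    backend app_be
        # 'disabled' starts the server in maintenance mode (no traffic)
        server slot1 10.40.40.2:7070 check inter 1000 rise 3 fall 3 maxconn 32 disabled

    # later, from deployment tooling, over the admin socket:
    # echo "enable server app_be/slot1" | socat stdio /var/run/haproxy.sock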


Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-12 Thread PiBa-NL

Hi Willy,

On 13-4-2018 at 0:22, Willy Tarreau wrote:

By the way you must really not try to debug an
old version but stick to the latest fixes.
Okay, testing from now on with current master; I just thought it would be 
easier to backtrack if I knew what particular new/missing event would 
possibly cause it. And it could have been simpler to find a fix just 
after the problem was introduced, but it seems it ain't that simple :).


I'm seeing two things that could be of interest to test :
   - remove the two "if (fdtab[fd].polled_mask & tid_bit)" conditions
 to delete the events. It will slightly inflate the list of events
 but not that much. If it fixes the problem it means that the
 polled_mask is sometimes wrong. Please do that with the updated
 master.
Removing the 'if polled_mask' does not fix the issue; in fact that makes 
it worse. The "srvrep[0007:0008]: HTTP/1.1 401 Unauthorized" is also not 
shown anymore without those checks..


   - switch to poll() just to see if you have the same so that we can
 figure if only the kqueue code triggers the issue. poll() doesn't
 rely on polled_mask at all.

Using poll (startup with -dk) the request works properly.


Many thanks for your tests.
Willy


Regards,

PiBa-NL (Pieter)




Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-12 Thread Willy Tarreau
Hi Pieter,

On Fri, Apr 13, 2018 at 12:05:26AM +0200, PiBa-NL wrote:
> Hi Willy,
> 
> Okay did some more digging..
> 
> Before and after the 'offending' commit, the EV_SET calls are as shown in
> the attached screenshot; these seem to be the main cause of some things
> going wrong.
> (I think that also without using http-tunnel there are some things that
> don't fully work right, but I haven't investigated that enough to get
> something reproducible, and it is possibly caused by the same kqueue
> issue..) The current issue is 100% reproducible so far in a (for me) easy
> way.

OK that's nice (except for you of course).

> I made some changes to haproxy's output, and captured and compared the
> output.
> The 'START State' is written, and the eo and en variables are also added
> to the output.
> The READ ADD/DEL and WRITE ADD/DEL output is written with fprintf
> statements just before the actual EV_SET call..
> Left side 'WORKS', right side doesn't..
> 
> As you can see, there are some different calls to EV_SET with different
> eo and en states..

I'm suspecting we could have something wrong with the polled_mask, maybe
sometimes it's removed too early somewhere, preventing the delete(write)
from being performed, which would explain why it loops.

> I could 'fix' the behavior after the offending commit by adding back a check
> for the eo in the second screenshot.. (At least for this one particular page
> request..)
> 
> But in current master that 'eo' variable no longer exists. I'm kinda at the
> end of my abilities here.. :)

Indeed, it took us years to get rid of this duplicate information! And as
you can see, doing so apparently revealed that some corner cases were
still relying on it. By the way, you must really not try to debug an
old version but stick to the latest fixes. It's too easy to be caught by
another issue that was since fixed. The fact that you could put "eo" there
in your test indicates to me that the test was made on a commit prior to
its removal. I just want to be sure you don't waste your time.

> I hope at least some of it makes sense to you?

In part, yes! I have to sit down in front of it and scratch my head, but
my context switches are rather hard, so I'm very happy that you provide
such detailed information!

I'm seeing two things that could be of interest to test :
  - remove the two "if (fdtab[fd].polled_mask & tid_bit)" conditions
to delete the events. It will slightly inflate the list of events
but not that much. If it fixes the problem it means that the
polled_mask is sometimes wrong. Please do that with the updated
master.

  - switch to poll() just to see if you have the same so that we can
figure if only the kqueue code triggers the issue. poll() doesn't
rely on polled_mask at all.

Many thanks for your tests.
Willy
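
For readers following along, a minimal sketch of the kqueue update pattern
discussed above; the function and variable names only approximate haproxy's
ev_kqueue.c, they are not the actual code:

    #include <sys/types.h>
    #include <sys/event.h>
    #include <sys/time.h>

    /* Queue kqueue filter changes for one fd. The polled_mask guard is the
     * condition suggested for removal above: without it, the EV_DELETE is
     * queued unconditionally, at the cost of a slightly longer change list. */
    static int update_write_filter(struct kevent *kev, int changes, int fd,
                                   unsigned long polled_mask,
                                   unsigned long tid_bit, int want_write)
    {
        if (want_write)
            EV_SET(&kev[changes++], fd, EVFILT_WRITE, EV_ADD, 0, 0, NULL);
        else if (polled_mask & tid_bit) /* the suspected stale mask */
            EV_SET(&kev[changes++], fd, EVFILT_WRITE, EV_DELETE, 0, 0, NULL);
        return changes;
    }

If the mask can go stale (cleared before the delete is issued), the write
filter stays registered in the kernel, which matches the missing-event
behaviour described in the thread.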



Re: HAProxy 1.8.X crashing

2018-04-12 Thread Willy Tarreau
On Thu, Apr 12, 2018 at 01:14:58PM +0200, Olivier Houchard wrote:
> > > @@ -3718,6 +3719,8 @@ int http_process_request(struct stream *s, struct 
> > > channel *req, int an_bit)
> > >  
> > >   return 0;
> > >   }
> > > + /* XXX: We probably need a better mux */
> > > + conn_install_mux(conn, &mux_pt_ops, objt_cs(s->si[1].end));
> > >  
> > >   path = http_get_path(txn);
> > >   url2sa(req->buf->p + msg->sl.rq.u,
> > 
> > While I can understand how missing this can have caused trouble, I'm not
> > sure it's the best solution, and I'm a bit worried about having various
> > code places choose a mux and install it. I suspect we're "simply" missing
> > a test in the backend code to cover the case where the connection already
> > exists but a mux was not yet installed (a situation that didn't exist
> > prior to muxes which is why we didn't notice this). I think that this
> > happens in connect_server() around line 1177 (return SF_ERR_INTERNAL)
> > when conn_xprt_ready() indicates the transport was not yet initialized.
> > It's likely that the error unrolling fails to consider the case where
> > the mux wasn't initialized, leading to the crash Praveen experienced.
> > 
> 
> I'm not sure about this. My point was to try to install the mux when we can
> choose one, the day we have more relevant muxes, if it ever happens.
> This certainly fixes Praveen's crash.

I totally understand, I just claim that I think we're likely buying a
carpet to put it on the dust instead of removing the dust. It's more of
a conceptual issue : you're forced to select an arbitrary mux at a moment
where you have no idea which one will be needed for this outgoing connection.
In 1.9 it could be H2, later it could be FCGI, etc. And generally when we
have to fill arbitrary information, we pay it the high price later because
it creates unnatural bonds between some unrelated elements.

The single fact that we're forced to pick an arbitrary mux to avoid a crash
somewhere else tells me we have an issue. If one specific code path doesn't
support being called with a muxless connection, this same code path might
also be triggered when the malloc() to create the mux fails and we want to
gracefully abort and try to close this muxless connection after sending a
500 Server Error status.

Also, the only reason why we create the connection this early is that we
need a place where to put the IP addresses, and given that connections
are small (only very slightly larger than the addresses they convey), it
totally makes sense to pre-populate the address there. If we start to add
more complex muxes, we'll even possibly end up allocating more resources
(think h2+hpack contexts), trying to connect and to initiate a handshake
to a target before the rest is finished or whatever funny stuff.

Now don't get me wrong, it would very well be that for now, given the
current state of the code, we're missing many checks and this solution
will be an efficient short-term fix. But for the reasons above it doesn't
sound logical to me. Maybe we're simply missing an "if (conn->mux)" or
"if (cs->mux)" somewhere on the free path before calling something for
the mux, because it was assumed that it did obviously exist.

> I don't even know how we could reach line 1177.

I'm uncertain to be honest, I just think that there is a small possibility
if the connection exists and not the transport, but I could be wrong.
> 
> > If this is right, it would mean two fixes :
> >   - one in the error unrolling path to ensure we don't destroy a non
> > allocated mux ;
> 
> And what do we do then? Without the mux we can't destroy the cs nor the
> connection.

That may be the problem. But if a mux fails to initialize, then we can't
destroy them either? Is cs_destroy() the problem here maybe? If so, I
suspect that this approach would be more robust and would better match
the validity tests on conn->mux that are performed at many places in the
connection code :

if (cs->conn->mux) {
        cs->conn->mux->detach(cs);
} else {
        conn_stop_tracking(cs->conn);
        conn_xprt_close(cs->conn);
        conn_free(cs->conn);
}
cs_free(cs);

> >   - one above the return SF_ERR_INTERNAL above in connect_server() to
> > handle the case where the connection already exists but not the mux.
> 
> I'm not sure how this can happen, and what we should do if that happens.

At least in the current situation we pre-create a connection just to start
to fill some information (source/destination address), without knowing the
mux yet. It's an initialization process, it may look beautiful or ugly
depending on the perspective, but in any case we must ensure that the
code overall remains reliable in this situation. And to me it looks less
ugly than picking a possibly wrong mux just to have a valid pointer
avoiding a segfault.

But if it turns out that stabilizing the destroy path (or error path) is
not suffici

Re: [PATCH][REORG/MINOR]: config: Run postparser once per section instance

2018-04-12 Thread Ben Draut
On Thu, Apr 12, 2018 at 3:06 PM, Willy Tarreau  wrote:

> Hi Ben,
>
> On Thu, Apr 12, 2018 at 02:25:58PM -0600, Ben Draut wrote:
> > This changes the parser to run section postparsers once per section
> > instance, rather than only when the section type changes.
> >
> > This is motivated by the work discussed in
> > https://www.mail-archive.com/haproxy@formilux.org/msg29527.html. It
> should
> > make it easy to produce the warning mentioned in the summary at most once
> > per section.
>
> Hmmm I don't much like this principle, because it's not a post-section
> parser anymore in this case, it's a post config parser. Note that it is
> very possible that you need a different type of post-processing to help
> with the resolvers stuff, but I really think this approach is mistaken
> because it removes the ability to perform post-section processing. From
> what I'm seeing by quickly grepping in the code, at least the cache
> relies on this.
>

Ah, I missed the cache usage, thanks for pointing that out.


> I *think* we do have a way to register a post-config call, but to be
> honest I really don't remember, as we've added a few such hooks over
> time. Maybe you'd need to call this thing before entering
> check_config_validity() which resolves every cross-dependency between
> the config elements and which detects inconsistencies. We may imagine
> having a pre_final_check and a post_final_check callback for this.


Great, I'll dig some more and give it some more thought. Thanks!


Re: [PATCH][REORG/MINOR]: config: Run postparser once per section instance

2018-04-12 Thread Willy Tarreau
Hi Ben,

On Thu, Apr 12, 2018 at 02:25:58PM -0600, Ben Draut wrote:
> This changes the parser to run section postparsers once per section
> instance, rather than only when the section type changes.
> 
> This is motivated by the work discussed in
> https://www.mail-archive.com/haproxy@formilux.org/msg29527.html. It should
> make it easy to produce the warning mentioned in the summary at most once
> per section.

Hmmm I don't much like this principle, because it's not a post-section
parser anymore in this case, it's a post config parser. Note that it is
very possible that you need a different type of post-processing to help
with the resolvers stuff, but I really think this approach is mistaken
because it removes the ability to perform post-section processing. From
what I'm seeing by quickly grepping in the code, at least the cache
relies on this.

I *think* we do have a way to register a post-config call, but to be
honest I really don't remember, as we've added a few such hooks over
time. Maybe you'd need to call this thing before entering
check_config_validity() which resolves every cross-dependency between
the config elements and which detects inconsistencies. We may imagine
having a pre_final_check and a post_final_check callback for this.

Just a few ideas.

Thanks!
Willy



[PATCH][REORG/MINOR]: config: Run postparser once per section instance

2018-04-12 Thread Ben Draut
This changes the parser to run section postparsers once per section
instance, rather than only when the section type changes.

This is motivated by the work discussed in
https://www.mail-archive.com/haproxy@formilux.org/msg29527.html. It should
make it easy to produce the warning mentioned in the summary at most once
per section.

Thanks,

Ben


0001-REORG-MINOR-config-Run-postparser-once-per-section-i.patch
Description: Binary data


Re: 1.8.7 http-tunnel doesn't seem to work? (but default http-keep-alive does)

2018-04-12 Thread PiBa-NL

Hi Willy,

On 12-4-2018 at 1:19, Willy Tarreau wrote:

Thank you very much for pointing the exact line that causes you trouble.
Well, exact line.. probably not the right one. And yes, just removing that 
line indeed breaks something else (as expected..).

Would you have the ability to try the latest 1.9-dev just by chance?
Yes, when I started compiling from source, that's where I started: the 
current master branch. Though (sadly) it has the same broken effect.


I'll try and investigate a bit more.. Though if you think of a possible 
patch, I'm happy to try anything out ;)


Regards,
PiBa-NL (Pieter)



Question regarding haproxy backend behaviour

2018-04-12 Thread Ayush Goyal
Hi,

I have a question regarding haproxy backend connection behaviour. We have
the following setup:

  +---------+      +-------+
  | haproxy | ---> | nginx |
  +---------+      +-------+

We use a haproxy cluster for SSL off-loading and then load-balance requests
to an nginx cluster. We are currently benchmarking this setup with 3 nodes
for the haproxy cluster and 1 nginx node. Each haproxy node has two
frontend/backend pairs. The first frontend is a router for SSL connections
which redistributes requests to the second frontend in the haproxy cluster.
The second frontend handles the SSL handshake and routes requests to the
nginx servers. Our configuration is as follows:

```
global
maxconn 10
user haproxy
group haproxy
nbproc 2
cpu-map 1 1
cpu-map 2 2

defaults
mode http
option forwardfor
timeout connect 5s
timeout client 30s
timeout server 30s
timeout tunnel 30m
timeout client-fin 5s

frontend ssl_sess_id_router
bind *:443
bind-process 1
mode tcp
maxconn 10
log global
option tcp-smart-accept
option splice-request
option splice-response
default_backend ssl_sess_id_router_backend

backend ssl_sess_id_router_backend
bind-process 1
mode tcp
fullconn 5
balance roundrobin
..
option tcp-smart-connect
server lbtest01 :8443 weight 1 check send-proxy
server lbtest02 :8443 weight 1 check send-proxy
server lbtest03 :8443 weight 1 check send-proxy

frontend nginx_ssl_fe
bind *:8443 ssl 
maxconn 10
bind-process 2
option tcp-smart-accept
option splice-request
option splice-response
option forwardfor
reqadd X-Forwarded-Proto:\ https
timeout client-fin 5s
timeout http-request 8s
timeout http-keep-alive 30s
default_backend nginx_backend

backend nginx_backend
bind-process 2
balance roundrobin
http-reuse safe
option tcp-smart-connect
option splice-request
option splice-response
timeout tunnel 30m
timeout http-request 8s
timeout http-keep-alive 30s
server testnginx :80  weight 1 check
```

The nginx node runs nginx with 4 workers and 8192 max clients, so the
maximum number of connections it can accept is 4 × 8192 = 32768.

For the benchmark, we are generating ~3k new connections per second, where
each connection makes 1 HTTP request and then holds the connection for the
next 30 seconds. This results in a high established-connection count on the
first frontend, ssl_sess_id_router: ~25k per haproxy node (~77k connections
in total across the 3 haproxy nodes). The second frontend (nginx_ssl_fe)
receives the same number of connections. On the nginx node, we see active
connections increase to ~32k.

Our understanding is that haproxy should keep a 1:1 frontend/backend
connection mapping for each new connection. But there is a connection count
mismatch between haproxy and nginx (77k total connections across all 3
haproxy nodes for both frontends vs 32k connections on nginx made by
nginx_backend), yet we are still not facing any major 5xx or connection
errors. We are assuming that this is happening because haproxy is
terminating old idle SSL connections to serve the new ones. We have the
following questions:

1. How are the nginx_backend connections being terminated to serve the new
connections?
2. Why is haproxy not terminating connections on the frontend to keep them
at 32k for a 1:1 mapping?

Thanks
Ayush Goyal


Re: Segfault in haproxy v1.8 with Lua

2018-04-12 Thread Hessam Mirsadeghi
But using an applet on a request will prevent the request from being sent
to the backend servers; I still want backend servers to receive the request.

On Thu, Apr 12, 2018 at 9:16 AM, Christopher Faulet wrote:

> On 12/04/2018 at 14:51, Hessam wrote:
>
>> Thanks for your response Christopher.
>>
>> What I want is to overwrite each HTTP response with another custom
>> response. I'm not talking about manipulating headers only; I want to
>> overwrite the whole response.
>>
>>
> AFAIK, using Lua it is not possible. I guess you want to alter the
> response. If so, the only way to achieve this is to write a filter to do
> what you want. But the current API is far from easy to use, especially for
> the body rewriting. We plan to make things simpler, but it takes time and
> first needs internal (and important) refactoring.
>
> On the other hand, if you just need to generate a custom response on some
> criteria of the request, you can use a Lua applet.
>
> --
> Christopher Faulet
>


Re: DNS resolver and mixed case responses

2018-04-12 Thread Ben Draut
At the moment, AWS's provided DNS servers and Route53 appear to always
match the question case in the answer. (As do Google's DNS servers)

Bind seems to be the odd man out in not doing that by default.

On Thu, Apr 12, 2018 at 7:08 AM, Jim Freeman  wrote:

> It will be important to know which behavior AWS's Route53/DNS servers use?
>
> Using stock Debian/Stretch BIND9 (1:9.10.3.dfsg.P4-12.3+deb9u4), we see
> haproxy downing backend servers with
> "Server is going DOWN for maintenance (unspecified DNS error)."
> https://github.com/haproxy/haproxy/search?q=unspecified+dns+error
>
> We're expecting/testing to see if bind9's "no-case-compress { any; }"
> directive addresses this, but many folks do not control their DNS services
> (and as requisite AWS/Route53 capabilities mature, neither will we).
>
>
> On Tue, Apr 10, 2018 at 3:11 PM, Ben Draut  wrote:
>
>> It's interesting that the default behavior of HAProxy resolvers can
>> conflict with the default behavior of bind. (If you're unlucky with
>> whatever bind has cached)
>>
>> By default, bind uses case-insensitive compression, which can cause it to
>> use a different case in the ANSWER than in the QUESTION. (See
>> 'no-case-compress': https://ftp.isc.org/isc/bind9/cur/9.9/doc/arm/Bv9ARM.ch06.html)
>> We were impacted by this recently.
>>
>> Also interesting: https://indico.dns-oarc.net/event/20/session/2/contribution/12/material/slides/0.pdf
>>
>>
>> On Mon, Apr 9, 2018 at 2:12 AM, Baptiste  wrote:
>>
>>> So, it seems that responses that do not match the case should be
>>> dropped:
>>> https://twitter.com/PowerDNS_Bert/status/983254222694240257
>>>
>>> Baptiste
>>>
>>
>>
>


Re: Segfault in haproxy v1.8 with Lua

2018-04-12 Thread Christopher Faulet

On 12/04/2018 at 14:51, Hessam wrote:

Thanks for your response Christopher.

What I want is to overwrite each HTTP response with another custom response. 
I'm not talking about manipulating headers only; I want to overwrite the whole 
response.



AFAIK, using Lua it is not possible. I guess you want to alter the 
response. If so, the only way to achieve this is to write a filter to do 
what you want. But the current API is far from easy to use, especially 
for the body rewriting. We plan to make things simpler, but it takes 
time and first needs internal (and important) refactoring.


On the other hand, if you just need to generate a custom response on 
some criteria of the request, you can use a Lua applet.


--
Christopher Faulet
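
As an illustration of the applet approach, a hedged sketch (the service
name and response body are invented; note that an applet generates the
reply itself rather than forwarding the request to a backend server):

    -- Lua: register an HTTP applet that produces a custom response
    core.register_service("custom_reply", "http", function(applet)
        local body = "custom response body"
        applet:set_status(200)
        applet:add_header("content-type", "text/plain")
        applet:add_header("content-length", string.format("%d", #body))
        applet:start_response()
        applet:send(body)
    end)

    -- haproxy.cfg side (illustrative):
    --   http-request use-service lua.custom_reply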



Re: DNS resolver and mixed case responses

2018-04-12 Thread Jim Freeman
It will be important to know which behavior AWS's Route53/DNS servers use?

Using stock Debian/Stretch BIND9 (1:9.10.3.dfsg.P4-12.3+deb9u4), we see
haproxy downing backend servers with
"Server is going DOWN for maintenance (unspecified DNS error)."
https://github.com/haproxy/haproxy/search?q=unspecified+dns+error

We're expecting/testing to see if bind9's "no-case-compress { any; }"
directive addresses this, but many folks do not control their DNS services
(and as requisite AWS/Route53 capabilities mature, neither will we).
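
For reference, a sketch of how that directive would sit in named.conf; the
scope (any;) is the one quoted above:

    options {
        // always answer using the exact name case of the question
        no-case-compress { any; };
    };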


On Tue, Apr 10, 2018 at 3:11 PM, Ben Draut  wrote:

> It's interesting that the default behavior of HAProxy resolvers can
> conflict with the default behavior of bind. (If you're unlucky with
> whatever bind has cached)
>
> By default, bind uses case-insensitive compression, which can cause it to
> use a different case in the ANSWER than in the QUESTION. (See
> 'no-case-compress': https://ftp.isc.org/isc/bind9/cur/9.9/doc/arm/Bv9ARM.ch06.html)
> We were impacted by this recently.
>
> Also interesting: https://indico.dns-oarc.net/event/20/session/2/contribution/12/material/slides/0.pdf
>
>
> On Mon, Apr 9, 2018 at 2:12 AM, Baptiste  wrote:
>
>> So, it seems that responses that do not match the case should be
>> dropped:
>> https://twitter.com/PowerDNS_Bert/status/983254222694240257
>>
>> Baptiste
>>
>
>


Re: Segfault in haproxy v1.8 with Lua

2018-04-12 Thread Hessam
Thanks for your response Christopher.

What I want is to overwrite each HTTP response with another custom response. 
I'm not talking about manipulating headers only; I want to overwrite the whole 
response.

Thanks,
Seyed

> On Apr 12, 2018, at 8:44 AM, Christopher Faulet  wrote:
> 
>> On 11/04/2018 at 16:11, Hessam Mirsadeghi wrote:
>> Hi Christopher,
>> You're right; that segfault happens with the build at the faulty commit and 
>> not later versions such as v1.8.5.
>> However, version v1.8.5 does segfault with the attached modified Lua script. 
>> As far as I can tell, the problem arises after any call to "txn.res:set()".
>> In the attached Lua script, if you remove the call to either of 
>> "txn.res:set(txn.res:get())" or "txn.res:forward(txn.res:get_in_len())", the 
>> segfault will disappear.
>> Also, when I only have a call to "txn.res:set(txn.res:get())" in the script, 
>> haproxy becomes unresponsive to all but the first request on each persistent 
>> connection. That is, something like "curl -sig localhost:80 localhost:80" 
>> will only get the response for the first request; the second one times out 
>> on the existing connection and succeeds only on a a second connection 
>> established by curl.
> 
> So, with this script, I have a segfault. But not only with HAProxy 1.8 and 
> higher. It also crashes on HAProxy 1.7. Actually, it will crash on all 
> versions. This is a Lua problem, not an HTTP one.
> 
> IMHO, "Channel.set" or "Channel.get" must never be used on HTTP messages. 
> These functions totally hijack the HTTP parser. There is an attempt to 
> restore the HTTP parser state. But this is only partially done (the headers' 
> analysis is done, but not all of the ensuing analysis). So the result is 
> undefined. It will seem to work for simpler cases but will fail most of the time. 
> Same remarks are true for "Channel.getline" and "Channel.append".
> 
> "Channel.forward" (and "Channel.send") must also never be used on HTTP 
> messages. When called, the message analysis is in progress. The HTTP parser 
> expects to have all the headers at this stage. Mixing message analysis and 
> data forward is totally unexpected.
> 
> I will let Thierry (the Lua maintainer) decide what to do. But, I guess he 
> will restrict usage of all these functions to TCP streams only.
> 
> So now, maybe you can describe what you try to do. There is probably another 
> way to achieve it. I hope so, because it will never work this way.
> 
> -- 
> Christopher Faulet



Re: Segfault in haproxy v1.8 with Lua

2018-04-12 Thread Christopher Faulet

On 11/04/2018 at 16:11, Hessam Mirsadeghi wrote:

Hi Christopher,

You're right; that segfault happens with the build at the faulty commit 
and not later versions such as v1.8.5.
However, version v1.8.5 does segfault with the attached modified Lua 
script. As far as I can tell, the problem arises after any call to 
"txn.res:set()".


In the attached Lua script, if you remove the call to either of 
"txn.res:set(txn.res:get())" or "txn.res:forward(txn.res:get_in_len())", 
the segfault will disappear.
Also, when I only have a call to "txn.res:set(txn.res:get())" in the 
script, haproxy becomes unresponsive to all but the first request on 
each persistent connection. That is, something like "curl -sig 
localhost:80 localhost:80" will only get the response for the first 
request; the second one times out on the existing connection and 
succeeds only on a second connection established by curl.
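
The script attached to the original mail is not preserved in the archive;
below is a minimal sketch of the pattern described above, registered as a
response-side action (the action name is invented):

    -- minimal reproducer sketch based on the description above
    core.register_action("rewrite_res", { "http-res" }, function(txn)
        -- Channel.get/set on an HTTP response hijacks the HTTP parser:
        txn.res:set(txn.res:get())
        txn.res:forward(txn.res:get_in_len())
    end)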


So, with this script, I have a segfault. But not only with HAProxy 1.8 
and higher. It also crashes on HAProxy 1.7. Actually, it will crash on 
all versions. This is a Lua problem, not an HTTP one.


IMHO, "Channel.set" or "Channel.get" must never be used on HTTP 
messages. These functions totally hijack the HTTP parser. There is an 
attempt to restore the HTTP parser state. But this is only partially 
done (the headers' analysis is done, but not all of the ensuing 
analysis). So the result is undefined. It will seem to work for simpler 
cases but will fail most of the time. Same remarks are true for 
"Channel.getline" and "Channel.append".


"Channel.forward" (and "Channel.send") must also never be used on HTTP 
messages. When called, the message analysis is in progress. The HTTP 
parser expects to have all the headers at this stage. Mixing message 
analysis and data forward is totally unexpected.


I will let Thierry (the Lua maintainer) decide what to do. But, I guess 
he will restrict usage of all these functions to TCP streams only.


So now, maybe you can describe what you try to do. There is probably 
another way to achieve it. I hope so, because it will never work this way.


--
Christopher Faulet



Re: HAProxy 1.8.X crashing

2018-04-12 Thread Olivier Houchard
Hi Willy,

On Thu, Apr 12, 2018 at 08:53:51AM +0200, Willy Tarreau wrote:
> Hi Olivier,
> 
> On Wed, Apr 11, 2018 at 05:29:15PM +0200, Olivier Houchard wrote:
> > From 7c9f06727cf60acf873353ac71283ff9c562aeee Mon Sep 17 00:00:00 2001
> > From: Olivier Houchard 
> > Date: Wed, 11 Apr 2018 17:23:17 +0200
> > Subject: [PATCH] BUG/MINOR: connection: Setup a mux when in proxy mode.
> > 
> > We were allocating a new connection when in proxy mode, but did not provide
> > it a mux, thus crashing later.
> > 
> > This should be backported to 1.8.
> > ---
> >  src/proto_http.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/src/proto_http.c b/src/proto_http.c
> > index 80e001d69..817692c48 100644
> > --- a/src/proto_http.c
> > +++ b/src/proto_http.c
> > @@ -62,6 +62,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -3718,6 +3719,8 @@ int http_process_request(struct stream *s, struct 
> > channel *req, int an_bit)
> >  
> > return 0;
> > }
> > +   /* XXX: We probably need a better mux */
> > +   conn_install_mux(conn, &mux_pt_ops, objt_cs(s->si[1].end));
> >  
> > path = http_get_path(txn);
> > url2sa(req->buf->p + msg->sl.rq.u,
> 
> While I can understand how missing this can have caused trouble, I'm not
> sure it's the best solution, and I'm a bit worried about having various
> code places choose a mux and install it. I suspect we're "simply" missing
> a test in the backend code to cover the case where the connection already
> exists but a mux was not yet installed (a situation that didn't exist
> prior to muxes which is why we didn't notice this). I think that this
> happens in connect_server() around line 1177 (return SF_ERR_INTERNAL)
> when conn_xprt_ready() indicates the transport was not yet initialized.
> It's likely that the error unrolling fails to consider the case where
> the mux wasn't initialized, leading to the crash Praveen experienced.
> 

I'm not sure about this. My point was to try to install the mux when we can
choose one, the day we have more relevant muxes, if it ever happens.
This certainly fixes Praveen's crash.
I don't even know how we could reach line 1177.

> If this is right, it would mean two fixes :
>   - one in the error unrolling path to ensure we don't destroy a non
> allocated mux ;

And what do we do then? Without the mux we can't destroy the cs nor the
connection.

>   - one above the return SF_ERR_INTERNAL above in connect_server() to
> handle the case where the connection already exists but not the mux.

I'm not sure how this can happen, and what we should do if that happens.

Regards,

Olivier