Re: handling hundreds of reqrep statements

2013-10-28 Thread Willy Tarreau
Hi Patrick,

On Wed, Oct 23, 2013 at 02:15:50AM -0400, Patrick Hemmer wrote:
 It seems that when the request first comes in, haproxy allocates a
 buffer for every header. If the header is X-Foo: bar it allocates a 10
 character buffer. When you do `reqrep` on the request line, and add a
 line at the end with the \r\n it moves every header down by one. So
 X-Foo: bar ends up in the buffer for whatever header was after it. If
 that buffer isn't big enough to hold the whole thing, then when haproxy
 goes to look for a matching header it won't find it. So in my case,
 X-Header-ID: foo got put in the buffer for Accept: */*. Since
 X-Header-ID is one character longer than that buffer, when haproxy went
 looking for it, it was only finding X-Header-I.
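
The hypothesis quoted above can be illustrated with a small toy model. It is only a sketch of the behaviour as Patrick describes it, not haproxy's actual parser: the per-header "slot length" idea and all function names are assumptions made for illustration.

```python
# Toy model of the hypothesis described above. This is NOT haproxy's real
# parser; the fixed-length "slot" per header is taken from the description
# in the quoted message and every name here is hypothetical.

def index_headers(header_lines):
    """Record one slot length per header line, as seen when the request is parsed."""
    return [len(line) for line in header_lines]

def visible_headers(header_lines, slot_lengths):
    """Read each slot back after the lines moved: slot i now holds line i,
    cut to the length recorded for the line that used to sit there."""
    return [line[:length] for line, length in zip(header_lines, slot_lengths)]

def header_found(name, header_lines, slot_lengths):
    """A lookup only succeeds if 'Name:' survives intact inside some slot."""
    prefix = name + ":"
    return any(v.startswith(prefix) for v in visible_headers(header_lines, slot_lengths))

def simulate(original_headers, injected):
    """Index the original headers, then push them down by one injected line."""
    slots = index_headers(original_headers)
    shifted = [injected] + original_headers
    return header_found(injected.split(":")[0], shifted, slots)

accept_first = ["Accept: */*", "User-Agent: Agent", "Host: localhost:2082"]
agent_first = ["User-Agent: Agent", "Accept: */*", "Host: localhost:2082"]

print(simulate(accept_first, "X-Header-ID: bar"))  # False: slot is too short for "X-Header-ID:"
print(simulate(agent_first, "X-Header-ID: bar"))   # True: the User-Agent slot is long enough
```

Under these assumptions the model reproduces both observations from the test: with Accept first the header is never found, and with User-Agent first it is.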

Thanks for your analysis. I'll recheck the code and run your test. It is
indeed possible that there is a bug there. Generally, if a reqrep line
adds a new line using the \r\n trick, the ACLs won't see it because the
header is not indexed. But I think that you have observed an inconsistent
behaviour, so at least I would like it to be either consistent or documented
so that the limits are well understood.

Thanks,
Willy




Re: handling hundreds of reqrep statements

2013-10-23 Thread Patrick Hemmer
 



*From: *Patrick Hemmer hapr...@stormcloud9.net
*Sent: * 2013-10-22 23:32:31 E
*CC: *haproxy@formilux.org
*Subject: *Re: handling hundreds of reqrep statements



 
 *From: *Patrick Hemmer hapr...@stormcloud9.net
 *Sent: * 2013-10-22 19:13:08 E
 *To: *haproxy@formilux.org
 *Subject: *handling hundreds of reqrep statements

 I'm currently using haproxy (1.5-dev19) as a content based router. It
 takes an incoming request, looks at the url, rewrites it, and sends
 it on to the appropriate back end.
 The difficult part is that we need to stop all parsing and rewriting after
 the first match. This is because we might have a url such as
 '/foo/bar' which rewrites to '/foo/baz', and another rewrite from
 '/foo/b' to '/foo/c'. As you can see both rules would try to trigger
 a rewrite on '/foo/bar/shot', and we'd end up with '/foo/caz/shot'.
 Additionally there are hundreds of these rewrites (the config file is
 generated from a mapping).

 There are 2 questions here:

 1) I currently have this working using stick tables (it's unpleasant
 but it works).
 It basically looks like this:
 frontend frontend1
     acl foo_bar path_reg ^/foo/bar
     use_backend backend1 if foo_bar

     acl foo_b path_reg ^/foo/b
     use_backend backend1 if foo_b

 backend backend1
     # create a stick table to store one entry
     stick-table type integer size 1 store gpc0
     # enable tracking on sc1. The `always_false` doesn't matter, it just
     # requires a key, so we give it one
     tcp-request content track-sc1 always_false
     # ACL to clear gpc0
     acl rewrite-init sc1_clr_gpc0 ge 0
     # clear gpc0 on the start of every request
     tcp-request content accept if rewrite-init
     # ACL to check if gpc0 has been set
     acl rewrite-empty sc1_get_gpc0 eq 0
     # ACL to set gpc0 when a rewrite has matched
     acl rewrite-set sc1_inc_gpc0 ge 0

     acl foo_bar path_reg ^/foo/bar
     # the conditional first checks if another rewrite has matched, then
     # checks the foo_bar acl, and then performs the rewrite-set only if
     # foo_bar matched
     reqrep ^(GET|POST)\ /foo/bar(.*) \1\ /foo/baz\2 if rewrite-empty foo_bar rewrite-set

     acl foo_b path_reg ^/foo/b
     # same procedure as above
     reqrep ^(GET|POST)\ /foo/b(.*) \1\ /foo/c\2 if rewrite-empty foo_b rewrite-set

 (my actual rules are a bit more complicated, but those examples
 exhibit all the problem points I have).

 The cleaner way I thought of handling this was to instead do
 something like this:
 backend backend1
     acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found

     acl foo_bar path_reg ^/foo/bar
     reqrep ^(GET|POST)\ /foo/bar(.*) \1\ /foo/baz\2\r\nX-Rewrite-ID:\ foo_bar if !rewrite-found foo_bar

     acl foo_b path_reg ^/foo/b
     reqrep ^(GET|POST)\ /foo/b(.*) \1\ /foo/c\2\r\nX-Rewrite-ID:\ foo_b if !rewrite-found foo_b

 But this doesn't work. The rewrite-found acl never finds the header
 and so both reqrep commands run. Is there any better way of doing
 this than the nasty stick table?


 2) I would also like to add a field to the log indicating which rule
 matched. I can't figure out a way to accomplish this bit.
 Since the config file is automatically generated, I was hoping to
 just assign a short numeric ID and stick that in the log somehow. The
 only way I can think that this could work is by adding a header
 conditionally using an acl (or use the header created by the
 alternate idea above), and then using `capture request header` to add
 that to the log. But it does not appear haproxy can capture headers
 added by itself.

 -Patrick

 Ok, so I went home and resumed trying to figure this out, starting
 from scratch on a whole new machine. Well guess what, the cleaner
 way worked. After many proclamations of WTF? out loud (my dog was
 getting concerned), I think I found a bug. And I cannot begin to
 describe just how awesome this bug is.

 Here's how you can duplicate this awesomeness:

 Start a haproxy with the following config:
 defaults
     mode http
     timeout connect 1000
     timeout client 1000
     timeout server 1000

 frontend frontend
     bind *:2082

     maxconn 2

     acl rewrite-found req.hdr(X-Header-ID) -m found

     reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ bar if !rewrite-found
     reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ pop if !rewrite-found
     reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ tart if !rewrite-found

     default_backend backend

 backend backend
     server server 127.0.0.1:2090



 Start up a netcat:
 while true; do nc -l -p 2090; done


 Create a file with the following contents (I'll presume we call it
 data):
 GET /foo/ HTTP/1.1
 Accept: */*
 User-Agent: Agent
 Host: localhost:2082


 (with the empty line on the bottom)

 And now run:
 nc localhost 2082 < data

 In your listening netcat, notice you got 3 X-Header-ID headers.

 Now in your data file

Re: handling hundreds of reqrep statements

2013-10-23 Thread Patrick Hemmer
 



*From: *hushmeh...@hushmail.com
*Sent: * 2013-10-23 01:06:24 E
*To: *hapr...@stormcloud9.net
*CC: *haproxy@formilux.org
*Subject: *Re: handling hundreds of reqrep statements


 On Wed, 23 Oct 2013 05:33:38 +0200 Patrick Hemmer 
 hapr...@stormcloud9.net wrote:
 reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ bar if !rewrite-found
 What about reqadd? Clumsy fiddling with \r\n (or \n\r) in regexp 
 seems awkward to me.
 reqadd X-Header-ID:\ bar unless rewrite-found

Ya, I think I figured out the issue. Had to do with haproxy
pre-allocating buffers for each header, and not expecting them being
moved around.
Unfortunately I can't use reqadd to add a header as reqadd happens too
late in the process. All reqrep statements happen before reqadd. So if I
put an acl on reqrep to skip it if the header has been added, it'll
always run the reqrep because the header gets added afterwards.
However I think I can use http-request set-header instead of reqadd.
It's not as simple as the reqrep \r\n idea, but still better than the
nasty stick table.
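
The ordering problem described above can be sketched as a two-phase model. This is only an illustration of the claim that every rewrite runs before any late header addition; it does not model haproxy's real processing, and `run` and `mark_during_rewrite` are made-up names.

```python
# Rules from the thread: (matched prefix, replacement prefix, rule id).
RULES = [("/foo/bar", "/foo/baz", "foo_bar"), ("/foo/b", "/foo/c", "foo_b")]

def run(path, mark_during_rewrite):
    """Apply the rules in order, each guarded by "marker header not present yet".

    mark_during_rewrite=True  models a marker set in the same phase as the rewrite.
    mark_during_rewrite=False models a marker added only after all rewrites ran.
    """
    header_present = False
    applied = []
    for prefix, target, rule_id in RULES:
        if header_present or not path.startswith(prefix):
            continue
        path = target + path[len(prefix):]
        applied.append(rule_id)
        if mark_during_rewrite:
            header_present = True
    # With late marking the header would only be added here, after the loop,
    # so it could never have guarded any of the rewrites above.
    return path, applied

print(run("/foo/bar/shot", mark_during_rewrite=False))  # ('/foo/caz/shot', ['foo_bar', 'foo_b'])
print(run("/foo/bar/shot", mark_during_rewrite=True))   # ('/foo/baz/shot', ['foo_bar'])
```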


handling hundreds of reqrep statements

2013-10-22 Thread Patrick Hemmer
I'm currently using haproxy (1.5-dev19) as a content based router. It
takes an incoming request, looks at the url, rewrites it, and sends it
on to the appropriate back end.
The difficult part is that we need to stop all parsing and rewriting after
the first match. This is because we might have a url such as '/foo/bar'
which rewrites to '/foo/baz', and another rewrite from '/foo/b' to
'/foo/c'. As you can see both rules would try to trigger a rewrite on
'/foo/bar/shot', and we'd end up with '/foo/caz/shot'.
Additionally there are hundreds of these rewrites (the config file is
generated from a mapping).
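
To make the double-rewrite problem concrete, here is a minimal Python sketch. It only mimics the two example rules with `re`; it is not how haproxy evaluates `reqrep`, and the function names are invented for illustration.

```python
import re

# The two example rules, in config order: (pattern, replacement).
RULES = [
    (r"^/foo/bar(.*)", r"/foo/baz\1"),
    (r"^/foo/b(.*)", r"/foo/c\1"),
]

def rewrite_all(path):
    """Apply every rule in turn, as a plain list of rewrite lines would."""
    for pattern, replacement in RULES:
        path = re.sub(pattern, replacement, path)
    return path

def rewrite_first_match(path):
    """Stop after the first rule that matches (the behaviour wanted here)."""
    for pattern, replacement in RULES:
        new_path, count = re.subn(pattern, replacement, path)
        if count:
            return new_path
    return path

print(rewrite_all("/foo/bar/shot"))          # /foo/caz/shot
print(rewrite_first_match("/foo/bar/shot"))  # /foo/baz/shot
```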

There are 2 questions here:

1) I currently have this working using stick tables (it's unpleasant but
it works).
It basically looks like this:
frontend frontend1
    acl foo_bar path_reg ^/foo/bar
    use_backend backend1 if foo_bar

    acl foo_b path_reg ^/foo/b
    use_backend backend1 if foo_b

backend backend1
    # create a stick table to store one entry
    stick-table type integer size 1 store gpc0
    # enable tracking on sc1. The `always_false` doesn't matter, it just
    # requires a key, so we give it one
    tcp-request content track-sc1 always_false
    # ACL to clear gpc0
    acl rewrite-init sc1_clr_gpc0 ge 0
    # clear gpc0 on the start of every request
    tcp-request content accept if rewrite-init
    # ACL to check if gpc0 has been set
    acl rewrite-empty sc1_get_gpc0 eq 0
    # ACL to set gpc0 when a rewrite has matched
    acl rewrite-set sc1_inc_gpc0 ge 0

    acl foo_bar path_reg ^/foo/bar
    # the conditional first checks if another rewrite has matched, then
    # checks the foo_bar acl, and then performs the rewrite-set only if
    # foo_bar matched
    reqrep ^(GET|POST)\ /foo/bar(.*) \1\ /foo/baz\2 if rewrite-empty foo_bar rewrite-set

    acl foo_b path_reg ^/foo/b
    # same procedure as above
    reqrep ^(GET|POST)\ /foo/b(.*) \1\ /foo/c\2 if rewrite-empty foo_b rewrite-set
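
The trick in the config above relies on the conditions being evaluated left to right, so the counter is only incremented when the earlier conditions held. A rough Python model of that idea follows. The `Gpc0` class is a stand-in invented for this sketch and says nothing about how stick tables are implemented.

```python
import re

class Gpc0:
    """Stand-in for the single gpc0 counter kept in the stick table."""
    def __init__(self):
        self.value = 0

    def clear(self):       # plays the role of rewrite-init
        self.value = 0
        return True

    def is_empty(self):    # plays the role of rewrite-empty
        return self.value == 0

    def increment(self):   # plays the role of rewrite-set, always true
        self.value += 1
        return True

RULES = [
    (r"^/foo/bar(.*)", r"/foo/baz\1"),
    (r"^/foo/b(.*)", r"/foo/c\1"),
]

def rewrite(path, counter):
    counter.clear()  # reset at the start of every request
    for pattern, replacement in RULES:
        # `and` short-circuits left to right, so increment() only runs when
        # the counter is still empty and the path matched this rule.
        if counter.is_empty() and re.search(pattern, path) and counter.increment():
            path = re.sub(pattern, replacement, path)
    return path

counter = Gpc0()
print(rewrite("/foo/bar/shot", counter), counter.value)  # /foo/baz/shot 1
```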

(my actual rules are a bit more complicated, but those examples exhibit
all the problem points I have).

The cleaner way I thought of handling this was to instead do something
like this:
backend backend1
    acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found

    acl foo_bar path_reg ^/foo/bar
    reqrep ^(GET|POST)\ /foo/bar(.*) \1\ /foo/baz\2\r\nX-Rewrite-ID:\ foo_bar if !rewrite-found foo_bar

    acl foo_b path_reg ^/foo/b
    reqrep ^(GET|POST)\ /foo/b(.*) \1\ /foo/c\2\r\nX-Rewrite-ID:\ foo_b if !rewrite-found foo_b

But this doesn't work. The rewrite-found acl never finds the header and
so both reqrep commands run. Is there any better way of doing this than
the nasty stick table?
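
Since the config is generated from a mapping, the lines for this approach could be produced along these lines. This is a hypothetical generator, not the one actually used; `generate_rules` and the tuple layout are assumptions. It only reproduces the directives shown above and does not escape regex metacharacters in the paths.

```python
def generate_rules(mapping):
    """Return config lines for each (rule name, source prefix, target prefix).

    Order is kept as given, as in the example where foo_bar comes before foo_b.
    """
    lines = ["acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found", ""]
    for name, src, dst in mapping:
        lines.append(f"acl {name} path_reg ^{src}")
        lines.append(
            f"reqrep ^(GET|POST)\\ {src}(.*) \\1\\ {dst}\\2\\r\\nX-Rewrite-ID:\\ {name}"
            f" if !rewrite-found {name}"
        )
        lines.append("")
    return lines

MAPPING = [
    ("foo_bar", "/foo/bar", "/foo/baz"),
    ("foo_b", "/foo/b", "/foo/c"),
]

print("\n".join(generate_rules(MAPPING)))
```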


2) I would also like to add a field to the log indicating which rule
matched. I can't figure out a way to accomplish this bit.
Since the config file is automatically generated, I was hoping to just
assign a short numeric ID and stick that in the log somehow. The only
way I can think that this could work is by adding a header conditionally
using an acl (or use the header created by the alternate idea above),
and then using `capture request header` to add that to the log. But it
does not appear haproxy can capture headers added by itself.

-Patrick


Re: handling hundreds of reqrep statements

2013-10-22 Thread Patrick Hemmer



*From: *Patrick Hemmer hapr...@stormcloud9.net
*Sent: * 2013-10-22 19:13:08 E
*To: *haproxy@formilux.org
*Subject: *handling hundreds of reqrep statements

 I'm currently using haproxy (1.5-dev19) as a content based router. It
 takes an incoming request, looks at the url, rewrites it, and sends it
 on to the appropriate back end.
 The difficult part is that we need to stop all parsing and rewriting after
 the first match. This is because we might have a url such as
 '/foo/bar' which rewrites to '/foo/baz', and another rewrite from
 '/foo/b' to '/foo/c'. As you can see both rules would try to trigger a
 rewrite on '/foo/bar/shot', and we'd end up with '/foo/caz/shot'.
 Additionally there are hundreds of these rewrites (the config file is
 generated from a mapping).

 There are 2 questions here:

 1) I currently have this working using stick tables (it's unpleasant
 but it works).
 It basically looks like this:
 frontend frontend1
     acl foo_bar path_reg ^/foo/bar
     use_backend backend1 if foo_bar

     acl foo_b path_reg ^/foo/b
     use_backend backend1 if foo_b

 backend backend1
     # create a stick table to store one entry
     stick-table type integer size 1 store gpc0
     # enable tracking on sc1. The `always_false` doesn't matter, it just
     # requires a key, so we give it one
     tcp-request content track-sc1 always_false
     # ACL to clear gpc0
     acl rewrite-init sc1_clr_gpc0 ge 0
     # clear gpc0 on the start of every request
     tcp-request content accept if rewrite-init
     # ACL to check if gpc0 has been set
     acl rewrite-empty sc1_get_gpc0 eq 0
     # ACL to set gpc0 when a rewrite has matched
     acl rewrite-set sc1_inc_gpc0 ge 0

     acl foo_bar path_reg ^/foo/bar
     # the conditional first checks if another rewrite has matched, then
     # checks the foo_bar acl, and then performs the rewrite-set only if
     # foo_bar matched
     reqrep ^(GET|POST)\ /foo/bar(.*) \1\ /foo/baz\2 if rewrite-empty foo_bar rewrite-set

     acl foo_b path_reg ^/foo/b
     # same procedure as above
     reqrep ^(GET|POST)\ /foo/b(.*) \1\ /foo/c\2 if rewrite-empty foo_b rewrite-set

 (my actual rules are a bit more complicated, but those examples
 exhibit all the problem points I have).

 The cleaner way I thought of handling this was to instead do something
 like this:
 backend backend1
     acl rewrite-found req.hdr(X-Rewrite-ID,1) -m found

     acl foo_bar path_reg ^/foo/bar
     reqrep ^(GET|POST)\ /foo/bar(.*) \1\ /foo/baz\2\r\nX-Rewrite-ID:\ foo_bar if !rewrite-found foo_bar

     acl foo_b path_reg ^/foo/b
     reqrep ^(GET|POST)\ /foo/b(.*) \1\ /foo/c\2\r\nX-Rewrite-ID:\ foo_b if !rewrite-found foo_b

 But this doesn't work. The rewrite-found acl never finds the header
 and so both reqrep commands run. Is there any better way of doing this
 than the nasty stick table?


 2) I would also like to add a field to the log indicating which rule
 matched. I can't figure out a way to accomplish this bit.
 Since the config file is automatically generated, I was hoping to just
 assign a short numeric ID and stick that in the log somehow. The only
 way I can think that this could work is by adding a header
 conditionally using an acl (or use the header created by the alternate
 idea above), and then using `capture request header` to add that to
 the log. But it does not appear haproxy can capture headers added by
 itself.

 -Patrick

Ok, so I went home and resumed trying to figure this out, starting from
scratch on a whole new machine. Well guess what, the cleaner way
worked. After many proclamations of WTF? out loud (my dog was getting
concerned), I think I found a bug. And I cannot begin to describe just
how awesome this bug is.

Here's how you can duplicate this awesomeness:

Start a haproxy with the following config:
defaults
    mode http
    timeout connect 1000
    timeout client 1000
    timeout server 1000

frontend frontend
    bind *:2082

    maxconn 2

    acl rewrite-found req.hdr(X-Header-ID) -m found

    reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ bar if !rewrite-found
    reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ pop if !rewrite-found
    reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ tart if !rewrite-found

    default_backend backend

backend backend
    server server 127.0.0.1:2090



Start up a netcat:
while true; do nc -l -p 2090; done


Create a file with the following contents (I'll presume we call it data):
GET /foo/ HTTP/1.1
Accept: */*
User-Agent: Agent
Host: localhost:2082


(with the empty line on the bottom)

And now run:
nc localhost 2082 < data

In your listening netcat, notice you got 3 X-Header-ID headers.

Now in your data file, move the Accept: */* down one line, so it's
after the User-Agent and retry. Notice you only get 1 X-Header-ID
back. It works!
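
For anyone who wants to script the two header orders instead of editing the data file by hand, a small Python helper along these lines might do. It is a sketch only: `build_request` and `send` are made-up names, the port matches the `bind` line above, and the line ending of the original data file is not stated, so `eol` is left as a parameter.

```python
import socket

ACCEPT_FIRST = ["Accept: */*", "User-Agent: Agent", "Host: localhost:2082"]
AGENT_FIRST = ["User-Agent: Agent", "Accept: */*", "Host: localhost:2082"]

def build_request(header_lines, path="/foo/", eol="\n"):
    """Build the raw request: request line, headers, then one empty line."""
    lines = [f"GET {path} HTTP/1.1", *header_lines, "", ""]
    return eol.join(lines).encode("ascii")

def send(payload, host="localhost", port=2082, timeout=2.0):
    """Send the payload to the proxy and return whatever comes back, if anything."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)
        try:
            return conn.recv(65536)
        except socket.timeout:
            return b""

# Nothing is sent on import. To try it against a running instance:
#   send(build_request(ACCEPT_FIRST))
#   send(build_request(AGENT_FIRST))
print(build_request(ACCEPT_FIRST).decode("ascii"))
```

The result still has to be read from the listening netcat, as in the manual steps.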

But wait, it gets even better. Put the Accept: */* line back where it
was, and in the haproxy config, replace all X-Header-ID with
X-HeaderID (just remove

Re: handling hundreds of reqrep statements

2013-10-22 Thread hushmehard


On Wed, 23 Oct 2013 05:33:38 +0200 Patrick Hemmer 
hapr...@stormcloud9.net wrote:
reqrep ^(GET)\ /foo/(.*) \1\ /foo/\2\r\nX-Header-ID:\ bar if !rewrite-found

What about reqadd? Clumsy fiddling with \r\n (or \n\r) in regexp 
seems awkward to me.
reqadd X-Header-ID:\ bar unless rewrite-found