Re: HAProxy Healthcheck issue using Virtual hostname

2018-05-03 Thread Sen
Hi

I have an app deployed in Pivotal Cloudfoundry (PCF) and to route traffic
to an app in PCF, we have to use application route name (virtual hostname).

We have PCF in two different datacenters and I need to load balance the
traffic to these DCs , but I'm having the challenge in checking the health
of the application.

here is the config - when i have app deployed only in one DC:

backend CustomerLookupService.searchCustomer
mode http
balance roundrobin
option httpchk GET /env HTTP/1.1\r\nHost:\ customerlookup.xxx.px-prd02.cf..com:443\r\n
http-check expect rstatus ^200
option httplog
timeout server 60s
default-server inter 10s fall 3 rise 2
server s_CustomerLookupService.searchCustomer2 customerlookup.xxx.px-prd02.cf..com:443 check resolvers dns ssl verify none


Now I need to route the traffic to "customerlookup.xxx.px-prd03.cf..com:443",
in addition to "customerlookup.xxx.px-prd02.cf..com:443".

In that case, how do I check the health of prd02 and prd03?

I tried the following, but it's not working:

backend CustomerLookupService.searchCustomer
mode http
balance roundrobin

option forwardfor

http-send-name-header Host
option httpchk GET /env HTTP/1.1
http-check expect rstatus ^200
option httplog
timeout server 60s
default-server inter 10s fall 3 rise 2
server s_CustomerLookupService.searchCustomer2 customerlookup.xxx.px-prd02.cf..com:443 check resolvers dns ssl verify none
server s_CustomerLookupService.searchCustomer3 customerlookup.xxx.px-prd03.cf..com:443 check resolvers dns ssl verify none

It looks like the Host header is not getting passed in the health-check calls.

Any ideas how to solve this problem?
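
One possible direction, sketched below and untested: since "option httpchk"
applies to a whole backend, give each DC its own check-only backend that sends
the right Host header, and let the traffic servers "track" those checks. The
chk_* names are invented here, and the resolvers/ssl details will likely need
adapting; hostnames are copied from the config above, including the redacted
domain:

backend chk_prd02
mode http
option httpchk GET /env HTTP/1.1\r\nHost:\ customerlookup.xxx.px-prd02.cf..com:443\r\n
http-check expect rstatus ^200
server chk customerlookup.xxx.px-prd02.cf..com:443 check resolvers dns ssl verify none

backend chk_prd03
mode http
option httpchk GET /env HTTP/1.1\r\nHost:\ customerlookup.xxx.px-prd03.cf..com:443\r\n
http-check expect rstatus ^200
server chk customerlookup.xxx.px-prd03.cf..com:443 check resolvers dns ssl verify none

backend CustomerLookupService.searchCustomer
mode http
balance roundrobin
server s_prd02 customerlookup.xxx.px-prd02.cf..com:443 ssl verify none track chk_prd02/chk
server s_prd03 customerlookup.xxx.px-prd03.cf..com:443 ssl verify none track chk_prd03/chk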

Thanks in advance
Sen.



-- 
-Sen


Re: 1.9dev LUA shows partial results from print_r(core.get_info()) after adding headers ?

2018-05-03 Thread PiBa-NL

Hi Thierry,

On 3-5-2018 at 8:59, Thierry Fournier wrote:

I can't reproduce the bug. I even installed a FreeBSD :-) I add Willy in
copy, maybe he will reproduce it.

Thierry


The 'trick' is probably sending as few requests as possible through a
'high latency' VPN (17 ms for a ping from client to haproxy machine).


Haproxy startup.
    Line 17: :TestSite.clireq[0007:]: GET 
/haproxy?stats HTTP/1.1
    Line 34: 0002:TestSite.clireq[0007:]: GET /favicon.ico 
HTTP/1.1
    Line 44: 0001:TestSite.clireq[0008:]: GET 
/webrequest/mailstat HTTP/1.1
    Line 133: 0003:TestSite.clireq[0008:]: GET 
/webrequest/mailstat HTTP/1.1
    Line 220: 0004:TestSite.clireq[0008:]: GET /favicon.ico 
HTTP/1.1
    Line 233: 0005:TestSite.clireq[0008:]: GET 
/haproxy?stats HTTP/1.1
    Line 251: 0006:TestSite.clireq[0008:]: GET 
/webrequest/mailstat HTTP/1.1

Crash..

Sometimes it takes a few more, but it's not really consistent.. it's
rather timing sensitive, I guess..



But besides the reproduction, what is the theory behind the tasks and
their cleanup; how 'should' it work?
The Chrome browser makes a few requests to haproxy, some for the stats
page and others for the Lua service (and a favicon in between)..


At one point in time the TCP connection for the Lua service gets closed
and process_stream starts to call si_shutw.. a few calls deeper,
hlua_applet_http_release removes the http task from the list..


static void hlua_applet_http_release(struct appctx *ctx)
{
    task_delete(ctx->ctx.hlua_apphttp.task);
    task_free(ctx->ctx.hlua_apphttp.task);

Then when the current task is 'done' it will move to the next one.. the
rq_next in the process loop.. that, however, is pointing to the
deleted/freed hlua_apphttp.task. So getting the next task from that
already destroyed element will fail...


Perhaps something like the patch below could work?
Does it make sense? (The same should then be done for the tcp and cli
tasks, I guess..)
For my testcase it doesn't crash anymore with that change. But I'm not
sure if it's now leaking memory instead in some cases.. Is there an easy
way to check?
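
One low-effort way to check (an assumption that the task pool is where such a
leak would show up): poll the pool usage on the stats socket between test runs
and see whether the 'task' pool's used count keeps growing, e.g.:

echo "show pools" | socat stdio /tmp/haproxy.sock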


Regards,
PiBa-NL (Pieter)


diff --git a/src/hlua.c b/src/hlua.c
index 4c56409..6515f52 100644
--- a/src/hlua.c
+++ b/src/hlua.c
@@ -6635,8 +6635,7 @@ error:

 static void hlua_applet_http_release(struct appctx *ctx)
 {
-    task_delete(ctx->ctx.hlua_apphttp.task);
-    task_free(ctx->ctx.hlua_apphttp.task);
+    ctx->ctx.hlua_apphttp.task->process = NULL;
 ctx->ctx.hlua_apphttp.task = NULL;
 hlua_ctx_destroy(ctx->ctx.hlua_apphttp.hlua);
 ctx->ctx.hlua_apphttp.hlua = NULL;
diff --git a/src/task.c b/src/task.c
index fd9acf6..d6ab0b9 100644
--- a/src/task.c
+++ b/src/task.c
@@ -217,6 +217,13 @@ void process_runnable_tasks()
         t = eb32sc_entry(rq_next, struct task, rq);
         rq_next = eb32sc_next(rq_next, tid_bit);
         __task_unlink_rq(t);
+            if (!t->process) {
+                /* task was 'scheduled' to be destroyed (e.g. a hlua_apphttp.task) */
+                task_delete(t);
+                task_free(t);
+                continue;
+            }
+
         t->state |= TASK_RUNNING;
         t->pending_state = 0;





Re: [PATCH] BUG/MINOR: lua: Socket.send causing 'close' needs 1 arguments.

2018-05-03 Thread Thierry Fournier

> On 3 May 2018, at 19:54, Sadasiva Gujjarlapudi  
> wrote:
> 
> `check_args` checked for the "exact number of" arguments available on the
> Lua stack, as given in the second param (nb).


You’re right! Good catch.

Can you provide the patch with this explanation for the commit?

Like this: 
http://git.haproxy.org/?p=haproxy-1.8.git;a=commitdiff;h=05657bd24ebaf20e5c508a435be9a0830591f033

Thanks,
Thierry


> So `close` was expecting exactly "one" argument on the Lua stack.
> But when `close` was called from xxx_write_yield(), Lua stack had 3 arguments.
> 
> So close threw the exception with message "'close' needs 1 arguments".
> 
> "close_helper" function removed the Lua stack argument count check and only 
> checked
> if the first argument was a socket. 
> 
> 
> void check_args(lua_State *L, int nb, char *fcn)
> {
>   if (lua_gettop(L) == nb)
>   return;
>   WILL_LJMP(luaL_error(L, "'%s' needs %d arguments", fcn, nb));
> }
> 
> 
> Please let me know if you need more information.
> Thanks,
> Sada.
> 
> 
> On Thu, May 3, 2018 at 12:16 AM, Thierry Fournier  
> wrote:
> This function checks that one argument is present in the stack, at the
> bottom of the stack (position 1).
> 
>   MAY_LJMP(check_args(L, 1, "close"));
> 
> The stack is kept between calls, so position 1 can contain anything.
> The content of position 1 in the stack is a struct associated with the
> socket object.
> 
> The function close is called by the function xxx_write(), which is a callback
> of the Lua function “send()”. This send function has the same first
> argument as the Lua function "close()”, so the stack contains a socket
> object at position 1.
> 
> I’m sure of this because the following line, just after the stack check, gets
> this argument:
> 
>   socket = MAY_LJMP(hlua_checksocket(L, 1));
> 
> If your patch works, this is proof that the stack contains the expected value
> at position 1, so removing the stack check doesn’t fix anything.
> 
> Maybe I’m missing something, or you encountered a bug caused by something else.
> Do you have a method for reproducing a bug ?
> 
> BR,
> Thierry
> 
> 
> 
> > On 3 May 2018, at 08:56, Thierry Fournier  wrote:
> > 
> > Hi,
> > 
> > I’m not sure about your patch because the values in the stack are kept.
> > I need to be focused to read it, and unfortunately I’m very busy.
> > I’ll try to check this morning, during the French strikes...
> > 
> > Thierry
> > 
> > 
> >> On 25 Apr 2018, at 20:26, Sadasiva Gujjarlapudi  
> >> wrote:
> >> 
> >> cc: Haproxy
> >> 
> >> `hlua_socket_close` expected exactly one argument on stack.
> >> But when it was called from `hlua_socket_write_yield`, it found more than 
> >> one argument on stack and threw an error.
> >> 
> >> This patch introduced new helper function `hlua_socket_close_helper`, 
> >> which doesn't check arguments count.
> >> Please let me know if you need more information.
> >> 
> >> Thanks,
> >> Sada.
> >> 
> >> 
> >> On Wed, Apr 25, 2018 at 10:51 AM, Thierry Fournier 
> >>  wrote:
> >> Hi,
> >> 
> >> do you have a little bit of context. It is not easy to read a patch without
> >> the associated explanation.
> >> 
> >> Thanks,
> >> Thierry
> >> 
> >> 
> >> 
> >>> On 14 Apr 2018, at 01:56, sada  wrote:
> >>> 
> >>> ---
> >>> src/hlua.c | 14 ++++++++++----
> >>> 1 file changed, 10 insertions(+), 4 deletions(-)
> >>> 
> >>> diff --git a/src/hlua.c b/src/hlua.c
> >>> index 60cf8f94..0585a1e7 100644
> >>> --- a/src/hlua.c
> >>> +++ b/src/hlua.c
> >>> @@ -1629,14 +1629,12 @@ __LJMP static int hlua_socket_gc(lua_State *L)
> >>> /* The close function send shutdown signal and break the
> >>> * links between the stream and the object.
> >>> */
> >>> -__LJMP static int hlua_socket_close(lua_State *L)
> >>> +__LJMP static int hlua_socket_close_helper(lua_State *L)
> >>> {
> >>> struct hlua_socket *socket;
> >>> struct appctx *appctx;
> >>> struct xref *peer;
> >>> 
> >>> -   MAY_LJMP(check_args(L, 1, "close"));
> >>> -
> >>> socket = MAY_LJMP(hlua_checksocket(L, 1));
> >>> 
> >>> /* Check if we run on the same thread than the xreator thread.
> >>> @@ -1659,6 +1657,14 @@ __LJMP static int hlua_socket_close(lua_State *L)
> >>> return 0;
> >>> }
> >>> 
> >>> +/* The close function calls close_helper.
> >>> + */
> >>> +__LJMP static int hlua_socket_close(lua_State *L)
> >>> +{
> >>> +   MAY_LJMP(check_args(L, 1, "close"));
> >>> +   return hlua_socket_close_helper(L);
> >>> +}
> >>> +
> >>> /* This Lua function assumes that the stack contain three parameters.
> >>> *  1 - USERDATA containing a struct socket
> >>> *  2 - INTEGER with values of the macro defined below
> >>> @@ -1990,7 +1996,7 @@ static int hlua_socket_write_yield(struct lua_State 
> >>> *L,int status, lua_KContext
> >>> if (len == -1)
> >>> s->req.flags |= CF_WAKE_WRITE;

Re: [PATCH] BUG/MINOR, lua/sockets, make lua tasks that are waiting for io suspend until woken up by a corresponding event.

2018-05-03 Thread Willy Tarreau
On Thu, May 03, 2018 at 08:37:09PM +0200, PiBa-NL wrote:
> Hi Tim, Willy,
> 
> Apparently even a simple copy/paste is too difficult for me to do right
> sometimes, really sorry about that.. :/

Bah don't worry, if you want to re-inflate your ego, just look in the
logs for cleanups I've done just after commiting an atrocity that I
should be ashamed of, and you'll figure that we all do that once in a
while (typically a leftover printf or build warning).

> Thanks for merging and explaining, I'll try to do better next time :)

You're welcome!

Willy



Re: [PATCH] BUG/MINOR, lua/sockets, make lua tasks that are waiting for io suspend until woken up by a corresponding event.

2018-05-03 Thread PiBa-NL

Hi Tim, Willy,

Apparently even a simple copy/paste is too difficult for me to do right
sometimes, really sorry about that.. :/

Thanks for merging and explaining, I'll try to do better next time :)

Regards,
PiBa-NL



Re: [PATCH] BUG/MINOR: lua: Socket.send causing 'close' needs 1 arguments.

2018-05-03 Thread Sadasiva Gujjarlapudi
`check_args` checked for the "exact number of" arguments available on the
Lua stack, as given in the second param (nb).

So `close` was expecting exactly "one" argument on the Lua stack.
But when `close` was called from xxx_write_yield(), the Lua stack had 3
arguments.

So close threw the exception with message "'close' needs 1 arguments".

"close_helper" function removed the Lua stack argument count check and only
checked
if the first argument was a socket.


void check_args(lua_State *L, int nb, char *fcn)
{
if (lua_gettop(L) == nb)
return;
WILL_LJMP(luaL_error(L, "'%s' needs %d arguments", fcn, nb));
}
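
For anyone trying to reproduce this, a hypothetical Lua sketch of the failing
path (core.register_task, core.tcp and the Socket methods are the real API;
the address, size and exact trigger conditions are assumptions): any send()
that fails mid-write makes xxx_write_yield() call close with extra values
still on the Lua stack.

core.register_task(function()
    local s = core.tcp()
    s:settimeout(1)
    -- if the peer closes while we are still sending, the internal
    -- write-yield path calls close with 3 stack arguments
    if s:connect("127.0.0.1", 8080) then
        s:send(string.rep("x", 1024 * 1024))
    end
    s:close()
end)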


Please let me know if you need more information.
Thanks,
Sada.


On Thu, May 3, 2018 at 12:16 AM, Thierry Fournier 
wrote:

> This function checks that one argument is present in the stack, at the
> bottom of the stack (position 1).
>
>   MAY_LJMP(check_args(L, 1, "close"));
>
> The stack is kept between calls, so position 1 can contain anything.
> The content of position 1 in the stack is a struct associated with the
> socket object.
>
> The function close is called by the function xxx_write(), which is a
> callback of the Lua function “send()”. This send function has the same
> first argument as the Lua function "close()”, so the stack contains a
> socket object at position 1.
>
> I’m sure of this because the following line, just after the stack check, gets
> this argument:
>
>   socket = MAY_LJMP(hlua_checksocket(L, 1));
>
> If your patch works, this is proof that the stack contains the expected value
> at position 1, so removing the stack check doesn’t fix anything.
>
> Maybe I’m missing something, or you encountered a bug caused by something else.
> Do you have a method for reproducing a bug ?
>
> BR,
> Thierry
>
>
>
> > On 3 May 2018, at 08:56, Thierry Fournier 
> wrote:
> >
> > Hi,
> >
> > I’m not sure about your patch because the values in the stack are kept.
> > I need to be focused to read it, and unfortunately I’m very busy.
> > I’ll try to check this morning, during the French strikes...
> >
> > Thierry
> >
> >
> >> On 25 Apr 2018, at 20:26, Sadasiva Gujjarlapudi <
> s...@signalsciences.com> wrote:
> >>
> >> cc: Haproxy
> >>
> >> `hlua_socket_close` expected exactly one argument on stack.
> >> But when it was called from `hlua_socket_write_yield`, it found more
> than one argument on stack and threw an error.
> >>
> >> This patch introduced new helper function `hlua_socket_close_helper`,
> which doesn't check arguments count.
> >> Please let me know if you need more information.
> >>
> >> Thanks,
> >> Sada.
> >>
> >>
> >> On Wed, Apr 25, 2018 at 10:51 AM, Thierry Fournier <
> tfourn...@arpalert.org> wrote:
> >> Hi,
> >>
> >> do you have a little bit of context. It is not easy to read a patch
> without
> >> the associated explanation.
> >>
> >> Thanks,
> >> Thierry
> >>
> >>
> >>
> >>> On 14 Apr 2018, at 01:56, sada  wrote:
> >>>
> >>> ---
> >>> src/hlua.c | 14 ++++++++++----
> >>> 1 file changed, 10 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/src/hlua.c b/src/hlua.c
> >>> index 60cf8f94..0585a1e7 100644
> >>> --- a/src/hlua.c
> >>> +++ b/src/hlua.c
> >>> @@ -1629,14 +1629,12 @@ __LJMP static int hlua_socket_gc(lua_State *L)
> >>> /* The close function send shutdown signal and break the
> >>> * links between the stream and the object.
> >>> */
> >>> -__LJMP static int hlua_socket_close(lua_State *L)
> >>> +__LJMP static int hlua_socket_close_helper(lua_State *L)
> >>> {
> >>> struct hlua_socket *socket;
> >>> struct appctx *appctx;
> >>> struct xref *peer;
> >>>
> >>> -   MAY_LJMP(check_args(L, 1, "close"));
> >>> -
> >>> socket = MAY_LJMP(hlua_checksocket(L, 1));
> >>>
> >>> /* Check if we run on the same thread than the xreator thread.
> >>> @@ -1659,6 +1657,14 @@ __LJMP static int hlua_socket_close(lua_State
> *L)
> >>> return 0;
> >>> }
> >>>
> >>> +/* The close function calls close_helper.
> >>> + */
> >>> +__LJMP static int hlua_socket_close(lua_State *L)
> >>> +{
> >>> +   MAY_LJMP(check_args(L, 1, "close"));
> >>> +   return hlua_socket_close_helper(L);
> >>> +}
> >>> +
> >>> /* This Lua function assumes that the stack contain three parameters.
> >>> *  1 - USERDATA containing a struct socket
> >>> *  2 - INTEGER with values of the macro defined below
> >>> @@ -1990,7 +1996,7 @@ static int hlua_socket_write_yield(struct
> lua_State *L,int status, lua_KContext
> >>> if (len == -1)
> >>> s->req.flags |= CF_WAKE_WRITE;
> >>>
> >>> -   MAY_LJMP(hlua_socket_close(L));
> >>> +   MAY_LJMP(hlua_socket_close_helper(L));
> >>> lua_pop(L, 1);
> >>> lua_pushinteger(L, -1);
> >>> xref_unlock(>xref, peer);
> >>> --
> >>> 2.17.0
> >>>
> >>
> >>
> >
>
>


Re: Considering adding support for TCP Zero Copy

2018-05-03 Thread Willy Tarreau
On Thu, May 03, 2018 at 02:51:12PM +0200, Pavlos Parissis wrote:
On 03/05/2018 02:45 μμ, Olivier Houchard wrote:
> > Hi Pavlos,
> > 
> > On Thu, May 03, 2018 at 12:45:42PM +0200, Pavlos Parissis wrote:
> >> Hi,
> >>
> >> Linux kernel version 4.14 adds support for zero-copy from user memory to 
> >> TCP sockets by setting
> >> MSG_ZEROCOPY flag. This is for the sending side of the socket, for the 
> >> receiving side of the socket
> >> we need to wait for kernel version 4.18.
> >>
> >> Will you consider enabling this on HAProxy?
> >>
> >> More info can be found here, 
> >> https://www.kernel.org/doc/html/latest/networking/msg_zerocopy.html
> > 
> > After some discussion with Willy, we're not sure it is worth it.
> > It would force us to release the buffer much later than we currently do; it can't
> > be used with SSL, and we already achieve zero-copy by using splicing.
> > 
> > Is there any specific case where you think it'd be a huge win ?
> > 
> 
> The only use case that I can think of is HTTP streaming. But, without testing 
> it we can't say a lot.

In fact, for HTTP streaming, splicing already does it all and even
better since it only manipulates a few pointers in the kernel between
the source and destination socket buffers. Userspace is not even
involved.

Also it's important to remember that while copies are best avoided
whenever possible, they aren't that dramatic at the common traffic
rates. I've already reached 60 Gbps of forwarded traffic with and
without splicing on a 4-core machine.

One aspect to keep in mind is the following. A typical Xeon system will
achieve around 20 GB/s of in-L3 memcpy() bandwidth. For a typical 16kB
buffer, that's only 760 ns to copy the whole buffer, which is roughly the
cost of the extra syscall needed to check that the transfer completed.
At 10 Gbps, this represents only 6.25% of the total processing time.
And there's something much more important : with the copy operation,
the buffer is released after these 760 ns and immediately recycled for
other connections. This ensures that the memory usage remains low and
that most transfer operations are made in L3 instead of RAM. If you
use zero-copy here, instead your memory will be pinned for the time
it takes to cycle on many other connections and get back to processing
this FD. It can very easily become 10-100 microseconds, or 15-150 times
more, resulting in much more RAM usage for temporary buffers, and thus
a much higher cache footprint.
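
Spelling out the arithmetic behind those figures (same numbers as above):

  16 kB / 20 GB/s ≈ 0.8 µs per buffer copy, and
  (10 Gbps / 8) / 20 GB/s = 1.25 GB/s / 20 GB/s = 6.25% of the memcpy bandwidth.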

In my opinion MSG_ZEROCOPY was designed for servers, those which stream
video and so on, and which produce their own data, and which don't need
to recycle their buffers. We're definitely not in this case at all here,
we're just forwarding ephemeral data so we can recycle buffers very quickly
and through splicing we can even avoid to see these data at all.

Hoping this helps,
Willy



Re: Building on AIX 7.2

2018-05-03 Thread Willy Tarreau
On Sat, Apr 28, 2018 at 01:08:17PM +, Donald MacKerracher wrote:
> This is just for a quick sanity check. I'm not seeing much sign of any
> mention of haproxy on AIX beyond 5.3. Before I involve myself heavily in it,
> has anyone attempted a build of haproxy on AIX 7.2 or am I highly likely on a
> road to nowhere?

No reports from AIX for a long time, 5.3 is indeed the latest I tested.
Maybe it will work out of the box, but it's possible that some recent
code changes have broken it a bit.

Willy



Re: [PATCH] MINOR: add get_maxconn and set_maxconn to LUA Server class.

2018-05-03 Thread Willy Tarreau
applied, thanks!
Willy



Re: [PATCH] MINOR: Add server name & puid to LUA Server class.

2018-05-03 Thread Willy Tarreau
On Thu, May 03, 2018 at 03:35:41PM +0200, Thierry Fournier wrote:
> Ok, I understand ... I hesitate. I think that it is better to keep
> the original type, but the consistency is a good reason to convert
> to string.
> 
> I doubt that integer operations (add, sub, ...) will be performed on
> this id. So maybe, we can keep string for the consistency.
> 
> If anyone has an opinion, do not hesitate...

OK then I picked it. However I noticed something wrong that I fixed
and which is wrong in other parts of the code:

  char buffer[10];
  (...)
  snprintf(buffer, sizeof(buffer), "%d", srv->puid);
  lua_pushstring(L, buffer);

With UIDs of more than 9 digits, snprintf() will truncate and we'll push an
incomplete string into the buffer. So I fixed it by setting the buffer size
to 12 ("-2147483648"). I'll have to fix the other places in the same code
part for this as well.
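
A minimal sketch of that fix inside the quoted function (srv and L come from
the surrounding patch):

/* "-2147483648" is 11 characters, so 12 bytes always hold a 32-bit
 * int plus the terminating NUL */
char buffer[12];
snprintf(buffer, sizeof(buffer), "%d", srv->puid);
lua_pushstring(L, buffer);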

It's extremely unlikely anyone will notice it since few people use manual
IDs, but if some automated tool builds these IDs using ranges, this will
definitely fail.

thanks!
Willy



Re: [PATCH 1/1] DOC/MINOR: clean up LUA documentation re: servers & array/table.

2018-05-03 Thread Willy Tarreau
On Thu, May 03, 2018 at 09:20:15AM +0200, Thierry Fournier wrote:
> Hi Patrick,
> 
> Thanks for cleaning the Lua doc.
> 
> Willy, could you apply these patches ?

Now done, thanks!

Willy



Re: http-response set-header is unreliable

2018-05-03 Thread Willy Tarreau
On Thu, May 03, 2018 at 06:00:45PM +0200, Tim Düsterhus wrote:
> > What you have above looks like stderr. The rest are logs. They are for
> > very different usages, stderr is there to inform you that something went
> > wrong during a reload operation (that systemd happily hides so that you
> > believe it was OK but it was not), while the logs are there for future
> > traffic analysis and troubleshooting.
> 
> The distinction doesn't seem to be clearly drawn:
> 
> This goes into my syslog:
> 
> > May  3 13:30:26 ### haproxy[21754]: Proxy bk_aaa stopped (FE: 0 conns, BE: 
> > 0 conns).
> > May  3 13:30:26 ### haproxy[21754]: Proxy bk_aaa stopped (FE: 0 conns, BE: 
> > 0 conns).
> > May  3 13:30:26 ### haproxy[21754]: Proxy zzz stopped (FE: 2 conns, BE: 0 
> > conns).
> > May  3 13:30:26 ### haproxy[21754]: Proxy zzz stopped (FE: 2 conns, BE: 0 
> > conns).
> > May  3 13:30:26 ### haproxy[14926]: Server bk_xxx/xxx is DOWN, changed from 
> > server-state after a reload. 0 active and 0 backup servers left. 0 sessions 
> > active, 0 requeued, 0 remaining in queue.
> > May  3 13:30:26 ### haproxy[14926]: Server bk_xxx/xxx is DOWN, changed from 
> > server-state after a reload. 0 active and 0 backup servers left. 0 sessions 
> > active, 0 requeued, 0 remaining in queue.
> > May  3 13:30:26 ### haproxy[14926]: backend bk_xxx has no server available!
> > May  3 13:30:26 ### haproxy[14926]: backend bk_xxx has no server available!
> > May  3 13:30:26 ### haproxy[14926]: Server bk_yyy/yyy is DOWN, changed from 
> > server-state after a reload. 1 active and 0 backup servers left. 0 sessions 
> > active, 0 requeued, 0 remaining in queue.
> 
> This goes into the journal:
> 
> > May 03 13:30:26 ### haproxy[11635]: Proxy bk_xxx started.
> > May 03 13:30:26 ### haproxy[11635]: Proxy bk_xxx started.
> > May 03 13:30:26 ### haproxy[11635]: Proxy bk_yyy started.
> > May 03 13:30:26 ### haproxy[11635]: Proxy bk_yyy started.
> > May 03 13:30:26 ### haproxy[11635]: Proxy bk_zzz started.
> > May 03 13:30:26 ### haproxy[11635]: Proxy bk_zzz started.
> > May 03 13:30:26 ### haproxy[11635]: Proxy aaa started.
> > May 03 13:30:26 ### haproxy[11635]: Proxy aaa started.
> 
> At least the Proxy ... started / stopped messages should go into the
> same log.

On a regular system, these ones do not even exist because they are mostly
debug messages, which are dumped after the fork and thus are never
shown unless you're running in foreground mode. On an init system which
requires the daemon to stay in the foreground... you get the debugging
messages on output, and since the daemon confiscates your output, it
sends it to its journal. I *think* (not tried though) that you can hide
them using "-q" on the command line.
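
For example, something like this in the unit file might do it (a guess,
untested; the path and the -Ws master-worker flag depend on the packaging
and version, only -q is the flag mentioned above):

ExecStart=/usr/sbin/haproxy -Ws -q -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid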

> > Going back to the initial subject, are you interested in seeing if you
> > can add a warning counter to each frontend/backend, and possibly a rate
> > limited warning in the logs as well ? I'm willing to help if needed, it's
> > just that I really cannot take care of this myself, given that I spent
> > the last 6 months dealing with bugs and various other discussions, almost
> > not having been able to start to do anything for the next release :-/ So
> > any help here is welcome as you can guess.
> > 
> 
> Personally I'd prefer the rate limited warning over the counter. As
> outlined before: A warning counter probably will be incremented for
> multiple unrelated reasons in the longer term and thus loses its
> usefulness. Having a warning_headers_too_big counter and a
> warning_whatever_there_may_be is stupid, no?

For now we don't have such a warning, so the only reason for logging
it would be this header issue. It's never supposed to happen in theory
as it normally needs to be addressed immediately and ultimately we
should block by default on this. And if later we find another reason
to add a warning, we'll figure if it makes sense to use a different
counter or not.

Also you said yourself that you wouldn't look at the logs first but at
munin first. And munin monitors your stats socket, so logically munin
should report you increases of this counter found on the stats socket.

> I feel that the error counter could / should be re-used for this and
> just the log message should be added.

Except that it's not an error until we block. We detected an error and
decided to let it pass through, which is a warning. It would be an error
if we'd block on it though.

> My munin already monitors the
> error counts. The `eresp` counter seems to fit: "- failure applying
> filters to the response.".

If you see an error, you have the guarantee that the request or response
was blocked, so definitely here it doesn't fit for the case where you
don't block. And it's very important not to violate such guarantees as
some people really rely on them. For example during forensics after an
intrusion attempt on your systems, you really want to know if the attacker
managed to retrieve something or not.

Willy



Re: http-response set-header is unreliable

2018-05-03 Thread Tim Düsterhus
Willy,

On 03.05.2018 at 17:37, Willy Tarreau wrote:
>> [1] I'd love to have a proper integration with systemd-journald to have
>> all my logs in one place. It's pretty annoying, because some things
>> ("Proxy bk_*** started"; [WARNING] 121/202559 (11635) : Reexecuting
>> Master process) go to systemd-journald (probably printed to stdout /
>> stderr) and everything else goes into syslog.
> 
> Well, you should probably tell that to the guy who instead of learning
> how to use a unix-like system decided it was easier to break everything
> in it that used to be pretty simple, stable, reliable and clear for 40
> years before he forced his crap into almost every Linux distro to the
> point of making them even less observable and debuggable than Windows
> nowadays :-(

Personally I enjoy systemd and take a unit file over an init script all
day. But I'm very well aware of the criticism surrounding it.

> What you have above looks like stderr. The rest are logs. They are for
> very different usages, stderr is there to inform you that something went
> wrong during a reload operation (that systemd happily hides so that you
> believe it was OK but it was not), while the logs are there for future
> traffic analysis and troubleshooting.

The distinction doesn't seem to be clearly drawn:

This goes into my syslog:

> May  3 13:30:26 ### haproxy[21754]: Proxy bk_aaa stopped (FE: 0 conns, BE: 0 
> conns).
> May  3 13:30:26 ### haproxy[21754]: Proxy bk_aaa stopped (FE: 0 conns, BE: 0 
> conns).
> May  3 13:30:26 ### haproxy[21754]: Proxy zzz stopped (FE: 2 conns, BE: 0 
> conns).
> May  3 13:30:26 ### haproxy[21754]: Proxy zzz stopped (FE: 2 conns, BE: 0 
> conns).
> May  3 13:30:26 ### haproxy[14926]: Server bk_xxx/xxx is DOWN, changed from 
> server-state after a reload. 0 active and 0 backup servers left. 0 sessions 
> active, 0 requeued, 0 remaining in queue.
> May  3 13:30:26 ### haproxy[14926]: Server bk_xxx/xxx is DOWN, changed from 
> server-state after a reload. 0 active and 0 backup servers left. 0 sessions 
> active, 0 requeued, 0 remaining in queue.
> May  3 13:30:26 ### haproxy[14926]: backend bk_xxx has no server available!
> May  3 13:30:26 ### haproxy[14926]: backend bk_xxx has no server available!
> May  3 13:30:26 ### haproxy[14926]: Server bk_yyy/yyy is DOWN, changed from 
> server-state after a reload. 1 active and 0 backup servers left. 0 sessions 
> active, 0 requeued, 0 remaining in queue.

This goes into the journal:

> May 03 13:30:26 ### haproxy[11635]: Proxy bk_xxx started.
> May 03 13:30:26 ### haproxy[11635]: Proxy bk_xxx started.
> May 03 13:30:26 ### haproxy[11635]: Proxy bk_yyy started.
> May 03 13:30:26 ### haproxy[11635]: Proxy bk_yyy started.
> May 03 13:30:26 ### haproxy[11635]: Proxy bk_zzz started.
> May 03 13:30:26 ### haproxy[11635]: Proxy bk_zzz started.
> May 03 13:30:26 ### haproxy[11635]: Proxy aaa started.
> May 03 13:30:26 ### haproxy[11635]: Proxy aaa started.

At least the Proxy ... started / stopped messages should go into the
same log.

> And the reason journalctl is this slow very likely lies in its original
> purpose which is just to log daemons' output during startup (since it was
> confiscated by the tools). It's totally unusable for anything like moderate
> to high traffic.

Cannot comment on performance for obvious reasons.

> Going back to the initial subject, are you interested in seeing if you
> can add a warning counter to each frontend/backend, and possibly a rate
> limited warning in the logs as well ? I'm willing to help if needed, it's
> just that I really cannot take care of this myself, given that I spent
> the last 6 months dealing with bugs and various other discussions, almost
> not having been able to start to do anything for the next release :-/ So
> any help here is welcome as you can guess.
> 

Personally I'd prefer the rate limited warning over the counter. As
outlined before: A warning counter probably will be incremented for
multiple unrelated reasons in the longer term and thus loses its
usefulness. Having a warning_headers_too_big counter and a
warning_whatever_there_may_be is stupid, no?

I feel that the error counter could / should be re-used for this and
just the log message should be added. My munin already monitors the
error counts. The `eresp` counter seems to fit: "- failure applying
filters to the response.".

Best regards
Tim Düsterhus



Re: http-response set-header is unreliable

2018-05-03 Thread Willy Tarreau
Hi Tim,

On Thu, May 03, 2018 at 03:34:01PM +0200, Tim Düsterhus wrote:
> >> Especially since the issue happens randomly: Sometimes the additional
> >> headers fit by chance. Sometimes they don't. I would start by
> >> investigating the connection to the backend services, not investigating
> >> some random tunables (See my paragraph above.).
> > 
> > Actually when you have an error, the termination flags are the most 
> > important
> > thing to look at as they indicate what was wrong and where/when.
> 
> But still the termination flags do not point me to the *real* issue.
> They are relatively coarse grained.

If they indicate that an overflow occurred in the request or the response,
and you have the information for each and every request, you may find that
it's quite useful, especially when you have to match this against side
effects affecting these requests. The fact that they are harder to spot
is a different issue.

> > Just out of curiosity, what do you check more often, in order of priority,
> > among :
> >   - stats page
> >   - show info on CLI
> >   - traffic logs
> >   - admin logs
> >   - other
> > 
> > Because in fact that might help figure where such painful failures would
> > need to be shown (possibly multiple places).
> 
> Primarily munin, because it shows all my services at a glance. Munin
> uses the stats socket.

OK good. This votes in favor, then, of a per-frontend, per-backend counter
that Munin can check and report when it increases.

> Next would be the syslog [1]. I use the default Debian packaged logging
> set up. I think it places both traffic as well as admin logs into the
> same file. I have `log global` in my default section and no specific
> logs for frontends / backends.
> 
> Last would be the stats page. I use this primarily after reboots to
> verify all my backends are properly UP. It's not much use to me for
> "long term" information, because I unconditionally reload haproxy after
> running certbot renew. Thus my haproxy instance is reloaded once a day.
> Too much hassle to pipe in the new certificates via the admin socket.

OK, pretty clear. So in short by having this per-proxy counter, we can
satisfy users like you (via Munin) and me (via the stats page).

> I don't use any other tools to retrieve information.
> 
> [1] I'd love to have a proper integration with systemd-journald to have
> all my logs in one place. It's pretty annoying, because some things
> ("Proxy bk_*** started"; [WARNING] 121/202559 (11635) : Reexecuting
> Master process) go to systemd-journald (probably printed to stdout /
> stderr) and everything else goes into syslog.

Well, you should probably tell that to the guy who instead of learning
how to use a unix-like system decided it was easier to break everything
in it that used to be pretty simple, stable, reliable and clear for 40
years before he forced his crap into almost every Linux distro to the
point of making them even less observable and debuggable than Windows
nowadays :-(

What you have above looks like stderr. The rest are logs. They are for
very different usages, stderr is there to inform you that something went
wrong during a reload operation (that systemd happily hides so that you
believe it was OK but it was not), while the logs are there for future
traffic analysis and troubleshooting.

And the reason journalctl is this slow very likely lies in its original
purpose which is just to log daemons' output during startup (since it was
confiscated by the tools). It's totally unusable for anything like moderate
to high traffic.

Going back to the initial subject, are you interested in seeing if you
can add a warning counter to each frontend/backend, and possibly a rate
limited warning in the logs as well ? I'm willing to help if needed, it's
just that I really cannot take care of this myself, given that I spent
the last 6 months dealing with bugs and various other discussions, almost
not having been able to start to do anything for the next release :-/ So
any help here is welcome as you can guess.

Thanks!
Willy



Re: [PATCH] MINOR: Add server name & puid to LUA Server class.

2018-05-03 Thread Tim Düsterhus
Thierry,

On 03.05.2018 at 15:35, Thierry Fournier wrote:
> Ok, I understand ... I hesitate. I think that it is better to keep
> the original type, but the consistency is a good reason to convert
> to string.
> 
> I doubt that integer operations (add, sub, ...) will be performed on
> this id. So maybe, we can keep string for the consistency.
> 

I believe that opaque identifiers should be strings always. Not
integers, just because they happen to consist of digits only.

Using strings would also allow you to change the internal data type
later on, without breaking Lua.

So: +1 for string from me.

Best regards
Tim Düsterhus



Re: [PATCH] MINOR: Add server name & puid to LUA Server class.

2018-05-03 Thread Thierry Fournier


> On 3 May 2018, at 14:27, Patrick Hemmer  wrote:
> 
> 
> 
> On 2018/5/3 02:52, Thierry Fournier wrote:
>>> On 2 May 2018, at 16:49, Willy Tarreau wrote:
>>> 
>>> Hi Thierry,
>>> 
>>> when you have a moment, could you please give a quick look at these
>>> patches from Patrick so that I know if I can merge them or not ? There
>>> are 2 other ones on the list.
>>> 
>> 
>> Hi Willy and Patrick,
>> 
>> I checked it. I don’t understand why you convert the puid to a string.
>> You could push the puid integer directly as a Lua value with
>> the function lua_pushinteger().
>> 
>> Thierry
>> 
> I did this for consistency, as this is how Proxy.uuid behaves. Even though it 
> could return an integer, it converts it to a string and returns that instead.


Ok, I understand ... I hesitate. I think that it is better to keep
the original type, but the consistency is a good reason to convert
to string.

I doubt that integer operations (add, sub, ...) will be performed on
this id. So maybe, we can keep string for the consistency.

If anyone has an opinion, do not hesitate...

Thierry


> 
>>> Thanks,
>>> Willy
>>> 
>>> On Sun, Apr 29, 2018 at 02:23:48PM -0400, Patrick Hemmer wrote:
>>> 
 ---
 doc/lua-api/index.rst |  8 ++++++++
 src/hlua_fcn.c        | 14 ++++++++++++++
 2 files changed, 22 insertions(+)
 
 
 
 diff --git a/doc/lua-api/index.rst b/doc/lua-api/index.rst
 index 7a77e46ee..cdb285957 100644
 --- a/doc/lua-api/index.rst
 +++ b/doc/lua-api/index.rst
 @@ -925,6 +925,14 @@ Server class
 
   This class provides a way for manipulating servers and retrieving 
 information.
 
 +.. js:attribute:: Server.name
 +
 +  Contain the name of the server.
 +
 +.. js:attribute:: Server.puid
 +
 +  Contain the proxy unique identifier of the server.
 +
 .. js:function:: Server.is_draining(sv)
 
   Return true if the server is currently draining sticky connections.
 diff --git a/src/hlua_fcn.c b/src/hlua_fcn.c
 index a8d53d45b..280d8e5af 100644
 --- a/src/hlua_fcn.c
 +++ b/src/hlua_fcn.c
 @@ -490,6 +490,8 @@ int hlua_listener_get_stats(lua_State *L)
 
 int hlua_fcn_new_server(lua_State *L, struct server *srv)
 {
 +  char buffer[10];
 +
lua_newtable(L);
 
/* Pop a class sesison metatable and affect it to the userdata. */
 @@ -498,6 +500,18 @@ int hlua_fcn_new_server(lua_State *L, struct server 
 *srv)
 
lua_pushlightuserdata(L, srv);
lua_rawseti(L, -2, 0);
 +
 +  /* Add server name. */
 +  lua_pushstring(L, "name");
 +  lua_pushstring(L, srv->id);
 +  lua_settable(L, -3);
 +
 +  /* Add server puid. */
 +  lua_pushstring(L, "puid");
 +  snprintf(buffer, sizeof(buffer), "%d", srv->puid);
 +  lua_pushstring(L, buffer);
 +  lua_settable(L, -3);
 +
return 1;
 }
 
 
 
>> 
> 




Re: http-response set-header is unreliable

2018-05-03 Thread Tim Düsterhus
Willy,

On 03.05.2018 at 05:23, Willy Tarreau wrote:
>> To me a message like: "Unable to add-header Content-Security-Policy to
>> response. Possibly the amount of headers exceeds tune.maxrewrite." would
>> have been more helpful than random 502 without any further information.
> 
> We could possibly emit something like this with a warning level, just like
> when a server goes down. However we need to rate-limit it as you don't want
> your admin log to grow at 1000/s when you're flooded with large bogus
> requests that repeatedly cause this to happen.

Yes, I agree.

>> Especially since the issue happens randomly: Sometimes the additional
>> headers fit by chance. Sometimes they don't. I would start by
>> investigating the connection to the backend services, not investigating
>> some random tunables (See my paragraph above.).
> 
> Actually when you have an error, the termination flags are the most important
> thing to look at as they indicate what was wrong and where/when.

But still the termination flags do not point me to the *real* issue.
They are relatively coarse grained.

>> Actually I'm not sure
>> whether a 502 would even be the correct response. The issue is not with
>> the backend, but with the proxy. I'd expect a 500 here:
>>
>>>The 502 (Bad Gateway) status code indicates that the server, while
>>>acting as a gateway or proxy, received an *invalid response from an
>>>inbound server* it accessed while attempting to fulfill the request.
>>
>> (highlighting mine)
> 
> It could as well, but arguably it could also be said that the frontend
> never received a valid response from the backend since this one failed
> to transform a valid response into another valid one.

Depending on the definition of valid. To me the 502 implies looking into
the backend service first, not into haproxy. But let's not bikeshed
about this.

>> After digging into it I might be able to deduce that the addition of the
>> new `http-response add-header` line caused the issues. But still I would
>> be non the wiser. I would have to stumble upon the tunable by accident.
>> Or ask on the list, like I did.
> 
> Just out of curiosity, what do you check more often, in order of priority,
> among :
>   - stats page
>   - show info on CLI
>   - traffic logs
>   - admin logs
>   - other
> 
> Because in fact that might help figure where such painful failures would
> need to be shown (possibly multiple places).

Primarily munin, because it shows all my services at a glance. Munin
uses the stats socket.

Next would be the syslog [1]. I use the default Debian packaged logging
set up. I think it places both traffic as well as admin logs into the
same file. I have `log global` in my default section and no specific
logs for frontends / backends.

Last would be the stats page. I use this primarily after reboots to
verify all my backends are properly UP. It's not much use to me for
"long term" information, because I unconditionally reload haproxy after
running certbot renew. Thus my haproxy instance is reloaded once a day.
Too much hassle to pipe in the new certificates via the admin socket.

I don't use any other tools to retrieve information.

[1] I'd love to have a proper integration with systemd-journald to have
all my logs in one place. It's pretty annoying, because some things
("Proxy bk_*** started"; [WARNING] 121/202559 (11635) : Reexecuting
Master process) go to systemd-journald (probably printed to stdout /
stderr) and everything else goes into syslog.

>> I want to note at this point that I'm not running haproxy at scale or
>> with serious monitoring. The haproxy instance I'm experiencing this
>> issue with is my personal server, not some company or business one. It
>> runs my mail and some side / hobby projects. My needs or expectations
>> might be different.
> 
> That's an important point. It's the same for me on 1wt.eu or haproxy.org,
> sometimes I figure late that there are errors. Most often it's the component
> we forget about because it works and we don't spend time looking at the logs.
> The probably, like me, you're looking at the stats page once in a while, and
> only at the logs when stats look suspicious ?
> 
> We already have "Warnings" columns in the stats page which are unused for
> the frontends, we could use it to report a count of such failures. Or we
> could add an extra "rewrite" column under "warnings" to report such errors
> where they were detected.
> 

As noted above the stats page is useless to me. Most useful to me would
be something munin could detect, because it would send me a mail.

Actually the first thing I would notice is if haproxy died, because then
my mail does not work either. I put haproxy in front of my Dovecot.
But that's a bit drastic I think. :-)

Best regards
Tim Düsterhus



Re: LUA Converters should be global

2018-05-03 Thread Thierry Fournier
Hi Patrick,

You’re right. This question was already asked.

I can’t move the converters into the global context because
some converters are linked with a session, for example:

 - capture-req
 - capture-res
 - set-var
 - unset-var

Maybe we can add a flag to each converter to know if it uses a session
or not, and import the sessionless converters into a global object?

Some time ago, I wrote this message. I have not yet tested the proposed
solution, but you can try:

=

Hi, Maybe I'm wrong, but it seems that the server is chosen after the
Lua executions (action or sample fetch), so it is not possible to know
the chosen server during the Lua phase.

In any case, the choice of the server is not easy and it is not easily
predictable.

The converters are static functions, and they can run during the init
phase, but they are not accessible. Maybe there is an ugly solution
that consists of creating a fake Converter object and using it. The
following code is just a guideline, it is not tested, and the Lua
syntax is not checked.

  -- Get the metatable of converters by searching in the Global object.
  -- I assume the variable meta_converter contains this metatable.
  local meta_converter = ...

  -- Create a new object which is an array containing specific content
  -- in the slot "0".
  local convs = {}
  convs[0] = 0
  setmetatable(convs, meta_converter)

  -- Now convs is a Converter object, and maybe it can execute some
  -- converters.
  convs:crc32("test")

I'm afraid that this method doesn't work or in the worst case produces a
segfault, but you can try.

Alternatively, if you are able to produce a few lines of C, you can export
the hash functions from haproxy into a file. These functions are autonomous
and don't have dependencies. You can create your own Lua library
containing these functions. You will easily find tutorials.

BR,
Thierry

=


BR,
Thierry


> On 30 Apr 2018, at 22:19, Patrick Hemmer  wrote:
> 
> Right now in LUA, all the HAProxy converters are accessible through the 
> `Converters` class. However this class is only accessible through the TXN 
> class. Can we get this changed so that the Converters class is a global?
> 
> My intent is to be able to call a builtin HAProxy converter from within a 
> custom LUA converter. However the LUA converter prototype does not receive a 
> TXN, so I have no access to the Converters class. Converters are supposed 
> to be stateless, so I see no reason why they should be accessible only from 
> within the TXN class.
> 
> -Patrick
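
For reference, a minimal sketch of the situation Patrick describes (the
converter name is hypothetical; core.register_converters is the real API,
and its callback receives only the input value, not a TXN):

core.register_converters("my_conv", function(value)
    -- no txn here, so txn.c (the Converters class) is out of reach
    return value
end)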




Re: [PATCH] BUG/MINOR: lua: Socket.send causing 'close' needs 1 arguments.

2018-05-03 Thread Thierry Fournier
Hi,

I’m not sure about your patch because the values in the stack are kept.
I need to be focused to read it, and unfortunately I’m very busy.
I’ll try to check this morning, during the French strikes...

Thierry


> On 25 Apr 2018, at 20:26, Sadasiva Gujjarlapudi  
> wrote:
> 
> cc: Haproxy
> 
> `hlua_socket_close` expected exactly one argument on stack.
> But when it was called from `hlua_socket_write_yield`, it found more than one 
> argument on stack and threw an error.
> 
> This patch introduced new helper function `hlua_socket_close_helper`, which 
> doesn't check arguments count.
> Please let me know if you need more information.
> 
> Thanks,
> Sada.
> 
> 
> On Wed, Apr 25, 2018 at 10:51 AM, Thierry Fournier  
> wrote:
> Hi,
> 
> do you have a little bit of context. It is not easy to read a patch without
> the associated explanation.
> 
> Thanks,
> Thierry
> 
> 
> 
>> On 14 Apr 2018, at 01:56, sada  wrote:
>> 
>> ---
>> src/hlua.c | 14 ++++++++++----
>> 1 file changed, 10 insertions(+), 4 deletions(-)
>> 
>> diff --git a/src/hlua.c b/src/hlua.c
>> index 60cf8f94..0585a1e7 100644
>> --- a/src/hlua.c
>> +++ b/src/hlua.c
>> @@ -1629,14 +1629,12 @@ __LJMP static int hlua_socket_gc(lua_State *L)
>> /* The close function send shutdown signal and break the
>> * links between the stream and the object.
>> */
>> -__LJMP static int hlua_socket_close(lua_State *L)
>> +__LJMP static int hlua_socket_close_helper(lua_State *L)
>> {
>>  struct hlua_socket *socket;
>>  struct appctx *appctx;
>>  struct xref *peer;
>> 
>> -MAY_LJMP(check_args(L, 1, "close"));
>> -
>>  socket = MAY_LJMP(hlua_checksocket(L, 1));
>> 
>>  /* Check if we run on the same thread than the xreator thread.
>> @@ -1659,6 +1657,14 @@ __LJMP static int hlua_socket_close(lua_State *L)
>>  return 0;
>> }
>> 
>> +/* The close function calls close_helper.
>> + */
>> +__LJMP static int hlua_socket_close(lua_State *L)
>> +{
>> +MAY_LJMP(check_args(L, 1, "close"));
>> +return hlua_socket_close_helper(L);
>> +}
>> +
>> /* This Lua function assumes that the stack contain three parameters.
>> *  1 - USERDATA containing a struct socket
>> *  2 - INTEGER with values of the macro defined below
>> @@ -1990,7 +1996,7 @@ static int hlua_socket_write_yield(struct lua_State 
>> *L,int status, lua_KContext
>>  if (len == -1)
>>  s->req.flags |= CF_WAKE_WRITE;
>> 
>> -MAY_LJMP(hlua_socket_close(L));
>> +MAY_LJMP(hlua_socket_close_helper(L));
>>  lua_pop(L, 1);
>>  lua_pushinteger(L, -1);
>>  xref_unlock(>xref, peer);
>> -- 
>> 2.17.0
>> 
> 
> 




Re: 1.9dev LUA shows partial results from print_r(core.get_info()) after adding headers ?

2018-05-03 Thread Thierry Fournier
Hi,

I can’t reproduce the bug. I even installed a FreeBSD :-) I add Willy in
copy, maybe he will reproduce it.

Thierry


> On 28 Apr 2018, at 01:36, PiBa-NL  wrote:
> 
> Hi Thierry,
> 
> On 27-4-2018 at 1:54, PiBa-NL wrote:
>> Hi Thierry,
>> 
>> On 26-4-2018 at 12:25, thierry.fourn...@arpalert.org wrote:
>>> Your trace shows a corrupted tree. Maybe it is due to the freebsd
>>> architecture and the corruption is not reproducible on Linux ? I do not have
>>> freebsd for testing.
>>> 
>>> Regards,
>>> Thierry
>> 
>> My 'best' reproduction scenario involves around 50 to 100 requests, or
>> sometimes fewer, total, made from my Chrome browser (over a VPN
>> connection which adds a little latency.. a ping/reply to the haproxy server
>> takes +-17ms, maybe that helps reproduce.. :/)
>> I've changed the lua function 'print_r' to not write anything in its wr()
>> function, to rule out the FreeBSD console / ssh session being the cause of
>> the issue. It stays the same though.
>> 
>> Also been adding some printf statements to how the eb_tree is used, which I
>> think might be interesting..
>>printf("eb32sc_insert(%d)\n",new);
>>printf("eb32sc_next(%d)  leaf_p: %d \n",eb32, eb32->node.leaf_p);
>>printf("eb32sc_delete(%d)\n",eb32);
>> 
>> The pattern I see here is that usually a task is inserted, the next task is
>> looked up, and the task gets deleted again.
>> Last round before the crash I see that the task is inserted, then deleted,
>> and then afterwards the next task is being retrieved for that same 'item'..
>> Which fails.. Perhaps because it was the last task to run? And there is
>> no longer a higher root to jump higher up the tree? I must admit I don't
>> fully grasp how the tree is traversed exactly.. I seem to see that at least
>> for the first task to be put into the tree there is some special handling..
>> On the other side, if it ain't in the tree, is there technically still a
>> 'next' item??
>> 
>> Below is the last part of that logging; the complete log from the start is
>> also attached.. Perhaps it gives a clue?
>> 
>> Regards,
>> 
>> PiBa-NL (Pieter)
>> 
>> eb32sc_insert(35826016)
>> process_runnable_tasks() active_tasks_mask &= ~tid_bit
>> eb32sc_next(35826016)  leaf_p: 8606272
>> eb32sc_delete(35826016)
>> task_wakeup  35826016  32
>> active_tasks_mask |= t->thread_mask  (35826016)
>> eb32sc_insert(35826016)
>> task_wakeup  35826656  8
>> active_tasks_mask |= t->thread_mask  (35826656)
>> eb32sc_insert(35826656)
>> process_runnable_tasks() active_tasks_mask &= ~tid_bit
>> eb32sc_next(35826656)  leaf_p: 35826656
>> eb32sc_delete(35826656)
>> 0013:TestSite.srvrep[0006:]: HTTP/1.1 200 OK
>> 0013:TestSite.srvhdr[0006:]: Refresh: 1
>> 0013:TestSite.srvhdr[0006:]: Server: haproxy/webstats
>> 0013:TestSite.srvhdr[0006:]: Content-Type: text/html
>> 0013:TestSite.srvhdr[0006:]: Content-Length: 1778
>> eb32sc_delete(35826016)
>> 0013:TestSite.srvcls[0006:]
>> eb32sc_next(35826016)  leaf_p: 0
>> Segmentation fault (core dumped)
>> 
> Tried to dig a little further.. pretty sure these are the steps to the issue.
> The exact reproduction probably is rather timing sensitive though.
> A tcp connection 'stream task' gets closed before the applet is done
> running, and this removes the applet task from the tree. This applet task is,
> however, what the 'rq_next' is pointing to..
> 
> The stack below seems to be what removes and frees the applet task.. Not
> exactly sure if this is the exact one causing the crash a little later; it's
> not related to the 'log' below anyhow..
> 
> hlua_applet_http_release (ctx=0x802238780) at P:\Git\haproxy\src\hlua.c:6668
> si_applet_release (si=0x8022accf0) at 
> P:\Git\haproxy\include\proto\stream_interface.h:234
> stream_int_shutw_applet (si=0x8022accf0) at 
> P:\Git\haproxy\src\stream_interface.c:1506
> si_shutw (si=0x8022accf0) at 
> P:\Git\haproxy\include\proto\stream_interface.h:321
> process_stream (t=0x80222a8c0) at P:\Git\haproxy\src\stream.c:2161
> process_runnable_tasks () at P:\Git\haproxy\src\task.c:236
> run_poll_loop () at P:\Git\haproxy\src\haproxy.c:2404
> run_thread_poll_loop (data=0x8022420a0) at P:\Git\haproxy\src\haproxy.c:2469
> main (argc=5, argv=0x7fffea68) at P:\Git\haproxy\src\haproxy.c:3060
> 
> 
> Below is some printf output, mostly containing function names, and a few
> extra inside the process_runnable_tasks loop to show what the current and
> next task were...
> active_tasks_mask |= t->thread_mask  (35827936)
> eb32sc_insert(35827936)
> process_runnable_tasks() active_tasks_mask &= ~tid_bit
> find rq_next, current task 35827936  rq_current:35827936
> eb32sc_next(35827936)  leaf_p: 8610368
> eb32sc_delete(35827936)
> process task 35827936  rq_next:0
> task_wakeup  35827936  32
> active_tasks_mask |= t->thread_mask  (35827936)
> eb32sc_insert(35827936)
> process_runnable_tasks() active_tasks_mask &= ~tid_bit
> find 

Re: Considering adding support for TCP Zero Copy

2018-05-03 Thread Pavlos Parissis
On 03/05/2018 02:45 μμ, Olivier Houchard wrote:
> Hi Pavlos,
> 
> On Thu, May 03, 2018 at 12:45:42PM +0200, Pavlos Parissis wrote:
>> Hi,
>>
>> Linux kernel version 4.14 adds support for zero-copy from user memory to TCP 
>> sockets by setting
>> MSG_ZEROCOPY flag. This is for the sending side of the socket, for the 
>> receiving side of the socket
>> we need to wait for kernel version 4.18.
>>
>> Will you consider enabling this on HAProxy?
>>
>> More info can be found here, 
>> https://www.kernel.org/doc/html/latest/networking/msg_zerocopy.html
> 
> After some discussion with Willy, we're not sure it is worth it.
> It would force us to release buffer much later than we do actually, it can't
> be used with SSL, and we already achieve zero-copy by using splicing.
> 
> Is there any specific case where you think it'd be a huge win ?
> 

The only use case that I can think of is HTTP streaming. But, without testing 
it we can't say a lot.

Thanks,
Pavlos



signature.asc
Description: OpenPGP digital signature


Re: Considering adding support for TCP Zero Copy

2018-05-03 Thread Olivier Houchard
Hi Pavlos,

On Thu, May 03, 2018 at 12:45:42PM +0200, Pavlos Parissis wrote:
> Hi,
> 
> Linux kernel version 4.14 adds support for zero-copy from user memory to TCP 
> sockets by setting
> MSG_ZEROCOPY flag. This is for the sending side of the socket, for the 
> receiving side of the socket
> we need to wait for kernel version 4.18.
> 
> Will you consider enabling this on HAProxy?
> 
> More info can be found here, 
> https://www.kernel.org/doc/html/latest/networking/msg_zerocopy.html

After some discussion with Willy, we're not sure it is worth it.
It would force us to release the buffer much later than we currently do; it can't
be used with SSL, and we already achieve zero-copy by using splicing.

Is there any specific case where you think it'd be a huge win ?

Regards,

Olivier



Re: [PATCH] BUG/MINOR, lua/sockets, make lua tasks that are waiting for io suspend until woken up by the a corresponding event.

2018-05-03 Thread Tim Düsterhus
Pieter,

On 03.05.2018 at 00:58, PiBa-NL wrote:
> *snip* As for line
> lengths, it didn't even cross my mind to look at that. (I also didn't
> look up the CONTRIBUTING file either, sorry, though it doesn't seem to
> specify line lengths.)
> 

Indeed they are not specified for haproxy, but the line lengths are kind
of an unwritten standard in git usage.

The more important length is that of the subject line. Both

gitweb: http://git.haproxy.org/?p=haproxy.git;a=shortlog

as well as

GitHub:
https://github.com/haproxy/haproxy/commit/ebaba754297f39b86959fbdc13f66e4534aadeae

truncate the first line after a certain length. Usually a length of 50
is recommended (e.g. the vim highlighter colors the line differently
after 50 characters). However for haproxy you lose about 10 characters
for the type + severity of the patch (BUG/MINOR).

The body is recommended to wrap at lengths between 72 and 76 characters.
Personally I just wrap so that it fits my default terminal size of 80x24
characters nicely.
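
For illustration, a made-up haproxy-style commit message within those limits
(not an actual commit):

  BUG/MINOR: lua: fix argument check in Socket.close

  The strict argument-count check raises an error when the close logic
  is reached from the write yield path, because extra values are still
  present on the Lua stack at that point.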

And one last thing: copied, literal output of tools such as gdb for
stack traces should not be wrapped; it should simply exceed the width.

Best regards
Tim Düsterhus



Re: [PATCH] MINOR: Add server name & puid to LUA Server class.

2018-05-03 Thread Patrick Hemmer


On 2018/5/3 02:52, Thierry Fournier wrote:
>> On 2 May 2018, at 16:49, Willy Tarreau  wrote:
>>
>> Hi Thierry,
>>
>> when you have a moment, could you please give a quick look at these
>> patches from Patrick so that I know if I can merge them or not ? There
>> are 2 other ones on the list.
>
> Hi Willy and Patrick,
>
> I checked it. I don't understand why you convert the puid to a string.
> You could push the puid integer directly into a Lua variable with
> the function lua_pushinteger().
>
> Thierry
I did this for consistency, as this is how Proxy.uuid behaves. Even
though it could return an integer, it converts it to a string and
returns that instead.

>> Thanks,
>> Willy
>>
>> On Sun, Apr 29, 2018 at 02:23:48PM -0400, Patrick Hemmer wrote:
>>> ---
>>> doc/lua-api/index.rst |  8 ++++++++
>>> src/hlua_fcn.c        | 14 ++++++++++++++
>>> 2 files changed, 22 insertions(+)
>>>
>>>
>>> diff --git a/doc/lua-api/index.rst b/doc/lua-api/index.rst
>>> index 7a77e46ee..cdb285957 100644
>>> --- a/doc/lua-api/index.rst
>>> +++ b/doc/lua-api/index.rst
>>> @@ -925,6 +925,14 @@ Server class
>>>
>>>   This class provides a way for manipulating servers and retrieving 
>>> information.
>>>
>>> +.. js:attribute:: Server.name
>>> +
>>> +  Contain the name of the server.
>>> +
>>> +.. js:attribute:: Server.puid
>>> +
>>> +  Contain the proxy unique identifier of the server.
>>> +
>>> .. js:function:: Server.is_draining(sv)
>>>
>>>   Return true if the server is currently draining sticky connections.
>>> diff --git a/src/hlua_fcn.c b/src/hlua_fcn.c
>>> index a8d53d45b..280d8e5af 100644
>>> --- a/src/hlua_fcn.c
>>> +++ b/src/hlua_fcn.c
>>> @@ -490,6 +490,8 @@ int hlua_listener_get_stats(lua_State *L)
>>>
>>> int hlua_fcn_new_server(lua_State *L, struct server *srv)
>>> {
>>> +   char buffer[10];
>>> +
>>> lua_newtable(L);
>>>
>>> /* Pop a class sesison metatable and affect it to the userdata. */
>>> @@ -498,6 +500,18 @@ int hlua_fcn_new_server(lua_State *L, struct server *srv)
>>>
>>> lua_pushlightuserdata(L, srv);
>>> lua_rawseti(L, -2, 0);
>>> +
>>> +   /* Add server name. */
>>> +   lua_pushstring(L, "name");
>>> +   lua_pushstring(L, srv->id);
>>> +   lua_settable(L, -3);
>>> +
>>> +   /* Add server puid. */
>>> +   lua_pushstring(L, "puid");
>>> +   snprintf(buffer, sizeof(buffer), "%d", srv->puid);
>>> +   lua_pushstring(L, buffer);
>>> +   lua_settable(L, -3);
>>> +
>>> return 1;
>>> }
>>>
>>>
>



Re: [PATCH] MINOR: Add server name & puid to LUA Server class.

2018-05-03 Thread Thierry Fournier

> On 2 May 2018, at 16:49, Willy Tarreau  wrote:
> 
> Hi Thierry,
> 
> when you have a moment, could you please give a quick look at these
> patches from Patrick so that I know if I can merge them or not ? There
> are 2 other ones on the list.


Hi Willy and Patrick,

I checked it. I don't understand why you convert the puid to a string.
You could push the puid integer directly into a Lua variable with
the function lua_pushinteger().
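
For instance, the hlua_fcn.c hunk could push the integer directly -- a rough,
untested sketch:

    /* Add server puid as an integer instead of a string. */
    lua_pushstring(L, "puid");
    lua_pushinteger(L, srv->puid);
    lua_settable(L, -3);

This would also drop the temporary buffer and the snprintf() call.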

Thierry

> 
> Thanks,
> Willy
> 
> On Sun, Apr 29, 2018 at 02:23:48PM -0400, Patrick Hemmer wrote:
>> ---
>> doc/lua-api/index.rst |  8 ++++++++
>> src/hlua_fcn.c        | 14 ++++++++++++++
>> 2 files changed, 22 insertions(+)
>> 
>> 
> 
>> diff --git a/doc/lua-api/index.rst b/doc/lua-api/index.rst
>> index 7a77e46ee..cdb285957 100644
>> --- a/doc/lua-api/index.rst
>> +++ b/doc/lua-api/index.rst
>> @@ -925,6 +925,14 @@ Server class
>> 
>>  This class provides a way for manipulating servers and retrieving 
>> information.
>> 
>> +.. js:attribute:: Server.name
>> +
>> +  Contain the name of the server.
>> +
>> +.. js:attribute:: Server.puid
>> +
>> +  Contain the proxy unique identifier of the server.
>> +
>> .. js:function:: Server.is_draining(sv)
>> 
>>  Return true if the server is currently draining sticky connections.
>> diff --git a/src/hlua_fcn.c b/src/hlua_fcn.c
>> index a8d53d45b..280d8e5af 100644
>> --- a/src/hlua_fcn.c
>> +++ b/src/hlua_fcn.c
>> @@ -490,6 +490,8 @@ int hlua_listener_get_stats(lua_State *L)
>> 
>> int hlua_fcn_new_server(lua_State *L, struct server *srv)
>> {
>> +char buffer[10];
>> +
>>  lua_newtable(L);
>> 
>>  /* Pop a class sesison metatable and affect it to the userdata. */
>> @@ -498,6 +500,18 @@ int hlua_fcn_new_server(lua_State *L, struct server *srv)
>> 
>>  lua_pushlightuserdata(L, srv);
>>  lua_rawseti(L, -2, 0);
>> +
>> +/* Add server name. */
>> +lua_pushstring(L, "name");
>> +lua_pushstring(L, srv->id);
>> +lua_settable(L, -3);
>> +
>> +/* Add server puid. */
>> +lua_pushstring(L, "puid");
>> +snprintf(buffer, sizeof(buffer), "%d", srv->puid);
>> +lua_pushstring(L, buffer);
>> +lua_settable(L, -3);
>> +
>>  return 1;
>> }
>> 
>> 
> 




Re: [PATCH] BUG/MINOR: lua: Socket.send causing 'close' needs 1 arguments.

2018-05-03 Thread Thierry Fournier
This function checks that exactly one argument is present on the stack, at
the bottom of the stack (position 1).

  MAY_LJMP(check_args(L, 1, "close"));

The stack is kept between calls, so position 1 still contains a value.
The content of position 1 on the stack is a struct associated with the
socket object.

The close function is called by the function xxx_write(), which is a callback
of the Lua function "send()". This send function has the same first argument
as the Lua function "close()", so the stack contains a socket object at
position 1.

I'm sure of this, because the following line, just after the stack check,
fetches this argument:

  socket = MAY_LJMP(hlua_checksocket(L, 1));

If your patch works, that is proof that the stack contains the expected value
at position 1, so removing the stack check doesn't fix anything.
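
To sketch the two situations as I read the code (a comment-only illustration,
not verified in a debugger):

    /*
     * Stack when close() is called directly from Lua:
     *   [1] socket userdata        -> lua_gettop(L) == 1, check_args passes
     *
     * Stack when the close logic runs from the send()/write-yield path:
     *   [1] socket userdata
     *   [2] string being written
     *   ...                        -> lua_gettop(L) > 1, check_args throws
     *
     * Either way, position 1 holds the socket, which is why
     * hlua_checksocket(L, 1) succeeds in both cases.
     */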

Maybe I'm missing something, or you hit a bug caused by something else.
Do you have a method for reproducing the bug?

BR,
Thierry



> On 3 May 2018, at 08:56, Thierry Fournier  wrote:
> 
> Hi,
> 
> I'm not sure about your patch, because the values on the stack are kept.
> I need to focus to read it, and unfortunately I'm very busy.
> I'll try to check this morning, during the French strikes...
> 
> Thierry
> 
> 
>> On 25 Apr 2018, at 20:26, Sadasiva Gujjarlapudi  
>> wrote:
>> 
>> cc: Haproxy
>> 
>> `hlua_socket_close` expects exactly one argument on the stack.
>> But when it was called from `hlua_socket_write_yield`, it found more than
>> one argument on the stack and threw an error.
>> 
>> This patch introduces a new helper function `hlua_socket_close_helper`,
>> which does not check the argument count.
>> Please let me know if you need more information.
>> 
>> Thanks,
>> Sada.
>> 
>> 
>> On Wed, Apr 25, 2018 at 10:51 AM, Thierry Fournier  
>> wrote:
>> Hi,
>> 
>> Do you have a little bit of context? It is not easy to read a patch
>> without the associated explanation.
>> 
>> Thanks,
>> Thierry
>> 
>> 
>> 
>>> On 14 Apr 2018, at 01:56, sada  wrote:
>>> 
>>> ---
>>> src/hlua.c | 14 ++
>>> 1 file changed, 10 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/src/hlua.c b/src/hlua.c
>>> index 60cf8f94..0585a1e7 100644
>>> --- a/src/hlua.c
>>> +++ b/src/hlua.c
>>> @@ -1629,14 +1629,12 @@ __LJMP static int hlua_socket_gc(lua_State *L)
>>> /* The close function send shutdown signal and break the
>>> * links between the stream and the object.
>>> */
>>> -__LJMP static int hlua_socket_close(lua_State *L)
>>> +__LJMP static int hlua_socket_close_helper(lua_State *L)
>>> {
>>> struct hlua_socket *socket;
>>> struct appctx *appctx;
>>> struct xref *peer;
>>> 
>>> -   MAY_LJMP(check_args(L, 1, "close"));
>>> -
>>> socket = MAY_LJMP(hlua_checksocket(L, 1));
>>> 
>>> /* Check if we run on the same thread than the xreator thread.
>>> @@ -1659,6 +1657,14 @@ __LJMP static int hlua_socket_close(lua_State *L)
>>> return 0;
>>> }
>>> 
>>> +/* The close function calls close_helper.
>>> + */
>>> +__LJMP static int hlua_socket_close(lua_State *L)
>>> +{
>>> +   MAY_LJMP(check_args(L, 1, "close"));
>>> +   return hlua_socket_close_helper(L);
>>> +}
>>> +
>>> /* This Lua function assumes that the stack contain three parameters.
>>> *  1 - USERDATA containing a struct socket
>>> *  2 - INTEGER with values of the macro defined below
>>> @@ -1990,7 +1996,7 @@ static int hlua_socket_write_yield(struct lua_State *L,int status, lua_KContext
>>> if (len == -1)
>>> s->req.flags |= CF_WAKE_WRITE;
>>> 
>>> -   MAY_LJMP(hlua_socket_close(L));
>>> +   MAY_LJMP(hlua_socket_close_helper(L));
>>> lua_pop(L, 1);
>>> lua_pushinteger(L, -1);
>>> xref_unlock(&socket->xref, peer);
>>> -- 
>>> 2.17.0
>>> 
>> 
>> 
> 




Re: [PATCH 1/1] DOC/MINOR: clean up LUA documentation re: servers & array/table.

2018-05-03 Thread Thierry Fournier
Hi Patrick,

Thanks for cleaning the Lua doc.

Willy, could you apply these patches?
These patches could be backported from 1.8 to 1.6, but it is not so important.

Thierry


> On 2 May 2018, at 03:30, Patrick Hemmer  wrote:
> 
> 
> * A few typos
> * Fix definitions of values which are tables, not arrays.
> * Consistent US English naming for "server" instead of "serveur".
> ---
> doc/lua-api/index.rst | 88 +++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------
> 1 file changed, 45 insertions(+), 43 deletions(-)
> 
> 
> <0001-DOC-MINOR-clean-up-LUA-documentation-re-servers-arra.patch>




Considering adding support for TCP Zero Copy

2018-05-03 Thread Pavlos Parissis
Hi,

Linux kernel version 4.14 adds support for zero-copy from user memory to TCP
sockets by setting the MSG_ZEROCOPY flag. This covers the sending side of the
socket; for the receiving side we need to wait for kernel version 4.18.

Will you consider enabling this on HAProxy?

More info can be found here, 
https://www.kernel.org/doc/html/latest/networking/msg_zerocopy.html

Cheers,
Pavlos





Sticky sessions

2018-05-03 Thread Михаил Торозёров
Hello guys,
Sorry to distract you, but I have been struggling with an error for a week
and couldn't solve it by myself. Maybe you will have time to check my config
and help a bit.
I've even made a topic on Stack Overflow:
https://stackoverflow.com/questions/50149747/haproxy-sticky-sessions