Re: Logging using %HP (path) produce different results with H1 and H2

2020-08-26 Thread Willy Tarreau
On Wed, Aug 26, 2020 at 06:31:21PM +0300, Ciprian Dorin Craciun wrote:
> On Wed, Aug 26, 2020 at 6:29 PM Willy Tarreau  wrote:
> > > Then, still related to logging, might I add a feature request to just
> > > use "raw" lines over UDP or TCP, instead of SysLog?  (Just as we have
> > > now support for `stdout`, but with network support.)
> >
> > You already have it. I'm doing this:
> >
> > log 127.0.0.1:5514 format raw daemon
> > log-format "%[uuid]"
> >
> > And netcat shows me only the uuid with no header nor anything. So
> > technically speaking I don't think you're missing anything for this.
> 
> 
> This is nice.  I'll have to give it a try.  (I thought that the `raw`
> applied only to `stdout`.

I think it used to be the case for quite some time, probably that it's
become available with 2.2 and the rework needed for TCP servers.

> However we are missing the TCP support.  :)

You have it in 2.2. You need to declare a ring buffer, which itself is
chained to the TCP log server. Ah I've just found you an example here:

  https://www.haproxy.com/blog/announcing-haproxy-2-2/#syslog-over-tcp

And in 2.3 we're even starting to receive syslog to forward and load
balance it (including with format conversion and UDP->TCP).

Cheers,
Willy



Re: Logging using %HP (path) produce different results with H1 and H2

2020-08-26 Thread Ciprian Dorin Craciun
On Wed, Aug 26, 2020 at 6:29 PM Willy Tarreau  wrote:
> > Then, still related to logging, might I add a feature request to just
> > use "raw" lines over UDP or TCP, instead of SysLog?  (Just as we have
> > now support for `stdout`, but with network support.)
>
> You already have it. I'm doing this:
>
> log 127.0.0.1:5514 format raw daemon
> log-format "%[uuid]"
>
> And netcat shows me only the uuid with no header nor anything. So
> technically speaking I don't think you're missing anything for this.


This is nice.  I'll have to give it a try.  (I thought that the `raw`
applied only to `stdout`.

However we are missing the TCP support.  :)

Ciprian.



Re: Logging using %HP (path) produce different results with H1 and H2

2020-08-26 Thread Willy Tarreau
On Wed, Aug 26, 2020 at 06:17:11PM +0300, Ciprian Dorin Craciun wrote:
> On Wed, Aug 26, 2020 at 1:08 PM Willy Tarreau  wrote:
> > > Also what would be extra useful, especially for debugging and perhaps
> > > security, is to add an `http-request|http-response log 
> > >  if ` that would allow adding additional log
> > > lines if a certain condition is met.
> >
> > We cannot realistically "add more lines" to a request, however deciding
> > to emit one line at the instant the rule is processed shouldn't be hard
> > at all, and I definitely see the value there, as you say, both for
> > debugging and for security alerts.
> 
> 
> Yes, you are right, I don't expect such a statement to add a new line
> to the buffer, but to immediately format a log line and submit it.
> 
> (This also aids in debugging, because if the same log line is put in
> multiple places of the `http-request` chain, one could see how a
> certain header has changed, and it could also help in generating a log
> line the moment the request was parsed, as opposed to waiting for the
> response to be processed.)

Actually that's fun, because among the features for which we wanted
to have the log profiles was the ability to have multiple log points,
precisely for request then response typically. It appeared as being a
little bit of a mess and I don't think anyone made any progress in this
area. But with your idea we could possibly delegate this choice to the
user. One hook I wanted to support was the "server connection established",
which will not be covered, but now that we have the after-response rules,
that at least gives request, response and end of transfer, which are the
most valuable ones.

> Then, still related to logging, might I add a feature request to just
> use "raw" lines over UDP or TCP, instead of SysLog?  (Just as we have
> now support for `stdout`, but with network support.)

You already have it. I'm doing this:

log 127.0.0.1:5514 format raw daemon
log-format "%[uuid]"

And netcat shows me only the uuid with no header nor anything. So
technically speaking I don't think you're missing anything for this.

Willy



Re: Logging using %HP (path) produce different results with H1 and H2

2020-08-26 Thread Ciprian Dorin Craciun
On Wed, Aug 26, 2020 at 1:08 PM Willy Tarreau  wrote:
> > Also what would be extra useful, especially for debugging and perhaps
> > security, is to add an `http-request|http-response log 
> >  if ` that would allow adding additional log
> > lines if a certain condition is met.
>
> We cannot realistically "add more lines" to a request, however deciding
> to emit one line at the instant the rule is processed shouldn't be hard
> at all, and I definitely see the value there, as you say, both for
> debugging and for security alerts.


Yes, you are right, I don't expect such a statement to add a new line
to the buffer, but to immediately format a log line and submit it.

(This also aids in debugging, because if the same log line is put in
multiple places of the `http-request` chain, one could see how a
certain header has changed, and it could also help in generating a log
line the moment the request was parsed, as opposed to waiting for the
response to be processed.)




> There's something I've been wanting for a while which would be log profiles
> (pre-defined and reusable log-format directives) that would significantly
> help with such situations as you probably don't want to duplicate the
> exact same format all the time. As such I'd rather first define how we
> want to reference them, then make sure that the suggestion above can be
> done in a forward compatible way.


Although `log-format` profiles might be useful, I would still keep the
option to just "paste" the same format everywhere it is needed.

(As a minor side-note:  the HAProxy configuration syntax is far from
trivial, and I see it more as an "assembler" than as a "high level
language";  therefore I've built myself a Python generator,
https://github.com/cipriancraciun/haproxy-configurator/tree/master/examples,
that just generates the matching HAProxy configuration file.  Moreover
I guess this is the way many other HAProxy integrators do it,
especially in the micro-services world.)


> These are good ideas that will fuel a long feature request :-)

Then, still related to logging, might I add a feature request to just
use "raw" lines over UDP or TCP, instead of SysLog?  (Just as we have
now support for `stdout`, but with network support.)

I know there is the issue of reliability, but that is not an issue,
given as now SysLog is also over UDP.

Thanks,
Ciprian.



Re: Logging using %HP (path) produce different results with H1 and H2

2020-08-26 Thread Willy Tarreau
Hi Ciprian,

On Wed, Aug 26, 2020 at 12:52:52PM +0300, Ciprian Dorin Craciun wrote:
> If we are to talk about logging, please make it easy to have valid
> JSON log lines.

Yep, good point!

> I know there is the `%[something(),json()]` that works as expected;
> however there are many (especially old) log items (say `%ST`) that
> sometimes expand to a number, and sometimes to a `-`, which basically
> means one has to always treat everything as strings, and hope the
> actual value doesn't contain a quote or JSON escape codes...  In fact,
> to keep things simple, I would suggest just adding new functions
> (fetch samples as they are called) that expose all items available in
> the `Custom log format` section  (some are available, some are not
> from what I remember);  these should expand to an empty string if they
> are not available (as opposed to a `-`).

This is exactly what was planned to be done in the background. I even
think I created an issue for this a long time ago, so that anyone can
come in and progressively convert some of these tags to sample fetch
functions that would allow to do this. I must confess that after that
long a time, I mostly remembered about the complaint not to be able
to operate on these values but I almost forgot about the problem of
not being able to actually re-encode them!

> Also what would be extra useful, especially for debugging and perhaps
> security, is to add an `http-request|http-response log 
>  if ` that would allow adding additional log
> lines if a certain condition is met.

We cannot realistically "add more lines" to a request, however deciding
to emit one line at the instant the rule is processed shouldn't be hard
at all, and I definitely see the value there, as you say, both for
debugging and for security alerts.

There's something I've been wanting for a while which would be log profiles
(pre-defined and reusable log-format directives) that would significantly
help with such situations as you probably don't want to duplicate the
exact same format all the time. As such I'd rather first define how we
want to reference them, then make sure that the suggestion above can be
done in a forward compatible way.

These are good ideas that will fuel a long feature request :-)

Thanks!
Willy



Re: Logging using %HP (path) produce different results with H1 and H2

2020-08-26 Thread Ciprian Dorin Craciun
On Tue, Aug 25, 2020 at 3:56 PM Willy Tarreau  wrote:
> There was something important I've been wanting for a few versions, which
> was to have named log formats that we could declare in a central place and
> use everywhere. It would tremendously help here. I know it can be done
> using environment variables declared in the global section but I personally
> find this ugly.
>
> So I think it's the right place to open such a discussion (what we should
> log and whether or not it loses info by default or requires to duplicate
> some data while waiting for the response), so that we can reach a better
> and more modern solution. I'm open to proposals.


If we are to talk about logging, please make it easy to have valid
JSON log lines.

I know there is the `%[something(),json()]` that works as expected;
however there are many (especially old) log items (say `%ST`) that
sometimes expand to a number, and sometimes to a `-`, which basically
means one has to always treat everything as strings, and hope the
actual value doesn't contain a quote or JSON escape codes...  In fact,
to keep things simple, I would suggest just adding new functions
(fetch samples as they are called) that expose all items available in
the `Custom log format` section  (some are available, some are not
from what I remember);  these should expand to an empty string if they
are not available (as opposed to a `-`).

Also what would be extra useful, especially for debugging and perhaps
security, is to add an `http-request|http-response log 
 if ` that would allow adding additional log
lines if a certain condition is met.

Thanks,
Ciprian.



RE: Logging using %HP (path) produce different results with H1 and H2

2020-08-25 Thread Pierre Cheynier
Hi Willy,

On Tue, Aug 25, 2020 at 14:53:05PM +0200, Willy Tarreau wrote:

> Thus an HTTP/2 request effectively "looks like" an HTTP/1 request using
> an absolute URI. What causes the mess in the logs is that such HTTP/1
> requests are rarely used (most only for proxies), but they are perfectly
> valid and given that they are often used to take routing decisions, it's
> mandatory that they are part of the logs. For example if you decide that
> every *url* starting with "/img" has to be routed to the static server
> and the rest to the application, you're forgetting that "https://foo/img/";
> is valid as well and will be routed to the application. That's what I do
> not want to report fake or reconstructed information in the logs.
>
> In 1.8, what happened when we introduced H2 is that it was directly turned
> into HTTP/1.1 before being processed and that given that we didn't support
> server-side H2, the most seamless way to handle it was to just replace
> everything with origin requests (no authority). That remained till 2.0
> since it was not really acceptable to imagine that depending on whether
> you enabled HTX or not you'd get different logs for the exact same request.
> But now we cannot cheat anymore, it had caused too much trouble already

I clearly understood the problem is more complex than it seems in the
first place, due to protocol + internal representations changes that occured
recently.

> What I understand however is that it's possible that we need to rethink
> what we're logging. Maybe instead of logging the URI by default (and missing
> the Host in HTTP/1) we ought to instead log the scheme, authority and path
> parts. These are not always there (scheme or authority in H1) but we can
> log an empty field like "-" in this case.

Clearly that was my point. Especially when you manipulate "high-level variables"
such as %HP %HQ %HU and so on, you probably expect the hard work to be done
for you.

> We cannot realistically do that in the default "httplog" format, but we
> can imagine a new default format which would report such info (htxlog?),
> or maybe renaming httplog to http-legacy-log and changing the httplog's
> default. We should then consider this opportunity to revisit certain
> fields that do not make sense anymore, like the "~" in front of the
> frontend's name for SSL, the various timers that need to report idle
> and probably user-facing time instead of just data, etc.

I +1 on this as a tradeoff (even though ~ in front frontends is already OK
by using %f vs. %ft - I understand it's more a matter of leveraging this
change in order to remove tech debt).

> So I think it's the right place to open such a discussion (what we should
> log and whether or not it loses info by default or requires to duplicate
> some data while waiting for the response), so that we can reach a better
> and more modern solution. I'm open to proposals.

That's wider than I initially thought :)

--
Pierre


Re: Logging using %HP (path) produce different results with H1 and H2

2020-08-25 Thread Tim Düsterhus
Willy,

Am 25.08.20 um 14:53 schrieb Willy Tarreau:
> So I think it's the right place to open such a discussion (what we should
> log and whether or not it loses info by default or requires to duplicate
> some data while waiting for the response), so that we can reach a better
> and more modern solution. I'm open to proposals.
> 

%ID by default, please.

See also this issue:
https://github.com/haproxy/haproxy/issues/401#issuecomment-562776569
and this:
https://github.com/haproxy/haproxy/issues/771

Best regards
Tim Düsterhus



Re: Logging using %HP (path) produce different results with H1 and H2

2020-08-25 Thread Willy Tarreau
Hi Pierre,

On Mon, Aug 24, 2020 at 08:17:05AM +, Pierre Cheynier wrote:
> On Fri, Aug 21, 2020 at 8:11 PM William Dauchy  wrote:
> 
> So awesome to get the first response from your direct colleague :)
> 
> > I believe this is expected; this behaviour has changed since v2.1 though.
> 
> Indeed, we don't use this logging variable since a long time, so I'm not 
> really able to confirm if this is so new.
> Anyway, I understand this is related to handling h2 and its specifics, still 
> I think there should be something to fix (in one way or the other) to get 
> back to a consistent/deterministic meaning of %HP (and maybe in other places 
> where this had an impact).
> 
> Willy, any thought about this?

What is certain is that I don't want to start logging a false or misleading
information. The issues stems from browsers using a different way to represent
a resource in a request depending on HTTP versions (and it's not their fault,
it's the recommended way to do it). In HTTP/1.x we used to have :
  - relative URIs made of a path only with a separate Host header
  - absolute URIs made of a scheme + host + path and the Host header
having to be repeated.

In HTTP/2 this has been greatly simplified and only the second full URI
was kept by default. In order to save servers from having to parse these
elements, they are sent already split into:
  - a ":scheme" pseudo header holding the scheme of the URI
  - a ":authority" pseudo header holding the equivalent of the host part
of the URI
  - a ":path" pseudo header holding the path part of the URI

And no Host header is needed anymore.

Thus an HTTP/2 request effectively "looks like" an HTTP/1 request using
an absolute URI. What causes the mess in the logs is that such HTTP/1
requests are rarely used (most only for proxies), but they are perfectly
valid and given that they are often used to take routing decisions, it's
mandatory that they are part of the logs. For example if you decide that
every *url* starting with "/img" has to be routed to the static server
and the rest to the application, you're forgetting that "https://foo/img/";
is valid as well and will be routed to the application. That's what I do
not want to report fake or reconstructed information in the logs.

In 1.8, what happened when we introduced H2 is that it was directly turned
into HTTP/1.1 before being processed and that given that we didn't support
server-side H2, the most seamless way to handle it was to just replace
everything with origin requests (no authority). That remained till 2.0
since it was not really acceptable to imagine that depending on whether
you enabled HTX or not you'd get different logs for the exact same request.
But now we cannot cheat anymore, it had caused too much trouble already

What I understand however is that it's possible that we need to rethink
what we're logging. Maybe instead of logging the URI by default (and missing
the Host in HTTP/1) we ought to instead log the scheme, authority and path
parts. These are not always there (scheme or authority in H1) but we can
log an empty field like "-" in this case.

We cannot realistically do that in the default "httplog" format, but we
can imagine a new default format which would report such info (htxlog?),
or maybe renaming httplog to http-legacy-log and changing the httplog's
default. We should then consider this opportunity to revisit certain
fields that do not make sense anymore, like the "~" in front of the
frontend's name for SSL, the various timers that need to report idle
and probably user-facing time instead of just data, etc.

There was something important I've been wanting for a few versions, which
was to have named log formats that we could declare in a central place and
use everywhere. It would tremendously help here. I know it can be done
using environment variables declared in the global section but I personally
find this ugly.

So I think it's the right place to open such a discussion (what we should
log and whether or not it loses info by default or requires to duplicate
some data while waiting for the response), so that we can reach a better
and more modern solution. I'm open to proposals.

Cheers,
Willy



RE: Logging using %HP (path) produce different results with H1 and H2

2020-08-24 Thread Pierre Cheynier
On Fri, Aug 21, 2020 at 8:11 PM William Dauchy  wrote:

So awesome to get the first response from your direct colleague :)

> I believe this is expected; this behaviour has changed since v2.1 though.

Indeed, we don't use this logging variable since a long time, so I'm not really 
able to confirm if this is so new.
Anyway, I understand this is related to handling h2 and its specifics, still I 
think there should be something to fix (in one way or the other) to get back to 
a consistent/deterministic meaning of %HP (and maybe in other places where this 
had an impact).

Willy, any thought about this?

Pierre


Re: Logging using %HP (path) produce different results with H1 and H2

2020-08-21 Thread Ciprian Dorin Craciun
On Fri, Aug 21, 2020 at 9:14 PM William Dauchy  wrote:
> On Fri, Aug 21, 2020 at 7:46 PM Pierre Cheynier  wrote:
> > We're running HAProxy 2.2.2.
> > It turns out logging requests paths using "%HP" var produce a different 
> > results on H1 vs. H2.
> >
> > H1: /path
> > H2: https://hostname.domain/path (< I consider this one buggy)


I've also hit this issue in May
(https://www.mail-archive.com/haproxy@formilux.org/msg37170.html), and
as many continue to upgrade to newer HAProxy versions, they might hit
that as well.  Personally I've opted into using variables set from
request / response rules, and then logging those in a custom log
format.

However the major issue, if I remember correctly, isn't that the
format has changed between HAProxy 2.0 and 2.1, but that in 2.1 the
same `%HP` produces different outputs based on the used protocol,
which will complicate log handling...

Perhaps a backward incompatible fix would be to always include the
protocol and `Host` header (or the IP if one is missing) in case of
HTTP/1.0 and HTTP/1.1, and thus have `%HP` always contain an URL.


> I believe this is expected; this behaviour has changed since v2.1 though.

It might be so, but the changelog for 2.1 says nothing about this change.

Ciprian.



Re: Logging using %HP (path) produce different results with H1 and H2

2020-08-21 Thread William Dauchy
Hello Pierre,

On Fri, Aug 21, 2020 at 7:46 PM Pierre Cheynier  wrote:
> We're running HAProxy 2.2.2.
> It turns out logging requests paths using "%HP" var produce a different 
> results on H1 vs. H2.
>
> H1: /path
> H2: https://hostname.domain/path (< I consider this one buggy)
>
> No idea where does this comes from exactly, I essentially understand txn->uri 
> structure ends up being completely different between the 2 code paths.
> Anybody here with lightspeed knowledge of src/h* to investigate this?

I believe this is expected; this behaviour has changed since v2.1 though.
see 
http://git.haproxy.org/?p=haproxy.git;a=commit;h=30ee1efe676e8264af16bab833c621d60a72a4d7
see also older threads talking about it
https://www.mail-archive.com/haproxy@formilux.org/msg37783.html

Best,
--
William



Logging using %HP (path) produce different results with H1 and H2

2020-08-21 Thread Pierre Cheynier
Hi list,

We're running HAProxy 2.2.2.
It turns out logging requests paths using "%HP" var produce a different results 
on H1 vs. H2.

H1: /path
H2: https://hostname.domain/path (< I consider this one buggy)

No idea where does this comes from exactly, I essentially understand txn->uri 
structure ends up being completely different between the 2 code paths.
Anybody here with lightspeed knowledge of src/h* to investigate this?

Best,

--
Pierre