Re: Logging using %HP (path) produce different results with H1 and H2
On Wed, Aug 26, 2020 at 06:31:21PM +0300, Ciprian Dorin Craciun wrote: > On Wed, Aug 26, 2020 at 6:29 PM Willy Tarreau wrote: > > > Then, still related to logging, might I add a feature request to just > > > use "raw" lines over UDP or TCP, instead of SysLog? (Just as we have > > > now support for `stdout`, but with network support.) > > > > You already have it. I'm doing this: > > > > log 127.0.0.1:5514 format raw daemon > > log-format "%[uuid]" > > > > And netcat shows me only the uuid with no header nor anything. So > > technically speaking I don't think you're missing anything for this. > > > This is nice. I'll have to give it a try. (I thought that the `raw` > applied only to `stdout`. I think it used to be the case for quite some time, probably that it's become available with 2.2 and the rework needed for TCP servers. > However we are missing the TCP support. :) You have it in 2.2. You need to declare a ring buffer, which itself is chained to the TCP log server. Ah I've just found you an example here: https://www.haproxy.com/blog/announcing-haproxy-2-2/#syslog-over-tcp And in 2.3 we're even starting to receive syslog to forward and load balance it (including with format conversion and UDP->TCP). Cheers, Willy
Re: Logging using %HP (path) produce different results with H1 and H2
On Wed, Aug 26, 2020 at 6:29 PM Willy Tarreau wrote: > > Then, still related to logging, might I add a feature request to just > > use "raw" lines over UDP or TCP, instead of SysLog? (Just as we have > > now support for `stdout`, but with network support.) > > You already have it. I'm doing this: > > log 127.0.0.1:5514 format raw daemon > log-format "%[uuid]" > > And netcat shows me only the uuid with no header nor anything. So > technically speaking I don't think you're missing anything for this. This is nice. I'll have to give it a try. (I thought that the `raw` applied only to `stdout`. However we are missing the TCP support. :) Ciprian.
Re: Logging using %HP (path) produce different results with H1 and H2
On Wed, Aug 26, 2020 at 06:17:11PM +0300, Ciprian Dorin Craciun wrote: > On Wed, Aug 26, 2020 at 1:08 PM Willy Tarreau wrote: > > > Also what would be extra useful, especially for debugging and perhaps > > > security, is to add an `http-request|http-response log > > > if ` that would allow adding additional log > > > lines if a certain condition is met. > > > > We cannot realistically "add more lines" to a request, however deciding > > to emit one line at the instant the rule is processed shouldn't be hard > > at all, and I definitely see the value there, as you say, both for > > debugging and for security alerts. > > > Yes, you are right, I don't expect such a statement to add a new line > to the buffer, but to immediately format a log line and submit it. > > (This also aids in debugging, because if the same log line is put in > multiple places of the `http-request` chain, one could see how a > certain header has changed, and it could also help in generating a log > line the moment the request was parsed, as opposed to waiting for the > response to be processed.) Actually that's fun, because among the features for which we wanted to have the log profiles was the ability to have multiple log points, precisely for request then response typically. It appeared as being a little bit of a mess and I don't think anyone made any progress in this area. But with your idea we could possibly delegate this choice to the user. One hook I wanted to support was the "server connection established", which will not be covered, but now that we have the after-response rules, that at least gives request, response and end of transfer, which are the most valuable ones. > Then, still related to logging, might I add a feature request to just > use "raw" lines over UDP or TCP, instead of SysLog? (Just as we have > now support for `stdout`, but with network support.) You already have it. I'm doing this: log 127.0.0.1:5514 format raw daemon log-format "%[uuid]" And netcat shows me only the uuid with no header nor anything. So technically speaking I don't think you're missing anything for this. Willy
Re: Logging using %HP (path) produce different results with H1 and H2
On Wed, Aug 26, 2020 at 1:08 PM Willy Tarreau wrote: > > Also what would be extra useful, especially for debugging and perhaps > > security, is to add an `http-request|http-response log > > if ` that would allow adding additional log > > lines if a certain condition is met. > > We cannot realistically "add more lines" to a request, however deciding > to emit one line at the instant the rule is processed shouldn't be hard > at all, and I definitely see the value there, as you say, both for > debugging and for security alerts. Yes, you are right, I don't expect such a statement to add a new line to the buffer, but to immediately format a log line and submit it. (This also aids in debugging, because if the same log line is put in multiple places of the `http-request` chain, one could see how a certain header has changed, and it could also help in generating a log line the moment the request was parsed, as opposed to waiting for the response to be processed.) > There's something I've been wanting for a while which would be log profiles > (pre-defined and reusable log-format directives) that would significantly > help with such situations as you probably don't want to duplicate the > exact same format all the time. As such I'd rather first define how we > want to reference them, then make sure that the suggestion above can be > done in a forward compatible way. Although `log-format` profiles might be useful, I would still keep the option to just "paste" the same format everywhere it is needed. (As a minor side-note: the HAProxy configuration syntax is far from trivial, and I see it more as an "assembler" than as a "high level language"; therefore I've built myself a Python generator, https://github.com/cipriancraciun/haproxy-configurator/tree/master/examples, that just generates the matching HAProxy configuration file. Moreover I guess this is the way many other HAProxy integrators do it, especially in the micro-services world.) > These are good ideas that will fuel a long feature request :-) Then, still related to logging, might I add a feature request to just use "raw" lines over UDP or TCP, instead of SysLog? (Just as we have now support for `stdout`, but with network support.) I know there is the issue of reliability, but that is not an issue, given as now SysLog is also over UDP. Thanks, Ciprian.
Re: Logging using %HP (path) produce different results with H1 and H2
Hi Ciprian, On Wed, Aug 26, 2020 at 12:52:52PM +0300, Ciprian Dorin Craciun wrote: > If we are to talk about logging, please make it easy to have valid > JSON log lines. Yep, good point! > I know there is the `%[something(),json()]` that works as expected; > however there are many (especially old) log items (say `%ST`) that > sometimes expand to a number, and sometimes to a `-`, which basically > means one has to always treat everything as strings, and hope the > actual value doesn't contain a quote or JSON escape codes... In fact, > to keep things simple, I would suggest just adding new functions > (fetch samples as they are called) that expose all items available in > the `Custom log format` section (some are available, some are not > from what I remember); these should expand to an empty string if they > are not available (as opposed to a `-`). This is exactly what was planned to be done in the background. I even think I created an issue for this a long time ago, so that anyone can come in and progressively convert some of these tags to sample fetch functions that would allow to do this. I must confess that after that long a time, I mostly remembered about the complaint not to be able to operate on these values but I almost forgot about the problem of not being able to actually re-encode them! > Also what would be extra useful, especially for debugging and perhaps > security, is to add an `http-request|http-response log > if ` that would allow adding additional log > lines if a certain condition is met. We cannot realistically "add more lines" to a request, however deciding to emit one line at the instant the rule is processed shouldn't be hard at all, and I definitely see the value there, as you say, both for debugging and for security alerts. There's something I've been wanting for a while which would be log profiles (pre-defined and reusable log-format directives) that would significantly help with such situations as you probably don't want to duplicate the exact same format all the time. As such I'd rather first define how we want to reference them, then make sure that the suggestion above can be done in a forward compatible way. These are good ideas that will fuel a long feature request :-) Thanks! Willy
Re: Logging using %HP (path) produce different results with H1 and H2
On Tue, Aug 25, 2020 at 3:56 PM Willy Tarreau wrote: > There was something important I've been wanting for a few versions, which > was to have named log formats that we could declare in a central place and > use everywhere. It would tremendously help here. I know it can be done > using environment variables declared in the global section but I personally > find this ugly. > > So I think it's the right place to open such a discussion (what we should > log and whether or not it loses info by default or requires to duplicate > some data while waiting for the response), so that we can reach a better > and more modern solution. I'm open to proposals. If we are to talk about logging, please make it easy to have valid JSON log lines. I know there is the `%[something(),json()]` that works as expected; however there are many (especially old) log items (say `%ST`) that sometimes expand to a number, and sometimes to a `-`, which basically means one has to always treat everything as strings, and hope the actual value doesn't contain a quote or JSON escape codes... In fact, to keep things simple, I would suggest just adding new functions (fetch samples as they are called) that expose all items available in the `Custom log format` section (some are available, some are not from what I remember); these should expand to an empty string if they are not available (as opposed to a `-`). Also what would be extra useful, especially for debugging and perhaps security, is to add an `http-request|http-response log if ` that would allow adding additional log lines if a certain condition is met. Thanks, Ciprian.
RE: Logging using %HP (path) produce different results with H1 and H2
Hi Willy, On Tue, Aug 25, 2020 at 14:53:05PM +0200, Willy Tarreau wrote: > Thus an HTTP/2 request effectively "looks like" an HTTP/1 request using > an absolute URI. What causes the mess in the logs is that such HTTP/1 > requests are rarely used (most only for proxies), but they are perfectly > valid and given that they are often used to take routing decisions, it's > mandatory that they are part of the logs. For example if you decide that > every *url* starting with "/img" has to be routed to the static server > and the rest to the application, you're forgetting that "https://foo/img/"; > is valid as well and will be routed to the application. That's what I do > not want to report fake or reconstructed information in the logs. > > In 1.8, what happened when we introduced H2 is that it was directly turned > into HTTP/1.1 before being processed and that given that we didn't support > server-side H2, the most seamless way to handle it was to just replace > everything with origin requests (no authority). That remained till 2.0 > since it was not really acceptable to imagine that depending on whether > you enabled HTX or not you'd get different logs for the exact same request. > But now we cannot cheat anymore, it had caused too much trouble already I clearly understood the problem is more complex than it seems in the first place, due to protocol + internal representations changes that occured recently. > What I understand however is that it's possible that we need to rethink > what we're logging. Maybe instead of logging the URI by default (and missing > the Host in HTTP/1) we ought to instead log the scheme, authority and path > parts. These are not always there (scheme or authority in H1) but we can > log an empty field like "-" in this case. Clearly that was my point. Especially when you manipulate "high-level variables" such as %HP %HQ %HU and so on, you probably expect the hard work to be done for you. > We cannot realistically do that in the default "httplog" format, but we > can imagine a new default format which would report such info (htxlog?), > or maybe renaming httplog to http-legacy-log and changing the httplog's > default. We should then consider this opportunity to revisit certain > fields that do not make sense anymore, like the "~" in front of the > frontend's name for SSL, the various timers that need to report idle > and probably user-facing time instead of just data, etc. I +1 on this as a tradeoff (even though ~ in front frontends is already OK by using %f vs. %ft - I understand it's more a matter of leveraging this change in order to remove tech debt). > So I think it's the right place to open such a discussion (what we should > log and whether or not it loses info by default or requires to duplicate > some data while waiting for the response), so that we can reach a better > and more modern solution. I'm open to proposals. That's wider than I initially thought :) -- Pierre
Re: Logging using %HP (path) produce different results with H1 and H2
Willy, Am 25.08.20 um 14:53 schrieb Willy Tarreau: > So I think it's the right place to open such a discussion (what we should > log and whether or not it loses info by default or requires to duplicate > some data while waiting for the response), so that we can reach a better > and more modern solution. I'm open to proposals. > %ID by default, please. See also this issue: https://github.com/haproxy/haproxy/issues/401#issuecomment-562776569 and this: https://github.com/haproxy/haproxy/issues/771 Best regards Tim Düsterhus
Re: Logging using %HP (path) produce different results with H1 and H2
Hi Pierre, On Mon, Aug 24, 2020 at 08:17:05AM +, Pierre Cheynier wrote: > On Fri, Aug 21, 2020 at 8:11 PM William Dauchy wrote: > > So awesome to get the first response from your direct colleague :) > > > I believe this is expected; this behaviour has changed since v2.1 though. > > Indeed, we don't use this logging variable since a long time, so I'm not > really able to confirm if this is so new. > Anyway, I understand this is related to handling h2 and its specifics, still > I think there should be something to fix (in one way or the other) to get > back to a consistent/deterministic meaning of %HP (and maybe in other places > where this had an impact). > > Willy, any thought about this? What is certain is that I don't want to start logging a false or misleading information. The issues stems from browsers using a different way to represent a resource in a request depending on HTTP versions (and it's not their fault, it's the recommended way to do it). In HTTP/1.x we used to have : - relative URIs made of a path only with a separate Host header - absolute URIs made of a scheme + host + path and the Host header having to be repeated. In HTTP/2 this has been greatly simplified and only the second full URI was kept by default. In order to save servers from having to parse these elements, they are sent already split into: - a ":scheme" pseudo header holding the scheme of the URI - a ":authority" pseudo header holding the equivalent of the host part of the URI - a ":path" pseudo header holding the path part of the URI And no Host header is needed anymore. Thus an HTTP/2 request effectively "looks like" an HTTP/1 request using an absolute URI. What causes the mess in the logs is that such HTTP/1 requests are rarely used (most only for proxies), but they are perfectly valid and given that they are often used to take routing decisions, it's mandatory that they are part of the logs. For example if you decide that every *url* starting with "/img" has to be routed to the static server and the rest to the application, you're forgetting that "https://foo/img/"; is valid as well and will be routed to the application. That's what I do not want to report fake or reconstructed information in the logs. In 1.8, what happened when we introduced H2 is that it was directly turned into HTTP/1.1 before being processed and that given that we didn't support server-side H2, the most seamless way to handle it was to just replace everything with origin requests (no authority). That remained till 2.0 since it was not really acceptable to imagine that depending on whether you enabled HTX or not you'd get different logs for the exact same request. But now we cannot cheat anymore, it had caused too much trouble already What I understand however is that it's possible that we need to rethink what we're logging. Maybe instead of logging the URI by default (and missing the Host in HTTP/1) we ought to instead log the scheme, authority and path parts. These are not always there (scheme or authority in H1) but we can log an empty field like "-" in this case. We cannot realistically do that in the default "httplog" format, but we can imagine a new default format which would report such info (htxlog?), or maybe renaming httplog to http-legacy-log and changing the httplog's default. We should then consider this opportunity to revisit certain fields that do not make sense anymore, like the "~" in front of the frontend's name for SSL, the various timers that need to report idle and probably user-facing time instead of just data, etc. There was something important I've been wanting for a few versions, which was to have named log formats that we could declare in a central place and use everywhere. It would tremendously help here. I know it can be done using environment variables declared in the global section but I personally find this ugly. So I think it's the right place to open such a discussion (what we should log and whether or not it loses info by default or requires to duplicate some data while waiting for the response), so that we can reach a better and more modern solution. I'm open to proposals. Cheers, Willy
RE: Logging using %HP (path) produce different results with H1 and H2
On Fri, Aug 21, 2020 at 8:11 PM William Dauchy wrote: So awesome to get the first response from your direct colleague :) > I believe this is expected; this behaviour has changed since v2.1 though. Indeed, we don't use this logging variable since a long time, so I'm not really able to confirm if this is so new. Anyway, I understand this is related to handling h2 and its specifics, still I think there should be something to fix (in one way or the other) to get back to a consistent/deterministic meaning of %HP (and maybe in other places where this had an impact). Willy, any thought about this? Pierre
Re: Logging using %HP (path) produce different results with H1 and H2
On Fri, Aug 21, 2020 at 9:14 PM William Dauchy wrote: > On Fri, Aug 21, 2020 at 7:46 PM Pierre Cheynier wrote: > > We're running HAProxy 2.2.2. > > It turns out logging requests paths using "%HP" var produce a different > > results on H1 vs. H2. > > > > H1: /path > > H2: https://hostname.domain/path (< I consider this one buggy) I've also hit this issue in May (https://www.mail-archive.com/haproxy@formilux.org/msg37170.html), and as many continue to upgrade to newer HAProxy versions, they might hit that as well. Personally I've opted into using variables set from request / response rules, and then logging those in a custom log format. However the major issue, if I remember correctly, isn't that the format has changed between HAProxy 2.0 and 2.1, but that in 2.1 the same `%HP` produces different outputs based on the used protocol, which will complicate log handling... Perhaps a backward incompatible fix would be to always include the protocol and `Host` header (or the IP if one is missing) in case of HTTP/1.0 and HTTP/1.1, and thus have `%HP` always contain an URL. > I believe this is expected; this behaviour has changed since v2.1 though. It might be so, but the changelog for 2.1 says nothing about this change. Ciprian.
Re: Logging using %HP (path) produce different results with H1 and H2
Hello Pierre, On Fri, Aug 21, 2020 at 7:46 PM Pierre Cheynier wrote: > We're running HAProxy 2.2.2. > It turns out logging requests paths using "%HP" var produce a different > results on H1 vs. H2. > > H1: /path > H2: https://hostname.domain/path (< I consider this one buggy) > > No idea where does this comes from exactly, I essentially understand txn->uri > structure ends up being completely different between the 2 code paths. > Anybody here with lightspeed knowledge of src/h* to investigate this? I believe this is expected; this behaviour has changed since v2.1 though. see http://git.haproxy.org/?p=haproxy.git;a=commit;h=30ee1efe676e8264af16bab833c621d60a72a4d7 see also older threads talking about it https://www.mail-archive.com/haproxy@formilux.org/msg37783.html Best, -- William
Logging using %HP (path) produce different results with H1 and H2
Hi list, We're running HAProxy 2.2.2. It turns out logging requests paths using "%HP" var produce a different results on H1 vs. H2. H1: /path H2: https://hostname.domain/path (< I consider this one buggy) No idea where does this comes from exactly, I essentially understand txn->uri structure ends up being completely different between the 2 code paths. Anybody here with lightspeed knowledge of src/h* to investigate this? Best, -- Pierre