Re: TLV problem after updating to 2.1.14
On Tue, Apr 14, 2020 at 05:18:39PM +0500, ??? wrote: > besides junk food, are you ok? Yes, it's just that my mood varies a lot from day to day :-/ We still have *at least* 4 weeks so stand that. It's the first time they dare announcing a possible start of the end. We'll see. These health check details aside, I've pushed the cleaned up version of the fix to the master and 2.1 branches so that Hativ doesn't need to maintain a local patch anymore. Cheers, Willy
Re: TLV problem after updating to 2.1.14
вс, 12 апр. 2020 г. в 21:22, Willy Tarreau : > Hello Hativ, > > On Sun, Apr 12, 2020 at 09:49:02AM +0200, Hativ wrote: > > Hello Willy, > > > Hativ, if I send you a patch to test next week, is it possible to > > > give > > > it a try on your side ? I'm interested in knowing if a clean "LOCAL" > > > connection works fine with Dovecot. If so then in parallel we can > > > file > > > a report on Dovecot to make their parser more robust but at least > > > we'd > > > have a longterm solution that doesn't affect deployed servers. > > > > I didn't get a patch so far. > > I'm really sorry for this. It just turns out that having been enclosed for > 4 weeks eating only junk food starts to seriously challenge my ability to > concentrate on any work and stay focused more than five seconds... and the > total lack of perspectives of any imminent improvement of this painful > situation definitely does not help :-( > besides junk food, are you ok? > > Here comes a patch (copy-pasted, you just have to copy that new line). > It's not the final one, because if it works, then we'll also need to > remoe the unused local-specific parts from the "else" block, but that's > a pure detail for now. > > Thanks, > Willy > > diff --git a/src/connection.c b/src/connection.c > index 5a2fd95bc2..0bafcdb919 100644 > --- a/src/connection.c > +++ b/src/connection.c > @@ -1378,6 +1378,7 @@ int make_proxy_line_v2(char *buf, int buf_len, > struct server *srv, struct connec > /* At least one of src or dst is not of AF_INET or AF_INET6 */ > if ( !src >|| !dst > + || conn_is_back(remote) >|| (src->ss_family != AF_INET && src->ss_family != AF_INET6) >|| (dst->ss_family != AF_INET && dst->ss_family != AF_INET6)) { > if (buf_len < PP2_HDR_LEN_UNSPEC) > > > >
Re: TLV problem after updating to 2.1.14
On Sun, Apr 12, 2020 at 10:32:40PM +0200, Hativ wrote: > Thanks for the patch, tested it, worked. Great, thanks! So I think we'll have to merge this one. What I think would be the best approach would be to get backport it to all branches where the original patch that revealed the bug was backported, and if we observe any other report, then we'll revert all of them from stable branches and keep it only in master so that failing products can be fixed and have something to be tested against. Willy
Re: TLV problem after updating to 2.1.14
Am Sonntag, den 12.04.2020, 18:19 +0200 schrieb Willy Tarreau: > Hello Hativ, > On Sun, Apr 12, 2020 at 09:49:02AM +0200, Hativ wrote: > > Hello Willy, > > > Hativ, if I send you a patch to test next week, is it possible > > > togiveit a try on your side ? I'm interested in knowing if a > > > clean "LOCAL"connection works fine with Dovecot. If so then in > > > parallel we canfilea report on Dovecot to make their parser more > > > robust but at leastwe'dhave a longterm solution that doesn't > > > affect deployed servers. > > > > I didn't get a patch so far. > > I'm really sorry for this. It just turns out that having been > enclosed for4 weeks eating only junk food starts to seriously > challenge my ability toconcentrate on any work and stay focused more > than five seconds... and thetotal lack of perspectives of any > imminent improvement of this painfulsituation definitely does not > help :-( > Here comes a patch (copy-pasted, you just have to copy that new > line).It's not the final one, because if it works, then we'll also > need to remoe the unused local-specific parts from the "else" block, > but that'sa pure detail for now. > Thanks,Willy > diff --git a/src/connection.c b/src/connection.cindex > 5a2fd95bc2..0bafcdb919 100644--- a/src/connection.c+++ > b/src/connection.c@@ -1378,6 +1378,7 @@ int make_proxy_line_v2(char > *buf, int buf_len, struct server *srv, struct connec/* At > least one of src or dst is not of AF_INET or AF_INET6 */if > ( !src || !dst+ || > conn_is_back(remote) || (src->ss_family != AF_INET && src- > >ss_family != AF_INET6) || (dst->ss_family != AF_INET && > dst->ss_family != AF_INET6)) {if (buf_len < > PP2_HDR_LEN_UNSPEC) > Hello Willy, I wish you all the best. Thanks for the patch, tested it, worked. -- Greetings Hativ
RE: TLV problem after updating to 2.1.14
Hi Willy, 4 week of junk food does clearly have it's drawbacks, I hope you at least managed some Easter/holiday themed food ;) We're happy and grateful that you are still around supporting us, even through these difficult times. Stay safe and all the best wishes for many years of haProxy still to come ;) Regards, Stefan -Original Message- From: Willy Tarreau Sent: Sunday, 12 April 2020 18:19 To: Hativ Cc: Tim Düsterhus ; haproxy@formilux.org Subject: Re: TLV problem after updating to 2.1.14 Hello Hativ, On Sun, Apr 12, 2020 at 09:49:02AM +0200, Hativ wrote: > Hello Willy, > > Hativ, if I send you a patch to test next week, is it possible to > > give it a try on your side ? I'm interested in knowing if a clean > > "LOCAL" > > connection works fine with Dovecot. If so then in parallel we can > > file a report on Dovecot to make their parser more robust but at > > least we'd have a longterm solution that doesn't affect deployed > > servers. > > I didn't get a patch so far. I'm really sorry for this. It just turns out that having been enclosed for 4 weeks eating only junk food starts to seriously challenge my ability to concentrate on any work and stay focused more than five seconds... and the total lack of perspectives of any imminent improvement of this painful situation definitely does not help :-( Here comes a patch (copy-pasted, you just have to copy that new line). It's not the final one, because if it works, then we'll also need to remoe the unused local-specific parts from the "else" block, but that's a pure detail for now. Thanks, Willy diff --git a/src/connection.c b/src/connection.c index 5a2fd95bc2..0bafcdb919 100644 --- a/src/connection.c +++ b/src/connection.c @@ -1378,6 +1378,7 @@ int make_proxy_line_v2(char *buf, int buf_len, struct server *srv, struct connec /* At least one of src or dst is not of AF_INET or AF_INET6 */ if ( !src || !dst + || conn_is_back(remote) || (src->ss_family != AF_INET && src->ss_family != AF_INET6) || (dst->ss_family != AF_INET && dst->ss_family != AF_INET6)) { if (buf_len < PP2_HDR_LEN_UNSPEC)
Re: TLV problem after updating to 2.1.14
Hello Hativ, On Sun, Apr 12, 2020 at 09:49:02AM +0200, Hativ wrote: > Hello Willy, > > Hativ, if I send you a patch to test next week, is it possible to > > give > > it a try on your side ? I'm interested in knowing if a clean "LOCAL" > > connection works fine with Dovecot. If so then in parallel we can > > file > > a report on Dovecot to make their parser more robust but at least > > we'd > > have a longterm solution that doesn't affect deployed servers. > > I didn't get a patch so far. I'm really sorry for this. It just turns out that having been enclosed for 4 weeks eating only junk food starts to seriously challenge my ability to concentrate on any work and stay focused more than five seconds... and the total lack of perspectives of any imminent improvement of this painful situation definitely does not help :-( Here comes a patch (copy-pasted, you just have to copy that new line). It's not the final one, because if it works, then we'll also need to remoe the unused local-specific parts from the "else" block, but that's a pure detail for now. Thanks, Willy diff --git a/src/connection.c b/src/connection.c index 5a2fd95bc2..0bafcdb919 100644 --- a/src/connection.c +++ b/src/connection.c @@ -1378,6 +1378,7 @@ int make_proxy_line_v2(char *buf, int buf_len, struct server *srv, struct connec /* At least one of src or dst is not of AF_INET or AF_INET6 */ if ( !src || !dst + || conn_is_back(remote) || (src->ss_family != AF_INET && src->ss_family != AF_INET6) || (dst->ss_family != AF_INET && dst->ss_family != AF_INET6)) { if (buf_len < PP2_HDR_LEN_UNSPEC)
Re: TLV problem after updating to 2.1.14
Hello Willy, > Hativ, if I send you a patch to test next week, is it possible to > give > it a try on your side ? I'm interested in knowing if a clean "LOCAL" > connection works fine with Dovecot. If so then in parallel we can > file > a report on Dovecot to make their parser more robust but at least > we'd > have a longterm solution that doesn't affect deployed servers. I didn't get a patch so far. Am Samstag, den 04.04.2020, 14:17 +0200 schrieb Willy Tarreau: > On Sat, Apr 04, 2020 at 01:49:06PM +0200, Tim Düsterhus wrote: > > > Do you think we ought to refrain from sending any address at all > > > ?I preferred to avoid possibly visible changes and apparently it > > > didn'tgo that well :-/ > > > > Based on a strict reading of the proxy protocol definition Dovecot > > iswrong here: > > > The receiver must accept this connection as valid and must use > > > thereal connection endpoints and discard the protocol block > > > including thefamily which is ignored. > > Sure but one component being wrong doesn't deserve being broken by > aunilateral change :-/ > > To my understanding the TLV are part of the protocol block, because > > theywere bolted onto the proxy protocol v2 after the fact in > > commitafb768340c9d7e50d8e7372051c8a9c3b3f2151c( > > https://github.com/haproxy/haproxy/commit/afb768340c9d7e50d8e7372051c8a9c3b3f2151c).I > > also notice that this commit made an incompatible change to > > proxyprotocol v2 1.5 years after the initial specification (however > > bothduring the 1.5 development cycle). > > Yes TLVs were added after the initial v2 spec. > > However HAProxy violates a should: > > > When a sender presents aLOCAL connection, it should not present > > > any address so it sets this field tozero. > > So that fuels the fact that we should definitely get rid of > theseaddresses and we should possibly expect less breakage by doing > so. > > My conclusion is that the proxy protocol v2 definition is not > > superclear with regards to TLV. I assume that is because they were > > notinitially part of the specification. Not sending an explicit > > length forthe address and instead only a length for address + TLV > > is non-ideal.Another case in point: My bugfix for the TLV parsing > > that ignored thelength of the proxy header, because each TLV had > > its own length. > > So I think that I'll revert the fix from stable branches, becauseit > was meant to address bug #511 which is not that important. And > inmaster we'd instead send a real, naked LOCAL request that > complieswith the rule above. > Hativ, if I send you a patch to test next week, is it possible to > giveit a try on your side ? I'm interested in knowing if a clean > "LOCAL"connection works fine with Dovecot. If so then in parallel we > can filea report on Dovecot to make their parser more robust but at > least we'dhave a longterm solution that doesn't affect deployed > servers. > Thanks!Willy -- Greetings Hativ
Re: TLV problem after updating to 2.1.14
Hello Willy, > Hativ, if I send you a patch to test next week, is it possible to > give > it a try on your side ? I'm interested in knowing if a clean "LOCAL" > connection works fine with Dovecot. If so then in parallel we can > file > a report on Dovecot to make their parser more robust but at least > we'd > have a longterm solution that doesn't affect deployed servers. Yes, sure. Am Samstag, den 04.04.2020, 14:17 +0200 schrieb Willy Tarreau: > Hativ, if I send you a patch to test next week, is it possible to > giveit a try on your side ? I'm interested in knowing if a clean > "LOCAL"connection works fine with Dovecot. If so then in parallel we > can filea report on Dovecot to make their parser more robust but at > least we'dhave a longterm solution that doesn't affect deployed > servers. -- Greetings Hativ
Re: TLV problem after updating to 2.1.14
On Sat, Apr 04, 2020 at 01:49:06PM +0200, Tim Düsterhus wrote: > > Do you think we ought to refrain from sending any address at all ? > > I preferred to avoid possibly visible changes and apparently it didn't > > go that well :-/ > > > > Based on a strict reading of the proxy protocol definition Dovecot is > wrong here: > > > The receiver must accept this connection as valid and must use the > > real connection endpoints and discard the protocol block including the > > family which is ignored. Sure but one component being wrong doesn't deserve being broken by a unilateral change :-/ > To my understanding the TLV are part of the protocol block, because they > were bolted onto the proxy protocol v2 after the fact in commit > afb768340c9d7e50d8e7372051c8a9c3b3f2151c > (https://github.com/haproxy/haproxy/commit/afb768340c9d7e50d8e7372051c8a9c3b3f2151c). > I also notice that this commit made an incompatible change to proxy > protocol v2 1.5 years after the initial specification (however both > during the 1.5 development cycle). Yes TLVs were added after the initial v2 spec. > However HAProxy violates a should: > > > When a sender presents a > > LOCAL connection, it should not present any address so it sets this field to > > zero. So that fuels the fact that we should definitely get rid of these addresses and we should possibly expect less breakage by doing so. > My conclusion is that the proxy protocol v2 definition is not super > clear with regards to TLV. I assume that is because they were not > initially part of the specification. Not sending an explicit length for > the address and instead only a length for address + TLV is non-ideal. > Another case in point: My bugfix for the TLV parsing that ignored the > length of the proxy header, because each TLV had its own length. So I think that I'll revert the fix from stable branches, because it was meant to address bug #511 which is not that important. And in master we'd instead send a real, naked LOCAL request that complies with the rule above. Hativ, if I send you a patch to test next week, is it possible to give it a try on your side ? I'm interested in knowing if a clean "LOCAL" connection works fine with Dovecot. If so then in parallel we can file a report on Dovecot to make their parser more robust but at least we'd have a longterm solution that doesn't affect deployed servers. Thanks! Willy
Re: TLV problem after updating to 2.1.14
Willy, Am 04.04.20 um 13:29 schrieb Willy Tarreau: >> Am 04.04.20 um 12:41 schrieb Tim Düsterhus: >>> The Dovecot source code is here: >>> https://github.com/dovecot/core/blob/de9968d623e331a18b43dfe8a00421f72f7f7962/src/lib-master/master-service-haproxy.c#L354 >>> >>> A quick glance at the Dovecot code looks like Dovecot parses the proxy >>> protocol correctly with regard to TLVs. >> >> I spoke too soon here. Dovecot ignores the addresses as it should for >> LOCAL (i.e. health check) connections [1] and then attempts to parse the >> TLV, interpreting the IP addresses as a TLV [2]. From my understanding >> HAProxy sends IP addresses for LOCAL connections for compatibility. > > Do you think we ought to refrain from sending any address at all ? > I preferred to avoid possibly visible changes and apparently it didn't > go that well :-/ > Based on a strict reading of the proxy protocol definition Dovecot is wrong here: > The receiver must accept this connection as valid and must use the > real connection endpoints and discard the protocol block including the > family which is ignored. To my understanding the TLV are part of the protocol block, because they were bolted onto the proxy protocol v2 after the fact in commit afb768340c9d7e50d8e7372051c8a9c3b3f2151c (https://github.com/haproxy/haproxy/commit/afb768340c9d7e50d8e7372051c8a9c3b3f2151c). I also notice that this commit made an incompatible change to proxy protocol v2 1.5 years after the initial specification (however both during the 1.5 development cycle). As the TLV are part of the protocol block they must be ignored for LOCAL connections to my understanding, but I'm not sure. I already mentioned / asked about that within my email [PATCH 2/3] MINOR: proxy_protocol: Ingest PP2_TYPE_UNIQUE_ID on incoming connections (Message-Id: <20200306121540.28050-3-...@bastelstu.be>): > I've also added a reg-test that verifies that pulling unique IDs works > properly. My first attempt was to use a LOCAL connection, because then > I would not need to send IP addresses in HEX encoding, however HAProxy > does not appear to read TLVs in LOCAL mode. I've attempted to find out > whether TLVs are supported for LOCAL mode or not in the proxy-protocol > specification, but it was not terribly clear. Supporting unique IDs in > LOCAL mode would definitely make sense to me. And supporting the CRC32 > checksum would also make sense I guess. However HAProxy violates a should: > When a sender presents a > LOCAL connection, it should not present any address so it sets this field to > zero. My conclusion is that the proxy protocol v2 definition is not super clear with regards to TLV. I assume that is because they were not initially part of the specification. Not sending an explicit length for the address and instead only a length for address + TLV is non-ideal. Another case in point: My bugfix for the TLV parsing that ignored the length of the proxy header, because each TLV had its own length. Best regards Tim Düsterhus
Re: TLV problem after updating to 2.1.14
On Sat, Apr 04, 2020 at 12:52:07PM +0200, Tim Düsterhus wrote: > Hativ, > Willy, > > Am 04.04.20 um 12:41 schrieb Tim Düsterhus: > > The Dovecot source code is here: > > https://github.com/dovecot/core/blob/de9968d623e331a18b43dfe8a00421f72f7f7962/src/lib-master/master-service-haproxy.c#L354 > > > > A quick glance at the Dovecot code looks like Dovecot parses the proxy > > protocol correctly with regard to TLVs. > > I spoke too soon here. Dovecot ignores the addresses as it should for > LOCAL (i.e. health check) connections [1] and then attempts to parse the > TLV, interpreting the IP addresses as a TLV [2]. From my understanding > HAProxy sends IP addresses for LOCAL connections for compatibility. Do you think we ought to refrain from sending any address at all ? I preferred to avoid possibly visible changes and apparently it didn't go that well :-/ Willy
Re: TLV problem after updating to 2.1.14
Hativ, Willy, Am 04.04.20 um 12:41 schrieb Tim Düsterhus: > The Dovecot source code is here: > https://github.com/dovecot/core/blob/de9968d623e331a18b43dfe8a00421f72f7f7962/src/lib-master/master-service-haproxy.c#L354 > > A quick glance at the Dovecot code looks like Dovecot parses the proxy > protocol correctly with regard to TLVs. I spoke too soon here. Dovecot ignores the addresses as it should for LOCAL (i.e. health check) connections [1] and then attempts to parse the TLV, interpreting the IP addresses as a TLV [2]. From my understanding HAProxy sends IP addresses for LOCAL connections for compatibility. Best regards Tim Düsterhus [1] https://github.com/dovecot/core/blob/de9968d623e331a18b43dfe8a00421f72f7f7962/src/lib-master/master-service-haproxy.c#L386-L390 [2] https://github.com/dovecot/core/blob/de9968d623e331a18b43dfe8a00421f72f7f7962/src/lib-master/master-service-haproxy.c#L456
Re: TLV problem after updating to 2.1.14
Hativ, Am 04.04.20 um 08:22 schrieb Hativ: > what I've found in the meantime: Dovecot's error messages even appear > permanently, regardless of the TCP check. > > Reverting that commit (7f26391bc51ad56c31480d03f56e1db604f1c617) back solves > the issue. No more error message in Dovecot and the Layer 7 check works again. > > What's the cause of this? Something wrong with the commit or is Dovecot wrong? > That's to be determined. The error message sounds as if Dovecot expects a TLV that is not being sent. Can you possibly provide a PCAP of a single (failing) health request to Dovecot with the commit NOT reverted? Also Ccing Willy as the author of that commit. The Dovecot source code is here: https://github.com/dovecot/core/blob/de9968d623e331a18b43dfe8a00421f72f7f7962/src/lib-master/master-service-haproxy.c#L354 A quick glance at the Dovecot code looks like Dovecot parses the proxy protocol correctly with regard to TLVs. Best regards Tim Düsterhus
Re: TLV problem after updating to 2.1.14
Hello Tim, what I've found in the meantime: Dovecot's error messages even appear permanently, regardless of the TCP check. Reverting that commit (7f26391bc51ad56c31480d03f56e1db604f1c617) back solves the issue. No more error message in Dovecot and the Layer 7 check works again. What's the cause of this? Something wrong with the commit or is Dovecot wrong? -- Greetings Hativ Am Freitag, den 03.04.2020, 11:27 +0200 schrieb Tim Düsterhus: > Hativ, > Am 03.04.20 um 00:38 schrieb Hativ: > > Any ideas what's wrong? > > I would assume that this patch changed the behavior there: > https://github.com/haproxy/haproxy/commit/7f26391bc51ad56c31480d03f56e1db604f1c617 > > Can you try reverting that to check whether it is the cause? > Best regardsTim Düsterhus
Re: TLV problem after updating to 2.1.14
Hativ, Am 03.04.20 um 00:38 schrieb Hativ: > Any ideas what's wrong? > I would assume that this patch changed the behavior there: https://github.com/haproxy/haproxy/commit/7f26391bc51ad56c31480d03f56e1db604f1c617 Can you try reverting that to check whether it is the cause? Best regards Tim Düsterhus
TLV problem after updating to 2.1.14
Hello, after updating HAProxy from 2.1.13 to 2.1.14 the TCP check with my dovecot setup doesn't work anymore. Nothing changed except the update. In dovecot I get the following errors: > Apr 03 00:21:28 srv1 dovecot[1378]: submission-login: Error: > haproxy(v2): Client disconnected: Invalid TLV: get_tlv(0) > failed:Truncated data (cmd=00, rip=) > Apr 03 00:21:28 srv1 dovecot[1378]: managesieve-login: Error: > haproxy(v2): Client disconnected: Invalid TLV: get_tlv(0) > failed:Truncated data (cmd=00, rip=) > Apr 03 00:21:28 srv1 dovecot[1378]: imap-login: Error: haproxy(v2): > Client disconnected: Invalid TLV: get_tlv(0) failed:Truncated data > (cmd=00, rip=) HAProxy log: > Apr 03 00:13:10 srv1 haproxy[3774]: [ALERT] 093/001310 (3777) : proxy > 'msa1-smtps' has no server available! > Apr 03 00:13:10 srv1 haproxy[3774]: [WARNING] 093/001310 (3777) : > Backup Server msa1-smtps/msa1-2 is DOWN, reason: Socket error, info: > "SSL handshake failure (Connection reset by peer) at step 1 of tcp- > check (conn > Apr 03 00:13:10 srv1 haproxy[3774]: [WARNING] 093/001310 (3777) : > Server msa1-smtps/msa1-1 is DOWN, reason: Socket error, info: "SSL > handshake failure (Connection reset by peer) at step 1 of tcp-check > (connect por > Apr 03 00:13:10 srv1 haproxy[3774]: [ALERT] 093/001310 (3777) : proxy > 'mda1-managesieve' has no server available! > Apr 03 00:13:10 srv1 haproxy[3774]: [WARNING] 093/001310 (3777) : > Backup Server mda1-managesieve/mda1-2 is DOWN, reason: Socket error, > info: "SSL handshake failure (Connection reset by peer) at step 1 of > tcp-check > Apr 03 00:13:10 srv1 haproxy[3774]: [WARNING] 093/001310 (3777) : > Server mda1-managesieve/mda1-1 is DOWN, reason: Socket error, info: > "SSL handshake failure (Connection reset by peer) at step 1 of tcp- > check (conne > Apr 03 00:13:10 srv1 haproxy[3774]: [ALERT] 093/001310 (3777) : proxy > 'mda1-imaps' has no server available! > Apr 03 00:13:10 srv1 haproxy[3774]: [WARNING] 093/001310 (3777) : > Backup Server mda1-imaps/mda1-2 is DOWN, reason: Socket error, info: > "SSL handshake failure (Connection reset by peer) at step 1 of tcp- > check (conn > Apr 03 00:13:10 srv1 haproxy[3774]: [WARNING] 093/001310 (3777) : > Server mda1-imaps/mda1-1 is DOWN, reason: Socket error, info: "SSL > handshake failure (Connection reset by peer) at step 1 of tcp-check > (connect por > Apr 03 00:13:10 srv1 haproxy[3774]: [NOTICE] 093/001309 (3774) : New > worker #1 (3777) forked > Apr 03 00:13:09 srv1 systemd[1]: Started HAProxy Load Balancer. > Apr 03 00:13:09 srv1 systemd[1]: Starting HAProxy Load Balancer... Example HAProxy config for IMAP: > listen mda1-imapsbind :993bind :993 > balance leastconn > option tcp-checktcp-check connect port 993 send-proxy > ssltcp-check expect string * OK > option tcpkaoption tcplog > stick-table type ip size 200k expire 30mstick on src > server mda1-1 mda1-1.example.com:993 ca-file /etc/ssl/certs/ca- > certificates.crt check resolvers dns send-proxy-v2server mda1-2 > mda1-2.example.com:993 ca-file /etc/ssl/certs/ca-certificates.crt > check resolvers dns send-proxy-v2 backup > timeout connect 5s > timeout client 30m > timeout server 30m When commenting out these lines it's up again: > option tcp-check > tcp-check connect port 993 send-proxy ssl > tcp-check expect string * OK Any ideas what's wrong? -- Greetings Hativ