Re: haproxy 2.4 and Kafka sink/source connector issues

2023-08-02 Thread David Greenwald
We've tested 2.3.21 and 2.2.30 successfully, so the issue appears to have
been introduced in 2.4. We've tested 2.4.23 and the latest 2.7 and 2.8 versions.






David Greenwald
Senior Site Reliability Engineer
david.greenw...@discogsinc.com


On Tue, Aug 1, 2023 at 9:16 PM Willy Tarreau  wrote:

> On Tue, Aug 01, 2023 at 08:38:24PM -0700, David Greenwald wrote:
> > Thanks for the response! That seems unlikely, we're doing an httpchk
> > to the clustercheck utility
> > <https://docs.percona.com/percona-xtradb-cluster/5.7/howtos/virt_sandbox.html>
> > following the pxc reference architecture, so not actually making a direct
> > database request from haproxy. We're also accessing the database with the
> > same healthchecks from Python web applications without any issues, we're
> > just seeing this from the long-lived connections, specifically JDBC sink
> > connectors and Debezium as a source connector.
> >
> > This seems to be a pretty esoteric issue and we've come up empty in our
> > Googling, unfortunately.
>
> OK. You said you encountered the problem when migrating to 2.4; what
> was the last version you're aware of that didn't cause this problem?
>
> Willy
>

-- 
The contents of this communication are confidential. If you are not the 
intended recipient, please immediately notify the sender by reply email and 
delete this message and its attachments, if any.



Re: haproxy 2.4 and Kafka sink/source connector issues

2023-08-01 Thread Willy Tarreau
On Tue, Aug 01, 2023 at 08:38:24PM -0700, David Greenwald wrote:
> Thanks for the response! That seems unlikely, we're doing an httpchk
> to the clustercheck utility
> <https://docs.percona.com/percona-xtradb-cluster/5.7/howtos/virt_sandbox.html>
> following the pxc reference architecture, so not actually making a direct
> database request from haproxy. We're also accessing the database with the
> same healthchecks from Python web applications without any issues, we're
> just seeing this from the long-lived connections, specifically JDBC sink
> connectors and Debezium as a source connector.
> 
> This seems to be a pretty esoteric issue and we've come up empty in our
> Googling, unfortunately.

OK. You said you encountered the problem when migrating to 2.4; what
was the last version you're aware of that didn't cause this problem?

Willy



Re: haproxy 2.4 and Kafka sink/source connector issues

2023-08-01 Thread David Greenwald
Thanks for the response! That seems unlikely, we're doing an httpchk
to the clustercheck utility
<https://docs.percona.com/percona-xtradb-cluster/5.7/howtos/virt_sandbox.html>
following the pxc reference architecture, so not actually making a direct
database request from haproxy. We're also accessing the database with the
same healthchecks from Python web applications without any issues, we're
just seeing this from the long-lived connections, specifically JDBC sink
connectors and Debezium as a source connector.

This seems to be a pretty esoteric issue and we've come up empty in our
Googling, unfortunately.
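For concreteness, the clustercheck-based check described above is wired up
roughly like this (a sketch; the host placeholder is mine, and the
http-check expect line is an optional addition to make the match on
clustercheck's response explicit, not part of our actual config):

    backend main_writer
        mode tcp
        option httpchk
        http-check expect status 200
        server db1 <db1-ip>:3306 check port 9200

clustercheck answers on port 9200 with HTTP 200 when the node is synced and
503 otherwise, so the expect line makes the pass/fail condition explicit.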





David Greenwald
Senior Site Reliability Engineer
david.greenw...@discogsinc.com


On Tue, Aug 1, 2023 at 8:25 PM Willy Tarreau  wrote:

> Hi David,
>
> On Tue, Aug 01, 2023 at 05:11:48PM -0700, David Greenwald wrote:
> > Hi all,
> >
> > Looking for some help with a networking issue we've been debugging for
> > several days. We use haproxy to TCP load-balance between Kafka Connectors
> > and a Percona MySQL cluster. In this set-up, the connectors (i.e., Java
> > JDBC) maintain long-running connections and read the database binlogs.
> This
> > includes regular polling.
> >
> > We have seen connection issues starting with haproxy 2.4 and persisting
> > through 2.8 which result in the following errors in MySQL:
> >
> > 2023-07-31T17:25:45.745607Z 3364649 [Note] Got an error reading
> > communication packets
> >
> > As you can see, this doesn't include a host or user and is happening
> early
> > in the connection. The host cache shows handshake errors here regularly
> > accumulating.
> >
> > We were unable to see errors on the haproxy side with tcplog on and have
> > been unable to get useful information from tcpdump, netstat, etc.
> >
> > We are aware FE/BE connection closure behavior changed in 2.4. The 2.4
> > option of idle-close-on-response seemed like a possible solution but
> isn't
> > compatible with mode tcp, so we're not sure what's happening here or next
> > steps for debugging. Appreciate any help or guidance here.
>
> The changes you're referring to are indeed only in HTTP mode. Have you
> tried without health checks? I'm wondering whether the handshake errors
> you're observing are simply health checks, which would explain why
> you're not seeing them in your traffic logs.
>
> Willy
>




Re: haproxy 2.4 and Kafka sink/source connector issues

2023-08-01 Thread Willy Tarreau
Hi David,

On Tue, Aug 01, 2023 at 05:11:48PM -0700, David Greenwald wrote:
> Hi all,
> 
> Looking for some help with a networking issue we've been debugging for
> several days. We use haproxy to TCP load-balance between Kafka Connectors
> and a Percona MySQL cluster. In this set-up, the connectors (i.e., Java
> JDBC) maintain long-running connections and read the database binlogs. This
> includes regular polling.
> 
> We have seen connection issues starting with haproxy 2.4 and persisting
> through 2.8 which result in the following errors in MySQL:
> 
> 2023-07-31T17:25:45.745607Z 3364649 [Note] Got an error reading
> communication packets
> 
> As you can see, this doesn't include a host or user and is happening early
> in the connection. The host cache shows handshake errors here regularly
> accumulating.
> 
> We were unable to see errors on the haproxy side with tcplog on and have
> been unable to get useful information from tcpdump, netstat, etc.
> 
> We are aware FE/BE connection closure behavior changed in 2.4. The 2.4
> option of idle-close-on-response seemed like a possible solution but isn't
> compatible with mode tcp, so we're not sure what's happening here or next
> steps for debugging. Appreciate any help or guidance here.

The changes you're referring to are indeed only in HTTP mode. Have you
tried without health checks? I'm wondering whether the handshake errors
you're observing are simply health checks, which would explain why
you're not seeing them in your traffic logs.

Willy



Re: haproxy 2.4 and Kafka sink/source connector issues

2023-08-01 Thread Brendan Kearney

hey,

first, use "option mysql-check" for better service checking. you'll
have to add a user with access to the database, and the howto is in the
configuration.txt file
(https://www.haproxy.org/download/2.1/doc/configuration.txt).  the
"option httpchk" is doing you nothing because the backend isn't talking
HTTP and the mode is tcp, for mysql.
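
a rough sketch of both sides, assuming a check user named haproxy_check
(the name and the address placeholders are mine, not from this thread). on
the MySQL side the check user needs no password and no privileges, since
mysql-check only completes the handshake and quits:

    CREATE USER 'haproxy_check'@'<haproxy-ip>';

then in the haproxy backend:

    backend main_writer
        mode tcp
        option mysql-check user haproxy_check post-41
        server db1 <db1-ip>:3306 check

note this replaces the clustercheck-on-9200 approach, so the "check port
9200" part would be dropped.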


second, look into the proxy protocol, and you can have HAProxy send the
client IP at the start of the TCP connection, similar to the
X-Forwarded-For header in HTTP.  you need to add a line like:


proxy-protocol-networks=::1, localhost, 

into the my.cnf or mariadb-server.cnf file.  replace the ip with a
network cidr, without the brackets, to specify client ranges that should
be sent using the proxy protocol.  then add the option
"send-proxy-v2" to the server line in the backend of HAProxy.  mine is:


server mariadb1 192.168.88.1:3306 check inter 1 send-proxy-v2

this will help you better identify the client that is losing the connection.

if there is a firewall between the client and HAProxy, look at the logs 
there.  the firewall could be reaping the connections if they are long 
running and the firewall hits a threshold, gets busy or maybe has a 
policy update pushed to it.  something in between could be an issue.
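
if an idle-reaping middlebox turns out to be the culprit, one thing to try
(an assumption to test, not something from this thread) is having haproxy
send TCP keepalives on both legs so the long-lived connections never look
idle to the firewall:

    frontend main_writer
        mode tcp
        option clitcpka

    backend main_writer
        mode tcp
        option srvtcpka
        timeout tunnel 3h

the keepalive intervals themselves come from the OS (the
net.ipv4.tcp_keepalive_* sysctls on linux), so tune those if the
firewall's idle timeout is short.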


hope this helps,

brendan

On 8/1/23 8:11 PM, David Greenwald wrote:

Hi all,

Looking for some help with a networking issue we've been debugging for 
several days. We use haproxy to TCP load-balance between Kafka 
Connectors and a Percona MySQL cluster. In this set-up, the connectors 
(i.e., Java JDBC) maintain long-running connections and read the 
database binlogs. This includes regular polling.


We have seen connection issues starting with haproxy 2.4 and 
persisting through 2.8 which result in the following errors in MySQL:


2023-07-31T17:25:45.745607Z 3364649 [Note] Got an error reading
communication packets


As you can see, this doesn't include a host or user and is happening 
early in the connection. The host cache shows handshake errors here 
regularly accumulating.


We were unable to see errors on the haproxy side with tcplog on and 
have been unable to get useful information from tcpdump, netstat, etc.


We are aware FE/BE connection closure behavior changed in 2.4. The 2.4 
option of idle-close-on-response seemed like a possible solution but 
isn't compatible with mode tcp, so we're not sure what's happening 
here or next steps for debugging. Appreciate any help or guidance here.


We're running haproxy in Kubernetes using the official container, and 
are also not seeing any issues with current haproxy versions with our 
other (Python) applications.


A simplified version of our config:

global
     daemon
     maxconn 25000

defaults
     balance roundrobin
     option dontlognull
     option redispatch
     timeout http-request 5s
     timeout queue 1m
     timeout connect 4s
     timeout client 50s
     timeout server 30s
     timeout http-keep-alive 10s
     timeout check 10s
     retries 3

frontend main_writer
     bind :3306
     mode tcp
     timeout client 30s
     timeout client-fin 30s
     default_backend main_writer

backend main_writer
     mode tcp
     balance leastconn
     option httpchk
     timeout server 30s
     timeout tunnel 3h
     server db1 :3306 check port 9200 on-marked-down shutdown-sessions weight 100 inter 3s rise 1 fall 2
     server db2 :3306 check port 9200 on-marked-down shutdown-sessions weight 100 backup
     server db3 :3306 check port 9200 on-marked-down shutdown-sessions weight 100 backup





David Greenwald
Senior Site Reliability Engineer
david.greenw...@discogsinc.com


haproxy 2.4 and Kafka sink/source connector issues

2023-08-01 Thread David Greenwald
Hi all,

Looking for some help with a networking issue we've been debugging for
several days. We use haproxy to TCP load-balance between Kafka Connectors
and a Percona MySQL cluster. In this set-up, the connectors (i.e., Java
JDBC) maintain long-running connections and read the database binlogs. This
includes regular polling.

We have seen connection issues starting with haproxy 2.4 and persisting
through 2.8 which result in the following errors in MySQL:

2023-07-31T17:25:45.745607Z 3364649 [Note] Got an error reading
communication packets

As you can see, this doesn't include a host or user and is happening early
in the connection. The host cache shows handshake errors here regularly
accumulating.

We were unable to see errors on the haproxy side with tcplog on and have
been unable to get useful information from tcpdump, netstat, etc.

We are aware FE/BE connection closure behavior changed in 2.4. The 2.4
option of idle-close-on-response seemed like a possible solution but isn't
compatible with mode tcp, so we're not sure what's happening here or next
steps for debugging. Appreciate any help or guidance here.

We're running haproxy in Kubernetes using the official container, and are
also not seeing any issues with current haproxy versions with our other
(Python) applications.

A simplified version of our config:

global
 daemon
 maxconn 25000

defaults
 balance roundrobin
 option dontlognull
 option redispatch
 timeout http-request 5s
 timeout queue 1m
 timeout connect 4s
 timeout client 50s
 timeout server 30s
 timeout http-keep-alive 10s
 timeout check 10s
 retries 3

frontend main_writer
 bind :3306
 mode tcp
 timeout client 30s
 timeout client-fin 30s
 default_backend main_writer

backend main_writer
 mode tcp
 balance leastconn
 option httpchk
 timeout server 30s
 timeout tunnel 3h
 server db1 :3306 check port 9200 on-marked-down shutdown-sessions weight 100 inter 3s rise 1 fall 2
 server db2 :3306 check port 9200 on-marked-down shutdown-sessions weight 100 backup
 server db3 :3306 check port 9200 on-marked-down shutdown-sessions weight 100 backup








David Greenwald
Senior Site Reliability Engineer
david.greenw...@discogsinc.com
