Re: Is spoa thread safe?
Reproduction is pretty simple:

- use the example configuration from haproxy/contrib/spoa_server
- start up a haproxy version with a default nbthread value greater than 1 like this: "haproxy -f spoa-server.conf -d -Ws"
- start the example spoa script: "./spoa -f ps_python.py -d"
- run some curl requests: "curl localhost:10001/"

The variable sess.iprep.ip_score is only set on the first and every n-th request.

On 9/22/20 8:10 AM, Илья Шипицин wrote:
> Can you try to rebuild haproxy with the thread sanitizer?
>
> Or describe the repro steps.
>
> On Mon, Sep 21, 2020, 9:49 PM Reinhard Vicinus <mailto:rvici...@metaways.de> wrote:
>
>     Hi,
>
>     if I try to connect to a spoa server with haproxy 2.2.3 and nbthread
>     greater than 1, then if n is the nbthread value, only the first and
>     thereafter every n-th request are answered correctly, and the requests
>     in between are not correctly forwarded to the spoa server. Is this a
>     known limitation or is this a bug that will hopefully get fixed in
>     the future? Or am I missing something?
>
>     Kind regards
>     Reinhard Vicinus

--
Reinhard Vicinus
Metaways Infosystems GmbH
Pickhuben 2, D-20457 Hamburg
E-Mail: r.vici...@metaways.de
Web:    http://www.metaways.de
Tel:    +49 (0)40 317031-524
Fax:    +49 (0)40 317031-10

Metaways Infosystems GmbH - Sitz: D-22967 Tremsbüttel
Handelsregister: Amtsgericht Lübeck HRB 4508 AH
Geschäftsführung: Hermann Thaele, Lüder-H. Thaele
Is spoa thread safe?
Hi,

if I try to connect to a spoa server with haproxy 2.2.3 and nbthread greater than 1, then if n is the nbthread value, only the first and thereafter every n-th request are answered correctly, and the requests in between are not correctly forwarded to the spoa server. Is this a known limitation or is this a bug that will hopefully get fixed in the future? Or am I missing something?

Kind regards
Reinhard Vicinus
Different behavior of multiple track-sc0 in haproxy 2.2
Hi,

the following haproxy configuration produces different results with haproxy 2.0.16 (and 1.8.20) vs 2.2.3, running with "haproxy -f haproxy.conf -d":

    global
        stats socket /tmp/haproxy.sock mode 660 user haproxy group haproxy level admin

    defaults
        mode http

    frontend fe-a
        bind :10001
        http-request track-sc0 src table be-a
        http-request track-sc0 src table be-b

    backend be-b
        stick-table type ip size 100k expire 30s store gpc0,http_req_rate(10s),http_err_rate(10s)

    backend be-a
        stick-table type ip size 100k expire 30s store gpc0,http_req_rate(10s),http_err_rate(10s)

With haproxy 2.0.16 (and 1.8.20) only the table be-a gets updated. With haproxy 2.2.3 both tables get updated. Here is the content of the tables after one call from localhost:

haproxy 2.0.16:

    echo "show table be-a; show table be-b" | socat stdio UNIX-CONNECT:/tmp/haproxy.sock
    # table: be-a, type: ip, size:102400, used:1
    0x7f248401e340: key=127.0.0.1 use=0 exp=28581 gpc0=0 http_req_rate(1)=1 http_err_rate(1)=1
    # table: be-b, type: ip, size:102400, used:0

haproxy 2.2.3:

    echo "show table be-a; show table be-b" | socat stdio UNIX-CONNECT:/tmp/haproxy.sock
    # table: be-a, type: ip, size:102400, used:1
    0x558b9bcbc470: key=127.0.0.1 use=0 exp=20624 gpc0=0 http_req_rate(1)=1 http_err_rate(1)=0
    # table: be-b, type: ip, size:102400, used:1
    0x558b9bcbc540: key=127.0.0.1 use=1 exp=20624 gpc0=0 http_req_rate(1)=1 http_err_rate(1)=0

Is this behavior a bug or a feature? And if it is a feature, is there documentation for it somewhere? There is no change to the track-sc0 option in the configuration documentation.

Kind regards
Reinhard Vicinus
Re: Dynamic Googlebot identification via lua?
Tim,

On 9/12/20 4:25 PM, Tim Düsterhus wrote:
> Reinhard,
>
> Am 12.09.20 um 12:45 schrieb Reinhard Vicinus:
>> thanks for your reply and the information. Sorry for my late reply, but
>> I only had time to test today. I tried to get the spoa server working
>> on an Ubuntu bionic (18.04.4) with haproxy 2.2.3-2ppa1~bionic from the
>> vbernat ppa. I could compile the spoa server with python 3.6 support
>> from the latest github sources without obvious problems, and it also
>> started without problems with the example python script (./spoa -d -f
>> ps_python.py).
>>
>> If I start haproxy with the following command:
>>
>> haproxy -f spoa-server.conf -d
>>
>> haproxy seg faults on the first request to port 10001
>
> This is a bug in HAProxy then. Do you happen to have a core dump / stack
> trace?

Yes, I have a core dump, but I am somewhat rusty at analyzing them, so any pointers on what to do with the core dump are appreciated. Also, the segmentation fault only occurs if the spoa server is running, so the problem is probably somewhere in the code handling the connection to the spoa server.

>> If I start haproxy with the additional parameter -Ws then it does not
>> seg fault, but only the first and every 4th request get (correctly?)
>> forwarded to the spoa server; the 3 requests in between get answered
>> with an empty %[var(sess.iprep.ip_score)].
>>
>> [...]
>>
>> I am unsure if I am making some stupid mistake, or if I should test it
>> with an older haproxy version, or how to debug the issue further. So any
>> pointers are very much appreciated.
>
> Can you share the configuration you attempted to use?

Sorry, I forgot to mention that the configuration used is the example configuration from the haproxy repository.
But here it is, to ensure that there were no changes in the meantime:

spoa-server.conf:

    global
        debug

    defaults
        mode http
        option httplog
        option dontlognull
        timeout connect 5000
        timeout client 5000
        timeout server 5000

    listen test
        mode http
        bind :10001
        filter spoe engine spoa-server config spoa-server.spoe.conf
        http-request set-var(req.a) var(txn.iprep.null),debug
        http-request set-var(req.a) var(txn.iprep.boolean),debug
        http-request set-var(req.a) var(txn.iprep.int32),debug
        http-request set-var(req.a) var(txn.iprep.uint32),debug
        http-request set-var(req.a) var(txn.iprep.int64),debug
        http-request set-var(req.a) var(txn.iprep.uint64),debug
        http-request set-var(req.a) var(txn.iprep.ipv4),debug
        http-request set-var(req.a) var(txn.iprep.ipv6),debug
        http-request set-var(req.a) var(txn.iprep.str),debug
        http-request set-var(req.a) var(txn.iprep.bin),debug
        http-request redirect location /%[var(sess.iprep.ip_score)]

    backend spoe-server
        mode tcp
        balance roundrobin
        timeout connect 5s
        timeout server 3m
        server spoe-server 127.0.0.1:12345

spoa-server.spoe.conf:

    [spoa-server]

    spoe-agent spoa-server
        messages check-client-ip
        option var-prefix iprep
        timeout hello 100ms
        timeout idle 30s
        timeout processing 15ms
        use-backend spoe-server

    spoe-message check-client-ip
        args always_true int(1234) src ipv6(::55) req.fhdr(host)
        event on-frontend-http-request

Thanks in advance
Reinhard Vicinus
Re: Dynamic Googlebot identification via lua?
Tim, Aleksandar,

On 9/8/20 11:18 PM, Aleksandar Lazic wrote:
> On 08.09.20 22:54, Tim Düsterhus wrote:
>> Reinhard,
>> Björn,
>>
>> Am 08.09.20 um 21:39 schrieb Björn Jacke:
>>> the only officially supported way to identify a google bot is to run a
>>> reverse DNS lookup on the accessing IP address and run a forward DNS
>>> lookup on the result to verify that it points to the accessing IP
>>> address and that the resulting domain name is in either the
>>> googlebot.com or google.com domain. ...
>>>
>>> thanks for asking this again, I brought this up earlier this year and I
>>> got no answer:
>>>
>>> https://www.mail-archive.com/haproxy@formilux.org/msg37301.html
>>>
>>> I would expect that this is something that most sites would actually
>>> want to check and I'm surprised that there is no solution for this
>>> or at least none that is obvious to find.
>>
>> The usually recommended solution for this kind of check is either Lua
>> or SPOA, running the actual logic out of process.
>>
>> For Lua my haproxy-auth-request script is a batteries-included solution
>> to query an arbitrary HTTP service:
>> https://github.com/TimWolla/haproxy-auth-request. It comes with the
>> drawback that Lua runs single-threaded within HAProxy, so you might not
>> want to use this if the checks need to run in the hot path, handling
>> thousands of requests per second.
>>
>> It should be possible to cache the results of the script using a stick
>> table or a map.
>>
>> Back in nginx times I used nginx' auth_request to query a local service
>> that checked whether the client IP address was a Tor exit node. It
>> worked well.
>>
>> For SPOA there's this random IP reputation service within the HAProxy
>> repository:
>> https://github.com/haproxy/haproxy/tree/master/contrib/spoa_example. I
>> never used the SPOA feature, so I can't comment on whether that example
>> generally works and how hard it would be to extend it.
>> It certainly comes with the restriction that you are limited to C or
>> Python (or a manual implementation of the SPOA protocol) vs anything
>> that speaks HTTP.
>
> In addition to Tim's answer you can also try to use spoa_server which
> supports `-n `.
> https://github.com/haproxy/haproxy/tree/master/contrib/spoa_server

Thanks for your reply and the information. Sorry for my late reply, but I only had time to test today. I tried to get the spoa server working on an Ubuntu bionic (18.04.4) with haproxy 2.2.3-2ppa1~bionic from the vbernat ppa. I could compile the spoa server with python 3.6 support from the latest github sources without obvious problems, and it also started without problems with the example python script (./spoa -d -f ps_python.py).

If I start haproxy with the following command:

    haproxy -f spoa-server.conf -d

haproxy seg faults on the first request to port 10001.

If I start haproxy with the additional parameter -Ws then it does not seg fault, but only the first and every 4th request get (correctly?) forwarded to the spoa server; the 3 requests in between get answered with an empty %[var(sess.iprep.ip_score)].
Here are the log files of a working request:

from haproxy:

    :test.accept(0008)=0014 from [127.0.0.1:57570] ALPN=
    :test.clireq[0014:]: GET / HTTP/1.1
    :test.clihdr[0014:]: host: localhost:10001
    :test.clihdr[0014:]: user-agent: curl/7.58.0
    :test.clihdr[0014:]: accept: */*
    :test.clicls[0014:]
    :test.closed[0014:]

from the spoa server:

    1599906552.714422 [01] New connection from HAProxy accepted
    1599906552.714593 [01] Hello handshake done: version=2.0 - max-frame-size=16380 - healthcheck=false
    1599906552.714780 [01] Notify frame received: stream-id=0 - frame-id=1
    1599906552.714800 [01] Message 'check-client-ip' received [{'name': '', 'value': True}, {'name': '', 'value': 1234}, {'name': '', 'value': IPv4Address('127.0.0.1')}, {'name': '', 'value': IPv6Address('::55')}, {'name': '', 'value': 'localhost:10001'}]
    1599906552.716741 [01] Ack frame sent: stream-id=0 - frame-id=1

And here from a not working request:

from haproxy:

    001f:test.accept(0008)=0015 from [127.0.0.1:57634] ALPN=
    001f:test.clireq[0015:]: GET / HTTP/1.1
    001f:test.clihdr[0015:]: host: localhost:10001
    001f:test.clihdr[0015:]: user-agent: curl/7.58.0
    001f:test.clihdr[0015:]: accept: */*
    001f:test.clicls[0015:]
    001f:test.closed[0015:]
    0020:spoe-server.srvcls[:adfd]
    0020:spoe-server.clicls[:adfd]
    0020:spoe-server.closed[:adfd]

The spoa server does not log anything during the request, but after a while the following lines are logged:

    1599906689.387816 [01] New connection from HAProxy accepted
    1599906689.387848 [01] Failed to write Agent frame
    1599906689.387853 [01] Close the client socket because of I/O errors

Every request works if there are at least 30 seconds between the requests, because after 30 seconds the
Dynamic Googlebot identification via lua?
Hi,

the only officially supported way to identify a google bot is to run a reverse DNS lookup on the accessing IP address, then run a forward DNS lookup on the result to verify that it points back to the accessing IP address and that the resulting domain name is in either the googlebot.com or google.com domain. As far as I understand the lua api documentation, it is not possible in lua to perform DNS requests in runtime mode, so the only solution would be to use an external service to do the actual checking of an accessing IP address, use lua to query the external service, and cache the result for the IP to increase performance. As I am not that experienced in lua programming, my question is whether this is feasible or if I am missing something. Also, if there are other solutions I am not aware of, I would be thankful for pointers.

Thanks in advance
Reinhard Vicinus
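For illustration, the double-lookup check described above (reverse DNS on the client IP, then forward DNS on the resulting name) can be sketched in Python. This is a hedged sketch of the verification logic itself, not haproxy or spoa code: the function names verify_googlebot and is_google_hostname are made up for this example, and the DNS calls use the standard socket module.

```python
import socket

def is_google_hostname(hostname):
    """True if the reverse-DNS name is a host inside googlebot.com or google.com."""
    return hostname.rstrip(".").endswith((".googlebot.com", ".google.com"))

def verify_googlebot(ip):
    """Forward-confirmed reverse DNS as described above.

    Returns True only if the PTR record of `ip` lies in googlebot.com or
    google.com AND a forward lookup of that name points back to `ip`.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
        if not is_google_hostname(hostname):
            return False
        _, _, addresses = socket.gethostbyname_ex(hostname)  # forward lookup
        return ip in addresses                               # must point back
    except (socket.herror, socket.gaierror):
        return False
```

As noted in the email, this should not run in the hot path: two DNS round trips per request are slow, so an external service would perform this check and cache the verdict per IP.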
option abortonclose in tcp mode valid?
Hi,

we have recently activated multi-threading in our haproxy (version 1.8.14) and stumbled over the following problem: we balance ftp via haproxy, and after enabling multi-threading, transferring very small files via ftp sometimes failed, with log entries like this:

    Oct 20 22:45:58 vptest02 haproxy[11753]: 127.0.0.1:56631 [20/Oct/2018:22:45:58.787] ftp ftp/ftp 1/-1/0 0 CC 2/2/1/1/0 0/0

I could narrow the problem down to the option abortonclose. If it is enabled and the frontend receives the complete data transfer before fully establishing a connection to the backend, the backend connection gets reset and the line above is logged. Attached is a test haproxy configuration with which I was able to reproduce the problem, and a tcpdump showing the problem.

I'm not sure if this is a bug or whether the documentation of the abortonclose option should be expanded. From the documentation I first thought that the option abortonclose only works in conjunction with mode http, because only the http protocol is mentioned. In tcp mode the question in my opinion is: "When is a connection aborted?". Thinking about it, I came to the conclusion that abortonclose can make sense in mode tcp too, as long as the protocol used on the connection requires an answer from the server. But if that is the case, then the documentation of abortonclose should explicitly mention that there are tcp-based protocols with which it will produce errors, because they don't expect an answer from the server.

Regarding why it only happens with multi-threading enabled: could it be that a single-threaded haproxy always opens the backend connection fully prior to closing the frontend connection and therefore avoided the problem altogether? But I'm no haproxy expert, so I can be completely wrong and the issue may be something completely different. Please let me know if further information is needed to investigate this issue.
Regards
Reinhard Vicinus

    global
        cpu-map auto:1/1-4 0-3
        nbproc 1
        nbthread 4
        log /dev/log local0
        log /dev/log local1 notice
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

        # Default SSL material locations
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private

        # Default ciphers to use on SSL-enabled listening sockets.
        # For more information, see ciphers(1SSL). This list is from:
        # https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
        # An alternative list with additional directives can be obtained from
        # https://mozilla.github.io/server-side-tls/ssl-config-generator/?server=haproxy
        ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS
        ssl-default-bind-options no-sslv3

    defaults
        log global
        mode http
        option httplog
        option dontlognull
        option abortonclose
        timeout connect 5000
        timeout client 5
        timeout server 5
        errorfile 400 /etc/haproxy/errors/400.http
        errorfile 403 /etc/haproxy/errors/403.http
        errorfile 408 /etc/haproxy/errors/408.http
        errorfile 500 /etc/haproxy/errors/500.http
        errorfile 502 /etc/haproxy/errors/502.http
        errorfile 503 /etc/haproxy/errors/503.http
        errorfile 504 /etc/haproxy/errors/504.http

    frontend ftp
        bind 127.0.0.121:21
        bind 127.0.0.121:1-10250
        default_backend ftp
        mode tcp

    backend ftp
        fullconn 128
        mode tcp
        server ftp 10.138.3.245 check port 21

abortonclose.pcap
Description: application/vnd.tcpdump.pcap
Slow backend request processing?
Hi,

I have a problem where I don't know where to look further and I hope I can get some pointers. The problem is that during times with a high request count, answers for some requests take a long time. I identified a request that took a long time and got the log entries from haproxy and the backend nginx server:

haproxy:

    Sep 26 09:47:25 heliosa haproxy[21758]: 109.41.0.201:18679 [26/Sep/2018:09:47:17.102] ext-www-shared-auction1-https~ www-shared-auction1-http/auction1-shared 0/0/0/5607/7748 200 663 - - 6870/423/292/267/0 0/0 "GET /include/ajax/hotbid.php?PHPSESSID=qkmm012o23u8i9esj6nksfo445&_=1537945522572 HTTP/1.1"

nginx (the logformat is non-standard):

    "time": "2018-09-26T09:47:17+02:00"
    "body_bytes_sent": "237"
    "request_time": "0.036"
    "status": "200"
    "request": "GET /include/ajax/hotbid.php?PHPSESSID=qkmm012o23u8i9esj6nksfo445&_=1537945522572 HTTP/1.1"

If I understand the haproxy default logformat correctly, then the part 0/0/0/5607/7748 means that the request to the backend took around 7.748s (Tq, Tw and Tc are zero, Tr = 5607ms and Td = 2141ms; which means Tt = 7748ms is more or less completely spent talking to the backend). But the backend server says it only used 0.036s to handle the request. I made a tcpdump on the server running the haproxy instance (the tcpdump of the request is attached), and if the timestamps are correct, the request was indeed received in 0.039s. So either I made some mistake or it took around 5 seconds transferring/processing the request from the network interface into the haproxy instance.

I looked into the statistics recorded on the server running haproxy and nothing seems noteworthy (no dropped packets or errors on the network interfaces, acceptable cpu and memory usage). The system is running haproxy 1.6.14 on an old Ubuntu trusty with a 3.13.0-158 ubuntu kernel. It worked in the past without these problems.
I am out of ideas where to look further for the cause of this issue, and as said in the beginning, any pointers or ideas where to look further would be greatly appreciated.

Thanks
Reinhard

request_qkmm012o23u8i9esj6nksfo445_1537945522572.pcap
Description: application/vnd.tcpdump.pcap
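For reference, the 0/0/0/5607/7748 field from the log line above can be decomposed mechanically. The sketch below is illustrative only (the parse_timers helper is not part of haproxy); it follows the Tq/Tw/Tc/Tr/Tt layout of the default HTTP log format, with Td derived as Tt minus the other timers:

```python
def parse_timers(field):
    """Split haproxy's Tq/Tw/Tc/Tr/Tt timer field (values in milliseconds).

    Td (data transfer time) is not logged directly; it is what remains of
    the total time Tt after request (Tq), queue (Tw), connect (Tc) and
    server response (Tr) time.
    """
    tq, tw, tc, tr, tt = (int(x) for x in field.split("/"))
    td = tt - (tq + tw + tc + tr)
    return {"Tq": tq, "Tw": tw, "Tc": tc, "Tr": tr, "Tt": tt, "Td": td}

# The field from the log line above:
timers = parse_timers("0/0/0/5607/7748")
# Tr = 5607 ms waiting for the server response, Td = 2141 ms transferring data
```

Note that in real logs some of these timers can be -1 (e.g. an aborted phase), so a production parser would need to handle that case before summing.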
Re: [PATCH] allow higher averages than 16448ms
Hi Willy,

if the costs are too high, then I have no problem keeping the known behavior. The only thing I would suggest is to document it, because it caused me some headache to figure out why the values were always too low, and I couldn't find any information that this behavior is a known problem.

Best regards
Reinhard

On 11/25/2016 08:04 AM, Willy Tarreau wrote:
> Hi Reinhard,
>
> On Thu, Nov 24, 2016 at 10:04:31PM +0100, Reinhard Vicinus wrote:
>> Hi,
>>
>> we use haproxy (1.6.9) to balance very long running POST requests
>> (around 50 seconds) to backend servers. It generally works like a charm,
>> but the average queue time and average total session time statistic
>> values are totally screwed up.
>>
>> The problem is that the average is calculated like this for every request:
>>
>> sum = sum * 511 / 512 + value
>>
>> for a fixed value and enough iterations:
>>
>> sum = value * 511
>>
>> the problem is that at every iteration sum will first be multiplied by
>> 511 and therefore the maximum value during the calculation is:
>>
>> value * 511 * 511
>>
>> An unsigned int can store a maximum value of 4294967295. Divided by
>> 511*511 this results in 16448. That means any backend with average times
>> above 16448ms will be affected by integer overflow and have wrong values.
> Yes, we do know this limitation.
>
>> The attached patch tries to solve this by storing and calculating sum as
>> unsigned long long instead of an unsigned int. I don't know if the
>> attached patch will work in every case, but during my limited testing it
>> worked.
> It will definitely work, but I didn't want to do it because of the
> very expensive cost of the 64x64 multiply and divide on 32-bit
> platforms, which causes a measurable performance impact. However I'll
> do some tests because it is often OK, and doing 32x32/32 with a 64-bit
> intermediary result is OK as well. If I can do it this way, I'll do
> it.
> Otherwise I'd prefer that we just switch to long so that 64-bit
> platforms can benefit from the large timers and 32-bit ones are
> limited to lower values.
>
> Thanks,
> Willy
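Willy's middle-ground idea (a 32x32 multiply with a 64-bit intermediary result, while the stored sum stays 32-bit) can be simulated outside haproxy. This is a sketch of the idea only, not haproxy code: Python integers are arbitrary-precision, so the unbounded product below stands in for C's cast to unsigned long long before the division, and the final mask emulates storing the result back into a 32-bit field.

```python
U32_MASK = 0xFFFFFFFF  # emulates a 32-bit unsigned storage slot

def swrate_add_wide_mul(sum32, n, v):
    """One sliding-window averaging step with a 64-bit intermediary.

    The product sum32 * (n - 1) is computed at full width *before* the
    division, so it no longer wraps at 2**32; only the stored sum itself
    stays 32-bit (the final mask).
    """
    return (sum32 * (n - 1) // n + v) & U32_MASK

def swrate_avg(sum32, n):
    """Average over the window, as in the thread's swrate_avg."""
    return (sum32 + n - 1) // n

# Feed a constant 20000 ms sample. With a pure 32-bit multiply this is
# far above the ~16448 ms overflow threshold discussed in this thread,
# but the wide intermediary keeps the converged average exact.
s = 0
for _ in range(20000):
    s = swrate_add_wide_mul(s, 512, 20000)
```

The stored sum converges to roughly n * v (here about 10.2 million), which still fits comfortably in 32 bits; only the transient product needed the extra width.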
[PATCH] allow higher averages than 16448ms
Hi,

we use haproxy (1.6.9) to balance very long running POST requests (around 50 seconds) to backend servers. It generally works like a charm, but the average queue time and average total session time statistic values are totally screwed up.

The problem is that the average is calculated like this for every request:

    sum = sum * 511 / 512 + value

for a fixed value and enough iterations:

    sum = value * 511

the problem is that at every iteration sum will first be multiplied by 511, and therefore the maximum value during the calculation is:

    value * 511 * 511

An unsigned int can store a maximum value of 4294967295. Divided by 511*511 this results in 16448. That means any backend with average times above 16448ms will be affected by integer overflow and have wrong values.

The attached patch tries to solve this by storing and calculating sum as unsigned long long instead of an unsigned int. I don't know if the attached patch will work in every case, but during my limited testing it worked. I'm also not sure if the implicit conversion from unsigned long long to unsigned int in swrate_avg is a good idea. It should never be a problem, because value is an unsigned int and the average can never be higher than the maximum value. But I can understand if there are coding conventions against implicit conversions. Generally I don't have a lot of C experience, so feel free to improve the patch or solve the problem another way.
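The overflow described above can be reproduced outside haproxy. The sketch below emulates C's 32-bit unsigned wraparound in Python by masking the intermediate product, mirroring the swrate_add/swrate_avg pair with n = 512; the helper names (swrate_add_u32, converge) are invented for this illustration.

```python
U32 = 0xFFFFFFFF  # wrap like a 32-bit unsigned int

def swrate_add_u32(sum32, n, v):
    """One averaging step as in swrate_add, with unsigned-int wraparound:
    the product sum * (n - 1) is truncated to 32 bits *before* dividing."""
    prod = (sum32 * (n - 1)) & U32   # this is where the C code overflows
    return (prod // n + v) & U32

def swrate_avg(sum32, n):
    """Average over the window, as in swrate_avg."""
    return (sum32 + n - 1) // n

def converge(v, steps=20000, n=512):
    """Feed a constant sample v until the sliding sum settles, return the average."""
    s = 0
    for _ in range(steps):
        s = swrate_add_u32(s, n, v)
    return swrate_avg(s, n)

# A 1000 ms average survives intact; a 20000 ms average does not, because
# the intermediate product exceeds 2**32 (20000 * 512 * 511 > 4294967295),
# so the reported average collapses to a wrong, smaller value.
```

This matches the observation in the email: averages stay correct below the ~16448 ms threshold and come out too low above it.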
Kind regards
Reinhard

From b298b230b541bb33d1cbea92f5472c926e10b7f0 Mon Sep 17 00:00:00 2001
From: "Vicinus, Reinhard"
Date: Thu, 24 Nov 2016 21:41:17 +0100
Subject: [PATCH] allow higher averages than 16448ms

---
 include/proto/freq_ctr.h | 4 ++--
 include/types/counters.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/proto/freq_ctr.h b/include/proto/freq_ctr.h
index 65388b1..099a58d 100644
--- a/include/proto/freq_ctr.h
+++ b/include/proto/freq_ctr.h
@@ -218,7 +218,7 @@ unsigned int freq_ctr_remain_period(struct freq_ctr_period *ctr, unsigned int pe
 /* Adds sample value <v> to sliding window sum <sum> configured for <n> samples.
  * The sample is returned. Better if <n> is a power of two.
  */
-static inline unsigned int swrate_add(unsigned int *sum, unsigned int n, unsigned int v)
+static inline unsigned long long swrate_add(unsigned long long *sum, unsigned int n, unsigned int v)
 {
 	return *sum = *sum * (n - 1) / n + v;
 }
@@ -227,7 +227,7 @@ static inline unsigned int swrate_add(unsigned int *sum, unsigned int n, unsigne
  * samples. Better if <n> is a power of two. It must be the same <n> as the
  * one used above in all additions.
  */
-static inline unsigned int swrate_avg(unsigned int sum, unsigned int n)
+static inline unsigned int swrate_avg(unsigned long long sum, unsigned int n)
 {
 	return (sum + n - 1) / n;
 }
diff --git a/include/types/counters.h b/include/types/counters.h
index 3e62763..08bcee7 100644
--- a/include/types/counters.h
+++ b/include/types/counters.h
@@ -56,7 +56,7 @@ struct pxcounters {
 	long long redispatches;    /* retried and redispatched connections (BE only) */
 	long long intercepted_req; /* number of monitoring or stats requests intercepted by the frontend */
 
-	unsigned int q_time, c_time, d_time, t_time; /* sums of conn_time, queue_time, data_time, total_time */
+	unsigned long long q_time, c_time, d_time, t_time; /* sums of conn_time, queue_time, data_time, total_time */
 
 	union {
 		struct {
@@ -100,7 +100,7 @@ struct srvcounters {
 	long long retries, redispatches; /* retried and redispatched connections */
 	long long failed_secu;           /* blocked responses because of security concerns */
 
-	unsigned int q_time, c_time, d_time, t_time; /* sums of conn_time, queue_time, data_time, total_time */
+	unsigned long long q_time, c_time, d_time, t_time; /* sums of conn_time, queue_time, data_time, total_time */
 
 	union {
 		struct {
-- 
2.10.2