Re: [PATCH] improve ssl guarding
On Sat, Feb 06, 2021 at 09:18:30PM +0500, Илья Шипицин wrote:
> you are right.
> I've fixed it.
> Thanks,

both pushed in master.

-- 
William Lallemand
Re: [ANNOUNCE] haproxy-2.4-dev7
On Sun, Feb 07, 2021 at 09:08:07PM +0100, William Dauchy wrote:
> Willy: it is probably a wise idea to keep it for the 2.4 final release
> notes, some people might want to know that during their update; a lot
> of people have their production alerts on those metrics.

Noted, I hope I won't forget :-)

Thanks for the detailed explanation William!
Willy
Re: [ANNOUNCE] haproxy-2.4-dev7
On Fri, Feb 5, 2021 at 4:14 PM Willy Tarreau wrote:
> HAProxy 2.4-dev7 was released on 2021/02/05. It added 153 new commits
> after version 2.4-dev6.
>   - Some significant lifting was done to the Prometheus exporter, including
>     new fields, better descriptions and some filtering. I've seen quite a
>     bunch pass in front of me but do not well understand what it does, all
>     that interests me is that some users are happy with these changes so I
>     guess they were long awaited :-)

About that, please note two breaking changes:

- objects' status is no longer a gauge value that you need to translate
  manually; instead the state is a label carrying a proper string value, and
  the value of the metric simply indicates whether that state is active or
  not. So we went from:

    haproxy_server_status{proxy="be_foo",server="srv0"} 2   (which meant MAINT)

  to:

    haproxy_server_status{proxy="be_foo",server="srv0",state="DOWN"} 0
    haproxy_server_status{proxy="be_foo",server="srv0",state="UP"} 0
    haproxy_server_status{proxy="be_foo",server="srv0",state="MAINT"} 1
    haproxy_server_status{proxy="be_foo",server="srv0",state="DRAIN"} 0
    haproxy_server_status{proxy="be_foo",server="srv0",state="NOLB"} 0

  This change applies to frontends, backends and servers, with different
  label values for each.

- a similar change was made to health checks, which now carry a state label:

    haproxy_server_check_status{proxy="be_foo",server="srv0",state="HANA"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="SOCKERR"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="L4OK"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="L4TOUT"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="L4CON"} 1
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="L6OK"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="L6TOUT"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="L6RSP"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="L7TOUT"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="L7RSP"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="L7OK"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="L7OKC"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="L7STS"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="PROCERR"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="PROCTOUT"} 0
    haproxy_server_check_status{proxy="be_foo",server="srv0",state="PROCOK"} 0

It means:
  * a lot more metrics for large setups (but you can still filter as
    explained in the doc);
  * easier use on the Prometheus side: you will be able to group per state
    very easily now.

Generally speaking, I'm very interested in feedback regarding this change.
It was motivated by the usage I saw in several companies, where people were
struggling to make use of those metrics.

Willy: it is probably a wise idea to keep it for the 2.4 final release
notes; some people might want to know about it during their update, and a
lot of people have their production alerts on those metrics.

Thanks,
-- 
William
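To give an idea of the query side, the new label scheme can be consumed with
expressions such as the following (illustrative PromQL built from the sample
output above, not part of the patch itself):

    # count servers per state across the whole fleet
    sum by (state) (haproxy_server_status)

    # fire an alert when srv0 of be_foo is put into maintenance
    haproxy_server_status{proxy="be_foo",server="srv0",state="MAINT"} == 1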
[PATCH 2/2] MEDIUM: contrib/prometheus-exporter: export base stick table stats
I saw some people falling back to the unix socket to collect data they
could not find in the prometheus exporter. One of them is the base info
from stick tables (used/size). I do not plan to extend it further for now;
keys are quite a mess to handle.

This should resolve github issue #1008.

Signed-off-by: William Dauchy
---
 contrib/prometheus-exporter/README            |  10 ++
 .../prometheus-exporter/service-prometheus.c  | 148 +++---
 reg-tests/contrib/prometheus.vtc              |   4 +
 3 files changed, 142 insertions(+), 20 deletions(-)

diff --git a/contrib/prometheus-exporter/README b/contrib/prometheus-exporter/README
index a85981597..d882b092f 100644
--- a/contrib/prometheus-exporter/README
+++ b/contrib/prometheus-exporter/README
@@ -72,6 +72,7 @@ exported. Here are examples:
   /metrics?scope=frontend&scope=backend # ==> Frontend and backend metrics will be exported
   /metrics?scope=*&scope=               # ==> no metrics will be exported
   /metrics?scope=&scope=global          # ==> global metrics will be exported
+  /metrics?scope=sticktable             # ==> stick tables metrics will be exported
 
 * How do I prevent my prometheus instance to explode?
@@ -320,3 +321,12 @@ See prometheus export for the description of each field.
 | haproxy_server_need_connections_current     |
 | haproxy_server_uweight                      |
 +---------------------------------------------+
+
+* Stick table metrics
+
++---------------------------------------------+
+|    Metric name                              |
++---------------------------------------------+
+| haproxy_sticktable_size                     |
+| haproxy_sticktable_used                     |
++---------------------------------------------+
diff --git a/contrib/prometheus-exporter/service-prometheus.c b/contrib/prometheus-exporter/service-prometheus.c
index 769389735..521fe1056 100644
--- a/contrib/prometheus-exporter/service-prometheus.c
+++ b/contrib/prometheus-exporter/service-prometheus.c
@@ -47,28 +47,33 @@ enum {
 /* Prometheus exporter dumper states (appctx->st1) */
 enum {
-        PROMEX_DUMPER_INIT = 0, /* initialized */
-        PROMEX_DUMPER_GLOBAL,   /* dump metrics of globals */
-        PROMEX_DUMPER_FRONT,    /* dump metrics of frontend proxies */
-        PROMEX_DUMPER_BACK,     /* dump metrics of backend proxies */
-        PROMEX_DUMPER_LI,       /* dump metrics of listeners */
-        PROMEX_DUMPER_SRV,      /* dump metrics of servers */
-        PROMEX_DUMPER_DONE,     /* finished */
+        PROMEX_DUMPER_INIT = 0,   /* initialized */
+        PROMEX_DUMPER_GLOBAL,     /* dump metrics of globals */
+        PROMEX_DUMPER_FRONT,      /* dump metrics of frontend proxies */
+        PROMEX_DUMPER_BACK,       /* dump metrics of backend proxies */
+        PROMEX_DUMPER_LI,         /* dump metrics of listeners */
+        PROMEX_DUMPER_SRV,        /* dump metrics of servers */
+        PROMEX_DUMPER_STICKTABLE, /* dump metrics of stick tables */
+        PROMEX_DUMPER_DONE,       /* finished */
 };
 
 /* Prometheus exporter flags (appctx->ctx.stats.flags) */
-#define PROMEX_FL_METRIC_HDR    0x0001
-#define PROMEX_FL_INFO_METRIC   0x0002
-#define PROMEX_FL_FRONT_METRIC  0x0004
-#define PROMEX_FL_BACK_METRIC   0x0008
-#define PROMEX_FL_SRV_METRIC    0x0010
-#define PROMEX_FL_SCOPE_GLOBAL  0x0020
-#define PROMEX_FL_SCOPE_FRONT   0x0040
-#define PROMEX_FL_SCOPE_BACK    0x0080
-#define PROMEX_FL_SCOPE_SERVER  0x0100
-#define PROMEX_FL_NO_MAINT_SRV  0x0200
-
-#define PROMEX_FL_SCOPE_ALL (PROMEX_FL_SCOPE_GLOBAL|PROMEX_FL_SCOPE_FRONT|PROMEX_FL_SCOPE_BACK|PROMEX_FL_SCOPE_SERVER)
+#define PROMEX_FL_METRIC_HDR        0x0001
+#define PROMEX_FL_INFO_METRIC       0x0002
+#define PROMEX_FL_FRONT_METRIC      0x0004
+#define PROMEX_FL_BACK_METRIC       0x0008
+#define PROMEX_FL_SRV_METRIC        0x0010
+#define PROMEX_FL_SCOPE_GLOBAL      0x0020
+#define PROMEX_FL_SCOPE_FRONT       0x0040
+#define PROMEX_FL_SCOPE_BACK        0x0080
+#define PROMEX_FL_SCOPE_SERVER      0x0100
+#define PROMEX_FL_NO_MAINT_SRV      0x0200
+#define PROMEX_FL_STICKTABLE_METRIC 0x0400
+#define PROMEX_FL_SCOPE_STICKTABLE  0x0800
+
+#define PROMEX_FL_SCOPE_ALL (PROMEX_FL_SCOPE_GLOBAL | PROMEX_FL_SCOPE_FRONT | \
+                             PROMEX_FL_SCOPE_BACK | PROMEX_FL_SCOPE_SERVER | \
+                             PROMEX_FL_SCOPE_STICKTABLE)
 
 /* Promtheus metric type (gauge or counter) */
 enum promex_mt_type {
@@ -298,6 +303,25 @@ const struct ist promex_st_metric_desc[ST_F_TOTAL_FIELDS] = {
         [ST_F_TT_MAX] = IST("Maximum observed total request+response time (request+queue+connect+response+processing)"),
 };
 
+/* stick table base fields */
+enum sticktable_field {
+        STICKTABLE_SIZE = 0,
+        STICKTABLE_USED,
+
+        /* must always be the last one */
+        STICKTABLE_TOTAL_FIELDS
+};
+
+const struct
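For reference, a minimal way to expose the exporter and fetch only these new
metrics could look like this (illustrative configuration; the frontend name
and port are arbitrary choices, not taken from the patch):

    frontend prometheus
        mode http
        bind *:8405
        http-request use-service prometheus-exporter if { path /metrics }

and then scrape /metrics?scope=sticktable to restrict the output to the
stick table family.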
[PATCH 1/2] MINOR: contrib/prometheus-exporter: use stats desc when possible followup
Remove remaining descriptions which are common to stats.c. This patch is a
followup of commit 82b2ce2f967d967139adb7afab064416fadad615 ("MINOR:
contrib/prometheus-exporter: use stats desc when possible"). I probably
messed up with one of my rebases because I'm pretty sure I removed them at
some point, but who knows what happened.

Signed-off-by: William Dauchy
---
 .../prometheus-exporter/service-prometheus.c | 35 ---
 1 file changed, 35 deletions(-)

diff --git a/contrib/prometheus-exporter/service-prometheus.c b/contrib/prometheus-exporter/service-prometheus.c
index 126962f5e..769389735 100644
--- a/contrib/prometheus-exporter/service-prometheus.c
+++ b/contrib/prometheus-exporter/service-prometheus.c
@@ -284,42 +284,7 @@ const struct promex_metric promex_st_metrics[ST_F_TOTAL_FIELDS] = {
 /* Description of overriden stats fields */
 const struct ist promex_st_metric_desc[ST_F_TOTAL_FIELDS] = {
-        [ST_F_PXNAME]         = IST("The proxy name."),
-        [ST_F_SVNAME]         = IST("The service name (FRONTEND for frontend, BACKEND for backend, any name for server/listener)."),
-        [ST_F_QCUR]           = IST("Current number of queued requests."),
-        [ST_F_QMAX]           = IST("Maximum observed number of queued requests."),
-        [ST_F_SCUR]           = IST("Current number of active sessions."),
-        [ST_F_SMAX]           = IST("Maximum observed number of active sessions."),
-        [ST_F_SLIM]           = IST("Configured session limit."),
-        [ST_F_STOT]           = IST("Total number of sessions."),
-        [ST_F_BIN]            = IST("Current total of incoming bytes."),
-        [ST_F_BOUT]           = IST("Current total of outgoing bytes."),
-        [ST_F_DREQ]           = IST("Total number of denied requests."),
-        [ST_F_DRESP]          = IST("Total number of denied responses."),
-        [ST_F_EREQ]           = IST("Total number of request errors."),
-        [ST_F_ECON]           = IST("Total number of connection errors."),
-        [ST_F_ERESP]          = IST("Total number of response errors."),
-        [ST_F_WRETR]          = IST("Total number of retry warnings."),
-        [ST_F_WREDIS]         = IST("Total number of redispatch warnings."),
 	[ST_F_STATUS]         = IST("Current status of the service, per state label value."),
-        [ST_F_WEIGHT]         = IST("Service weight."),
-        [ST_F_ACT]            = IST("Current number of active servers."),
-        [ST_F_BCK]            = IST("Current number of backup servers."),
-        [ST_F_CHKFAIL]        = IST("Total number of failed check (Only counts checks failed when the server is up)."),
-        [ST_F_CHKDOWN]        = IST("Total number of UP->DOWN transitions."),
-        [ST_F_LASTCHG]        = IST("Number of seconds since the last UP<->DOWN transition."),
-        [ST_F_DOWNTIME]       = IST("Total downtime (in seconds) for the service."),
-        [ST_F_QLIMIT]         = IST("Configured maxqueue for the server (0 meaning no limit)."),
-        [ST_F_PID]            = IST("Process id (0 for first instance, 1 for second, ...)"),
-        [ST_F_IID]            = IST("Unique proxy id."),
-        [ST_F_SID]            = IST("Server id (unique inside a proxy)."),
-        [ST_F_THROTTLE]       = IST("Current throttle percentage for the server, when slowstart is active, or no value if not in slowstart."),
-        [ST_F_LBTOT]          = IST("Total number of times a service was selected, either for new sessions, or when redispatching."),
-        [ST_F_TRACKED]        = IST("Id of proxy/server if tracking is enabled."),
-        [ST_F_TYPE]           = IST("Service type (0=frontend, 1=backend, 2=server, 3=socket/listener)."),
-        [ST_F_RATE]           = IST("Current number of sessions per second over last elapsed second."),
-        [ST_F_RATE_LIM]       = IST("Configured limit on new sessions per second."),
-        [ST_F_RATE_MAX]       = IST("Maximum observed number of sessions per second."),
 	[ST_F_CHECK_STATUS]   = IST("Status of last health check, per state label value."),
 	[ST_F_CHECK_CODE]     = IST("layer5-7 code, if available of the last health check."),
 	[ST_F_CHECK_DURATION] = IST("Total duration of the latest server health check, in seconds."),
-- 
2.30.0
Configure peers on clusters with 20+ instances
Hello list,

I'm implementing peers in order to share rps and other metrics between all
instances of a haproxy cluster, so I have a global view of these data. Here
is a snippet of my poc which simply does a request count:

    global
        localpeer h1
        ...

    listen l1
        ...
        http-request track-sc0 int(1) table p/t1
        http-request set-var(req.gpc0) sc_inc_gpc0(0)
        http-request set-var(req.gpc0) sc_get_gpc0(0,p/t2),add(req.gpc0)
        http-request set-var(req.gpc0) sc_get_gpc0(0,p/t3),add(req.gpc0)
        http-request return hdr x-out %[var(req.gpc0)]

    peers p
        bind :9001
        log stdout format raw local0
        server h1
        server h2 127.0.0.1:9002
        server h3 127.0.0.1:9003
        table t1 type integer size 1 store gpc0
        table t2 type integer size 1 store gpc0
        table t3 type integer size 1 store gpc0

Our biggest cluster currently has 25 haproxy instances, meaning 25 tables
per instance and 25 set-var + add() per request per tracked data point. On
top of that, all 25 instances will share their 25 tables with each of the
other 24 instances.

Building and maintaining such a configuration isn't a problem at all
because it's automated, but how does it scale? Starting from how many
instances should I change the approach and try, e.g., to elect a controller
that receives everything from everybody and delivers grouped data?

Any advice or best practice will be very much appreciated, thanks!

~jm
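As a rough back-of-the-envelope for the full-mesh layout above (my own
estimate, assuming one table per instance and gpc0 read from every peer, as
in the snippet):

    N = 25 instances
    per request             : 1 track-sc0 + (N - 1) = 24 sc_get_gpc0()/add() lookups
    tables per instance     : N = 25
    pushes per local update : N - 1 = 24 peer messages
    tables cluster-wide     : N x N = 625 replicated copies

so the per-request work grows linearly with the cluster size, while the peer
replication state and traffic grow roughly quadratically.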