Re: bgpctl openmetric/prometheus output

2022-10-17 Thread Denis Fondras
Le Mon, Oct 17, 2022 at 01:02:01PM +0200, Claudio Jeker a écrit :
> 
> Also I'm not sure if bgpd_peer_up_seconds and bgpd_peer_down_seconds are
> sensible metrics. Having metric depend on some state seems like a bad idea. 
> 

I agree. My current tool presents only one metric for uptime and I get an alert
when delta(peerTime) < 0.



Re: bgpctl openmetric/prometheus output

2022-10-17 Thread Claudio Jeker
On Mon, Oct 17, 2022 at 12:39:44PM +0200, Denis Fondras wrote:
> Le Mon, Oct 17, 2022 at 11:49:31AM +0200, Claudio Jeker a écrit :
> > On Wed, Oct 12, 2022 at 12:12:25PM +0200, Theo Buehler wrote:
> > > On Fri, Oct 07, 2022 at 12:37:10PM +0200, Claudio Jeker wrote:
> > 
> > > > +void
> > > > +ometric_set_state(struct ometric *om, const char *state, struct 
> > > > olabels *ol)
> > > > +{
> > > > +   struct olabels *extra;
> > > > +   size_t i;
> > > > +   int val;
> > > > +
> > > > +   if (om->type != OMT_STATESET)
> > > > +   errx(1, "%s incorrect ometric type", __func__);
> > > > +
> > > > +   for (i = 0; i < om->setsize; i++) { 
> > > > +   if (strcasecmp(state, om->stateset[i]) == 0)
> > > > +   val = 1;
> > > > +   else
> > > > +   val = 0;
> > > 
> > > could simplify this to
> > > 
> > >   val = strcasecmp(state, om->stateset[i]) == 0;
> > > 
> > > but I'm not sure if this is more readable
> > 
> > Not sure either. I prefer the explicit version. So I left the code as is.
> >  
> 
> I agree, let the code be easy to read. The compiler will optimise accordingly.
> 
> BTW, thank you for working on this Claudio.

No problems. I would like if someone could check the various metrics
exported. Maybe I missed something or implemented some metrics
incorrectly.

I know that the peer state set is a bit of a problem. So I added
bgpd_peer_state_raw to make it easier for grafana.
Also I'm not sure if bgpd_peer_up_seconds and bgpd_peer_down_seconds are
sensible metrics. Having metric depend on some state seems like a bad idea. 

So any feedback is welcome. I will commit this version now so it easier
for people to test.
-- 
:wq Claudio



Re: bgpctl openmetric/prometheus output

2022-10-17 Thread Claudio Jeker
On Wed, Oct 12, 2022 at 12:12:25PM +0200, Theo Buehler wrote:
> On Fri, Oct 07, 2022 at 12:37:10PM +0200, Claudio Jeker wrote:
> > This diff adds `bgpctl show metric` which is a command that dumps some
> > stats out in openmetric format. This format can be ingested by e.g.
> > prometheus and used for monitoring.
> > 
> > The openmetric handling is in ometric.[ch]. It is fairly basic and not
> > intended for long running processes. There is a struct ometric (which is
> > one individual metric point). This metric point can have many different
> > values with each value including an optional set of labels. Since the
> > labels are used over and over again, I used a refcount on them.
> > Also since most strings used in these functions are string literals I also
> > don't copy them. Only the values of labels are copied since those are
> > for example per peer.
> > 
> > Using a small extra diff in bgplgd I can export the metrics into
> > prometheus and visualize them with grafana.
> > 
> > Consider this an MVP that can be extended with all the infos we want.
> 
> This looks pretty good to me. I like the approach and I couldn't spot
> anything really wrong with it.
> 
> Some very minor comments inline for your consideration.
> 

Thanks, some comments below.

> > +struct ometric *
> > +ometric_new_state(const char * const *states, size_t statecnt, const char 
> > *name,
> > +const char *help)
> > +{
> > +   struct ometric *om;
> 
> Matter of taste, but I'd probably have made an ometric_new_internal()
> that sets all fields, so ometric_new() and ometric_new_state() could be
> thin wrappers. This would also avoid leaving stateset and setsize
> uninitialized in ometric_new() (which I find a bit nasty).
> 
> Or use calloc().

I switched to use calloc() in both cases.

Not totally sold on ometric_new_state(), the statesets are a pain to work
with and are highly inefficent (they turn into a metric point per state).

> > +void
> > +ometric_set_state(struct ometric *om, const char *state, struct olabels 
> > *ol)
> > +{
> > +   struct olabels *extra;
> > +   size_t i;
> > +   int val;
> > +
> > +   if (om->type != OMT_STATESET)
> > +   errx(1, "%s incorrect ometric type", __func__);
> > +
> > +   for (i = 0; i < om->setsize; i++) { 
> > +   if (strcasecmp(state, om->stateset[i]) == 0)
> > +   val = 1;
> > +   else
> > +   val = 0;
> 
> could simplify this to
> 
>   val = strcasecmp(state, om->stateset[i]) == 0;
> 
> but I'm not sure if this is more readable

Not sure either. I prefer the explicit version. So I left the code as is.
 
> > +struct ometric *bgpd_info, *bgpd_scrape_time;
> > +struct ometric *peer_info, *peer_state, *peer_up_time, *peer_down_time,
> > +   *peer_last_read, *peer_last_write;
> > +struct ometric *peer_prefixes_tranmit, *peer_prefixes_receive;
> 
> typo: peer_prefixes_transmit

Fixed including all the other minor changes.
 
-- 
:wq Claudio



Re: bgpctl openmetric/prometheus output

2022-10-12 Thread Theo Buehler
On Fri, Oct 07, 2022 at 12:37:10PM +0200, Claudio Jeker wrote:
> This diff adds `bgpctl show metric` which is a command that dumps some
> stats out in openmetric format. This format can be ingested by e.g.
> prometheus and used for monitoring.
> 
> The openmetric handling is in ometric.[ch]. It is fairly basic and not
> intended for long running processes. There is a struct ometric (which is
> one individual metric point). This metric point can have many different
> values with each value including an optional set of labels. Since the
> labels are used over and over again, I used a refcount on them.
> Also since most strings used in these functions are string literals I also
> don't copy them. Only the values of labels are copied since those are
> for example per peer.
> 
> Using a small extra diff in bgplgd I can export the metrics into
> prometheus and visualize them with grafana.
> 
> Consider this an MVP that can be extended with all the infos we want.

This looks pretty good to me. I like the approach and I couldn't spot
anything really wrong with it.

Some very minor comments inline for your consideration.

> -- 
> :wq Claudio
> 
> Index: Makefile
> ===
> RCS file: /cvs/src/usr.sbin/bgpctl/Makefile,v
> retrieving revision 1.17
> diff -u -p -r1.17 Makefile
> --- Makefile  2 May 2020 14:33:33 -   1.17
> +++ Makefile  27 Sep 2022 15:50:40 -
> @@ -3,7 +3,8 @@
>  .PATH:   ${.CURDIR}/../bgpd
>  
>  PROG=bgpctl
> -SRCS=bgpctl.c output.c output_json.c parser.c mrtparser.c util.c 
> json.c
> +SRCS=bgpctl.c output.c output_json.c output_ometric.c parser.c \
> + mrtparser.c util.c json.c ometric.c
>  CFLAGS+= -Wall
>  CFLAGS+= -Wstrict-prototypes -Wmissing-prototypes
>  CFLAGS+= -Wmissing-declarations
> Index: bgpctl.c
> ===
> RCS file: /cvs/src/usr.sbin/bgpctl/bgpctl.c,v
> retrieving revision 1.283
> diff -u -p -r1.283 bgpctl.c
> --- bgpctl.c  31 Aug 2022 15:00:53 -  1.283
> +++ bgpctl.c  25 Sep 2022 08:35:43 -
> @@ -79,7 +79,7 @@ int
>  main(int argc, char *argv[])
>  {
>   struct sockaddr_un   sa_un;
> - int  fd, n, done, ch, verbose = 0;
> + int  fd, n, done, numdone, ch, verbose = 0;
>   struct imsg  imsg;
>   struct network_confignet;
>   struct parse_result *res;
> @@ -256,6 +256,12 @@ main(int argc, char *argv[])
>   case SHOW_RIB_MEM:
>   imsg_compose(ibuf, IMSG_CTL_SHOW_RIB_MEM, 0, 0, -1, NULL, 0);
>   break;
> + case SHOW_METRIC:
> + output = _output;
> + numdone = 2;
> + imsg_compose(ibuf, IMSG_CTL_SHOW_NEIGHBOR, 0, 0, -1, NULL, 0);
> + imsg_compose(ibuf, IMSG_CTL_SHOW_RIB_MEM, 0, 0, -1, NULL, 0);
> + break;
>   case RELOAD:
>   imsg_compose(ibuf, IMSG_CTL_RELOAD, 0, 0, -1,
>   res->reason, sizeof(res->reason));
> @@ -366,18 +372,14 @@ main(int argc, char *argv[])
>   break;
>   }
>  
> + output->head(res);
> +
> + again:
>   while (ibuf->w.queued)
> - if (msgbuf_write(>w) <= 0 && errno != EAGAIN)
> + if (msgbuf_write(>w) <= 0)
>   err(1, "write error");
>  
> - output->head(res);
> -
>   while (!done) {
> - if ((n = imsg_read(ibuf)) == -1 && errno != EAGAIN)
> - err(1, "imsg_read error");
> - if (n == 0)
> - errx(1, "pipe closed");
> -
>   while (!done) {
>   if ((n = imsg_get(ibuf, )) == -1)
>   err(1, "imsg_get error");
> @@ -387,6 +389,20 @@ main(int argc, char *argv[])
>   done = show(, res);
>   imsg_free();
>   }
> +
> + if (done)
> + break;
> +
> + if ((n = imsg_read(ibuf)) == -1)
> + err(1, "imsg_read error");
> + if (n == 0)
> + errx(1, "pipe closed");
> +
> + }
> +
> + if (res->action == SHOW_METRIC && --numdone > 0) {
> + done = 0;
> + goto again;
>   }
>  
>   output->tail();
> @@ -416,21 +432,29 @@ show(struct imsg *imsg, struct parse_res
>  
>   switch (imsg->hdr.type) {
>   case IMSG_CTL_SHOW_NEIGHBOR:
> + if (output->neighbor == NULL)
> + break;
>   p = imsg->data;
>   output->neighbor(p, res);
>   break;
>   case IMSG_CTL_SHOW_TIMER:
>   if (imsg->hdr.len < IMSG_HEADER_SIZE + sizeof(t))
>   errx(1, "wrong imsg len");
> + if (output->timer == NULL)
> + break;
>   memcpy(, imsg->data, sizeof(t));
>   if (t.type > 0 && t.type < Timer_Max)
>   

bgpctl openmetric/prometheus output

2022-10-07 Thread Claudio Jeker
This diff adds `bgpctl show metric` which is a command that dumps some
stats out in openmetric format. This format can be ingested by e.g.
prometheus and used for monitoring.

The openmetric handling is in ometric.[ch]. It is fairly basic and not
intended for long running processes. There is a struct ometric (which is
one individual metric point). This metric point can have many different
values with each value including an optional set of labels. Since the
labels are used over and over again, I used a refcount on them.
Also since most strings used in these functions are string literals I also
don't copy them. Only the values of labels are copied since those are
for example per peer.

Using a small extra diff in bgplgd I can export the metrics into
prometheus and visualize them with grafana.

Consider this an MVP that can be extended with all the infos we want.
-- 
:wq Claudio

Index: Makefile
===
RCS file: /cvs/src/usr.sbin/bgpctl/Makefile,v
retrieving revision 1.17
diff -u -p -r1.17 Makefile
--- Makefile2 May 2020 14:33:33 -   1.17
+++ Makefile27 Sep 2022 15:50:40 -
@@ -3,7 +3,8 @@
 .PATH: ${.CURDIR}/../bgpd
 
 PROG=  bgpctl
-SRCS=  bgpctl.c output.c output_json.c parser.c mrtparser.c util.c json.c
+SRCS=  bgpctl.c output.c output_json.c output_ometric.c parser.c \
+   mrtparser.c util.c json.c ometric.c
 CFLAGS+= -Wall
 CFLAGS+= -Wstrict-prototypes -Wmissing-prototypes
 CFLAGS+= -Wmissing-declarations
Index: bgpctl.c
===
RCS file: /cvs/src/usr.sbin/bgpctl/bgpctl.c,v
retrieving revision 1.283
diff -u -p -r1.283 bgpctl.c
--- bgpctl.c31 Aug 2022 15:00:53 -  1.283
+++ bgpctl.c25 Sep 2022 08:35:43 -
@@ -79,7 +79,7 @@ int
 main(int argc, char *argv[])
 {
struct sockaddr_un   sa_un;
-   int  fd, n, done, ch, verbose = 0;
+   int  fd, n, done, numdone, ch, verbose = 0;
struct imsg  imsg;
struct network_confignet;
struct parse_result *res;
@@ -256,6 +256,12 @@ main(int argc, char *argv[])
case SHOW_RIB_MEM:
imsg_compose(ibuf, IMSG_CTL_SHOW_RIB_MEM, 0, 0, -1, NULL, 0);
break;
+   case SHOW_METRIC:
+   output = _output;
+   numdone = 2;
+   imsg_compose(ibuf, IMSG_CTL_SHOW_NEIGHBOR, 0, 0, -1, NULL, 0);
+   imsg_compose(ibuf, IMSG_CTL_SHOW_RIB_MEM, 0, 0, -1, NULL, 0);
+   break;
case RELOAD:
imsg_compose(ibuf, IMSG_CTL_RELOAD, 0, 0, -1,
res->reason, sizeof(res->reason));
@@ -366,18 +372,14 @@ main(int argc, char *argv[])
break;
}
 
+   output->head(res);
+
+ again:
while (ibuf->w.queued)
-   if (msgbuf_write(>w) <= 0 && errno != EAGAIN)
+   if (msgbuf_write(>w) <= 0)
err(1, "write error");
 
-   output->head(res);
-
while (!done) {
-   if ((n = imsg_read(ibuf)) == -1 && errno != EAGAIN)
-   err(1, "imsg_read error");
-   if (n == 0)
-   errx(1, "pipe closed");
-
while (!done) {
if ((n = imsg_get(ibuf, )) == -1)
err(1, "imsg_get error");
@@ -387,6 +389,20 @@ main(int argc, char *argv[])
done = show(, res);
imsg_free();
}
+
+   if (done)
+   break;
+
+   if ((n = imsg_read(ibuf)) == -1)
+   err(1, "imsg_read error");
+   if (n == 0)
+   errx(1, "pipe closed");
+
+   }
+
+   if (res->action == SHOW_METRIC && --numdone > 0) {
+   done = 0;
+   goto again;
}
 
output->tail();
@@ -416,21 +432,29 @@ show(struct imsg *imsg, struct parse_res
 
switch (imsg->hdr.type) {
case IMSG_CTL_SHOW_NEIGHBOR:
+   if (output->neighbor == NULL)
+   break;
p = imsg->data;
output->neighbor(p, res);
break;
case IMSG_CTL_SHOW_TIMER:
if (imsg->hdr.len < IMSG_HEADER_SIZE + sizeof(t))
errx(1, "wrong imsg len");
+   if (output->timer == NULL)
+   break;
memcpy(, imsg->data, sizeof(t));
if (t.type > 0 && t.type < Timer_Max)
output->timer();
break;
case IMSG_CTL_SHOW_INTERFACE:
+   if (output->interface == NULL)
+   break;
iface = imsg->data;
output->interface(iface);
break;
case IMSG_CTL_SHOW_NEXTHOP:
+   if (output->nexthop == NULL)
+