Re: [chrony-users] Monitoring Chrony

2016-02-11 Thread Ben Kochie
Log parsing is possible; there is a tool called mtail that can parse logs
to collect metrics. However, it is generally recommended to ask services
directly about their current state rather than regexping logs.

That way the stats are read on demand.
On Feb 11, 2016 11:35 AM, "Bryan Christianson" <br...@whatroute.net> wrote:

>
> > On 11/02/2016, at 10:54 PM, Ben Kochie <b...@soundcloud.com> wrote:
> >
> > So far, I haven't been able to find a good programmatic way to extract
> > stats with chronyc.  There are a bunch of annoying parsing issues with
> > things like the sourcestats command.  The offset includes a precision, so I
> > have to parse the precision and convert that to be all in one precision.  I
> > haven't seen much documentation on the protocol between chronyc and chronyd.
>
> Take a look at the chronyd log files. The data is more amenable to machine
> reading than the chronyc output.
>
> >
> > - Ben Kochie
>
> Bryan Christianson
> br...@whatroute.net
>
>
>
>
> --
> To unsubscribe email chrony-users-requ...@chrony.tuxfamily.org
> with "unsubscribe" in the subject.
> For help email chrony-users-requ...@chrony.tuxfamily.org
> with "help" in the subject.
> Trouble?  Email listmas...@chrony.tuxfamily.org.
>
>


Re: [chrony-users] Monitoring Chrony

2016-02-11 Thread Ben Kochie
On Thu, Feb 11, 2016 at 1:58 PM, Miroslav Lichvar <mlich...@redhat.com>
wrote:

> On Thu, Feb 11, 2016 at 10:54:49AM +0100, Ben Kochie wrote:
> > So far, I haven't been able to find a good programmatic way to extract
> > stats with chronyc.  There are a bunch of annoying parsing issues with
> > things like the sourcestats command.  The offset includes a precision, so I
> > have to parse the precision and convert that to be all in one precision.
>
> Yeah, I've struggled with that too. I like the human readable format
> when inspecting the chrony state, but it does complicate parsing quite
> a bit.
>
> > A couple of specific questions.
> > * Would chrony be interested in supporting the Prometheus metrics format?
>
> I looked at the page describing the architecture, but it's not clear to
> me what support in chrony would look like. Would chronyd or something
> using the chronyc protocol be listening on a port for requests? Or
> would it periodically push data over a socket somewhere? The page
> listing client libraries doesn't include a C library.
>

Typically we do this in one of a few ways.
#1 - The application listens on a port for http requests; the default path is
/metrics.  It then can respond with text/plain in the format I posted
above.  Or it will content negotiate and use grpc, a nice compact protobuf
format.  The grpc format is the most efficient, but we've had few problems
collecting text metrics at scale.

#2 - We run a side-car exporter.  We do this quite a lot for existing open
source software, like mysql, that would never listen on http, but can
provide metrics with their own protocol.

#3 - The way we collect metrics for ntpd is that we have a loop script, or cron
script, that parses output and puts that output in prometheus format into a
text file.  Then we access these metrics via the node_exporter's textfile
reader.

#4 - We use something like mtail[0] and parse log files.  This is what I do
for things like apache[1] that have minimal useful internal metrics.

[0]: https://github.com/google/mtail
[1]: https://github.com/google/mtail/blob/master/examples/apache_metrics.mtail
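
Approach #3 could be sketched roughly like this. It is only an illustration, not the script we actually run: the function names, metric names, and file path are made up for the example. The one thing it tries to show is writing the file via a temporary file and a rename, so that the textfile reader never picks up a half-written file.

```python
import os
import tempfile


def format_metrics(metrics):
    """Render {name: value} as one "name value" line per metric, sorted by name."""
    lines = []
    for name in sorted(metrics):
        lines.append(f"{name} {metrics[name]}")
    return "\n".join(lines) + "\n"


def write_textfile(path, metrics):
    """Write the metrics to `path` atomically (temp file in the same dir + rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(format_metrics(metrics))
        os.replace(tmp_path, path)  # atomic on the same filesystem
    except BaseException:
        os.unlink(tmp_path)  # don't leave the temp file behind on failure
        raise


if __name__ == "__main__":
    # Hypothetical metric names and values, for illustration only.
    write_textfile("chrony_metrics.prom", {"chrony_stratum": 2, "chrony_offset_seconds": 1.5e-05})
```

A loop or cron job would call something like write_textfile() after parsing whatever the daemon's client tool prints.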


> > * Is there a mode for the various metrics outputs to be more machine
> > readable? (json?)
>
> No, not yet. I'd like to add a raw mode to chronyc that would print
> the values in something easily parseable. I'm not sure about json, I'd
> probably prefer something usable even from shell using just sed or
> awk.
>

One idea I had would be to add a "metrics" command to chronyc.  Then you
could run a loop/cron job that would be basically "chronyc metrics >
chrony_metrics.prom"

The output format would be sed/awk friendly as you always get one metric
key and value per line.


> > * Is there documentation for the chronyc protocol outside the code?
>
> No, unfortunately not. FWIW, the protocol is quite simple, almost all
> information you would need to implement a new client is contained in
> candm.h.
>

Ok, I will take a look.


>
> > * Are there any non-C chronyc client implementations?
> > (python/ruby/whatever)
>
> Probably not, at least I've not seen anything. At some point I'd like
> to split chronyc into a library and a client application. Bindings for
> other languages could then be easily created.


This would be pretty nice.


>
> --
> Miroslav Lichvar
>


Re: [chrony-users] Monitoring Chrony

2016-02-27 Thread Ben Kochie
So I started work on adding a "metrics" command to client.c.  It's pretty
hacky, but works.

https://github.com/SuperQ/chrony/pull/1

Comments welcome.

- Ben Kochie

On Fri, Feb 12, 2016 at 10:05 AM, Miroslav Lichvar <mlich...@redhat.com>
wrote:

> On Thu, Feb 11, 2016 at 02:12:36PM +0100, Ben Kochie wrote:
> > On Thu, Feb 11, 2016 at 1:58 PM, Miroslav Lichvar <mlich...@redhat.com>
> > wrote:
> >
> > > I looked at the page describing the architecture, but it's not clear to
> > > me what support in chrony would look like. Would chronyd or something
> > > using the chronyc protocol be listening on a port for requests? Or
> > > would it periodically push data over a socket somewhere? The page
> > > listing client libraries doesn't include a C library.
> > >
> >
> > Typically we do this one of a few ways.
>
> > #2 - We run a side-car exporter.  We do this quite a lot for existing open
> > source software, like mysql, that would never listen on http, but can
> > provide metrics with their own protocol.
>
> This one seems most reasonable to me. A separate service that uses the
> chronyc protocol to read the metrics from chronyd.
>
> > #3 - The way we collect metrics for ntpd is that we have a loop script, or cron
> > script, that parses output and puts that output in prometheus format into a
> > text file.  Then we access these metrics via the node_exporter's textfile
> > reader.
>
> This is probably the easiest way :).
>
> > #4 - We use something like mtail[0] and parse log files.  This is what I do
> > for things like apache[1] that have minimal useful internal metrics.
>
> The chrony logs are good at showing exactly when the state has
> changed, but if you are interested in metrics like root dispersion,
> which are constantly changing (in a deterministic way), you would have
> to calculate their current value.
>
> > > > * Is there a mode for the various metrics outputs to be more machine
> > > > readable? (json?)
> > >
> > > No, not yet. I'd like to add a raw mode to chronyc that would print
> > > the values in something easily parseable. I'm not sure about json, I'd
> > > probably prefer something usable even from shell using just sed or
> > > awk.
> > >
> >
> > One idea I had would be to add a "metrics" command to chronyc.  Then you
> > could run a loop/cron job that would be basically "chronyc metrics >
> > chrony_metrics.prom"
>
> Which metrics would it print? With the "clients" command, for instance,
> there can be megabytes of data, which in most cases probably wouldn't be
> useful to collect, but in some cases I think it might, e.g. monitoring
> if clients are alive from the server in a small network.
>
> > The output format would be sed/awk friendly as you always get one metric
> > key and value per line.
>
> If there was just one key/value per line, wouldn't it be more
> difficult for a simple sed/awk parser to group data by source, as in
> sourcestats?
>
> I was considering something like CSV, which can be parsed in shell
> with a single "read" command and can be easily converted to more
> verbose formats like json.
>
> $ chronyc -r tracking
> #refid,address,stratum,...
> 10.16.255.1,10.16.255.1,2,...
>
> $ chronyc -r sources | grep -v '^#' | while IFS=, read mode state ...
> do
> echo $mode $state ...
> done
>
> --
> Miroslav Lichvar
>


Re: [chrony-users] Monitoring Chrony

2016-02-29 Thread Ben Kochie
It makes sense, but the real problem is that there are ordering issues with
the way things are processed in most of the stats outputs.

For example, in sources/sourcestats the code walks each source and
outputs all the stats for that source before moving on to the next one.

Prometheus expects metrics to be output grouped by metric: for example,
the offsets for every source first, then the next metric for every source.
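
To make the ordering problem concrete, here is a small sketch of the regrouping that would be needed. It is not code from the pull request: the function name, the chrony_source_ prefix, and the "source" label are invented for the example.

```python
def group_by_metric(sources):
    """Turn per-source stat rows into text lines grouped by metric.

    `sources` is a list of (source_name, {stat_name: value}) pairs, in the
    per-source order in which they are walked.
    """
    grouped = {}  # stat name -> list of (source, value), in first-seen order
    for source, stats in sources:
        for stat, value in stats.items():
            grouped.setdefault(stat, []).append((source, value))

    lines = []
    for stat, samples in grouped.items():
        for source, value in samples:
            # Metric and label names here are hypothetical.
            lines.append(f'chrony_source_{stat}{{source="{source}"}} {value}')
    return lines


if __name__ == "__main__":
    walked = [
        ("10.0.0.1", {"offset": 1.2e-05, "stddev": 3e-06}),
        ("10.0.0.2", {"offset": -4e-06, "stddev": 2e-06}),
    ]
    print("\n".join(group_by_metric(walked)))
```

The point is that every stat has to be buffered until all sources have been walked, which does not fit well with code that prints as it goes.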

I had a discussion with another person over the weekend, and I think what
we're going to do is to abandon the text metrics output idea and implement
the chronyd protocol (probably in Go) so that we can build a direct
Prometheus exporter.

- Ben Kochie

On Mon, Feb 29, 2016 at 10:10 AM, Miroslav Lichvar <mlich...@redhat.com>
wrote:

> (this discussion would better fit the chrony-devel list)
>
> On Sat, Feb 27, 2016 at 03:08:12PM +0100, Ben Kochie wrote:
> > So I started work on adding a "metrics" command to client.c.  It's pretty
> > hacky, but works.
> >
> > https://github.com/SuperQ/chrony/pull/1
> >
> > Comments welcome.
>
> Ok, so you implemented the metrics command as a new function which
> does the same as the serverstats command, but uses a different output
> format. I assume you would extend it later to include also the
> tracking, sources and sourcestats data. That would be a lot of
> duplicated code.
>
> As I said in the previous mail, I'd rather see it implemented as a
> different output format for the existing commands. A new chronyc
> option could be added to select the format, with default being the
> currently used human-readable output. A new printf-like function would
> be added, which would support printing hostnames or IP addresses, time
> intervals, offsets, and all other data that need to be printed.
> Depending on what output mode chronyc was running in, it would print
> the labels, align the columns, print the values with units, print end
> of lines, etc. All functions that implement the individual commands
> would then be modified to use this new function.
>
> I'm planning to look into this in the next few weeks. At this point
> I'm mainly interested in adding the CSV format to allow easy parsing
> in shell, but I think the Prometheus format could be added too.
>
> Does this make sense?
>
> --
> Miroslav Lichvar
>


[chrony-users] DNS RR and chrony

2016-02-24 Thread Ben Kochie
When using pools in the config, chrony is subject to some implementation
"problems" with libc's getaddrinfo() on many platforms.  This breaks DNS
round-robin as served by the DNS server.

There is a long standing "bug" in several libc implementations due to
strict adherence to RFC 3484 Rule #9.  There were many long arguments about
this in the 2007 era, with no resolution.

Thankfully RFC 6724[0] obsoletes 3484, but nobody's implemented it yet, and
it's not likely to get backported to stable distributions like Debian.

The end result here is that getaddrinfo() always sorts the IPv4
results and chrony will pick the first N in that list.  For example I have
a DNS record internally that has 8 servers, and I have chrony pick 4.
Every node ends up with the identical list of 4 servers instead of a random
sample of 4 out of the 8.

It would be nice if chrony had an option to shuffle the list before
selecting.

Something like this:
pool pool.ntp.org iburst maxsources 4 shuffle
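
Roughly, the behaviour I have in mind for such an option is the following. This is a sketch of the idea only, not chrony code; pick_sources and its parameters are made-up names.

```python
import random


def pick_sources(addresses, max_sources, rng=None):
    """Return up to `max_sources` distinct addresses, chosen in random order."""
    if rng is None:
        rng = random.Random()
    candidates = list(dict.fromkeys(addresses))  # drop duplicates, keep order
    rng.shuffle(candidates)  # undo the sorted order from the resolver
    return candidates[:max_sources]


if __name__ == "__main__":
    # Stand-in for the sorted list that getaddrinfo() returns for the record.
    resolved = [f"10.0.0.{i}" for i in range(1, 9)]
    print(pick_sources(resolved, 4))
```

With this, each node would end up with its own subset of the 8 servers rather than all nodes using the first 4.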

[0]: https://tools.ietf.org/html/rfc6724

- Ben Kochie


Re: [chrony-users] DNS RR and chrony

2016-03-17 Thread Ben Kochie
The CSV format looks great. I also tested the randomization, which works
well.

The -v help output doesn't match the CSV, but that's a minor nit-pick. :)

Thanks for looking into this.  I'm still interested in writing a chrony
client in Go or Python so I can create a fully integrated metrics exporter.
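
In the meantime, consuming the CSV output could look something like the sketch below. The column names in SOURCES_FIELDS are my assumption about the order of the "sources" report and are not taken from the chrony documentation, so check them against real output before relying on them; the sample row is made up.

```python
import csv
import io

# Assumed column order for one row of the CSV "sources" report (unverified).
SOURCES_FIELDS = [
    "mode", "state", "name", "stratum", "poll",
    "reach", "last_rx", "last_offset", "offset", "error",
]


def parse_csv_report(text, fields):
    """Parse CSV report text into a list of dicts keyed by `fields`."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        if len(record) != len(fields):
            raise ValueError(
                f"expected {len(fields)} fields, got {len(record)}: {record!r}"
            )
        rows.append(dict(zip(fields, record)))
    return rows


if __name__ == "__main__":
    sample = "^,*,10.0.0.1,2,6,377,23,0.000012,0.000015,0.000123\n"
    for row in parse_csv_report(sample, SOURCES_FIELDS):
        print(row["name"], float(row["offset"]))
```

The text passed in would come from running chronyc with -c, e.g. through subprocess. Keying each row by name means the rest of the code does not depend on the field order.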

- Ben Kochie

On Thu, Mar 17, 2016 at 4:30 PM, Miroslav Lichvar <mlich...@redhat.com>
wrote:

> On Thu, Feb 25, 2016 at 12:51:49PM +0100, Miroslav Lichvar wrote:
> > On Wed, Feb 24, 2016 at 04:18:36PM +0100, Ben Kochie wrote:
> > > The end result here is that getaddrinfo() always sorts the output of IPv4
> > > results and chrony will pick the first N in that list.  For example I have
> > > a DNS record internally that has 8 servers, and I have chrony pick 4.
> > > Every node has the same identical 4 node list instead of a random sampling
> > > of the 4.
>
> > In order to fix that, I guess we need to either modify the hash
> > function to include some random variable initialized on start to make
> > the hashing random, or schedule the first poll of the sources in a
> > random order.
>
> The randomization of sources in the hash table is now implemented in
> git if you would like to try it.
>
> Also, chronyc now has a -c option to print reports in a CSV format,
> all values in the same units. It should be much easier to parse if
> anyone still needs to do that.
>
> --
> Miroslav Lichvar
>


Re: [chrony-users] DNS RR and chrony

2016-03-17 Thread Ben Kochie
On Thu, Mar 17, 2016 at 8:04 PM, Bryan Christianson 
wrote:

>
> > On 18/03/2016, at 6:34 AM, Miroslav Lichvar  wrote:
> >
> >
> > If you start chronyc with -c and write commands to the stdin, the
> > tracking command will still print the report in the CSV format.
>
> Oh - That's cool, thanks :)
>
> >
> >> Would it be possible to add an optional argument to the internal
> >> chronyc commands to have them also return CSV format? Optionally JSON and
> >> XML would be nice formats as well.
> >
> > Would JSON or XML be easier for you to parse? I guess it wouldn't be
> > difficult to implement, I'm just not sure if it's worth the extra
> > code.
>
> It's semantically better - it means the client code is not dependent on the
> field order but can just do lookup by name once the data is loaded. Anyway
> - it's not essential, so up to you.
>

I think implementing client libraries that can replace chronyc is a better
direction.  Or if chronyd were to implement a more standard protocol
library like http://www.grpc.io/, things would be easier to automate and
instrument.


>
> Thanks
> B
>
> —
> Bryan Christianson
> br...@whatroute.net
>
>
>
>


[chrony-users] Leap second coming up.

2016-12-19 Thread Ben Kochie
So there is a leap second coming up, and I have a couple of questions about
chrony's behavior.

Say I am running more than one tier of chrony: I have 5 servers syncing to
stratum 1 servers in pool.ntp.org, and a number of servers syncing to those
5 from behind a firewall.

Do I configure the 5 external stratum 2 servers to handle the leap second,
or do I configure the internal slaves?  Or both tiers?