Re: Getting JSON encoded data from the stats socket.

2016-11-14 Thread Willy Tarreau
Hi,

On Mon, Nov 14, 2016 at 03:29:58PM +, Mirek Svoboda wrote:
> What if we have the descriptions in the source code, serving as a single
> source of truth, and generate the JSON schema file from the source code
> upon build?

... or on the fly. That's what I was thinking as well. Ie
"show stats json-schema" and use that output.

> There might also be another use case for the descriptions in the source
> code in the future, though I cannot come up with an example now.

Clearly the source code doesn't need the descriptions, however it's the
easiest place to ensure consistency. When you add a new field and you
only have to type 5 words in a 3rd column, you have no excuse for not
doing it. When you have to open a file you don't know exists, or try to
remember which file it was because you recall being told about it in the
past, it's quite different.

Regards,
Willy



Re: Getting JSON encoded data from the stats socket.

2016-11-14 Thread Mirek Svoboda
Hi

> > OK. So does this mean that a schema will have to be maintained by hand in
> > parallel or will it be deduced from the dump ? I'm starting to be worried
> > about something not being kept up to date if we have to maintain it, or
> > causing a slow down in adoption of new stats entries.
>
> I envisage the schema being maintained in the same way that documentation
> is. In the draft schema I posted it should not be necessary to update each
> time a new item is added to the output of show stat or show info. Rather,
> the schema would need to be updated if the format of the data changes
> somehow: e.g. a new field is added, which would be analogous to adding a
> new column to the typed output, or a new type of value, such as u16, is
> added.
>

What if we have the descriptions in the source code, serving as a single
source of truth, and generate the JSON schema file from the source code
upon build?
There might also be another use case for the descriptions in the source
code in the future, though I cannot come up with an example now.

Regards,
Mirek Svoboda

>


Re: Getting JSON encoded data from the stats socket.

2016-11-14 Thread Simon Horman
On Mon, Nov 14, 2016 at 08:50:54AM -0500, hapr...@stormcloud9.net wrote:
> Might help to see an example of what the results look like when using
> this schema, however I do have one comment below.

Yes, agreed. I plan to work on making that so.

> On 2016/11/14 03:09, Simon Horman wrote:
> > Hi Willy, Hi All,
> >
> > On Thu, Nov 10, 2016 at 04:52:56PM +0100, Willy Tarreau wrote:
> >> Hi Simon!
> >>
> >> On Thu, Nov 10, 2016 at 04:27:15PM +0100, Simon Horman wrote:
> >>> My preference is to take things calmly as TBH I am only just getting
> >>> started on this and I think the schema could take a little time to get
> >>> a consensus on.
> >> I totally agree with you. I think the most difficult thing is not to
> >> run over a few arrays and dump them but manage to make everyone agree
> >> on the schema. And that will take more than a few days I guess. Anyway
> >> I'm fine with being proven wrong :-)
> > I took a first pass at defining a schema.
> >
> > * The schema follows what is described on json-schema.org (or at least
> >   tries to). Is this a suitable approach?
> > * The schema only covers "show info" and "show stat" and the fields
> >   are based on the typed output variants of those commands.
> >   This leads me to several questions:
> >   - Is this field selection desirable? It seems to make sense to me
> > as presumably the intention of the JSON output is for it to
> > be machine readable.
> >   - Is such an approach appropriate for other show commands?
> >   - And more generally, which other show commands are desired to
> > support output in JSON (in the near term)?
> >
> > {
> > "$schema": "http://json-schema.org/draft-04/schema#",
> > "oneOf": [
> > {
> > "title": "Info",
> > "description": "Info about HAProxy status",
> > "type": "array",
> > "items": {
> > "title": "Info Item",
> > "type": "object",
> > "properties": {
> > "field": { "$ref": "#/definitions/field" },
> > "processNum": { "$ref": "#/definitions/processNum" },
> > "tags": { "$ref": "#/definitions/tags" },
> > "value": { "$ref": "#/definitions/typedValue" }
> > },
> > "required": ["field", "processNum", "tags", "value"]
> > }
> > },
> > {
> > "title": "Stat",
> > "description": "HAProxy statistics",
> > "type": "array",
> > "items": {
> > "title": "Stat Item",
> > "type": "object",
> > "properties": {
> > "objType": {
> > "enum": ["F", // Frontend
> >  "B", // Backend
> >  "L", // Listener
> >  "S"  // Server
> Do we really need to save a few bytes and abbreviate these? We're
> already far more chatty than the CSV output as you're outputting field
> names (e.g. "proxyId" and "processNum"), so abbreviating the values when
> you've got full field names seems rather contrary. And then as you've
> demonstrated, this requires defining a "sub-schema" explaining what
> "F", "B", etc. are, thus requiring anyone parsing the JSON to keep a
> mapping of the values (and do the translation) within their code.
> Ditto for all the other "enum" types down below.

Good point. I'm not sure why that didn't occur to me.
But it does seem like a good idea.
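To make the cost concrete, this is the sort of translation table every consumer would otherwise have to carry. The long names are taken from the comments in the draft schema above; the helper itself is hypothetical, not part of any proposed patch.

```python
# Lookup table a consumer would need if the JSON output kept the
# one-letter objType codes from the typed/CSV output.
OBJ_TYPE = {
    "F": "Frontend",
    "B": "Backend",
    "L": "Listener",
    "S": "Server",
}

def expand_obj_type(code: str) -> str:
    """Translate a one-letter objType code to its long form."""
    try:
        return OBJ_TYPE[code]
    except KeyError:
        raise ValueError("unknown objType code: %r" % code)
```

Emitting the long names directly in the JSON would let every consumer delete this mapping (and the equivalent ones for the other enums).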

> > ]
> > },
> > "proxyId": {
> > "type": "integer",
> > "minimum": 0
> > },
> > "id": {
> > "description": "Unique identifier of object within proxy",
> > "type": "integer",
> > "minimum": 0
> > },
> > "field": { "$ref": "#/definitions/field" },
> > "processNum": { "$ref": "#/definitions/processNum" },
> > "tags": { "$ref": "#/definitions/tags" },
> > "value": { "$ref": "#/definitions/typedValue" }
> > },
> > "required": ["objType", "proxyId", "id", "field", "processNum",
> >  "tags", "value"]
> > }
> > }
> > ],
> > "definitions": {
> > "field": {
> > "type": "object",
> > "pos": {
> > "description": "Position of field",
> > "type": "integer",
> > "minimum": 0
> > },
> > "name": {
> > "description": "Name of field",
> > "type": "string"
> > },
> > "required": ["pos", "name"]
> > },
> > "processNum": {
> > 

Re: Getting JSON encoded data from the stats socket.

2016-11-14 Thread Simon Horman
Hi Willy,

On Mon, Nov 14, 2016 at 03:10:18PM +0100, Willy Tarreau wrote:
> On Mon, Nov 14, 2016 at 11:34:18AM +0100, Simon Horman wrote:
> > > Sometimes a description like the one above appears in your example; is it
> > > just for a few fields, or do you intend to describe all of them? I'm asking
> > > because we don't have such descriptions right now, and while I won't
> > > deny that forcing contributors to add one when adding new stats could be
> > > reasonable (it's like doc), I fear that it would significantly inflate
> > > the output.
> > 
> > My understanding is that the description is part of the schema but would
> > not be included in a JSON instance. Or on other words, would not
> > be included in the output of a show command.
> 
> OK. So does this mean that a schema will have to be maintained by hand in
> parallel or will it be deduced from the dump ? I'm starting to be worried
> about something not being kept up to date if we have to maintain it, or
> causing a slow down in adoption of new stats entries.

I envisage the schema being maintained in the same way that documentation
is. In the draft schema I posted it should not be necessary to update each
time a new item is added to the output of show stat or show info. Rather,
the schema would need to be updated if the format of the data changes
somehow: e.g. a new field is added, which would be analogous to adding a
new column to the typed output, or a new type of value, such as u16, is
added.

> > My intention was to add descriptions for all fields. But in many cases
> > the field name seemed to be sufficiently descriptive or at least I couldn't
> > think of a better description. And in such cases I omitted the description
> > to avoid being repetitive.
> 
> OK that's a good point. So we can possibly have a first implementation reusing
> the field name everywhere, and later make these descriptions mandatory in the
> code for new fields so that the output description becomes more readable.
> 
> > I do not feel strongly about the descriptions. I'm happy to remove some or
> > all of them if they are deemed unnecessary or otherwise undesirable; to add
> > them to every field for consistency; or something in between.
> 
> I think dumping only known descriptions and falling back to the name (or
> simply suggesting that the consumer just uses the same when there's no desc)
> sounds reasonable to me for now.
> 
> > > Also, do you have an idea about the verbosity of the dump here ? For
> > > example let's say you have 100 listeners with 4 servers each (which is
> > > an average sized config). I'm just looking for a rough order of magnitude,
> > > i.e. closer to 10-100k or to 1-10M. The typed output is already quite heavy
> > > for large configs so it should not be a big deal, but it's something we
> > > have to keep in mind.
> > 
> > I don't think the type, description, etc... should be included in such
> > output as they can be supplied by the schema out-of-band. But the field
> > name and values along with syntactic elements (brackets, quotes, etc...) do
> > need to be included.
> 
> OK.
> 
> > I can try and come up with an estimate if it is
> > important but my guess is the result would be several times the size of the
> > typed output (mainly owing to the size of the field names in the output).
> 
> No, don't worry, this rough estimate is enough.

-- 
Simon Horman  si...@horms.nl
Horms Solutions BV  www.horms.nl
Parnassusweg 819, 1082 LZ Amsterdam, Netherlands
Tel: +31 (0)20 800 6155    Skype: horms7



Re: Getting JSON encoded data from the stats socket.

2016-11-14 Thread Willy Tarreau
Hi Simon,

On Mon, Nov 14, 2016 at 09:09:21AM +0100, Simon Horman wrote:
> I took a first pass at defining a schema.
> 
> * The schema follows what is described on json-schema.org (or at least
>   tries to). Is this a suitable approach?

I'll let others respond as I have no idea, since I neither need nor use JSON :-)

> * The schema only covers "show info" and "show stat" and the fields
>   are based on the typed output variants of those commands.
>   This leads me to several questions:
>   - Is this field selection desirable? It seems to make sense to me
> as presumably the intention of the JSON output is for it to
> be machine readable.

Yes, in my opinion that's the goal. And these are the only two parts that
were converted to typed output for this reason.

>   - Is such an approach appropriate for other show commands?

At the moment I don't think so because the other ones are more related
to state management than statistics.

>   - And more generally, which other show commands are desired to
> support output in JSON (in the near term)?

I can't think of any right now.

However I have a question below :

> "id": {
> "description": "Unique identifier of object within proxy",
> "type": "integer",
> "minimum": 0
> },

Sometimes a description like the one above appears in your example; is it
just for a few fields, or do you intend to describe all of them? I'm asking
because we don't have such descriptions right now, and while I won't
deny that forcing contributors to add one when adding new stats could be
reasonable (it's like doc), I fear that it would significantly inflate
the output.

Also, do you have an idea about the verbosity of the dump here ? For
example let's say you have 100 listeners with 4 servers each (which is
an average sized config). I'm just looking for a rough order of magnitude,
i.e. closer to 10-100k or to 1-10M. The typed output is already quite heavy
for large configs so it should not be a big deal, but it's something we
have to keep in mind.

Oh BTW just to let you know, I'm working on a painful bug and possibly a
small regression which will force me to revert some recent fixes, so you
may still have a bit of time left :-)

Thanks,
Willy



Re: Getting JSON encoded data from the stats socket.

2016-11-14 Thread Simon Horman
Hi Willy, Hi All,

On Thu, Nov 10, 2016 at 04:52:56PM +0100, Willy Tarreau wrote:
> Hi Simon!
> 
> On Thu, Nov 10, 2016 at 04:27:15PM +0100, Simon Horman wrote:
> > My preference is to take things calmly as TBH I am only just getting
> > started on this and I think the schema could take a little time to get
> > a consensus on.
> 
> I totally agree with you. I think the most difficult thing is not to
> run over a few arrays and dump them but manage to make everyone agree
> on the schema. And that will take more than a few days I guess. Anyway
> I'm fine with being proven wrong :-)

I took a first pass at defining a schema.

* The schema follows what is described on json-schema.org (or at least
  tries to). Is this a suitable approach?
* The schema only covers "show info" and "show stat" and the fields
  are based on the typed output variants of those commands.
  This leads me to several questions:
  - Is this field selection desirable? It seems to make sense to me
as presumably the intention of the JSON output is for it to
be machine readable.
  - Is such an approach appropriate for other show commands?
  - And more generally, which other show commands are desired to
support output in JSON (in the near term)?

{
"$schema": "http://json-schema.org/draft-04/schema#",
"oneOf": [
{
"title": "Info",
"description": "Info about HAProxy status",
"type": "array",
"items": {
"title": "Info Item",
"type": "object",
"properties": {
"field": { "$ref": "#/definitions/field" },
"processNum": { "$ref": "#/definitions/processNum" },
"tags": { "$ref": "#/definitions/tags" },
"value": { "$ref": "#/definitions/typedValue" }
},
"required": ["field", "processNum", "tags", "value"]
}
},
{
"title": "Stat",
"description": "HAProxy statistics",
"type": "array",
"items": {
"title": "Stat Item",
"type": "object",
"properties": {
"objType": {
"enum": ["F", // Frontend
 "B", // Backend
 "L", // Listener
 "S"  // Server
]
},
"proxyId": {
"type": "integer",
"minimum": 0
},
"id": {
"description": "Unique identifier of object within proxy",
"type": "integer",
"minimum": 0
},
"field": { "$ref": "#/definitions/field" },
"processNum": { "$ref": "#/definitions/processNum" },
"tags": { "$ref": "#/definitions/tags" },
"value": { "$ref": "#/definitions/typedValue" }
},
"required": ["objType", "proxyId", "id", "field", "processNum",
 "tags", "value"]
}
}
],
"definitions": {
"field": {
"type": "object",
"pos": {
"description": "Position of field",
"type": "integer",
"minimum": 0
},
"name": {
"description": "Name of field",
"type": "string"
},
"required": ["pos", "name"]
},
"processNum": {
"description": "Relative process number",
"type": "integer",
"minimum": 1
},
"tags": {
"type": "object",
"origin": {
"description": "Origin value was extracted from",
"type": "string",
"enum": ["M", // Metric
 "S", // Status
 "K", // Sorting Key
 "C", // From Configuration
 "P"  // From Product
]
},
"nature": {
"description": "Nature of information carried by field",
"type": "string",
"enum": ["A", // Age since last event
 "a", // Averaged value
 "C", // Cumulative counter
 "D", // Duration for a status
 "G", // Gauge - measure at one instant
 "L", // Limit
 "M", // Maximum
 "m", // Minimum
 "N", // Name
 "O", // Free text output
 "R", // Event rate - measure at one instant
 "T"  // Date or time
]
},
  
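Since an example instance was requested earlier in the thread, here is a hypothetical single "show info" item shaped after the draft Info schema. The member names (field, processNum, tags, value) come from the schema; the concrete values and the layout of the typed value are invented, as the typedValue definition is cut off above.

```python
import json

# Hypothetical single item from a JSON "show info" dump, shaped after
# the draft Info schema. Values and the typed-value layout are invented.
SAMPLE_ITEM = """
{
  "field": {"pos": 4, "name": "Uptime_sec"},
  "processNum": 1,
  "tags": {"origin": "M", "nature": "A"},
  "value": {"type": "u32", "value": 86330}
}
"""

item = json.loads(SAMPLE_ITEM)

# The draft marks these four members as required for an Info item.
required = {"field", "processNum", "tags", "value"}
missing = required - item.keys()
print("missing required members:", sorted(missing))  # []
```

A full "show info" dump would then be an array of such items, one per field and per process.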

Re: Getting JSON encoded data from the stats socket.

2016-11-10 Thread ge...@riseup.net
Hi,

On 16-11-10 16:56:33, Willy Tarreau wrote:
> I removed you from the To in this response, but just as a hint we
> generally recommend keeping people CCed since most of us subscribed
> to lists have filters to automatically place them in the right box,
> and some people may participate without being subscribed. 

Yeah, I'm using filtering as well, but this doesn't deal with getting
the same mail(s) multiple times.

> On most lists, when people don't want to be automatically CCed on
> replies, they simply set their Reply-To header to the list's address.

Thanks, wasn't aware of this. I did so now.

> OK but just so that there's no misunderstanding, next release will be
> in approx one year. However if the patch is merged early, it will very
> likely apply well to the stable release meaning you can easily add it
> to your own packages.

Ah, I see, wasn't aware of this. Well then...this is fine as well.. :)

Cheers,
Georg




Re: Getting JSON encoded data from the stats socket.

2016-11-10 Thread Willy Tarreau
On Thu, Nov 10, 2016 at 04:30:57PM +0100, ge...@riseup.net wrote:
> (Please don't Cc: me, I'm subscribed to the list.)

I removed you from the To in this response, but just as a hint we
generally recommend keeping people CCed since most of us subscribed
to lists have filters to automatically place them in the right box,
and some people may participate without being subscribed. On most
lists, when people don't want to be automatically CCed on replies,
they simply set their Reply-To header to the list's address.

> Even if I'm not Simon, I'll say a word, hope that's okay, because I've
> dug out this old thread: It's fine for me if it will go into 1.7 or
> 1.8. I don't need this within the next two weeks, but I'm looking forward
> to using it. If it will take another four, six or eight weeks, this is
> completely fine with me.

OK but just so that there's no misunderstanding, next release will be in
approx one year. However if the patch is merged early, it will very likely
apply well to the stable release meaning you can easily add it to your own
packages.

Cheers,
Willy



Re: Getting JSON encoded data from the stats socket.

2016-11-10 Thread Willy Tarreau
Hi Simon!

On Thu, Nov 10, 2016 at 04:27:15PM +0100, Simon Horman wrote:
> My preference is to take things calmly as TBH I am only just getting
> started on this and I think the schema could take a little time to get
> a consensus on.

I totally agree with you. I think the most difficult thing is not to
run over a few arrays and dump them but manage to make everyone agree
on the schema. And that will take more than a few days I guess. Anyway
I'm fine with being proven wrong :-)

Cheers,
Willy




Re: Getting JSON encoded data from the stats socket.

2016-11-10 Thread ge...@riseup.net
(Please don't Cc: me, I'm subscribed to the list.)

On 16-11-10 16:12:31, Willy Tarreau wrote:
> That's cool!
> 
> The only thing is that I don't want to delay the release only for this,
> and at the same time I'm pretty sure it's possible to do something which
> will not impact existing code within a reasonable time frame. I just
> don't know how long it takes to make everyone agree on the schema. My
> intent is to release 1.7 by the end of next week *if we don't discover
> new scary bugs*. So if you think it's doable by then, that's fine. Or
> if you want to buy more time, you need to discover a big bug which will
> keep me busy and cause the release to be delayed ;-) Otherwise I think
> it will have to be in 1.8.
> 
> Note, to be clear, if many people insist on having this, we don't have an
> emergency to release by the end of next week, but it's just a policy we
> cannot pursue forever, at least out of respect for those who were pressured
> to send their stuff in time. So I think that we can negotiate one extra
> week if we're sure to have something completed, but only if people here
> insist on having it in 1.7.
> 
> Thus the first one who has a word to say is obviously Simon : if you
> think that even two weeks are not achievable, let's calmly postpone
> and avoid any stress.

Even if I'm not Simon, I'll say a word, hope that's okay, because I've
dug out this old thread: It's fine for me if it will go into 1.7 or
1.8. I don't need this within the next two weeks, but I'm looking forward
to using it. If it will take another four, six or eight weeks, this is
completely fine with me.

All the best,
Georg




Re: Getting JSON encoded data from the stats socket.

2016-11-10 Thread Simon Horman
On Thu, Nov 10, 2016 at 04:12:31PM +0100, Willy Tarreau wrote:
> Hi Malcolm,
> 
> On Thu, Nov 10, 2016 at 12:53:13PM +, Malcolm Turnbull wrote:
> > Georg,
> > 
> > That's a timely reminder thanks:
> > I just had another chat with Simon Horman who has kindly offered to
> > take a look at this again.
> 
> That's cool!
> 
> The only thing is that I don't want to delay the release only for this,
> and at the same time I'm pretty sure it's possible to do something which
> will not impact existing code within a reasonable time frame. I just
> don't know how long it takes to make everyone agree on the schema. My
> intent is to release 1.7 by the end of next week *if we don't discover
> new scary bugs*. So if you think it's doable by then, that's fine. Or
> if you want to buy more time, you need to discover a big bug which will
> keep me busy and cause the release to be delayed ;-) Otherwise I think
> it will have to be in 1.8.
> 
> Note, to be clear, if many people insist on having this, we don't have an
> emergency to release by the end of next week, but it's just a policy we
> cannot pursue forever, at least out of respect for those who were pressured
> to send their stuff in time. So I think that we can negotiate one extra
> week if we're sure to have something completed, but only if people here
> insist on having it in 1.7.
> 
> Thus the first one who has a word to say is obviously Simon : if you
> think that even two weeks are not achievable, let's calmly postpone and
> avoid any stress.

My preference is to take things calmly as TBH I am only just getting
started on this and I think the schema could take a little time to get
a consensus on.



Re: Getting JSON encoded data from the stats socket.

2016-11-10 Thread Willy Tarreau
Hi Malcolm,

On Thu, Nov 10, 2016 at 12:53:13PM +, Malcolm Turnbull wrote:
> Georg,
> 
> That's a timely reminder thanks:
> I just had another chat with Simon Horman who has kindly offered to
> take a look at this again.

That's cool!

The only thing is that I don't want to delay the release only for this,
and at the same time I'm pretty sure it's possible to do something which
will not impact existing code within a reasonable time frame. I just
don't know how long it takes to make everyone agree on the schema. My
intent is to release 1.7 by the end of next week *if we don't discover
new scary bugs*. So if you think it's doable by then, that's fine. Or
if you want to buy more time, you need to discover a big bug which will
keep me busy and cause the release to be delayed ;-) Otherwise I think
it will have to be in 1.8.

Note, to be clear, if many people insist on having this, we don't have an
emergency to release by the end of next week, but it's just a policy we
cannot pursue forever, at least out of respect for those who were pressured
to send their stuff in time. So I think that we can negotiate one extra
week if we're sure to have something completed, but only if people here
insist on having it in 1.7.

Thus the first one who has a word to say is obviously Simon : if you
think that even two weeks are not achievable, let's calmly postpone and
avoid any stress.

Thanks,
Willy



Re: Getting JSON encoded data from the stats socket.

2016-11-10 Thread Dave Cottlehuber
On Thu, 10 Nov 2016, at 13:53, Malcolm Turnbull wrote:
> Georg,
> 
> That's a timely reminder thanks:
> I just had another chat with Simon Horman who has kindly offered to
> take a look at this again.

Sounds great!

I'm very interested in logging this continually via a chrooted unix socket,
into both riemann & rsyslog and into graylog/splunk. I'm happy to help test
and contribute documentation as well.

I was planning to use riemann-tools with csv format
 https://github.com/riemann/riemann-tools/blob/master/bin/riemann-haproxy 
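For comparison with the JSON proposal, the CSV path that tools like riemann-haproxy take today is simple to sketch. This is a minimal, hypothetical parser for the current `show stat` CSV; the sample payload and the column subset are invented for illustration.

```python
import csv
import io

# Invented sample of "show stat" CSV output (real dumps have many more
# columns). The header line starts with "# ".
SAMPLE_CSV = """# pxname,svname,scur,smax,status
www,FRONTEND,3,10,OPEN
www,srv1,1,4,UP
"""

def parse_show_stat(text):
    """Parse HAProxy CSV stats into a list of dicts keyed by column name."""
    # Strip the leading "# " so DictReader sees a plain header row.
    cleaned = text.lstrip("# ")
    return list(csv.DictReader(io.StringIO(cleaned)))

rows = parse_show_stat(SAMPLE_CSV)
print(rows[1]["svname"], rows[1]["status"])  # srv1 UP
```

In practice the text would come from sending `show stat` to the unix stats socket instead of a literal string; a JSON mode would make this parsing step, and its fragility against column changes, unnecessary.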

A+
Dave



Re: Getting JSON encoded data from the stats socket.

2016-11-10 Thread Malcolm Turnbull
Georg,

That's a timely reminder thanks:
I just had another chat with Simon Horman who has kindly offered to
take a look at this again.




On 10 November 2016 at 10:54, ge...@riseup.net  wrote:
> Hi all,
>
> On 16-07-05 10:05:13, Mark Brookes wrote:
>> I wondered if we could start a discussion about the possibility of
>> having the stats socket return stats data in JSON format.
>
> After the discussion we had in July, I'm wondering what's the current
> status regarding this topic?
>
> Thanks and all the best,
> Georg



-- 
Regards,

Malcolm Turnbull.

Loadbalancer.org Ltd.
Phone: +44 (0)330 380 1064
http://www.loadbalancer.org/



Re: Getting JSON encoded data from the stats socket.

2016-11-10 Thread ge...@riseup.net
Hi all,

On 16-07-05 10:05:13, Mark Brookes wrote:
> I wondered if we could start a discussion about the possibility of
> having the stats socket return stats data in JSON format.

After the discussion we had in July, I'm wondering what's the current
status regarding this topic?

Thanks and all the best,
Georg




Re: Getting JSON encoded data from the stats socket.

2016-07-26 Thread ge...@riseup.net
Hi,

On 16-07-26 21:47:55, Willy Tarreau wrote:
> I'd like to wait for other people to have the time to participate in
> this discussion; I know that some people are very careful about the
> relevance and accuracy of the stats, and some people may want to report
> other suggestions.

I can't add that much, and have no specific suggestions, so just this:

(I'm a long time user of HAProxy, my setups aren't that big, mostly
around 50 backends, but I absolutely love the software. Thanks for this
great work!)

Regarding the topic: I absolutely support the proposal to dump the stats
as JSON. In my opinion, this is a much more easily parseable (and
modern) format than CSV. I think that grouping by process makes
sense, but "overall stats" should be included as well. Additionally, I
support your view, Willy, about the amount of data to dump: I would speak
in favor of "dumping as much as possible" because, not sure if I got this
right, it's already possible to do so, it just needs support for dumping to
JSON. Better safe than sorry: let's include all the data which _might_
be of interest, instead of only data which _is now_ of interest. If some
"useless" (for now) data gets dumped... so what?

Thanks for the proposal Pavlos!

Cheers,
Georg




Re: Getting JSON encoded data from the stats socket.

2016-07-26 Thread Willy Tarreau
On Tue, Jul 26, 2016 at 09:06:05PM +0200, Pavlos Parissis wrote:
> > You probably have not looked at the output of "show stats typed", it
> > gives you the nature of each value letting you know how to aggregate
> > them (min, max, avg, sum, pick any, etc).
> > 
> 
> I have seen it but it isn't available in 1.6. It could simplify my code; I
> should give it a try.

Ah indeed you're right. Well it's not in 1.6 mainline but we backported
it to hapee-1.6 in case that's relevant to the machines you're interested
in.

> >> The stats are already aggregated and few metrics are excluded, for
> >> example all status stuff. Each process performs healthchecking, so they
> >> act as little brains which never agree on the status of a server as they
> >> run their checks on different intervals.
> > 
> > Absolutely, but at least you want to see their stats. For example how many
> > times a server has switched state per process then in total (meaning a
> > proportional amount of possibly visible issues).
> > 
> 
> True, but in setups with ECMP in front of N HAProxy nodes which run in
> nbproc mode, you offload application healthchecking to a dedicated daemon
> which runs on the servers (service discovery + service availability with
> consul/zookeeper stuff) and you only run TCP checks from HAProxy.
> 
> In our setup we don't really care about how many times a server flapped;
> it doesn't tell us something we don't know already: the application is in
> a broken state.

In such a case I agree.

> But, other people may find it useful.

Anyway that was just an example. What I meant by this is that we must
take care not to selectively pick some elements and not other ones. I'd
rather have the output contain 10% of useless stuff, with nothing special
to do for upcoming stuff to automatically appear, than have to explicitly
add new stuff all the time! When you see the size of the csv dump function
right now, it's a joke, and I really expect the JSON dump to follow the
same philosophy.

> > My issue is that if the *format* doesn't support per-process stats,
> > we'll have to emit a new format 3 months later for all the people who
> > want to process it. We've reworked the stats dump to put an end to the
> > problem where depending on the output format you used to have different
> > types of information, and there was no single representation carrying
> > them all at once. For me now it's essential that if we prepare a new
> > format it's not stripped down from the info people need, otherwise it
> > will automatically engender yet another format.
> > 
> 
> Agreed. I am fine with giving per-process stats for servers/frontends/backends.
> Adding another top level key 'per_process' in my proposal should be a good
> start:
> 
> {
> "per_process": {
> "proc1": {
> "frontend": {
> "www.haproxy.org": {
> "bin": "",
> "lbtot": "55",
> ...
(...)

Yes, I think so and that's also more or less similar to what Mark proposed.
Also I'm not much worried by the extra output size: if we dump this through
HTTP we'll have it gzipped.

Also, we want to have the values typed otherwise you're fucked as we used
to be with the CSV dump in the past. The current code supports this and
that's important. I don't know how it may impact the JSON output. Maybe
some parts will be just "numbers", but I remember that certain of them
have some properties (eg: max, limit, age, percentage, PID, SNMP ID, etc).
I'm less worried about the strings, we basically have identifiers,
descriptions and outputs from what I remember. But taking a look at this
will help refine the format.

I'd like to wait for other people to have the time to participate in this
discussion; I know that some people are very careful about the relevance
and accuracy of the stats, and some people may want to report other suggestions.

Cheers,
Willy



Re: Getting JSON encoded data from the stats socket.

2016-07-26 Thread Pavlos Parissis
On 26/07/2016 06:56 μμ, Willy Tarreau wrote:
> On Tue, Jul 26, 2016 at 05:51:08PM +0200, Pavlos Parissis wrote:
>> In all my setups I have nbproc > 1, and after a lot of changes in how I
>> aggregate HAProxy stats and in what most people want to see on graphs, I
>> came up with something like the following:
>>
>> {
>> "frontend": {
>> "www.haproxy.org": {
>> "bin": "",
>> "lbtot": "55",
>> ...
>> },
>> "www.haproxy.com": {
>> "bin": "",
>> "lbtot": "55",
>> ...
>> },
>> },
>> "backend": {
>> "www.haproxy.org": {
>> "bin": "",
>> "lbtot": "55",
>> 
>> "server": {
>> "srv1": {
>> "bin": "",
>> "lbtot": "55",
>>
>> },
>> ...
>> },
>> },
>> },
>> "haproxy": {
>> "PipesFree": "555",
>> ...
>> ,
>> "per_process": {
>> "id1": {
>> "PipesFree": "555",
>> "Process_num": "1",
>> ...
>> },
>> "id2": {
>> "PipesFree": "555",
>> "Process_num": "2",
>> ...
>> },
>> ...
>> },
>> },
>> "server": {
>> "srv1": {
>> "bin": "",
>> "lbtot": "55",
>> ...
>> },
>> ...
>> },
>> }
>>
>>
>> Let me explain a bit:
>>
>> - It is very useful and handy to know stats for a server per backend but
>> also across all backends. Thus, I include a top level key 'server' which
>> holds stats for each server across all backends. A few server stats have
>> to be excluded as they are meaningless in this context, for example
>> status, lastchg, check_duration, check_code and a few others. For those
>> which aren't counters but fixed numbers you want to either sum them (slim)
>> or take the average (weight). I don't do the latter in my setup.
> 
> You probably have not looked at the output of "show stats typed", it
> gives you the nature of each value letting you know how to aggregate
> them (min, max, avg, sum, pick any, etc).
> 

I have seen it, but it isn't available on 1.6. It could simplify my code;
I should give it a try.

>> - Aggregation across multiple processes for haproxy stats(show info output)
> 
> It's not only "show info", this one reports only the process health.
> 
>> As you can see I provide stats per process and across all processes.
>> It has been proven very useful to know the CPU utilization per process. We 
>> depend on the kernel
>> to do the distribution of incoming connects to all processes and so far it 
>> works very well, but
>> sometimes you see a single process to consume a lot of CPU and if you don't 
>> provide percentiles
>> or stats per process then you are going to miss it. The metrics about 
>> uptime, version,
>> description and few other can be excluded in the aggregation.
> 
> These last ones are in the "pick any" type of aggregation I was talking about.
> 
>> - nbproc > 1 and aggregation for frontend/backend/server
>> My proposal doesn't cover stats for frontend/backend/server per haproxy 
>> process.
> 
> But that's precisely the limitation I'm reporting :-)
> 
>> The stats are already aggregated and few metrics are excluded. For example 
>> all status stuff.
>> Each process performs healthchecking, so they act as little brains which 
>> never agree on the
>> status of a server as they run their checks on different interval.
> 
> Absolutely, but at least you want to see their stats. For example how many
> times a server has switched state per process then in total (meaning a
> proportional amount of possibly visible issues).
> 

True, but in setups with ECMP in front of N HAProxy nodes which run in nbproc
mode, you offload application health checking to a dedicated daemon which runs
on the servers (service discovery + service availability with consul/zookeeper
stuff), and you only run TCP checks from HAProxy.

In our setup we don't really care about how many times a server flapped; it
doesn't tell us anything we don't already know: the application is in a broken
state.

But, other people may find it useful.

> My issue is that if the *format* doesn't support per-process stats, we'll have
> to emit a new format 3 months later for all the people who want to process it.
> We've reworked the stats dump to put an end to the problem where depending on
> the output format you used to have different types of information, and there
> was no single representation carrying them all at once. For me now it's
> essential that if we prepare a new format it's not stripped down from the
> info people need, otherwise it will automatically engender yet another format.

Re: Getting JSON encoded data from the stats socket.

2016-07-26 Thread Willy Tarreau
On Tue, Jul 26, 2016 at 05:51:08PM +0200, Pavlos Parissis wrote:
> In all my setups I have nbproc > 1, and after a lot of changes to how I
> aggregate HAProxy stats and to what most people want to see on graphs, I
> came up with something like the following:
> 
> {
> "frontend": {
> "www.haproxy.org": {
> "bin": "",
> "lbtot": "55",
> ...
> },
> "www.haproxy.com": {
> "bin": "",
> "lbtot": "55",
> ...
> },
> },
> "backend": {
> "www.haproxy.org": {
> "bin": "",
> "lbtot": "55",
> 
> "server": {
> "srv1": {
> "bin": "",
> "lbtot": "55",
>
> },
> ...
> },
> },
> },
> "haproxy": {
> "PipesFree": "555",
> ...
> ,
> "per_process": {
> "id1": {
> "PipesFree": "555",
> "Process_num": "1",
> ...
> },
> "id2": {
> "PipesFree": "555",
> "Process_num": "2",
> ...
> },
> ...
> },
> },
> "server": {
> "srv1": {
> "bin": "",
> "lbtot": "55",
> ...
> },
> ...
> },
> }
> 
> 
> Let me explain a bit:
> 
> - It is very useful and handy to know stats for a server per backend but also 
> across all
> backends. Thus, I include a top level key 'server' which holds stats for each 
> server across all
> backends. Few server's stats has to be excluded as they are meaningless in 
> this context.
> For example, status, lastchg, check_duration, check_code and few others. For 
> those which aren't
> counters but fixed numbers you want to either sum them(slim) or get the 
> average(weight). I
> don't do the latter in my setup.

You probably have not looked at the output of "show stats typed", it
gives you the nature of each value letting you know how to aggregate
them (min, max, avg, sum, pick any, etc).
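
For illustration, aggregation driven by a per-field "nature" tag could look
like the following Python sketch. This is not the exact "show stat typed"
syntax; the tag names (sum/min/max/avg/any) and sample values are assumptions
used only to show the idea of nature-driven aggregation.

```python
# Hedged sketch: combine one field's per-process values according to an
# aggregation "nature" tag, in the spirit of the typed stats output.
# The tag names below are illustrative assumptions, not HAProxy's own.

def aggregate(nature, values):
    """Combine per-process values of a single field into one number."""
    if nature == "sum":        # e.g. counters such as lbtot
        return sum(values)
    if nature == "min":
        return min(values)
    if nature == "max":        # e.g. peak concurrent connections
        return max(values)
    if nature == "avg":        # e.g. weights
        return sum(values) / len(values)
    if nature == "any":        # identical on all processes, e.g. version
        return values[0]
    raise ValueError("unknown nature: %s" % nature)

# per-process samples of one field, e.g. lbtot from three processes
print(aggregate("sum", [55, 60, 45]))   # 160
print(aggregate("avg", [10, 20, 30]))   # 20.0
```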

> - Aggregation across multiple processes for haproxy stats(show info output)

It's not only "show info", this one reports only the process health.

> As you can see I provide stats per process and across all processes.
> It has been proven very useful to know the CPU utilization per process. We 
> depend on the kernel
> to do the distribution of incoming connects to all processes and so far it 
> works very well, but
> sometimes you see a single process to consume a lot of CPU and if you don't 
> provide percentiles
> or stats per process then you are going to miss it. The metrics about uptime, 
> version,
> description and few other can be excluded in the aggregation.

These last ones are in the "pick any" type of aggregation I was talking about.

> - nbproc > 1 and aggregation for frontend/backend/server
> My proposal doesn't cover stats for frontend/backend/server per haproxy 
> process.

But that's precisely the limitation I'm reporting :-)

> The stats are already aggregated and few metrics are excluded. For example 
> all status stuff.
> Each process performs healthchecking, so they act as little brains which 
> never agree on the
> status of a server as they run their checks on different interval.

Absolutely, but at least you want to see their stats. For example how many
times a server has switched state per process then in total (meaning a
proportional amount of possibly visible issues).

My issue is that if the *format* doesn't support per-process stats, we'll have
to emit a new format 3 months later for all the people who want to process it.
We've reworked the stats dump to put an end to the problem where depending on
the output format you used to have different types of information, and there
was no single representation carrying them all at once. For me now it's
essential that if we prepare a new format it's not stripped down from the
info people need, otherwise it will automatically engender yet another format.

Thanks,
Willy



Re: Getting JSON encoded data from the stats socket.

2016-07-26 Thread Pavlos Parissis
On 26/07/2016 03:30 μμ, Willy Tarreau wrote:
> Hi Pavlos!
> 
> On Tue, Jul 26, 2016 at 03:23:01PM +0200, Pavlos Parissis wrote:
>> Here is a suggestion
>> {
>>     "frontend": {
>>         "www.haproxy.org": { "bin": "", "lbtot": "55", ... },
>>         "www.haproxy.com": { "bin": "", "lbtot": "55", ... },
>>     },
>>     "backend": {
>>         "www.haproxy.org": {
>>             "bin": "",
>>             "lbtot": "55",
>>             "server": {
>>                 "srv1": { "bin": "", "lbtot": "55", ... },
>>                 ...
>>             },
>>         },
>>     },
>>     "haproxy": {
>>         "id1": { "PipesFree": "555", "Process_num": "1", ... },
>>         "id2": { "PipesFree": "555", "Process_num": "2", ... },
>>         ...
>>     },
>> }
> 
> Thanks. How does it scale if we later want to aggregate these ones over 
> multiple processes
> and/or nodes ? The typed output already emits a process number for each 
> field. Also, we do
> have the information of how data need to be parsed and aggregated. I suspect 
> that we want to
> produce this with the JSON output as well so that we don't lose information 
> when dumping in
> JSON mode. I would not be surprised if people find JSON easier to process 
> than our current
> format to aggregate their stats, provided we have all the fields :-)
> 
> Cheers, Willy
> 

I am glad you asked about aggregation, as I deliberately didn't include it.
In all my setups I have nbproc > 1, and after a lot of changes to how I
aggregate HAProxy stats and to what most people want to see on graphs, I came
up with something like the following:

{
    "frontend": {
        "www.haproxy.org": {
            "bin": "",
            "lbtot": "55",
            ...
        },
        "www.haproxy.com": {
            "bin": "",
            "lbtot": "55",
            ...
        },
    },
    "backend": {
        "www.haproxy.org": {
            "bin": "",
            "lbtot": "55",

            "server": {
                "srv1": {
                    "bin": "",
                    "lbtot": "55",

                },
                ...
            },
        },
    },
    "haproxy": {
        "PipesFree": "555",
        ...
        "per_process": {
            "id1": {
                "PipesFree": "555",
                "Process_num": "1",
                ...
            },
            "id2": {
                "PipesFree": "555",
                "Process_num": "2",
                ...
            },
            ...
        },
    },
    "server": {
        "srv1": {
            "bin": "",
            "lbtot": "55",
            ...
        },
        ...
    },
}


Let me explain a bit:

- It is very useful and handy to know stats for a server per backend but also
across all backends. Thus, I include a top-level key 'server' which holds stats
for each server across all backends. A few server stats have to be excluded as
they are meaningless in this context: for example status, lastchg,
check_duration, check_code and a few others. For those which aren't counters
but fixed numbers, you want to either sum them (slim) or take the average
(weight); I don't do the latter in my setup.

- Aggregation across multiple processes for haproxy stats (show info output)
As you can see, I provide stats per process and across all processes.
It has proven very useful to know the CPU utilization per process. We depend on
the kernel to distribute incoming connections to all processes, and so far it
works very well, but sometimes you see a single process consuming a lot of CPU,
and if you don't provide percentiles or per-process stats then you are going to
miss it. The metrics about uptime, version, description and a few others can be
excluded from the aggregation.


- nbproc > 1 and aggregation for frontend/backend/server
My proposal doesn't cover stats for frontend/backend/server per haproxy process.
The stats are already aggregated and a few metrics are excluded, for example all
the status stuff. Each process performs health checking, so they act as little
brains which never agree on the status of a server, as they run their checks on
different intervals. But if nbproc == 1 then these metrics have to be included.
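
As a sketch of the per-process aggregation described above, the "haproxy"
section of the proposed JSON could be built like this in Python, assuming each
process's "show info" output has already been parsed into a dict. The key
names, the set of excluded metrics, and summing as the only combiner are all
simplifying assumptions for illustration.

```python
# Hedged sketch: build the proposed "haproxy" section from per-process
# "show info" dicts: summed totals plus a "per_process" map.
# Which keys are excluded (uptime, version, ...) is an assumption here.

EXCLUDED = {"Uptime_sec", "Version", "Process_num", "description"}

def build_haproxy_section(per_process_info):
    """per_process_info: {"id1": {...}, "id2": {...}} of parsed show-info dicts."""
    aggregate = {}
    for info in per_process_info.values():
        for key, value in info.items():
            if key in EXCLUDED:
                continue
            # naive combiner: sum every non-excluded metric across processes
            aggregate[key] = aggregate.get(key, 0) + int(value)
    aggregate["per_process"] = per_process_info
    return aggregate

section = build_haproxy_section({
    "id1": {"PipesFree": "555", "CurrConns": "10", "Process_num": "1"},
    "id2": {"PipesFree": "555", "CurrConns": "14", "Process_num": "2"},
})
print(section["PipesFree"], section["CurrConns"])  # 1110 24
```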


Cheers,
Pavlos




Re: Getting JSON encoded data from the stats socket.

2016-07-26 Thread Willy Tarreau
On Tue, Jul 26, 2016 at 03:06:35PM +0100, Mark Brookes wrote:
> Could we perhaps group by the node then process_num then?
> {nodename:value:
> {pid: pid1: {
> haproxy: {
> Uptime_sec:100,
> PoolFailed:1
> }
>   stats: { "frontend": {
> "www.haproxy.org": {
> "bin": "",
> "lbtot": "55",
> ...
(...)

Yes I think it's fine this way because in practice, clients will consult
a single process at a time so it's easier to have per-process dumps to
aggregate later.

Willy



Re: Getting JSON encoded data from the stats socket.

2016-07-26 Thread Mark Brookes
Could we perhaps group by the node then process_num then?
{nodename:value:
{pid: pid1: {
haproxy: {
Uptime_sec:100,
PoolFailed:1
}
  stats: { "frontend": {
"www.haproxy.org": {
"bin": "",
"lbtot": "55",
...
},
"www.haproxy.com": {
"bin": "",
"lbtot": "55",
...
},
},
"backend": {
"www.haproxy.org": {
"bin": "",
"lbtot": "55",

"server": {
"srv1": {
"bin": "",
"lbtot": "55",
   
},
...
}

},
{pid: pid2: { haproxy: {
Uptime_sec:100,
PoolFailed:1
}
  stats: { "frontend": {
"www.haproxy.org": {
"bin": "",
"lbtot": "55",
...
},
"www.haproxy.com": {
"bin": "",
"lbtot": "55",
...
},
},
"backend": {
"www.haproxy.org": {
"bin": "",
"lbtot": "55",

"server": {
"srv1": {
"bin": "",
"lbtot": "55",
   
},
...
}

},

Ignore the close brackets, I'm pretty sure they are wrong, but you get the idea.
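
For reference, a well-formed version of this node → pid nesting might look
like the following. This is only a sketch of the shape being proposed; names
such as "nodename", "pid1" and the sample values are placeholders taken from
the message above, not actual output.

```json
{
    "nodename": {
        "pid1": {
            "haproxy": {
                "Uptime_sec": 100,
                "PoolFailed": 1
            },
            "stats": {
                "frontend": {
                    "www.haproxy.org": { "bin": "", "lbtot": "55" }
                },
                "backend": {
                    "www.haproxy.org": {
                        "bin": "",
                        "lbtot": "55",
                        "server": {
                            "srv1": { "bin": "", "lbtot": "55" }
                        }
                    }
                }
            }
        },
        "pid2": {
            "haproxy": { "Uptime_sec": 100, "PoolFailed": 1 },
            "stats": { "frontend": {}, "backend": {} }
        }
    }
}
```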

On 26 July 2016 at 14:30, Willy Tarreau  wrote:
> Hi Pavlos!
>
> On Tue, Jul 26, 2016 at 03:23:01PM +0200, Pavlos Parissis wrote:
>> Here is a suggestion
>> {
>> "frontend": {
>> "www.haproxy.org": {
>> "bin": "",
>> "lbtot": "55",
>> ...
>> },
>> "www.haproxy.com": {
>> "bin": "",
>> "lbtot": "55",
>> ...
>> },
>> },
>> "backend": {
>> "www.haproxy.org": {
>> "bin": "",
>> "lbtot": "55",
>> 
>> "server": {
>> "srv1": {
>> "bin": "",
>> "lbtot": "55",
>>
>> },
>> ...
>> }
>>
>> },
>> },
>> "haproxy": {
>> "id1": {
>> "PipesFree": "555",
>> "Process_num": "1",
>> ...
>> },
>> "id2": {
>> "PipesFree": "555",
>> "Process_num": "2",
>> ...
>> },
>> ...
>> },
>> }
>
> Thanks. How does it scale if we later want to aggregate these ones over
> multiple processes and/or nodes ? The typed output already emits a
> process number for each field. Also, we do have the information of how
> data need to be parsed and aggregated. I suspect that we want to produce
> this with the JSON output as well so that we don't lose information when
> dumping in JSON mode. I would not be surprised if people find JSON easier
> to process than our current format to aggregate their stats, provided we
> have all the fields :-)
>
> Cheers,
> Willy



Re: Getting JSON encoded data from the stats socket.

2016-07-26 Thread Willy Tarreau
Hi Pavlos!

On Tue, Jul 26, 2016 at 03:23:01PM +0200, Pavlos Parissis wrote:
> Here is a suggestion
> {
> "frontend": {
> "www.haproxy.org": {
> "bin": "",
> "lbtot": "55",
> ...
> },
> "www.haproxy.com": {
> "bin": "",
> "lbtot": "55",
> ...
> },
> },
> "backend": {
> "www.haproxy.org": {
> "bin": "",
> "lbtot": "55",
> 
> "server": {
> "srv1": {
> "bin": "",
> "lbtot": "55",
>
> },
> ...
> }
> 
> },
> },
> "haproxy": {
> "id1": {
> "PipesFree": "555",
> "Process_num": "1",
> ...
> },
> "id2": {
> "PipesFree": "555",
> "Process_num": "2",
> ...
> },
> ...
> },
> }

Thanks. How does it scale if we later want to aggregate these ones over
multiple processes and/or nodes ? The typed output already emits a
process number for each field. Also, we do have the information of how
data need to be parsed and aggregated. I suspect that we want to produce
this with the JSON output as well so that we don't lose information when
dumping in JSON mode. I would not be surprised if people find JSON easier
to process than our current format to aggregate their stats, provided we
have all the fields :-)

Cheers,
Willy



Re: Getting JSON encoded data from the stats socket.

2016-07-26 Thread Pavlos Parissis
On 26/07/2016 03:08 μμ, Willy Tarreau wrote:
> On Tue, Jul 26, 2016 at 02:05:56PM +0100, Mark Brookes wrote:
>>> So for sure I definitely support this proposal :-)
>>
>> That's great news. Do you have a JSON structure in mind?
>> Or would you like me to come up with something?
> 
> I'm probably the worst ever person to suggest a JSON structure. If you
> have any ideas, please bring them on the list. You know how it works,
> once nobody criticizes anymore, your design is fine. And you'll just
> have to ignore people who complain after the work is done :-)
> 
> Cheers,
> Willy
> 

Here is a suggestion
{
    "frontend": {
        "www.haproxy.org": {
            "bin": "",
            "lbtot": "55",
            ...
        },
        "www.haproxy.com": {
            "bin": "",
            "lbtot": "55",
            ...
        },
    },
    "backend": {
        "www.haproxy.org": {
            "bin": "",
            "lbtot": "55",

            "server": {
                "srv1": {
                    "bin": "",
                    "lbtot": "55",

                },
                ...
            }
        },
    },
    "haproxy": {
        "id1": {
            "PipesFree": "555",
            "Process_num": "1",
            ...
        },
        "id2": {
            "PipesFree": "555",
            "Process_num": "2",
            ...
        },
        ...
    },
}

Cheers,
Pavlos






Re: Getting JSON encoded data from the stats socket.

2016-07-26 Thread Willy Tarreau
On Tue, Jul 26, 2016 at 02:05:56PM +0100, Mark Brookes wrote:
> >So for sure I definitely support this proposal :-)
> 
> That's great news. Do you have a JSON structure in mind?
> Or would you like me to come up with something?

I'm probably the worst ever person to suggest a JSON structure. If you
have any ideas, please bring them on the list. You know how it works,
once nobody criticizes anymore, your design is fine. And you'll just
have to ignore people who complain after the work is done :-)

Cheers,
Willy



Re: Getting JSON encoded data from the stats socket.

2016-07-26 Thread Mark Brookes
>So for sure I definitely support this proposal :-)

That's great news. Do you have a JSON structure in mind?
Or would you like me to come up with something?

On 5 July 2016 at 18:04, Willy Tarreau  wrote:
> Hi Mark,
>
> On Tue, Jul 05, 2016 at 10:05:13AM +0100, Mark Brookes wrote:
>> Hi Willy/All
>>
>> I wondered if we could start a discussion about the possibility of
>> having the stats socket return stats data in JSON format.
>>
>> I'm primarily interested in the data that is returned by issuing a
>> 'show stat', which is normally returned as CSV.
>>
>> I won't go into specifics as to how the data would be structured; we
>> can decide on that later (assuming you are happy with this idea).
>>
>> I've approached Simon Horman and he's happy to do the work for us.
>>
>> Please let me know your thoughts
>
> Well, I completely reworked the stats internals recently for two
> purposes :
>   1) bringing the ability to dump them in another format such as JSON ;
>   2) making it easier to aggregate them over multiple processes/nodes
>
> So for sure I definitely support this proposal :-)
>
> Best regards
> Willy



Re: Getting JSON encoded data from the stats socket.

2016-07-05 Thread Willy Tarreau
Hi Mark,

On Tue, Jul 05, 2016 at 10:05:13AM +0100, Mark Brookes wrote:
> Hi Willy/All
> 
> I wondered if we could start a discussion about the possibility of
> having the stats socket return stats data in JSON format.
> 
> I'm primarily interested in the data that is returned by issuing a
> 'show stat', which is normally returned as CSV.
> 
> I won't go into specifics as to how the data would be structured; we
> can decide on that later (assuming you are happy with this idea).
> 
> I've approached Simon Horman and he's happy to do the work for us.
> 
> Please let me know your thoughts

Well, I completely reworked the stats internals recently for two
purposes :
  1) bringing the ability to dump them in another format such as JSON ;
  2) making it easier to aggregate them over multiple processes/nodes

So for sure I definitely support this proposal :-)

Best regards
Willy



Getting JSON encoded data from the stats socket.

2016-07-05 Thread Mark Brookes
Hi Willy/All

I wondered if we could start a discussion about the possibility of
having the stats socket return stats data in JSON format.

I'm primarily interested in the data that is returned by issuing a
'show stat', which is normally returned as CSV.

I won't go into specifics as to how the data would be structured; we
can decide on that later (assuming you are happy with this idea).

I've approached Simon Horman and he's happy to do the work for us.

Please let me know your thoughts

Thanks

Mark