Re: [pmacct-discussion] Native Elasticsearch backend development

Tim Jackson Tue, 10 Feb 2015 10:37:34 -0800

I don't output to JSON files and then import them, but I use Perl to do
basically the same thing: query the pmacct IMT for how long it's been
since it was last cleared, query it for data, clear it, add more data to
the record(s) based on some imports, and insert them into Elasticsearch.
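
The cycle described above (check the table age, dump, clear, enrich, insert) could be sketched roughly as follows. This is a minimal Python sketch, not the Perl script itself. The client flags used (-p, -i, -s, -l, -e, -O json) are as listed in the pmacct client's help output as far as I know; check them against `pmacct -h` for your version. The pipe path, the function names, and the assumption that JSON output arrives as one object per line are all illustrative.

```python
import json
import subprocess

PIPE = "/tmp/collect.pipe"  # assumption: the socket path set via imt_path


def pmacct(*args):
    """Run the pmacct client against the IMT socket and return its stdout."""
    cmd = ["pmacct", "-p", PIPE, *args]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def parse_json_lines(text):
    """Parse client output, assumed to hold one JSON object per line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("{"):  # skip blank lines and any non-JSON text
            records.append(json.loads(line))
    return records


def to_bulk_body(records, index, doc_type=None, enrich=lambda rec: rec):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON)."""
    action = {"_index": index}
    if doc_type is not None:  # only older, typed Elasticsearch releases need this
        action["_type"] = doc_type
    lines = []
    for rec in records:
        lines.append(json.dumps({"index": action}))
        lines.append(json.dumps(enrich(rec)))
    return "\n".join(lines) + "\n" if lines else ""


def poll_once(index):
    """One polling cycle: read the table age, dump the table, clear it."""
    age = pmacct("-i")                       # time since the last clear (raw text)
    dump = pmacct("-s", "-l", "-O", "json")  # full table, with locking
    pmacct("-e")                             # clear statistics for the next run
    return age, to_bulk_body(parse_json_lines(dump), index)
```

The body returned by `to_bulk_body` would be POSTed to the cluster's `_bulk` endpoint. Note that in this sketch anything accounted between the dump and the clear is dropped without being read.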

Using the IMT as a cache and pushing 5-minute averages into
Elasticsearch takes very little time (~5-10 seconds for our network,
which is 1:2048 sampled NetFlow v5 with a pretty large tuple of
aggregates, basically everything but source/dest IP). This was first
rolled out with 1-minute data, which was a huge amount of data; at that
interval each run of the insertion/classification Perl script took
about 4-5 seconds.

In my opinion, not having the extra data that I can insert into ES
makes things a lot harder, so a native client in pmacct would need the
ability to do some extra stuff:

 - Correlate in/out ifIndexes with some data (e.g. an ifIndex map)
 - Correlate tags with some data (e.g. port type, etc.)
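
Both kinds of correlation amount to dictionary lookups applied to each record before it is inserted. A minimal Python sketch, assuming lookup tables keyed by (router, ifIndex) and by tag value; the table contents, the `port_type` field, and the `enrich` function are hypothetical, while the output field names follow the sample document in this mail.

```python
# Hypothetical lookup tables; in practice these might be built from SNMP
# polls or an inventory export.
IFINDEX_MAP = {
    ("gw2.example", "555"): {"name": "ge-5/0/0.0", "descr": "[CDN] To example cluster"},
}
TAG_MAP = {
    "1486": {"port_type": "customer"},  # hypothetical tag meaning
}
UNKNOWN_IF = {"name": "Unknown", "descr": "Unknown"}


def enrich(router, stats):
    """Wrap raw pmacct counters in a document with interface and tag metadata."""
    in_if = IFINDEX_MAP.get((router, str(stats.get("iface_in"))), UNKNOWN_IF)
    out_if = IFINDEX_MAP.get((router, str(stats.get("iface_out"))), UNKNOWN_IF)
    doc = {
        "router": router,
        "inifname": in_if["name"],
        "inifdescr": in_if["descr"],
        "outifname": out_if["name"],
        "outifdescr": out_if["descr"],
        "stats": stats,
    }
    # Merge in whatever extra fields the tag maps to, if any.
    doc.update(TAG_MAP.get(str(stats.get("tag")), {}))
    return doc
```

An ifIndex with no entry falls back to "Unknown", as the out-interface fields do in the sample document.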

I think the idea of being able to do this natively in pmacct is great,
but I don't mind the small hit at all for being able to flexibly add
more data from other sources into this. Expanding pre-tagging would
be one way to do it, but I've also got bits where we actually look at
the source/dest IP and classify it based on our IPAM as well (but
don't ever store the source/dest IP). IMHO the flexibility of just
using the pmacct client to query is totally worth it.
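
Classifying on the address and then discarding it could look something like the sketch below. It is illustrative only: the prefix table stands in for an IPAM export and uses documentation prefixes, the labels and the "Off Net" default are made up, and the `ip_src`/`ip_dst` keys are assumed to be the names pmacct uses for the addresses in its output.

```python
import ipaddress

# Hypothetical IPAM export: prefix -> class label.
IPAM = {
    "192.0.2.0/24": "On Net CDN",
    "198.51.100.0/24": "Customer",
    "2001:db8::/32": "Customer",
}
NETWORKS = sorted(
    ((ipaddress.ip_network(prefix), label) for prefix, label in IPAM.items()),
    key=lambda item: item[0].prefixlen,
    reverse=True,  # try the most specific prefixes first
)


def classify(ip_text, default="Off Net"):
    """Return the label of the first (most specific) prefix containing the IP."""
    ip = ipaddress.ip_address(ip_text)
    for net, label in NETWORKS:
        if ip.version == net.version and ip in net:
            return label
    return default


def strip_ips(record):
    """Classify on the addresses, then drop them so they are never stored."""
    out = {k: v for k, v in record.items() if k not in ("ip_src", "ip_dst")}
    out["class_src"] = classify(record["ip_src"])
    out["class_dst"] = classify(record["ip_dst"])
    return out
```

A linear scan is fine for a short prefix list; a large IPAM would probably want a radix tree instead.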

Some of my early examples of how I did some of the parsing are here:

http://somuch.fail/~tjackson/flows_to_es/

The final document we store in Elasticsearch is:

{
  "_index": "flow-full-2015-02-10-13",
  "_type": "flowdata",
  "_id": "AUtzk0xQ3fLt3GYsjpA1",
  "_score": 0.7958426,
  "_source": {
    "inifname": "ge-5/0/0.0",
    "inifdescr": "[CDN] To XXXXXXXX Cluster",
    "@timestamp": 1423573202000,
    "inout": "Output",
    "avg_size": 79,
    "pps": 7,
    "stats": {
      "src_comms": "",
      "tcp_flags": "0",
      "bytes": 161792,
      "as_src": XXXX,
      "port_src": 36552,
      "ip_proto": "udp",
      "port_dst": 53,
      "tag2": "4",
      "iface_in": "555",
      "packets": 2048,
      "as_dst": XXXXX,
      "tos": 0,
      "iface_out": "542",
      "comms": "",
      "tag": "1486"
    },
    "region": 1000,
    "outifdescr": "Unknown",
    "router": "gw2.xxxx",
    "_timestamp": 1423573202000,
    "bps": 8902,
    "outifname": "Unknown",
    "class": "On Net CDN"
  }
}
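
Two of the values in the document above look derivable from the others: the @timestamp falls in hour 13 (UTC) of 2015-02-10, which matches the index suffix, and bytes divided by packets is 79, which matches avg_size. That is an inference from this one example rather than something stated in the mail, so treat the Python sketch below, including its function names and the "flow-full" default prefix, as an assumption about the scheme.

```python
from datetime import datetime, timezone


def hourly_index(ts_ms, prefix="flow-full"):
    """Name an hourly index from an epoch-millisecond timestamp (UTC)."""
    when = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return f"{prefix}-{when:%Y-%m-%d-%H}"


def avg_packet_size(stats):
    """Average packet size in bytes, guarding against a zero packet count."""
    return stats["bytes"] // stats["packets"] if stats["packets"] else 0


print(hourly_index(1423573202000))                          # flow-full-2015-02-10-13
print(avg_packet_size({"bytes": 161792, "packets": 2048}))  # 79
```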

--
Tim

On Tue, Feb 10, 2015 at 9:56 AM, Mike Bowie wrote:
> Good morning folks,
>
> First of all, my sincerest thanks to those who contribute, and have
> contributed previously to pmacct. It's a superb tool for us, and has given
> us considerably greater clarity of data than the commercial tools we've
> evaluated in the marketplace. We're a NetBSD shop, and save a minor patch[2]
> to execv, it builds and runs extremely well for us.
>
> Historically, we've dumped our pmacct data into pgsql, and been moderately
> happy with the results... we grok out what we need and all is well in the
> world.
>
> Recently, we've started to look at applying Elasticsearch and Kibana to the
> equation, based currently on the excellent Python based work of Pier Carlo
> Chiodi from https://github.com/pierky/pmacct-to-elasticsearch.
>
> As we look at this in more of a production sense, I'm keen to keep our
> moving parts, and dependencies to a minimum, so am looking at the
> possibility of writing[1] a native pmacct backend to interact with
> Elasticsearch.
>
> Before I get too far down this path, I'm interested to know if:
>  - Anyone is already engaged in a similar effort
>  - There is additional expertise out there which may be available
>  - There is any interest in seeing this sort of addition developed
>
> Any feedback welcome.

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
