I don't output to JSON files and then import them, but I use Perl to do basically the same thing: query the pmacct IMT for how long it's been since it was last cleared, query it for data, clear it, add more data to the record(s) based on some imports, and insert them into Elasticsearch.
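A rough sketch of that query/clear/enrich/insert loop, in Python rather than the Perl we actually run. Everything here is an assumption for illustration: the function names are made up, the four callables stand in for whatever talks to the pmacct client and to Elasticsearch, and the rate formulas (bits per second, packets per second, average packet size over the time since the last clear) are a guess at a sensible calculation, not a copy of the real script.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def with_rates(record: Record, interval_s: int) -> Record:
    """Wrap one raw IMT record in a document with per-second rates.

    Assumed formulas (not taken from the original script):
    bps = bytes * 8 / interval, pps = packets / interval,
    avg_size = bytes / packets, all truncated to integers.
    """
    nbytes = int(record["bytes"])
    npackets = int(record["packets"])
    return {
        "stats": dict(record),
        "bps": nbytes * 8 // interval_s,
        "pps": npackets // interval_s,
        "avg_size": nbytes // npackets if npackets else 0,
    }


def run_cycle(
    seconds_since_clear: Callable[[], int],
    fetch: Callable[[], Iterable[Record]],
    clear: Callable[[], None],
    index: Callable[[List[Record]], None],
) -> int:
    """Run one pass and return the number of documents handed to index()."""
    # 1. Ask how long the table has been accumulating (guard against 0).
    interval_s = max(int(seconds_since_clear()), 1)
    # 2. Read everything currently in the table.
    records = list(fetch())
    # 3. Clear the table so the next interval starts from zero.
    clear()
    # 4. Add derived fields, then 5. hand the documents to the indexer.
    docs = [with_rates(r, interval_s) for r in records]
    index(docs)
    return len(docs)
```

Passing the four steps in as callables keeps the loop itself free of any pmacct or Elasticsearch specifics, so the same skeleton works whether the data comes from the client binary or from a test fixture.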
Using the IMT as a cache and pushing 5 minute averages into Elasticsearch takes very little time: ~5-10 seconds for our network, which is 1:2048 sampled NetFlow v5 with a pretty large tuple of aggregates (basically everything but source/dest IP). This was first rolled out with 1 minute data, which was a huge amount of data, but at 1 minute intervals the insertion/classification Perl script took about 4-5 seconds per run.

In my opinion, not having the extra data that I can insert into ES makes things a lot harder, so a native client in pmacct would need the ability to do some extra stuff:
- Correlate in/out ifIndexes with some data (e.g. an ifIndex map)
- Correlate tags with some data (e.g. port type, etc.)

I think the idea of being able to do this natively in pmacct is great, but I don't mind the small hit at all for being able to flexibly add more data from other sources. Expanding pre-tagging would be one way to do it, but I've also got bits where we actually look at the source/dest IP and classify it based on our IPAM as well (but don't ever store the source/dest IP). IMHO the flexibility of just using the pmacct client to query is totally worth it.
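To make the correlation part concrete, here is a minimal Python sketch of that kind of enrichment. It is not the Perl we run: the function names, the shape of the ifIndex map (keyed by router and ifIndex), the tag map (tag to a dict of extra fields such as a hypothetical "port_type"), the IPAM list of prefix/label pairs, and the "source first, then destination" classification rule are all assumptions made for the example.

```python
import ipaddress
from typing import Any, Dict, List, Optional, Tuple, Union

UNKNOWN = "Unknown"
Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
IfMap = Dict[Tuple[str, str], Tuple[str, str]]  # (router, ifIndex) -> (name, descr)


def classify(ip: Optional[str], ipam: List[Tuple[Network, str]]) -> Optional[str]:
    """Return the label of the longest matching IPAM prefix, or None."""
    if not ip:
        return None
    addr = ipaddress.ip_address(ip)
    matches = [(net.prefixlen, label) for net, label in ipam
               if net.version == addr.version and addr in net]
    return max(matches)[1] if matches else None


def enrich(record: Dict[str, Any], router: str, if_map: IfMap,
           tag_map: Dict[str, Dict[str, Any]],
           ipam: List[Tuple[Network, str]]) -> Dict[str, Any]:
    """Build the document to store from one raw flow record."""
    stats = dict(record)
    # Pull the addresses out so they are classified but never stored.
    src = stats.pop("ip_src", None)
    dst = stats.pop("ip_dst", None)

    # Correlate in/out ifIndexes with interface names and descriptions.
    in_name, in_descr = if_map.get(
        (router, str(stats.get("iface_in"))), (UNKNOWN, UNKNOWN))
    out_name, out_descr = if_map.get(
        (router, str(stats.get("iface_out"))), (UNKNOWN, UNKNOWN))

    doc: Dict[str, Any] = {
        "router": router,
        "inifname": in_name, "inifdescr": in_descr,
        "outifname": out_name, "outifdescr": out_descr,
        "stats": stats,
    }
    # Correlate the tag with whatever extra fields are known for it.
    doc.update(tag_map.get(str(stats.get("tag")), {}))
    # Assumed rule: use the source address if IPAM knows it, else the destination.
    doc["class"] = classify(src, ipam) or classify(dst, ipam) or UNKNOWN
    return doc
```

Interfaces that are missing from the map fall back to "Unknown", and the addresses are dropped before the document is built, so they only ever influence the class field.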
Some of my early examples of how I did some of the parsing are here:
http://somuch.fail/~tjackson/flows_to_es/

The final document we store in Elasticsearch is:

{
  "_index": "flow-full-2015-02-10-13",
  "_type": "flowdata",
  "_id": "AUtzk0xQ3fLt3GYsjpA1",
  "_score": 0.7958426,
  "_source": {
    "inifname": "ge-5/0/0.0",
    "inifdescr": "[CDN] To XXXXXXXX Cluster",
    "@timestamp": 1423573202000,
    "inout": "Output",
    "avg_size": 79,
    "pps": 7,
    "stats": {
      "src_comms": "",
      "tcp_flags": "0",
      "bytes": 161792,
      "as_src": XXXX,
      "port_src": 36552,
      "ip_proto": "udp",
      "port_dst": 53,
      "tag2": "4",
      "iface_in": "555",
      "packets": 2048,
      "as_dst": XXXXX,
      "tos": 0,
      "iface_out": "542",
      "comms": "",
      "tag": "1486"
    },
    "region": 1000,
    "outifdescr": "Unknown",
    "router": "gw2.xxxx",
    "_timestamp": 1423573202000,
    "bps": 8902,
    "outifname": "Unknown",
    "class": "On Net CDN"
  }
}

--
Tim

On Tue, Feb 10, 2015 at 9:56 AM, Mike Bowie <[email protected]> wrote:
> Good morning folks,
>
> First of all, my sincerest thanks to those who contribute, and have
> contributed previously to pmacct. It's a superb tool for us, and has given
> us considerably greater clarity of data than the commercial tools we've
> evaluated in the marketplace. We're a NetBSD shop, and save a minor patch[2]
> to execv, it builds and runs extremely well for us.
>
> Historically, we've dumped our pmacct data into pgsql, and been moderately
> happy with the results... we grok out what we need and all is well in the
> world.
>
> Recently, we've started to look at applying Elasticsearch and Kibana to the
> equation, based currently on the excellent Python based work of Pier Carlo
> Chiodi from https://github.com/pierky/pmacct-to-elasticsearch.
>
> As we look at this in more of a production sense, I'm keen to keep our
> moving parts, and dependencies to a minimum, so am looking at the
> possibility of writing[1] a native pmacct backend to interact with
> Elasticsearch.
>
> Before I get too far down this path, I'm interested to know if:
> - Anyone is already engaged in a similar effort
> - There is additional expertise out there which may be available
> - There is any interest in seeing this sort of addition developed
>
> Any feedback welcome.

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
