Thanks for the help, Paolo. I appreciate it.

> Quoting only the introduction to the issue for the sake of brevity;
> you got the reason why you get nothing for minutes, then suddenly
> it's all there: buffering. Perhaps try with incremental steps if you
> have not already done that - instead of jumping from 1024 to 10240.
> Get the trade-off which better suits your scenario.
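For concreteness, I read that as stepping through values like these in pmacctd.conf (purely illustrative - plugin_buffer_size and plugin_pipe_size are the real directives, but the numbers are made up):

! Illustrative only: step the buffer up gradually, testing at each
! size for errors or corrupted counters, rather than jumping
! straight from 1024 to 10240.
plugin_buffer_size: 2048
plugin_pipe_size: 2048000
! ...then 4096/4096000, 8192/8192000, and so on.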
Yup, that's pretty much what my experiments have concluded. I may have been a little hasty in blaming the MySQL connector for anything other than jumping to random, but relatively limited, IP interfaces. Once I had enough allows in my server access list, the real problem became nicely clear: the eight-minute lag between when the aggregates are logically done (at the end of every minute) and when they land in the database.

> Overall, what peak Mbps is this installation about? Any pps figure?
> What I'm trying to figure out is why, by using buffers of 1K, you
> lose data. Any chance there is a concurrent process leaking full CPU
> cycles for a substantial amount of time which doesn't allow the
> daemon to cope with peaks of traffic?

Well, they run something like 30 Mbit/s, or around half a million packets a minute. But I aggregate down to only a dozen records a minute, using a very short list of subnets that each machine is responsible for. So: lots of packets in, but very few rows out. I need the result to be as close to real-time as possible, though. In fact, I would love to decrease the aggregation period to every ten seconds or so, but one minute is the lowest the documentation says is possible. If I have to generate more records in order to flush the buffer faster, I would prefer to increase the time resolution.

I've tuned plugin_buffer_size and discovered that the 'default' of 104 (I get that when I try to set it to zero to disable it - is that right?) is too small: errors occur and counters get corrupted. 1024 seems to work nicely at the moment, but I remember it was too small when I was aggregating by IP rather than by subnet. I've never had any errors at 10K or above, and I'd probably prefer to keep it there unless there's a reason to keep it low.

I've always had plugin_pipe_size significantly larger than plugin_buffer_size - usually megabytes in size. At least 10:1, more usually something like 500:1, but I've made it 10,000:1 at times. Not that either option has much effect on the time lag; they just cause errors if set too small.

Nothing I've done has significantly changed the eight-minute lag, which is consistent across all the machines. Running 'date' on a console gives a time eight to ten minutes ahead of the latest database record. If I restart pmacctd, I get an eight-minute gap in the data. If I restart every five minutes (I got impatient), rows just never get into the database. Now that I've had some quality time staring at debug logs, it seems pretty clear that there is an eight-minute queue from the aggregator to the MySQL connector. I'd love some way to flush that queue without generating more records.
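For what it's worth, if I'm reading the documentation right, the directives that govern when rows actually reach the database are separate from the buffer sizes - roughly this (my reading of the docs, not a tested recipe):

! sql_history sets the aggregation bins (one minute is the documented
! floor); sql_refresh_time sets how often the plugin's cache is
! purged to the database. If my reading is right, the second knob is
! where I'd hope to shave the lag down.
sql_history: 1m
sql_refresh_time: 60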
> On the persistency of the database connection; I'm open to
> discussion and comments on this. I also see it would apply just
> fine to you. But let me say some forewords:
>
> * pmacct comes from a persistent connection implementation (many
> years ago); this was dropped because too fragile when adopted as a
> general purpose solution. Hence migrating to a more stateless
> approach. This was for a mix of reasons, mainly: a) some conditions
> hard to detect: server was shut down not properly, firewall, NAT or
> load-balancers in the middle timing out the session or restarting,
> etc. b) communications with the database server always passing
> through 3rd party APIs; this easily translates in not having full
> control on things.

MySQL connections especially can be fragile, no argument there. Most database connections are, because database servers love to reset to a known state as soon as anything goes slightly wrong. Coping with that 'fail fast' attitude just takes a slightly different approach.

I like your pool of connection managers, but as well as being available to cope with high loads, I think a couple of them (the number set by a config option) should stay connected instead of all shutting down when idle: keep each connection alive by regularly executing a trivial query well inside the server's connection timeout, say, and if they lose the connection, try to reconnect. (There's a rough sketch of what I mean in the PS below.)

PHP uses a persistent connection pool to excellent effect. It works really, really well, in amazingly hostile environments. Basically, you treat the last existing connection as a valuable resource - not just for the setup and teardown costs, but because if you release a connection there is no guarantee you can get it back. The MySQL server has limited connection slots, and pmacctd may be on a machine with limited TCP/IP sockets when traffic gets heavy.

> * Adding a clean option in this sense might require quite some work
> to make it generally applicable, ie. not speaking about a quick fix
> but something which has to be ported (and tested working fine)
> across the multiple database software supported by pmacct.

I expect the same thing would apply to pmacctd daemons that connect to a remote server of any kind, whether it be postgres or even netflow: if sockets become limited at either end during peaks, you want to maintain one persistent connection, or at least be able to survive long disconnections.

--
Jeremy Lee BCompSci (Hons)
The Unorthodox Engineers
www.unorthodox.com.au
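PS: Here's a rough sketch (MySQL C API, untested; hostname and credentials are made up) of the keepalive-and-reconnect loop I have in mind - ping the server well inside its idle timeout, and rebuild the connection from scratch only when the ping fails:

#include <stdio.h>
#include <unistd.h>
#include <mysql.h>

/* Open a fresh connection; returns NULL on failure.
 * Host, user, password and database are placeholders. */
static MYSQL *db_connect(void)
{
    MYSQL *conn = mysql_init(NULL);
    if (!conn) return NULL;
    if (!mysql_real_connect(conn, "dbhost", "pmacct", "secret",
                            "pmacct", 0, NULL, 0)) {
        fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
        mysql_close(conn);
        return NULL;
    }
    return conn;
}

int main(void)
{
    MYSQL *conn = db_connect();

    for (;;) {
        if (!conn || mysql_ping(conn) != 0) {
            /* Connection lost: tear it down and rebuild, but keep
             * the daemon alive across long server outages. */
            if (conn) mysql_close(conn);
            conn = db_connect();
        }
        /* ...flush queued aggregate rows here when conn != NULL... */
        sleep(30); /* well inside a typical wait_timeout */
    }
}

The point is that the last connection gets treated as precious: it is only closed when it is demonstrably dead, and the daemon never gives up just because the server went away for a while.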
