Thanks for the help, Paolo. I appreciate it.

> Quoting only the introduction to the issue for the sake of brevity;
> you got the reason why you get nothing for minutes, then suddenly
> it's all there: buffering. Perhaps try with incremental steps if you
> have not already done that - instead of jumping from 1024 to 10240.
> Get the trade-off which better suits your scenario.
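For concreteness, I read that as stepping through values like these in pmacctd.conf (purely illustrative - plugin_buffer_size and plugin_pipe_size are the real directives, but the numbers are made up):

! Illustrative only: step the buffer up gradually, testing at each
! size for errors or corrupted counters, rather than jumping
! straight from 1024 to 10240.
plugin_buffer_size: 2048
plugin_pipe_size: 2048000
! ...then 4096/4096000, 8192/8192000, and so on.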
Yup, that's pretty much what my experiments have concluded. I may have been a little hasty in blaming the MySQL connector for anything other than jumping to random, but relatively limited, IP interfaces. Once I had enough allows in my server access list, the real problem became nicely clear: the eight-minute lag between when the aggregates are logically done (at the end of every minute) and when they land in the database.

> Overall, what peak Mbps is this installation about? Any pps figure?
> What I'm trying to figure out is why, by using buffers of 1K, you
> lose data. Any chance there is a concurrent process leaking full CPU
> cycles for a substantial amount of time which doesn't allow the
> daemon to cope with peaks of traffic?

Well, they run something like 30 Mbit/s, or around half a million packets a minute. But I aggregate down to only a dozen records a minute, using a very short list of subnets that each machine is responsible for. So: lots of packets in, but very few rows out. I need the result to be as close to real-time as possible, though. In fact, I would love to decrease the aggregation period to every ten seconds or so, but one minute is the lowest the documentation says is possible. If I have to generate more records in order to flush the buffer faster, I would prefer to increase the time resolution.

I've tuned plugin_buffer_size and discovered that the 'default' of 104 (I get that when I try to set it to zero to disable it - is that right?) is too small: errors occur and counters get corrupted. 1024 seems to work nicely at the moment, but I remember it was too small when I was aggregating by IP rather than by subnet. I've never had any errors at 10K or above, and I'd probably prefer to keep it there unless there's a reason to keep it low.

I've always had plugin_pipe_size significantly larger than plugin_buffer_size - usually megabytes in size. At least 10:1, more usually something like 500:1, but I've made it 10,000:1 at times. Not that either option has much effect on the time lag; they just cause errors if set too small.

Nothing I've done has significantly changed the eight-minute lag, which is consistent across all the machines. Running 'date' on a console gives a time eight to ten minutes ahead of the latest database record. If I restart pmacctd, I get an eight-minute gap in the data. If I restart every five minutes (I got impatient), rows just never get into the database. Now that I've had some quality time staring at debug logs, it seems pretty clear that there is an eight-minute queue from the aggregator to the MySQL connector. I'd love some way to flush that queue without generating more records.
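For what it's worth, if I'm reading the documentation right, the directives that govern when rows actually reach the database are separate from the buffer sizes - roughly this (my reading of the docs, not a tested recipe):

! sql_history sets the aggregation bins (one minute is the documented
! floor); sql_refresh_time sets how often the plugin's cache is
! purged to the database. If my reading is right, the second knob is
! where I'd hope to shave the lag down.
sql_history: 1m
sql_refresh_time: 60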
> On the persistency of the database connection; I'm open to
> discussion and comments on this. I also see it would apply just
> fine to you. But let me say some forewords:
>
> * pmacct comes from a persistent connection implementation (many
> years ago); this was dropped because too fragile when adopted as a
> general purpose solution. Hence migrating to a more stateless
> approach. This was for a mix of reasons, mainly: a) some conditions
> hard to detect: server was shut down not properly, firewall, NAT or
> load-balancers in the middle timing out the session or restarting,
> etc. b) communications with the database server always passing
> through 3rd party APIs; this easily translates in not having full
> control on things.

MySQL connections especially can be fragile, no argument there. Most database connections are, because database servers love to reset to a known state as soon as anything goes slightly wrong. Coping with that 'fail fast' attitude just takes a slightly different approach.

I like your pool of connection managers, but as well as being available to cope with high loads, I think a couple of them (the number set by a config option) should stay connected instead of all shutting down when idle: keep each connection alive by regularly executing a trivial query well inside the server's connection timeout, say, and if they lose the connection, try to reconnect. (There's a rough sketch of what I mean in the PS below.)

PHP uses a persistent connection pool to excellent effect. It works really, really well, in amazingly hostile environments. Basically, you treat the last existing connection as a valuable resource - not just for the setup and teardown costs, but because if you release a connection there is no guarantee you can get it back. The MySQL server has limited connection slots, and pmacctd may be on a machine with limited TCP/IP sockets when traffic gets heavy.

> * Adding a clean option in this sense might require quite some work
> to make it generally applicable, ie. not speaking about a quick fix
> but something which has to be ported (and tested working fine)
> across the multiple database software supported by pmacct.

I expect the same thing would apply to pmacctd daemons that connect to a remote server of any kind, whether it be postgres or even netflow: if sockets become limited at either end during peaks, you want to maintain one persistent connection, or at least be able to survive long disconnections.

--
Jeremy Lee BCompSci (Hons)
The Unorthodox Engineers
www.unorthodox.com.au
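PS: Here's a rough sketch (MySQL C API, untested; hostname and credentials are made up) of the keepalive-and-reconnect loop I have in mind - ping the server well inside its idle timeout, and rebuild the connection from scratch only when the ping fails:

#include <stdio.h>
#include <unistd.h>
#include <mysql.h>

/* Open a fresh connection; returns NULL on failure.
 * Host, user, password and database are placeholders. */
static MYSQL *db_connect(void)
{
    MYSQL *conn = mysql_init(NULL);
    if (!conn) return NULL;
    if (!mysql_real_connect(conn, "dbhost", "pmacct", "secret",
                            "pmacct", 0, NULL, 0)) {
        fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
        mysql_close(conn);
        return NULL;
    }
    return conn;
}

int main(void)
{
    MYSQL *conn = db_connect();

    for (;;) {
        if (!conn || mysql_ping(conn) != 0) {
            /* Connection lost: tear it down and rebuild, but keep
             * the daemon alive across long server outages. */
            if (conn) mysql_close(conn);
            conn = db_connect();
        }
        /* ...flush queued aggregate rows here when conn != NULL... */
        sleep(30); /* well inside a typical wait_timeout */
    }
}

The point is that the last connection gets treated as precious: it is only closed when it is demonstrably dead, and the daemon never gives up just because the server went away for a while.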
