Re: [pmacct-discussion] MySQL connection issues

2009-10-05 Thread Paolo Lucente
Hi Jeremy,

On Mon, Oct 05, 2009 at 01:14:46PM +1000, Jeremy Lee wrote:

 I'm now discovering that there's often a huge time lag before the data
 goes into the database, ranging from a few minutes to up to an hour. I've
 got debug going, 20 seconds between SQL refreshes, and I get several:

Quoting only the introduction to the issue for the sake of brevity; you
got the reason why you get nothing for minutes and then suddenly it's all
there: buffering. Perhaps try incremental steps, if you haven't already
done so, instead of jumping straight from 1024 to 10240, and find the
trade-off that best suits your scenario.
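For what it's worth, the tuning above can be expressed in the configuration
roughly like this (a sketch only; the values are illustrative starting points
to be stepped up or down, not recommendations):

```
! illustrative pmacct buffering knobs; step values incrementally
plugins: mysql
plugin_buffer_size: 2048   ! try 1024, 2048, 4096, ... rather than jumping to 10240
plugin_pipe_size: 1048576  ! keep comfortably larger than plugin_buffer_size
```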

Overall, what peak Mbps is this installation handling? Any pps figure?
What I'm trying to figure out is why you lose data with 1K buffers.
Any chance a concurrent process is eating full CPU cycles for
substantial stretches of time, preventing the daemon from coping with
peaks of traffic?

On the persistence of the database connection: I'm open to discussion
and comments on this, and I can see it would apply just fine to your
case. But let me say a few words up front:

* pmacct started out with a persistent connection implementation (many
  years ago); this was dropped because it proved too fragile as a general
  purpose solution, hence the migration to a more stateless approach.
  This was for a mix of reasons, mainly: a) some conditions are hard to
  detect: the server being shut down uncleanly, firewalls, NAT or
  load-balancers in the middle timing out the session or restarting,
  etc.; b) communications with the database server always pass through
  third-party APIs, which easily translates into not having full control
  over things.

* Adding a clean option in this sense might require quite some work to
  make it generally applicable, i.e. we're not talking about a quick fix
  but something which has to be ported (and tested to work fine) across
  the multiple database backends supported by pmacct.

Cheers,
Paolo


___
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists


Re: [pmacct-discussion] MySQL connection issues

2009-10-05 Thread Jeremy Lee
Thanks for the help, Paolo. I appreciate it.

 Quoting only the introduction to the issue for the sake of brevity; you
 got the reason why you get nothing for minutes and then suddenly it's all
 there: buffering. Perhaps try incremental steps, if you haven't already
 done so, instead of jumping straight from 1024 to 10240, and find the
 trade-off that best suits your scenario.

Yup, that's pretty much what my experiments have concluded. I may have
been a little hasty in blaming the mysql connector for anything other
than jumping to random, but relatively limited, IP interfaces. Once I had
enough allows in my server access list, the real problem became nicely
clear: the eight-minute lag between when the aggregates are logically
done (at the end of every minute) and when they are written to the
database.

 Overall, what peak Mbps is this installation handling? Any pps figure?
 What I'm trying to figure out is why you lose data with 1K buffers.
 Any chance a concurrent process is eating full CPU cycles for
 substantial stretches of time, preventing the daemon from coping with
 peaks of traffic?

Well, they run something like 30 Mbit/s, or around half a million packets
a minute. But I aggregate down to only a dozen records a minute, using a
very short list of subnets that each machine is responsible for. So: lots
of packets in, but very few rows out. However, I need the result to be as
close to real-time as possible.
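As a quick sanity check on those figures (assuming an even packet rate,
which real traffic of course isn't):

```python
# Back-of-the-envelope check of the quoted traffic figures.
packets_per_minute = 500_000
mbit_per_second = 30

pps = packets_per_minute / 60                          # packets per second
avg_packet_bits = (mbit_per_second * 1_000_000) / pps  # bits per packet
avg_packet_bytes = avg_packet_bits / 8

print(round(pps))               # ~8333 packets/s
print(round(avg_packet_bytes))  # ~450 bytes average packet size
```

So a modest per-packet load for the daemon, which fits with buffering
rather than raw packet rate being the issue.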

In fact, I would love to decrease the aggregation interval to every ten
seconds or so, but one minute is the lowest the documentation says is
possible. If I have to generate more records in order to flush the buffer
faster, I would prefer to increase the time resolution.
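For the record, the two timers involved look roughly like this in the
config (illustrative values only; one minute being the documented minimum
for the history bins):

```
! sql_history sets the aggregation time bins,
! sql_refresh_time how often the plugin writes to the database
sql_history: 1m
sql_history_roundoff: m
sql_refresh_time: 60
```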

I've tuned plugin_buffer_size and discovered that the 'default' of 104
(which is what I get when I try to set it to zero to disable it; is that
right?) is too small: errors occur and counters get corrupted. 1024 seems
to work nicely at the moment, but I remember it was too small when I was
aggregating by IP rather than by subnet. I've never had any errors at 10K
or above, and I'd probably prefer to keep it there unless there's a
reason to keep it low.

I've always had plugin_pipe_size significantly larger than
plugin_buffer_size, usually megabytes in size: at least 10:1, more
usually something like 500:1, though I've made it 10,000:1 at times.
Neither option has much effect on the time lag; they just cause errors if
set too small.

Nothing I've done has significantly changed the eight-minute lag, which
is consistent across all the machines. Running 'date' on a console gives
a time eight to ten minutes ahead of the latest database record. If I
restart pmacctd, I get an eight-minute gap in the data. If I restart
every five minutes (I got impatient), rows just never make it into the
database.

Now that I've had some quality time staring at debug logs, it seems
pretty clear that there is an eight-minute queue between the aggregator
and the mysql connector. I'd love some way to flush that queue without
generating more records.

 On the persistence of the database connection: I'm open to discussion
 and comments on this, and I can see it would apply just fine to your
 case. But let me say a few words up front:

 * pmacct started out with a persistent connection implementation (many
   years ago); this was dropped because it proved too fragile as a general
   purpose solution, hence the migration to a more stateless approach.
   This was for a mix of reasons, mainly: a) some conditions are hard to
   detect: the server being shut down uncleanly, firewalls, NAT or
   load-balancers in the middle timing out the session or restarting,
   etc.; b) communications with the database server always pass through
   third-party APIs, which easily translates into not having full control
   over things.

MySQL connections especially can be fragile, no argument there. Most
database connections are, because database servers love to reset to a
known state as soon as anything goes slightly wrong. But coping with that
'fail fast' attitude just takes a slightly different approach.

I like your pool of connection managers, but as well as being available
to cope with high loads, I think a couple of them (set by a config
option) should remain connected instead of all shutting down when idle.
Keep the connection alive by regularly executing a cheap query well
within the server's connection timeout, say. And if they lose the
connection, try to reconnect.
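Something like this keepalive loop is what I have in mind (a sketch only;
the `connector` object with its `connect()` and `ping()` calls is a
stand-in for whatever the real database API provides, not pmacct code):

```python
import time

class KeptAliveConnection:
    """Sketch of a pooled connection that pings instead of disconnecting.

    `connector` is a hypothetical stand-in for the real database API:
    it must offer connect() -> handle and ping(handle) -> bool, with
    ping returning False when the link has died.
    """

    def __init__(self, connector, ping_interval=30):
        self.connector = connector
        self.ping_interval = ping_interval
        self.handle = connector.connect()

    def keep_alive(self):
        # Cheap liveness check well within the server's idle timeout;
        # on failure, reconnect rather than giving up the slot.
        if not self.connector.ping(self.handle):
            self.handle = self.connector.connect()

    def run(self, cycles):
        # Idle loop for a pooled manager with nothing to flush.
        for _ in range(cycles):
            self.keep_alive()
            time.sleep(self.ping_interval)
```

The point being that the connection is only ever re-established when the
ping actually fails, so a healthy idle connection costs one trivial
round-trip per interval.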

PHP uses a persistent connection pool to excellent effect. It works
really, really well, in amazingly hostile environments.

Basically, you treat the last existing connection as a valuable resource.
Not just for the setup and teardown costs, but because if you release a
connection there is no guarantee you can get it back. The MySQL server has
limited connection slots, and pmacctd may be on a machine with limited
TCP/IP sockets when traffic gets heavy.

 * Adding a clean option in this sense might require quite some work to
   make it generally applicable, i.e. we're not talking about a quick fix but