Hi Paolo,

Thanks for getting back with your answers. The current behaviour makes 
more sense to me now.

On Thu, 16 Nov 2006, Paolo Lucente wrote:

> In the very early days, pmacct had such a solution: just a main SQL 
> process that was doing everything, receiving packets from the Core 
> Process, getting in touch with the database and some extra stuff. The 
> overall result was poor.
>
> In the first instance, we should agree on the fact that the rate at which 
> data are captured from the network greatly outperforms the rate at which 
> even the best RDBMS engine is able to handle them (and this is why I 
> insist on a fact that should be common understanding: the less you 
> aggregate, the less you can insert in your database; past a 
> certain limit, you simply start losing data). Of course, this doesn't 
> apply to our home (DSL/Cable) connections.

Absolutely, I agree that there is an upper limit to the rate that you can 
insert into any database.

> Now, back to our synchronous approach: what was killing it was the 
> excessive slowness of the database and the concurrent arrival of packets 
> at very high rates.

What about the OS queues for packets? Were they not effective? Was pmacct 
doing a database write for each packet? (I can see how that would kill it 
very quickly).

> Keeping the two entities (network and DB) asynchronous, segregated, gave 
> a big relief and better performance (under normal conditions). Things 
> that were requiring to be written down at the time were: a way to 
> establish a kind of normal database behaviour in order to promptly react 
> to excessive slowness, how the process can understand when to give up 
> (pretty variable) and, of course, how to react, i.e. what to do if data 
> continue to accumulate.

By "requiring to be written down," do you mean that they are 
undefined behaviours or unsolved problems?

> An idea could be to let pmacct become polite by imposing a (configurable)
> maximum number of concurrent writers to the database (set by default to a
> widely accepted number, say, 10) - and stop relying on the system for this.
> Any extra writer will be dropped and data will be lost. We might also think
> of a second value, i.e. when to start accepting new writers again (say, 2-3),
> which in some cases could allow for a quicker recovery.
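
Just to check that I understand the proposal, here is roughly how I read 
it, as a small Python sketch. This is not pmacct code: WriterGate, 
max_writers and resume_at are names I made up for illustration, and the 
defaults are only the numbers you mention above. It also ignores locking 
between processes, so it only shows the accept/drop logic.

```python
class WriterGate:
    """Admission control for database writers, with two thresholds.

    Hypothetical illustration only; none of these names exist in pmacct.
    """

    def __init__(self, max_writers=10, resume_at=3):
        self.max_writers = max_writers  # hard cap on concurrent writers
        self.resume_at = resume_at      # accept again once active <= this
        self.active = 0                 # writers currently running
        self.saturated = False          # True while new writers are refused
        self.dropped = 0                # writers refused (their data is lost)

    def try_start(self):
        """Return True if a new writer may start, False if it is dropped."""
        if self.saturated:
            self.dropped += 1
            return False
        self.active += 1
        if self.active >= self.max_writers:
            # Cap reached: refuse further writers until we drain.
            self.saturated = True
        return True

    def finish(self):
        """Call when a running writer completes."""
        self.active -= 1
        if self.saturated and self.active <= self.resume_at:
            # Drained down to the lower threshold: accept writers again.
            self.saturated = False
```

With max_writers=3 and resume_at=1, a fourth writer is dropped, and new 
writers are only accepted again once two of the first three have finished.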

I would be much happier if there was a single thread that would flush data 
to the database at (up to) the maximum rate that the database could 
support. I don't think there is any benefit to multiple threads with 
MySQL; you will not actually be able to insert any more rows into the same 
table that way, at least with MyISAM tables.

A single thread doesn't seem like it would be too hard to implement, 
perhaps as a configurable option. The thread would just start when pmacctd 
starts, sleep until the refresh time expires, and then flush all dirty 
records from memory to the database, then sleep again if necessary, i.e. 
unless the inserts took more than the sql_refresh_time.

This way, we could get the maximum performance from the database (at least 
MySQL) without interfering with packet capture at all. If the database 
can't keep up with the sql_refresh_time, you simply get fewer 
updates/inserts, with no data loss other than reduced temporal resolution.
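
To make that concrete, here is a rough sketch of the loop I have in mind, 
in Python for brevity rather than C. Everything in it is an assumption for 
illustration: Flusher, account() and write_rows are hypothetical names, the 
cache is a plain dict of counters, and write_rows stands in for whatever 
actually issues the INSERT/UPDATE statements.

```python
import threading
import time


def account(cache, lock, key, nbytes):
    """Capture side: add a packet's bytes to the in-memory counter."""
    with lock:
        cache[key] = cache.get(key, 0) + nbytes


class Flusher(threading.Thread):
    """Single writer thread; hypothetical sketch, not pmacct code."""

    def __init__(self, cache, lock, write_rows, refresh_time):
        super().__init__(daemon=True)
        self.cache = cache                # aggregation key -> byte counter
        self.lock = lock                  # shared with the capture side
        self.write_rows = write_rows      # hypothetical DB write callable
        self.refresh_time = refresh_time  # seconds, like sql_refresh_time
        self.stop_event = threading.Event()

    def flush_once(self):
        """Swap out the dirty records, then write them without the lock."""
        with self.lock:
            dirty = dict(self.cache)
            self.cache.clear()
        if dirty:
            self.write_rows(dirty)
        return len(dirty)

    def run(self):
        delay = self.refresh_time
        # wait() returns True only when stop_event is set, ending the loop.
        while not self.stop_event.wait(delay):
            started = time.monotonic()
            self.flush_once()
            elapsed = time.monotonic() - started
            # Sleep only for what is left of the interval; if the flush
            # overran it, start the next flush immediately.
            delay = max(0.0, self.refresh_time - elapsed)
```

The lock is held only while the dirty records are copied out, so a slow 
database delays the next flush but not the capture side. Counters simply 
keep accumulating in memory until the writer comes round again.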

What do you think about that idea?

Cheers, Chris.
-- 
(aidworld) chris wilson | chief engineer (http://www.aidworld.org)

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
