Hello Chris and others,

  Some prior thoughts and then some comments.

  First of all, we are working with Paolo on releasing a threaded version of 
pmacct. This new release won't help much on single processors, but in this 
era of dual cores and HT we expect to gain some performance. We agree with 
Paolo that tasks should be threaded unless you want to greatly penalize the 
performance of traffic capturing. This version will be made public very soon 
so you guys can test it and give more ideas.

> Absolutely, I agree that there is an upper limit to the rate that you can
> insert into any database.

  A direct flow-to-SQL translation is a dead end, threads or no threads; the 
database won't be able to support it. IMHO, the SQL plugin should summarize 
the data in some way and do this periodically. In our case, with our flow 
tools based solution, we do this every 5 min. We aggregate as much as we can 
and we lose detail in a controlled way. The same (or similar) ideas should be 
used in pmacct, and this is something we are working with Paolo to include. If 
you start to reduce the time between each computation, you reduce the power of 
the summaries and, well, it's not easy.
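
  To make the periodic summary idea concrete, here is a rough Python sketch. 
It is not our code nor pmacct's; the flow tuple layout (timestamp, source, 
destination, bytes) and the name aggregate() are assumptions made only for 
illustration. Flows are grouped into 5 min periods and summed, so exact 
timestamps are lost but the number of rows to insert shrinks.

```python
from collections import defaultdict

BUCKET_SECONDS = 300  # 5 min period, as mentioned above


def aggregate(flows, bucket_seconds=BUCKET_SECONDS):
    """Sum bytes per (period start, src, dst).

    Each flow is a hypothetical (timestamp, src, dst, nbytes) tuple.
    Exact timestamps are dropped: that is the detail we give up.
    """
    totals = defaultdict(int)
    for ts, src, dst, nbytes in flows:
        period = ts - ts % bucket_seconds  # start of the period containing ts
        totals[(period, src, dst)] += nbytes
    return dict(totals)


flows = [
    (1000, "10.0.0.1", "10.0.0.2", 500),
    (1100, "10.0.0.1", "10.0.0.2", 700),
    (1250, "10.0.0.1", "10.0.0.2", 100),
    (1300, "10.0.0.3", "10.0.0.2", 50),
]
rows = aggregate(flows)  # four raw flows become three rows to insert
```

  A shorter period means fewer flows fall into the same row, which is why 
shrinking the interval weakens the summary.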

  So for real-time stuff we might need to think in other terms. I don't think 
SQL can cope with it (many times even RAM can't cope with it :) Maybe the 
solution could be some "in memory" database, but either way, it won't be easy.

  At the same time, Paolo has implemented some great features from AT&T 
papers. I believe they are a MUST in any production environment. We especially 
like the concept of holding a table at a fixed size, increasing it based on 
server resources, but always keeping it under control. If data arrives faster, 
either you summarize it (lose detail) or you lose it (lose all data). For 
example, in our case we just enter directly 2100 entries in each 5 min period 
for a total size of 300m entries in a day. If you receive more, then the 
smaller ones are grouped to reduce the number. Of course, if traffic is very 
hourly dependent you can increase this limit. I think Paolo called it 
"Sampling under constraints".
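
  As a rough illustration of grouping the smaller entries once a limit is 
reached, here is a Python sketch. It is a simplification, not the algorithm 
from the papers and not what pmacct does; cap_entries() and the "other" key 
are hypothetical names chosen for the example.

```python
def cap_entries(entries, max_entries):
    """Keep the largest entries and fold the rest into a single 'other' row.

    entries maps a key (e.g. a host) to a byte count. The result never has
    more than max_entries rows, and the total byte count is preserved.
    Assumes max_entries >= 1 and that no real key is named "other".
    """
    if len(entries) <= max_entries:
        return dict(entries)
    ranked = sorted(entries.items(), key=lambda kv: kv[1], reverse=True)
    kept = dict(ranked[:max_entries - 1])
    kept["other"] = sum(value for _, value in ranked[max_entries - 1:])
    return kept


entries = {"a": 100, "b": 50, "c": 5, "d": 3, "e": 1}
capped = cap_entries(entries, 3)  # "c", "d" and "e" are grouped together
```

  The totals stay correct; what is lost is the detail of who the small 
talkers were.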

> What about the OS queues for packets? Were they not effective? Was pmacct
> doing a database write for each packet? (I can see how that would kill it
> very quickly).

  You mean single-threaded or multithreaded?

> I would be much happier if there was a single thread that would flush data
> to the database at (up to) the maximum rate that the database could
> support. I don't think there is any benefit to multiple threads with
> MySQL; you will not actually be able to insert any more rows into the same
> table that way, at least with MyISAM tables.
>
> A single thread doesn't seem like it would be too hard to implement,
> perhaps as a configurable option. The thread would just start when pmacctd
> starts, sleep until the refresh time expires, and then flush all dirty
> records from memory to the database, then sleep again if necessary, i.e.
> unless the inserts took more than the sql_refresh_time.

  This is not a bad idea; actually we do it like this (well, we have 4 threads 
concurrently working on computing each probe). Another alternative would be 
one thread to compute and summarize the data and another one to store it into 
the database. Of course, data capturing and classification should be 
considered a different thread; I'm just referring to the SQL plugin.
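
  The single flush thread described in the quoted text could look roughly 
like the Python sketch below. This is only an illustration under assumptions: 
it is not pmacct code, PeriodicFlusher and its store callback are hypothetical 
names, and a list stands in for the database.

```python
import threading
import time


class PeriodicFlusher:
    """Accumulates counters in memory and flushes them from one thread."""

    def __init__(self, store, refresh_seconds):
        self._store = store              # hypothetical callback doing the inserts
        self._refresh = refresh_seconds  # plays the role of sql_refresh_time
        self._dirty = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def add(self, key, nbytes):
        # Called by the capture side; only touches memory, never the database.
        with self._lock:
            self._dirty[key] = self._dirty.get(key, 0) + nbytes

    def _flush(self):
        # Swap the table out under the lock, then store it without the lock.
        with self._lock:
            batch, self._dirty = self._dirty, {}
        if batch:
            self._store(batch)

    def _run(self):
        next_due = time.monotonic() + self._refresh
        # Sleep until the refresh time expires; if the previous flush overran
        # it, the timeout is 0 and the next flush starts right away.
        while not self._stop.wait(max(0.0, next_due - time.monotonic())):
            self._flush()
            next_due += self._refresh
        self._flush()  # final flush on shutdown so nothing stays in memory

    def stop(self):
        self._stop.set()
        self._thread.join()


batches = []  # stands in for the database
flusher = PeriodicFlusher(batches.append, refresh_seconds=0.05)
flusher.start()
flusher.add("10.0.0.1", 10)
flusher.add("10.0.0.1", 5)
flusher.add("10.0.0.2", 1)
flusher.stop()

totals = {}
for batch in batches:
    for key, value in batch.items():
        totals[key] = totals.get(key, 0) + value
```

  Note that the in-memory table here is unbounded, so it only works while the 
store keeps up with the incoming rate.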

> This way, we could get the maximum performance from the database (at least
> MySQL) without interfering with packet capture at all. If the database
> can't keep up with the sql_refresh_time, you simply get fewer
> updates/inserts, no data loss except temporal resolution.

  Mmmmm, I don't think you can avoid losing data. If you can only hold a given 
amount of it in memory and the SQL plugin can't cope with its rate, you will 
fill the space and start to lose information. Our approach is to reduce the 
volume of entries from raw data to SQL entries.
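
  A tiny Python sketch of the problem, with made-up numbers: a buffer that 
holds 3 records, 5 records arriving, and a consumer that is too slow to take 
anything out. The offer() helper and the stats dict are hypothetical names 
used only for this example.

```python
import queue


def offer(buf, item, stats):
    """Try to buffer one record; count it as dropped if the buffer is full."""
    try:
        buf.put_nowait(item)
        stats["accepted"] += 1
    except queue.Full:
        stats["dropped"] += 1


buf = queue.Queue(maxsize=3)  # fixed amount of memory for pending records
stats = {"accepted": 0, "dropped": 0}
for record in range(5):       # nothing is draining the buffer meanwhile
    offer(buf, record, stats)
```

  Once the buffer is full, every new record is simply gone, which is why we 
prefer to shrink the data before it reaches this point.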

> What do you think about that idea?

  As stated, we are working with Paolo on this and other stuff, but surely 
your ideas are much appreciated.


--------------------------------------------
Jaime Nebrera - [EMAIL PROTECTED]
Consultor TI - ENEO Tecnologia SL
Pol. PISA - C/ Manufactura 6, P1, 3B
Mairena del Aljarafe - 41927 - Sevilla
Telf.- (+34) 955 60 11 60 / 619 04 55 18

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
