Re: [pmacct-discussion] Classification

Sven Anderson Thu, 16 Nov 2006 09:46:03 -0800

Hi Jaime and all,

Jaime Nebrera, 16.11.2006 12:39:
> A direct flow to SQL translation is a dead end, threads or no
> threads; the database won't be able to support this.


That conclusion is a bit too hasty for me. What is a "direct flow to SQL
translation"? In my terminology, whatever is written to the database in
the end is a flow by definition. Of course, you can aggregate several
flows into a new flow (for example by reducing their dimensions or their
time resolution), but the result is still a flow.
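
As a rough illustration of that point, here is a minimal Python sketch of
both kinds of aggregation. The record layout, the addresses and the
function name are made up for the example; none of it is pmacct code.

```python
from collections import defaultdict

# Hypothetical flow records: (start_time_s, src_ip, dst_ip, dst_port, bytes)
flows = [
    (0,   "10.0.0.1", "10.0.1.1", 80,  1000),
    (60,  "10.0.0.1", "10.0.1.1", 443, 500),
    (200, "10.0.0.1", "10.0.1.1", 80,  2000),
    (400, "10.0.0.2", "10.0.1.1", 80,  300),
]

def aggregate(flows, bucket_s, keep_port=True):
    """Sum bytes per (time bucket, key); the result is again a set of flows."""
    totals = defaultdict(int)
    for start, src, dst, port, nbytes in flows:
        bucket = start - start % bucket_s          # coarser time resolution
        if keep_port:
            key = (bucket, src, dst, port)
        else:
            key = (bucket, src, dst)               # one dimension dropped
        totals[key] += nbytes
    return dict(totals)

by_5min = aggregate(flows, 300)
by_5min_no_port = aggregate(flows, 300, keep_port=False)
print(len(flows), len(by_5min), len(by_5min_no_port))
```

Each entry of the result has the same shape as an input record, just with
a coarser key, which is why it can still be called a flow.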

> IMHO, the SQL plugin should summarize data some way and do this on a 
> periodic fashion.

What does "summarize" mean here? Collecting the flows so that they can all
be written out in a single DB access, or really summarizing, that is,
aggregating along the time axis and thereby reducing the time resolution?

> In our case, with our flow-tools based solution we do this every 5 min.
> We aggregate as much as we can and we lose detail under control. The
> same (or similar) ideas should be used in pmacct and this is something
> we are working with Paolo to include. If you start to reduce the time
> between each computing you reduce the power of summaries and well, not
> easy.

Mhh, ok, you are probably talking about reducing the time resolution. This
works great if you have mainly persistent flows. But if you have high flow
fluctuation (in the extreme case every flow is unique), you don't reduce
the data by reducing the time resolution. Reality is somewhere in between
and depends on the type of traffic, of course. But I remember being
disappointed when I tried to reduce data that way. I think going from
5 min intervals to 1 hour intervals only cut the number of flows roughly
in half. Not exciting.
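
The two extremes can be shown with a small Python sketch. The flow labels
and the helper name are hypothetical, and the numbers are constructed for
the example, not measured.

```python
def rows_after_bucketing(records, bucket_s):
    """Count the distinct (time bucket, flow key) rows that remain."""
    return len({(t - t % bucket_s, key) for t, key in records})

# One persistent flow, seen in each of the twelve 5-minute intervals of an hour.
persistent = [(i * 300, "A->B") for i in range(12)]
# Twelve short-lived flows, each with its own key.
unique = [(i * 300, f"host{i}->B") for i in range(12)]

for name, records in (("persistent", persistent), ("unique", unique)):
    print(name,
          rows_after_bucketing(records, 300),
          rows_after_bucketing(records, 3600))
```

The persistent flow collapses from 12 rows to 1 when moving from 5 min to
1 hour buckets, while the unique flows stay at 12 rows either way.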

What works a lot better in general is removing the small flows. By my own
observation, you can remove about 95% of the flows by folding the smallest
flows, which together carry only about 5% of the traffic, into one single
flow. What you lose is the detailed information about the "noise", but for
normal network engineering this is probably not that important.
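
A minimal Python sketch of this kind of reduction follows. The function
name, the "other" key and the sample byte counts are assumptions made for
the example; the 5% budget is the figure mentioned above.

```python
def fold_small_flows(flow_bytes, traffic_fraction=0.05):
    """Fold the smallest flows into one 'other' flow.

    Flows are folded smallest first for as long as the folded bytes stay
    within traffic_fraction of the total traffic.
    """
    total = sum(flow_bytes.values())
    budget = total * traffic_fraction
    kept = dict(flow_bytes)
    other = 0
    for key, nbytes in sorted(flow_bytes.items(), key=lambda kv: kv[1]):
        if other + nbytes > budget:
            break                      # next flow would exceed the budget
        other += nbytes
        del kept[key]
    if other:
        kept["other"] = other          # hypothetical key for the folded flows
    return kept

# Two large flows plus twenty small ones (made-up byte counts).
sample = {"big1": 6000, "big2": 3600}
sample.update({f"small{i}": 20 for i in range(20)})

reduced = fold_small_flows(sample)
print(len(sample), "->", len(reduced))
```

The byte total is preserved; only the per-flow detail of the small flows
is given up.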

Two guys from UCSD developed an algorithm with adaptive filters to
identify the big flows in real time:
http://www.cs.ucsd.edu/~cestan/papers/estan-elephantsandmice.pdf
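
To give an idea of how big flows can be picked out in real time with
little memory, here is a simplified Python sketch of a multistage filter:
several counter arrays indexed by independent hashes of the flow ID, where
a flow is only tracked individually once all of its counters pass a
threshold. This is a sketch under the assumption that the linked paper
uses a filter of this general kind; the class name, parameters and hashing
scheme are invented for the example, and the adaptive threshold handling
is left out.

```python
import hashlib

class MultistageFilter:
    def __init__(self, stages=4, buckets=1024, threshold=100_000):
        self.stages = stages
        self.buckets = buckets
        self.threshold = threshold          # bytes needed in every stage
        self.counters = [[0] * buckets for _ in range(stages)]
        self.big_flows = {}                 # flow_id -> bytes since detection

    def _bucket(self, stage, flow_id):
        # One hash per stage, derived by salting the flow ID with the stage.
        digest = hashlib.sha256(f"{stage}:{flow_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.buckets

    def add_packet(self, flow_id, nbytes):
        if flow_id in self.big_flows:       # already tracked individually
            self.big_flows[flow_id] += nbytes
            return
        passed = True
        for s in range(self.stages):
            b = self._bucket(s, flow_id)
            self.counters[s][b] += nbytes
            if self.counters[s][b] < self.threshold:
                passed = False
        if passed:                          # all stages at or above threshold
            self.big_flows[flow_id] = nbytes

f = MultistageFilter(threshold=10_000)
for i in range(50):                         # 50 mice, 100 bytes each
    f.add_packet(f"mouse{i}", 100)
for _ in range(20):                         # one elephant, 20 x 1000 bytes
    f.add_packet("elephant", 1000)
print(sorted(f.big_flows))
```

Small flows can only be reported by mistake if they share a counter with
heavy traffic in every stage, which becomes unlikely as the number of
stages grows.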

But no matter what method you use to reduce the data, it should certainly
not happen in the SQL plugin. Data reduction can also be useful for other
exports, like NetFlow, so it belongs before the export plugins.


Cheers,

Sven

-- 
Sven Anderson
Institute for Informatics - http://www.ifi.informatik.uni-goettingen.de
Georg-August-Universitaet Goettingen
Lotzestr. 16-18, 37083 Goettingen, Germany

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
