Hi Sven and the rest,

>> A direct flow-to-SQL translation is a dead end, threads or no
>> threads; the database won't be able to support this.
>
> This conclusion is a bit too fast for me. What is a "direct flow to SQL
> translation"? In my terminology, whatever is written to the database in
> the end is a flow by definition. Of course, you can aggregate several
> flows into a new flow (for example by reducing their dimensions or time
> resolution), but it stays a flow in the end.

OK, I will try to explain this better. IMHO, just inserting flows into the database as they arrive will kill performance. We found some OSS projects doing this, and the results were not nice. So we decided to process the data every X minutes (5 minutes in our case) and try to reduce the number of rows. As you say, the achievable level of reduction clearly depends very much on the kind of traffic you are trying to monitor: if essentially every flow is unique, you can't reduce it much.

Besides the things I discuss below, one of the things we did was to remove or aggregate the "unprivileged port" information in the flows, so a typical browser opening multiple connections becomes just one entry (yes, multiple instances arrive, but they are then translated into a single "unprivileged -> 80" entry).

> Mhh, ok, you are probably talking about reducing time resolution. This
> works great if you have mainly persistent flows. But if you have high flow
> fluctuation (in the extreme case every flow is unique), you don't reduce
> the data by reducing the time resolution. Reality is somewhere in between
> and depends on the type of traffic, of course. But I remember that I was
> disappointed when I had the idea to reduce data that way. I think going
> from 5 min intervals to 1 hour intervals reduced the number of flows by
> just half. Not exciting.

This is one of the things we do; more above and below.

> What works a lot better in general is removing the small flows. You can
> remove about 95% of the flows by aggregating only 5% of the traffic (the
> small flows) into one single flow (by my own observation). What you lose
> is the detailed information about the "noise", but for normal network
> engineering this is probably not as important.

Well, this is precisely what we do.
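A minimal sketch of the two reductions discussed above (collapsing unprivileged ports and coarsening the time resolution), under assumptions of our own: flow records are simple tuples with hypothetical fields, any port at or above 1024 counts as "unprivileged" and is replaced by a wildcard, and start timestamps are rounded down to a fixed interval (300 s here, matching the 5-minute computation mentioned above). Records that end up with the same key are merged by summing their octets. The names Flow, UNPRIVILEGED, coarsen_port and aggregate are hypothetical; this is an illustration, not pmacct's or ENEO's actual code.

```python
from collections import defaultdict
from typing import NamedTuple

UNPRIVILEGED = -1  # hypothetical wildcard meaning "any port >= 1024"


class Flow(NamedTuple):
    """Hypothetical flow record; real collectors carry many more fields."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    start_ts: int  # seconds since the epoch
    octets: int


def coarsen_port(port: int) -> int:
    """Collapse ephemeral (unprivileged) ports into a single wildcard value."""
    return UNPRIVILEGED if port >= 1024 else port


def aggregate(flows, interval=300):
    """Merge flows sharing the same coarsened key, summing their octets."""
    totals = defaultdict(int)
    for f in flows:
        key = (
            f.src_ip,
            f.dst_ip,
            coarsen_port(f.src_port),
            coarsen_port(f.dst_port),
            f.proto,
            f.start_ts - f.start_ts % interval,  # start of the time bucket
        )
        totals[key] += f.octets
    return dict(totals)


if __name__ == "__main__":
    # A browser opening four parallel connections to the same web server.
    flows = [
        Flow("10.0.0.5", "192.0.2.10", 50000 + i, 80, "tcp", 1000 + i, 1500)
        for i in range(4)
    ]
    print(len(aggregate(flows)))  # prints 1
```

The port rule is deliberately crude: services listening on high ports (8080, for instance) would also be collapsed, so a real deployment would probably need an exception list. As Sven notes, the time-bucket part only pays off when flows persist across intervals.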

This is called Pareto :) From data we have collected at client sites, we confirm that 95% of the traffic volume is concentrated in ~5% of the flows (in our case, flows that are already processed a bit).

We configure a fixed number of entries that can enter the database as they are: the X biggest flows (again, after some prior computation; flow stat is the command, I think). For example, for a typical central office (around 50 Mbps) we get around 40,000 entries; of these we let just 3,000 through as they are, and the rest are "processed". The remaining entries (remember, they are already not direct flows) are then reduced further: the Internet IP is converted to 0.0.0.0 and the entries are aggregated. If the number of entries is still too big, we take the internal IP and transform it into "its network" to reduce it even further. This way you lose detail on the precise IP, but the application and network information is still valid and 100% precise.

Of course, Paolo showed us some papers from AT&T, and you just sent us this paper. They are probably much more "mathematically correct", but ours is not working badly so far.

> Two guys from UCSD developed an algorithm with adaptive filters to
> identify the big flows in real time:
> http://www.cs.ucsd.edu/~cestan/papers/estan-elephantsandmice.pdf

OK, we will take a look at it. Thanks for the reference.

> But no matter what method you use to reduce the data, it should certainly
> not happen in the SQL plugin. Data reduction can also be useful for other
> exports, like Netflow.

I think I said that: one thread to receive data, maybe another to compute the information, and surely a different one for any kind of classification (I mean pattern matching and such). From there, a different process could read the data and summarize it, and a different thread could store or send it (be it in RAM, NetFlow, SQL, ...).
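The reduction steps described above (keep the biggest flows, hide the Internet-side address, then widen internal hosts to their network) can be sketched as follows. Everything beyond the email's description is an assumption: flows are (internal_ip, external_ip, app, octets) tuples, "biggest" means most octets, max_rows is a hypothetical row budget for the database, and the internal network is taken to be a /24 purely for illustration. The names reduce_flows and _merge are hypothetical; top_n defaults to the 3,000 entries from the central-office example.

```python
import ipaddress
from collections import defaultdict

ANY = "0.0.0.0"  # placeholder for "any Internet address"


def _merge(rows):
    """Sum octets for rows sharing (internal, external, app)."""
    totals = defaultdict(int)
    for src, dst, app, octets in rows:
        totals[(src, dst, app)] += octets
    return [(s, d, a, o) for (s, d, a), o in totals.items()]


def reduce_flows(flows, top_n=3000, max_rows=5000, prefix_len=24):
    """Hypothetical sketch: keep the top_n largest flows verbatim and
    progressively aggregate the rest.

    flows: iterable of (internal_ip, external_ip, app, octets) tuples.
    """
    ranked = sorted(flows, key=lambda f: f[3], reverse=True)
    kept, rest = ranked[:top_n], ranked[top_n:]

    # Step 1: replace the Internet-side address with 0.0.0.0 and merge.
    rest = _merge((src, ANY, app, octets) for src, _dst, app, octets in rest)

    # Step 2: if still over budget, replace each internal host by its network.
    if len(kept) + len(rest) > max_rows:
        rest = _merge(
            (str(ipaddress.ip_network(f"{src}/{prefix_len}", strict=False)),
             dst, app, octets)
            for src, dst, app, octets in rest
        )
    return kept + rest
```

Total volume is preserved at every step; only the per-address detail of the tail is lost, which is the trade-off described above. For a few tens of thousands of entries per interval the full sort is cheap, though heapq.nlargest would avoid it when top_n is much smaller than the input.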

So I agree with you that those "reduction" techniques are valid not only for SQL but in general, but I also think that in the probe the standard "sampling techniques" might suffice (I mean those already available in pmacct, based on AT&T docs).

We are really looking forward to porting our solution to pmacct so we can really contribute to this great project. We especially like the fact that it is gaining quite a bit of momentum, and it seems Paolo is going to get quite a bit of help and ideas :)

Regards

--------------------------------------------
Jaime Nebrera - [EMAIL PROTECTED]
IT Consultant - ENEO Tecnologia SL
Pol. PISA - C/ Manufactura 6, P1, 3B
Mairena del Aljarafe - 41927 - Sevilla
Tel.- (+34) 955 60 11 60 / 619 04 55 18

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
