Hi Sven,

On Tue, 7 Nov 2006, Sven Anderson wrote:

> He gave the same talk at the Linux Symposium 2005; you can find the paper
> in the proceedings:
>
> http://www.linuxsymposium.org/2005/linuxsymposium_procv2.pdf

Great, thanks for that.

> - First, he clearly pointed out that flow accounting in the conntrack 
> module makes sense _only_ if you use conntrack anyway (firewall, 
> NAT, ...). Using conntrack just for flow accounting would be 
> "overkill", he wrote.

Yes, and in our case we will be doing that anyway, because we want to 
shape traffic on a per-flow basis.

> - Second, you are strictly bound to the classical flow keys that are 
> kept in the conntrack table anyway, that is, source and destination IP 
> address and port.
>
> So the usage of the flow-accounting module in conntrack is quite
> restricted, but as long as these restrictions don't bother you, it's a good
> alternative of course. (At the moment pmacct also has only a fixed flow
> data structure, but with the adoption of IPFIX I hope we will move to a
> more flexible structure.)

But this is also how NetFlow works, isn't it? The Cisco router has some 
idea about flows that isn't changeable externally, and it will send you 
updates about their state whenever it feels like it. I think that the 
kernel sending you information about its understanding of flows (which 
ctacctd would be free to reinterpret and aggregate) would work similarly.

> But for traffic-shaping based on application level analysis you have a
> problem already: You can classify packets, but you cannot store that
> information in the conntrack table as a flow key (AFAIK).

You can store it using connmark. I have to find a way to export that data 
to user space, but it shouldn't be hard once nfnetlink_conntrack exists.

> Of course you could store that information in another place and map it 
> to the flows in the conntrack table, but then the - let's call it - 
> "L7ClassID" is not a real flow key, since it is possible that one flow 
> (in the conntrack table) has several different L7ClassIDs over time, 
> in effect splitting it into different flows.

I don't mind that in practice. I could ignore the classification from the 
point of view of distinguishing flows. Also, I thought that pmacct had the 
ability to reclassify existing flows?
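A small sketch of the splitting effect described above, assuming a hypothetical per-packet record that carries the L7 class assigned at the time the packet was seen (unclassified at first, then "bittorrent" once the classifier matches). Keying on the class turns one connection into two records; leaving it out of the key keeps one.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class Packet(NamedTuple):
    five_tuple: tuple          # (proto, src_ip, src_port, dst_ip, dst_port)
    length: int                # bytes
    class_id: Optional[str]    # L7 class when the packet was seen, if any


def aggregate(packets: List[Packet], use_class_id: bool) -> Counter:
    """Sum bytes per flow key, optionally including the L7 class in the key."""
    totals: Counter = Counter()
    for p in packets:
        key = (p.five_tuple, p.class_id) if use_class_id else p.five_tuple
        totals[key] += p.length
    return totals
```

For a connection whose first packets are unclassified and whose later packets are tagged, `aggregate(..., use_class_id=True)` yields two entries for the same 5-tuple, while `use_class_id=False` yields a single entry with the combined byte count.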

> In general you have to ask yourself whether having both routing
> and monitoring on the same machine is a good idea. You will probably
> always end up in a situation where both functions interfere with
> each other. That's why I think having a dedicated metering probe is in
> most cases the better choice. And then, as the machine is not doing
> anything else with the monitored packets, handling everything in
> user-space is the better approach. Under Linux you can even optimize the
> network-adapter->user-space transition with PF_RING by Luca Deri. Of
> course, you cannot use this set-up if you want to do traffic shaping or
> similar based on the monitoring.

Yes, that is exactly what I want to do. I want to shape BitTorrent, 
Gnutella and Skype traffic without having to know what port it's running 
on.

>> I'm still concerned about the performance of the MySQL plugin with 
>> threading, so I'm considering providing an option to disable the extra 
>> threads, and run updates synchronously.
>
> Interesting. What about having also a switch to have "numbers-only" 
> tables, that is IP addresses, timestamps, class_id, mac addresses and 
> protocol are all stored as integers?

I don't see how that would help. It's basically just changing the constant 
multiplier cost. The problem I'm having is that when the database or the 
box is busy, pmacct starts spawning more and more threads that end up 
sleeping on the database. This eats resources and can lead to catastrophic 
failure (it has done it to me at least once). I would rather delay writing 
to the database by having it done synchronously, to limit the damage that 
it can do to the rest of the box.
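A minimal sketch of the synchronous alternative, assuming the collector calls one object for every accounted packet and that `write_rows` stands in for the SQL insert/update step. `SyncFlusher` and `write_rows` are hypothetical names; this is not pmacct's code. The point is that at most one write is ever in flight: while the database is slow, the collector stops accounting instead of spawning yet another thread that also sleeps on the database.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

Row = Tuple[str, int]  # hypothetical (flow key, bytes) row


class SyncFlusher:
    """Accumulate counters in memory and write them out synchronously."""

    def __init__(self, write_rows: Callable[[Iterable[Row]], None],
                 interval: float):
        self.write_rows = write_rows  # stand-in for the database write
        self.interval = interval      # seconds between flushes
        self.cache: Dict[str, int] = {}
        self.next_flush = time.monotonic() + interval

    def account(self, key: str, nbytes: int) -> None:
        self.cache[key] = self.cache.get(key, 0) + nbytes
        if time.monotonic() >= self.next_flush:
            self.flush()

    def flush(self) -> None:
        rows = list(self.cache.items())
        # Blocks the caller until the database is done; no thread is created.
        self.write_rows(rows)
        # Clear only after a successful write, so a failed write keeps the data.
        self.cache.clear()
        # Schedule from the end of the write: a slow database delays the next
        # flush rather than letting flushes pile up.
        self.next_flush = time.monotonic() + self.interval
```

The cost is that a busy database delays accounting (and, if the capture buffer overflows, may drop packets) instead of consuming memory and threads, which is the trade-off argued for above.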

> While on the subject of changing everything: what about a different 
> timestamp set? I would prefer three timestamps: one each for the first and
> last packets in a flow, and one for the time the flow got "closed"
> (or was last updated), which would correspond to the "time slot" the
> flow belongs to. The third one is probably not really necessary, as you
> can calculate it from the other timestamps and the configuration, but it
> would give you a good index key for the time slots.

Sorry, I don't understand what you mean by a time slot. For me, the 
relevant information is the start and end times of the flow, which I can 
use to draw graphs, etc.

Ideally, I would like more detailed information about the flow at various 
points during its life (e.g. status every minute) and I'm not sure if I 
can get that using pmacctd, or how. I'm still working on it.
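Sven's third timestamp can be derived, as he says, from the packet timestamps and the configured slot length. A minimal sketch, assuming fixed-length slots aligned to the Unix epoch; the function names and the `slot_len` parameter are hypothetical. If a collector writes one record per flow per slot it touches, then with `slot_len = 60` those records would be the per-minute status mentioned above.

```python
from typing import List


def time_slot(ts: int, slot_len: int) -> int:
    """Start (Unix seconds) of the fixed-length slot that contains ts."""
    return ts - ts % slot_len


def slots_spanned(first_ts: int, last_ts: int, slot_len: int) -> List[int]:
    """Start of every slot a flow touches between its first and last packet."""
    start = time_slot(first_ts, slot_len)
    end = time_slot(last_ts, slot_len)
    return list(range(start, end + 1, slot_len))
```

A flow whose first packet arrives 30 seconds into one minute and whose last packet arrives 50 seconds into the minute two minutes later touches three one-minute slots, so it would produce three per-slot records.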

Cheers, Chris.
-- 
(aidworld) chris wilson | chief engineer (http://www.aidworld.org)

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
