Re: [pmacct-discussion] Classification

Chris Wilson Sun, 19 Nov 2006 09:01:11 -0800

Hi Sven and all,

On Fri, 17 Nov 2006, Sven Anderson wrote:


>>> - First, he clearly pointed out, that flow accounting in the conntrack
>>> module makes sense _only_ if you use conntrack anyway (like firewall,
>>> NAT, ...). To use conntrack just for flow accounting would be just
>>> "overkill", he wrote.
>>
>> Yes, and in our case we will be doing that anyway, because we want to
>> traffic shape flows.
>
> Ok, so you need conntrack also for traffic shaping, didn't know that. I
> thought it's independent.

You can do traffic shaping without connection tracking, if you only want 
to base traffic shaping decisions on the layer 3 properties of packets 
that the traffic shaping system understands. However, by integrating it 
with connection tracking using the iptables CLASSIFY target, you can make 
much more advanced decisions on the basis of almost anything, including 
interaction with user space, and layer 7 classification with l7filter.

> Since Netflow v9 also supports templates, I assumed that on the newer 
> Cisco machines it is possible to define, what keys the flow is 
> consisting of. However, if the fixed table serves your needs, why not.

The standard definition of a flow meets my needs, but I want more 
information about that flow before I can make an accurate traffic shaping 
decision.

>> I thought that pmacct had the ability to reclassify existing flows?
>
> What does "reclassify" mean? If your ClassID is really a flow key, then a 
> different ClassID means it's a different flow, so there is no 
> "reclassify". ;-) Maybe what you call flow is what I would call session? 
> (TCP connection for example?)

I think I read somewhere on this list that the class field (not sure what 
it's called in the database) can be changed while the flow is in progress, 
so I guess that would mean that it's not part of the flow key, and indeed 
I don't see why it should be. My understanding is that it makes most sense 
to identify flows first, and classify them second.
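
To illustrate that ordering, here is a minimal Python sketch. It is not
pmacct's actual data structure: FlowKey, FlowRecord, account() and
classify() are hypothetical names, and I am assuming the usual 5-tuple as
the flow key. The point is only that the class label lives in the record
rather than in the key, so it can change while the flow is in progress.

```python
from dataclasses import dataclass
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical flow key: the usual 5-tuple, with no class field."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str


@dataclass
class FlowRecord:
    """Mutable per-flow state; flow_class is deliberately not in the key."""
    packets: int = 0
    bytes: int = 0
    flow_class: str = "unknown"


flows: Dict[FlowKey, FlowRecord] = {}


def account(key: FlowKey, size: int) -> FlowRecord:
    """Step 1: identify the flow by its key and update its counters."""
    record = flows.setdefault(key, FlowRecord())
    record.packets += 1
    record.bytes += size
    return record


def classify(key: FlowKey, label: str) -> None:
    """Step 2: label (or relabel) an existing flow without touching its key."""
    flows[key].flow_class = label
```

A flow can be accounted, classified once enough packets have been seen,
and then accounted again, and it is still a single entry in the table.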

>> I want to shape bittorrent, gnutella and skype traffic without having 
>> to know what port it's running on.
>
> Think about whether it is really worth the trouble. I just read about 
> the "protocol obfuscation" in edonkey. I guess soon you won't be able 
> to identify it anymore.

Perhaps not, but then I can consider blocking unidentified flows entirely 
:-) Or else forcing users to use an application-level gateway that blocks 
such obfuscated protocols. They would then be forced to use a protocol 
that we understand and know how to deal with.

> Sorry, this is a misunderstanding. I didn't propose this as a solution 
> for that problem. I just don't like the strings in the MySQL scheme, and 
> hoped, when you are working on the MySQL plugin anyway, you could also 
> include this for me. ;-)

Well, personally I do quite like the strings, and I don't intend to change 
the ability to use them until MySQL has a native IP address column type 
(which would be very nice, if it ever happened). But I might find time to 
look into it if it's really important to you.
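
For reference, the two representations differ only by a conversion. A
minimal sketch using Python's standard ipaddress module (the helper names
ip_to_int and int_to_ip are mine, for illustration only; MySQL's
INET_ATON() and INET_NTOA() do the equivalent on the database side):

```python
import ipaddress


def ip_to_int(addr: str) -> int:
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(addr))


def int_to_ip(value: int) -> str:
    """Convert a 32-bit integer back to a dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(value))


print(ip_to_int("192.168.1.1"))  # 3232235777
print(int_to_ip(3232235777))     # 192.168.1.1
```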

> But nevertheless, it could still shift the point at which this problem 
> occurs.

It could, but I don't think it will actually save much processing power in 
writing to the database.

>>> While on the subject of changing everything: what about a different
>>> timestamp set? I would prefer three timestamps: one for the first and
>>> the last packet in a flow, and one for the time the flow got "closed"
>>> (or updated the last time) which would correspond to the "time-slot" the
>>> flow belongs to.

Thanks for the explanation. I quite like the ability to group by 
stamp_inserted to aggregate flows, so I wouldn't drop the stamp_inserted 
field. stamp_updated is kind of less useful to me, especially if you have 
the first_seen and last_seen fields, but for a flow that sees constant 
traffic during the life of a single flow record, stamp_first_seen ~= 
stamp_inserted, and stamp_last_seen ~= stamp_updated.

> I guess it's quite common to have certain points in time which
> you don't want to be crossed by a flow, like every full hour or every 5
> minutes or so. Then you have a clean cut, knowing that no flow starts
> before and ends after that point in time. These segments I called
> time-slots.

Do you mean "flow" or "flow record"? My understanding is that a flow lives 
as long as the TCP connection or logical equivalent, but we create a new 
flow record every sql_history period in order to record the progress of 
that flow over time.
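
To make the distinction concrete, here is a rough Python sketch of how one
long-lived flow maps onto several flow records. This is my own
illustration, not pmacct code: record_slots() is a hypothetical helper, the
times are plain seconds, and "period" stands in for the sql_history
interval.

```python
from typing import List


def record_slots(first_seen: int, last_seen: int, period: int = 300) -> List[int]:
    """Return the start time of every period-long slot the flow touches.

    One flow record would be written per returned slot.
    """
    if last_seen < first_seen:
        raise ValueError("last_seen must not be earlier than first_seen")
    # Round the first packet's time down to the start of its slot.
    start = first_seen - first_seen % period
    return list(range(start, last_seen + 1, period))
```

With a 300-second period, a flow whose packets span 8:03 to 8:12 touches
the slots starting at 8:00, 8:05 and 8:10, so it would produce three flow
records while remaining a single flow.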

> At the moment with pmacct I lose the information that the packets of the
> flow just have been observed between 8:03 and 8:04, in the table I only
> find the 8:00 and 8:05. In fact it would be enough to know the 8:03 and
> 8:04. But for a nice indexing of all flows in one time-slot it is maybe
> useful to also store the 8:00 or 8:05 with the flow.

That might be a nice half-way house between sql_history = 5m and 
sql_history = 1m.

> - Wait, somebody mentioned on this list before that you can also create 
> an index based on date_trunc(), so you don't even need it for that 
> purpose. So I suggest just first and last packet as time stamps.

Do you mean the first 'n' characters of the date field? That wouldn't be 
able to represent e.g. a 5-minute roundoff.
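
What I have in mind by a 5-minute roundoff is arithmetic on the minute
field rather than truncation to a fixed unit. A small Python sketch
(floor_to_slot is a name I made up, and it assumes the slot width divides
60 evenly):

```python
from datetime import datetime


def floor_to_slot(ts: datetime, minutes: int = 5) -> datetime:
    """Round ts down to the start of its slot within the hour.

    Assumes `minutes` divides 60 evenly (1, 5, 10, 15, ...).
    """
    if minutes <= 0 or 60 % minutes != 0:
        raise ValueError("minutes must be a positive divisor of 60")
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)
```

So packets seen at 8:03:27 and 8:04:59 both land in the 8:00 slot, while
8:05:00 starts the next one.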

> BTW: I think it's really a mistake to use local time for timestamps. 
> Why not use seconds since 1970/1/1 0:00 UTC? This is standard and 
> unambiguous.

I agree that this should be changed.
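
As a small illustration of why epoch seconds are unambiguous, here is a
Python sketch (to_epoch and from_epoch are hypothetical helper names, not
anything in pmacct). The same instant written in two different zones
converts to the same number:

```python
from datetime import datetime, timedelta, timezone


def to_epoch(ts: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since 1970-01-01 UTC."""
    if ts.tzinfo is None:
        # A naive local time is exactly the ambiguity being discussed.
        raise ValueError("naive datetime: the time zone is ambiguous")
    return int(ts.timestamp())


def from_epoch(seconds: int) -> datetime:
    """Convert epoch seconds back to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The same instant, once in UTC-8 and once in UTC.
local = datetime(2006, 11, 19, 9, 1, 11, tzinfo=timezone(timedelta(hours=-8)))
utc = datetime(2006, 11, 19, 17, 1, 11, tzinfo=timezone.utc)
assert to_epoch(local) == to_epoch(utc)
```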

Cheers, Chris.
-- 
(aidworld) chris wilson | chief engineer (http://www.aidworld.org)

_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists
