Hi Chris and all,

Chris Wilson, 08.11.2006 08:00:
>> - First, he clearly pointed out, that flow accounting in the conntrack 
>> module makes sense _only_ if you use conntrack anyway (like firewall, 
>> NAT, ...). To use conntrack just for flow accounting would be just 
>> "overkill", he wrote.
> 
> Yes, and in our case we will be doing that anyway, because we want to 
> traffic shape flows.

OK, so you also need conntrack for traffic shaping; I didn't know that. I
thought the two were independent.

> But this is also how Netflow works, isn't it? The Cisco router has some 
> idea about flows that isn't changeable externally, and it will send you 
> updates about their state whenever it feels like it. I think that the 
> kernel sending you information about its understanding of flows (which 
> ctacctd would be free to reinterpret and aggregate) would work similarly.

Since NetFlow v9 also supports templates, I assumed that on newer Cisco
machines it is possible to define which keys a flow consists of. However,
if the fixed table serves your needs, why not.

> I don't mind that in practice. I could ignore the classification from the 
> point of view of distinguishing flows. Also, I thought that pmacct had the 
> ability to reclassify existing flows?

What does "reclassify" mean? If your ClassID is really a flow key, then a
different ClassID means a different flow, so there is nothing to
"reclassify". ;-)
Maybe what you call a flow is what I would call a session (a TCP
connection, for example)?

BTW, that is really confusing. There is a "flow" column in the pmacct
tables, which in my terminology would always have to be "1", since one row
in the table always corresponds to exactly one flow. I assume the number
there is in fact the number of transport-layer sessions within that flow,
such as TCP connections or UDP timeouts. Is that right, Paolo?

> Yes, that is exactly what I want to do. I want to shape bittorrent, 
> gnutella and skype traffic without having to know what port it's running 
> on.

Think about whether it is really worth the trouble. I just read about the
"protocol obfuscation" in eDonkey. I guess soon you won't be able to
identify it anymore.

>>> I'm still concerned about the performance of the MySQL plugin with 
>>> threading, so I'm considering providing an option to disable the extra 
>>> threads, and run updates synchronously.
>> Interesting. What about having also a switch to have "numbers-only" 
>> tables, that is IP addresses, timestamps, class_id, mac addresses and 
>> protocol are all stored as integers?
> 
> I don't see how that would help. It's basically just changing the constant 
> multiplier cost. The problem I'm having is that when the database or the 
> box is busy, pmacct starts spawning more and more threads that end up 
> sleeping on the database. This eats resources and can lead to catastrophic 
> failure (it has done it to me at least once). I would rather delay writing 
> to the database by having it done synchronously, to limit the damage that 
> it can do to the rest of the box.

Sorry, this is a misunderstanding. I didn't propose this as a solution to
that problem. I just don't like the strings in the MySQL schema, and
hoped that, since you are working on the MySQL plugin anyway, you could
also include this for me. ;-) Nevertheless, it could still shift the point
at which this problem occurs.
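
To show what I mean by "numbers-only", here is a minimal Python sketch of
the conversions involved. The helper names are my own invention and this is
not how pmacct stores anything today; for IPv4, MySQL's INET_ATON() and
INET_NTOA() do the same conversion on the database side.

```python
import ipaddress


def ip_to_int(addr: str) -> int:
    """Convert an IP address string to its integer value."""
    return int(ipaddress.ip_address(addr))


def int_to_ip(value: int) -> str:
    """Convert a 32-bit integer back to a dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(value))


def mac_to_int(mac: str) -> int:
    """Convert a colon-separated MAC address to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


if __name__ == "__main__":
    print(ip_to_int("192.168.0.1"))
    print(int_to_ip(ip_to_int("192.168.0.1")))
    print(mac_to_int("00:11:22:33:44:55"))
```

Integer columns of this kind are fixed-width, which is why I would expect
them to be cheaper to index and compare than strings.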

>> While on the subject of changing everything: what about a different 
>> timestamp set? I would prefer three timestamps: one for the first and 
>> the last packet in a flow, and one for the time the flow got "closed" 
>> (or updated the last time) which would correspond to the "time-slot" the 
>> flow belongs to. The third one is probably not really necessary, as you 
>> can calculate it from the other timestamps and the configuration, but it 
>> would give you a good index-key for the time-slots.
> 
> Sorry, I don't understand what you mean by a time slot? For me, the 
> relevant information is the start and end times of the flow, which I can 
> use to draw graphs, etc.

The question is: what do you define as the start and end times of a flow?
The times of the first and last packets of that flow, or the start and end
times of the observation period for that flow? By "time slot" I mean the
latter. I guess it's quite common to have certain points in time that you
don't want a flow to cross, like every full hour or every 5 minutes. Then
you have a clean cut, knowing that no flow starts before and ends after
that point in time. These segments are what I called time slots. Example
for sql_history: 5m:

Time  8:00                                                        8:05
       |-----------------------------------------------------------|
Flow                                       |-----------|
                                          8:03        8:04

At the moment, pmacct loses the information that the packets of the flow
were observed only between 8:03 and 8:04; in the table I find only 8:00
and 8:05. In fact it would be enough to know 8:03 and 8:04. But for
convenient indexing of all flows in one time slot it may be useful to also
store 8:00 or 8:05 with the flow. Wait, somebody mentioned on this list
before that you can also create an index based on date_trunc(), so you
don't even need it for that purpose. So I suggest just the first and last
packet as timestamps.
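
To illustrate why the third timestamp is redundant, here is a minimal
Python sketch that derives the time slot from the first-packet timestamp
and the slot length. The function name and the use of plain seconds are
assumptions of mine for illustration, not anything pmacct provides.

```python
from typing import Tuple


def time_slot(first_packet: int, slot_seconds: int = 300) -> Tuple[int, int]:
    """Return (slot_start, slot_end) of the slot containing first_packet.

    Timestamps are in seconds; slot_seconds=300 corresponds to a
    5-minute slot as in the example above.
    """
    start = first_packet - (first_packet % slot_seconds)
    return start, start + slot_seconds


if __name__ == "__main__":
    # 8:03 expressed as seconds since midnight.
    first = 8 * 3600 + 3 * 60
    start, end = time_slot(first)
    print(start // 3600, (start % 3600) // 60)  # slot start: hour, minute
    print(end // 3600, (end % 3600) // 60)      # slot end: hour, minute
```

For the flow in the diagram this gives the 8:00 to 8:05 slot again, so
storing only the first and last packet times loses nothing.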

BTW, I think it's really a mistake to use local time for timestamps. Why
not use seconds since 1970-01-01 00:00 UTC? This is standard and
unambiguous.
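
A small Python sketch of what I mean by "unambiguous": the same instant
written in two different local times maps to a single epoch value. The
helper name is hypothetical and only for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_epoch(dt: datetime) -> int:
    """Return seconds since 1970-01-01 00:00 UTC for an aware datetime."""
    if dt.tzinfo is None:
        # A naive local time cannot be converted reliably.
        raise ValueError("datetime has no time zone information")
    return int(dt.timestamp())


if __name__ == "__main__":
    utc = datetime(2006, 11, 8, 8, 0, tzinfo=timezone.utc)
    cet = datetime(2006, 11, 8, 9, 0, tzinfo=timezone(timedelta(hours=1)))
    # Both lines print the same number, although the wall-clock times differ.
    print(to_epoch(utc))
    print(to_epoch(cet))
```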

> Ideally, I would like more detailed information about the flow at various 
> points during its life (e.g. status every minute) and I'm not sure if I 
> can get that using pmacctd, or how. I'm still working on it.

If you use short time slots, you get exactly this, since long-lasting
flows will be chopped into pieces. (I'm using sql_refresh_time: 60 and
sql_history: 1m for this myself.)
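
As a rough sketch of the chopping, assuming 60-second slots and timestamps
in seconds: the Python function below is my own illustration of the slot
boundaries, not pmacct code, and real rows would of course carry the
counters observed within each piece.

```python
from typing import Iterator, Tuple


def split_into_slots(
    first: int, last: int, slot_seconds: int = 60
) -> Iterator[Tuple[int, int, int]]:
    """Yield (slot_start, piece_first, piece_last) for each slot touched
    by a flow whose packets span the interval first..last (inclusive)."""
    slot_start = first - (first % slot_seconds)
    while slot_start <= last:
        slot_end = slot_start + slot_seconds
        # Clip the flow to the part that falls inside this slot.
        yield slot_start, max(first, slot_start), min(last, slot_end - 1)
        slot_start = slot_end


if __name__ == "__main__":
    # A flow lasting from second 90 to second 200 touches three slots.
    for piece in split_into_slots(90, 200):
        print(piece)
```

Each piece then becomes its own row, which gives you a status of the flow
once per minute.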


Cheers,

Sven

-- 
Sven Anderson
Institute for Informatics - http://www.ifi.informatik.uni-goettingen.de
Georg-August-Universitaet Goettingen
Lotzestr. 16-18, 37083 Goettingen, Germany

