Hi Oliver,

> I've done a bit more looking around and found the following
> interesting bits of information. From http://www.sflow.org/about/index.php :
> 
> "Usage accounting for billing and charge-back"
> 
> which seems to suggest 100% accurate representation of bandwidth
> consumption, if used for billing. But then in
> http://www.sflow.org/sflow_version_5.txt :
> 
> "Packet Flow Sampling: Packet Flow Sampling refers to the random
> selection of a fraction of the Packet Flows observed at a Data
> Source."
> 
> which suggests exactly what you are saying - it only samples a
> fraction of the actual traffic. But then later in the same page:

  The moment you decide to sample, you lose precision, no matter what. You will 
also lose detail when you apply aggregation. The question is how much precision 
you are willing to lose. Dumb sampling treats all packets the same, whether they 
are 1K or 10K. Intelligent sampling takes size into account (the probability of 
being sampled depends on the packet size).
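To make the dumb versus intelligent distinction concrete, here is a minimal Python sketch. It assumes a plain list of packet sizes and compares uniform 1-in-N sampling (each packet kept with probability 1/N and scaled up by N) with one size-dependent scheme, threshold sampling, where a packet of size s is kept with probability p = min(1, s/z) and, if kept, counts as s/p. Both estimators are unbiased on average; the threshold z, the function names and the packet mix are illustrative assumptions, not how any particular probe implements it. The sketch also computes the chance that a small flow is never sampled at all, which is what happens to a low-volume backchannel.

```python
import random

def uniform_estimate(sizes, n, rng):
    """'Dumb' 1-in-n sampling: every packet is kept with probability 1/n,
    regardless of its size; each kept packet's size is scaled up by n."""
    total = 0.0
    for size in sizes:
        if rng.random() < 1.0 / n:
            total += size * n
    return total

def size_aware_estimate(sizes, z, rng):
    """Threshold sampling: a packet of `size` bytes is kept with probability
    p = min(1, size / z), so large packets are more likely to be sampled.
    A kept packet contributes size / p (= max(size, z)), keeping it unbiased."""
    total = 0.0
    for size in sizes:
        p = min(1.0, size / z)
        if rng.random() < p:
            total += size / p
    return total

def prob_flow_missed(k, n):
    """Probability that a flow of k packets is never sampled at 1-in-n."""
    return (1.0 - 1.0 / n) ** k

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical traffic mix: many small packets plus some large ones.
    sizes = [64] * 9000 + [1500] * 1000
    print("exact:", sum(sizes))
    print("1-in-100:", uniform_estimate(sizes, 100, rng))
    print("size-aware, z=1500:", size_aware_estimate(sizes, 1500, rng))
    print("P(10-packet flow missed at 1-in-1000):", prob_flow_missed(10, 1000))
```

With N = 1 (or z no larger than the smallest packet) every packet is kept and the estimate is exact, which matches the observation that a sampling rate of 1 means no sampling. At 1-in-1000, a 10-packet flow is missed roughly 99% of the time.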

> So it seems indeed if you set the sampling rate to be 1, it would
> sample every single packet. To be honest I can't understand why
> sampling a fraction of the packets would be useful at all, apart from
> gleaning a rough understanding of the relationship between the flows.
> However this fractional sampling leads to data loss and as I mentioned
> in my first post, the backchannel with a very small fraction of the
> total traffic was not reported at all.

  The need for sampling arose from several constraints:

  1) NetFlow (NF) and sFlow (sF) were designed to run in switches and routers, 
which usually have little CPU power. Analysing 100% of the traffic was 
completely impossible for those CPUs.

  2) Link usage. If you don't sample, exporting the data consumes more link 
bandwidth.

  3) Storage resources. Have you even considered the storage you are going to 
need? Storing all the data is probably going to kill your server, unless the 
link carries little traffic or you have all the NSA's computers at hand.
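A back-of-the-envelope sketch of points 2 and 3, in Python. All inputs are hypothetical: a saturated 1 Gbit/s link, a 500-byte average packet and 64 bytes per exported/stored record, with one record per sampled packet and no aggregation. Real record sizes depend on the export format and what you keep.

```python
SECONDS_PER_DAY = 86_400

def collection_load(link_bps, avg_pkt_bytes, record_bytes, sampling_rate=1):
    """Rough load for 1-in-`sampling_rate` sampling, assuming one exported
    record per sampled packet and no aggregation.
    Returns (records per second, export bits per second, stored bytes per day)."""
    pkts_per_s = link_bps / 8 / avg_pkt_bytes
    records_per_s = pkts_per_s / sampling_rate
    export_bps = records_per_s * record_bytes * 8
    bytes_per_day = records_per_s * record_bytes * SECONDS_PER_DAY
    return records_per_s, export_bps, bytes_per_day

if __name__ == "__main__":
    # Hypothetical figures: saturated 1 Gbit/s link, 500-byte average packets,
    # 64 bytes per exported/stored record.
    for rate in (1, 1000):
        rps, bps, bpd = collection_load(1e9, 500, 64, rate)
        print(f"1-in-{rate}: {rps:,.0f} records/s, "
              f"{bps / 1e6:.3f} Mbit/s export, {bpd / 1e9:.2f} GB/day")
```

With these numbers, no sampling means 250,000 records/s, about 128 Mbit/s of export traffic and roughly 1.4 TB of storage per day; at 1-in-1000 it drops to 250 records/s, 128 kbit/s and about 1.4 GB per day.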

  Currently, both aggregation and sampling are applied in the probe as well as 
in the server itself. The question is how much precision/detail you lose in 
exchange for fast analysis and a great interface. Some very interesting 
articles discuss this and provide a really strong mathematical foundation.
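Aggregation trades detail for volume in a similar way. A minimal Python sketch, assuming hypothetical per-flow records with `src`, `dst`, `dport` and `bytes` fields: aggregating on (src, dst) keeps the byte totals exact but throws away the per-port breakdown.

```python
from collections import defaultdict

def aggregate(flows, key_fields):
    """Sum bytes per distinct combination of `key_fields`. Every other field
    (e.g. the destination port) is discarded, which is where detail is lost."""
    totals = defaultdict(int)
    for flow in flows:
        key = tuple(flow[field] for field in key_fields)
        totals[key] += flow["bytes"]
    return dict(totals)

if __name__ == "__main__":
    # Hypothetical per-flow records (addresses from documentation ranges).
    flows = [
        {"src": "192.0.2.1", "dst": "198.51.100.7", "dport": 80, "bytes": 120_000},
        {"src": "192.0.2.1", "dst": "198.51.100.7", "dport": 443, "bytes": 80_000},
        {"src": "192.0.2.9", "dst": "198.51.100.7", "dport": 53, "bytes": 300},
    ]
    print(aggregate(flows, ["src", "dst"]))
    # After aggregation there is no way to tell how much of the 200,000 bytes
    # between the first pair was on port 80 and how much on port 443.
```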

  Again, for an ADSL line this is just silly, but once you start talking about 
serious links, things get really nasty.

  To be honest, Paolo has done a great job with pmacct in this field. We 
expect to help him in the very near future to improve it even further with 
ideas of our own.

> Examining the header of each packet will allow the total data
> throughput to be determined without using the payload at all, and at
> reasonably low cost... surely sFlow can do this?

  Sorry, I misunderstood you; I thought you were analysing the full payload for 
something. My fault.

  Still, everything I have said relates to header-only information, so it is 
still valid :)
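For the header-only case, a minimal Python sketch of how byte counts can be derived without any payload: the IPv4 Total Length field (bytes 2-3 of the header) gives the size of the whole IP packet, so a header snippet, such as the truncated headers sFlow exports for sampled packets, is enough. The function names are hypothetical, only IPv4 is handled, link-layer framing is not counted, and with 1-in-N sampling the sum still has to be scaled by N, so the sampling error discussed above remains.

```python
import struct

def ipv4_total_length(header: bytes) -> int:
    """Return the Total Length field (bytes 2-3, network byte order) of an
    IPv4 header: the size of the whole IP packet, payload included."""
    if len(header) < 20:
        raise ValueError("truncated IPv4 header")
    if header[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    (total_length,) = struct.unpack_from("!H", header, 2)
    return total_length

def bytes_from_headers(headers, sampling_rate=1):
    """Estimate IP-layer bytes from header snippets only; with 1-in-N
    sampling, each sampled packet stands for N packets."""
    return sum(ipv4_total_length(h) for h in headers) * sampling_rate

if __name__ == "__main__":
    # A hypothetical 20-byte IPv4 header (version 4, IHL 5) for a 1500-byte
    # packet: ver/IHL, TOS, total length, ID, flags/frag, TTL, proto, checksum,
    # source and destination addresses (zeroed).
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 1500, 0, 0, 64, 6, 0,
                      bytes(4), bytes(4))
    print(ipv4_total_length(hdr))  # 1500
```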

> Hopefully someone on list has set something similar up and can point
> me in the right direction.

  The main question here is: how fast is your link, and what are the specs of 
the probe and/or collector?

  Regards

--------------------------------------------
Jaime Nebrera - [EMAIL PROTECTED]
Consultor TI - ENEO Tecnologia SL
Pol. PISA - C/ Manufactura 6, P1, 3B
Mairena del Aljarafe - 41927 - Sevilla
Telf.- (+34) 955 60 11 60 / 619 04 55 18


_______________________________________________
pmacct-discussion mailing list
http://www.pmacct.net/#mailinglists