Re: [OPSAWG] I-D Action: draft-krishnan-opsawg-large-flow-load-balancing-02.txt

ramki Krishnan Sat, 12 Jan 2013 21:15:15 -0800

Hi Brian,



Thanks a lot for your comments. Please find answers to some of them below. 
We will respond to your other comments shortly.



> There may be some false positives due to multiple other flows
> masquerading as a large flow; the amount of false positives is
> reduced by parallel hashing using different hash functions



Brian:

To give you some data, with a 20 bit ID space, the FNV1a-32 hash algorithm 
gives at most 5% collisions, based on IPv6 headers in real packet traces.  
[https://researchspace.auckland.ac.nz/handle/2292/13240]. I wonder whether the 
overhead of running several hashes in parallel is justified by this collision 
rate?
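
For readers who want to try the hash mentioned above, here is a minimal sketch of FNV-1a-32 reduced to a 20-bit flow ID. The constants are the published FNV-1a 32-bit offset basis and prime. The XOR-fold down to 20 bits and the function name flow_id_20bit are assumptions made for illustration; the message does not say how the cited study derived its 20-bit IDs or which header bytes it hashed.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Standard 32-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # keep the result in 32 bits
    return h


def flow_id_20bit(flow_key: bytes) -> int:
    """Hypothetical helper: XOR-fold the 32-bit hash into a 20-bit ID.

    The folding step is an assumption, not taken from the cited study.
    """
    h = fnv1a_32(flow_key)
    return ((h >> 20) ^ h) & 0xFFFFF
```

Here flow_key stands for whatever header bytes identify a flow (for example, the concatenated IPv6 source address, destination address, and flow label); that choice is also an assumption.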



Ram:

The need for multiple hashes is specific to the suggested algorithm for 
automatic hardware identification. This algorithm is similar to a Bloom filter, 
which uses multiple hash functions.



"On packet arrival, a new flow is looked up in parallel in all the hash tables 
and the corresponding counter is incremented. If the counter exceeds a 
programmed threshold in a given time interval in all the hash table entries, a 
candidate large flow is learnt and programmed in a hardware table resource like 
TCAM."



For a short-lived flow to masquerade as a long-lived flow, it needs to match 
all the hash table entries, which is a joint probability event; thus, the 
number of false positives due to short-lived flows is reduced.



Thanks,

ram



-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of 
Brian E Carpenter
Sent: Saturday, January 12, 2013 8:40 AM
To: [email protected]
Cc: [email protected]
Subject: Re: [OPSAWG] I-D Action: 
draft-krishnan-opsawg-large-flow-load-balancing-02.txt



Hi,



My comments are on the discussion of flow IDs and hashing. I'm not commenting 
at all on the overall proposal, because I can't judge whether the problem is 
real or the solution is practical.



> A large space of the flow identifications, i.e. finer
> granularity of the flows, conducts more random in spreading the flows
> over a set of component links.



That isn't accurate. The requirement is an ID space in which the IDs are 
uniformly distributed. Technically speaking, if you have two links, a one-bit 
flow ID is sufficient, as long as the values 0 and 1 are equally likely to 
appear.



Therefore, the practical issue is not the size of the ID space but the quality 
of the hash function used to generate the ID of each flow.

However, whatever the initial ID space, the final hash has to be reduced to the 
range 0..N if you have N+1 alternative paths.
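
As an illustration of the reduction to 0..N, here is a minimal sketch that maps a flow ID onto one of N+1 links with a modulo operation. The function name select_link and the use of modulo are assumptions; other reductions are possible, and modulo is only exactly even when the size of the ID space is a multiple of the number of links.

```python
from collections import Counter


def select_link(flow_id: int, num_links: int) -> int:
    """Map a flow ID to a link index in 0..num_links-1 (hypothetical helper)."""
    if num_links < 1:
        raise ValueError("num_links must be at least 1")
    return flow_id % num_links


# With two links, a one-bit flow ID is enough: ID 0 goes to link 0, ID 1 to link 1.
one_bit_mapping = [select_link(flow_id, 2) for flow_id in (0, 1)]

# If 20-bit IDs are uniformly distributed, four links each get an equal share.
share_per_link = Counter(select_link(flow_id, 4) for flow_id in range(2 ** 20))
```

The spread across links is only as even as the distribution of the IDs themselves, which is why the quality of the hash function matters more than the width of the ID.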



I think the reason that your model needs a larger ID space is to reduce the 
probability of two flows colliding by chance in the ID space.

Such a collision would defeat your wish to separate out large flows.
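
To put rough numbers on the chance of such collisions, the following sketch computes the classic birthday-problem probability that at least two of n concurrent flows share an ID. It assumes IDs are independent and uniformly distributed over the ID space, which real traffic only approximates; the function name and parameters are hypothetical, and the result is a different quantity from a collision percentage measured on packet traces.

```python
import math


def collision_probability(num_flows: int, id_bits: int) -> float:
    """Probability that at least two of num_flows uniform IDs coincide."""
    space = 2 ** id_bits
    if num_flows > space:
        return 1.0  # pigeonhole: more flows than IDs
    # P(no collision) = product over i of (1 - i/space); sum logs for stability.
    log_no_collision = sum(math.log1p(-i / space) for i in range(num_flows))
    return 1.0 - math.exp(log_no_collision)
```

For a fixed number of flows the probability falls as id_bits grows, which is the effect a larger ID space buys.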



> The advantages of hashing based load
> distribution are the preservation of the packet sequence in a flow
> and the real time distribution with the stateless of individual
> flows. If the traffic flows randomly spread in the flow
> identification space, the flow rates are much smaller compared to the
> link capacity,



That sounds like magic. I don't think you mean that at all.



> and the rate differences are not dramatic,



Do you mean that the total traffic rate is more fairly distributed across the 
links? In any case, "dramatic" isn't an engineering term.



> the hashing
> algorithm works very well in general.



How can you say that without specifying a particular algorithm? Also, "very 
well in general" isn't an engineering term either.



> There may be some false positives due to multiple other flows
> masquerading as a large flow; the amount of false positives is
> reduced by parallel hashing using different hash functions



To give you some data, with a 20 bit ID space, the FNV1a-32 hash algorithm 
gives at most 5% collisions, based on IPv6 headers in real packet traces.

[https://researchspace.auckland.ac.nz/handle/2292/13240]



I wonder whether the overhead of running several hashes in parallel is 
justified by this collision rate?



Regards

   Brian Carpenter



_______________________________________________

OPSAWG mailing list

[email protected]<mailto:[email protected]>

https://www.ietf.org/mailman/listinfo/opsawg

