Ram,

On 13/01/2013 05:14, ramki Krishnan wrote:
> Hi Brian,
> 
> Thanks a lot for your comments. Please find answers to some of your comments. 
> We will respond to your other comments shortly.
> 
>> There may be some false positives due to multiple other flows
>> masquerading as a large flow; the amount of false positives is
>> reduced by parallel hashing using different hash functions
> 
> Brian:
> 
>> To give you some data, with a 20 bit ID space, the FNV1a-32 hash algorithm 
>> gives at most 5% collisions, based on IPv6 headers in real packet traces.  
>> [https://researchspace.auckland.ac.nz/handle/2292/13240]. I wonder whether 
>> the overhead of running several hashes in parallel is justified by this 
>> collision rate?
>
> Ram:
> 
> The need for multiple hashes is specific to the suggested algorithm on 
> automatic hardware identification - this algorithm is similar to a bloom 
> filter which uses multiple hash functions.

I think didn't explain my point sufficiently. I understand why you suggest 
using several
hashes. But this is statistical load balancing we are talking about. If a few % 
of the
time, you mistakenly treat several medium flows as one large flow, and therefore
rebalance them as a single unit, so what? You will still balance the traffic 
reasonably
well.

    Brian

> "On packet arrival, a new flow is looked up in parallel in all the hash 
> tables and the corresponding counter is incremented. If the counter exceeds a 
> programmed threshold in a given time interval in all the hash table entries, 
> a candidate large flow is learnt and programmed in a hardware table resource 
> like TCAM.

> For a short-lived flow to masquerade as a long-lived lived flow, it needs to 
> match all the hash table entries which is a joint probability event - thus, 
> the amount of false positives due to short-lived flows is reduced.

> Thanks,
> 
> ram
> 
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of 
> Brian E Carpenter
> Sent: Saturday, January 12, 2013 8:40 AM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: [OPSAWG] I-D Action: 
> draft-krishnan-opsawg-large-flow-load-balancing-02.txt
> 
> Hi,
> 
> My comments are on the discussion of flow IDs and hashing. I'm not commenting 
> at all on the overall proposal, because I can't judge whether the problem is 
> real or the solution is practical.
> 
>> A large space of the flow identifications, i.e. finer
> 
>> granularity of the flows, conducts more random in spreading the flows
>> over a set of component links.
> 
> That isn't accurate. The requirement is an ID space in which the IDs belong 
> to a uniform distribution. Technically speaking, if you have two links, a 
> one-bit flow ID is sufficient, as long as the values 0 and 1 are equally 
> likely to appear.
> Therefore, the practical issue is not the size of the ID space but the 
> quality of the hash function used to generate the ID of each flow.
> However, whatever the initial ID space, the final hash has to be down to 0..N 
> if you have N+1 alternative paths.
> I think the reason that your model needs a larger ID space is to reduce the 
> probability of two flows colliding by chance in the ID space.
> That would defeat your wish to separate out large flows.
> 
>> The advantages of hashing based load
>> distribution are the preservation of the packet sequence in a flow
>> and the real time distribution with the stateless of individual
>> flows. If the traffic flows randomly spread in the flow
>> identification space, the flow rates are much smaller compared to the
>> link capacity,
> 
> That sounds like magic. I don't think you mean that at all.
> 
>> and the rate differences are not dramatic,
> 
> Do you mean that the total traffic rate is more fairly distributed across the 
> links? In any case, "dramatic" isn't an engineering term.

>> the hashing
>> algorithm works very well in general.
> 
> How can you say that without specifying a particular algorithm? Also, "very 
> well in general" isn't an engineering term either.
> 
>> There may be some false positives due to multiple other flows
>> masquerading as a large flow; the amount of false positives is
>> reduced by parallel hashing using different hash functions
> 
> To give you some data, with a 20 bit ID space, the FNV1a-32 hash algorithm 
> gives at most 5% collisions, based on IPv6 headers in real packet traces.
> [https://researchspace.auckland.ac.nz/handle/2292/13240]
> I wonder whether the overhead of running several hashes in parallel is 
> justified by this collision rate?
> 
> Regards
> 
>    Brian Carpenter
> _______________________________________________
> 
> OPSAWG mailing list
> [email protected]<mailto:[email protected]>
> https://www.ietf.org/mailman/listinfo/opsawg
_______________________________________________
OPSAWG mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/opsawg

Reply via email to