Ram,
On 13/01/2013 05:14, ramki Krishnan wrote:
> Hi Brian,
>
> Thanks a lot for your comments. Please find answers to some of your comments.
> We will respond to your other comments shortly.
>
>> There may be some false positives due to multiple other flows
>> masquerading as a large flow; the amount of false positives is
>> reduced by parallel hashing using different hash functions
>
> Brian:
>
>> To give you some data, with a 20 bit ID space, the FNV1a-32 hash algorithm
>> gives at most 5% collisions, based on IPv6 headers in real packet traces.
>> [https://researchspace.auckland.ac.nz/handle/2292/13240]. I wonder whether
>> the overhead of running several hashes in parallel is justified by this
>> collision rate?
>
> Ram:
>
> The need for multiple hashes is specific to the suggested algorithm for
> automatic hardware identification - this algorithm is similar to a Bloom
> filter, which uses multiple hash functions.
I think I didn't explain my point sufficiently. I understand why you suggest
using several hashes. But this is statistical load balancing we are talking
about. If, a few % of the time, you mistakenly treat several medium flows as
one large flow, and therefore rebalance them as a single unit, so what? You
will still balance the traffic reasonably well.
Brian
> "On packet arrival, a new flow is looked up in parallel in all the hash
> tables and the corresponding counter is incremented. If the counter exceeds a
> programmed threshold in a given time interval in all the hash table entries,
> a candidate large flow is learnt and programmed in a hardware table resource
> like TCAM.
> For a short-lived flow to masquerade as a long-lived flow, it needs to
> match all the hash table entries, which is a joint probability event - thus,
> the amount of false positives due to short-lived flows is reduced."
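
To make the quoted algorithm concrete, here is a minimal software sketch of it. It is only an illustration under stated assumptions, not the draft's hardware design: the class and method names, the table sizes, the byte-based counters and the salted BLAKE2b hashes standing in for "different hash functions" are all hypothetical choices.

```python
import hashlib
from typing import List, Set


class MultiHashLargeFlowDetector:
    """Illustrative multi-table counter scheme; all names here are hypothetical."""

    def __init__(self, num_tables: int = 3, table_size: int = 1024,
                 threshold: int = 1000) -> None:
        self.num_tables = num_tables
        self.table_size = table_size
        self.threshold = threshold  # assumed unit: bytes per measurement interval
        self.tables: List[List[int]] = [[0] * table_size for _ in range(num_tables)]
        self.large_flows: Set[bytes] = set()  # stands in for a TCAM-like table

    def _index(self, table_no: int, flow_key: bytes) -> int:
        # A distinct salt per table approximates independent hash functions.
        digest = hashlib.blake2b(flow_key, digest_size=8,
                                 salt=bytes([table_no])).digest()
        return int.from_bytes(digest, "big") % self.table_size

    def on_packet(self, flow_key: bytes, size: int) -> bool:
        """Count the packet in every table; True if every counter exceeds the threshold."""
        exceeded_everywhere = True
        for t in range(self.num_tables):
            i = self._index(t, flow_key)
            self.tables[t][i] += size
            if self.tables[t][i] <= self.threshold:
                exceeded_everywhere = False
        if exceeded_everywhere:
            # Learn the flow as a candidate large flow.
            self.large_flows.add(flow_key)
        return exceeded_everywhere

    def end_interval(self) -> None:
        # Reset all counters at the end of the measurement interval.
        self.tables = [[0] * self.table_size for _ in range(self.num_tables)]
```

A flow is only learnt when its counter is over the threshold in every table, so smaller flows that happen to share one entry with it are not enough on their own to trigger a false positive.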
> Thanks,
>
> ram
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Brian E Carpenter
> Sent: Saturday, January 12, 2013 8:40 AM
> To: [email protected]
> Cc: [email protected]
> Subject: Re: [OPSAWG] I-D Action:
> draft-krishnan-opsawg-large-flow-load-balancing-02.txt
>
> Hi,
>
> My comments are on the discussion of flow IDs and hashing. I'm not commenting
> at all on the overall proposal, because I can't judge whether the problem is
> real or the solution is practical.
>
>> A large space of the flow identifications, i.e. finer
>> granularity of the flows, conducts more random in spreading the flows
>> over a set of component links.
>
> That isn't accurate. The requirement is an ID space in which the IDs belong
> to a uniform distribution. Technically speaking, if you have two links, a
> one-bit flow ID is sufficient, as long as the values 0 and 1 are equally
> likely to appear.
> Therefore, the practical issue is not the size of the ID space but the
> quality of the hash function used to generate the ID of each flow.
> However, whatever the initial ID space, the final hash has to be reduced to
> the range 0..N if you have N+1 alternative paths.
> I think the reason that your model needs a larger ID space is to reduce the
> probability of two flows colliding by chance in the ID space.
> That would defeat your wish to separate out large flows.
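
The point about reducing the hash to 0..N can be sketched as follows. This is an illustration only: FNV-1a (32 bit) is used because it is the hash mentioned later in this message, while the function names and the use of a plain modulo for the final reduction are assumptions.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5  # standard FNV 32-bit offset basis
FNV32_PRIME = 0x01000193         # standard FNV 32-bit prime


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR in each byte, then multiply by the FNV prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def select_link(flow_key: bytes, num_links: int) -> int:
    """Map a flow key (e.g. packed header fields) to a link index in 0..num_links-1."""
    if num_links < 1:
        raise ValueError("num_links must be at least 1")
    # Plain modulo reduction (an assumption); it has a slight bias when
    # num_links is not a power of two, negligible for a 32-bit hash.
    return fnv1a_32(flow_key) % num_links
```

With num_links = 2 the result is just the low-order bit of the hash, which is the one-bit flow ID case: what matters is that this bit is 0 or 1 with equal likelihood, not how many bits were computed before the reduction.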
>
>> The advantages of hashing based load
>> distribution are the preservation of the packet sequence in a flow
>> and the real time distribution with the stateless of individual
>> flows. If the traffic flows randomly spread in the flow
>> identification space, the flow rates are much smaller compared to the
>> link capacity,
>
> That sounds like magic. I don't think you mean that at all.
>
>> and the rate differences are not dramatic,
>
> Do you mean that the total traffic rate is more fairly distributed across the
> links? In any case, "dramatic" isn't an engineering term.
>
>> the hashing
>> algorithm works very well in general.
>
> How can you say that without specifying a particular algorithm? Also, "very
> well in general" isn't an engineering term either.
>
>> There may be some false positives due to multiple other flows
>> masquerading as a large flow; the amount of false positives is
>> reduced by parallel hashing using different hash functions
>
> To give you some data, with a 20 bit ID space, the FNV1a-32 hash algorithm
> gives at most 5% collisions, based on IPv6 headers in real packet traces.
> [https://researchspace.auckland.ac.nz/handle/2292/13240]
> I wonder whether the overhead of running several hashes in parallel is
> justified by this collision rate?
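
As a rough way to reason about this trade-off, the following sketch computes two idealised quantities. Both functions are hypothetical helpers that assume a perfectly uniform hash and statistically independent hash tables; they do not reproduce the measurement method behind the 5% figure above, which came from real packet traces.

```python
def expected_colliding_fraction(num_flows: int, id_bits: int = 20) -> float:
    """Probability that a given flow shares its ID with at least one other flow,
    assuming IDs are uniform over 2**id_bits values (an idealised assumption)."""
    if num_flows < 1:
        raise ValueError("num_flows must be at least 1")
    space = 2 ** id_bits
    return 1.0 - (1.0 - 1.0 / space) ** (num_flows - 1)


def false_positive_all_tables(p_single: float, num_tables: int) -> float:
    """Joint probability of a false match in every table, assuming the
    tables' hash functions behave independently (an assumption)."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    return p_single ** num_tables
```

Under these assumptions the per-table collision probability is raised to the power of the number of tables, which is the gain that has to be weighed against the cost of computing several hashes per packet.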
>
> Regards
>
> Brian Carpenter
> _______________________________________________
>
> OPSAWG mailing list
> [email protected]<mailto:[email protected]>
> https://www.ietf.org/mailman/listinfo/opsawg