On 14/01/2013 18:00, Lucy yong wrote:
> Hi Brian,
>
> I think we are not discussing different things. Please see inline.
I think the problem is that the draft doesn't explain the fundamental
model very well. I fully agree with your comments below (and these
aspects are discussed in RFC 6438).

    Brian

>
>> -----Original Message-----
>> From: Brian E Carpenter [mailto:[email protected]]
>> Sent: Saturday, January 12, 2013 10:40 AM
>> To: [email protected]
>> Cc: [email protected]
>> Subject: Re: I-D Action: draft-krishnan-opsawg-large-flow-load-balancing-02.txt
>>
>> Hi,
>>
>> My comments are on the discussion of flow IDs and hashing. I'm not
>> commenting at all on the overall proposal, because I can't judge
>> whether the problem is real or the solution is practical.
>>
>>> A large space of the flow identifications, i.e. finer
>>> granularity of the flows, conducts more random in spreading the flows
>>> over a set of component links.
>> That isn't accurate. The requirement is an ID space in which the IDs
>> belong to a uniform distribution. Technically speaking, if you have two
>> links, a one-bit flow ID is sufficient, as long as the values 0 and 1
>> are equally likely to appear.
> [Lucy] This is not requirement for the ID space. It is to say that using 5
> tuple or 3 tuple to define a flow results a very large flow ID space. This
> flow definition is typically used in hashing based load balance today. This
> has nothing to do with how many links. We will refine the text to make it
> clear.
>
>> Therefore, the practical issue is not the size of the ID space but the
>> quality of the hash function used to generate the ID of each flow.
>> However, whatever the initial ID space, the final hash has to be down
>> to 0..N if you have N+1 alternative paths.
> [Lucy] The draft does not address using hashing to generate the flow ID at
> all. We are across each other here. You may refer some different applications.
>
> Regards,
> Lucy
>> I think the reason that your model needs a larger ID space is to
>> reduce the probability of two flows colliding by chance in the ID space.
>> That would defeat your wish to separate out large flows.
>>
>>> The advantages of hashing based load
>>> distribution are the preservation of the packet sequence in a flow
>>> and the real time distribution with the stateless of individual
>>> flows. If the traffic flows randomly spread in the flow
>>> identification space, the flow rates are much smaller compared to the
>>> link capacity,
>> That sounds like magic. I don't think you mean that at all.
>>
>>> and the rate differences are not dramatic,
>> Do you mean that the total traffic rate is more fairly distributed
>> across the links? In any case, "dramatic" isn't an engineering term.
>>
>>> the hashing
>>> algorithm works very well in general.
>> How can you say that without specifying a particular algorithm? Also,
>> "very well in general" isn't an engineering term either.
>>
>>> There may be some false positives due to multiple other flows
>>> masquerading as a large flow; the amount of false positives is
>>> reduced by parallel hashing using different hash functions
>> To give you some data, with a 20 bit ID space, the FNV1a-32 hash
>> algorithm gives at most 5% collisions, based on IPv6 headers in real
>> packet traces.
>> [https://researchspace.auckland.ac.nz/handle/2292/13240]
>>
>> I wonder whether the overhead of running several hashes in parallel
>> is justified by this collision rate?
>>
>> Regards
>> Brian Carpenter
>
_______________________________________________
OPSAWG mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/opsawg
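Illustrative note on the hashing points raised in the thread: the sketch below shows one way a 5-tuple could be hashed with 32-bit FNV-1a, reduced to a 20-bit flow ID, and then reduced again to one of a small number of component links. It is not taken from the draft or from the cited study. The choice of key fields, the XOR-folding step used to get from 32 to 20 bits, the modulo used for link selection, and the names flow_id and pick_link are all assumptions made for illustration.

```python
import ipaddress
import struct

# Standard 32-bit FNV-1a parameters.
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193
FLOW_ID_BITS = 20  # the 20-bit ID space mentioned in the thread


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


def flow_id(src: str, dst: str, sport: int, dport: int, proto: int) -> int:
    """Hash a 5-tuple to a 20-bit flow ID (hypothetical helper).

    Assumption: the 32-bit hash is XOR-folded down to 20 bits.
    """
    key = (
        ipaddress.ip_address(src).packed
        + ipaddress.ip_address(dst).packed
        + struct.pack("!HHB", sport, dport, proto)
    )
    h = fnv1a_32(key)
    mask = (1 << FLOW_ID_BITS) - 1
    return ((h >> FLOW_ID_BITS) ^ h) & mask


def pick_link(fid: int, num_links: int) -> int:
    """Reduce a flow ID to a link index in 0..num_links-1 (hypothetical helper)."""
    return fid % num_links


if __name__ == "__main__":
    fid = flow_id("2001:db8::1", "2001:db8::2", 49152, 443, 6)
    print(fid, pick_link(fid, 2))
```

Because the same 5-tuple always produces the same flow ID, every packet of a flow lands on the same link, which preserves packet order without per-flow state. With two links, only the parity of the flow ID matters, which is the one-bit case described above: the split is even only if the hash output is close to uniform.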
