On 14/01/2013 18:00, Lucy yong wrote:
> Hi Brian,
> 
> I think we are not discussing different things. Please see inline.

I think the problem is that the draft doesn't explain the fundamental
model very well. I fully agree with your comments below (and these
aspects are discussed in RFC 6438).

   Brian

> 
>> -----Original Message-----
>> From: Brian E Carpenter [mailto:[email protected]]
>> Sent: Saturday, January 12, 2013 10:40 AM
>> To: [email protected]
>> Cc: [email protected]
>> Subject: Re: I-D Action: draft-krishnan-opsawg-large-flow-load-balancing-02.txt
>>
>> Hi,
>>
>> My comments are on the discussion of flow IDs and hashing. I'm not
>> commenting at all on the overall proposal, because I can't judge
>> whether the problem is real or the solution is practical.
>>
>>> A large space of the flow identifications, i.e. finer
>>> granularity of the flows, conducts more random in spreading the flows
>>> over a set of component links.
>> That isn't accurate. The requirement is an ID space in which the IDs
>> belong to a uniform distribution. Technically speaking, if you have two
>> links, a one-bit flow ID is sufficient, as long as the values 0 and 1
>> are equally likely to appear.
> [Lucy] This is not a requirement for the ID space. It is to say that using a 
> 5-tuple or 3-tuple to define a flow results in a very large flow ID space. This 
> flow definition is typically used in hash-based load balancing today. This 
> has nothing to do with the number of links. We will refine the text to make it 
> clear. 
> 
>> Therefore, the practical issue is not the size of the ID space but the
>> quality of the hash function used to generate the ID of each flow.
>> However, whatever the initial ID space, the final hash has to be reduced
>> to the range 0..N if you have N+1 alternative paths.
> [Lucy] The draft does not address using hashing to generate the flow ID at 
> all. We are talking past each other here. You may be referring to a different 
> application.
> 
> Regards,
> Lucy
>> I think the reason that your model needs a larger ID space is to
>> reduce the probability of two flows colliding by chance in the ID space.
>> That would defeat your wish to separate out large flows.
>>
>>> The advantages of hashing based load
>>> distribution are the preservation of the packet sequence in a flow
>>> and the real time distribution with the stateless of individual
>>> flows. If the traffic flows randomly spread in the flow
>>> identification space, the flow rates are much smaller compared to the
>>> link capacity,
>> That sounds like magic. I don't think you mean that at all.
>>
>>> and the rate differences are not dramatic,
>> Do you mean that the total traffic rate is more fairly distributed
>> across the links? In any case, "dramatic" isn't an engineering term.
>>
>>> the hashing
>>> algorithm works very well in general.
>> How can you say that without specifying a particular algorithm? Also,
>> "very well in general" isn't an engineering term either.
>>
>>> There may be some false positives due to multiple other flows
>>> masquerading as a large flow; the amount of false positives is
>>> reduced by parallel hashing using different hash functions
>> To give you some data, with a 20 bit ID space, the FNV1a-32 hash
>> algorithm gives at most 5% collisions, based on IPv6 headers in real
>> packet traces.
>> [https://researchspace.auckland.ac.nz/handle/2292/13240]
>>
>> I wonder whether the overhead of running several hashes in parallel
>> is justified by this collision rate?
>>
>> Regards
>>    Brian Carpenter
> 
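
To illustrate the hashing points quoted above (a hash over the flow's
header fields, a 20 bit ID space, and a final reduction to 0..N for N+1
paths), here is a minimal sketch in Python. FNV-1a 32 is the algorithm
named in the quoted text; everything else is an assumption made for
illustration only: the 5-tuple key layout, the XOR-fold from 32 to 20
bits, the modulo reduction, and the names flow_key, flow_id_20 and
pick_link. It is not the method of the draft or of the cited study.

```python
import ipaddress

FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Standard 32-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def flow_key(src: str, dst: str, sport: int, dport: int, proto: int) -> bytes:
    """Hypothetical 5-tuple key: packed addresses, ports, protocol number."""
    return (
        ipaddress.ip_address(src).packed
        + ipaddress.ip_address(dst).packed
        + sport.to_bytes(2, "big")
        + dport.to_bytes(2, "big")
        + bytes([proto])
    )


def flow_id_20(key: bytes) -> int:
    """Assumed reduction: XOR-fold the 32-bit hash into a 20-bit flow ID."""
    h = fnv1a_32(key)
    return ((h >> 20) ^ h) & 0xFFFFF


def pick_link(flow_id: int, n_links: int) -> int:
    """Reduce the flow ID to a link index in 0..n_links-1 (simple modulo)."""
    return flow_id % n_links


if __name__ == "__main__":
    key = flow_key("2001:db8::1", "2001:db8::2", 49152, 443, 6)
    fid = flow_id_20(key)
    print(f"flow id {fid:#07x} -> link {pick_link(fid, 4)}")
```

Every packet of a flow produces the same key, so it lands on the same
link and packet order within the flow is preserved. How evenly flows
spread across links depends on the quality of the hash, not on the size
of the ID space.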
_______________________________________________
OPSAWG mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/opsawg
