On 8/1/24 13:58, Ilya Maximets wrote:
> On 7/31/24 19:38, Dumitru Ceara wrote:
>> On 7/31/24 18:39, Ilya Maximets wrote:
>>> On 7/31/24 18:17, Ilya Maximets wrote:
>>>> On 7/31/24 11:05, Dumitru Ceara wrote:
>>>>> From: Adrian Moreno <[email protected]>
>>>>>
>>>>> Introduce a new table called Sample where per-flow IPFIX configuration
>>>>> can be specified. Also, reference rows from this table in the ACL
>>>>> table to enable the configuration of ACL sampling. If enabled, northd
>>>>> adds a sample action to each ACL-related logical flow.
>>>>>
>>>>> Packets that hit stateful ACLs are sampled in different ways depending
>>>>> on whether they are initiating a new session or are just forwarded on
>>>>> an existing (already allowed) session. Two new columns ("sample_new"
>>>>> and "sample_est") are added to the ACL table to allow for potentially
>>>>> different sampling rates for the two cases.
>>>>
>>>> Do we actually need to be able to sample each ACL to a different set
>>>> of collectors?
>>
>> Yes, please see below.
>>
>>>> I mean, is it not enough to have a common configuration per sampling
>>>> type (drop, acl-new, acl-est) and two boolean fields per ACL to opt
>>>> in, as we do for ACL logging?
>>>> Or a map ACL:sample = key: enum [new,est], value: metadata?
>>>>
>>>> Something like:
>>>>
>>>> NB_Global:
>>>>   sample_config: map key enum [drop, acl-new, acl-est]
>>>>                      value ref Sample_Collector_Set
>>>>
>>>> Sample_Collector_Set:
>>>>   ids: set key integer
>>>>   probability: integer
>>>>
>>>> ACL:
>>>>   sample_metadata: map key enum [new, est]
>>>>                        value integer
>>>>
>>>>>
>>>>> Note: If an ACL has both sampling enabled and a label associated with
>>>>> it, then the label value overrides the observation point ID defined
>>>>> in the sample configuration. This is a side effect of the
>>>>> implementation (observation point IDs are stored in conntrack in the
>>>>> same part of the ct_label where ACL labels are also stored). However,
>>>>> the two features (sampling and ACL labels) serve similar purposes, so
>>>>> they are not expected to be enabled together.
>>>>>
>>>>> When sampling is enabled on an ACL, additional logical flows are
>>>>> created for that ACL (one for stateless ACLs and three for stateful
>>>>> ACLs) in the ACL action stage of the logical pipeline. These
>>>>> additional flows match on a combination of conntrack state values and
>>>>> observation point ID values (either against a logical register or
>>>>> against the stored ct_label state) in order to determine whether the
>>>>> packets hitting the ACLs must be sampled or not. This comes with a
>>>>> slight increase in the number of logical flows and in the number of
>>>>> OpenFlow rules. The number of additional flows _does not_ depend on
>>>>> the original ACL match or action.
>>>>>
>>>>> New --sample-new and --sample-est optional arguments are added to the
>>>>> 'ovn-nbctl acl-add' command to allow configuring these new types of
>>>>> sampling for ACLs.
>>>>>
>>>>> An example workflow of configuring ACL samples is:
>>>>>   # Create Sampling_App mappings for ACL traffic types:
>>>>>   ovn-nbctl create Sampling_App name="acl-new-traffic-sampling" \
>>>>>       id="42"
>>>>>   ovn-nbctl create Sampling_App name="acl-est-traffic-sampling" \
>>>>>       id="43"
>>>>>   # Create two sample collectors, one that samples all packets (c1)
>>>>>   # and another one that samples with a probability of 10% (c2):
>>>>>   c1=$(ovn-nbctl create Sample_Collector name=c1 \
>>>>>            probability=65535 set_id=1)
>>>>>   c2=$(ovn-nbctl create Sample_Collector name=c2 \
>>>>>            probability=6553 set_id=2)
>>>>>   # Create two sample configurations (for new and for established
>>>>>   # traffic):
>>>>>   s1=$(ovn-nbctl create Sample collector="$c1 $c2" metadata=4301)
>>>>>   s2=$(ovn-nbctl create Sample collector="$c1 $c2" metadata=4302)
>>>>>   # Create an ingress ACL to allow IP traffic:
>>>>>   ovn-nbctl --sample-new=$s1 --sample-est=$s2 acl-add ls \
>>>>>       from-lport 1 "ip" allow-related
>>>>>
>>>>> The config above will generate IPFIX samples with:
>>>>> - 8 MSB of the observation domain id set to 42 (Sampling_App
>>>>>   "acl-new-traffic-sampling" config) and the observation point id
>>>>>   set to 4301 (Sample s1) for packets that create a new connection
>>>>> - 8 MSB of the observation domain id set to 43 (Sampling_App
>>>>>   "acl-est-traffic-sampling" config) and the observation point id
>>>>>   set to 4302 (Sample s2) for packets that are part of an already
>>>>>   existing connection
>>>>>
>>>>> Note: in general, all generated IPFIX sample observation domain IDs
>>>>> are built by ovn-controller in the following way: the 8 MSB are taken
>>>>> from the sample action's obs_domain_id and the 24 LSB are taken from
>>>>> the Southbound logical datapath tunnel_key (datapath ID).
>>>>>
>>>>> Reported-at: https://issues.redhat.com/browse/FDP-305
>>>>> Signed-off-by: Adrian Moreno <[email protected]>
>>>>> Co-authored-by: Ales Musil <[email protected]>
>>>>> Signed-off-by: Ales Musil <[email protected]>
>>>>> Co-authored-by: Dumitru Ceara <[email protected]>
>>>>> Signed-off-by: Dumitru Ceara <[email protected]>
>>>>> ---
>>>>> V4:
>>>>> - added explicit sampling stages
>>>>> - reduced set_id max supported value
>>>>
>>>> I don't get that. Does it end up in the observation domain/point
>>>> somehow? Or in conntrack mark/label? Sounds strange. If it's only
>>>> in logical flow and OpenFlow actions, then it shouldn't matter what
>>>> the ID is. Or am I missing something?
>>>
>>> Hmm, I see that they are indeed passed around via registers and
>>> committed to conntrack...
>>>
>>> But I'm still not sure why we would need to have different collectors
>>> per ACL. We can already differentiate by metadata, and sending to
>>> different collectors at this level sounds like a very niche use case
>>> that may not even exist.
>>>
>>
>> There actually is a real use case behind this. Two different external
>> actors observe how traffic is processed by OVN ACLs:
>>
>> a. one of them wishes to receive samples of traffic hitting all ACLs
>>    (with a given probability, P1)
>> b. the other one wishes to receive samples of traffic hitting some of
>>    the ACLs (with a different probability, P2)
>>
>> The sets of ACLs that have sampling enabled in "a" and "b" above are
>> usually _not_ disjoint.
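The observation domain ID construction described in the quoted commit
message boils down to simple bit arithmetic. A minimal sketch, assuming a
made-up helper name and types (this is for illustration only, not the
actual ovn-controller code):

    /* Sketch: compose the 32-bit IPFIX observation domain ID as described
     * in the commit message above.  Helper name is illustrative only. */
    #include <stdint.h>

    static inline uint32_t
    compose_obs_domain_id(uint8_t sampling_app_id, uint32_t dp_tunnel_key)
    {
        /* 8 MSB: Sampling_App id (e.g. 42 or 43 in the example above);
         * 24 LSB: Southbound logical datapath tunnel_key. */
        return ((uint32_t) sampling_app_id << 24) | (dp_tunnel_key & 0xffffff);
    }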
>>
>> To make it less abstract:
>> - "a" is a k8s network observability application (for the whole cluster
>>   and for all network policies - ACLs - in the cluster)
>> - "b" is an OVN-Kubernetes module that is used for debugging the
>>   traffic path in a live cluster, e.g., trying to figure out why a
>>   certain type of traffic is allowed while there's no network policy
>>   that allows it.
>>
>> They both need to be able to function simultaneously.
>
> Wouldn't "b" also need to listen on all the ACLs to figure out which
> one is allowing the traffic?
>
> One reasonable scenario for "b" would be "I want to know which traffic
> is allowed by this one particular ACL", but I'm still not convinced
> this is a common use case, because the ACL itself says which traffic it
> allows, and it's usually what you described: we don't know what is
> happening and need to apply blanket sampling on all of them to figure
> out which one is the culprit.
>
> Even if we want to listen on a particular set of ACLs, I'm not sure how
> bad it is if the collector receives traffic for other ACLs as well; it
> can filter those out.
>
> We can attach the probability to a collector; it will change the
> suggested schema design, but we will still not need to match on those
> fields.
>
> Also, we're not providing the same flexibility for drop sampling, even
> though it is treated as part of the same functionality.
>
> And "a" in the real-world scenario is likely using local sampling with
> psample, so "b" can just tap into psample, filtering out the ACLs it
> doesn't need, instead of setting up a separate collector for itself.
>
> And the optimization that uses those registers is only possible when
> there is only one collector configured per ACL, which will not be the
> case in the described scenario.
>
> Speaking of matching, don't we also need to match on the value of the
> probability? Nothing prevents us from creating two collectors with the
> same collector set ID but different probabilities. Meaning the set id
> and the observation stage do not actually uniquely identify the
> sampling configuration.
>
> OTOH, the observation point id (a.k.a. Sample->metadata) does uniquely
> identify the sampling configuration, because it is an indexed field in
> the database. Why can't we use it for matching instead and save some
> registers? It's a single-collector use case anyway, so the metadata
> matches exactly one collector.
Never mind using metadata as a match. That is the point of the
optimization: to not use it, otherwise we end up with a flow per ACL. :)
But we still need to match on the probability in one way or another, I
think, in order for this logic to be correct.

>
>>
>>> If we use the schema that I suggested above, we may not need to
>>> match on the collector set ids, so they will not take space in the
>>> registers or conntrack mark/labels.
>>>
>>
>> That's true, but I don't think we can use the schema you suggest.
>>
>>> Best regards, Ilya Maximets.
>>>
>>
>> Regards,
>> Dumitru
>>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
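Since probability values come up repeatedly above: they are the usual
16-bit sampling fractions, 65535 meaning "sample every packet" and 6553
roughly 10%, as in the quoted Sample_Collector example. A small
illustrative conversion helper, purely a sketch and not part of OVN:

    /* Sketch: convert a sampling percentage into the 16-bit probability
     * value used by the Sample_Collector rows in the example above
     * (65535 == sample every packet, 6553 ~= 10%). */
    #include <stdint.h>

    static inline uint16_t
    sampling_percent_to_probability(double percent)
    {
        if (percent <= 0) {
            return 0;
        }
        if (percent >= 100) {
            return UINT16_MAX;
        }
        return (uint16_t) (percent / 100.0 * UINT16_MAX);
    }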
