On 8/1/24 15:09, Ilya Maximets wrote:
> On 8/1/24 14:26, Dumitru Ceara wrote:
>> On 8/1/24 13:58, Ilya Maximets wrote:
>>> On 7/31/24 19:38, Dumitru Ceara wrote:
>>>> On 7/31/24 18:39, Ilya Maximets wrote:
>>>>> On 7/31/24 18:17, Ilya Maximets wrote:
>>>>>> On 7/31/24 11:05, Dumitru Ceara wrote:
>>>>>>> From: Adrian Moreno <[email protected]>
>>>>>>>
>>>>>>> Introduce a new table called Sample where per-flow IPFIX configuration
>>>>>>> can be specified.
>>>>>>> Rows from this table can then be referenced from the ACL table to
>>>>>>> enable ACL sampling configuration.  If enabled, northd will add a
>>>>>>> sample action to each ACL-related logical flow.
>>>>>>>
>>>>>>> Packets that hit stateful ACLs are sampled in different ways depending
>>>>>>> on whether they are initiating a new session or are just forwarded on an
>>>>>>> existing (already allowed) session.  Two new columns ("sample_new" and
>>>>>>> "sample_est") are added to the ACL table to allow for potentially
>>>>>>> different sampling rates for the two cases.
>>>>>>
>>>>>> Do we actually need to be able to sample each ACL to a different set of 
>>>>>> collectors?
>>>>
>>>> Yes, please see below.
>>>>
>>>>>> I mean, is it not enough to have a common configuration per sampling 
>>>>>> type (drop,
>>>>>> acl-new, acl-est) and two boolean fields per ACL to opt-in as we do for 
>>>>>> acl logging?
>>>>>> Or a map ACL:sample = key: enum [new,est], value: metadata?
>>>>>>
>>>>>>
>>>>>> Something like:
>>>>>>
>>>>>>  NB_Global:
>>>>>>    sample_config: map key enum [drop, acl-new, acl-est]
>>>>>>                       value ref Sample_Collector_Set
>>>>>>
>>>>>>  Sample_Collector_Set:
>>>>>>    ids: set key integer
>>>>>>    probability: integer
>>>>>>
>>>>>>  ACL:
>>>>>>    sample_metadata: map key enum [new, est]
>>>>>>                         value integer
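>>>>>>
>>>>>> For illustration, configuring such a schema could look roughly like
>>>>>> this (all the tables/columns here are hypothetical, they only exist
>>>>>> in the sketch above):
>>>>>>
>>>>>>   set=$(ovn-nbctl create Sample_Collector_Set ids=1 probability=6553)
>>>>>>   ovn-nbctl set NB_Global . sample_config:acl-new="$set"
>>>>>>   ovn-nbctl set ACL <acl-uuid> sample_metadata:new=4301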
>>>>>>
>>>>>>>
>>>>>>> Note: If an ACL has both sampling enabled and a label associated with it
>>>>>>> then the label value overrides the observation point ID defined in the
>>>>>>> sample configuration.  This is a side effect of the implementation
>>>>>>> (observation point IDs are stored in conntrack in the same part of the
>>>>>>> ct_label where ACL labels are also stored).  However, the two features
>>>>>>> (sampling and ACL labels) serve similar purposes, so it's not
>>>>>>> expected that they're both enabled together.
>>>>>>>
>>>>>>> When sampling is enabled on an ACL additional logical flows are created
>>>>>>> for that ACL (one for stateless ACLs and three for stateful ACLs) in the ACL
>>>>>>> action stage of the logical pipeline.  These additional flows match on a
>>>>>>> combination of conntrack state values and observation point id values
>>>>>>> (either against a logical register or against the stored ct_label state)
>>>>>>> in order to determine whether the packets hitting the ACLs must be
>>>>>>> sampled or not.  This comes with a slight increase in the number of
>>>>>>> logical flows and in the number of OpenFlow rules.  The number of
>>>>>>> additional flows _does not_ depend on the original ACL match or action.
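>>>>>>>
>>>>>>> (On a live setup, one way to eyeball these extra flows would be
>>>>>>> something like "ovn-sbctl lflow-list <switch> | grep -i sample",
>>>>>>> assuming the sample action appears verbatim in the flow dump.)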
>>>>>>>
>>>>>>> New --sample-new and --sample-est optional arguments are added to the
>>>>>>> 'ovn-nbctl acl-add' command to allow configuring these new types of
>>>>>>> sampling for ACLs.
>>>>>>>
>>>>>>> An example workflow of configuring ACL samples is:
>>>>>>>   # Create Sampling_App mappings for ACL traffic types:
>>>>>>>   ovn-nbctl create Sampling_App name="acl-new-traffic-sampling" \
>>>>>>>                                 id="42"
>>>>>>>   ovn-nbctl create Sampling_App name="acl-est-traffic-sampling" \
>>>>>>>                                 id="43"
>>>>>>>   # Create two sample collectors, one that samples all packets (c1)
>>>>>>>   # and another one that samples with a probability of 10% (c2):
>>>>>>>   c1=$(ovn-nbctl create Sample_Collector name=c1 \
>>>>>>>        probability=65535 set_id=1)
>>>>>>>   c2=$(ovn-nbctl create Sample_Collector name=c2 \
>>>>>>>        probability=6553 set_id=2)
>>>>>>>   # Create two sample configurations (for new and for established
>>>>>>>   # traffic):
>>>>>>>   s1=$(ovn-nbctl create Sample collector="$c1 $c2" metadata=4301)
>>>>>>>   s2=$(ovn-nbctl create Sample collector="$c1 $c2" metadata=4302)
>>>>>>>   # Create an ingress ACL to allow IP traffic:
>>>>>>>   ovn-nbctl --sample-new=$s1 --sample-est=$s2 acl-add ls \
>>>>>>>             from-lport 1 "ip" allow-related
>>>>>>>
>>>>>>> The config above will generate IPFIX samples with:
>>>>>>> - 8 MSB of observation domain id set to 42 (Sampling_App
>>>>>>>   "acl-new-traffic-sampling" config) and observation point id
>>>>>>>   set to 4301 (Sample s1) for packets that create a new
>>>>>>>   connection
>>>>>>> - 8 MSB of observation domain id set to 43 (Sampling_App
>>>>>>>   "acl-est-traffic-sampling" config) and observation point id
>>>>>>>   set to 4302 (Sample s2) for packets that are part of an already
>>>>>>>   existing connection
>>>>>>>
>>>>>>> Note: in general, all generated IPFIX sample observation domain IDs are
>>>>>>> built by ovn-controller in the following way: the 8 MSB are taken from
>>>>>>> the sample action's obs_domain_id and the 24 LSB from the Southbound
>>>>>>> logical datapath tunnel_key (datapath ID).
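>>>>>>>
>>>>>>> For example, for the "acl-new" samples above, on a datapath whose
>>>>>>> tunnel_key happens to be 5 (a made-up value, just for illustration),
>>>>>>> the resulting ID would be:
>>>>>>>
>>>>>>>   # obs_domain_id = (8-bit Sampling_App id << 24) | (24-bit tunnel_key)
>>>>>>>   printf '0x%08x\n' $(( (42 << 24) | (5 & 0xFFFFFF) ))
>>>>>>>   # => 0x2a000005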
>>>>>>>
>>>>>>> Reported-at: https://issues.redhat.com/browse/FDP-305
>>>>>>> Signed-off-by: Adrian Moreno <[email protected]>
>>>>>>> Co-authored-by: Ales Musil <[email protected]>
>>>>>>> Signed-off-by: Ales Musil <[email protected]>
>>>>>>> Co-authored-by: Dumitru Ceara <[email protected]>
>>>>>>> Signed-off-by: Dumitru Ceara <[email protected]>
>>>>>>> ---
>>>>>>> V4:
>>>>>>> - added explicit sampling stages
>>>>>>> - reduced set_id max supported value
>>>>>>
>>>>>> I don't get that.  Does it end up in the observation domain/point
>>>>>> somehow?  Or in conntrack mark/label?  Sounds strange.  If it's only
>>>>>> in logical flow and OpenFlow actions, then it shouldn't matter what
>>>>>> the ID is.  Or am I missing something?
>>>>>
>>>>> Hmm, I see that they are indeed passed around via registers and
>>>>> committed to conntrack...
>>>>>
>>>>> But I'm still not sure why we would need to have different collectors
>>>>> per ACL.  We can already differentiate by metadata and sending to
>>>>> different collectors on this level sounds like a very niche use-case
>>>>> that may not even exist.
>>>>>
>>>>
>>>> There actually is a real use case behind this: two different external
>>>> actors observe how traffic is processed by OVN ACLs:
>>>>
>>>> a. one of them wishes to receive samples of traffic hitting all ACLs
>>>> (with a given probability, P1)
>>>> b. the other one wishes to receive samples of traffic hitting some of
>>>> the ACLs (with a different probability, P2)
>>>>
>>>> The sets of ACLs that have sampling enabled in "a" and "b" above are
>>>> usually _not_ disjoint.
>>>>
>>>> To make it less abstract:
>>>> - "a" is a k8s network observability application (for the whole cluster
>>>> and for all network policies - ACLs - in the cluster)
>>>> - "b" is an OVN-Kubernetes module that is used for debugging the traffic
>>>> path in a live cluster, e.g., trying to figure out why a certain type of
>>>> traffic is allowed while there's no network policy that allows it.
>>>>
>>>> They both need to be able to function simultaneously.
>>>
>>> Wouldn't "b" also need to listen on all the ACLs to figure out which
>>> one is allowing the traffic?
>>>
>>
>> I listed that only as an example, "trying to figure out why a certain
>> type of traffic is allowed while there's no network policy that allows
>> it".  But the general use case is to allow focused troubleshooting (of
>> network policy for now) on live production setups.
>>
>> As far as I understand, users of "b" would not do "blanket enabling of
>> sampling" because:
>> - it might kill performance (if "a" is not already running)
>> - they're troubleshooting a specific problem: traffic from pod X is
>> allowed.  So there's no point in blindly sampling everything; we can, for
>> example, sample only the ACLs that correspond to the namespace pod X lives
>> in.  Or even a single specific ACL if there's already a potential
>> suspect network policy.
>>
>>> One reasonable scenario for "b" would be "I want to know which
>>> traffic is allowed by this one particular ACL", but I'm still not
>>> convinced this is a common use case, because the ACL itself says which
>>> traffic it allows and it's usually what you described - we don't
>>
>> It might seem obvious to OVN developers, but reading ACLs is not that
>> straightforward for end users.  The match potentially includes address
>> sets, port groups, registers, essentially any OVS field.
> 
> Network policies in K8s or security groups in OpenStack provide
> higher-level but essentially the same information, which non-OVN
> experts should be able to interpret.
> 
>>
>>> know what is happening and need to apply blanket sampling on all of
>>> them to figure out which one is a culprit of the issue.
>>>
>>> Even if we want to listen on a particular set of ACLs, I'm not sure
>>> how bad it is if the collector receives traffic for other ACLs
>>> as well; it can filter those out.
>>>
>>> We can attach the probability to a collector; it will change the
>>> suggested schema design, but we will still not need to match on
>>> those fields.
>>>
>>
>> I'm not sure I follow this part, the schema in the patch already adds
>> the probability to the Sample_Collector.
> 
> I meant my suggested schema, the one I suggested in this thread.
> 
>>
>>> Also, we're not providing the same flexibility for drop sampling,
>>> even though it is treated as part of the same functionality.
>>>
>>
>> We're not yet, and maybe we should add it in the future for default drop
>> debugging too.  However, legit (user-defined) drops due to the
>> default-deny network policy rule are implemented through ACLs.  So those
>> are going to be sampled (to multiple collectors if applicable).
>>
>>> And "a" in the real world scenario scenario is likely using local
>>> sampling with psample, so "b" can just tap into psample, filtering
>>> out ACLs it doesn't need, instead of setting a separate collector
>>> for itself.
>>>
>>
>> Except that "a" and "b" might need different probabilities for different
>> ACLs.
> 
> I'm not convinced that they'll need high granularity of probabilities.
> I get it that one may want "high" probability for the new traffic
> and "low" for established.  But it feels like two administrators of the
> same OVN setup can find a common ground on what exactly "high" and "low"
> should mean, or allow for the values to be temporarily adjusted for
> the time of debugging.
> 
>>
>>> And the optimization that uses those registers is only possible when
>>> there is only one collector configured per ACL, which will not be
>>> the case in the described scenario.
>>>
>>
>> While we want to support the case when "b" is running, "b" - the
>> debugging application - will not be continuously running.  And we're
>> operating under the premise that it won't enable blanket sampling of all
>> ACLs.  So the optimization is still relevant for most ACLs.
>>
>>>
>>> Speaking of matching, don't we also need to match on the value of
>>> probability?  Nothing prevents us from creating two collectors with
>>> the same collector set ID but different probability.  Meaning the
>>> set id and the observation stage do not actually uniquely identify
>>> the sampling configuration.
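>>>
>>> E.g., nothing stops this today (reusing the syntax from the commit
>>> message example):
>>>
>>>   ovn-nbctl create Sample_Collector name=cA set_id=7 probability=65535
>>>   ovn-nbctl create Sample_Collector name=cB set_id=7 probability=100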
>>>
>>
>> Maybe we should add [set-id] as an index then.
> 
> If set-id becomes an index, then you'll not be able to sample to the
> same collector with different probabilities, or you'll need to create
> multiple collector sets in OVS, which is not great since the number
> of IDs is very limited on the OVN side.
> 
> A better option might be to enumerate rows in the Sample_Collector table
> and use that index for matching instead of set-id.  It can either be
> an indexed 8-bit column in the table, or northd may track the indexing
> by maintaining a name-to-index mapping.  This way we'll also not
> need to limit the value of set-id.

If we take this route we may also consider limiting the number of rows
in the Sample_Collector table to an even lower number.  Will a 4-bit
index be enough, i.e., are 15 collectors enough?  We only need 3 in the
common case, i.e., one of each type.

The database can enforce the limit either by having an actual indexed
column or by setting maxRows.
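
For example, a rough ovn-nb.ovsschema fragment (a sketch only, not the
actual proposed schema; "maxRows" is a standard ovsdb schema key):

  "Sample_Collector": {
      "columns": {...},
      "maxRows": 15}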

Also, is there a reason for most of the tables being root tables?
It may be hard to clean them up.

> 
>>
>>> OTOH, observation point id (a.k.a. Sample->metadata) does uniquely
>>> identify the sampling configuration, because it is an indexed field in
>>> the database.  Why can't we use it for matching instead and save
>>> some registers?  It's a single-collector use case anyway, so metadata
>>> matches to exactly one collector.
>>>
>>
>> I'm not sure I follow this part.  Are you talking about the
>> sampling-with-registers optimization?  If so, that's patch 9/9.
> 
> Yes, I was talking about the optimization in 9/9.  It is relevant
> to this patch because we're limiting the set-id here in order to
> avoid using too much space in 9/9.  So, it's all connected.
> 
> But also, see my other reply, we can't actually use the metadata.
> 
>>
>>>>
>>>>> If we use the schema that I suggested above, we may not need to
>>>>> match on the collector set ids, so they will not take space in the
>>>>> registers or conntrack mark/labels.
>>>>>
>>>>
>>>> That's true, but I don't think we can use the schema you suggest.
>>>>
>>>>> Best regards, Ilya Maximets.
>>>>>
>>>>
>>>> Regards,
>>>> Dumitru
>>>>
>>>
>>
> 
