Another example just came up: I want to set probability per k8s/OCP
feature, some of which are translated into the same OVN object.
For example, some ACL-based features: EgressFirewall, NetworkPolicy,
AdminNetworkPolicy. I may want to set EgressFirewall probability to 50%,
NetworkPolicy to 60%, and AdminNetworkPolicy to 70%. But they all use the
same OVN sampling app for ACLs.
Different users (the observability and debug apps from the previous reply) may
also have different opinions at the per-feature level.
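
To make that concrete, here is a rough sketch (made-up names, set_ids and
probabilities) of how this could be expressed with the schema proposed in
this series: each feature ends up needing its own Sample_Collector/Sample
rows even though they all share the single ACL sampling app:

  # One shared sampling app for ACL "new" traffic:
  ovn-nbctl create Sampling_App name="acl-new-traffic-sampling" id="42"
  # One collector per feature; probability is on a 0-65535 scale,
  # so 50% ~= 32768, 60% ~= 39321, 70% ~= 45875:
  cef=$(ovn-nbctl create Sample_Collector name=egressfirewall \
        probability=32768 set_id=1)
  cnp=$(ovn-nbctl create Sample_Collector name=networkpolicy \
        probability=39321 set_id=2)
  canp=$(ovn-nbctl create Sample_Collector name=adminnetworkpolicy \
         probability=45875 set_id=3)
  # One Sample row per feature, each with its own observation point id:
  sef=$(ovn-nbctl create Sample collector="$cef" metadata=1001)
  snp=$(ovn-nbctl create Sample collector="$cnp" metadata=1002)
  sanp=$(ovn-nbctl create Sample collector="$canp" metadata=1003)
  # Each feature's ACLs then reference that feature's Sample, e.g. for an
  # EgressFirewall-derived ACL:
  ovn-nbctl --sample-new=$sef acl-add ls from-lport 1 "ip" allow-related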

Kind regards,
Nadia Pinaeva

On Thu, Aug 1, 2024 at 4:40 PM Nadia Pinaeva <[email protected]> wrote:

> For now we have at least 2 users (an observability app and a debug app),
> which are independent.
> The observability app sets sampling for the whole cluster, and the samples
> are visible to all cluster users via a GUI. Which OVN sampling app should be
> enabled and with which probability is decided based on the cluster's needs
> and available resources. For example, some "unpopular" or
> resource-consuming features may be turned off or used with lower sampling
> probability.
> The debug app is used by a cluster admin when there is a problem. In this
> mode they will most likely want to enable all possible sampling apps, but
> only for specific pods/services.
> As a result we may have the observability app asking for 50% probability on
> all ACLs, and the debug app asking for 100% probability on a subset of ACLs.
>
> Observability and debug should not affect each other, as we don't want
> observability info available to cluster users to be affected by debugging
> activities.
> Hopefully this makes sense, but let me know if I can provide more details.
>
> Kind regards,
> Nadia Pinaeva
>
> On Thu, Aug 1, 2024 at 4:21 PM Dumitru Ceara <[email protected]> wrote:
>
>> On 8/1/24 15:19, Ilya Maximets wrote:
>> > On 8/1/24 15:09, Ilya Maximets wrote:
>> >> On 8/1/24 14:26, Dumitru Ceara wrote:
>> >>> On 8/1/24 13:58, Ilya Maximets wrote:
>> >>>> On 7/31/24 19:38, Dumitru Ceara wrote:
>> >>>>> On 7/31/24 18:39, Ilya Maximets wrote:
>> >>>>>> On 7/31/24 18:17, Ilya Maximets wrote:
>> >>>>>>> On 7/31/24 11:05, Dumitru Ceara wrote:
>> >>>>>>>> From: Adrian Moreno <[email protected]>
>> >>>>>>>>
>> >>>>>>>> Introduce a new table called Sample where per-flow IPFIX
>> configuration
>> >>>>>>>> can be specified.
>> >>>>>>>> Also, rows from this table can be referenced from the ACL table to
>> >>>>>>>> enable the configuration of ACL sampling.  If enabled, northd will
>> >>>>>>>> add a sample action to each ACL-related logical flow.
>> >>>>>>>>
>> >>>>>>>> Packets that hit stateful ACLs are sampled in different ways
>> >>>>>>>> depending on whether they are initiating a new session or are just
>> >>>>>>>> forwarded on an
>> >>>>>>>> existing (already allowed) session.  Two new columns
>> ("sample_new" and
>> >>>>>>>> "sample_est") are added to the ACL table to allow for potentially
>> >>>>>>>> different sampling rates for the two cases.
>> >>>>>>>
>> >>>>>>> Do we actually need to be able to sample each ACL to a different
>> set of collectors?
>> >>>>>
>> >>>>> Yes, please see below.
>> >>>>>
>> >>>>>>> I mean, is it not enough to have a common configuration per
>> sampling type (drop,
>> >>>>>>> acl-new, acl-est) and two boolean fields per ACL to opt-in as we
>> do for acl logging?
>> >>>>>>> Or a map ACL:sample = key: enum [new,est], value: metadata ?
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Something like:
>> >>>>>>>
>> >>>>>>>  NB_Global:
>> >>>>>>>    sample_config: map key enum [drop, acl-new, acl-est]
>> >>>>>>>                       value ref Sample_Collector_Set
>> >>>>>>>
>> >>>>>>>  Sample_Collector_Set:
>> >>>>>>>    ids: set key integer
>> >>>>>>>    probability: integer
>> >>>>>>>
>> >>>>>>>  ACL:
>> >>>>>>>    sample_metadata: map key enum [new, est]
>> >>>>>>>                         value integer
>> >>>>>>>
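For illustration only (this alternative schema is just a sketch in this
thread, so the table/column names below - sample_config,
Sample_Collector_Set, sample_metadata - are hypothetical), configuration
under it would look roughly like:

  # Global, per sampling type:
  ovn-nbctl set NB_Global . sample_config:acl-new=<Sample_Collector_Set UUID>
  # Per-ACL opt-in, metadata only:
  ovn-nbctl set ACL <acl-uuid> sample_metadata:new=4301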
>> >>>>>>>>
>> >>>>>>>> Note: If an ACL has both sampling enabled and a label associated
>> >>>>>>>> with it,
>> >>>>>>>> then the label value overrides the observation point ID defined
>> in the
>> >>>>>>>> sample configuration.  This is a side effect of the
>> implementation
>> >>>>>>>> (observation point IDs are stored in conntrack in the same part
>> of the
>> >>>>>>>> ct_label where ACL labels are also stored).  The two features
>> >>>>>>>> (sampling and ACL labels) serve similar purposes, however, so it's
>> >>>>>>>> not expected that they're both enabled together.
>> >>>>>>>>
>> >>>>>>>> When sampling is enabled on an ACL additional logical flows are
>> created
>> >>>>>>>> for that ACL (one for stateless ACLs and 3 for stateful ACLs) in
>> the ACL
>> >>>>>>>> action stage of the logical pipeline.  These additional flows
>> match on a
>> >>>>>>>> combination of conntrack state values and observation point id
>> values
>> >>>>>>>> (either against a logical register or against the stored
>> ct_label state)
>> >>>>>>>> in order to determine whether the packets hitting the ACLs must
>> be
>> >>>>>>>> sampled or not.  This comes with a slight increase in the number
>> of
>> >>>>>>>> logical flows and in the number of OpenFlow rules.  The number of
>> >>>>>>>> additional flows _does not_ depend on the original ACL match or
>> action.
>> >>>>>>>>
>> >>>>>>>> New --sample-new and --sample-est optional arguments are added
>> to the
>> >>>>>>>> 'ovn-nbctl acl-add' command to allow configuring these new types
>> of
>> >>>>>>>> sampling for ACLs.
>> >>>>>>>>
>> >>>>>>>> An example workflow of configuring ACL samples is:
>> >>>>>>>>   # Create Sampling_App mappings for ACL traffic types:
>> >>>>>>>>   ovn-nbctl create Sampling_App name="acl-new-traffic-sampling" \
>> >>>>>>>>                                 id="42"
>> >>>>>>>>   ovn-nbctl create Sampling_App name="acl-est-traffic-sampling" \
>> >>>>>>>>                                id="43"
>> >>>>>>>>   # Create two sample collectors, one that samples all packets
>> (c1)
>> >>>>>>>>   # and another one that samples with a probability of 10% (c2):
>> >>>>>>>>   c1=$(ovn-nbctl create Sample_Collector name=c1 \
>> >>>>>>>>        probability=65535 set_id=1)
>> >>>>>>>>   c2=$(ovn-nbctl create Sample_Collector name=c2 \
>> >>>>>>>>        probability=6553 set_id=2)
>> >>>>>>>>   # Create two sample configurations (for new and for established
>> >>>>>>>>   # traffic):
>> >>>>>>>>   s1=$(ovn-nbctl create Sample collector="$c1 $c2" metadata=4301)
>> >>>>>>>>   s2=$(ovn-nbctl create Sample collector="$c1 $c2" metadata=4302)
>> >>>>>>>>   # Create an ingress ACL to allow IP traffic:
>> >>>>>>>>   ovn-nbctl --sample-new=$s1 --sample-est=$s2 acl-add ls \
>> >>>>>>>>             from-lport 1 "ip" allow-related
>> >>>>>>>>
>> >>>>>>>> The config above will generate IPFIX samples with:
>> >>>>>>>> - 8 MSB of observation domain id set to 42 (Sampling_App
>> >>>>>>>>   "acl-new-traffic-sampling" config) and observation point id
>> >>>>>>>>   set to 4301 (Sample s1) for packets that create a new
>> >>>>>>>>   connection
>> >>>>>>>> - 8 MSB of observation domain id set to 43 (Sampling_App
>> >>>>>>>>   "acl-est-traffic-sampling" config) and observation point id
>> >>>>>>>>   set to 4302 (Sample s2) for packets that are part of an already
>> >>>>>>>>   existing connection
>> >>>>>>>>
>> >>>>>>>> Note: in general, all generated IPFIX sample observation domain
>> IDs are
>> >>>>>>>> built by ovn-controller in the following way:
>> >>>>>>>> The 8 MSB are taken from the sample action's obs_domain_id and the
>> >>>>>>>> 24 LSB from the Southbound logical datapath tunnel_key
>> >>>>>>>> (datapath ID).
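As a quick illustration of that layout (sketch only; 42 and 7 below are
made-up example values for the sampling app id and the datapath tunnel_key):

  # 8 MSB = sampling app id, 24 LSB = datapath tunnel_key:
  printf '0x%08x\n' $(( (42 << 24) | (7 & 0xffffff) ))
  # -> 0x2a000007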
>> >>>>>>>>
>> >>>>>>>> Reported-at: https://issues.redhat.com/browse/FDP-305
>> >>>>>>>> Signed-off-by: Adrian Moreno <[email protected]>
>> >>>>>>>> Co-authored-by: Ales Musil <[email protected]>
>> >>>>>>>> Signed-off-by: Ales Musil <[email protected]>
>> >>>>>>>> Co-authored-by: Dumitru Ceara <[email protected]>
>> >>>>>>>> Signed-off-by: Dumitru Ceara <[email protected]>
>> >>>>>>>> ---
>> >>>>>>>> V4:
>> >>>>>>>> - added explicit sampling stages
>> >>>>>>>> - reduced set_id max supported value
>> >>>>>>>
>> >>>>>>> I don't get that.  Does it end up in the observation domain/point
>> >>>>>>> somehow?  Or in conntrack mark/label?  Sounds strange.  If it's
>> only
>> >>>>>>> in logical flow and OpenFlow actions, then it shouldn't matter
>> what
>> >>>>>>> the ID is.  Or am I missing something?
>> >>>>>>
>> >>>>>> Hmm, I see that they are indeed passed around via registers and
>> >>>>>> committed to conntrack...
>> >>>>>>
>> >>>>>> But I'm still not sure why we would need to have different
>> collectors
>> >>>>>> per ACL.  We can already differentiate by metadata, and sending to
>> >>>>>> different collectors at this level sounds like a very niche
>> >>>>>> use case
>> >>>>>> that may not even exist.
>> >>>>>>
>> >>>>>
>> >>>>> There actually is a real use case behind this.  Two different
>> external
>> >>>>> actors that observe how traffic is processed by OVN ACLs:
>> >>>>>
>> >>>>> a. one of them wishes to receive samples of traffic hitting all ACLs
>> >>>>> (with a given probability, P1)
>> >>>>> b. the other one wishes to receive samples of traffic hitting some
>> of
>> >>>>> the ACLs (with a different probability, P2)
>> >>>>>
>> >>>>> The sets of ACLs that have sampling enabled in "a" and "b" above are
>> >>>>> usually _not_ disjoint.
>> >>>>>
>> >>>>> To make it less abstract:
>> >>>>> - "a" is a k8s network observability application (for the whole
>> cluster
>> >>>>> and for all network policies - ACLs - in the cluster)
>> >>>>> - "b" is an OVN-Kubernetes module that is used for debugging the
>> traffic
>> >>>>> path in a live cluster, e.g., trying to figure out why a certain
>> type of
>> >>>>> traffic is allowed while there's no network policy that allows it.
>> >>>>>
>> >>>>> They both need to be able to function simultaneously.
>> >>>>
>>
>> I added Nadia, from the ovn-kubernetes team at Red Hat, in CC for more
>> context on the usage.
>>
>> >>>> Wouldn't "b" also need to listen on all the ACLs to figure out which
>> >>>> one is allowing the traffic?
>> >>>>
>> >>>
>> >>> I listed that only as an example, "trying to figure out why a certain
>> >>> type of traffic is allowed while there's no network policy that allows
>> >>> it".  But the general use case is to allow focused troubleshooting (of
>> >>> network policy for now) on live production setups.
>> >>>
>> >>> As far as I understand, users of "b" would not do "blanket enabling of
>> >>> sampling" because:
>> >>> - it might kill performance (if "a" is not already running)
>> >>> - they're troubleshooting a specific problem: traffic from pod X is
>> >>> allowed.  So there's no point in blindly sampling everything; we can,
>> >>> for example, sample only ACLs that correspond to the namespace pod X
>> >>> lives in.  Or even a single specific ACL if there's already a potential
>> >>> suspect network policy.
>> >>>
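(For instance, assuming the sample_new/sample_est ACL columns added by this
patch and an already created debug Sample row whose UUID is stored in $s_dbg,
narrowing sampling down to one suspect ACL could look roughly like:

  ovn-nbctl set acl <suspect-acl-uuid> sample_new=$s_dbg sample_est=$s_dbg

This is just a sketch of the idea, not a prescribed workflow.)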
>> >>>> One reasonable scenario for "b" would be "I want to know which
>> >>>> traffic is allowed by this one particular ACL", but I'm still not
>> >>>> convinced this is a common use case, because the ACL itself says which
>> >>>> traffic it allows and it's usually what you described - we don't
>> >>>
>> >>> It might seem obvious to OVN developers but reading ACLs is not that
>> >>> straightforward for end users.  The match potentially includes
>> >>> address sets, port groups, registers, essentially any OVS field.
>> >>
>> >> Network policies in K8s or security groups in OpenStack provide
>> >> more high-level but essentially the same information, which non-OVN
>> >> experts should be able to interpret.
>> >>
>> >>>
>> >>>> know what is happening and need to apply blanket sampling on all of
>> >>>> them to figure out which one is the culprit of the issue.
>> >>>>
>> >>>> Even if we want to listen on a particular set of ACLs, I'm not sure
>> >>>> how bad it is if the collector receives traffic for other ACLs
>> >>>> as well; it can filter those out.
>> >>>>
>> >>>> We can attach the probability to a collector; it will change the
>> >>>> suggested schema design, but we will still not need to match on
>> >>>> those fields.
>> >>>>
>> >>>
>> >>> I'm not sure I follow this part, the schema in the patch already adds
>> >>> the probability to the Sample_Collector.
>> >>
>> >> I meant my suggested schema, the one I suggested in this thread.
>> >>
>> >>>
>> >>>> Also, we're not providing the same flexibility for drop sampling,
>> >>>> even though it is treated as part of the same functionality.
>> >>>>
>> >>>
>> >>> We're not yet, and maybe we should add it in the future for default
>> drop
>> >>> debugging too.  However, legit (user-defined) drops due to the
>> >>> default-deny network policy rule are implemented through ACLs.  So
>> those
>> >>> are going to be sampled (to multiple collectors if applicable).
>> >>>
>> >>>> And "a" in the real world scenario scenario is likely using local
>> >>>> sampling with psample, so "b" can just tap into psample, filtering
>> >>>> out ACLs it doesn't need, instead of setting a separate collector
>> >>>> for itself.
>> >>>>
>> >>>
>> >>> Except that "a" and "b" might need different probabilities for
>> different
>> >>> ACLs.
>> >>
>> >> I'm not convinced that they'll need high granularity of probabilities.
>> >> I get it that one may want "high" probability for the new traffic
>> >> and "low" for established.  But it feels like two administrators of the
>> >> same OVN setup can find common ground on what exactly "high" and
>> "low"
>> >> should mean, or allow for the values to be temporarily adjusted for
>> >> the time of debugging.
>> >>
>> >>>
>> >>>> And the optimization that uses those registers is only possible when
>> >>>> there is only one collector configured per ACL, which will not be
>> >>>> the case in the described scenario.
>> >>>>
>> >>>
>> >>> While we want to support the case when "b" is running, "b" - the
>> >>> debugging application - will not be continuously running.  And we're
>> >>> operating under the premise that it won't enable blanket sampling of
>> all
>> >>> ACLs.  So the optimization is still relevant for most ACLs.
>> >>>
>> >>>>
>> >>>> Speaking of matching, don't we also need to match on the value of
>> >>>> probability?  Nothing prevents us from creating two collectors with
>> >>>> the same collector set ID but different probability.  Meaning the
>> >>>> set id and the observation stage do not actually uniquely identify
>> >>>> the sampling configuration.
>> >>>>
>> >>>
>> >>> Maybe we should add [set-id] as an index then.
>> >>
>> >> If set-id becomes an index, then you'll not be able to sample to the
>> >> same collector with different probabilities, or you'll need to create
>> >> multiple collector sets in OVS, which is not great since the number
>> >> of IDs is very limited on the OVN side.
>> >>
>> >> A better option might be to enumerate rows in the Sample_Collector table
>> >> and use that index for matching instead of set-id.  It can either be
>> >> an indexed 8-bit column in the table, or northd may track the indexing
>> >> by maintaining a name-to-index mapping.  This way we'll also not
>> >> need to limit the value of set-id.
>> >
>>
>> Enumerating the values is more work in ovn-northd because the order we
>> get with IDL table iterators is not stable when database contents change.
>>
>> > If we take this route we may also consider limiting the number of rows
>> > in the Sample_Collector table to an even lower number.  Will a 4-bit
>> > index be enough?  I.e., are 15 collectors enough?  We only need 3 in a common
>> > case, i.e. one of each type.
>> >
>>
>> Maybe 15 values are enough, but I'll let Nadia comment on this; she has
>> more context on the end-user story and requirements.
>>
>> > The database can enforce the limit either by having an actual indexed
>> > column or by setting maxRows.
>> >
>> > Also, is there a reason for most of the tables being root tables?
>> > It may be hard to clean them up.
>> >
>>
>> We could make Sample and Sample_Collector non-root but that means the
>> CMS (ovn-kubernetes) will have to take into account that these records
>> might have been garbage collected and potentially have to be recreated
>> when they're needed again.  That's mostly an annoyance in cases where the
>> record values don't change much.
>>
>> Maybe it would make sense to make the Sample table non-root.  Samples are
>> supposed to be unique per ACL/policy/etc.  So there's no expectation
>> that they're going to be shared/reused too much.
>>
>> On the other hand I'd keep the Sample_Collector table as root because
>> values here are not expected to change (often) throughout the lifetime
>> of the cluster.
>>
>> >>
>> >>>
>> >>>> OTOH, observation point id (a.k.a. Sample->metadata) does uniquely
>> >>>> identify the sampling configuration, because it is an indexed field in
>> >>>> the database.  Why can't we use it for matching instead and save
>> >>>> some registers?  It's a single collector use case anyway, so metadata
>> >>>> maps to exactly one collector.
>> >>>>
>> >>>
>> >>> I'm not sure I follow this part.  Are you talking about the
>> >>> sampling-with-registers optimization?  If so, that's patch 9/9.
>> >>
>> >> Yes, I was talking about the optimization in 9/9.  It is relevant
>> >> to this patch because we're limiting the set-id here in order to
>> >> avoid using too much space in 9/9.  So, it's all connected.
>> >>
>> >> But also, see my other reply, we can't actually use the metadata.
>> >>
>>
>> Ack.
>>
>> >>>
>> >>>>>
>> >>>>>> If we use the schema that I suggested above, we may not need to
>> >>>>>> match on the collector set ids, so they will not take space in the
>> >>>>>> registers or conntrack mark/labels.
>> >>>>>>
>> >>>>>
>> >>>>> That's true, but I don't think we can use the schema you suggest.
>> >>>>>
>> >>>>>> Best regards, Ilya Maximets.
>> >>>>>>
>> >>>>>
>> >>>>> Regards,
>> >>>>> Dumitru
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>> Regards,
>> Dumitru
>>
>>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
