Another example just came up: I want to set the probability per k8s/OCP
feature, some of which are translated into the same OVN object. For example,
some ACL-based features: EgressFirewall, NetworkPolicy, AdminNetworkPolicy.
I may want to set the EgressFirewall probability to 50%, NetworkPolicy to
60%, and AdminNetworkPolicy to 70%, but they all use the same OVN sampling
app for ACLs. Different users (the observability and debug apps from the
previous reply) may also have different opinions at the per-feature level.
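A hypothetical sketch of what that could look like with the schema from this
patch (names and values are made up; the probabilities are roughly 50/60/70%
of 65535), which is also exactly the "same set_id, different probability"
situation discussed further down the thread:

  # All three features compile down to ACLs handled by the same ACL
  # Sampling_App, so the per-feature probabilities would have to live in
  # per-ACL collectors:
  ovn-nbctl create Sampling_App name="acl-new-traffic-sampling" id="42"
  c_ef=$(ovn-nbctl create Sample_Collector name=egress-firewall \
         probability=32768 set_id=1)   # ~50%
  c_np=$(ovn-nbctl create Sample_Collector name=network-policy \
         probability=39321 set_id=1)   # ~60%
  c_anp=$(ovn-nbctl create Sample_Collector name=admin-network-policy \
          probability=45875 set_id=1)  # ~70%
  # Each ACL generated for a feature then references a Sample row pointing
  # at that feature's collector, e.g. for a NetworkPolicy ACL:
  s_np=$(ovn-nbctl create sample collector="$c_np" metadata=4310)
  ovn-nbctl --sample-new=$s_np acl-add ls from-lport 1 "ip" allow-related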
Kind regards,
Nadia Pinaeva

On Thu, Aug 1, 2024 at 4:40 PM Nadia Pinaeva <[email protected]> wrote:

> For now we have at least 2 users (observability app and debug app), which
> are independent.
> The observability app sets sampling for the whole cluster, and the samples
> are visible to all cluster users via GUI. Which OVN sampling app should be
> enabled and with which probability is decided based on the cluster's needs
> and available resources. For example, some "unpopular" or
> resource-consuming features may be turned off or used with a lower
> sampling probability.
> The debug app is used by the cluster admin when there is a problem. In
> this mode they most likely will want to enable all possible sampling apps,
> but only for specific pods/services.
> As a result we may have the observability app asking for 50% probability
> on all ACLs, and the debug app asking for 100% probability on a subset of
> ACLs.
>
> Observability and debug should not affect each other, as we don't want
> observability info available to cluster users to be affected by debugging
> activities.
> Hopefully this makes sense, but let me know if I can provide more details.
>
> Kind regards,
> Nadia Pinaeva
>
> On Thu, Aug 1, 2024 at 4:21 PM Dumitru Ceara <[email protected]> wrote:
>
>> On 8/1/24 15:19, Ilya Maximets wrote:
>> > On 8/1/24 15:09, Ilya Maximets wrote:
>> >> On 8/1/24 14:26, Dumitru Ceara wrote:
>> >>> On 8/1/24 13:58, Ilya Maximets wrote:
>> >>>> On 7/31/24 19:38, Dumitru Ceara wrote:
>> >>>>> On 7/31/24 18:39, Ilya Maximets wrote:
>> >>>>>> On 7/31/24 18:17, Ilya Maximets wrote:
>> >>>>>>> On 7/31/24 11:05, Dumitru Ceara wrote:
>> >>>>>>>> From: Adrian Moreno <[email protected]>
>> >>>>>>>>
>> >>>>>>>> Introduce a new table called Sample where per-flow IPFIX
>> >>>>>>>> configuration can be specified.
>> >>>>>>>> Also, reference rows from this table from the ACL table to
>> >>>>>>>> enable the configuration of ACL sampling. If enabled, northd
>> >>>>>>>> will add a sample action to each ACL-related logical flow.
>> >>>>>>>>
>> >>>>>>>> Packets that hit stateful ACLs are sampled in different ways
>> >>>>>>>> depending on whether they are initiating a new session or are
>> >>>>>>>> just forwarded on an existing (already allowed) session. Two new
>> >>>>>>>> columns ("sample_new" and "sample_est") are added to the ACL
>> >>>>>>>> table to allow for potentially different sampling rates for the
>> >>>>>>>> two cases.
>> >>>>>>>
>> >>>>>>> Do we actually need to be able to sample each ACL to a different
>> >>>>>>> set of collectors?
>> >>>>>
>> >>>>> Yes, please see below.
>> >>>>>
>> >>>>>>> I mean, is it not enough to have a common configuration per
>> >>>>>>> sampling type (drop, acl-new, acl-est) and two boolean fields per
>> >>>>>>> ACL to opt in as we do for ACL logging?
>> >>>>>>> Or a map ACL:sample = key: enum [new,est], value: metadata ?
>> >>>>>>>
>> >>>>>>> Something like:
>> >>>>>>>
>> >>>>>>> NB_Global:
>> >>>>>>>   sample_config: map key enum [drop, acl-new, acl-est]
>> >>>>>>>                      value ref Sample_Collector_Set
>> >>>>>>>
>> >>>>>>> Sample_Collector_Set:
>> >>>>>>>   ids: set key integer
>> >>>>>>>   probability: integer
>> >>>>>>>
>> >>>>>>> ACL:
>> >>>>>>>   sample_metadata: map key enum [new, est]
>> >>>>>>>                        value integer
>> >>>>>>>
>> >>>>>>>>
>> >>>>>>>> Note: If an ACL has both sampling enabled and a label associated
>> >>>>>>>> to it, then the label value overrides the observation point ID
>> >>>>>>>> defined in the sample configuration. This is a side effect of
>> >>>>>>>> the implementation (observation point IDs are stored in
>> >>>>>>>> conntrack in the same part of the ct_label where ACL labels are
>> >>>>>>>> also stored). The two features (sampling and ACL labels) serve
>> >>>>>>>> similar purposes, however, so it's not expected that they're
>> >>>>>>>> both enabled together.
>> >>>>>>>>
>> >>>>>>>> When sampling is enabled on an ACL, additional logical flows are
>> >>>>>>>> created for that ACL (one for stateless ACLs and 3 for stateful
>> >>>>>>>> ACLs) in the ACL action stage of the logical pipeline. These
>> >>>>>>>> additional flows match on a combination of conntrack state
>> >>>>>>>> values and observation point ID values (either against a logical
>> >>>>>>>> register or against the stored ct_label state) in order to
>> >>>>>>>> determine whether the packets hitting the ACLs must be sampled
>> >>>>>>>> or not. This comes with a slight increase in the number of
>> >>>>>>>> logical flows and in the number of OpenFlow rules. The number of
>> >>>>>>>> additional flows _does not_ depend on the original ACL match or
>> >>>>>>>> action.
>> >>>>>>>>
>> >>>>>>>> New --sample-new and --sample-est optional arguments are added
>> >>>>>>>> to the 'ovn-nbctl acl-add' command to allow configuring these
>> >>>>>>>> new types of sampling for ACLs.
>> >>>>>>>>
>> >>>>>>>> An example workflow of configuring ACL samples is:
>> >>>>>>>>   # Create Sampling_App mappings for ACL traffic types:
>> >>>>>>>>   ovn-nbctl create Sampling_App name="acl-new-traffic-sampling" \
>> >>>>>>>>       id="42"
>> >>>>>>>>   ovn-nbctl create sampling_app name="acl-est-traffic-sampling" \
>> >>>>>>>>       id="43"
>> >>>>>>>>   # Create two sample collectors, one that samples all packets (c1)
>> >>>>>>>>   # and another one that samples with a probability of 10% (c2):
>> >>>>>>>>   c1=$(ovn-nbctl create Sample_Collector name=c1 \
>> >>>>>>>>       probability=65535 set_id=1)
>> >>>>>>>>   c2=$(ovn-nbctl create Sample_Collector name=c2 \
>> >>>>>>>>       probability=6553 set_id=2)
>> >>>>>>>>   # Create two sample configurations (for new and for established
>> >>>>>>>>   # traffic):
>> >>>>>>>>   s1=$(ovn-nbctl create sample collector="$c1 $c2" metadata=4301)
>> >>>>>>>>   s2=$(ovn-nbctl create sample collector="$c1 $c2" metadata=4302)
>> >>>>>>>>   # Create an ingress ACL to allow IP traffic:
>> >>>>>>>>   ovn-nbctl --sample-new=$s1 --sample-est=$s2 acl-add ls \
>> >>>>>>>>       from-lport 1 "ip" allow-related
>> >>>>>>>>
>> >>>>>>>> The config above will generate IPFIX samples with:
>> >>>>>>>> - 8 MSB of observation domain id set to 42 (Sampling_App
>> >>>>>>>>   "acl-new-traffic-sampling" config) and observation point id
>> >>>>>>>>   set to 4301 (Sample s1) for packets that create a new
>> >>>>>>>>   connection
>> >>>>>>>> - 8 MSB of observation domain id set to 43 (Sampling_App
>> >>>>>>>>   "acl-est-traffic-sampling" config) and observation point id
>> >>>>>>>>   set to 4302 (Sample s2) for packets that are part of an already
>> >>>>>>>>   existing connection
>> >>>>>>>>
>> >>>>>>>> Note: in general, all generated IPFIX sample observation domain
>> >>>>>>>> IDs are built by ovn-controller in the following way: the 8 MSB
>> >>>>>>>> are taken from the sample action's obs_domain_id and the 24 LSB
>> >>>>>>>> from the Southbound logical datapath tunnel_key (datapath ID).
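For reference, the construction described in the note above works out to the
following (a quick shell-arithmetic sketch; 42 is the Sampling_App id from
the example and 0x4d2 is just a stand-in for a Southbound datapath
tunnel_key):

  # obs_domain_id = (8-bit Sampling_App id << 24) | (24-bit datapath tunnel_key)
  printf 'obs_domain_id=0x%08x\n' $(( (42 << 24) | (0x4d2 & 0xffffff) ))
  # => obs_domain_id=0x2a0004d2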
>> >>>>>>>>
>> >>>>>>>> Reported-at: https://issues.redhat.com/browse/FDP-305
>> >>>>>>>> Signed-off-by: Adrian Moreno <[email protected]>
>> >>>>>>>> Co-authored-by: Ales Musil <[email protected]>
>> >>>>>>>> Signed-off-by: Ales Musil <[email protected]>
>> >>>>>>>> Co-authored-by: Dumitru Ceara <[email protected]>
>> >>>>>>>> Signed-off-by: Dumitru Ceara <[email protected]>
>> >>>>>>>> ---
>> >>>>>>>> V4:
>> >>>>>>>> - added explicit sampling stages
>> >>>>>>>> - reduced set_id max supported value
>> >>>>>>>
>> >>>>>>> I don't get that. Does it end up in the observation domain/point
>> >>>>>>> somehow? Or in conntrack mark/label? Sounds strange. If it's only
>> >>>>>>> in logical flow and OpenFlow actions, then it shouldn't matter
>> >>>>>>> what the ID is. Or am I missing something?
>> >>>>>>
>> >>>>>> Hmm, I see that they are indeed passed around via registers and
>> >>>>>> committed to conntrack...
>> >>>>>>
>> >>>>>> But I'm still not sure why we would need to have different
>> >>>>>> collectors per ACL. We can already differentiate by metadata, and
>> >>>>>> sending to different collectors at this level sounds like a very
>> >>>>>> niche use case that may not even exist.
>> >>>>>>
>> >>>>> There actually is a real use case behind this. Two different
>> >>>>> external actors that observe how traffic is processed by OVN ACLs:
>> >>>>>
>> >>>>> a. one of them wishes to receive samples of traffic hitting all ACLs
>> >>>>>    (with a given probability, P1)
>> >>>>> b. the other one wishes to receive samples of traffic hitting some
>> >>>>>    of the ACLs (with a different probability, P2)
>> >>>>>
>> >>>>> The sets of ACLs that have sampling enabled in "a" and "b" above are
>> >>>>> usually _not_ disjoint.
>> >>>>>
>> >>>>> To make it less abstract:
>> >>>>> - "a" is a k8s network observability application (for the whole
>> >>>>>   cluster and for all network policies - ACLs - in the cluster)
>> >>>>> - "b" is an OVN-Kubernetes module that is used for debugging the
>> >>>>>   traffic path in a live cluster, e.g., trying to figure out why a
>> >>>>>   certain type of traffic is allowed while there's no network policy
>> >>>>>   that allows it.
>> >>>>>
>> >>>>> They both need to be able to function simultaneously.
>> >>>>
>>
>> I added Nadia in CC, from the ovn-kubernetes team in Red Hat, for more
>> context on the usage.
>>
>> >>>> Wouldn't "b" also need to listen on all the ACLs to figure out which
>> >>>> one is allowing the traffic?
>> >>>>
>> >>> I listed that only as an example, "trying to figure out why a certain
>> >>> type of traffic is allowed while there's no network policy that allows
>> >>> it". But the general use case is to allow focused troubleshooting (of
>> >>> network policy for now) on live production setups.
>> >>>
>> >>> As far as I understand, users of "b" would not do "blanket enabling of
>> >>> sampling" because:
>> >>> - it might kill performance (if "a" is not already running)
>> >>> - they're troubleshooting a specific problem: traffic from pod X is
>> >>>   allowed. So there's no point in blindly sampling everything; we can,
>> >>>   for example, sample only the ACLs that correspond to the namespace
>> >>>   pod X lives in. Or even a single specific ACL if there's already a
>> >>>   potential suspect network policy.
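As a concrete illustration of the "single specific ACL" case above, here is a
hypothetical sketch (it reuses the s1/s2 Sample rows from the commit message
example; the ACL match is made up). Since sample_new and sample_est are plain
ACL columns, they can also be set on an existing ACL:

  # Enable sampling on one existing ACL only, instead of blanket sampling:
  acl=$(ovn-nbctl --bare --columns=_uuid find ACL match='"ip4.src == 10.0.0.5"')
  ovn-nbctl set ACL "$acl" sample_new="$s1" sample_est="$s2"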
>> >>>
>> >>>> One reasonable scenario for "b" would be "I want to know which
>> >>>> traffic is allowed by this one particular ACL", but I'm still not
>> >>>> convinced this is a common use case, because the ACL itself says
>> >>>> which traffic it allows and it's usually what you described - we don't
>> >>>
>> >>> It might seem obvious to OVN developers, but reading ACLs is not that
>> >>> straightforward for end users. The match potentially includes address
>> >>> sets, port groups, registers, essentially any OVS field.
>> >>
>> >> Network policies in K8s or security groups in OpenStack provide more
>> >> high-level but essentially the same information, which non-OVN experts
>> >> should be able to interpret.
>> >>
>> >>>
>> >>>> know what is happening and need to apply blanket sampling on all of
>> >>>> them to figure out which one is the culprit of the issue.
>> >>>>
>> >>>> Even if we want to listen on a particular set of ACLs, I'm not sure
>> >>>> how bad it is if the collector will receive traffic for other ACLs
>> >>>> as well; it can filter those out.
>> >>>>
>> >>>> We can attach the probability to a collector, it will change the
>> >>>> suggested schema design, but we will still not need to match on
>> >>>> those fields.
>> >>>>
>> >>> I'm not sure I follow this part, the schema in the patch already adds
>> >>> the probability to the Sample_Collector.
>> >>
>> >> I meant my suggested schema, the one I suggested in this thread.
>> >>
>> >>>
>> >>>> Also, we're not providing the same flexibility for drop sampling,
>> >>>> even though it is treated as part of the same functionality.
>> >>>>
>> >>> We're not yet, and maybe we should add it in the future for default
>> >>> drop debugging too. However, legit (user-defined) drops due to the
>> >>> default-deny network policy rule are implemented through ACLs. So
>> >>> those are going to be sampled (to multiple collectors if applicable).
>> >>>
>> >>>> And "a" in the real-world scenario is likely using local sampling
>> >>>> with psample, so "b" can just tap into psample, filtering out ACLs
>> >>>> it doesn't need, instead of setting a separate collector for itself.
>> >>>>
>> >>> Except that "a" and "b" might need different probabilities for
>> >>> different ACLs.
>> >>
>> >> I'm not convinced that they'll need high granularity of probabilities.
>> >> I get it that one may want "high" probability for the new traffic
>> >> and "low" for established. But it feels like two administrators of the
>> >> same OVN setup can find common ground on what exactly "high" and "low"
>> >> should mean, or allow for the values to be temporarily adjusted for
>> >> the time of debugging.
>> >>
>> >>>
>> >>>> And the optimization that uses those registers is only possible when
>> >>>> there is only one collector configured per ACL, which will not be
>> >>>> the case in the described scenario.
>> >>>>
>> >>> While we want to support the case when "b" is running, "b" - the
>> >>> debugging application - will not be continuously running. And we're
>> >>> operating under the premise that it won't enable blanket sampling of
>> >>> all ACLs. So the optimization is still relevant for most ACLs.
>> >>>
>> >>>>
>> >>>> Speaking of matching, don't we also need to match on the value of
>> >>>> probability? Nothing prevents us from creating two collectors with
>> >>>> the same collector set ID but different probability.
>> >>>> Meaning the set id and the observation stage do not actually
>> >>>> uniquely identify the sampling configuration.
>> >>>>
>> >>> Maybe we should add [set-id] as an index then.
>> >>
>> >> If set-id will be an index, then you'll not be able to sample to the
>> >> same collector with different probabilities, or you'll need to create
>> >> multiple collector sets in OVS, which is not great since the number
>> >> of IDs is very limited on the OVN side.
>> >>
>> >> A better option might be to enumerate rows in the Sample_Collector
>> >> table and use that index for matching instead of set-id. It can either
>> >> be an indexed 8-bit column in the table, or northd may track the
>> >> indexing by maintaining the name to index mapping. This way we'll also
>> >> not need to limit the value of set-id.
>> >
>>
>> Enumerating the values is more work in ovn-northd because the order we
>> get with IDL table iterators is not stable when database contents change.
>>
>> > If we take this route we may also consider limiting the number of rows
>> > in the Sample_Collector table to an even lower number. Will a 4-bit
>> > index be enough? i.e. are 15 collectors enough? We only need 3 in a
>> > common case, i.e. one of each type.
>> >
>>
>> Maybe 15 values are enough, but I'll let Nadia comment on this, she has
>> more context on the end user story and requirements.
>>
>> > The database can enforce the limit either by having an actual indexed
>> > column or by setting maxRows.
>> >
>> > Also, is there a reason for most of the tables being root tables?
>> > It may be hard to clean them up.
>> >
>>
>> We could make Sample and Sample_Collector non-root but that means the
>> CMS (ovn-kubernetes) will have to take into account that these records
>> might have been garbage collected and potentially have to be recreated
>> when they're needed again. That's mostly annoying in the case when the
>> record values don't change that much.
>>
>> Maybe it would make sense to make the Sample table non-root. Samples are
>> supposed to be unique per ACL/policy/etc. So there's no expectation
>> that they're going to be shared/reused too much.
>>
>> On the other hand I'd keep the Sample_Collector table as root because
>> values here are not expected to change (often) throughout the lifetime
>> of the cluster.
>>
>> >>
>> >>>
>> >>>> OTOH, observation point id (a.k.a. Sample->metadata) does uniquely
>> >>>> identify the sampling configuration, because it is an indexed field
>> >>>> in the database. Why can't we use it for matching instead and save
>> >>>> some registers? It's a single-collector use case anyway, so metadata
>> >>>> matches exactly one collector.
>> >>>>
>> >>> I'm not sure I follow this part. Are you talking about the
>> >>> sampling-with-registers optimization? If so, that's patch 9/9.
>> >>
>> >> Yes, I was talking about the optimization in 9/9. It is relevant
>> >> to this patch because we're limiting the set-id here in order to
>> >> avoid using too much space in 9/9. So, it's all connected.
>> >>
>> >> But also, see my other reply, we can't actually use the metadata.
>> >>
>>
>> Ack.
>>
>> >>>
>> >>>>>
>> >>>>>> If we use the schema that I suggested above, we may not need to
>> >>>>>> match on the collector set ids, so they will not take space in the
>> >>>>>> registers or conntrack mark/labels.
>> >>>>>>
>> >>>>> That's true, but I don't think we can use the schema you suggest.
>> >>>>>
>> >>>>>> Best regards, Ilya Maximets.
>> >>>>>>
>> >>>>>
>> >>>>> Regards,
>> >>>>> Dumitru
>>
>> Regards,
>> Dumitru
