For now we have at least two users (the observability app and the debug app), which are independent. The observability app sets sampling for the whole cluster, and the samples are visible to all cluster users via the GUI. Which OVN sampling app should be enabled, and with which probability, is decided based on the cluster's needs and available resources. For example, some "unpopular" or resource-consuming features may be turned off or used with a lower sampling probability. The debug app is used by the cluster admin when there is a problem. In this mode they will most likely want to enable all possible sampling apps, but only for specific pods/services. As a result we may have the observability app asking for 50% probability on all ACLs, and the debug app asking for 100% probability on a subset of ACLs.
Observability and debug should not affect each other, as we don't want observability info available to cluster users to be affected by debugging activities.

Hopefully this makes sense, but let me know if I can provide more details.

Kind regards,
Nadia Pinaeva

On Thu, Aug 1, 2024 at 4:21 PM Dumitru Ceara <[email protected]> wrote:
> On 8/1/24 15:19, Ilya Maximets wrote:
> > On 8/1/24 15:09, Ilya Maximets wrote:
> >> On 8/1/24 14:26, Dumitru Ceara wrote:
> >>> On 8/1/24 13:58, Ilya Maximets wrote:
> >>>> On 7/31/24 19:38, Dumitru Ceara wrote:
> >>>>> On 7/31/24 18:39, Ilya Maximets wrote:
> >>>>>> On 7/31/24 18:17, Ilya Maximets wrote:
> >>>>>>> On 7/31/24 11:05, Dumitru Ceara wrote:
> >>>>>>>> From: Adrian Moreno <[email protected]>
> >>>>>>>>
> >>>>>>>> Introduce a new table called Sample where per-flow IPFIX configuration
> >>>>>>>> can be specified.
> >>>>>>>> Also, reference rows from such table from the ACL table to enable the
> >>>>>>>> configuration of ACL sampling. If enabled, northd will add a sample
> >>>>>>>> action to each ACL related logical flow.
> >>>>>>>>
> >>>>>>>> Packets that hit stateful ACLs are sampled in different ways depending
> >>>>>>>> whether they are initiating a new session or are just forwarded on an
> >>>>>>>> existing (already allowed) session. Two new columns ("sample_new" and
> >>>>>>>> "sample_est") are added to the ACL table to allow for potentially
> >>>>>>>> different sampling rates for the two cases.
> >>>>>>>
> >>>>>>> Do we actually need to be able to sample each ACL to a different set of collectors?
> >>>>>
> >>>>> Yes, please see below.
> >>>>>
> >>>>>>> I mean, is it not enough to have a common configuration per sampling type (drop,
> >>>>>>> acl-new, acl-est) and two boolean fields per ACL to opt-in as we do for acl logging?
> >>>>>>> Or a map ACL:sample = key: enum [new,est], value: metadata ?
> >>>>>>>
> >>>>>>> Something like:
> >>>>>>>
> >>>>>>> NB_Global:
> >>>>>>>   sample_config: map key enum [drop, acl-new, acl-est]
> >>>>>>>                      value ref Sample_Collector_Set
> >>>>>>>
> >>>>>>> Sample_Collector_Set:
> >>>>>>>   ids: set key integer
> >>>>>>>   probability: integer
> >>>>>>>
> >>>>>>> ACL:
> >>>>>>>   sample_metadata: map key enum [new, est]
> >>>>>>>                        value integer
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Note: If an ACL has both sampling enabled and a label associated to it
> >>>>>>>> then the label value overrides the observation point ID defined in the
> >>>>>>>> sample configuration. This is a side effect of the implementation
> >>>>>>>> (observation point IDs are stored in conntrack in the same part of the
> >>>>>>>> ct_label where ACL labels are also stored). The two features
> >>>>>>>> (sampling and ACL labels) serve however similar purposes so it's not
> >>>>>>>> expected that they're both enabled together.
> >>>>>>>>
> >>>>>>>> When sampling is enabled on an ACL additional logical flows are created
> >>>>>>>> for that ACL (one for stateless ACLs and 3 for stateful ACLs) in the ACL
> >>>>>>>> action stage of the logical pipeline. These additional flows match on a
> >>>>>>>> combination of conntrack state values and observation point id values
> >>>>>>>> (either against a logical register or against the stored ct_label state)
> >>>>>>>> in order to determine whether the packets hitting the ACLs must be
> >>>>>>>> sampled or not. This comes with a slight increase in the number of
> >>>>>>>> logical flows and in the number of OpenFlow rules. The number of
> >>>>>>>> additional flows _does not_ depend on the original ACL match or action.
> >>>>>>>>
> >>>>>>>> New --sample-new and --sample-est optional arguments are added to the
> >>>>>>>> 'ovn-nbctl acl-add' command to allow configuring these new types of
> >>>>>>>> sampling for ACLs.
> >>>>>>>>
> >>>>>>>> An example workflow of configuring ACL samples is:
> >>>>>>>>   # Create Sampling_App mappings for ACL traffic types:
> >>>>>>>>   ovn-nbctl create Sampling_App name="acl-new-traffic-sampling" \
> >>>>>>>>       id="42"
> >>>>>>>>   ovn-nbctl create sampling_app name="acl-est-traffic-sampling" \
> >>>>>>>>       id="43"
> >>>>>>>>   # Create two sample collectors, one that samples all packets (c1)
> >>>>>>>>   # and another one that samples with a probability of 10% (c2):
> >>>>>>>>   c1=$(ovn-nbctl create Sample_Collector name=c1 \
> >>>>>>>>       probability=65535 set_id=1)
> >>>>>>>>   c2=$(ovn-nbctl create Sample_Collector name=c2 \
> >>>>>>>>       probability=6553 set_id=2)
> >>>>>>>>   # Create two sample configurations (for new and for established
> >>>>>>>>   # traffic):
> >>>>>>>>   s1=$(ovn-nbctl create sample collector="$c1 $c2" metadata=4301)
> >>>>>>>>   s2=$(ovn-nbctl create sample collector="$c1 $c2" metadata=4302)
> >>>>>>>>   # Create an ingress ACL to allow IP traffic:
> >>>>>>>>   ovn-nbctl --sample-new=$s1 --sample-est=$s2 acl-add ls \
> >>>>>>>>       from-lport 1 "ip" allow-related
> >>>>>>>>
> >>>>>>>> The config above will generate IPFIX samples with:
> >>>>>>>> - 8 MSB of observation domain id set to 42 (Sampling_App
> >>>>>>>>   "acl-new-traffic-sampling" config) and observation point id
> >>>>>>>>   set to 4301 (Sample s1) for packets that create a new
> >>>>>>>>   connection
> >>>>>>>> - 8 MSB of observation domain id set to 43 (Sampling_App
> >>>>>>>>   "acl-est-traffic-sampling" config) and observation point id
> >>>>>>>>   set to 4302 (Sample s2) for packets that are part of an already
> >>>>>>>>   existing connection
> >>>>>>>>
> >>>>>>>> Note: in general, all generated IPFIX sample observation domain IDs are
> >>>>>>>> built by ovn-controller in the following way:
> >>>>>>>> The 8 MSB taken from the sample action's obs_domain_id and the last 24
> >>>>>>>> LSB taken from the Southbound logical datapath tunnel_key (datapath ID).
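[Editorial note: the ID and probability arithmetic quoted above can be sketched as follows. This is a minimal illustration of the described layout, not the actual ovn-controller code; the function names are invented for this sketch.]

```python
# Sketch of the 32-bit IPFIX observation domain ID layout described
# above: the 8 most significant bits come from the sample action's
# obs_domain_id (the Sampling_App id) and the 24 least significant bits
# come from the Southbound logical datapath tunnel_key.
# Function names are illustrative only, not OVN identifiers.

def build_obs_domain_id(app_id: int, dp_tunnel_key: int) -> int:
    assert 0 <= app_id <= 0xFF, "app id must fit in 8 bits"
    return (app_id << 24) | (dp_tunnel_key & 0xFFFFFF)

def probability_to_percent(probability: int) -> float:
    # Sampling probabilities are expressed out of 65535, so 65535 means
    # "sample every packet" and 6553 is roughly 10%.
    return 100.0 * probability / 65535

# "acl-new" samples (app id 42) on a datapath with tunnel_key 7:
print(hex(build_obs_domain_id(42, 7)))         # -> 0x2a000007
print(round(probability_to_percent(6553), 1))  # -> 10.0
```

This also shows why the workflow above uses probability=65535 for the "sample everything" collector c1 and 6553 for the ~10% collector c2.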
> >>>>>>>>
> >>>>>>>> Reported-at: https://issues.redhat.com/browse/FDP-305
> >>>>>>>> Signed-off-by: Adrian Moreno <[email protected]>
> >>>>>>>> Co-authored-by: Ales Musil <[email protected]>
> >>>>>>>> Signed-off-by: Ales Musil <[email protected]>
> >>>>>>>> Co-authored-by: Dumitru Ceara <[email protected]>
> >>>>>>>> Signed-off-by: Dumitru Ceara <[email protected]>
> >>>>>>>> ---
> >>>>>>>> V4:
> >>>>>>>> - added explicit sampling stages
> >>>>>>>> - reduced set_id max supported value
> >>>>>>>
> >>>>>>> I don't get that. Does it end up in the observation domain/point
> >>>>>>> somehow? Or in conntrack mark/label? Sounds strange. If it's only
> >>>>>>> in logical flow and OpenFlow actions, then it shouldn't matter what
> >>>>>>> the ID is. Or am I missing something?
> >>>>>>
> >>>>>> Hmm, I see that they are indeed passed around via registers and
> >>>>>> committed to conntrack...
> >>>>>>
> >>>>>> But I'm still not sure why we would need to have different collectors
> >>>>>> per ACL. We can already differentiate by metadata and sending to
> >>>>>> different collectors on this level sounds like a very niche use-case
> >>>>>> that may not even exist.
> >>>>>>
> >>>>>
> >>>>> There actually is a real use case behind this. Two different external
> >>>>> actors that observe how traffic is processed by OVN ACLs:
> >>>>>
> >>>>> a. one of them wishes to receive samples of traffic hitting all ACLs
> >>>>> (with a given probability, P1)
> >>>>> b. the other one wishes to receive samples of traffic hitting some of
> >>>>> the ACLs (with a different probability, P2)
> >>>>>
> >>>>> The sets of ACLs that have sampling enabled in "a" and "b" above are
> >>>>> usually _not_ disjoint.
> >>>>>
> >>>>> To make it less abstract:
> >>>>> - "a" is a k8s network observability application (for the whole cluster
> >>>>> and for all network policies - ACLs - in the cluster)
> >>>>> - "b" is an OVN-Kubernetes module that is used for debugging the traffic
> >>>>> path in a live cluster, e.g., trying to figure out why a certain type of
> >>>>> traffic is allowed while there's no network policy that allows it.
> >>>>>
> >>>>> They both need to be able to function simultaneously.
> >>>>
>
> I added Nadia in CC, from the ovn-kubernetes team in Red Hat for more
> context on the usage.
>
> >>>> Wouldn't "b" also need to listen on all the ACLs to figure out which
> >>>> one is allowing the traffic?
> >>>>
> >>>
> >>> I listed that only as an example, "trying to figure out why a certain
> >>> type of traffic is allowed while there's no network policy that allows
> >>> it". But the general use case is to allow focused troubleshooting (of
> >>> network policy for now) on live production setups.
> >>>
> >>> As far as I understand, users of "b" would not do "blanket enabling of
> >>> sampling" because:
> >>> - it might kill performance (if "a" is not already running)
> >>> - they're troubleshooting a specific problem: traffic from pod X is
> >>> allowed. So there's no point to blindly sample everything, we can for
> >>> example sample only ACLs that correspond to the namespace pod X lives
> >>> in. Or even a single specific ACL if there's already a potential
> >>> suspect network policy.
> >>>
> >>>> One reasonable scenario for "b" would be "I want to know which
> >>>> traffic is allowed by this one particular ACL", but I'm still not
> >>>> convinced this is a common use case, because the ACL itself says which
> >>>> traffic it allows and it's usually what you described - we don't
> >>>
> >>> It might seem obvious to OVN developers but reading ACLs is not that
> >>> straightforward for end users.
> >>> The match potentially includes address sets, port groups, registers,
> >>> essentially any OVS field.
> >>
> >> Network policies in K8s or security groups in OpenStack provide the
> >> more high-level but essentially the same information, that non-OVN
> >> experts should be able to interpret.
> >>
> >>>
> >>>> know what is happening and need to apply blanket sampling on all of
> >>>> them to figure out which one is a culprit of the issue.
> >>>>
> >>>> Even if we want to listen on a particular set of ACLs, I'm not sure
> >>>> how bad it is if the collector will receive traffic for other ACLs
> >>>> as well, it can filter those out.
> >>>>
> >>>> We can attach the probability to a collector, it will change the
> >>>> suggested schema design, but we will still not need to match on
> >>>> those fields.
> >>>>
> >>>
> >>> I'm not sure I follow this part, the schema in the patch already adds
> >>> the probability to the Sample_Collector.
> >>
> >> I meant my suggested schema, the one I suggested in this thread.
> >>
> >>>
> >>>> Also, we're not providing the same flexibility for drop sampling,
> >>>> even though it is treated as part of the same functionality.
> >>>>
> >>>
> >>> We're not yet, and maybe we should add it in the future for default drop
> >>> debugging too. However, legit (user-defined) drops due to the
> >>> default-deny network policy rule are implemented through ACLs. So those
> >>> are going to be sampled (to multiple collectors if applicable).
> >>>
> >>>> And "a" in the real world scenario is likely using local
> >>>> sampling with psample, so "b" can just tap into psample, filtering
> >>>> out ACLs it doesn't need, instead of setting a separate collector
> >>>> for itself.
> >>>>
> >>>
> >>> Except that "a" and "b" might need different probabilities for different
> >>> ACLs.
> >>
> >> I'm not convinced that they'll need high granularity of probabilities.
> >> I get it that one may want "high" probability for the new traffic
> >> and "low" for established. But it feels like two administrators of the
> >> same OVN setup can find a common ground on what exactly "high" and "low"
> >> should mean, or allow for the values to be temporarily adjusted for
> >> the time of debugging.
> >>
> >>>
> >>>> And the optimization that uses those registers is only possible when
> >>>> there is only one collector configured per ACL, which will not be
> >>>> the case in the described scenario.
> >>>>
> >>>
> >>> While we want to support the case when "b" is running, "b" - the
> >>> debugging application - will not be continuously running. And we're
> >>> operating under the premise that it won't enable blanket sampling of all
> >>> ACLs. So the optimization is still relevant for most ACLs.
> >>>
> >>>>
> >>>> Speaking of matching, don't we also need to match on the value of
> >>>> probability? Nothing prevents us from creating two collectors with
> >>>> the same collector set ID but different probability. Meaning the
> >>>> set id and the observation stage do not actually uniquely identify
> >>>> the sampling configuration.
> >>>>
> >>>
> >>> Maybe we should add [set-id] as index then.
> >>
> >> If set-id will be an index, then you'll not be able to sample to the
> >> same collector with different probabilities, or you'll need to create
> >> multiple collector sets in OVS, which is not great since the number
> >> of IDs is very limited on the OVN side.
> >>
> >> A better option might be to enumerate rows in the Sample_Collector table
> >> and use that index for matching instead of set-id. It can either be
> >> an indexed 8-bit column in the table, or northd may track the indexing
> >> by maintaining the name to index mapping. This way we'll also not
> >> need to limit the value of set-id.
> >
> > Enumerating the values is more work in ovn-northd because the order we
> > get with IDL table iterators is not stable when database contents change.
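[Editorial note: the "northd may track the indexing by maintaining the name to index mapping" idea above could look roughly like the following. This is a hedged sketch in Python rather than northd's C, with invented names; it only illustrates why an explicit allocator gives stable indices where raw IDL iteration order does not.]

```python
# Sketch of a stable name -> small-integer index allocator. Allocated
# indices survive unrelated insertions and deletions, unlike positions
# derived from iterating a table whose contents change.
# All names here are illustrative, not OVN code.

class CollectorIndexAllocator:
    def __init__(self, max_index: int = 15):  # e.g. a 4-bit index space
        self.by_name = {}                     # collector name -> index
        # Index 0 reserved to mean "no collector".
        self.free = set(range(1, max_index + 1))

    def lookup_or_alloc(self, name: str) -> int:
        if name in self.by_name:
            return self.by_name[name]         # stable across calls
        if not self.free:
            raise RuntimeError("collector index space exhausted")
        idx = min(self.free)                  # deterministic choice
        self.free.remove(idx)
        self.by_name[name] = idx
        return idx

    def release(self, name: str) -> None:
        self.free.add(self.by_name.pop(name))

alloc = CollectorIndexAllocator()
c1 = alloc.lookup_or_alloc("c1")  # -> 1
c2 = alloc.lookup_or_alloc("c2")  # -> 2
alloc.release("c1")               # removing c1 does not renumber c2
assert alloc.lookup_or_alloc("c2") == c2
```

With such a mapping the matched value is the allocated index, not the user-chosen set-id, so set-id itself would not need a reduced maximum.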
> >
> > If we take this route we may also consider limiting the number of rows
> > in the Sample_Collector table to an even lower number. Will a 4-bit index
> > be enough? i.e. is 15 collectors enough? We only need 3 in a common
> > case, i.e. one of each type.
>
> Maybe 15 values are enough, but I'll let Nadia comment on this, she has
> more context on the end user story and requirements.
>
> > Database can enforce the limit either by having an actual indexed column
> > or setting the maxRows.
> >
> > Also, is there a reason for most of the tables being root tables?
> > It may be hard to clean them up.
> >
>
> We could make Sample and Sample_Collector non-root but that means the
> CMS (ovn-kubernetes) will have to take into account that these records
> might have been garbage collected and potentially have to be recreated
> when they're needed again. That's mostly annoying in the case when the
> record values don't change that much.
>
> Maybe it would make sense to make the Sample table non-root. Samples are
> supposed to be unique per ACL/policy/etc. So there's no expectation
> that they're going to be shared/reused too much.
>
> On the other hand I'd keep the Sample_Collector table as root because
> values here are not expected to change (often) throughout the lifetime
> of the cluster.
>
> >>
> >>>
> >>>> OTOH, observation point id (a.k.a. Sample->metadata) does uniquely
> >>>> identify sampling configuration, because it is an indexed field in
> >>>> the database. Why can't we use it for matching instead and save
> >>>> some registers? It's a single collector use case anyway, so metadata
> >>>> matches to exactly one collector.
> >>>>
> >>>
> >>> I'm not sure I follow this part. Are you talking about the
> >>> sampling-with-registers optimization? If so, that's patch 9/9.
> >>
> >> Yes, I was talking about the optimization in 9/9. It is relevant
> >> to this patch because we're limiting the set-id here in order to
> >> avoid using too much space in 9/9.
> >> So, it's all connected.
> >>
> >> But also, see my other reply, we can't actually use the metadata.
> >>
>
> Ack.
>
> >>>
> >>>>>
> >>>>>> If we use the schema that I suggested above, we may not need to
> >>>>>> match on the collector set ids, so they will not take space in the
> >>>>>> registers or conntrack mark/labels.
> >>>>>>
> >>>>>
> >>>>> That's true, but I don't think we can use the schema you suggest.
> >>>>>
> >>>>>> Best regards, Ilya Maximets.
> >>>>>>
> >>>>>
> >>>>> Regards,
> >>>>> Dumitru
> >>>>>
>
> Regards,
> Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
