On 8/1/24 15:09, Ilya Maximets wrote:
> On 8/1/24 14:26, Dumitru Ceara wrote:
>> On 8/1/24 13:58, Ilya Maximets wrote:
>>> On 7/31/24 19:38, Dumitru Ceara wrote:
>>>> On 7/31/24 18:39, Ilya Maximets wrote:
>>>>> On 7/31/24 18:17, Ilya Maximets wrote:
>>>>>> On 7/31/24 11:05, Dumitru Ceara wrote:
>>>>>>> From: Adrian Moreno <[email protected]>
>>>>>>>
>>>>>>> Introduce a new table called Sample where per-flow IPFIX
>>>>>>> configuration can be specified.  Also, reference rows from this
>>>>>>> table in the ACL table to enable the configuration of ACL
>>>>>>> sampling.  If enabled, northd will add a sample action to each
>>>>>>> ACL-related logical flow.
>>>>>>>
>>>>>>> Packets that hit stateful ACLs are sampled in different ways
>>>>>>> depending on whether they are initiating a new session or are
>>>>>>> just forwarded on an existing (already allowed) session.  Two new
>>>>>>> columns ("sample_new" and "sample_est") are added to the ACL
>>>>>>> table to allow for potentially different sampling rates for the
>>>>>>> two cases.
>>>>>>
>>>>>> Do we actually need to be able to sample each ACL to a different
>>>>>> set of collectors?
>>>>
>>>> Yes, please see below.
>>>>
>>>>>> I mean, is it not enough to have a common configuration per
>>>>>> sampling type (drop, acl-new, acl-est) and two boolean fields per
>>>>>> ACL to opt in, as we do for ACL logging?  Or a map ACL:sample =
>>>>>> key: enum [new,est], value: metadata?
>>>>>>
>>>>>> Something like:
>>>>>>
>>>>>>   NB_Global:
>>>>>>     sample_config: map key enum [drop, acl-new, acl-est]
>>>>>>                        value ref Sample_Collector_Set
>>>>>>
>>>>>>   Sample_Collector_Set:
>>>>>>     ids: set key integer
>>>>>>     probability: integer
>>>>>>
>>>>>>   ACL:
>>>>>>     sample_metadata: map key enum [new, est]
>>>>>>                          value integer
>>>>>>
>>>>>>>
>>>>>>> Note: If an ACL has both sampling enabled and a label associated
>>>>>>> with it, then the label value overrides the observation point ID
>>>>>>> defined in the sample configuration.  This is a side effect of
>>>>>>> the implementation (observation point IDs are stored in conntrack
>>>>>>> in the same part of the ct_label where ACL labels are also
>>>>>>> stored).  The two features (sampling and ACL labels) serve
>>>>>>> similar purposes, however, so it's not expected that they're both
>>>>>>> enabled together.
>>>>>>>
>>>>>>> When sampling is enabled on an ACL, additional logical flows are
>>>>>>> created for that ACL (one for stateless ACLs and three for
>>>>>>> stateful ACLs) in the ACL action stage of the logical pipeline.
>>>>>>> These additional flows match on a combination of conntrack state
>>>>>>> values and observation point ID values (either against a logical
>>>>>>> register or against the stored ct_label state) in order to
>>>>>>> determine whether the packets hitting the ACLs must be sampled
>>>>>>> or not.  This comes with a slight increase in the number of
>>>>>>> logical flows and in the number of OpenFlow rules.  The number
>>>>>>> of additional flows _does not_ depend on the original ACL match
>>>>>>> or action.
>>>>>>>
>>>>>>> New --sample-new and --sample-est optional arguments are added to
>>>>>>> the 'ovn-nbctl acl-add' command to allow configuring these new
>>>>>>> types of sampling for ACLs.
>>>>>>>
>>>>>>> An example workflow of configuring ACL samples is:
>>>>>>>   # Create Sampling_App mappings for ACL traffic types:
>>>>>>>   ovn-nbctl create Sampling_App name="acl-new-traffic-sampling" \
>>>>>>>       id="42"
>>>>>>>   ovn-nbctl create Sampling_App name="acl-est-traffic-sampling" \
>>>>>>>       id="43"
>>>>>>>   # Create two sample collectors, one that samples all packets
>>>>>>>   # (c1) and another one that samples with a probability of
>>>>>>>   # 10% (c2):
>>>>>>>   c1=$(ovn-nbctl create Sample_Collector name=c1 \
>>>>>>>       probability=65535 set_id=1)
>>>>>>>   c2=$(ovn-nbctl create Sample_Collector name=c2 \
>>>>>>>       probability=6553 set_id=2)
>>>>>>>   # Create two sample configurations (for new and for
>>>>>>>   # established traffic):
>>>>>>>   s1=$(ovn-nbctl create Sample collector="$c1 $c2" metadata=4301)
>>>>>>>   s2=$(ovn-nbctl create Sample collector="$c1 $c2" metadata=4302)
>>>>>>>   # Create an ingress ACL to allow IP traffic:
>>>>>>>   ovn-nbctl --sample-new=$s1 --sample-est=$s2 acl-add ls \
>>>>>>>       from-lport 1 "ip" allow-related
>>>>>>>
>>>>>>> The config above will generate IPFIX samples with:
>>>>>>> - the 8 MSB of the observation domain ID set to 42 (Sampling_App
>>>>>>>   "acl-new-traffic-sampling" config) and the observation point
>>>>>>>   ID set to 4301 (Sample s1) for packets that create a new
>>>>>>>   connection
>>>>>>> - the 8 MSB of the observation domain ID set to 43 (Sampling_App
>>>>>>>   "acl-est-traffic-sampling" config) and the observation point
>>>>>>>   ID set to 4302 (Sample s2) for packets that are part of an
>>>>>>>   already existing connection
>>>>>>>
>>>>>>> Note: in general, all generated IPFIX sample observation domain
>>>>>>> IDs are built by ovn-controller in the following way: the 8 MSB
>>>>>>> are taken from the sample action's obs_domain_id and the 24 LSB
>>>>>>> are taken from the Southbound logical datapath tunnel_key
>>>>>>> (datapath ID).
>>>>>>>
>>>>>>> Reported-at: https://issues.redhat.com/browse/FDP-305
>>>>>>> Signed-off-by: Adrian Moreno <[email protected]>
>>>>>>> Co-authored-by: Ales Musil <[email protected]>
>>>>>>> Signed-off-by: Ales Musil <[email protected]>
>>>>>>> Co-authored-by: Dumitru Ceara <[email protected]>
>>>>>>> Signed-off-by: Dumitru Ceara <[email protected]>
>>>>>>> ---
>>>>>>> V4:
>>>>>>> - added explicit sampling stages
>>>>>>> - reduced set_id max supported value
>>>>>>
>>>>>> I don't get that.  Does it end up in the observation domain/point
>>>>>> somehow?  Or in conntrack mark/label?  Sounds strange.  If it's
>>>>>> only in logical flow and OpenFlow actions, then it shouldn't
>>>>>> matter what the ID is.  Or am I missing something?
>>>>>
>>>>> Hmm, I see that they are indeed passed around via registers and
>>>>> committed to conntrack...
>>>>>
>>>>> But I'm still not sure why we would need to have different
>>>>> collectors per ACL.  We can already differentiate by metadata, and
>>>>> sending to different collectors at this level sounds like a very
>>>>> niche use case that may not even exist.
>>>>>
>>>>
>>>> There actually is a real use case behind this.  Two different
>>>> external actors that observe how traffic is processed by OVN ACLs:
>>>>
>>>> a. one of them wishes to receive samples of traffic hitting all
>>>>    ACLs (with a given probability, P1)
>>>> b. the other one wishes to receive samples of traffic hitting some
>>>>    of the ACLs (with a different probability, P2)
>>>>
>>>> The sets of ACLs that have sampling enabled in "a" and "b" above
>>>> are usually _not_ disjoint.
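As a side note, here is a minimal sketch of the observation-domain-ID
composition described in the quoted commit message (the function name
and types are illustrative; this is not the actual ovn-controller
code):

    #include <stdint.h>

    /* Pack the 8-bit Sampling_App id into the top byte and the 24-bit
     * Southbound datapath tunnel_key into the low bits. */
    static inline uint32_t
    compose_obs_domain_id(uint8_t sampling_app_id, uint32_t dp_tunnel_key)
    {
        return ((uint32_t) sampling_app_id << 24)
               | (dp_tunnel_key & 0xffffff);
    }

E.g., with the workflow above, a packet creating a new connection on a
datapath whose tunnel_key happens to be 7 would carry an observation
domain ID of (42 << 24) | 7 == 0x2a000007.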
>>>>
>>>> To make it less abstract:
>>>> - "a" is a k8s network observability application (for the whole
>>>>   cluster and for all network policies - ACLs - in the cluster)
>>>> - "b" is an OVN-Kubernetes module that is used for debugging the
>>>>   traffic path in a live cluster, e.g., trying to figure out why a
>>>>   certain type of traffic is allowed while there's no network
>>>>   policy that allows it.
>>>>
>>>> They both need to be able to function simultaneously.
>>>
>>> Wouldn't "b" also need to listen on all the ACLs to figure out
>>> which one is allowing the traffic?
>>>
>>
>> I listed that only as an example, "trying to figure out why a
>> certain type of traffic is allowed while there's no network policy
>> that allows it".  But the general use case is to allow focused
>> troubleshooting (of network policy, for now) on live production
>> setups.
>>
>> As far as I understand, users of "b" would not do "blanket enabling
>> of sampling" because:
>> - it might kill performance (if "a" is not already running)
>> - they're troubleshooting a specific problem: traffic from pod X is
>>   allowed.  So there's no point in blindly sampling everything; we
>>   can, for example, sample only the ACLs corresponding to the
>>   namespace pod X lives in.  Or even a single specific ACL if
>>   there's already a suspect network policy.
>>
>>> One reasonable scenario for "b" would be "I want to know which
>>> traffic is allowed by this one particular ACL", but I'm still not
>>> convinced this is a common use case, because the ACL itself says
>>> which traffic it allows and it's usually what you described - we don't
>>
>> It might seem obvious to OVN developers, but reading ACLs is not
>> that straightforward for end users.  The match potentially includes
>> address sets, port groups, registers, essentially any OVS field.
>
> Network policies in K8s or security groups in OpenStack provide more
> high-level but essentially the same information, which non-OVN
> experts should be able to interpret.
>
>>
>>> know what is happening and need to apply blanket sampling on all of
>>> them to figure out which one is the culprit of the issue.
>>>
>>> Even if we want to listen on a particular set of ACLs, I'm not sure
>>> how bad it is if the collector receives traffic for other ACLs as
>>> well; it can filter those out.
>>>
>>> We can attach the probability to a collector; it will change the
>>> suggested schema design, but we will still not need to match on
>>> those fields.
>>>
>>
>> I'm not sure I follow this part; the schema in the patch already
>> adds the probability to the Sample_Collector.
>
> I meant my suggested schema, the one I suggested earlier in this
> thread.
>
>>
>>> Also, we're not providing the same flexibility for drop sampling,
>>> even though it is treated as part of the same functionality.
>>>
>>
>> We're not yet, and maybe we should add it in the future for default
>> drop debugging too.  However, legit (user-defined) drops due to the
>> default-deny network policy rule are implemented through ACLs.  So
>> those are going to be sampled (to multiple collectors if
>> applicable).
>>
>>> And "a" in the real-world scenario is likely using local sampling
>>> with psample, so "b" can just tap into psample, filtering out ACLs
>>> it doesn't need, instead of setting up a separate collector for
>>> itself.
>>>
>>
>> Except that "a" and "b" might need different probabilities for
>> different ACLs.
>
> I'm not convinced that they'll need high granularity of probabilities.
> I get that one may want "high" probability for new traffic and "low"
> for established.  But it feels like two administrators of the same
> OVN setup can find common ground on what exactly "high" and "low"
> should mean, or allow the values to be temporarily adjusted for the
> time of debugging.
>
>>
>>> And the optimization that uses those registers is only possible
>>> when there is only one collector configured per ACL, which will not
>>> be the case in the described scenario.
>>>
>>
>> While we want to support the case when "b" is running, "b" - the
>> debugging application - will not be continuously running.  And we're
>> operating under the premise that it won't enable blanket sampling of
>> all ACLs.  So the optimization is still relevant for most ACLs.
>>
>>>
>>> Speaking of matching, don't we also need to match on the value of
>>> probability?  Nothing prevents us from creating two collectors with
>>> the same collector set ID but a different probability.  Meaning the
>>> set ID and the observation stage do not actually uniquely identify
>>> the sampling configuration.
>>>
>>
>> Maybe we should add set_id as an index then.
>
> If set_id is an index, then you'll not be able to sample to the same
> collector with different probabilities, or you'll need to create
> multiple collector sets in OVS, which is not great since the number
> of IDs is very limited on the OVN side.
>
> A better option might be to enumerate rows in the Sample_Collector
> table and use that index for matching instead of set_id.  It can
> either be an indexed 8-bit column in the table, or northd may track
> the indexing by maintaining a name-to-index mapping.  This way we'll
> also not need to limit the value of set_id.
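To make the enumeration idea concrete, a minimal sketch of the kind of
name-to-index tracking northd could do (purely illustrative; not
actual northd code and not using the OVS library helpers):

    #include <stdint.h>
    #include <string.h>

    #define MAX_COLLECTORS 255   /* Fits the suggested 8-bit index. */

    struct collector_map {
        const char *names[MAX_COLLECTORS];
        uint8_t n;
    };

    /* Returns the 1-based index assigned to 'name', allocating a new
     * one on first use.  Returns 0 if the table is full. */
    static uint8_t
    collector_index(struct collector_map *map, const char *name)
    {
        for (uint8_t i = 0; i < map->n; i++) {
            if (!strcmp(map->names[i], name)) {
                return i + 1;
            }
        }
        if (map->n == MAX_COLLECTORS) {
            return 0;
        }
        map->names[map->n] = name;
        return ++map->n;
    }

Logical flows would then match on this dense index, which stays small
no matter what set_id values users pick.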
If we take this route, we may also consider limiting the number of
rows in the Sample_Collector table to an even lower number.  Will a
4-bit index be enough, i.e., are 15 collectors enough?  We only need 3
in the common case, i.e., one of each type.  The database can enforce
the limit either by having an actual indexed column or by setting
maxRows.

Also, is there a reason for most of the tables being root tables?  It
may be hard to clean them up.

>
>>
>>> OTOH, the observation point ID (a.k.a. Sample->metadata) does
>>> uniquely identify the sampling configuration, because it is an
>>> indexed field in the database.  Why can't we use it for matching
>>> instead and save some registers?  It's a single-collector use case
>>> anyway, so the metadata matches exactly one collector.
>>>
>>
>> I'm not sure I follow this part.  Are you talking about the
>> sampling-with-registers optimization?  If so, that's patch 9/9.
>
> Yes, I was talking about the optimization in 9/9.  It is relevant
> to this patch because we're limiting set_id here in order to avoid
> using too much space in 9/9.  So, it's all connected.
>
> But also, see my other reply; we can't actually use the metadata.
>
>>
>>>>
>>>>> If we use the schema that I suggested above, we may not need to
>>>>> match on the collector set IDs, so they will not take space in
>>>>> the registers or conntrack mark/labels.
>>>>>
>>>>
>>>> That's true, but I don't think we can use the schema you suggest.
>>>>> Best regards, Ilya Maximets.
>>>>
>>>> Regards,
>>>> Dumitru
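For reference, if we go the maxRows route, the cap can be expressed
directly in the schema; a fragment along these lines (column types
abbreviated and values illustrative - not the actual NB schema text):

    "Sample_Collector": {
        "columns": {
            "name":        {"type": "string"},
            "probability": {"type": "integer"},
            "set_id":      {"type": "integer"}},
        "maxRows": 15,
        "isRoot": false}

Making the table non-root ("isRoot": false) would also address the
cleanup concern: rows that are no longer referenced from any Sample
get garbage-collected by ovsdb-server automatically.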
