On 5/13/25 11:06 AM, Trọng Đạt Trần wrote:
> Dear Dumitru,
>

Hi Oscar,

> In the previous days, I've performed additional tests to gain better
> understanding around the issue before giving you the details.
>
> Thank you for your earlier explanation, it clarified how conntrack and
> sampling work in the simple "vm1 --- ls --- vm2" topology. However, I
> believe my original observations still hold in router related topologies.
>
> ------------------------------------------------------------------------
>
> Setup Recap
>
> *Topology*: vm_a (10.2.1.5) --- ls1 --- router --- ls2 --- vm_b (10.2.3.5)
>
> ACLs applied to a shared Port Group (pg_d559...):
>
>   * *ACL A*: from-lport, allow-related IPv4 (sample_est = 2000000)
>   * *ACL B*: to-lport, allow-related ICMP (sample_est = 1000000)
>
> *Sample configuration*:
>
>   * ACL A: direction=from-lport, match="inport == @pg && ip4",
>     sample_est=2000000
>   * ACL B: direction=to-lport, match="outport == @pg && ip4 && icmp4",
>     sample_est=1000000
>
> # ovn-nbctl acl-list pg_d559bf91_b95f_49c0_8e4a_bf35f15e1dcc
> from-lport  1002 (inport == @pg_d559bf91_b95f_49c0_8e4a_bf35f15e1dcc && ip4) allow-related
>   to-lport  1002 (outport == @pg_d559bf91_b95f_49c0_8e4a_bf35f15e1dcc && ip4 && ip4.src == 0.0.0.0/0 && icmp4) allow-related
>
> ------------------------------------------------------------------------
>
> Expected Behavior (based on your explanation)
>
>   * *First ICMP request*: no sample (ct=new).
>   * *First ICMP reply*:
>       o One sample from the *ingress pipeline* (sample_est = 1000000)
>       o One sample from the *egress pipeline* (sample_est = 2000000)
>     → *Total: 2 samples* for reply --> True
>
> ------------------------------------------------------------------------
>
> Actual Behavior Observed
>
> On the *first ICMP reply*, I see:
>
>   * *3 samples total*:
>       o *2 samples* in the *ingress pipeline*, both with
>         obs_point_id=1000000
>       o *1 sample* in the egress pipeline, with obs_point_id=2000000
>
> This results in *duplicated sampling actions for a single logical
> datapath flow* within the ingress pipeline.
>
> Evidence:
>
> # ovs-dpctl dump-flows | grep 10.2.1.5
> recirc_id(0x1d5),in_port(6),ct_state(-new+est-rel+rpl-inv+trk),ct_mark(0x20020/0xff0031),ct_label(0xf4240000000000000000000000000),eth(src=fa:16:3e:6b:42:8e,dst=fa:16:3e:dd:02:c0),eth_type(0x0800),ipv4(src=10.2.1.5,dst=10.2.3.5,proto=1,ttl=64,frag=no),
> packets:299, bytes:29302, used:0.376s,
> actions:userspace(pid=4294967295,flow_sample(probability=65535,collector_set_id=2,obs_domain_id=33554437,obs_point_id=1000000,output_port=4294967295)),userspace(pid=4294967295,flow_sample(probability=65535,collector_set_id=2,obs_domain_id=33554437,obs_point_id=1000000,output_port=4294967295)),ct_clear,set(eth(src=fa:16:3e:d5:7b:d1,dst=fa:16:3e:f8:af:7d)),set(ipv4(ttl=63)),ct(zone=21),recirc(0x1d6)
>
> # recirc_id(0x1d5): two flow_sample(...) actions with same metadata
> (1000000)
>
> recirc_id(0x1d6),in_port(6),ct_state(-new+est-rel+rpl-inv+trk),ct_mark(0x20000/0xff0031),ct_label(0x1e8480000000000000000000000000),eth(dst=fa:16:3e:f8:af:7d),eth_type(0x0800),ipv4(dst=10.2.3.5,frag=no),
> packets:299, bytes:29302, used:0.376s,
> actions:userspace(pid=4294967295,flow_sample(probability=65535,collector_set_id=2,obs_domain_id=33554439,obs_point_id=2000000,output_port=4294967295)),9
>
> # plus one flow_sample(...)
> later in the pipeline with metadata (2000000)
>
> Also confirmed via IPFIX stats:
>
> # IPFIX before ping
> sampled pkts: 192758
> # After a single ping
> sampled pkts: 192761 → Δ = 3
>
> Additional Findings
>
>   * The issue *only occurs* when VMs are on *separate logical switches
>     connected by a router*.
>   * If both VMs are on the *same logical switch*, IPFIX is correctly
>     sampled only once per ACL.
>   * The duplicated sampling occurs *even if ACL A (IPv4) and ACL C
>     (IPv6) are unrelated*, as long as both have sample_est and belong
>     to the same Port Group.
>   * The error can be reproduced *even when only vm_a's Port Group has
>     the sampling ACLs*. vm_b does not require any sampling configuration
>     for the issue to occur.
>

Thanks a lot for the follow up!  You're right, this is indeed a bug.
And that's because we don't clear the packet's ct_state (well, all
conntrack related information) when advancing to the egress pipeline of
a switch when the outport is one connected to a router.

That's due to https://github.com/ovn-org/ovn/commit/d17ece7 where we
chose to skip ct_clear if the switch has stateful (allow-related) ACLs:

"Also, this patch does not change the behavior for ACLs such as
allow-related: packets are still sent to conntrack, even for router
ports. While this does not work if router ports are distributed,
allow-related ACLs work today on router ports when those ports are
handled on the same chassis for ingress and egress traffic. This patch
does not change that behavior."

On a second look, the above reasoning seems wrong.  It doesn't sound OK
to rely on conntrack state retrieved from a CT zone that's not assigned
to the logical port we're processing the packet on.

I'm going to think about the right way to fix this issue and come back
to this thread once it's figured out.

Thanks again for the bug report!

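For anyone who wants to check their own datapath flows for the same symptom, the duplication can be spotted by counting identical flow_sample actions within a single flow. The following is only a minimal Python sketch, assuming the action format shown in the `ovs-dpctl dump-flows` output quoted above; the function name `duplicated_samples` is hypothetical and not part of OVS or OVN.

```python
import re
from collections import Counter

# Matches the identifying fields of one flow_sample(...) datapath action,
# in the order they appear in the dump-flows output quoted in this thread.
SAMPLE_RE = re.compile(
    r"flow_sample\([^)]*?collector_set_id=(\d+),"
    r"obs_domain_id=(\d+),obs_point_id=(\d+)"
)


def duplicated_samples(flow_line):
    """Return {(collector_set, obs_domain, obs_point): count} for every
    sample that appears more than once in one datapath flow's actions."""
    # Only look at the text after "actions:"; the match part of the flow
    # never contains flow_sample().
    _, _, actions = flow_line.partition("actions:")
    counts = Counter(SAMPLE_RE.findall(actions))
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Abbreviated copy of the recirc_id(0x1d5) flow from the report.
    flow = (
        "recirc_id(0x1d5),in_port(6), actions:"
        "userspace(pid=4294967295,flow_sample(probability=65535,"
        "collector_set_id=2,obs_domain_id=33554437,obs_point_id=1000000,"
        "output_port=4294967295)),"
        "userspace(pid=4294967295,flow_sample(probability=65535,"
        "collector_set_id=2,obs_domain_id=33554437,obs_point_id=1000000,"
        "output_port=4294967295)),ct_clear,recirc(0x1d6)"
    )
    print(duplicated_samples(flow))
```

An empty result for a flow means no sample is repeated in its action list; the values in the keys are kept as strings, exactly as they appear in the dump.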
Regards,
Dumitru

> ------------------------------------------------------------------------
>
> Another Reproducible Scenario (Minimal)
>
> Port Group A on vm_a with:
>
>   * ACL A: from-lport IP4 (sample_est or not)
>   * ACL B: to-lport ICMP sample_est=1000000
>   * ACL C: from-lport IP6 sample_est=2000000
>
> Port Group B on vm_b:
>
>   * No sampling required
>   * ACL to allow from-lport and to-lport traffic
>
> When pinging vm_a from vm_b, the ICMP reply still results in *two
> samples with obs_point_id=1000000*.
>
> ------------------------------------------------------------------------
>
> 📌 Key Takeaway
>
> I believe this confirms the IPFIX duplication issue is *not due to
> conntrack behavior*, but rather due to *how multiple ACLs with
> sample_est on the same Port Group (in different directions) result in
> two userspace(flow_sample(...)) actions* in the same flow.
>
> ------------------------------------------------------------------------
>
> To avoid overloading the email, I've included more detailed output
> and explanations in the attachment.
>
> This email uses formatting elements such as icons, headers, and
> dividers for clarity. If you experience any display issues, please
> let me know and I'll avoid using them in future messages.
>
> Please tell me if I can run any additional traces. I'm happy to
> assist further.
>
> Best regards,
>
> *Oscar*
>
>
> On Fri, May 9, 2025 at 7:16 PM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On 5/9/25 2:14 PM, Dumitru Ceara wrote:
> > On 5/9/25 5:38 AM, Trọng Đạt Trần wrote:
> >> Hi Dimitru,
> >>
> >
> > Hi Oscar,
> >
> >> Thank you for pointing that out.
> >>
> >> To clarify: the terms "inbound" and "outbound" in my previous message
> >> were used from the *VM's perspective*.
> >>
> >> Topology:
> >>
> >> vm_a ---- network1 ---- router ---- network2 ---- vm_b
> >>
> >> ACLs:
> >>
> >>   * *ACL A*: allow-related VMs to *send* IPv4 traffic
> >>     (direction=from-lport)
> >>   * *ACL B*: allow-related VMs to *receive* ICMP traffic
> >>     (direction=to-lport)
> >>
> >> I've attached both the *Northbound and Southbound database dumps* to
> >> ensure the full context is available.
> >>
> >
> > Thanks for the info, I tried locally with a simplified setup where I
> > emulate your topology:
> >
> > switch c9c171ef-849c-436d-b3f9-73d83b9c4e5d (ls)
> >     port vm2
> >         addresses: ["00:00:00:00:00:02"]
> >     port vm1
> >         addresses: ["00:00:00:00:00:01"]
> >
> > Those two VIFs are in a port group:
> >
> > # ovn-nbctl list port_group
> > _uuid               : 7e7a96b9-e708-4eea-b380-018314f2435c
> > acls                : [1d0e7b71-ff03-4c78-ace4-2448bf237e11,
> > 7cb023e9-fee5-4576-a67d-ce1f5d98805b]
> > external_ids        : {}
> > name                : pg
> > ports               : [d991baa6-21b0-4d46-a15d-71b9e8d6708d,
> > f2c5679c-d891-4d34-8402-8bc2047fba61]
> >
> > With two ACLs applied:
> > # ovn-nbctl acl-list pg
> > from-lport   100 (inport==@pg && ip4) allow-related
> >   to-lport   200 (outport==@pg && ip4 && icmp4) allow-related
> >
> > Both ACLs have only sampling for established traffic (sample_est) set:
> > # ovn-nbctl list acl
> > _uuid               : 1d0e7b71-ff03-4c78-ace4-2448bf237e11
> > action              : allow-related
> > direction           : from-lport
> > match               : "inport==@pg && ip4"
> > priority            : 100
> > sample_est          : 23153fae-0a73-4f86-bdf2-137e76647da8
> > sample_new          : []
> >
> > _uuid               : 7cb023e9-fee5-4576-a67d-ce1f5d98805b
> > action              : allow-related
> > direction           : to-lport
> > match               : "outport==@pg && ip4 && icmp4"
> > priority            : 200
> > sample_est          : 42391c82-23d2-4f2b-a7b9-88afaa68282c
> > sample_new          : []
> >
> > # ovn-nbctl list sample
> > _uuid               : 23153fae-0a73-4f86-bdf2-137e76647da8
> > collectors          : [82540855-dcd4-44e4-8354-e08a972500cd]
> > metadata            : 2000000
> >
> > _uuid               : 42391c82-23d2-4f2b-a7b9-88afaa68282c
> > collectors          : [82540855-dcd4-44e4-8354-e08a972500cd]
> > metadata            : 1000000
> >
> > Then I send a single ICMP echo packet from vm2 towards vm1.  The ICMP
> > echo hits both ACLs but, because it's the packet initiating the session,
> > it doesn't generate a sample (sample_new is not set in the ACLs).
> > Instead, 2 conntrack entries are created for the ICMP session:
> >
> > - one in the CT zone of vm2 - here the from-lport ACL is hit so the
> > sample_est metadata of the from-lport ACL (2000000) is stored along in
> > the conntrack state
> >
> > - one in the CT zone of vm1 - here the to-lport ACL is hit so the
> > sample_est metadata of the to-lport ACL (1000000) is stored along in the
> > conntrack state
> >
> > The ICMP echo packet reaches vm1 which replies with ICMP ECHO Reply.
> >
> > For the reply the CT zone of vm1 is first checked, we match the existing
> > conntrack entry (its state moves to "established") and a sample for the
> > stored metadata, 1000000, is generated.  Then, in the egress pipeline,
> > the CT zone of vm2 is checked, we match the other existing conntrack
> > entry (its state also moves to "established") and a sample for the
> > stored metadata, 2000000, is generated.
> >
> > This seems correct to me.  Stats also seem to confirm that:
> > # ip netns exec vm2 ping 42.42.42.2 -c1
> > PING 42.42.42.2 (42.42.42.2) 56(84) bytes of data.
> > 64 bytes from 42.42.42.2: icmp_seq=1 ttl=64 time=1.46 ms
> >
> > --- 42.42.42.2 ping statistics ---
> > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > rtt min/avg/max/mdev = 1.455/1.455/1.455/0.000 ms
> >
> > # ovs-ofctl dump-ipfix-flow br-int
> > NXST_IPFIX_FLOW reply (xid=0x2): 1 ids
> >   id   2: flows=2, current flows=0, sampled pkts=2, ipv4 ok=2, ipv6
> > ok=0, tx pkts=11
> >           pkts errs=0, ipv4 errs=0, ipv6 errs=0, tx errs=11
> >
> > But then, when I increase the number of packets things become more
> > interesting.
> > ICMP echos also generate samples.  And while that might
> > seem like a bug, it's not. :)
> >
> > When ping sends multiple packets for a single invocation it uses the
> > same ICMP ID and just increments the ICMP seq, e.g.:
> >
> > 14:07:41.986618 00:00:00:00:00:02 > 00:00:00:00:00:01, ethertype IPv4
> > (0x0800), length 98: (tos 0x0, ttl 64, id 58647, offset 0, flags [DF],
> > proto ICMP (1), length 84)
> >     42.42.42.3 > 42.42.42.2: ICMP echo request, id 35717, seq 1, length 64
> >
> > 14:07:42.988077 00:00:00:00:00:02 > 00:00:00:00:00:01, ethertype IPv4
> > (0x0800), length 98: (tos 0x0, ttl 64, id 59085, offset 0, flags [DF],
> > proto ICMP (1), length 84)
> >     42.42.42.3 > 42.42.42.2: ICMP echo request, id 35717, seq 2, length 64
> >
> > But conntrack doesn't use the ICMP ID in the key for the session it
> > installs:
>
> Sorry about the typo, I meant to say "conntrack doesn't use the ICMP SEQ
> in the key for the session it installs, it only uses the ICMP ID".
>
> >
> > # ovs-appctl dpctl/dump-conntrack | grep 42.42.42
> >
> > icmp,orig=(src=42.42.42.3,dst=42.42.42.2,id=35628,type=8,code=0),reply=(src=42.42.42.2,dst=42.42.42.3,id=35628,type=0,code=0),zone=4,mark=131104,labels=0xf4240000000000000000000000000
> >
> > icmp,orig=(src=42.42.42.3,dst=42.42.42.2,id=35628,type=8,code=0),reply=(src=42.42.42.2,dst=42.42.42.3,id=35628,type=0,code=0),zone=6,mark=131072,labels=0x1e8480000000000000000000000000
> >
> > So, subsequent ICMP requests will match on these two existing
> > established entries and (because sample_est is configured) samples are
> > generated for them too.
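The mark and labels values in the conntrack entries above can be related back to the Sample configuration with a little bit arithmetic. This is a minimal Python sketch under a stated assumption: the bit positions used here (sample metadata in the top 32 bits of the 128-bit label, collector set id in bits 16..23 of the mark) are inferred only from the values and masks that appear in this thread, not taken from the OVN documentation. Both helper names are hypothetical.

```python
def sample_metadata_from_label(ct_label: int) -> int:
    """Extract the stored sample metadata from a conntrack label.

    ASSUMPTION (inferred from this thread's output): the metadata
    occupies the most significant 32 bits of the 128-bit label.
    """
    return (ct_label >> 96) & 0xFFFFFFFF


def collector_set_from_mark(ct_mark: int) -> int:
    """Extract the collector set id from a conntrack mark.

    ASSUMPTION (inferred from this thread's output): the id is stored
    in bits 16..23 of the 32-bit mark.
    """
    return (ct_mark >> 16) & 0xFF


if __name__ == "__main__":
    # (mark, labels) pairs copied from the dump-conntrack output above.
    entries = [
        (131104, "0xf4240000000000000000000000000"),
        (131072, "0x1e8480000000000000000000000000"),
    ]
    for mark, label_text in entries:
        label = int(label_text, 16)
        print(
            "collector_set_id:", collector_set_from_mark(mark),
            "sample metadata:", sample_metadata_from_label(label),
        )
```

If the assumed layout holds, the two entries should decode to the metadata of the two Sample rows (1000000 and 2000000) and to the collector_set_id seen in the flow_sample actions.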
> >
> > That's also visible in the datapath flows that forward packets in the
> > "original" direction (ICMP ECHOs in our case):
> >
> > # ovs-appctl dpctl/dump-flows | grep sample | grep '\-rpl'
> > recirc_id(0x29),in_port(3),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0x20000/0xff0071),ct_label(0x1e8480000000000000000000000000),eth(src=00:00:00:00:00:02,dst=00:00:00:00:00:01),eth_type(0x0800),ipv4(proto=1,frag=no),
> > packets:8, bytes:784, used:2.342s,
> > actions:userspace(pid=4294967295,flow_sample(probability=65535,collector_set_id=2,obs_domain_id=33554434,obs_point_id=2000000,output_port=4294967295)),ct(commit,zone=6,mark=0x20000/0xff0071,label=0x1e8480000000000000000000000000/0xffffffffffff00000000000000000000,nat(src)),ct(zone=4),recirc(0x2a)
> >
> > recirc_id(0x2a),in_port(3),ct_state(-new+est-rel-rpl-inv+trk),ct_mark(0x20020/0xff0071),ct_label(0xf4240000000000000000000000000),eth(src=00:00:00:00:00:02,dst=00:00:00:00:00:00/ff:ff:00:00:00:00),eth_type(0x0800),ipv4(proto=1,frag=no),
> > packets:8, bytes:784, used:2.342s,
> > actions:userspace(pid=4294967295,flow_sample(probability=65535,collector_set_id=2,obs_domain_id=33554434,obs_point_id=1000000,output_port=4294967295)),ct(commit,zone=4,mark=0x20020/0xff0071,label=0xf4240000000000000000000000000/0xffffffffffff00000000000000000000,nat(src)),1
> >
> > So, for a less complicated test, maybe you should try with UDP/TCP
> > instead.
> >
> > I hope that clarifies your doubts.
> >
> > Best regards,
> > Dumitru
> >
> >> Best regards,
> >>
> >> Oscar
> >>
> >>
> >> On Thu, May 8, 2025 at 8:11 PM Dumitru Ceara <dce...@redhat.com> wrote:
> >>
> >> Hi Oscar,
> >>
> >> On 5/6/25 12:31 PM, Trọng Đạt Trần wrote:
> >> > As requested, I've attached additional tracing information related to
> >> > the sampling duplication issue.
> >> >
> >> >   * The file ofproto_trace.log contains the full output of
> >> >     ofproto/trace commands.
> >> >   * The archive ovn-detrace.tar.gz includes six separate files, each
> >> >     corresponding to an ovn-detrace output for a flow I believe is
> >> >     involved in the duplicated sampling.
> >> >
> >> > Since I'm not fully confident in how to use the --ct-next option, I've
> >> > included traces for all six related flows to ensure completeness.
> >> >
> >> > Please let me know if you need further details, or if I should re-run
> >> > any commands with additional options.
> >> >
> >>
> >> This seems fairly easy to reproduce locally for investigation; I didn't
> >> try yet though.  However, would you mind sharing your OVN NB database
> >> file (I'm assuming this is a test environment)?
> >>
> >> I would like to make sure we don't have any misunderstanding because the
> >> terms you use below in your ACL description (e.g., "outbound"/"inbound")
> >> are not standard terms.  Having the actual ACL (and the rest of the NB)
> >> contents will make it easier to debug.
> >>
> >> Thanks,
> >> Dumitru
> >>
> >> > Best regards,
> >> >
> >> > *Oscar*
> >> >
> >> >
> >> > On Tue, May 6, 2025 at 4:15 PM Adrián Moreno <amore...@redhat.com> wrote:
> >> >
> >> > On Tue, May 06, 2025 at 11:48:07AM +0700, Trọng Đạt Trần wrote:
> >> > > Dear Adrián,
> >> > >
> >> > > Thank you for your response. I've applied your suggestion to use
> >> > > separate sample entries for each ACL. However, I am still seeing
> >> > > unexpected behavior in the IPFIX output that I'd like to clarify.
> >> > >
> >> > > Test Setup (Same as Before)
> >> > >
> >> > > vm_a ---- network1 ---- router ---- network2 ---- vm_b
> >> > >
> >> > > - Two ACLs:
> >> > >     - ACL A: allow-related *outbound* IPv4
> >> > >     - ACL B: allow-related *inbound* ICMP
> >> > > - ACLs applied symmetrically to both VMs.
> >> > > - Test traffic: ICMP request from vm_b to vm_a, and reply from
> >> > >   vm_a to vm_b.
> >> > >
> >> > > Key Problem Observed
> >> > >
> >> > > When sampling is enabled on *both* ACLs, the IPFIX record for
> >> > > *flow (3)* (the ICMP reply from vm_a → router) shows
> >> > > *120 packets/min*.
> >> > >
> >> > > However:
> >> > >
> >> > > - If *only ACL B* (inbound ICMP) is sampled → (3) = 60 packets/min
> >> > > - If *only ACL A* (outbound IP4) is sampled → (3) not present
> >> > > - If both are sampled → (3) = 120 packets/min
> >> > >
> >> > > This suggests that *flow (3) is being sampled twice*, even though it
> >> > > represents a *single logical flow and matches only ACL B*.
> >> > >
> >> > > IPFIX Observations
> >> > >
> >> > > Flow  Description                    Expected  Actual
> >> > > (1)   vm_b → router (ICMP request)   60 pkt/m  60
> >> > > (2)   router → vm_a (ICMP request)   60 pkt/m  60
> >> > > (3)   vm_a → router (ICMP reply)     60 pkt/m  120 ⚠️
> >> > > (4)   router → vm_b (ICMP reply)     60 pkt/m  60
> >> >
> >> > This is not what I'd expect, maybe Dumitru knows?
> >> >
> >> > Could you attach ofproto/trace and ovn-detrace outputs from both
> >> > directions?
> >> >
> >> > Thanks.
> >> > Adrián
> >> >

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss