Hi Ales, Mark, Felix, On 10/21/25 8:39 PM, Mark Michelson wrote: > On Fri, Oct 17, 2025 at 2:36 AM Felix Huettner > <[email protected]> wrote: >> >> On Thu, Oct 16, 2025 at 12:43:13PM +0200, Ales Musil via dev wrote: >>> On Thu, Oct 16, 2025 at 12:09 PM Felix Huettner >>> <[email protected]> wrote: >>> >>>> On Wed, Oct 15, 2025 at 11:12:20AM +0200, Dumitru Ceara via dev wrote: >>>>> Hi Ales, Mark, >>>>> >>>>> On 10/15/25 10:23 AM, Ales Musil wrote: >>>>>> On Fri, Oct 10, 2025 at 9:58 PM Mark Michelson <[email protected]> >>>> wrote: >>>>>> >>>>>>> On Fri, Oct 10, 2025 at 3:43 AM Ales Musil <[email protected]> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Fri, Oct 10, 2025 at 2:51 AM Mark Michelson <[email protected]> >>>>>>> wrote: >>>>>>>>> >>>>>>>>> On Thu, Oct 9, 2025 at 5:25 PM Mark Michelson <[email protected]> >>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hi Ales, >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Hi Mark, >>>>>>>> >>>>>>>> thank you for the review, but I'm probably missing something here. >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>>> I've been thinking through this change, and I think the idea might >>>>>>> have some flaws. Let's consider an arbitrary engine node B that takes >>>>>>> engine node A as input. In this scenario, engine node B writes to the >>>> SB DB >>>>>>> and engine node A does not. With this change, we mark engine node B as >>>>>>> SB_WRITE, but we do not do the same with node A. >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Why would we need to? A state doesn't depend on SB being writable. >>>>>>>> >>>>>>>>> >>>>>>>>>> We perform an engine run where recomputation is not allowed because >>>>>>> the SB DB is read only. When engine node A runs, since it does not >>>> have >>>>>>> SB_WRITE specified, it is able to recompute its data. However, when >>>> engine >>>>>>> node B runs, it cannot recompute its data. >>>>>>>>>> >>>>>>>>>> Now, what happens during the next engine run? >>>>>>>>>> * Let's say engine node A has no new input data. Engine node A's >>>>>>> state will be EN_UNCHANGED. In this case, engine node B sees that >>>> engine >>>>>>> node A is unchanged, so engine node B does not run. However, B really >>>>>>> should run because in the previous engine node run, A recomputed. In >>>> this >>>>>>> case, B's data is incorrect. >>>>>>>>>> * Let's say engine node A has new data and can process it >>>>>>> incrementally. Engine node A's state will be EN_UPDATED. >>>>>>>>>> * Engine node B may also try to incrementally handle A's >>>> tracked >>>>>>> data. However, the only tracked data that B will see is from the >>>> current >>>>>>> engine run. The data that A recomputed in the previous engine run >>>> will be >>>>>>> unknown to B. In this case, B's data is incorrect. >>>>>>>>>> * Engine node B may not have an input handler for A. In this >>>>>>> case, B will recompute and all will be well. >>>>>>>>>> * Let's say engine node A has new data and cannot process it >>>>>>> incrementally. In this case, A will recompute, and then B will also >>>> likely >>>>>>> recompute. The data integrity is fine in this case. However, A has >>>> had to >>>>>>> recompute twice. Before this change, A would only have recomputed on >>>> the >>>>>>> second engine run. >>>>>>>>>> >>>>>>>>>> In 2 of the 4 cases, B ends up with incorrect data. In 1 of the 4 >>>>>>> cases, the data is correct in engine node B, but A has to recompute >>>> twice. >>>>>>> And in 1 of the 4 cases, the data is correct in engine node B and we >>>>>>> successfully avoided an unnecessary recomputation of A. The only time >>>> this >>>>>>> change has a positive effect is if the SB_WRITE node has no input >>>> handlers >>>>>>> for its input nodes, and the input nodes are able to incrementally >>>> process >>>>>>> their data. >>>>>>>>> >>>>>>>>> I've realized that in the scenario I outlined, a couple of those >>>>>>>>> outcomes can't happen. When the first engine run happens, B will >>>> fail >>>>>>>>> to recompute, and the engine run will be canceled. In >>>> ovn-controller, >>>>>>>>> if an engine run is canceled, then we call >>>>>>>>> engine_set_force_recompute_immediate(). This means that the next run >>>>>>>>> will force all nodes to recompute. This means that the second and >>>>>>>>> third outcomes (both based on A being able to incrementally process >>>>>>>>> its data) cannot happen. Also, when engine_force_recompute is true, >>>>>>>>> the state of input nodes is not considered. This means that if A is >>>>>>>>> unchanged, it does not matter. B will still recompute and will >>>>>>>>> therefore have all of A's data. So my concerns about data integrity >>>>>>>>> are not correct. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> So none of the outcomes highlighted cannot happen, if B is cancelled >>>>>>>> we will immediately run force recompute loop that will set everything >>>>>>>> straight. >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> However, the fact that the engine forces all nodes to recompute >>>> means >>>>>>>>> that A will recompute on both engine runs. This means that with the >>>>>>>>> patch, A will recompute two times across the two engine runs. >>>> Without >>>>>>>>> the patch, A only recomputes once. So I still think the patch is >>>>>>>>> hurting more than it is helping. >>>>>>>> >>>>>>>> >>>>>>>> That's the case only in the worst scenario. Keep in mind that this >>>>>>>> doesn't always happen. So it's like all of the sudden we will do two >>>>>>>> recmputes for certain nodes back to back. In fact I see the opposite >>>>>>>> happening. Only 3 nodes in ovn-controller are marked as SB_WRITE any >>>>>>>> of those nodes might cancel but the rest is free to recompute however >>>>>>>> they like. We saw that there has been cancels for nodes do not >>>>>>>> require SB write and this was harming performance a good example are >>>>>>>> nodes pflow_output and lflow_output. Those are very heavy CPU time >>>>>>>> consumers, now they are free to recompute even if SB is read only >>>>>>>> during the current run. >>>>>>> >>>>>>> Thanks for the explanation Ales. I was coming at this change with much >>>>>>> more familiarity with the northd incremental engine than the >>>>>>> ovn-controller incremental engine. In northd, a lot of nodes are >>>>>>> eventual inputs for en_northd or en_lflow, both of which write to the >>>>>>> SB DB. So in most cases, even if a node doesn't write to the SB DB, >>>>>>> then it eventually will feed into a node that does write to the SB DB >>>>>>> [1]. In my head, it felt like the "common" case was for this to >>>>>>> happen. However, in ovn-controller, a good majority of the nodes do >>>>>>> not write to the SB DB and several of those nodes are never inputs >>>>>>> (directly or indirectly) to nodes that write to the SB DB. As you >>>>>>> pointed out, pflow_output and lflow_output are good examples of this. >>>>>>> >>>>>>> The situation I've outlined is still possible in ovn-controller. Many >>>>>>> nodes are inputs for en_runtime_data. Letting those nodes recompute >>>>>>> when en_runtime_data may eventually be denied the right to do so could >>>>>>> be a waste of CPU resources. >>>>>>> >>>>>>> What if the change were altered a bit: if an engine node is marked as >>>>>>> SB_WRITE, then any nodes that are upstream from that node (i.e. any >>>>>>> node that is a direct or indirect input to that node) is also marked >>>>>>> as SB_WRITE by the incremental engine. Meanwhile, any nodes that are >>>>>>> downstream from SB_WRITE nodes are not. This way, nodes like >>>>>>> pflow_output and lflow_output can recompute even when the SB database >>>>>>> is read-only. But nodes that input into runtime_data, route_exchange, >>>>>>> and garp_rarp can avoid unnecessary recomputes. >>>>>>> >>>>>>> What do you think? >>>>>>> >>>>>> >>>>>> Your suggestion will unfortunately cause the change to not have any >>>>>> benefit, in other words the state would be the same as on main right >>>>>> now. For example the runtime_data node has several inputs a lot of >>>>>> them are DB nodes e.g. sb_chassis. By marking them also as SB write >>>>>> or "canceled if recompte is not allowed" if you will, would basically >>>>>> mean it would mostly cancel as without the change. The leaf nodes, >>>>>> usually the DB ones are always recomputing. They cannot have any >>>>>> handlers. So I'm afraid that this change would then have the only >>>>>> benefit of documenting which node can write into SB DB. >>>>>> >>>>>> But to check if my theory is correct I did run ovn-heater with a >>>>>> baseline from main collecting the engine stats. Below you can see >>>>>> the results: >>>>>> >>>>>> --------------------------- >>>>>> main-250-density-heavy-ipv4 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 1472 ms >>>>>> avg : 129 ms >>>>>> 95% : 537 ms >>>>>> total : 13750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 328312207, Compute: 62608243, Cancel: 2067 >>>>>> (0.0006295836572412308%) >>>>>> >>>>>> --------------------------- >>>>>> sb-ro-250-density-heavy-ipv4 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 1343 ms >>>>>> avg : 130 ms >>>>>> 95% : 577 ms >>>>>> total : 13750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 325154128, Compute: 62002527, Cancel: 2138 >>>>>> (0.0006575343247679758%) >>>>>> >>>>>> --------------------------- >>>>>> main-250-density-heavy-ipv4-ic >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 119 ms >>>>>> avg : 26 ms >>>>>> 95% : 48 ms >>>>>> total : 13750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 77772935, Compute: 4420305, Cancel: 627 >>>> (0.0008061930541775233%) >>>>>> >>>>>> >>>>>> --------------------------- >>>>>> sb-ro-250-density-heavy-ipv4-ic >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 131 ms >>>>>> avg : 26 ms >>>>>> 95% : 47 ms >>>>>> total : 13750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 77918353, Compute: 4434319, Cancel: 533 >>>> (0.0006840493663925366%) >>>>>> >>>>>> --------------------------- >>>>>> main-250-density-heavy-ipv6 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 1330 ms >>>>>> avg : 134 ms >>>>>> 95% : 611 ms >>>>>> total : 13750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 325845839, Compute: 62012707, Cancel: 1724 >>>>>> (0.0005290845527722083%) >>>>>> >>>>>> --------------------------- >>>>>> sb-ro-250-density-heavy-ipv6 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 1376 ms >>>>>> avg : 133 ms >>>>>> 95% : 569 ms >>>>>> total : 13750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 324545521, Compute: 61835316, Cancel: 2680 >>>>>> (0.0008257701390369827%) >>>>>> >>>>>> --------------------------- >>>>>> main-250-density-heavy-ipv6-ic >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 174 ms >>>>>> avg : 26 ms >>>>>> 95% : 48 ms >>>>>> total : 13750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 76480809, Compute: 4401000, Cancel: 686 >>>> (0.0008969570392488919%) >>>>>> >>>>>> --------------------------- >>>>>> sb-ro-250-density-heavy-ipv6-ic >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 172 ms >>>>>> avg : 26 ms >>>>>> 95% : 48 ms >>>>>> total : 13750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 78371328, Compute: 4493119, Cancel: 615 >>>> (0.0007847257609313447%) >>>>>> >>>>>> --------------------------- >>>>>> main-250-density-light-ipv4 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 1991 ms >>>>>> avg : 79 ms >>>>>> 95% : 124 ms >>>>>> total : 63750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 1787284835, Compute: 339628410, Cancel: 3089 >>>>>> (0.0001728319929486785%) >>>>>> >>>>>> --------------------------- >>>>>> sb-ro-250-density-light-ipv4 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 2465 ms >>>>>> avg : 82 ms >>>>>> 95% : 135 ms >>>>>> total : 63750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 1744830863, Compute: 334913315, Cancel: 2439 >>>>>> (0.00013978432246472707%) >>>>>> >>>>>> --------------------------- >>>>>> main-250-density-light-ipv4-ic >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 123 ms >>>>>> avg : 24 ms >>>>>> 95% : 34 ms >>>>>> total : 63750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 47999536, Compute: 5072743, Cancel: 630 >>>> (0.001312512687622647%) >>>>>> >>>>>> --------------------------- >>>>>> sb-ro-250-density-light-ipv4-ic >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 129 ms >>>>>> avg : 24 ms >>>>>> 95% : 34 ms >>>>>> total : 63750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 48054016, Compute: 5090679, Cancel: 522 >>>> (0.0010862775756348855%) >>>>>> >>>>>> --------------------------- >>>>>> main-250-density-light-ipv6 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 2181 ms >>>>>> avg : 79 ms >>>>>> 95% : 141 ms >>>>>> total : 63750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 1778682312, Compute: 337995667, Cancel: 1309 >>>>>> (0.00007359380543499778%) >>>>>> >>>>>> --------------------------- >>>>>> sb-ro-250-density-light-ipv6 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 1993 ms >>>>>> avg : 76 ms >>>>>> 95% : 128 ms >>>>>> total : 63750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 1775718929, Compute: 337788420, Cancel: 1527 >>>>>> (0.0000859933390956154%) >>>>>> >>>>>> --------------------------- >>>>>> main-250-density-light-ipv6-ic >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 179 ms >>>>>> avg : 27 ms >>>>>> 95% : 41 ms >>>>>> total : 63750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 47954143, Compute: 4608282, Cancel: 728 >>>> (0.0015181170060739068%) >>>>>> >>>>>> >>>>>> --------------------------- >>>>>> sb-ro-250-density-light-ipv6-ic >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 169 ms >>>>>> avg : 27 ms >>>>>> 95% : 41 ms >>>>>> total : 63750 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 46764060, Compute: 4505181, Cancel: 597 >>>> (0.0012766214054126181%) >>>>>> >>>>>> --------------------------- >>>>>> main-500-density-heavy-ipv4 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 3291 ms >>>>>> avg : 483 ms >>>>>> 95% : 1570 ms >>>>>> total : 27500 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 845825451, Compute: 166132940, Cancel: 2524 >>>>>> (0.00029840672174334936%) >>>>>> >>>>>> --------------------------- >>>>>> sb-ro-500-density-heavy-ipv4 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 7 ms >>>>>> max : 3223 ms >>>>>> avg : 447 ms >>>>>> 95% : 1504 ms >>>>>> total : 27500 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 847419493, Compute: 166570179, Cancel: 2072 >>>>>> (0.0002445070023896653%) >>>>>> >>>>>> --------------------------- >>>>>> main-500-density-heavy-ipv6 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 11875 ms >>>>>> avg : 784 ms >>>>>> 95% : 2267 ms >>>>>> total : 27500 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 857064532, Compute: 167714262, Cancel: 2521 >>>>>> (0.0002941435453077412%) >>>>>> >>>>>> >>>>>> --------------------------- >>>>>> sb-ro-500-density-heavy-ipv6 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 7 ms >>>>>> max : 7056 ms >>>>>> avg : 598 ms >>>>>> 95% : 2147 ms >>>>>> total : 27500 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 865415057, Compute: 169182676, Cancel: 2166 >>>>>> (0.0002502845290800158%) >>>>>> >>>>>> --------------------------- >>>>>> main-500-density-light-ipv4 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 6962 ms >>>>>> avg : 306 ms >>>>>> 95% : 1389 ms >>>>>> total : 127500 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 5688174577, Compute: 1186841715, Cancel: 1468 >>>>>> (0.00002580792801148937%) >>>>>> >>>>>> --------------------------- >>>>>> sb-ro-500-density-light-ipv4 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 5508 ms >>>>>> avg : 301 ms >>>>>> 95% : 1325 ms >>>>>> total : 127500 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 5676845569, Compute: 1184598070, Cancel: 2659 >>>>>> (0.00004683939289312734%) >>>>>> >>>>>> --------------------------- >>>>>> main-500-density-light-ipv6 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 11675 ms >>>>>> avg : 276 ms >>>>>> 95% : 797 ms >>>>>> total : 127500 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 5591661452, Compute: 1162323896, Cancel: 2665 >>>>>> (0.00004766025309073021%) >>>>>> >>>>>> --------------------------- >>>>>> sb-ro-500-density-light-ipv6 >>>>>> --------------------------- >>>>>> ovn-installed: >>>>>> min : 6 ms >>>>>> max : 6673 ms >>>>>> avg : 242 ms >>>>>> 95% : 688 ms >>>>>> total : 127500 >>>>>> failures: 0 >>>>>> >>>>>> IP stats: >>>>>> Recompute: 5779381065, Compute: 1201598403, Cancel: 1728 >>>>>> (0.0000298993954640712%) >>>>>> >>>>>> The numbers caught me a little by surprise, but the important thing >>>>>> is that the percentage of cancels is pretty small in general. So having >>>>>> 2 recomputes the input nodes for SB write nodes won't cause any >>>>>> significant issue. Also on the other side there doesn't seem to be any >>>>>> measurable improvement. However, unless you see a different reason >>>>>> I would still like to get this patch in for the documentation purpose. >>>>>> >>>>>> >>>>> >>>>> Thanks, Ales, for testing this! I agree, even if we don't get a >>>>> significant improvement on the ovn-heater benchmarks it's probably still >>>>> good to get this in for the additional semantic check it implements. >>>>> >>>>> From my perspective: >>>>> >>>>> Acked-by: Dumitru Ceara <[email protected]> >>>>> >>>>> But I'd love to hear from Mark too before this is merged. > > From the perspective of adding semantics to the engine nodes, I'm fine > with this change as well. > > Acked-by: Mark Michelson <[email protected]> >
Thanks, Ales, Mark, Felix for the patch, review and discussion! Applied to main. >>>> >>>> Hi everyone, >>>> >>> >>> Hi Felix, >>> >>> >>>> >>>> i think with runtime_data we the additional interesting behaviour that >>>> binding_run makes a lot of effort handle cases where southbound is >>>> unavailable (or we do not have a transaction for other reasons). >>>> It might be the case that it could actually recompute without >>>> southbound. >>>> >>>> I am not sure if this would actually work. But maybe it improves the >>>> above statistics. >>>> >>> >>> I'm not sure if I follow correctly but is your suggestion that >>> we should revisit binding_run and make it less dependent >>> on SB being writable? Could you please elaborate a bit more? >> >> Hi Ales, >> >> as the code reads to me binding_run might work without the southbound >> being writable at all. That would mean we could also recompute the >> runtime_data engine node even if we do not have a southbound >> transaction. >> >> That would allow us to recompute in a lot more cases without the >> southbound than before. >> >> But i am not sure if: >> 1. binding_run completely works without a southbound transaction, or >> just certain parts of it. >> 2. Allowing runtime_data to recompute would actually reduce the amount >> of canceled runs in the above statistics. > > Given the low percentage of engine runs that were canceled, I don't > know what value we would get in reducing that number further. If the > logic becomes cleaner, then there are secondary benefits, I suppose. > I agree, right now it looks like the change would not be that straightforward and wouldn't improve performance visibly. Let's consider it as a potential follow up though. Regards, Dumitru _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
