On Mon, Dec 09, 2024 at 02:20:03PM +0100, Dumitru Ceara wrote: > On 12/9/24 2:14 PM, [email protected] wrote: > > On Mon, 2024-12-09 at 13:37 +0100, Felix Huettner wrote: > >> On Fri, Dec 06, 2024 at 10:10:21AM +0100, > >> [email protected] wrote: > >>> On Thu, 2024-12-05 at 15:28 +0100, Felix Huettner wrote: > >>>> On Thu, Dec 05, 2024 at 02:08:37PM +0100, > >>>> [email protected] wrote: > >>>>> Hi Felix and Dumitru, > >>>>> Thanks for posting this standalone change Felix. I'm going to > >>>>> take > >>>>> it > >>>>> for a spin and try to integrate it to Frode's original "route > >>>>> leaking" > >>>>> RFC. > >>>>> I have one question/suggestion for the schema that I'll leave > >>>>> below. > >>>>> > >>>>> On Thu, 2024-12-05 at 14:00 +0100, Felix Huettner via dev > >>>>> wrote: > >>>>>> The Route table will be used in the future to coordinate > >>>>>> routing > >>>>>> information between northd and ovn-controller. > >>>>>> Northd will insert routes that should be advertised to the > >>>>>> outside > >>>>>> fabric. > >>>>>> Ovn-controller will insert routes that have been learned from > >>>>>> the > >>>>>> outside fabric. > >>>>>> > >>>>>> Signed-off-by: Felix Huettner <[email protected]> > >>>>>> --- > >>>>>> v2->v3: > >>>>>> * refType is now strong > >>>>>> * fixed typos > >>>>>> > >>>>>> > >>>>>> ovn-sb.ovsschema | 29 ++++++++++++++++-- > >>>>>> ovn-sb.xml | 80 > >>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++ > >>>>>> 2 files changed, 107 insertions(+), 2 deletions(-) > >>>>>> > >>>>>> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema > >>>>>> index 73abf2c8d..88763f429 100644 > >>>>>> --- a/ovn-sb.ovsschema > >>>>>> +++ b/ovn-sb.ovsschema > >>>>>> @@ -1,7 +1,7 @@ > >>>>>> { > >>>>>> "name": "OVN_Southbound", > >>>>>> - "version": "20.37.0", > >>>>>> - "cksum": "1950136776 31493", > >>>>>> + "version": "20.38.0", > >>>>>> + "cksum": "1392903395 32893", > >>>>>> "tables": { > >>>>>> "SB_Global": { > >>>>>> "columns": { > >>>>>> @@ -617,6 +617,31 @@ > >>>>>> "type": {"key": "string", "value": > >>>>>> "string", > >>>>>> "min": 0, "max": > >>>>>> "unlimited"}}}, > >>>>>> "indexes": [["chassis"]], > >>>>>> + "isRoot": true}, > >>>>>> + "Route": { > >>>>>> + "columns": { > >>>>>> + "datapath": > >>>>>> + {"type": {"key": {"type": "uuid", > >>>>>> + "refTable": > >>>>>> "Datapath_Binding"}}}, > >>>>>> + "logical_port": {"type": {"key": {"type": > >>>>>> "uuid", > >>>>>> + > >>>>>> "refTable": > >>>>>> "Port_Binding", > >>>>>> + "refType": > >>>>>> "strong"}}}, > >>>>>> + "ip_prefix": {"type": "string"}, > >>>>>> + "nexthop": {"type": "string"}, > >>>>>> + "tracked_port": {"type": {"key": {"type": > >>>>>> "uuid", > >>>>>> + > >>>>>> "refTable": > >>>>>> "Port_Binding", > >>>>>> + "refType": > >>>>>> "strong"}, > >>>>>> + "min": 0, > >>>>>> + "max": 1}}, > >>>>>> + "type": {"type": {"key": {"type": "string", > >>>>>> + "enum": ["set", > >>>>>> ["advertise", > >>>>>> + > >>>>>> "receive"]]}, > >>>>>> + "min": 1, "max": 1}}, > >>>>> > >>>>> I wonder if this field is necessary. We could use presence or > >>>>> absence > >>>>> of "nexthop" value to determine whether the route is being > >>>>> learned > >>>>> or > >>>>> advertised. I'm not sure if ovsdb is doing something clever > >>>>> with > >>>>> enums > >>>>> or stores them as raw values in each row, but removing this > >>>>> could > >>>>> save > >>>>> us some space. What do you think? > >>>> > >>>> Hi Martin, > >>>> > >>>> i think generally we could use the information in the "nexthop" > >>>> column > >>>> for this. However i would find this rather surprising if i am not > >>>> deeply > >>>> in the implementation. For debugging at least it is not obvious. > >>>> > >>>> I am not sure how much space saving we would get out of this, but > >>>> i > >>>> don't think that it is worth the cost of less debuggability. > >>>> Especially > >>>> since each learned/announced route will cause additionally once > >>>> or > >>>> multiple logical flows to be created which are significantly > >>>> larger > >>>> than > >>>> this type field. > >>>> > >>>> But i am open to other opinions as well. > >>>> > >>>> Thanks a lot > >>>> Felix > >>> > >>> Hi Felix, > >>> I'm not hell-bent on this change and I agree that it would make the > >>> code less explicit. It just occurred to me because I recently saw > >>> similar pattern applied in NAT table [0] where NAT rules is > >>> considered > >>> "distributed" if the record contains both logical_port and > >>> external_nat. > >> > >> Hi Martin, > >> > >> i think a big difference here is that we need to have both ovn-northd > >> and ovn-controller agree to what each of these things mean. > >> In the above example it looked like something only northd needs to > >> know > >> internally to derive the correct flows. > >> > >>> > >>> I don't have enough experience with ovsdb in large deployments, it > >>> was > >>> just a gut feel that potentially saving ~8bytes for a table that > >>> could > >>> have potentially 100s or 1000s of records could be impactful. > >>> There's also the fact that without the server-side validation (i.e. > >>> if > >>> tpye is "advertised" then nexthop must not be empty), it's up to > >>> the > >>> clients to enforce that invalid records are not created. > >> > >> I just spend some time measureing it by adding a few thousand Routes > >> and > >> checking the RSS difference. > >> With the current schema (actually with Dumitru's requested > >> adjustments) > >> we are at ~3230 Byte per route. If we remove the "type" we are at > >> ~2514 > >> Byte. So ~716 Byte difference. > >> > >> If we calculate with 10.000 Routes then we end up at roughly 7MB of > >> difference. > >> I would think this is worth the clarity. > >> > >> Additionally it allows us to later actually use nexthop also for > >> advertised routes. Allthough honestly i don't know a usecase for > >> this. > >> > >> Whats your opinion on that one? > >> > >> Thanks a lot > >> Felix > > > > Hi Felix, > > thanks for running numbers on this. The difference indeed seems > > negligible even on large deployments. You are right that the trade-off > > for removing the "type" column is probably not worth it. > > > > I've put this version (v3) to work on Friday by integrating it to > > fnordahl's original route leaking RFC and one more though occurred to > > me. > > > > Would it be worth to split the `Route` table to `Advertised_Route` and > > `Learned_Route`? I think this could be beneficial because: > > * Both "types" of routes have different "owners". While "Learned > > routes" are primarily managed "from the bottom", by the controller, > > "Advertised routes" are managed "from the top", by northd. Having two > > different components being in charge of keeping the table up to date > > seems to complicate things. > > * "Advertised routes" could do with just following attributes: > > * IP prefix > > * logical_port (strong ref to LRP that's supposed to advertise this > > prefix) > > * With two separate tables, development of learning/advertising would > > be more independent as changing requirements in one feature wouldn't > > affect development of the other. > > > > With the simplified structure of the `Advertised_Route` table that I > > mentioned above, northd would be the only component that adds data to > > this table and records would be either explicitly removed by northd > > (when NAT or LB gets removed) or implicitly when the LRP is removed and > > `logical_port` reference goes away. > > > > What do you think? > > > > Good point, Martin. Now that you mention it it kind of makes sense that > we don't mix Advertised with Learnt. It reduces a bit complexity and > chance of getting I-P wrong in ovn-controller / ovn-northd too. > > +1 from me for separate tables.
Hi Martin, Hi Dumitru, thanks for the comment. This should also make the logic in northd and ovn-controller more simple. I will try to incorporate this and the comments on v2 into my current code to see if i find any kind of issue with that and would afterwards post the next version. Thanks a lot Felix > > Regards, > Dumitru > > > Thanks, > > Martin. > > > >> > >>> > >>> The issue of transparency could be solved with documentation and > >>> abstraction in parsing method (like in the case of the NAT). > >>> > >>> Anyway, as I said, this is not a blocker from my side, just a > >>> suggestion for consideration. > >>> > >>> Martin > >>> > >>> [0] > >>> https://github.com/ovn-org/ovn/blob/3b32b7d08ebcd005681db5131c798da07a89133a/northd/northd.c#L16506-L16530 > >>>> > >>>>> > >>>>> Martin. > >>>>> > >>>>>> + "external_ids": { > >>>>>> + "type": {"key": "string", "value": > >>>>>> "string", > >>>>>> + "min": 0, "max": > >>>>>> "unlimited"}}}, > >>>>>> + "indexes": [["datapath", "logical_port", > >>>>>> "ip_prefix", > >>>>>> "nexthop", > >>>>>> + "type"]], > >>>>>> "isRoot": true} > >>>>>> } > >>>>>> } > >>>>>> diff --git a/ovn-sb.xml b/ovn-sb.xml > >>>>>> index ea4adc1c3..e435cf243 100644 > >>>>>> --- a/ovn-sb.xml > >>>>>> +++ b/ovn-sb.xml > >>>>>> @@ -5217,4 +5217,84 @@ tcp.flags = RST; > >>>>>> The set of variable values for a given chassis. > >>>>>> </column> > >>>>>> </table> > >>>>>> + > >>>>>> + <table name="Route"> > >>>>>> + <p> > >>>>>> + Each record represents a route that is exported from > >>>>>> ovn > >>>>>> or > >>>>>> imported to > >>>>>> + ovn using some dynamic routing logic outside of ovn. > >>>>>> + It is populated on the one hand by <code>ovn- > >>>>>> northd</code> > >>>>>> based on the > >>>>>> + addresses, routes and NAT Entries of a > >>>>>> + <code>OVN_Northbound.Logical_Router_Port</code>. > >>>>>> + On the other hand <code>ovn-controller</code> > >>>>>> populates it > >>>>>> with routes > >>>>>> + learned from outside of OVN. > >>>>>> + </p> > >>>>>> + > >>>>>> + <column name="datapath"> > >>>>>> + The datapath belonging to the > >>>>>> + <code>OVN_Northbound.Logical_Router</code> that this > >>>>>> route > >>>>>> is > >>>>>> valid > >>>>>> + for. > >>>>>> + </column> > >>>>>> + > >>>>>> + <column name="logical_port"> > >>>>>> + <p> > >>>>>> + If the type is <code>advertise</code> then this is > >>>>>> the > >>>>>> logical_port > >>>>>> + the router will send packets out. > >>>>>> + </p> > >>>>>> + > >>>>>> + <p> > >>>>>> + If the type is <code>receive</code> then this is the > >>>>>> logical_port > >>>>>> + the route was learned on. > >>>>>> + </p> > >>>>>> + </column> > >>>>>> + > >>>>>> + <column name="ip_prefix"> > >>>>>> + <p> > >>>>>> + IP prefix of this route (e.g. 192.168.100.0/24). > >>>>>> + </p> > >>>>>> + </column> > >>>>>> + > >>>>>> + <column name="nexthop"> > >>>>>> + <p> > >>>>>> + If the type is <code>advertise</code> then this is > >>>>>> empty. > >>>>>> + </p> > >>>>>> + > >>>>>> + <p> > >>>>>> + If the type is <code>receive</code> then this is the > >>>>>> nexthop > >>>>>> ip we > >>>>>> + from the outside. > >>>>>> + </p> > >>>>>> + </column> > >>>>>> + > >>>>>> + <column name="tracked_port"> > >>>>>> + <p> > >>>>>> + Only relevant for type <code>advertise</code>. > >>>>>> + > >>>>>> + In combination with a host <code>ip_prefix</code> > >>>>>> this > >>>>>> tracks the port > >>>>>> + OVN will forward the packets for this destination > >>>>>> to. > >>>>>> + > >>>>>> + An announcing chassis can use this information to > >>>>>> check > >>>>>> if > >>>>>> this > >>>>>> + destination is local and adjust the route priorities > >>>>>> based > >>>>>> on that. > >>>>>> + </p> > >>>>>> + </column> > >>>>>> + > >>>>>> + <column name="type"> > >>>>>> + <p> > >>>>>> + If the route is to be exported from OVN to the > >>>>>> outside > >>>>>> network or if > >>>>>> + it is imported from the outside network. > >>>>>> + </p> > >>>>>> + <ul> > >>>>>> + <li> > >>>>>> + <code>advertise</code>: This route should be > >>>>>> advertised to > >>>>>> the > >>>>>> + outside network. > >>>>>> + </li> > >>>>>> + <li> > >>>>>> + <code>receive</code>: This route has been learned > >>>>>> from > >>>>>> the > >>>>>> outside > >>>>>> + network. > >>>>>> + </li> > >>>>>> + </ul> > >>>>>> + </column> > >>>>>> + > >>>>>> + <column name="external_ids"> > >>>>>> + See <em>External IDs</em> at the beginning of this > >>>>>> document. > >>>>>> + </column> > >>>>>> + </table> > >>>>>> </database> > >>>>> > >>>>> _______________________________________________ > >>>>> dev mailing list > >>>>> [email protected] > >>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >>> > >>> _______________________________________________ > >>> dev mailing list > >>> [email protected] > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > > _______________________________________________ > > dev mailing list > > [email protected] > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
