Thanks Numan, I forgot to update the description according to the new changes. A new patch version, v8, has been submitted: https://patchwork.ozlabs.org/project/ovn/cover/[email protected]/ Please check it out.
Regards,
Vladislav Odintsov

On 12 Nov 2021, at 23:07, Numan Siddique <[email protected]> wrote:
On Thu, Nov 11, 2021 at 2:18 PM Odintsov Vladislav <[email protected]> wrote:

Hi Han, Numan,

I've posted a new version of this series [1] and addressed your comments and suggestions. I'd appreciate it if you could take a look at it, and I'd be happy if it could be included in the next OVN release. Thanks.

Thanks Vladislav, I'll take a look at the patches next week.

Numan

1: https://patchwork.ozlabs.org/project/ovn/cover/[email protected]/

Regards,
Vladislav Odintsov

On 20 Oct 2021, at 10:09, Han Zhou <[email protected]> wrote:
On Tue, Oct 19, 2021 at 3:21 PM Numan Siddique <[email protected]> wrote:
On Tue, Oct 19, 2021 at 3:28 AM Han Zhou <[email protected]> wrote:
On Mon, Oct 18, 2021 at 6:35 AM Odintsov Vladislav <[email protected]> wrote:

regards,
Vladislav Odintsov

On 16 Oct 2021, at 03:20, Han Zhou <[email protected]> wrote:
On Fri, Oct 15, 2021 at 2:36 AM Vladislav Odintsov <[email protected]> wrote:

Regards,
Vladislav Odintsov

On 15 Oct 2021, at 08:42, Han Zhou <[email protected]> wrote:
On Thu, Oct 14, 2021 at 12:58 AM Vladislav Odintsov <[email protected]> wrote:

Hi Han,

Thanks for the review.

Regards,
Vladislav Odintsov

On 14 Oct 2021, at 08:13, Han Zhou <[email protected]> wrote:
On Tue, Oct 5, 2021 at 1:26 PM Vladislav Odintsov <[email protected]> wrote:

This patch extends the Logical Router's routing functionality. A user may now create multiple routing tables within a Logical Router and assign them to Logical Router Ports. Traffic coming from a Logical Router Port with an assigned route_table is checked against global routes, if any (Logical_Router_Static_Routes with an empty route_table field), next against directly connected routes,

This is not accurate. The "directly connected routes" are NOT checked after the global routes. Their priority depends only on the prefix length.

and then against Logical_Router_Static_Routes with the same route_table value as in the Logical_Router_Port's options:route_table field.

A new Logical Router ingress table #10 is added: IN_IP_ROUTING_PRE. In this table, packets coming from LRPs with a configured options:route_table field are matched on inport, and a unique non-zero value identifying the route table is written into OVS register 7. Then, in table #11, IN_IP_ROUTING, routes with a non-empty route_table field are installed with an additional match on the reg7 value associated with the corresponding route_table.
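For illustration, here is a minimal, hypothetical sketch of that configuration. The router and port names are invented, and the exact ovn-nbctl syntax (the --route-table option and the options:route_table key) is assumed from this patch series rather than from a released OVN:

  # Assign LRP "lrp0" on router "lr0" to route table "rtb-1".
  ovn-nbctl set Logical_Router_Port lrp0 options:route_table=rtb-1

  # A static route scoped to rtb-1: only traffic entering through
  # LRPs assigned to rtb-1 is matched against it.
  ovn-nbctl --route-table=rtb-1 lr-route-add lr0 0.0.0.0/0 172.31.0.1

  # A route with an empty route_table field stays "global" and
  # applies to traffic from all LRPs.
  ovn-nbctl lr-route-add lr0 10.0.0.0/8 172.31.0.254

Conceptually, IN_IP_ROUTING_PRE would then hold a flow matching inport == "lrp0" with an action along the lines of reg7 = <table id>; next;, and the rtb-1 routes in IN_IP_ROUTING would carry an extra reg7 == <table id> match. This is illustrative only, not actual flow output.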
Hi Vladislav,

First of all, sorry for the delayed review, and thanks for implementing this new feature. I have some questions regarding the feature itself. I remember that we had some discussion earlier for this feature, but it seems I misunderstood the feature you are implementing here. I thought that by multiple routing tables you were trying to support something like VRF in physical routers, but this seems to be something different. According to your implementation, instead of assigning LRPs to different routing tables, an LRP can actually participate in both the global routing table and the table with a specific ID. For ingress, global routes are preferred over other routes; for egress (i.e. forwarding to the next hop), it doesn't even enforce any table-id check, so a packet routed by an entry of table X can go out of an LRP with table-id Y. Is my understanding correct about the desired behavior of this feature?

Yes, your understanding is correct. This is not VRF. At first glance, VRF can be done just by using an LR per VRF without any code modifications (maybe there are corner cases, but that's not something I'm currently working on).

I agree VRF can be achieved with just separate LRs. I am trying to understand why multiple LRs wouldn't satisfy your use case here.

An LRP can optionally be assigned a specific routing table name. This means that in the LR ingress pipeline, packets coming from that LRP are checked against routes with the same route_table value within the appropriate LR. This is a kind of PBR, an analog of "ip rule add iif <interface name> lookup <id>".

There is one specific use case which requires special handling: directly connected routes (the subnet CIDRs of the LRPs connected to this LR). These routes can't be added manually by the user, but routing between LRPs belonging to different routing tables is still needed. So these routes are added to the global routing table, overriding routing for LRPs with configured routing tables. If for some reason a user wants to prohibit IP connectivity to some LRP (honestly, I can't imagine why), it can be done with OVN PBR with a drop action, or with a route with a "discard" nexthop. But in that case I'd question whether this LRP is needed in this LR at all.

Also, if a user wants to create routes which apply to all LRPs from all routing tables, it can be done by adding new entries to the global routing table. And the third reason to have a global routing table is full backward compatibility: users who don't need multiple routing table support simply continue using static routing in the same manner, without any changes.

If this is correct, it doesn't seem to be a common/standard requirement (or please educate me if I am wrong). Could you explain a little more about the actual use cases: what kind of real-world problems need to be solved by this feature, or how are you going to use it? For example, why would a port need to participate in both routing tables? It looks like what you really need is policy routing instead of multiple isolated routing tables. I understand that you already use policy routing to implement ACLs, so it is not convenient to combine other policies (e.g. inport-based routing) into the policy routing stage. If that's the case, would it be more generic to support multiple policy routing stages? My concern with the current approach is that it is implemented for a very special use case. It makes the code more complex, but when there is a slightly different requirement in the future it becomes insufficient. I am thinking that policy routing seems more flexible and has more potential to be made generic. Maybe I will have a better understanding when I hear more detailed use cases and considerations from you.

I can't agree that this is an uncommon requirement. This implementation was inspired by the AWS Route Tables feature [1]: within a VPC (an LR in OVN terms) a user may create multiple routing tables and assign them to different subnets (LRPs) in multiple availability zones.
Auto-generated directly connected routes from LRP CIDRs work in the same manner as they do in AWS: they apply to all subnets regardless of their routing table association. GCP has similar behaviour [2]; Azure, I guess, too. Our public cloud (CROC Cloud Platform) supports the AWS behaviour [3], so I was primarily oriented toward it. Internally we already use this feature and it currently fits our use case, but again, I can't say it is specific to us.

If it is for AWS/GCP-like routing features, then from what I understand, what's implemented in this patch is a little different. To implement a VPC model in OVN, I think it is ok to have multiple LRs instead of only one LR for each VPC. So naturally I would map an AWS routing table to an LR. AWS's document (the link you provided) says: "A subnet can only be associated with one route table at a time". So basically in AWS a subnet is attached either to the default/main route table or to a custom table, but not both, right? However, in your use case an LRP (which maps to a subnet) attaches to both the "main" table and a custom table, which seems uncommon to me. Or did I miss something?

That's true for AWS, but it's still a bit inaccurate for OVN. The global routing table in OVN terms is not what the AWS main route table is. The main route table is just a configuration hint for users for implicit route table association with subnets; subnets implicitly associated via the main routing table route in the same manner as with a normal explicit route_table-to-subnet association. The global routing table in OVN is just a list of routes with higher priority than routes with a configured route_table. I do not propose configuring both tables at the same time, but it is _possible_ to do so if required for some reason (for instance, to configure service chaining, or internal VPC services like metadata and other internal APIs, access to other services).

Normally, mapping an AWS Route Table to OVN is mostly one-to-one, except for the "local route". Example:

AWS Route Table rtb-xxx:
  172.31.0.0/16: local route (VPC CIDR for subnets)
  0.0.0.0/0: igw-XXX (internet gateway)

AWS Route Table rtb-yyy:
  172.31.0.0/16: local route (VPC CIDR for subnets)
  5.5.5.5/32: instance-xxx

This maps to the following OVN configuration (for one LR with one subnet 172.31.0.0/24):

Implicit route (not present in the logical router static routes table):
  172.31.0.0/24: LRP-subnet-xxx - takes priority over route-table-scoped routes and can be treated as placed in the global routing table.

What do you mean by highest priority here? It all depends on the prefix length, right? If you have a static route entry that says 172.31.0.0/25 -> xyz, then this route will have higher priority than the directly connected subnet.

Yes, that is true, but only for routes within the "global" routing table. I left this to preserve the previous behaviour. It is impossible to create a static route within any non-global routing table which overrides directly connected routes, because the priority of "global" routes is increased by 100 (though it should add more than 2*128 to support the same behaviour for IPv6, as you already pointed out).

Normal static routes:
  ip_prefix: 0.0.0.0/0, nexthop: <IP for edge LR's LRP>, route_table: rtb-xxx
  ip_prefix: 5.5.5.5/32, nexthop: <IP of some LSP, which belongs to the VM/container via which the route is built>, route_table: rtb-yyy
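As a sketch, the mapping above could be expressed with ovn-nbctl roughly as follows. The router name is invented, the placeholder nexthops mirror the listing above, and the --route-table option is assumed from this series:

  # The subnet LRP; its CIDR produces the implicit directly connected route.
  ovn-nbctl lrp-add lr-vpc LRP-subnet-xxx 02:ac:1f:00:00:01 172.31.0.1/24
  ovn-nbctl set Logical_Router_Port LRP-subnet-xxx options:route_table=rtb-xxx

  # rtb-xxx: default route towards the internet gateway.
  ovn-nbctl --route-table=rtb-xxx lr-route-add lr-vpc 0.0.0.0/0 <IP for edge LR's LRP>

  # rtb-yyy: host route via an instance.
  ovn-nbctl --route-table=rtb-yyy lr-route-add lr-vpc 5.5.5.5/32 <instance IP>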
I guess I now understand the reason for the misunderstanding: the "global routing table" I referred to earlier is the routing table of routes with no value in the route_table field, plus the directly connected routes. The latter have no records in the logical router static routes, but I still referred to them as routes from the "global routing table". I can think about the terminology here if it's the root cause of the misunderstanding. What do you think?

Thanks for explaining. In fact I understand what you mean by "global routing table", but I think you are implementing something different from what AWS provides, primarily because you use a single LR instead of an LR per routing table. I understand there are reasons why you want to do it this way, but I can think of some challenges with your approach. For example, in AWS we could do:

route table main:
  subnet S0: 172.31.0.0/24
  routes:
    172.31.0.0/16: local

route table R1:
  subnet S1: 172.31.1.0/24
  routes:
    172.31.0.0/16: local
    172.31.0.0/24: <some FW/GW>

Wow. This is brand-new behaviour for AWS which I was not aware of, as it appeared 1.5 months ago [1]. Previously, when I was writing this patch series, such a static route was impossible to add in AWS: you couldn't add routes that are the same as or more specific than the VPC CIDR block. In GCP you still can't create a route that is the same as or more specific than a subnet route [2]. I think I can try to update my patch to support the AWS approach here.

Would it be simpler to just not give the global routes higher priority? We could make it clear to users that if they add a route in any route table that overlaps with any directly connected subnet, the longer prefix takes precedence. Would that be more flexible for users?

But all this is relevant to a single LR per VPC (not per route table, because of the blocker described earlier). Do you think it is reasonable given these inputs?

A packet from S1 to S0 will go to the FW/GW, where it may be dropped, forwarded to S0, or redirected to something outside of AWS ... While in your approach with OVN, both subnets will be under the same logical router, with different table IDs assigned to the LRPs and routes. But when a VM under S1 sends a packet to S0, it will go to the destination directly, because you are prioritizing the direct-connection routes and they are under the same global route table, and there is no way to force it to go through some FW/GW from the custom route table. You can add a route to achieve this in the global table, because in your design the global routes still apply to all LRPs and have higher priority, but it would impact all the LRPs/subnets, while in the AWS design above it is supposed to affect subnets under R1 only.

In addition, to my understanding, S0 and S1 in AWS cannot communicate directly unless there are specific routes in both route tables routing them to some gateway. In other words, there is some isolation between route tables. However, in your design with OVN, S0 and S1 can communicate with each other while being in different route tables without any specific setup; the default unremovable "local route" ensures this.
A user can only override the "local" route with an existing subnet's CIDR (more specific than the "local route") to send traffic from S0 to S1 through some specific appliance(s) in S2. But basic IP connectivity always remains, regardless of the subnets' route table associations. So I'm still sure that one LR per VPC suits this scenario better, as I can't imagine why one would place subnets which don't need connectivity in one VPC.

Maybe I was wrong. I am not really familiar with AWS, and I wanted to try this myself but didn't get the time :( If default connectivity for subnets under different route tables is a requirement (instead of the contrary), then I'd agree that one LR per VPC seems better, because it is more like policy routing instead of isolation.

Policy routing without isolation at all.

I am not saying that your design is wrong; I am just saying it implements very different behavior than AWS, and I am trying to understand your real use cases to get a better understanding of the feature you implemented. (You explained that you just wanted the AWS behavior, but it looks like that's not exactly the case.)

Will fix the places where the behaviour differs from AWS. And thank you for pointing out such places.

Well, I think it doesn't have to be all the same as AWS, but yes, if the AWS design makes more sense.

In my opinion, implementing this feature with PBR is less convenient and natural for users who are familiar with the behaviour of the public cloud platforms mentioned above, because configuring routes should be done in the routes section, and a route table property seems natural there, not in PBR. Moreover, I _think_ using Logical_Router_Static_Route makes extending this feature to OVN-Interconnection much easier compared to PBR (though I didn't try the latter).

I agree that if it is just an AWS-like requirement, PBR is less convenient. I am trying to understand whether it can be achieved with separate LRs. If not, what's special about the requirement, and does the current approach provide a solution common enough that more use cases can also benefit from it? Could you clarify a little more? Thanks again.

That was our initial approach: to use separate LRs for each route table. We rejected that solution because we met some difficulties and a blocker. See below.

A brief topology description when using an LR per route table: imagine two subnets of a VPC in the first AZ and one in another AZ, each subnet in its own route table (LR). All subnets must have IP connectivity, so we have to somehow interconnect these route table LRs.

That's exactly the same case as AWS, right? If you want direct connectivity, you would just put them under the same route table in AWS (or the same LR in OVN), right?

I'm sorry, my example was not good enough. Let's remove the AZs from that case, as they're not relevant to the discussion. So, we've got two subnets, S0 and S1, and suppose they've got different route tables assigned. If we place them in a single LR and assign different route tables, they have connectivity out of the box. Same in AWS: the "local route" is added automatically, can't be removed or modified, and ensures IP connectivity between all VPC subnets.

Ok, thanks for explaining.

[BLOCKER] It is impossible to place a route in route table 1 via a VM from a subnet associated with route table 2 when using a per-RTB LR, because in the RTB-2 LR we would have to add a route from RTB 1, and this breaks route table isolation: the route table 2 LR would start looking up routes, and there could be routes from other route tables among them.
This breaks the idea of having an LR per route table completely. At that point we rejected this solution and moved to adding route table support to OVN.

Would it be just the same problem in AWS? Why would a user route a subnet of RTB-1 to a VM under RTB-2 and at the same time want route table isolation? With the single-LR solution in OVN, you would not get isolation between the tables either, because 1) all LRPs still share the global routing table, and 2) for output to the nexthop there is no distinction between the output interfaces.

No, such a problem does not exist. By route table "isolation" I meant that routing rules configured in one routing table must not affect routing in another one. In this scenario, a user may want to route traffic to the internet from S0 (RTB-0) through an instance in S1 (RTB-1). For instance, it can be a gateway service (FW, WAF, IPS, etc.) in S1 which provides secure, NATed access to the internet for clients from the VPC. So the configuration should be:

- 0.0.0.0/0 via GW-INSTANCE in RTB-0
- 0.0.0.0/0 via edge LR in RTB-1

So instances from all subnets with the rtb-0 route table assigned will be routed to the internet through GW-INSTANCE, which can analyse/drop/do whatever it wants with the traffic. This is a small example of how different route tables can be used within a single VPC. With multiple OVN LRs per VPC in an AZ such a topology can't be created: in the LR for RTB-0 we can add the route 0.0.0.0/0 via <LR0-to-LR1 link IP>, but in the LR for RTB-1 we can only add a route to the internet through either the edge LR or GW-INSTANCE. Not both. So this case can't currently be implemented in OVN.
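Expressed in the proposed single-LR model, this scenario would look roughly like the sketch below. All names and placeholder addresses are invented, and the --route-table option is assumed from this series:

  # S0 is associated with rtb-0, S1 with rtb-1.
  ovn-nbctl set Logical_Router_Port lrp-S0 options:route_table=rtb-0
  ovn-nbctl set Logical_Router_Port lrp-S1 options:route_table=rtb-1

  # rtb-0: default route via the gateway instance living in S1.
  ovn-nbctl --route-table=rtb-0 lr-route-add lr-vpc 0.0.0.0/0 <GW-INSTANCE IP>

  # rtb-1: default route via the edge LR, so the gateway instance
  # itself can reach the internet.
  ovn-nbctl --route-table=rtb-1 lr-route-add lr-vpc 0.0.0.0/0 <edge LR's LRP IP>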
Thanks for the example. It helps in understanding your requirements. While I believe this can be achieved in current OVN with the policy routing feature, I understand that, as you mentioned earlier, you need policy routing for ACL purposes. I also agree that multi-stage policy routing would seem less user-friendly (and maybe also harder to implement), so I support the multi-table feature for the purpose of providing extra policy routing capability. While I still need to take a closer look at the code, I believe it would be helpful to add such example use cases to the documentation, so users can better understand the feature.

Hope this helps you understand my problem. But some more cons of the LR-per-route-table approach:

1. A complex network topology. Interconnecting all the LRs, even with some transit switch, is harder than having one LR where all VPC-related configuration is done in one place (in an AZ).
2. Directly connected routes. With multiple route table LRs, we have to add a route for each subnet from the other LRs. With one LR per VPC, all such routes are installed automatically and learnt via ovn-ic.
3. PBR and LBs. It's much easier to implement PBR and configure load balancers in one LR than in many.
4. There could be very many LRs, LRPs, and LSPs (datapaths in the SB): excess database records and huge database growth.
5. Extra client code complexity (updating routes requires configuring routes in many LRs across different availability zones).
6. We couldn't use ovn-ic route learning, because a VPC requires out-of-the-box IP connectivity between subnets, and connecting route table LRs between AZs would learn routes without binding them to a route table. This would require additional isolation at the transit switch level.
7. There were some more problems; listed here are the ones I could recall.

From our half-a-year experience, using an LR per VPC is very comfortable, and it looks quite extensible in terms of network features. Let me know if this makes sense.

Thanks for the list of cons! Most of the cons you listed above seem to apply to the AWS model too, right? It is ok to have different requirements

Honestly, I can't apply these bullets to the "AWS model". Maybe only ovn-ic, which requires modifications in the ovn-ic codebase to support route_tables in routes. But this work is already done.

I think the disagreement was mainly from the understanding of the behavior of directly connected subnets between different routing tables. I assume my understanding was wrong, for now :)

than AWS, but we'd better define it clearly, and it would be helpful to have some typical use cases that solve specific problems. I have more information now after your explanation, but I am still not really clear under what scenarios this feature would be used.

@Numan Siddique @Mark Michelson could you let me know your thoughts on this, too? Would you see similar use cases among Red Hat customers? Or does this design match well with your potential use cases? I'd like to be careful when implementing something like this and make sure we are doing it the right way.

I don't think there is any use case in ovn-k8s or openstack neutron which would use this feature. Sorry, I've not followed the discussion here closely. What is the tl;dr? @Han Zhou are you fine with this feature, and will we be accepting it in OVN?

Yes, after the discussions, I think the requirement is valid and the approach addresses the requirement properly without increasing complexity too much. I had a comment regarding the priority handling. Vladislav mentioned he will update the patch to support behavior similar to AWS, which allows users to add routes that override directly connected subnet routes. My recommendation is to just not add extra priority to the global routing table, and let the longer-prefix route take precedence.

Thanks,
Han

Thanks Numan. One comment here from my side: by "this design" we should mean "behaviour similar to the route table support of AWS and other public platforms". I think it is a good feature for an open-source SDN, which can further be used by open-source cloud platforms like OpenStack to extend routing capabilities.

Agree.

Thanks,
Han

1: https://aws.amazon.com/blogs/aws/inspect-subnet-to-subnet-traffic-with-amazon-vpc-more-specific-routing/
2: https://cloud.google.com/vpc/docs/routes#subnet-routes

I haven't finished reviewing the code yet, but I have one comment regarding adding 100 to the priority of the global routes. For IPv6, the priority range goes from 0 to 128x2=256, so adding 100 is not enough. It would create overlapping priority ranges, and some table-specific route entries could override the global routes.
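To illustrate the overlap with concrete numbers (illustrative arithmetic only, assuming the scheme discussed in this thread: route priority proportional to twice the prefix length, plus 100 for global-table routes):

  global IPv6 /64 route:         2 * 64 + 100 = 228
  table-scoped IPv6 /128 route:  2 * 128      = 256   -> outranks the global /64 route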
Thanks, I'll dig into this when you finish the review. Let me know if I answered your questions or if you have new ones. Again, many thanks for your time and for digging into this patch series.

1: https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Route_Tables.html
2: https://cloud.google.com/vpc/docs/routes#subnet-routes
3: https://docs.cloud.croc.ru/en/services/networks/routetables.html

Thanks,
Han

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
