On Sat, Oct 8, 2022 at 1:12 PM Dumitru Ceara <[email protected]> wrote: > > On 10/7/22 23:16, Han Zhou wrote: > > On Fri, Oct 7, 2022 at 12:23 PM Ilya Maximets <[email protected]> wrote: > >> > >> On 10/7/22 20:31, Han Zhou wrote: > >>> > >>> > >>> On Fri, Oct 7, 2022 at 10:24 AM Ilya Maximets <[email protected] > > <mailto:[email protected]>> wrote: > >>>> > >>>> On 10/7/22 10:26, Dumitru Ceara wrote: > >>>>> On 10/7/22 04:03, Han Zhou wrote: > >>>>>> On Fri, Sep 30, 2022 at 7:02 AM Dumitru Ceara <[email protected] > > <mailto:[email protected]>> wrote: > >>>>>>> > >>>>>>> Expand SB.Template_Var records in two stages: > >>>>>>> 1. first expand them to local values in match/action strings > >>>>>>> 2. then reparse the expanded strings > >>>>>>> > >>>>>>> For the case when a lflow references a Template_Var also track > >>>>>>> references (similar to the ones maintained for multicast groups, > > address > >>>>>>> sets, port_groups, port bindings). > >>>>>>> > >>>>>>> Signed-off-by: Dumitru Ceara <[email protected] <mailto: > > [email protected]>> > >>>>>> > >>>>>> Hi Dumitru, > >>>>>> > >>>>> > >>>>> Hi Han, > >>>>> > >>>>>> In addition to the two-stage parsing concerns we are discussing in > > the > >>>>> > >>>>> I'm doing some more targetted performance testing. I'll update the > >>>>> other thread when I have more data to share. > >>>>> > >>>>>> other thread, I have some minor comments below. The major one is > > whether we > >>>>>> should allow matching hostname or not. > >>>>>> > >>>>>>> --- > >>>>>>> controller/lflow.c | 59 +++++++- > >>>>>>> controller/lflow.h | 1 > >>>>>>> controller/lport.c | 3 > >>>>>>> controller/ofctrl.c | 9 + > >>>>>>> controller/ofctrl.h | 3 > >>>>>>> controller/ovn-controller.c | 317 > >>>>>> +++++++++++++++++++++++++++++++++++++++++++ > >>>>>>> include/ovn/expr.h | 4 - > >>>>>>> include/ovn/lex.h | 14 +- > >>>>>>> lib/actions.c | 9 + > >>>>>>> lib/expr.c | 18 ++ > >>>>>>> lib/lex.c | 55 +++++++ > >>>>>>> lib/objdep.c | 1 > >>>>>>> lib/objdep.h | 1 > >>>>>>> lib/ovn-util.c | 7 + > >>>>>>> lib/ovn-util.h | 3 > >>>>>>> tests/ovn.at <http://ovn.at> | 2 > >>>>>>> tests/test-ovn.c | 16 ++ > >>>>>>> utilities/ovn-trace.c | 26 +++- > >>>>>>> 18 files changed, 512 insertions(+), 36 deletions(-) > >>>>>>> > >>>> > >>>> ... > >>>> > >>>>>>> +struct ed_type_template_vars { > >>>>>>> + struct local_templates var_table; > >>>>>>> + > >>>>>>> + bool change_tracked; > >>>>>>> + struct sset new; > >>>>>>> + struct sset deleted; > >>>>>>> + struct sset updated; > >>>>>>> +}; > >>>>>>> + > >>>>>>> +static void > >>>>>>> +template_vars_init(const struct sbrec_template_var_table > > *tv_table, > >>>>>>> + const struct sbrec_chassis *chassis, > >>>>>>> + struct local_templates *var_table) > >>>>>>> +{ > >>>>>>> + const struct sbrec_template_var *tv; > >>>>>>> + SBREC_TEMPLATE_VAR_TABLE_FOR_EACH (tv, tv_table) { > >>>>>>> + if (chassis_name_equals(tv->chassis_name, chassis)) { > >>>>>> > >>>>>> I am not sure if it is a good idea to allow using hostname to match > > the > >>>>>> template var name. It provides flexibility to CMS, but we will need > > more > >>>>>> complexity to protect against corner cases. > >>>>>> For example, if there are two records: > >>>>>> r1: name="abc", value="v1" > >>>>>> r2: hostname="abc", value="v2" > >>>>>> > >>>>>> Now with the current logic, whichever is handled later will take > > precedence > >>>>>> (in the local_templates.vars) and the value will be used (assume r2 > > "v2" is > >>>>>> used). This may be fine, because the user should be responsible for > > the > >>>>>> inconsistent configurations. > >>>>>> > >>>>>> Later, when the user detects the problem and wants to correct the > >>>>>> configuration. He/she deletes the r2 and expects the var "abc" to be > >>>>>> expanded as "v1". But the logic in template_vars_update() would call > >>>>>> local_templates_remove() which simply deletes the var ("abc" -> > > "v2") > >>>>>> instead of replacing it with ("abc" -> "v1"). The uuid of "abc" -> > > "v1" > >>>>>> will still be left in the uuidset, which is useless. This is an > > unexpected > >>>>>> behavior. > >>>>>> > >>>>>> Similar behavior would happen if there are duplicate hostnames, > > e.g.: > >>>>>> r1: hostname="abc", value="v1" > >>>>>> r2: hostname="abc", value="v2" > >>>>>> > >>>>> > >>>>> Very good point, nice catch! > >>>> > >>>> In general it might make sense to choose a bit different database > > schema, > >>>> e.g.: > >>>> > >>>> "Chassis_Template_Vars": { > >>>> "columns": { > >>>> "chassis": {"type": "string"}, > >>>> "variables": { > >>>> "type": {"key": "string", "value": "string", > >>>> "min": 0, "max": "unlimited"}}, > >>>> "indexes": [["chassis"]], > >>>> "isRoot": true} > >>>> > >>>> Here 'variables' or 'templates' or whatever you want to call it is a > >>>> Var->Value map. > >>>> > >>>> Index on the 'chassis' column will provide uniqueness of chassis names, > >>>> map has unique keys, so all variable names are unique within a chassis > >>>> as well. This should cover all the possible misconfigurations on the > >>>> database level. > >>>> > >>>> As a bonus we will also save a lot of database space by not storing > >>>> millions of copies of chassis and variable names. May speed up the > >>>> Nb->Sb synchronization as well. > >>>> > >>>> One downside is that we can't have true I-P for that table due to > >>>> inability to track which values in a map actually changed. Though > >>>> it should be possible to figure out the diff in a semi-linear time > >>>> from the number of variables. OpenFlow rules can still be processed > >>>> incrementally after that. So, I'm not sure if that is a big > > performance > >>>> concern. Testing is needed, I guess. > >>>> > >>>> Conditional monitoring will be very simple. Chassis will receive only > >>>> one row in most cases. update2 will cover variable updates. > >>>> > >>>> What do you think? > > Interesting, thanks for the suggestion! > > >>>> > >>>> Best regards, Ilya Maximets. > >>> > >>> Thanks Ilya for the idea. I think it is definitely a good candidate to > > be evaluated, but I am slightly in favor of the original schema, and here > > is why. > >>> Firstly, I think the benefit of avoiding misconfigurations is somehow > > not that important, if we agree that it is good enough to stick with > > "chassis_name" and don't allow hostname matching. > >> > >> To be clear I wasn't arguing about hostnames, I don't think they are > >> needed. It was a suggestion for a general look of a schema regardless. > >> > >>> > >>> So the more important factor to consider is actually the performance. > > On this front, the upside is, as you mentioned, the size of the table would > > be smaller. However, I don't see it as a very strong benefit. The only > > redundant part here is the chassis_name, which is usually an uuid, not too > > short but also not too long. > >> > >> UUID takes 38 bytes. 250node setup with 10K vars will result > >> in about 100 MB overhead. It is noticeable. Not a huge amount, > >> I agree, but fairly noticeable. > >> > >>> I agree that the waste of space for this can be big if there are tens of > > thousands of vars per chassis, but compared with the logical flow table > > size this is still a very small portion of the total size. In addition, the > > DB sync matters mostly at a chassis startup, and then each incremental > > change should be very small. > >> > >> I was thinking about Nb->Sb sync on northd. We don't really have > >> any real I-P there (or did I missed it in the patch set? I didn't > >> look very close), so northd will need to re-check all that on > >> every iteration. Since this scales up very fast with the number > >> of nodes, I'm concerned about northd performance here. > >> > > Ok, makes sense. Sorry that I didn't pay attention that you were talking > > about nb->sb instead of sb->hv. > > > > Chassis template variables are completely unrelated to any other tables > populated by northd. So I think it's (relatively) easy to add I-P in > northd for chassis template var processing. > > >>> > >>> On the other hand, I am more concerned with the OVSDB performance for > > handling the huge map/set column data. I understand that there have been > > lots of improvements made for the mutations, but I wonder if it is still > > less efficient than simple CRUDs for very small rows. > >> > >> Almost. I'm not concerned about performance of ovsdb-server. > >> We're frequently running large scale tests with ovn-heater with > >> tens of thousands load-balancers in a single group, for example, > >> and ovsdb-server is barely loaded. > >> > >>> It *might* also be easier to further optimize the row based operations > > (such as creating index for more efficient conditional monitor updates) > > than for big columns. > >>> > >>> And of course, a strong negative side as you mentioned, is the I-P > > handling, which is definitely important. The current patch shows great > > simplicity in I-P compared with the address-set I-P that had a lot of > > complexity and overhead just for preprocessing and tracking the changes. > >> > >> It's a simple smap comparison that should be linear in time to > >> figure out what changed. OpenFlows will be processed incrementally > >> in the same way as they are in the current patch set. So, I'm > >> not sure if the performance impact will be noticeable. Again, > >> someone probably need to test that. > >> > > For performance, yes, the comparison is linear, but if N is very big, the > > cost may still be high (we have much less tolerance of CPU cost in > > ovn-controller than in ovn-northd). I agree a test would help to understand > > the real impact. > > > > However, hopefully N shouldn't be so big. I had the impression that N would > > be the same as the number of LBs, but with a second thought I believe it > > should be much less if configured properly. For VIP, each nodeport LB for > > the same node should share the same var; for LB port, the worst case is > > that each LB is assigned a port per node, but I wonder CMS should be smart > > enough to just assign the same ports across all nodes thus not even require > > the LB port to be included in the template var. > > > > If that's the case, probably the performance concerns are less important > > for either approach. The code complexity may still be considered. > > > > Looking at the code that generates node port service load balancers in > ovn-kubernetes [0] [1] indeed for the VIP the nodeport LB can use a > single template variable for all load balancers. For backends, on the > other hand, it's a bit more complicated because in some cases > (externalTrafficLocal/internalTrafficLocal) the set of backend IPs is > different per node. So it's not only the LB port that differs. > > In such cases we need a distinct backend template variable too. > > So, for a setup with N nodes and M node-port services, we end up with > (worst case) N x M template variable records in the NB/SB. > > [0] > https://github.com/ovn-org/ovn-kubernetes/blob/61bd7e49d2286bf59b4f2175bef3ba374fb1b99d/go-controller/pkg/ovn/controller/services/load_balancer.go#L79 > [1] > https://github.com/ovn-org/ovn-kubernetes/blob/61bd7e49d2286bf59b4f2175bef3ba374fb1b99d/go-controller/pkg/ovn/controller/services/load_balancer.go#L299 >
Thanks for the pointers. The worst case N x M would be bad, but it seems not really necessary in reality, and may be avoided by CMS. For the cases listed in these ovn-k8s code, most of the LBs should still be able to share backends, except: - backends with host network IPs - only the nodes with such backends should have special set of backends - external/internal traffic policy - only the nodes with backends for the service should have the LB (VIP=node IP, backend=local backend IP) So, N x M would happen only if all services has backends running on all the nodes, which is impossible when M is big (because each node has only limited resources to run limited amount of backends) Thanks, Han > >>> > >>> So, for the above reasons, I tend to stay with the current schema, > > unless > >>> 1) we are sure that handling huge map columns is as efficient as > > separate small rows in OVSDB; > >> > >> Should not be a problem with ovsdb-server 2.17 or higher. > >> > >>> 2) we improve IDL change tracking to support tracking mutations of > > map/set columns gracefully. > >> > >> I'm not sure if smap comparison is a huge problem as I said above, > >> so I don't know. IDL re-parsing might cause issues though. > > > > Yes, thanks for reminding about the IDL cost. > > > >> > >>> Thoughts? > >> > >> Anyway, not a strong opinion on my side. Just a suggestion or > >> something to think about. :) > > > > I don't have a strong opinion either. Probably we should also consider from > > CMS point of view, which way is more client friendly - managing separate > > records for each var or combining them per chassis. @Tim Rozet > > <[email protected]> @Girish Moodalbail <[email protected]> > > I still think in the worst case (many node port services with different > sets of backends per node) the lack of proper I-P and IDL re-parsing > cost in ovn-controller might make a difference. > > Tim, Girish what are worst case scenarios we could expect with > ovn-kubernetes? I.e., how many nodes and how many node port services. > And out of these how many with external/internal traffic policy configured? > > > > > Thanks, > > Han > > > > Thanks, > Dumitru > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
