On 6/16/26 6:58 AM, Rosemarie O'Riorden via dev wrote:
> This commit adds a test that developers can run to check if the changes
> they've made cause a regression in the performance of ovn-controller or
> ovn-northd.
> 
> Run from inside the sandbox and compare two commits' total script run
> time and peak memory usage for each process.
> 

Hi Rosemarie,

Thanks for the patch, I think this is something we could really use in OVN!

> This test creates a topology consisting of logical switches and routers,
> ACLs, port groups, NAT, DHCP, DNS, QoS, routing policies, and load
> balancers. The user can also provide their own database file for the
> benchmark.
> 
> * To run the test with defaults (200 nodes, track northd and controller):
>     ./ovn-benchmark.sh
> * To run with 50 nodes: ./ovn-benchmark.sh 50
> * To run with 30 nodes and only track ovn-northd:
>     ./ovn-benchmark.sh 30 northd
> * To run with a custom db file: ./ovn-benchmark.sh -f file.db
> * To run with debug info printed: ./ovn-benchmark.sh --debug
> * To see detailed usage instructions: ./ovn-benchmark.sh --help
>

Maybe we should add this information to the "Performance testing"
section of testing.rst, what do you think?

Also it might be worth documenting that if we run "ovn-benchmark.sh -f
file.db" we're essentially squashing the whole file.db into a single
transaction and might yield completely different results than if we run
ovn-benchmark.sh without an explicit DB file - in that case individual
transactions configure things in the NB.
> Each "node" is a router-switch pair with associated features:
>  - 1 gateway router (with NAT, static routes, routing policies)
>  - 1 logical switch (with 9 ports by default, DHCP, DNS, QoS)
>  - 5 load balancers (each with 5 backends)
>  - ACLs, port groups, and address sets (shared across all nodes)
> 
> For example, 200 nodes creates:
>  - 200 routers + 200 switches
>  - 1,800 logical switch ports (9 per switch)
>  - 1,000 load balancers (5 per node)
>  - 600 NAT rules, 600 static routes, 400 routing policies
>  - Plus ACLs, DHCP/DNS records, QoS rules, etc.
> 
> ovn-benchmark.py creates the topology and ovn-benchmark.sh is the
> wrapper that tracks peak memory and execution time. ovn-benchmark.py was
> built off of ovn-lb-benchmark.py as a starting point.
> 
> Reported-at: https://redhat.atlassian.net/browse/FDP-978
> Assisted-by: Claude Sonnet 4.5, Claude Code
> Signed-off-by: Rosemarie O'Riorden <[email protected]>
> ---
>  tutorial/automake.mk      |   4 +-
>  tutorial/ovn-benchmark.py | 659 ++++++++++++++++++++++++++++++++++++++
>  tutorial/ovn-benchmark.sh | 200 ++++++++++++
>  3 files changed, 862 insertions(+), 1 deletion(-)
>  create mode 100644 tutorial/ovn-benchmark.py
>  create mode 100755 tutorial/ovn-benchmark.sh
> 
> diff --git a/tutorial/automake.mk b/tutorial/automake.mk
> index 631208639..fdea2449c 100644
> --- a/tutorial/automake.mk
> +++ b/tutorial/automake.mk
> @@ -2,7 +2,9 @@ EXTRA_DIST += \
>       tutorial/ovn-sandbox \
>       tutorial/ovn-setup.sh \
>       tutorial/ovn-lb-benchmark.sh \
> -     tutorial/ovn-lb-benchmark.py
> +     tutorial/ovn-lb-benchmark.py \
> +     tutorial/ovn-benchmark.sh \
> +     tutorial/ovn-benchmark.py
>  sandbox: all
>       cd $(srcdir)/tutorial && MAKE=$(MAKE) HAVE_OPENSSL=$(HAVE_OPENSSL) \
>               ./ovn-sandbox -b $(abs_builddir) --ovs-src $(ovs_srcdir) 
> --ovs-build $(ovs_builddir) $(SANDBOXFLAGS)
> diff --git a/tutorial/ovn-benchmark.py b/tutorial/ovn-benchmark.py
> new file mode 100644
> index 000000000..98cdfd14e
> --- /dev/null
> +++ b/tutorial/ovn-benchmark.py
> @@ -0,0 +1,659 @@
> +#!/usr/bin/env python3
> +"""OVN memory regression testing tool.
> +
> +Creates a broad OVN topology to detect memory regressions between commits.
> +Designed to be run via ovn-benchmark.sh. (Run ./ovn-benchmark.sh --help to 
> see
> +usage).
> +
> +Topology created (for n nodes):
> +  - n gateway routers with NAT, static routes, routing policies
> +  - n logical switches with configurable ports per switch
> +  - Security: Address sets, port groups, ACLs, port security
> +  - Services: DHCP, DNS, load balancers
> +  - QoS: Bandwidth limiting, DSCP marking
> +

If not too much work, can we add a small ascii diagram here to show how
the switches and routers are connected?  There's also a "cluster" router
being added that connects all "node" logical switches together.  Also a
"join" switch that connects all the gateway routers together.

This simulates the (old) default ovn-kubernetes topology.  I know it's
also what ovn-heater does and it's probably fine to follow that pattern
as it's the "worst case scenario".

But, with ovn-k's UDN support in mind, I think I'd change it a bit.

Your ovn-benchmark.py script currently "binds" each cluster router
c2s-$i port to chassis-$i.  It also binds gateway router lr-$i to
chassis-$i.

What if we change it so that each chassis gets a partition of the
logical network?  We'd probably need another argument saying how many
switches/routers are handled by a given chassis, e.g., B(atch).  So then
we could have:

- chassis-0 where c2s-[0..B) and lr-[0..B) are bound
- chassis-1 where c2s-[B..2B) and lr-[B..2B) are bound
...
- chassis-x where c2s-[xB..(x+1)B) and lr-[xB..(x+1)B) are bound

We can probably choose a reasonable default, e.g.:

B = (DEFAULT_NODES / 10)

In my opinion, that would make it even more useful when asssesing
changes impact on ovn-controller resource usage.

> +Note: Uses explicit (non-templated) load balancers to maximize memory usage 
> for
> +regression testing. For templated LB testing, see ovn-lb-benchmark.py.
> +"""
> +
> +import argparse
> +import sys
> +
> +import ovs.db.idl
> +import ovs.jsonrpc
> +import ovs.poller
> +import ovs.stream
> +import ovs.vlog
> +from ovs.db import error
> +
> +vlog = ovs.vlog.Vlog('ovn-benchmark')
> +vlog.set_levels_from_string('console:warn')
> +vlog.init(None)
> +
> +SCHEMA = '../ovn-nb.ovsschema'
> +
> +
> +def die(msg):
> +    sys.stderr.write(f'\nError: {msg}\n')
> +    sys.exit(1)
> +
> +

Quite a few of the create*() functions below work with at most n == 255.
 Otherwise they generate invalid IPs.

We know of users that have topologies with more than 255 nodes (or
routers/switches).  Maybe we should update these functions to support 16
bit values and write two bytes of the IP addresses.

> +def create_address_sets(idl, n):
> +    """Create address sets for security groups."""
> +    vlog.info('Creating address sets')
> +    txn = ovs.db.idl.Transaction(idl)
> +
> +    web_as = txn.insert(idl.tables['Address_Set'])
> +    web_as.name = 'web_servers'
> +    web_as.addresses = [f'10.{i}.1.10' for i in range(n)]
> +
> +    db_as = txn.insert(idl.tables['Address_Set'])
> +    db_as.name = 'db_servers'
> +    db_as.addresses = [f'10.{i}.1.20' for i in range(n)]
> +
> +    app_as = txn.insert(idl.tables['Address_Set'])
> +    app_as.name = 'app_servers'
> +    app_as.addresses = [f'10.{i}.1.30' for i in range(n)]
> +
> +    trusted_as = txn.insert(idl.tables['Address_Set'])
> +    trusted_as.name = 'trusted_networks'
> +    trusted_as.addresses = ['192.168.0.0/16', '172.16.0.0/12']
> +
> +    if txn.commit_block() != ovs.db.idl.Transaction.SUCCESS:
> +        die(f'Failed to create address sets ({txn.get_error()})')
> +
> +
> +def create_port_groups(idl, n):
> +    """Create port groups for security group implementation."""
> +    vlog.info('Creating port groups')
> +    txn = ovs.db.idl.Transaction(idl)
> +
> +    web_pg = txn.insert(idl.tables['Port_Group'])
> +    web_pg.name = 'web_tier'
> +
> +    db_pg = txn.insert(idl.tables['Port_Group'])
> +    db_pg.name = 'db_tier'
> +
> +    app_pg = txn.insert(idl.tables['Port_Group'])
> +    app_pg.name = 'app_tier'
> +
> +    if txn.commit_block() != ovs.db.idl.Transaction.SUCCESS:
> +        die(f'Failed to create port groups ({txn.get_error()})')
> +
> +
> +def create_dhcp_options(idl, n):
> +    """Create DHCP options for each subnet."""
> +    for i in range(n):
> +        vlog.info(f'Creating DHCP options for node {i}')
> +        txn = ovs.db.idl.Transaction(idl)
> +        dhcp_opts = txn.insert(idl.tables['DHCP_Options'])
> +        dhcp_opts.cidr = f'10.{i}.1.0/24'
> +        dhcp_opts.setkey('options', 'server_id', f'10.{i}.1.1')
> +        dhcp_opts.setkey('options', 'server_mac', '00:00:00:00:00:01')
> +        dhcp_opts.setkey('options', 'lease_time', '3600')
> +        dhcp_opts.setkey('options', 'router', f'10.{i}.1.1')
> +        dhcp_opts.setkey('options', 'dns_server', f'10.{i}.1.2')
> +        dhcp_opts.setkey('options', 'domain_name', '"example.com"')
> +        dhcp_opts.setkey('options', 'mtu', '1500')
> +        dhcp_opts.setkey('external_ids', 'subnet', f'ls-{i}')
> +
> +        if txn.commit_block() != ovs.db.idl.Transaction.SUCCESS:
> +            die(f'Failed to create DHCP options for node {i} '
> +                f'({txn.get_error()})')
> +
> +
> +def create_qos_rules(idl, n, switches):
> +    """Create QoS rules for bandwidth limiting and DSCP marking."""
> +    for i in range(n):
> +        vlog.info(f'Creating QoS rules for node {i}')
> +        txn = ovs.db.idl.Transaction(idl)
> +
> +        ls = switches.get(f'ls-{i}')
> +        if ls:
> +            qos_bw = txn.insert(idl.tables['QoS'])
> +            qos_bw.priority = 100
> +            qos_bw.direction = 'to-lport'
> +            qos_bw.match = f'inport == "lsp-{i}-0"'
> +            qos_bw.setkey('bandwidth', 'rate', 1000)
> +            qos_bw.setkey('bandwidth', 'burst', 100)
> +            qos_bw.setkey('external_ids', 'type', 'rate-limit')
> +            ls.addvalue('qos_rules', qos_bw.uuid)
> +
> +            qos_dscp = txn.insert(idl.tables['QoS'])
> +            qos_dscp.priority = 200
> +            qos_dscp.direction = 'from-lport'
> +            qos_dscp.match = 'ip4 && tcp.dst == 22'
> +            qos_dscp.setkey('action', 'dscp', 46)
> +            qos_dscp.setkey('external_ids', 'type', 'dscp-marking')
> +            ls.addvalue('qos_rules', qos_dscp.uuid)
> +
> +            qos_mark = txn.insert(idl.tables['QoS'])
> +            qos_mark.priority = 150
> +            qos_mark.direction = 'from-lport'
> +            qos_mark.match = 'ip4 && udp'
> +            qos_mark.setkey('action', 'mark', 1)
> +            qos_mark.setkey('external_ids', 'type', 'packet-marking')
> +            ls.addvalue('qos_rules', qos_mark.uuid)
> +
> +        if txn.commit_block() != ovs.db.idl.Transaction.SUCCESS:
> +            die(f'Failed to create QoS rules for node {i} 
> ({txn.get_error()})')
> +
> +
> +def create_acls_for_port_group(idl, pg_name, allowed_ports, priority_base):
> +    """Create ACLs for a specific port group."""
> +    txn = ovs.db.idl.Transaction(idl)
> +
> +    for row in idl.tables['Port_Group'].rows.values():
> +        if row.name == pg_name:
> +            acl_allow_est = txn.insert(idl.tables['ACL'])
> +            acl_allow_est.priority = priority_base + 100
> +            acl_allow_est.direction = 'to-lport'
> +            acl_allow_est.match = 'ct.est && !ct.rel && !ct.new && !ct.inv'
> +            acl_allow_est.action = 'allow-related'
> +            row.addvalue('acls', acl_allow_est.uuid)
> +
> +            acl_allow_rel = txn.insert(idl.tables['ACL'])
> +            acl_allow_rel.priority = priority_base + 100
> +            acl_allow_rel.direction = 'to-lport'
> +            acl_allow_rel.match = 'ct.rel && !ct.est && !ct.new && !ct.inv'
> +            acl_allow_rel.action = 'allow-related'
> +            row.addvalue('acls', acl_allow_rel.uuid)
> +
> +            for port in allowed_ports:
> +                acl_new = txn.insert(idl.tables['ACL'])
> +                acl_new.priority = priority_base + 50
> +                acl_new.direction = 'to-lport'
> +                acl_new.match = f'ct.new && tcp.dst == {port}'
> +                acl_new.action = 'allow-related'
> +                row.addvalue('acls', acl_new.uuid)
> +
> +            acl_drop = txn.insert(idl.tables['ACL'])
> +            acl_drop.priority = priority_base
> +            acl_drop.direction = 'to-lport'
> +            acl_drop.match = 'inport == @' + pg_name
> +            acl_drop.action = 'drop'
> +            row.addvalue('acls', acl_drop.uuid)
> +
> +            acl_arp = txn.insert(idl.tables['ACL'])
> +            acl_arp.priority = priority_base + 10
> +            acl_arp.direction = 'to-lport'
> +            acl_arp.match = 'arp || nd'
> +            acl_arp.action = 'allow'
> +            row.addvalue('acls', acl_arp.uuid)
> +            break
> +
> +    if txn.commit_block() != ovs.db.idl.Transaction.SUCCESS:
> +        die(f'Failed to create ACLs for {pg_name} ({txn.get_error()})')
> +
> +
> +def create_acls(idl):
> +    """Create comprehensive ACLs for port groups.
> +
> +    ACL Priority Allocation:
> +      2500-2999: Security enforcement (anti-spoofing, etc.)
> +      2000-2499: Management access (SSH, ICMP, DHCP)
> +      1000-1499: Port group ACLs (security groups)
> +         1100: Connection tracking (established, related)
> +         1050: New connection per-port rules
> +         1010: ARP/ND allow
> +         1000: Default drop
> +
> +    Switch ACLs (higher priority) can override port-group ACLs, with
> +    security enforcement (anti-spoofing) taking highest priority.
> +    """
> +    vlog.info('Creating ACLs for port groups')
> +    create_acls_for_port_group(idl, 'web_tier', [80, 443], 1000)
> +    create_acls_for_port_group(idl, 'app_tier', [8080, 9000], 1000)
> +    create_acls_for_port_group(idl, 'db_tier', [5432, 3306], 1000)
> +
> +
> +def add_acls_to_switch(idl, switch_name, node_id, switches):
> +    """Add ACLs directly to a logical switch.
> +
> +    Adds switch-level ACLs for SSH, ICMP, anti-spoofing, and DHCP.
> +    Anti-spoofing (priority 2500) prevents VMs from using IPs outside
> +    their assigned subnet, blocking IP address spoofing attacks.
> +    """
> +    txn = ovs.db.idl.Transaction(idl)
> +
> +    ls = switches.get(switch_name)
> +    if ls:
> +        acl_allow_ssh = txn.insert(idl.tables['ACL'])
> +        acl_allow_ssh.priority = 2000
> +        acl_allow_ssh.direction = 'from-lport'
> +        acl_allow_ssh.match = 'tcp.dst == 22 && ip4.src == 10.0.0.0/8'
> +        acl_allow_ssh.action = 'allow'
> +        acl_allow_ssh.setkey('external_ids', 'description',
> +                             'Allow SSH from internal')
> +        ls.addvalue('acls', acl_allow_ssh.uuid)
> +
> +        acl_allow_icmp = txn.insert(idl.tables['ACL'])
> +        acl_allow_icmp.priority = 1500
> +        acl_allow_icmp.direction = 'from-lport'
> +        acl_allow_icmp.match = 'icmp4 || icmp6'
> +        acl_allow_icmp.action = 'allow'
> +        ls.addvalue('acls', acl_allow_icmp.uuid)
> +
> +        acl_deny_spoofing = txn.insert(idl.tables['ACL'])
> +        acl_deny_spoofing.priority = 2500
> +        acl_deny_spoofing.direction = 'from-lport'
> +        acl_deny_spoofing.match = f'ip4.src != 10.{node_id}.1.0/24'
> +        acl_deny_spoofing.action = 'drop'
> +        acl_deny_spoofing.setkey('external_ids', 'description', 
> 'Anti-spoofing')
> +        ls.addvalue('acls', acl_deny_spoofing.uuid)
> +
> +        acl_allow_dhcp = txn.insert(idl.tables['ACL'])
> +        acl_allow_dhcp.priority = 2000
> +        acl_allow_dhcp.direction = 'from-lport'
> +        acl_allow_dhcp.match = 'udp.src == 68 && udp.dst == 67'
> +        acl_allow_dhcp.action = 'allow'
> +        ls.addvalue('acls', acl_allow_dhcp.uuid)
> +
> +    if txn.commit_block() != ovs.db.idl.Transaction.SUCCESS:
> +        die(f'Failed to add ACLs to switch {switch_name} 
> ({txn.get_error()})')
> +
> +
> +def create_dns_records(idl, n, switches):
> +    """Create DNS records in the NB database."""
> +    for i in range(n):
> +        vlog.info(f'Creating DNS records for node {i}')
> +        txn = ovs.db.idl.Transaction(idl)
> +        dns = txn.insert(idl.tables['DNS'])
> +        dns.setkey('records', f'web-{i}.example.com', f'10.{i}.1.10')
> +        dns.setkey('records', f'app-{i}.example.com', f'10.{i}.1.30')
> +        dns.setkey('records', f'db-{i}.example.com', f'10.{i}.1.20')
> +        dns.setkey('external_ids', 'zone', f'zone-{i}')
> +
> +        ls = switches.get(f'ls-{i}')
> +        if ls:
> +            ls.addvalue('dns_records', dns.uuid)
> +
> +        if txn.commit_block() != ovs.db.idl.Transaction.SUCCESS:
> +            die(f'Failed to create DNS records for node {i} '
> +                f'({txn.get_error()})')
> +
> +
> +def add_nat_rules(idl, n, routers):
> +    """Add NAT rules to routers.
> +
> +    Creates SNAT, DNAT, and DNAT_AND_SNAT rules to exercise all NAT code 
> paths.
> +    """
> +    for i in range(n):
> +        vlog.info(f'Adding NAT rules to router {i}')
> +        txn = ovs.db.idl.Transaction(idl)
> +
> +        lr = routers.get(f'lr-{i}')
> +        if lr:
> +            nat_snat = txn.insert(idl.tables['NAT'])
> +            nat_snat.type = 'snat'
> +            nat_snat.logical_ip = f'10.{i}.1.0/24'
> +            nat_snat.external_ip = f'192.168.{i}.1'
> +            lr.addvalue('nat', nat_snat.uuid)
> +
> +            nat_dnat = txn.insert(idl.tables['NAT'])
> +            nat_dnat.type = 'dnat'
> +            nat_dnat.logical_ip = f'10.{i}.1.10'
> +            nat_dnat.external_ip = f'192.168.{i}.10'
> +            nat_dnat.setkey('external_ids', 'service', 'web')
> +            lr.addvalue('nat', nat_dnat.uuid)
> +
> +            nat_dnat_and_snat = txn.insert(idl.tables['NAT'])
> +            nat_dnat_and_snat.type = 'dnat_and_snat'
> +            nat_dnat_and_snat.logical_ip = f'10.{i}.1.20'
> +            nat_dnat_and_snat.external_ip = f'192.168.{i}.20'
> +            nat_dnat_and_snat.setkey('external_ids', 'service', 'db')
> +            lr.addvalue('nat', nat_dnat_and_snat.uuid)
> +
> +        if txn.commit_block() != ovs.db.idl.Transaction.SUCCESS:
> +            die(f'Failed to add NAT rules for node {i} ({txn.get_error()})')
> +
> +
> +def add_static_routes(idl, n, routers):
> +    """Add static routes to routers."""
> +    for i in range(n):
> +        vlog.info(f'Adding static routes to router {i}')
> +        txn = ovs.db.idl.Transaction(idl)
> +
> +        lr = routers.get(f'lr-{i}')
> +        if lr:
> +            route_default = txn.insert(
> +                idl.tables['Logical_Router_Static_Route'])
> +            route_default.ip_prefix = '0.0.0.0/0'
> +            route_default.nexthop = '10.0.0.1'
> +            route_default.setkey('external_ids', 'type', 'default')
> +            lr.addvalue('static_routes', route_default.uuid)
> +
> +            route_specific = txn.insert(
> +                idl.tables['Logical_Router_Static_Route'])
> +            route_specific.ip_prefix = f'172.16.{i}.0/24'
> +            route_specific.nexthop = f'10.{i}.1.254'
> +            route_specific.setkey('external_ids', 'type', 'specific')
> +            lr.addvalue('static_routes', route_specific.uuid)
> +
> +            route_discard = txn.insert(
> +                idl.tables['Logical_Router_Static_Route'])
> +            route_discard.ip_prefix = '192.0.2.0/24'
> +            route_discard.nexthop = 'discard'
> +            route_discard.setkey('external_ids', 'type', 'blackhole')
> +            lr.addvalue('static_routes', route_discard.uuid)
> +
> +        if txn.commit_block() != ovs.db.idl.Transaction.SUCCESS:
> +            die(f'Failed to add static routes for node {i} 
> ({txn.get_error()})')
> +
> +
> +def add_routing_policies(idl, n, routers):
> +    """Add routing policies to routers."""
> +    for i in range(n):
> +        vlog.info(f'Adding routing policies to router {i}')
> +        txn = ovs.db.idl.Transaction(idl)
> +
> +        lr = routers.get(f'lr-{i}')
> +        if lr:
> +            policy_reroute = txn.insert(idl.tables['Logical_Router_Policy'])
> +            policy_reroute.priority = 100
> +            policy_reroute.match = f'ip4.src == 10.{i}.1.0/24'
> +            policy_reroute.action = 'reroute'
> +            policy_reroute.nexthops = [f'10.{(i + 1) % n}.1.1']
> +            policy_reroute.setkey('external_ids', 'policy',
> +                                  'traffic-engineering')
> +            lr.addvalue('policies', policy_reroute.uuid)
> +
> +            policy_allow = txn.insert(idl.tables['Logical_Router_Policy'])
> +            policy_allow.priority = 50
> +            policy_allow.match = 'ip4.dst == $trusted_networks'
> +            policy_allow.action = 'allow'
> +            lr.addvalue('policies', policy_allow.uuid)
> +
> +        if txn.commit_block() != ovs.db.idl.Transaction.SUCCESS:
> +            die(f'Failed to add routing policies for node {i} '
> +                f'({txn.get_error()})')
> +
> +
> +def create_topology(idl, n, ports_per_switch):
> +    """Create the basic topology with routers, switches, and ports."""
> +    vlog.info('Creating topology')
> +    txn = ovs.db.idl.Transaction(idl)
> +    lbg = txn.insert(idl.tables['Load_Balancer_Group'])
> +    lbg.name = 'lbg'
> +
> +    vlog.info('Adding join switch')
> +    join_sw = txn.insert(idl.tables['Logical_Switch'])
> +    join_sw.name = 'join'
> +
> +    cluster_rtr = txn.insert(idl.tables['Logical_Router'])
> +    cluster_rtr.name = 'cluster'
> +
> +    rcj = txn.insert(idl.tables['Logical_Router_Port'])
> +    rcj.name = 'rcj'
> +    rcj.mac = '00:00:00:00:00:01'
> +    rcj.networks = ['10.0.0.1/8']
> +    cluster_rtr.addvalue('ports', rcj.uuid)
> +
> +    sjc = txn.insert(idl.tables['Logical_Switch_Port'])
> +    sjc.name = 'sjc'
> +    sjc.type = 'router'
> +    sjc.addresses = ['router']
> +    sjc.setkey('options', 'router-port', 'rcj')
> +    join_sw.addvalue('ports', sjc.uuid)
> +
> +    for i in range(n):
> +        vlog.info(f'Provisioning node {i}')
> +        chassis = f'chassis-{i}'
> +        gwr = txn.insert(idl.tables['Logical_Router'])
> +        gwr.name = f'lr-{i}'
> +        gwr.addvalue('load_balancer_group', lbg.uuid)
> +        gwr.setkey('options', 'chassis', chassis)
> +
> +        gwr2join = txn.insert(idl.tables['Logical_Router_Port'])
> +        gwr2join.name = f'lr2j-{i}'
> +        gwr2join.mac = '00:00:00:00:00:01'
> +        gwr2join.networks = ['10.0.0.1/8']
> +        gwr.addvalue('ports', gwr2join.uuid)
> +
> +        join2gwr = txn.insert(idl.tables['Logical_Switch_Port'])
> +        join2gwr.name = f'j2lr-{i}'
> +        join2gwr.type = 'router'
> +        join2gwr.addresses = ['router']
> +        join2gwr.setkey('options', 'router-port', gwr2join.name)
> +        join_sw.addvalue('ports', join2gwr.uuid)
> +
> +        s = txn.insert(idl.tables['Logical_Switch'])
> +        s.name = f'ls-{i}'
> +        s.addvalue('load_balancer_group', lbg.uuid)
> +        s.setkey('other_config', 'subnet', f'10.{i}.1.0/24')
> +        s.setkey('other_config', 'mcast_snoop', 'true')
> +
> +        cluster2s = txn.insert(idl.tables['Logical_Router_Port'])
> +        cluster2s.name = f'c2s-{i}'
> +        cluster2s.mac = '00:00:00:00:00:01'
> +        cluster2s.networks = [f'10.{i}.1.1/24']
> +        cluster_rtr.addvalue('ports', cluster2s.uuid)
> +
> +        gw_chassis = txn.insert(idl.tables['Gateway_Chassis'])
> +        gw_chassis.name = f'{cluster2s.name}-{chassis}'
> +        gw_chassis.chassis_name = chassis
> +        gw_chassis.priority = 1
> +        cluster2s.addvalue('gateway_chassis', gw_chassis.uuid)
> +
> +        s2cluster = txn.insert(idl.tables['Logical_Switch_Port'])
> +        s2cluster.name = f's2c-{i}'
> +        s2cluster.type = 'router'
> +        s2cluster.addresses = ['router']
> +        s2cluster.setkey('options', 'router-port', cluster2s.name)
> +        s.addvalue('ports', s2cluster.uuid)
> +
> +        for p in range(ports_per_switch):
> +            lsp = txn.insert(idl.tables['Logical_Switch_Port'])
> +            lsp.name = f'lsp-{i}-{p}'
> +            mac_byte = (p + 10) % 256
> +            lsp.addresses = [
> +                f'00:00:00:{i:02x}:{p:02x}:{mac_byte:02x} 10.{i}.1.{10 + p}']
> +            lsp.port_security = [
> +                f'00:00:00:{i:02x}:{p:02x}:{mac_byte:02x} 10.{i}.1.{10 + p}']
> +            lsp.setkey('external_ids', 'vm-id', f'vm-{i}-{p}')
> +
> +            # Assign ports to tiers (web/app/db) to model a typical 3-tier
> +            # application and exercise port group functionality.
> +            if p % 3 == 0:
> +                lsp.setkey('external_ids', 'tier', 'web')
> +            elif p % 3 == 1:
> +                lsp.setkey('external_ids', 'tier', 'app')
> +            else:
> +                lsp.setkey('external_ids', 'tier', 'db')
> +
> +            s.addvalue('ports', lsp.uuid)
> +
> +    if txn.commit_block() != ovs.db.idl.Transaction.SUCCESS:
> +        die(f'Failed to create topology ({txn.get_error()})')
> +
> +
> +def assign_ports_to_groups(idl, n, ports_per_switch):

'ports_per_switch' is not used in this function.

> +    """Assign logical switch ports to port groups based on tier."""
> +    vlog.info('Assigning ports to port groups')
> +
> +    web_ports = []
> +    app_ports = []
> +    db_ports = []
> +
> +    for row in idl.tables['Logical_Switch_Port'].rows.values():
> +        ext_ids = row.external_ids
> +        if 'tier' in ext_ids:
> +            if ext_ids['tier'] == 'web':
> +                web_ports.append(row.uuid)
> +            elif ext_ids['tier'] == 'app':
> +                app_ports.append(row.uuid)
> +            elif ext_ids['tier'] == 'db':
> +                db_ports.append(row.uuid)
> +
> +    txn = ovs.db.idl.Transaction(idl)
> +    for row in idl.tables['Port_Group'].rows.values():
> +        if row.name == 'web_tier':
> +            for port_uuid in web_ports:
> +                row.addvalue('ports', port_uuid)
> +        elif row.name == 'app_tier':
> +            for port_uuid in app_ports:
> +                row.addvalue('ports', port_uuid)
> +        elif row.name == 'db_tier':
> +            for port_uuid in db_ports:
> +                row.addvalue('ports', port_uuid)
> +
> +    if txn.commit_block() != ovs.db.idl.Transaction.SUCCESS:
> +        die(f'Failed to assign ports to groups ({txn.get_error()})')
> +
> +
> +def find_by_name(idl, table, name):
> +    """Find a row by name in a table."""
> +    for row in idl.tables[table].rows.values():
> +        if row.name == name:
> +            return row
> +    return None
> +
> +
> +def add_explicit_lbs(idl, n, n_vips, n_backends, routers, switches):
> +    """Add explicit (non-templated) load balancers.
> +
> +    Uses explicit (non-templated) LBs to maximize memory usage for
> +    regression testing.
> +    """
> +    for i in range(n):
> +        lr = routers.get(f'lr-{i}')
> +        ls = switches.get(f'ls-{i}')
> +        for j in range(n_vips):
> +            vlog.info(f'Adding LB {j} for node {i}')
> +            txn = ovs.db.idl.Transaction(idl)
> +            port = j + 1
> +            j1 = (j + 1) // 250
> +            j2 = (j + 1) % 250
> +            backends = [f'42.{k}.{j1}.{j2}:{port}' for k in 
> range(n_backends)]
> +
> +            lb = txn.insert(idl.tables['Load_Balancer'])
> +            lb.name = f'lb-{j}-{i}'
> +            lb.setkey('vips', f'42.42.42.{i}:{port}', 
> f'{",".join(backends)}')
> +            lb.protocol = 'tcp'
> +            lr.addvalue('load_balancer', lb.uuid)
> +            ls.addvalue('load_balancer', lb.uuid)
> +            if txn.commit_block() != ovs.db.idl.Transaction.SUCCESS:
> +                die(f'Failed to add LB ({txn.get_error()})')
> +
> +
> +def run(remote, n, n_vips, n_backends, ports_per_switch):
> +    """Main execution function."""
> +    schema_helper = ovs.db.idl.SchemaHelper(SCHEMA)
> +    schema_helper.register_all()
> +    idl = ovs.db.idl.Idl(remote, schema_helper, leader_only=False)
> +
> +    seqno = 0
> +
> +    error, stream = ovs.stream.Stream.open_block(
> +        ovs.stream.Stream.open(remote), 2000
> +    )
> +    if error:
> +        sys.stderr.write(f'failed to connect to "{remote}"')
> +        sys.exit(1)
> +
> +    if not stream:
> +        sys.stderr.write(f'failed to connect to "{remote}"')
> +        sys.exit(1)
> +    rpc = ovs.jsonrpc.Connection(stream)
> +
> +    while idl.change_seqno == seqno and not idl.run():
> +        rpc.run()
> +
> +        poller = ovs.poller.Poller()
> +        idl.wait(poller)
> +        rpc.wait(poller)
> +        poller.block()
> +
> +    # Check if database is clean before proceeding
> +    if (len(idl.tables['Load_Balancer_Group'].rows) > 0 or
> +        len(idl.tables['Logical_Switch'].rows) > 0 or
> +        len(idl.tables['Logical_Router'].rows) > 0):
> +        die('Database is not empty. Please restart the sandbox or clear the '
> +            'database before running this script.')
> +
> +    create_topology(idl, n, ports_per_switch)
> +
> +    # Build lookup dictionaries for O(1) access to switches and routers
> +    switches = {row.name: row
> +                for row in idl.tables['Logical_Switch'].rows.values()}
> +    routers = {row.name: row
> +               for row in idl.tables['Logical_Router'].rows.values()}
> +
> +    create_address_sets(idl, n)
> +    create_port_groups(idl, n)
> +    assign_ports_to_groups(idl, n, ports_per_switch)
> +    create_dhcp_options(idl, n)
> +    create_dns_records(idl, n, switches)
> +    add_nat_rules(idl, n, routers)
> +    add_static_routes(idl, n, routers)
> +    create_acls(idl)
> +    for i in range(n):
> +        add_acls_to_switch(idl, f'ls-{i}', i, switches)
> +    create_qos_rules(idl, n, switches)
> +    add_routing_policies(idl, n, routers)
> +    add_explicit_lbs(idl, n, n_vips, n_backends, routers, switches)
> +
> +
> +def main(argv):
> +    parser = argparse.ArgumentParser(
> +        description='Create a complex OVN topology with various features'
> +    )
> +    parser.add_argument(
> +        '-r', '--remote', required=True, help='NB connection string'
> +    )
> +    parser.add_argument(
> +        '-n', '--nodes', type=int, required=True, help='Number of nodes'
> +    )
> +    parser.add_argument(
> +        '-p',
> +        '--ports-per-switch',
> +        type=int,
> +        default=9,
> +        help='Number of logical switch ports per switch (default: 9, '
> +             'provides 3 ports per tier)',
> +    )
> +    parser.add_argument(
> +        '-v', '--vips', type=int, default=5,
> +        help='Number of LB VIPs per node (default: 5)'
> +    )
> +    parser.add_argument(
> +        '-b',
> +        '--backends',
> +        type=int,
> +        default=5,
> +        help='Number backends per VIP (default: 5)',
> +    )
> +    parser.add_argument(
> +        '-d', '--debug',
> +        action='store_true',
> +        help='Enable debug output (show info messages)',
> +    )
> +    args = parser.parse_args()
> +
> +    if args.debug:
> +        vlog.set_levels_from_string('console:info')
> +
> +    # Print configuration summary
> +    sys.stderr.write('\n=== OVN Benchmark Configuration ===\n')
> +    sys.stderr.write(f'Nodes (router + switch pair): {args.nodes}\n')
> +    sys.stderr.write(f'Ports per switch:             {args.ports_per_switch} 
> '
> +                     f'({args.ports_per_switch * args.nodes} total ports)\n')
> +    sys.stderr.write(f'Load balancer VIPs per node:  {args.vips} '
> +                     f'({args.vips * args.nodes} total VIPs)\n')
> +    sys.stderr.write(f'Backends per VIP:             {args.backends}\n')
> +    sys.stderr.write(f'Total load balancers:         '
> +                     f'{args.nodes * args.vips}\n')
> +    sys.stderr.write(f'Debug logging:                '
> +                     f'{"enabled" if args.debug else "disabled"}\n')
> +    sys.stderr.write('===================================\n\n')
> +
> +    run(args.remote, args.nodes, args.vips, args.backends,
> +        args.ports_per_switch)
> +
> +
> +if __name__ == '__main__':
> +    try:
> +        main(sys.argv)
> +    except error.Error as e:
> +        sys.stderr.write(f'{e}\n')
> +        sys.exit(1)
> diff --git a/tutorial/ovn-benchmark.sh b/tutorial/ovn-benchmark.sh
> new file mode 100755
> index 000000000..7e010e69c
> --- /dev/null
> +++ b/tutorial/ovn-benchmark.sh
> @@ -0,0 +1,200 @@
> +#!/bin/bash
> +
> +DEFAULT_NODES=200
> +
> +PROCESS_NAME=()
> +FILE_NAME=""
> +NODES=""
> +PROCESS_PIDS=()
> +CURRENT_MEM=()
> +PEAK_MEM=()
> +FINAL_PEAK_KB=()
> +FINAL_PEAK_MB=()
> +DEBUG=false
> +
> +while [[ $# -gt 0 ]]; do
> +    case "$1" in
> +        -h|--help|--usage)
> +            echo "Usage: $0 [OPTIONS] [NODES] [PROCESS...]"
> +            echo ""
> +            echo "Arguments:"
> +            echo "  NODES         Number of nodes to create" \
> +                 "(default: $DEFAULT_NODES)"
> +            echo "  PROCESS       Process(es) to track:" \
> +                 "ovn-northd, ovn-controller"
> +            echo "                (default: both)"
> +            echo ""
> +            echo "Options:"
> +            echo "  -f, --file FILE    Load NB database from file" \
> +                 "instead of generating"
> +            echo "  -d, --debug        Enable debug output"
> +            echo "  -h, --help         Show this help message"
> +            echo ""
> +            echo "Examples:"
> +            echo "  $0                      # 200 nodes, track both 
> processes"
> +            echo "  $0 50                   # 50 nodes"
> +            echo "  $0 50 ovn-northd        # 50 nodes, track only 
> ovn-northd"
> +            echo "  $0 --debug 20           # 20 nodes with debug output"
> +            echo "  $0 --file ovnnb_db.db   # Load from file"
> +            exit 0
> +            ;;
> +        -d|--debug)
> +            DEBUG=true
> +            shift
> +            ;;
> +        -f|--file)
> +            FILE_NAME="$2"
> +            shift 2
> +            ;;
> +        -*)
> +            echo "Unknown option: $1"
> +            exit 1
> +            ;;
> +        *)
> +            if [ -z "$NODES" ]; then
> +                NODES="$1"
> +            else
> +                # Normalize process names: accept both "northd" and
> +                # "ovn-northd"
> +                case "$1" in
> +                    northd)
> +                        PROCESS_NAME+=("ovn-northd")
> +                        ;;
> +                    controller)
> +                        PROCESS_NAME+=("ovn-controller")
> +                        ;;
> +                    *)
> +                        PROCESS_NAME+=("$1")
> +                        ;;
> +                esac
> +            fi
> +            shift
> +            ;;
> +    esac
> +done
> +
> +# Apply default if not set by user

Nit: Comments are usually sentences and should end with a period.  This
applies to multiple places in this patch.

> +NODES=${NODES:-$DEFAULT_NODES}
> +
> +# Track both processes if not specified
> +if [ ${#PROCESS_NAME[@]} -eq 0 ]; then
> +    PROCESS_NAME=("ovn-controller" "ovn-northd")
> +fi
> +
> +if [ "$DEBUG" = true ]; then
> +    echo "Nodes:       $NODES"
> +    echo "Processes:   ${PROCESS_NAME[*]}"
> +    echo "File:        ${FILE_NAME:-None}"
> +fi
> +
> +for pn in ${PROCESS_NAME[@]}; do
> +    PROCESS_PIDS+=($(pgrep -f "$pn" | head -n 1))

Would it be better to use "pgrep -x" instead, for an exact match of the
process name?

> +done
> +
> +for pid in ${PROCESS_PIDS[@]}; do
> +    if [ -z "$pid" ]; then
> +        echo "Error: Could not find process matching '$pid'"
> +        exit 1
> +    fi
> +done
> +
> +if [ "$DEBUG" = true ]; then
> +    for i in "${!PROCESS_NAME[@]}"; do
> +        echo "Tracking memory for ${PROCESS_NAME[$i]}" \
> +             "(PID: ${PROCESS_PIDS[$i]})"
> +    done
> +fi
> +
> +# Create a temporary file to store the highest memory value we see
> +for pn in "${PROCESS_NAME[@]}"; do
> +    echo 0 > peak_mem_$pn.txt
> +done
> +
> +# Start the background "Watcher" loop
> +while true; do
> +    for i in "${!PROCESS_NAME[@]}"; do
> +        pn="${PROCESS_NAME[$i]}"
> +        pid="${PROCESS_PIDS[$i]}"
> +
> +        # Get the Resident Set Size (RSS) memory in KB
> +        CURRENT_MEM[$i]=$(ps -p $pid -o rss= 2>/dev/null)
> +
> +        # If the process died, break
> +        if [ -z "${CURRENT_MEM[$i]}" ]; then break; fi
> +

We only break out of the for loop here, not out of the while loop.

> +        PEAK_MEM[$i]=$(cat peak_mem_$pn.txt)
> +
> +        if [ "${CURRENT_MEM[$i]}" -gt "${PEAK_MEM[$i]}" ]; then
> +            echo "${CURRENT_MEM[$i]}" > peak_mem_$pn.txt
> +        fi
> +    done
> +
> +    sleep 0.5
> +done &
> +
> +WATCHER_PID=$!
> +
> +START_TIME=$(date +%s%2N)
> +
> +if [ "$DEBUG" = true ]; then
> +    DEBUG_FLAG="-d"
> +else
> +    DEBUG_FLAG=""
> +fi
> +
> +# Load database from file or generate with Python script
> +if [ -n "$FILE_NAME" ]; then
> +    echo "Loading database from file: $FILE_NAME"
> +    if [ ! -f "$FILE_NAME" ]; then
> +        echo "Error: File '$FILE_NAME' not found"
> +        kill $WATCHER_PID 2>/dev/null

Nit: should this be a cleanup function called through a:

trap "cleanup" EXIT?

> +        exit 1
> +    fi
> +    ovsdb-client restore unix:$PWD/sandbox/nb1.ovsdb < "$FILE_NAME"
> +else
> +    echo "Generating database with Python script"
> +    python ovn-benchmark.py -n $NODES \
> +        -r unix:$PWD/sandbox/nb1.ovsdb $DEBUG_FLAG
> +    if [ $? -ne 0 ]; then
> +        echo "Error: Failed to generate database"
> +        kill $WATCHER_PID 2>/dev/null
> +        exit 1
> +    fi
> +fi
> +
> +# Bind a port from the first LS locally.
> +ovs-vsctl add-port br-int lsp-1 -- \
> +    set interface lsp-1 external_ids:iface-id=lsp-1

Maybe you meant "lsp-1-0" instead?  There's no lsp-1 in the generated
topology.

If we go for my suggestion of partitioning the logical network with
multiple router ports/routers bound on chassis-0 then we should probably
bind the first port of those switches.

Also, we probably want to wait for ovn-controller to claim those ports
and for "ovn-nbctl --wait=hv sync" before we continue to the results
collection.  Otherwise ovn-northd/ovn-controller might still be busy
processing the inputs when we check the resource usage.

> +
> +ovs-appctl -t $PWD/sandbox/nb1 ovsdb-server/compact
> +ovs-appctl -t $PWD/sandbox/sb1 ovsdb-server/compact
> +
> +END_TIME=$(date +%s%2N)
> +
> +kill $WATCHER_PID 2>/dev/null
> +
> +ELAPSED_TIME=$((END_TIME - START_TIME))
> +SECONDS=$((ELAPSED_TIME / 100))
> +HUNDREDTHS=$((ELAPSED_TIME % 100))
> +
> +for i in "${!PROCESS_NAME[@]}"; do
> +    pn=${PROCESS_NAME[$i]}
> +    FINAL_PEAK_KB[$i]=$(cat peak_mem_$pn.txt)
> +    FINAL_PEAK_MB[$i]=$((FINAL_PEAK_KB[$i] / 1024))
> +done
> +
> +echo ""
> +echo "=== Benchmark Results ==="
> +printf "Total time:                  %d.%02d seconds\n" \
> +    $SECONDS $HUNDREDTHS
> +
> +for i in "${!PROCESS_NAME[@]}"; do
> +    printf "%-28s %s MB\n" \
> +        "${PROCESS_NAME[$i]} peak memory:" "${FINAL_PEAK_MB[$i]}"
> +done
> +echo "========================="
> +echo ""
> +
> +for pn in "${PROCESS_NAME[@]}"; do
> +    rm peak_mem_$pn.txt
> +done

Regards,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to