Hey, which version are you using?, There are a couple of patches that Russell submitted recently to avoid flows not necessary on specific chassis, and I know Ben was looking at improving the bundle logic on ovn-controller to also reduce the number of flows generated for address sets.
On Tue, Dec 12, 2017 at 10:28 PM Kevin Lin <[email protected]> wrote: > Hi again, > > We’re trying to scale up our OVN deployment and we’re seeing some worrying > log messages. > The topology is 32 containers connected to another 32 containers on 10 > different ports. This is running on 17 machines (one machine runs > ovn-northd and ovsdb-server, the other 16 run ovn-controller, ovs-vswitchd, > and ovsdb-server). We’re using an address set for the source group, but not > the destination group. We’re also creating a different ACL for each port. > So the ACLs look like: > One address set for { container1, container2, … container32 } > addressSet -> container1 on port 80 > addressSet -> container1 on port 81 > … > addressSet -> container1 on port 90 > addressSet -> container2 on port 80 > … > addressSet -> container32 on port 90 > > The ovn-controller log: > > 2017-12-12T20:14:49Z|11878|timeval|WARN|Unreasonably long 1843ms poll > interval (1840ms user, 0ms system) > 2017-12-12T20:14:49Z|11879|timeval|WARN|disk: 0 reads, 16 writes > 2017-12-12T20:14:49Z|11880|timeval|WARN|context switches: 0 voluntary, 21 > involuntary > 2017-12-12T20:14:49Z|11881|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 ( > 172.31.11.193:48460<->172.31.2.181:6640) at lib/stream-fd.c:157 (36% CPU > usage) > 2017-12-12T20:14:49Z|11882|poll_loop|DBG|wakeup due to [POLLIN] on fd 12 > (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (36% CPU usage) > 2017-12-12T20:14:49Z|11883|jsonrpc|DBG|tcp:172.31.2.181:6640: received > reply, result=[], id="echo" > 2017-12-12T20:14:49Z|11884|netlink_socket|DBG|nl_sock_transact_multiple__ > (Success): nl(len:36, type=38(family-defined), flags=9[REQUEST][ECHO], > seq=b11, pid=2268452876 <(226)%20845-2876> > 2017-12-12T20:14:49Z|11885|netlink_socket|DBG|nl_sock_recv__ (Success): > nl(len:136, type=36(family-defined), flags=0, seq=b11, pid=2268452876 > <(226)%20845-2876> > 2017-12-12T20:14:49Z|11886|vconn|DBG|unix:/var/run/openvswitch/br-int.mgmt: > received: OFPT_ECHO_REQUEST (OF1.3) (xid=0x0): 0 bytes of payload > 2017-12-12T20:14:49Z|11887|vconn|DBG|unix:/var/run/openvswitch/br-int.mgmt: > sent (Success): OFPT_ECHO_REPLY (OF1.3) (xid=0x0): 0 bytes of payload > 2017-12-12T20:14:51Z|11888|timeval|WARN|Unreasonably long 1851ms poll > interval (1844ms user, 8ms system) > 2017-12-12T20:14:51Z|11889|timeval|WARN|context switches: 0 voluntary, 11 > involuntary > 2017-12-12T20:14:52Z|11890|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 ( > 172.31.11.193:48460<->172.31.2.181:6640) at lib/stream-fd.c:157 (73% CPU > usage) > 2017-12-12T20:14:52Z|11891|jsonrpc|DBG|tcp:172.31.2.181:6640: received > request, method="echo", params=[], id="echo" > 2017-12-12T20:14:52Z|11892|jsonrpc|DBG|tcp:172.31.2.181:6640: send reply, > result=[], id="echo" > 2017-12-12T20:14:52Z|11893|netlink_socket|DBG|nl_sock_transact_multiple__ > (Success): nl(len:36, type=38(family-defined), flags=9[REQUEST][ECHO], > seq=b12, pid=2268452876 <(226)%20845-2876> > 2017-12-12T20:14:52Z|11894|netlink_socket|DBG|nl_sock_recv__ (Success): > nl(len:136, type=36(family-defined), flags=0, seq=b12, pid=2268452876 > <(226)%20845-2876> > 2017-12-12T20:14:52Z|11895|netdev_linux|DBG|Dropped 18 log messages in > last 56 seconds (most recently, 3 seconds ago) due to excessive rate > 2017-12-12T20:14:52Z|11896|netdev_linux|DBG|unknown qdisc "mq" > 2017-12-12T20:14:54Z|11897|hmap|DBG|Dropped 15511 log messages in last 6 > seconds (most recently, 0 seconds ago) due to excessive rate > 2017-12-12T20:14:54Z|11898|hmap|DBG|ovn/lib/expr.c:2644: 6 nodes in bucket > (128 nodes, 64 buckets) > 2017-12-12T20:14:54Z|11899|timeval|WARN|Unreasonably long 1831ms poll > interval (1828ms user, 4ms system) > 2017-12-12T20:14:54Z|11900|timeval|WARN|context switches: 0 voluntary, 12 > involuntary > > The log messages show up continuously. The logs appear even when the > network isn’t being used. > > I poked around with Ethan Jackson and he noted that the hmap counters seem > unusually high: > root@ip-172-31-11-193:/# ovs-appctl -t ovn-controller coverage/show > Event coverage, avg rate over last: 5 seconds, last minute, last hour, > hash=d6ee5804: > hmap_pathological 2323.6/sec 2662.467/sec 2514.0069/sec > total: 9407536 > hmap_expand 3596.8/sec 4121.283/sec 3890.8833/sec > total: 14604479 > txn_unchanged 0.8/sec 0.917/sec 0.8658/sec > total: 5659 > txn_incomplete 0.0/sec 0.000/sec 0.0008/sec > total: 33 > txn_success 0.0/sec 0.000/sec 0.0006/sec > total: 24 > poll_create_node 2.4/sec 2.750/sec 2.5986/sec > total: 18218 > poll_zero_timeout 0.0/sec 0.000/sec 0.0100/sec > total: 71 > rconn_queued 0.0/sec 0.050/sec 0.0531/sec > total: 252570 > rconn_sent 0.0/sec 0.050/sec 0.0531/sec > total: 252570 > seq_change 1.2/sec 1.383/sec 1.2992/sec > total: 8500 > pstream_open 0.0/sec 0.000/sec 0.0000/sec > total: 1 > stream_open 0.0/sec 0.000/sec 0.0000/sec > total: 6 > unixctl_received 0.0/sec 0.000/sec 0.0019/sec > total: 7 > unixctl_replied 0.0/sec 0.000/sec 0.0019/sec > total: 7 > util_xalloc 2731550.2/sec 3129900.483/sec 569276.5414/sec > total: 11057381035 > vconn_open 0.0/sec 0.000/sec 0.0000/sec > total: 4 > vconn_received 0.0/sec 0.050/sec 0.0444/sec > total: 201 > vconn_sent 0.0/sec 0.000/sec 0.0144/sec > total: 253535 > netdev_get_ifindex 0.4/sec 0.467/sec 0.4328/sec > total: 2822 > netlink_received 0.4/sec 0.467/sec 0.4328/sec > total: 2822 > netlink_sent 0.4/sec 0.467/sec 0.4328/sec > total: 2822 > cmap_expand 0.0/sec 0.000/sec 0.0000/sec > total: 2 > 47 events never hit > > I’ve also attached the output of ovs-bugtool run from the machine running > ovn-northd, and one of the machines running ovn-controller and ovs-vswitchd. > > Thanks, > —Kevin > _______________________________________________ > discuss mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss >
_______________________________________________ discuss mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
