Hey, which version are you using? There are a couple of patches that
Russell submitted recently to avoid installing flows that are not
necessary on specific chassis, and I know Ben was looking at improving
the bundle logic in ovn-controller to also reduce the number of flows
generated for address sets.
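
In the meantime, if you want to compare the counters before and after
trying a different build, a rough Python sketch like the one below could
pull the rates out of the coverage/show output you pasted. The helper
names (parse_coverage, busiest) are made up for illustration, and it
assumes every entry keeps the shape shown in your mail: a counter name,
three "/sec" rates, then "total: N".

```python
import re

# One counter entry from `coverage/show`, as it appears in the quoted mail:
#   hmap_expand  3596.8/sec  4121.283/sec  3890.8833/sec  total: 14604479
COUNTER_RE = re.compile(
    r"(\w+)\s+([\d.]+)/sec\s+([\d.]+)/sec\s+([\d.]+)/sec\s+total:\s+(\d+)"
)


def parse_coverage(text):
    """Hypothetical helper: {counter: (rate_5s, rate_1min, rate_1h, total)}."""
    # Drop mail quote markers and rejoin lines that the mail client wrapped.
    lines = [re.sub(r"^\s*>\s?", "", line) for line in text.splitlines()]
    joined = " ".join(lines)
    counters = {}
    for name, r5, r60, r3600, total in COUNTER_RE.findall(joined):
        counters[name] = (float(r5), float(r60), float(r3600), int(total))
    return counters


def busiest(counters, top=3):
    """Hypothetical helper: counter names sorted by last-minute rate."""
    return sorted(counters, key=lambda n: counters[n][1], reverse=True)[:top]


# A few entries copied from the quoted output, quote markers included.
sample = """\
> hmap_pathological        2323.6/sec  2662.467/sec     2514.0069/sec
> total: 9407536
> hmap_expand              3596.8/sec  4121.283/sec     3890.8833/sec
> total: 14604479
> txn_unchanged              0.8/sec     0.917/sec        0.8658/sec
> total: 5659
"""

print(busiest(parse_coverage(sample)))
```

Running it against two captures taken a few minutes apart should make it
easier to see whether the hmap counters drop.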

On Tue, Dec 12, 2017 at 10:28 PM Kevin Lin <[email protected]> wrote:

> Hi again,
>
> We’re trying to scale up our OVN deployment and we’re seeing some worrying
> log messages.
> The topology is 32 containers connected to another 32 containers on 10
> different ports. This is running on 17 machines (one machine runs
> ovn-northd and ovsdb-server, the other 16 run ovn-controller, ovs-vswitchd,
> and ovsdb-server). We’re using an address set for the source group, but not
> the destination group. We’re also creating a different ACL for each port.
> So the ACLs look like:
> One address set for { container1, container2, … container32 }
> addressSet -> container1 on port 80
> addressSet -> container1 on port 81
> …
> addressSet -> container1 on port 90
> addressSet -> container2 on port 80
> …
> addressSet -> container32 on port 90
>
> The ovn-controller log:
>
> 2017-12-12T20:14:49Z|11878|timeval|WARN|Unreasonably long 1843ms poll
> interval (1840ms user, 0ms system)
> 2017-12-12T20:14:49Z|11879|timeval|WARN|disk: 0 reads, 16 writes
> 2017-12-12T20:14:49Z|11880|timeval|WARN|context switches: 0 voluntary, 21
> involuntary
> 2017-12-12T20:14:49Z|11881|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (
> 172.31.11.193:48460<->172.31.2.181:6640) at lib/stream-fd.c:157 (36% CPU
> usage)
> 2017-12-12T20:14:49Z|11882|poll_loop|DBG|wakeup due to [POLLIN] on fd 12
> (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (36% CPU usage)
> 2017-12-12T20:14:49Z|11883|jsonrpc|DBG|tcp:172.31.2.181:6640: received
> reply, result=[], id="echo"
> 2017-12-12T20:14:49Z|11884|netlink_socket|DBG|nl_sock_transact_multiple__
> (Success): nl(len:36, type=38(family-defined), flags=9[REQUEST][ECHO],
> seq=b11, pid=2268452876
> 2017-12-12T20:14:49Z|11885|netlink_socket|DBG|nl_sock_recv__ (Success):
> nl(len:136, type=36(family-defined), flags=0, seq=b11, pid=2268452876
> 2017-12-12T20:14:49Z|11886|vconn|DBG|unix:/var/run/openvswitch/br-int.mgmt:
> received: OFPT_ECHO_REQUEST (OF1.3) (xid=0x0): 0 bytes of payload
> 2017-12-12T20:14:49Z|11887|vconn|DBG|unix:/var/run/openvswitch/br-int.mgmt:
> sent (Success): OFPT_ECHO_REPLY (OF1.3) (xid=0x0): 0 bytes of payload
> 2017-12-12T20:14:51Z|11888|timeval|WARN|Unreasonably long 1851ms poll
> interval (1844ms user, 8ms system)
> 2017-12-12T20:14:51Z|11889|timeval|WARN|context switches: 0 voluntary, 11
> involuntary
> 2017-12-12T20:14:52Z|11890|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (
> 172.31.11.193:48460<->172.31.2.181:6640) at lib/stream-fd.c:157 (73% CPU
> usage)
> 2017-12-12T20:14:52Z|11891|jsonrpc|DBG|tcp:172.31.2.181:6640: received
> request, method="echo", params=[], id="echo"
> 2017-12-12T20:14:52Z|11892|jsonrpc|DBG|tcp:172.31.2.181:6640: send reply,
> result=[], id="echo"
> 2017-12-12T20:14:52Z|11893|netlink_socket|DBG|nl_sock_transact_multiple__
> (Success): nl(len:36, type=38(family-defined), flags=9[REQUEST][ECHO],
> seq=b12, pid=2268452876
> 2017-12-12T20:14:52Z|11894|netlink_socket|DBG|nl_sock_recv__ (Success):
> nl(len:136, type=36(family-defined), flags=0, seq=b12, pid=2268452876
> 2017-12-12T20:14:52Z|11895|netdev_linux|DBG|Dropped 18 log messages in
> last 56 seconds (most recently, 3 seconds ago) due to excessive rate
> 2017-12-12T20:14:52Z|11896|netdev_linux|DBG|unknown qdisc "mq"
> 2017-12-12T20:14:54Z|11897|hmap|DBG|Dropped 15511 log messages in last 6
> seconds (most recently, 0 seconds ago) due to excessive rate
> 2017-12-12T20:14:54Z|11898|hmap|DBG|ovn/lib/expr.c:2644: 6 nodes in bucket
> (128 nodes, 64 buckets)
> 2017-12-12T20:14:54Z|11899|timeval|WARN|Unreasonably long 1831ms poll
> interval (1828ms user, 4ms system)
> 2017-12-12T20:14:54Z|11900|timeval|WARN|context switches: 0 voluntary, 12
> involuntary
>
> The log messages show up continuously. The logs appear even when the
> network isn’t being used.
>
> I poked around with Ethan Jackson and he noted that the hmap counters seem
> unusually high:
> root@ip-172-31-11-193:/# ovs-appctl -t ovn-controller coverage/show
> Event coverage, avg rate over last: 5 seconds, last minute, last hour,
> hash=d6ee5804:
> hmap_pathological        2323.6/sec  2662.467/sec     2514.0069/sec
> total: 9407536
> hmap_expand              3596.8/sec  4121.283/sec     3890.8833/sec
> total: 14604479
> txn_unchanged              0.8/sec     0.917/sec        0.8658/sec
> total: 5659
> txn_incomplete             0.0/sec     0.000/sec        0.0008/sec
> total: 33
> txn_success                0.0/sec     0.000/sec        0.0006/sec
> total: 24
> poll_create_node           2.4/sec     2.750/sec        2.5986/sec
> total: 18218
> poll_zero_timeout          0.0/sec     0.000/sec        0.0100/sec
> total: 71
> rconn_queued               0.0/sec     0.050/sec        0.0531/sec
> total: 252570
> rconn_sent                 0.0/sec     0.050/sec        0.0531/sec
> total: 252570
> seq_change                 1.2/sec     1.383/sec        1.2992/sec
> total: 8500
> pstream_open               0.0/sec     0.000/sec        0.0000/sec
> total: 1
> stream_open                0.0/sec     0.000/sec        0.0000/sec
> total: 6
> unixctl_received           0.0/sec     0.000/sec        0.0019/sec
> total: 7
> unixctl_replied            0.0/sec     0.000/sec        0.0019/sec
> total: 7
> util_xalloc              2731550.2/sec 3129900.483/sec   569276.5414/sec
> total: 11057381035
> vconn_open                 0.0/sec     0.000/sec        0.0000/sec
> total: 4
> vconn_received             0.0/sec     0.050/sec        0.0444/sec
> total: 201
> vconn_sent                 0.0/sec     0.000/sec        0.0144/sec
> total: 253535
> netdev_get_ifindex         0.4/sec     0.467/sec        0.4328/sec
> total: 2822
> netlink_received           0.4/sec     0.467/sec        0.4328/sec
> total: 2822
> netlink_sent               0.4/sec     0.467/sec        0.4328/sec
> total: 2822
> cmap_expand                0.0/sec     0.000/sec        0.0000/sec
> total: 2
> 47 events never hit
>
> I’ve also attached the output of ovs-bugtool run from the machine running
> ovn-northd, and one of the machines running ovn-controller and ovs-vswitchd.
>
> Thanks,
> —Kevin
> _______________________________________________
> discuss mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>