Re: [ovs-dev] [PATCH] socket: Increase listen backlog to 128 everywhere.

Ilya Maximets Fri, 02 Aug 2024 04:59:42 -0700

On 6/6/24 18:02, Ihar Hrachyshka wrote:
> On Thu, Jun 6, 2024 at 11:12 AM Ihar Hrachyshka <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>     On Wed, Jun 5, 2024 at 11:17 AM Brian Haley <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>         Hi Ilya,
> 
>         On 6/4/24 6:27 AM, Ilya Maximets wrote:
>         > On 6/4/24 11:54, Alin Serdean wrote:
>         >> Does it make sense to make this configurable via automake in order 
> to avoid
>         >> future patches if we need to bump the value further?
>         >
>         > I'm still not convinced this is an issue worth fixing.
>         >
>         > The original discussion is here:
>         >    
> https://mail.openvswitch.org/pipermail/ovs-discuss/2024-April/053058.html 
> <https://mail.openvswitch.org/pipermail/ovs-discuss/2024-April/053058.html>
>         >
>         > It's not clear if these reconnections actually cause any harm or are
>         > even helpful in getting to connect to a less busy server in a 
> general
>         > case.
> 
>         To me, not having to reconnect is a win, especially given the number 
> of
>         client threads here. And 128 is still not that much for a server.
> 
>         > The original number 10 was clearly way too low for any reasonable 
> workload.
>         > But even with that we lived for a very long time with very large 
> clusters
>         > without any issues.
> 
>         It could have also been that noone ever noticed, I know I didn't until
>         digging deeper into this.
> 
>         > The main problem in the original thread was that a lot of neutron 
> clients
>         > are having leader-only connections to the database for seemingly no 
> reason.
> 
> 
>     They write, so AFAIU they need a leader (even if they connect to 
> secondary,>     writes will be proxied), correct?


They will be proxied, however, if you're sending transactions through
the leader, it will not reply until the transaction is replicated to
the majority of followers.  So, there is no much difference in latency.

Only the "heavy" writers benefit from connecting to a leader, because
if you have multiple clients with constant stream of transactions from
all of them, followers will need to re-try due to prerequisite changes
once the other server commits a different transaction.  But AFAIU since
you have many clients, probably none of them are "heavy" writers.

> 
>     There may be place for some OpenStack optimizations (e.g. consolidating 
> all
> neutron agents into one that would maintain a single connection per chassis), 
> but
> ultimately, it's still O(n) where n = number of chassis in the cluster; and 
> we have
> to have at least 2 connections per chassis (1 ovn-controller and at least 1 
> neutron
> agent, for metadata). But I would not overestimate the amount of settings 
> where the
> number of connections from neutron agents can be reduced. This is only an 
> option
> where multiple neutron agents are running on the same chassis. Usually, we run
> ovn-controller and ovn-metadata-agent. We may also run ovn-agent (for qos) or
> bgp-agent (for BGP) but these are not deployed in all environments / on all 
> nodes.
> 
>     Apart from it, deploying a proxy per node will allow to coalesce the 
> number of
> connections to 1 per chassis. One can go further and build a tree of proxies 
> to keep
> the number of connections to the leader at bay (is it O(log n)?) This is not 
> a trivial
> change to architecture.
> 
> 
> OK disregard this (somewhat): Brian experiences issues with NB, not SB. NB 
> connections
> come from neutron-server workers (not agents). In this case, OpenStack would 
> have to
> run its own "proxy", per node - for all workers to share, or an ovsdb proxy.
> 
> Regarding the switch from leader-only to any member for NB connections by 
> workers: is
> the argument to do it about the fact that the number of secondaries is a lot 
> lower than
> the number of neutron-server workers, so the load on a leader is also lower? 
> (2 or 4
> connections from secondaries vs. N*W connections, where N is number of API 
> controllers
> and W is the number of workers per neutron-server)
> 
> What would be the drawbacks we should consider? I can imagine the load on 
> secondary
> members will increase (but the load on the leader will decrease). Plus 
> probably some
> minor lag in request processing because of proxying through secondaries (for 
> most - but
> not all - neutron-server workers). Anything else?

Lag will likely not be noticeable, since, as I mentioned above, leader waits for
followers to acknowledge every transaction anyway before it is considered 
committed.

There is some extra computational cost on a follower if there are many 
simultaneous
transactions executed through different servers.  Only one of the servers will 
be
able to commit the transaction, others will receive a prerequisite mismatch 
error,
because committing any transaction changes prerequisites.  However, this error 
is
transient and not returned to the client.  Instead, the servers that received 
this
error from the leader will re-try their transactions until they succeed.  This 
takes
some CPU resources.  However, as I said before, this is only a problem if you 
have
many write-heavy clients sending constant stream of transactions without any
breathing room.

So, I'd recommend to try this out.

>  
> 
>      
> 
>         > That results in unnecessary mass re-connection on leadership change.
>         > So, I'd prefer this fixed in OpenStack instead.
> 
>         But isn't having many leader-only connections a perfectly valid config
>         when most if not all are writers? And if a small increase in the 
> backlog
>         helps I don't see any reason not to do it. Fixing in Openstack would
>         involve one of two options from what I know - 1) Use a proxy; 2) Just
>         connect irregardless of leadership status. I'm not sure #2 won't cause
>         some other problems as the secondaries would have to forward things to
>         the leader, correct?
> 
>         The best solution would be to have this configurable if you see that 
> as
>         a viable option.
> 
>         -Brian
> 
>         > Best regards, Ilya Maximets.
>         >
>         >>
>         >> On Tue, Jun 4, 2024 at 11:05 AM Simon Horman <[email protected] 
> <mailto:[email protected]>> wrote:
>         >>
>         >>> + Ihar
>         >>>
>         >>> On Fri, May 31, 2024 at 03:40:08PM -0400, Brian Haley wrote:
>         >>>> An earlier patch [1] increased the size of the listen
>         >>>> backlog to 64. While that was a huge improvement over
>         >>>> 10, further testing in large deployments showed 128
>         >>>> was even better.
>         >>>
> 
> 
>     What I lack in this commit message is the definition of "better". I see 
> stats of drops using ss and netstat, but it's not clear what the user's 
> visible effect of this is. Do you have a bug report to link to, with some 
> symptoms described?
>      
> 
>         >>> nit: I would slightly prefer if a commit was referenced like this:
>         >>>
>         >>>    commit 2b7efee031c3 ("socket: Increase listen backlog to 64 
> everywhere.")
>         >>>
>         >>>> Looking at 'ss -lmt' output over more than one week for
>         >>>> port 6641, captured across three different controllers,
>         >>>> the average was:
>         >>>>
>         >>>>      listen(s, 10) : 1213 drops/day
>         >>>>      listen(s, 64) : 791 drops/day
>         >>>>      listen(s, 128): 657 drops/day
>         >>>>
>         >>>> Looking at 'netstat -s | egrep -i 'listen|drop|SYN_RECV''
>         >>>> output over one week for port 6641, again captured across
>         >>>> three different controllers, the average was:
>         >>>>
>         >>>>      listen(s, 10) : 741 drops/day
>         >>>>      listen(s, 64) : 200 drops/day
>         >>>>      listen(s, 128): 22 drops/day
> 
> 
>     So why not 256? 512? 1024? It's always an issue when a single value is 
> picked for all environments, isn't it.
> 
>     The suggestion to have it configurable seems to me to be the way to go.
>      
> 
>         >>>>
>         >>>> While having this value configurable would be the
>         >>>> best solution, changing to 128 is a quick fix that
>         >>>> should be good for all deployments. A link to the
>         >>>> original discussion is at [2].
>         >>>>
>         >>>> [1]
>         >>> 
> https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c
>  
> <https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c>
>         >>>> [2]
>         >>> 
> https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c
>  
> <https://github.com/openvswitch/ovs/commit/2b7efee031c3a2205ad2ee999275893edd083c1c>
>         >>>
>         >>> nit: These two references are the same?
>         >>>
>         >>>> Signed-off-by: Brian Haley <[email protected] 
> <mailto:[email protected]>>
>         >>>
>         >>> I'd value input on this from Ihar (CCed) who worked on the cited 
> commit.
>         >>>
>         >>>> ---
>         >>>>   lib/socket-util.c    | 2 +-
>         >>>>   python/ovs/stream.py | 2 +-
>         >>>>   2 files changed, 2 insertions(+), 2 deletions(-)
>         >>>>
>         >>>> diff --git a/lib/socket-util.c b/lib/socket-util.c
>         >>>> index c569b7d16..552310266 100644
>         >>>> --- a/lib/socket-util.c
>         >>>> +++ b/lib/socket-util.c
>         >>>> @@ -769,7 +769,7 @@ inet_open_passive(int style, const char 
> *target, int
>         >>> default_port,
>         >>>>       }
>         >>>>
>         >>>>       /* Listen. */
>         >>>> -    if (style == SOCK_STREAM && listen(fd, 64) < 0) {
>         >>>> +    if (style == SOCK_STREAM && listen(fd, 128) < 0) {
>         >>>>           error = sock_errno();
>         >>>>           VLOG_ERR("%s: listen: %s", target, 
> sock_strerror(error));
>         >>>>           goto error;
>         >>>> diff --git a/python/ovs/stream.py b/python/ovs/stream.py
>         >>>> index dbb6b2e1f..874fe0bd5 100644
>         >>>> --- a/python/ovs/stream.py
>         >>>> +++ b/python/ovs/stream.py
>         >>>> @@ -620,7 +620,7 @@ class PassiveStream(object):
>         >>>>               raise Exception('Unknown connection string')
>         >>>>
>         >>>>           try:
>         >>>> -            sock.listen(64)
>         >>>> +            sock.listen(128)
>         >>>>           except socket.error as e:
>         >>>>               vlog.err("%s: listen: %s" % (name, 
> os.strerror(e.error)))
>         >>>>               sock.close()
>         >>>> --
>         >>>> 2.34.1
>         >>>>
>         >>>> _______________________________________________
>         >>>> dev mailing list
>         >>>> [email protected] <mailto:[email protected]>
>         >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev 
> <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>         >>>>
>         >>> _______________________________________________
>         >>> dev mailing list
>         >>> [email protected] <mailto:[email protected]>
>         >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev 
> <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>         >>>
>         >> _______________________________________________
>         >> dev mailing list
>         >> [email protected] <mailto:[email protected]>
>         >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev 
> <https://mail.openvswitch.org/mailman/listinfo/ovs-dev>
>         >
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] socket: Increase listen backlog to 128 everywhere.

Reply via email to